TMS320F28379D: CLA Execution speed

Part Number: TMS320F28379D

Hi,

I am executing a piece of code on the CLA and found that its execution time differs from the CPU: the code takes around 20 µs on the CPU but around 200 µs on the CLA. Is this expected behavior? Please let me know.

Regards

Sundar

  • Sundara,

    That does not look right. You should get almost the same performance from the CLA and the CPU, unless there is some kind of handshake between them and the CLA is waiting for an acknowledgment from the CPU. From the numbers, it looks as if, in the CLA case, the PLL is not locked and the device is running in bypass mode. Can you check that?

    Regards,

    Vivek Singh

  • Sundar,

    That is quite a large difference.  I would first ask some questions:

    • Is the code largely floating-point calculations or fixed-point? The CLA is very good at floating-point math but weaker at everything else (fixed-point math, string manipulation, general-purpose code, moving data around, etc.). The CLA doesn't have an integer multiply. The C28x includes instructions that make it better at fixed-point math and other general code.
    • Is the code heavy on pointers? If the code requires a lot of pointers, the CLA may become overburdened. The C28x has more pointer registers available.
    • Is there a lot of branching? The CLA is great for straight-line computation but not so good for branching code (if / else if / else, switch, the ?: conditional operator).
    • Does the code contain loops? The CLA does not have a repeat-block instruction (RPTB), so a loop that runs efficiently on the C28x will run slower on the CLA.
    • Is the code on the C28x making use of the TMU? The TMU enables very fast floating-point sin, cos, and other floating-point computations on the C28x. The CLA does not have equivalent TMU instructions.
    • How are you measuring the performance of the CLA code?

    Additional resources you may find interesting:

    Click here for more CLA FAQs and resources.

    Regards

    Lori

  • Hi Lori,

    Thanks for your quick response. Below are my answers to your questions.

    1. The current CLA code contains mostly floating-point operations.

    2. No pointers are used in the code.

    3. Some branching code is used, such as if / else if (10-20% of my code).

    4. The TMU is not used.

    5. A GPIO pin is toggled to measure the performance, as shown below:

       GPIO_Writepin(8, 1);    // pin high: start of measurement
       Control_Algorithm();
       GPIO_Writepin(8, 0);    // pin low: end of measurement

       A high-precision oscilloscope is used to measure the time between the two edges.

    I want to bring one more thing to your notice. As an experiment, I executed the following piece of code on the CPU and on the CLA, one at a time, and measured the times below.

    On the CLA: execution time = 1400 µs

    __interrupt void Cla1Task1(void)
    {
        int delay = 10000;

        GPIO_Writepin(8, 1);
        while (delay-- > 0);    // empty delay loop
        GPIO_Writepin(8, 0);
    }

    On the CPU: execution time = 1100 µs

    void main(void)
    {
        int delay = 10000;

        initGPIO();

        GPIO_Writepin(8, 1);
        while (delay-- > 0);    // empty delay loop
        GPIO_Writepin(8, 0);
    }

    Is this time difference expected?

    Regards

    Sundar

  • Hi Lori,

    One more question: is the modulus operator (%) allowed in CLA code?

    Regards
    Sundar
  • Sundara Narasimhan Kalattur said:

    CPU: Execution time = 1100 µs

    Sundar,

    What frequency is the device running at? At 200 MHz (5 ns per cycle) that seems like a long time. For the C28x, I would put a rough estimate at 8-10 CPU cycles per loop iteration plus overhead. For the sake of an estimate: 10,000 iterations x 10 cycles/iteration x 5 ns/cycle = ~500 µs + overhead.
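The estimate above can be reproduced, and inverted, with a few lines of C. This is only a host-side sketch: the function names are hypothetical, and the 200 MHz system clock is the assumption used in the estimate, not a confirmed setting of the device in question.

```c
/* Hypothetical helpers for back-of-the-envelope loop timing.
 * Assumes a fixed number of cycles per loop iteration and ignores overhead. */

/* Estimated loop time in microseconds. */
static double loop_time_us(unsigned long iterations,
                           double cycles_per_iter,
                           double sysclk_hz)
{
    return (double)iterations * cycles_per_iter / sysclk_hz * 1e6;
}

/* Cycles per iteration implied by a measured loop time in microseconds. */
static double implied_cycles(double measured_us,
                             unsigned long iterations,
                             double sysclk_hz)
{
    return measured_us * 1e-6 * sysclk_hz / (double)iterations;
}
```

With the assumed 200 MHz clock, loop_time_us(10000, 10.0, 200e6) gives about 500 µs. Going the other way, the measured 1100 µs and 1400 µs would correspond to about 22 and 28 cycles per iteration, which is why checking the actual clock frequency and the disassembly is the natural next step.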

    Looking at the disassembly (View -> Disassembly) will give you more insight into what is actually being executed in both cases. The CLA's branch for the loop will likely have 3 NOPs (basically wasted cycles) before the branch. Because the loop isn't doing anything, the compiler has little opportunity to fill these cycles with useful work. It may also have to work a little harder to subtract 1 from the delay.

    For your own control code, I suggest trying some of the compiler optimization settings, -O2 or -O3. Right-click on the .cla source file, select Properties, then C2000 Compiler -> Optimization. This applies the optimization to the .cla source only.
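One caveat when raising the optimization level on a timing test: an optimizing C compiler is generally allowed to remove an empty delay loop, because it has no observable effect. A minimal sketch of a delay loop that survives optimization is shown below. The helper name busy_wait is hypothetical, and the cycle cost per iteration on the CLA or C28x is not claimed here.

```c
#include <stdint.h>

/* Hypothetical helper: busy-wait for a given number of iterations.
 * The counter is volatile, so every read and write of it must actually
 * be performed and the loop cannot be optimized away.
 * Returns the number of iterations executed. */
static uint32_t busy_wait(uint32_t iterations)
{
    volatile uint32_t remaining = iterations;
    uint32_t executed = 0;

    while (remaining > 0) {
        remaining--;    /* volatile access: kept by the optimizer */
        executed++;
    }
    return executed;
}
```

The volatile accesses also add memory traffic, so timings taken with this loop are not directly comparable to the original non-volatile version.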

    Regards

    Lori

  • Sundara Narasimhan Kalattur said:
    One more question: is the modulus operator (%) allowed in CLA code?

    It is allowed. However, the CLA instruction set has no dedicated instruction to support %, so the operation takes more cycles than on the C28x.

    Regards

    Lori

  • Hi Lori,

    I observed that the lines containing the % operator consume a lot of time in my CLA code. I have the lines below in my code:

    x = y % 1000;

    z = a % 1000;

    These two lines account for the large majority of the execution time on the CLA (roughly 90%). With these two lines included, the execution time is about 232 µs; when I comment them out, it is 20 µs. On the CPU, however, the execution time is much lower: 16 µs for the complete code.

    Regards

    Sundar

  • Sundar,

    Thank you for the feedback. I am glad to hear that you have narrowed down where the issue is. The C28x has an instruction that supports the modulus operation (SUBCU); the CLA has no equivalent. This is an architectural tradeoff of the CLA.

    I did some searching and found that a colleague had posted an unsigned modulus routine in CLA assembly here. I don't know whether it worked for the customer or if it will work in your case.
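Since the CLA is strongest at floating-point math, one possible workaround for a modulus by a constant is to compute it with float operations instead of %. The sketch below is not the assembly routine mentioned above, and it has not been timed on a CLA. The name mod1000_f32 is hypothetical, and the code assumes the operand is an unsigned 16-bit value, so that every intermediate value is exactly representable in a 32-bit float.

```c
#include <stdint.h>

/* Hypothetical helper: y % 1000 using float multiply/subtract only.
 * Assumes y fits in 16 bits, so yf and q * 1000 are exact in float. */
static uint16_t mod1000_f32(uint16_t y)
{
    float yf = (float)y;

    /* Quotient estimate via reciprocal multiply, truncated toward zero. */
    float q = (float)(int32_t)(yf * 0.001f);

    /* Remainder from the estimated quotient. */
    float r = yf - q * 1000.0f;

    /* The estimate can be off by one, so correct in both directions. */
    if (r < 0.0f) {
        r += 1000.0f;
    }
    if (r >= 1000.0f) {
        r -= 1000.0f;
    }
    return (uint16_t)r;
}
```

For example, x = mod1000_f32(y); would replace x = y % 1000; for operands in that range. Whether this is actually faster than the compiler's % support on the CLA would need to be confirmed with the same GPIO measurement used earlier in the thread.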

    Best Regards

    Lori