Hi,
I am executing a piece of code on the CLA. I found that the execution time differs between the CLA and the CPU: the code takes around 20 us on the CPU but around 200 us on the CLA. Is this expected behavior? Please let me know.
Regards
Sundar
Sundara,
That does not look right. You should get almost the same performance from the CLA and the CPU, unless there is some kind of handshake between them and the CLA is waiting for an acknowledgment from the CPU. From the numbers, it looks like the PLL is not locked and is running in bypass mode in the CLA case. Can you check that?
Regards,
Vivek Singh
Sundara Narasimhan Kalattur said: Part Number: TMS320F28379D
Sundar,
That is quite a large difference. I would first ask some questions:
Regards
Lori
Hi Lori,
Thanks for your quick response. Below are my answers to your questions.
1. The current CLA code consists mostly of floating-point operations.
2. No pointers are used in the code.
3. Some branching code is used, such as if/else-if chains (10-20% of my code).
4. The TMU is not used.
5. A GPIO pin is toggled to measure the performance, as shown below:
GPIO_Writepin(8, 1);      /* pin high: start of measurement */
Control_Algorithm();      /* code under test */
GPIO_Writepin(8, 0);      /* pin low: end of measurement */
A high-precision oscilloscope is used to measure the time between the two pin edges.
I want to bring one more thing to your notice. As an experiment, I executed the following piece of code on the CPU and on the CLA, one at a time, and measured the times below:
On the CLA: execution time = 1400 us
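__interrupt void Cla1Task1(void)
{
    int delay = 10000;
    GPIO_Writepin(8, 1);      /* pin high: start of measurement */
    while (delay-- > 0);      /* empty busy-wait loop */
    GPIO_Writepin(8, 0);      /* pin low: end of measurement */
}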
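On the CPU: execution time = 1100 us
void main(void)
{
    int delay = 10000;
    initGPIO();
    GPIO_Writepin(8, 1);      /* pin high: start of measurement */
    while (delay-- > 0);      /* empty busy-wait loop */
    GPIO_Writepin(8, 0);      /* pin low: end of measurement */
}
Is this time difference expected?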
Regards
Sundar
Sundar,
What frequency is the device running at? At 200 MHz that seems like a long time. For the C28x, I would put a rough estimate at 8-10 CPU cycles per loop plus overhead. For the sake of an estimate: 10,000 loops x 10 cycles/loop x 5 ns/cycle = ~500 us plus overhead.
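The estimate above can be reproduced, and turned around to get cycles per loop from a measured time, with a small host-side C sketch. The function names are hypothetical, and the 200 MHz clock is an assumption taken from the question above, not a confirmed setting of the device in this thread.

```c
#include <assert.h>

/* Estimated run time in microseconds of a busy-wait loop.
 * iterations      : number of loop passes
 * cycles_per_iter : assumed cycles per pass (a guess; check the disassembly)
 * sysclk_mhz      : assumed clock in MHz, i.e. cycles per microsecond
 */
double loop_time_us(unsigned long iterations,
                    unsigned cycles_per_iter,
                    double sysclk_mhz)
{
    assert(sysclk_mhz > 0.0);
    /* total cycles divided by cycles-per-microsecond gives microseconds */
    return (double)iterations * cycles_per_iter / sysclk_mhz;
}

/* Inverse: cycles per pass implied by a measured time in microseconds. */
double cycles_per_iteration(double measured_us,
                            unsigned long iterations,
                            double sysclk_mhz)
{
    assert(iterations > 0);
    return measured_us * sysclk_mhz / (double)iterations;
}
```

If the clock really is 200 MHz, loop_time_us(10000, 10, 200.0) gives the 500 us estimate, while the measured 1100 us and 1400 us would correspond to 22 and 28 cycles per pass. If the device is running slower than 200 MHz, the cycles per pass are proportionally lower.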
Looking at the disassembly (View -> Disassembly) will give you more insight into what is actually being executed in both cases. The CLA's branch for the loop will likely have 3 NOPs (basically wasted cycles) before the branch. Because the loop isn't doing anything, the compiler has little opportunity to make use of these cycles. It may also have to work a little harder to subtract 1 from the delay.
For your own control code, I suggest trying the compiler optimization settings -O2 or -O3. Right-click on the .cla source file, select Properties, then C2000 Compiler -> Optimization. This applies the optimization to the .cla source only.
Regards
Lori
Sundara Narasimhan Kalattur said: One more doubt I have: is the modulus operator (%) allowed in the CLA?
It is allowed. The CLA instruction set doesn't have a dedicated instruction to support %, so it takes more cycles than on the C28x.
Regards
Lori
Hi Lori,
I observed that the lines containing the % operator consume a lot of time in my CLA code. I have the lines below in my code:
x = y % 1000;
z = a % 1000;
These 2 lines consume most of the execution time in my CLA code, say about 95%. The execution time including these 2 lines is about 232 us, but when I comment them out, it is 20 us. On the CPU, however, the execution time is much lower: 16 us for the complete code.
Regards
Sundar
Sundar,
Thank you for the feedback. I am glad to hear that you have narrowed down where the issue is. The C28x has an instruction (SUBCU) that supports the modulus operation; the CLA does not, which is an architectural tradeoff.
I did some searching and found that a colleague had posted an unsigned modulus routine in CLA assembly. I don't know whether it worked for that customer or whether it will work in your case.
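For illustration, the conditional-subtract idea behind a SUBCU-style modulus can be sketched in plain C. This is not the assembly routine mentioned above and not what the compiler emits for %; the function name is hypothetical, and it only shows why a target without such an instruction needs many steps per modulus.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned remainder by shift-and-conditional-subtract (restoring division).
 * One loop pass handles one bit of the dividend, so a 32-bit operand
 * takes 32 passes.
 */
uint32_t umod_shift_subtract(uint32_t dividend, uint32_t divisor)
{
    uint32_t remainder = 0;
    int bit;

    assert(divisor != 0);

    for (bit = 31; bit >= 0; --bit) {
        /* Shift in the next dividend bit, most significant first.
         * The shift cannot overflow: before it, the remainder is no larger
         * than the dividend bits consumed so far, at most 31 bits wide. */
        remainder = (remainder << 1) | ((dividend >> bit) & 1u);

        /* Conditional subtract: keep the remainder below the divisor. */
        if (remainder >= divisor) {
            remainder -= divisor;
        }
    }
    return remainder;
}
```

As a usage example, umod_shift_subtract(12345u, 1000u) returns 345. When the operands are known to be small, reducing the number of loop passes to match their bit width is one possible way to cut the cost.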
Best Regards
Lori