Hi,
I have a custom algorithm plugin running on the DSP core under RTOS, but the algorithm is taking too long to execute. While debugging, I observed that four "for" loops running sequentially account for the delay. How can I optimise the code running on the DSP?
Soumya,
There is a good training series at: https://training.ti.com/c6000-embedded-design-workshop
Also, this is an old appnote but holds up really well and will be a good companion for the training series -- http://www.ti.com/lit/an/sprabg7/sprabg7.pdf
For AM57 in general, we have a great deal of training material that can be reviewed at https://training.ti.com/am57x-sitara-processors-training-series
Some references that should be useful:
TMS320C6000 Optimizing C Compiler Tutorial (Rev. A) http://www.ti.com/lit/pdf/spru425
TMS320C6000 Programmer's Guide (Rev. K) http://www.ti.com/lit/pdf/spru198
Processor SDK DSP section - http://software-dl.ti.com/processor-sdk-rtos/esd/docs/latest/rtos/DSP_Software.html
Best regards,
Dave
Thanks Dave.
I have one question about #pragma MUST_ITERATE.
My code contains four nested "for" loops that run sequentially, each of this form:
for (row...)
{
    for (col)
    {
        // some operations
    }
}
The loop iterates 450 x 1800 times and takes 114 ms. If I use this pragma, will there be any improvement in execution time?
Soumya,
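The MUST_ITERATE pragma gives the compiler the minimum number of iterations of the loop that follows it (and, optionally, the maximum and a value the iteration count is always a multiple of). It applies to one loop at a time, so the number of nested loops does not matter as such, but since your loops iterate a known 450 x 1800 times you should be able to apply it.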
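As a rough sketch of how the pragma might be placed on the inner loop: the function name, the scaling operation and the 8-to-1800 bounds below are assumptions for illustration only, not taken from your code. The pragma is a promise to the compiler, so the assert is there to catch a broken promise when testing on a host build.

```c
#include <assert.h>

/* Hypothetical per-element operation: scale each sample by gain/256. */
void scale_rows(const short *in, short *out, int rows, int cols, short gain)
{
    int row, col;

    /* Host-side check of the promise made to the compiler below. */
    assert(cols >= 8 && cols <= 1800 && cols % 8 == 0);

    for (row = 0; row < rows; row++) {
#ifdef __TI_COMPILER_VERSION__
        /* Promise: the next loop runs 8 to 1800 times, always a multiple of 8. */
        #pragma MUST_ITERATE(8, 1800, 8)
#endif
        for (col = 0; col < cols; col++) {
            out[row * cols + col] = (short)((in[row * cols + col] * gain) >> 8);
        }
    }
}
```

Whether this helps depends on what the loop body does; the pragma mainly lets the compiler drop the short-trip-count fallback path and unroll with more confidence.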
Note that you can additionally use _nassert() on the loop counters of the inner and outer loops, and on your address pointers, to give the compiler hints about how it can unroll and parallelize the loops.
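A minimal sketch of the _nassert() hints, assuming the buffers are 8-byte aligned, never overlap, and the element count is a multiple of 4. The function and its parameters are made up for illustration, and the empty fallback macro only exists so the sketch also builds with a non-TI compiler.

```c
#include <assert.h>
#include <stdint.h>

#ifndef __TI_COMPILER_VERSION__
#define _nassert(cond) ((void)0) /* fallback: the intrinsic only exists in the TI compiler */
#endif

/* Hypothetical kernel: element-wise sum of two rows of 16-bit samples. */
void add_rows(const short *restrict a, const short *restrict b,
              short *restrict sum, int n)
{
    int i;

    /* Host-side checks of the assumptions handed to the compiler below. */
    assert((uintptr_t)a % 8 == 0 && (uintptr_t)b % 8 == 0 && (uintptr_t)sum % 8 == 0);
    assert(n > 0 && n % 4 == 0);

    /* Hints: pointers are 8-byte aligned, trip count is a multiple of 4. */
    _nassert(((int)a & 7) == 0);
    _nassert(((int)b & 7) == 0);
    _nassert(((int)sum & 7) == 0);
    _nassert(n % 4 == 0);

    for (i = 0; i < n; i++) {
        sum[i] = (short)(a[i] + b[i]);
    }
}
```

Like MUST_ITERATE, _nassert() generates no code and is not checked at run time, so the caller has to guarantee the conditions really hold.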
Another optimization technique is to merge the two nested loops into a single loop.
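To illustrate the merge, here is a sketch that assumes the image is stored contiguously and that the operation does not depend on the row or column index by itself. Both function names and the scaling operation are hypothetical.

```c
/* Nested form: the software pipeline of the inner loop restarts for every row. */
void scale_nested(const short *in, short *out, int rows, int cols, short gain)
{
    int row, col;
    for (row = 0; row < rows; row++) {
        for (col = 0; col < cols; col++) {
            out[row * cols + col] = (short)((in[row * cols + col] * gain) >> 8);
        }
    }
}

/* Merged form: one loop over rows * cols elements, same result. */
void scale_merged(const short *in, short *out, int rows, int cols, short gain)
{
    int i;
    int total = rows * cols; /* 450 * 1800 = 810000 in the case discussed here */
    for (i = 0; i < total; i++) {
        out[i] = (short)((in[i] * gain) >> 8);
    }
}
```

With a single long loop the compiler only has to fill and drain the pipeline once, and a MUST_ITERATE on the merged loop can describe the full iteration count.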
Best regards,
Dave