Optimize processing speed for C6748 for math operations.
Dear TI Experts,
Our image processing device was originally developed on an ARM11 at 500 MHz. Our image processing library is optimized well enough to do live evaluation on a video stream.
We have decided to develop a similar device on the C6748 at 300 MHz. A TI expert kindly advised us that for a heavily math-based library like ours, the C6748 would be a good fit and deliver competitive performance.
We have successfully run our library on the C6748, but unfortunately it is 6 times slower than on the ARM11. I applied some of the optimization settings suggested by TI, such as --opt_level (3), --program_level_compile, --opt_for_speed (5), and --call_assumptions (0). The speed improves a little, but nowhere near our expectation. We know that TI provides IMAGELIB, DSPLIB, etc., but after reviewing their APIs we decided to keep our code as it is, since we would not use much of them.
We think there must be something wrong with the way we use the DSP's fixed- and floating-point features. Do we need any special settings to enable the fixed- and floating-point features, or are they already enabled by default in TI's compiler/linker?
One more thing: when I change the ABI (application binary interface) from eabi to coff, the speed improves quite a bit (about 2 to 3 times), but the system becomes unstable with coffabi (algorithm execution suddenly goes wrong, garbage numbers appear, etc.). Does the ABI affect the speed?
We would really appreciate it if you could suggest some optimization approaches to improve our performance.
Thank you very much,
Depending on where your program and data (PS and DS) are located, on chip or off chip, and on how caching is used in your system, you will need to apply all the necessary optimization techniques beyond -o3 to reach maximum performance on this device.
You may want to download and do a quick review of TMS320C6000 Optimization Workshop from TI.
Since you are using natural C code, you are hoping to get the best out-of-the-box performance from the C674x device using the compiler. We have a quick introduction to optimizing your code for the C674x device, which is discussed in the following document.
Ensure that you use appropriate linker command scripts and cache settings while evaluating the performance on the hardware.
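As a rough illustration of the kind of source-level hint that this optimization material covers, here is a minimal sketch. The function scale_add and its arguments are hypothetical and not taken from the poster's library. It assumes the three buffers never overlap (which is what restrict promises the compiler) and that the trip count is a multiple of 4 and at least 4 (which is what the TI-specific MUST_ITERATE pragma states). The pragma is guarded so the same file still builds with a non-TI compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: out[i] = a[i] * gain + b[i].
 * 'restrict' tells the compiler the buffers do not alias, which lets it
 * reorder loads and stores when scheduling the loop. */
void scale_add(float *restrict out, const float *restrict a,
               const float *restrict b, float gain, size_t n)
{
    size_t i;

    /* Assumption stated to the compiler below: n >= 4 and n % 4 == 0. */
    assert(n >= 4 && n % 4 == 0);

#ifdef __TI_COMPILER_VERSION__
    /* TI-only hint: minimum trip count 4, no stated maximum, multiple of 4. */
#pragma MUST_ITERATE(4, , 4)
#endif
    for (i = 0; i < n; i++) {
        /* Single-precision arithmetic throughout; no double promotion. */
        out[i] = a[i] * gain + b[i];
    }
}
```

Whether hints like these help depends on the actual loops, so it is worth checking the compiler's loop feedback for each kernel before and after the change.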
Tuyen Nguyen wrote: "when I try to change ABI (application binary interface) from eabi to coff, the speed is improve quite a bit (like 2, 3 times) however the system becomes unstable with coffabi (algorithm execution suddenly goes wrong, appear trash numbers, etc.). Does ABI affect the speed?"
This doesn't make any sense. All other things being equal, changing the ABI will have very little impact on performance. Something else must be changing at the same time, and it only appears that ABI is the cause. I'm not sure what could make such a difference. But, whatever it is, it is worth finding. Because you can probably apply it to the EABI build, and get all that performance back.
Thanks and regards,
TI C/C++ Compiler Forum Moderator
The first thing that comes to mind is a difference in the DSP PLL/DDR etc. initialization. You may want to inspect the registers in both cases and compare them.
Also the state of L1P and L1D caching (left over from Boot ROM).
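One way to compare the cache state between the two builds is to read the cache configuration registers at the start of main() in each build and decode the mode field. The sketch below is only illustrative: the register addresses and the mode encoding (0 = cache disabled, 1 = 4K, 2 = 8K, 3 = 16K, 4 to 7 = 32K) are my assumptions about the C674x megamodule and should be checked against the device reference guide. The helper names are hypothetical. The register read is only compiled for the TI C6000 compiler; the decode helper is plain C.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed C674x megamodule register addresses; verify before use. */
#define L2CFG_ADDR   0x01840000u
#define L1PCFG_ADDR  0x01840020u
#define L1DCFG_ADDR  0x01840040u

/* Extract the 3-bit mode field from a raw L1PCFG/L1DCFG value. */
unsigned l1_mode_field(uint32_t cfg)
{
    return (unsigned)(cfg & 0x7u);
}

/* Convert an L1 mode field to a cache size in KB (assumed encoding). */
unsigned l1_mode_to_kbytes(unsigned mode)
{
    assert(mode <= 7u);
    if (mode == 0u) {
        return 0u;           /* cache disabled, all L1 is SRAM */
    }
    if (mode >= 4u) {
        return 32u;          /* modes 4..7 all select the 32K maximum */
    }
    return 2u << mode;       /* 1 -> 4K, 2 -> 8K, 3 -> 16K */
}

#ifdef _TMS320C6X
/* Target-only: read a memory-mapped register. */
static uint32_t read_reg(uint32_t addr)
{
    return *(volatile uint32_t *)addr;
}

/* Target-only: current L1D cache size in KB. */
unsigned current_l1d_kbytes(void)
{
    return l1_mode_to_kbytes(l1_mode_field(read_reg(L1DCFG_ADDR)));
}
#endif
```

Logging these values in both the EABI and the COFF build would show whether one of them is running with a different cache configuration.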
Thanks a lot,
Because of the urgency of the project, I wanted quick techniques to improve the speed without touching the code. But apparently that is not possible, so I spent time digging into the code to apply compiler optimizations to functions and loops, and it improved a little.
ABI change: yes, I found it weird too. All I did was set "output format" to "legacy COFF" and --abi to coffabi; the other settings were kept the same. Suddenly the algorithms ran faster, but with unstable behavior. I read in one of TI's documents that COFF is obsolete but gives higher performance due to some reduced overhead. I am not sure whether that is true. In any case, I cannot use COFF because of the instability.
Cache: I have been reading documents about TI's cache concepts and settings. To be honest, because I am new to TI chips, I don't know exactly how to enable the cache using CCS 5. I am using SYS/BIOS. I tried to configure it through the .cfg file, but there seems to be no change in speed, so I suspect I am doing it wrong. There is a document that explains cache settings in CCS4 through a .tcf file, but I don't know how to do it in CCS 5, because my CCS 5 version has no tool to generate a .tcf. I am using the default ti.platforms.evmOMAPL138. When the project is finalized, I will have to create my own customized platform, and I don't know exactly which tool I should use for that.
It would be very helpful if you could give me some initial instructions for the cache. Also, do I need to configure the cache for the .lib project, or only for the application project that uses that .lib?
Thank you very much for all your support,
The unstable nature of the binary with COFF is weird. Can you specify the compiler flags that you have and the version of compiler that you are using? If you can extract the piece of code that shows the unstable behaviors when compiled in COFF format, we can examine what is causing this issue.
With regard to cache, you need to configure the cache settings only in the application project, not in the lib project. SYS/BIOS enables the device cache by default when configuring the platform, but if you wish to turn on the cache explicitly, use the following lines in your .cfg file:
/* Enable MAR bits for Cache */
var Cache = xdc.useModule('ti.sysbios.family.c64p.Cache');
Cache.MAR128_159 = 0xFFFFFFFF;
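To see which addresses a setting such as MAR128_159 covers, it may help to work out the MAR index for a given address. The sketch below assumes that each MAR register controls cacheability for one 16 MB region, so the index is the top 8 bits of the address, and that the MAR registers start at 0x01848000; both are assumptions to confirm in the C674x cache documentation. The helper names are hypothetical. Under those assumptions, MAR128 to MAR159 span 0x80000000 to 0x9FFFFFFF, so a buffer placed at 0xC0000000 (where I believe the DDR is mapped on this device) would fall under MAR192 instead and would need its own setting.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed base address of the MAR register block; verify before use. */
#define MAR_BASE_ADDR 0x01848000u

/* Index of the MAR register covering 'addr' (one register per 16 MB). */
unsigned mar_index(uint32_t addr)
{
    return (unsigned)(addr >> 24);
}

/* Address of MAR register 'index' (each register is 4 bytes wide). */
uint32_t mar_reg_addr(unsigned index)
{
    assert(index < 256u);
    return MAR_BASE_ADDR + 4u * (uint32_t)index;
}

/* Nonzero if 'addr' is governed by the MAR128..MAR159 group. */
int covered_by_mar128_159(uint32_t addr)
{
    unsigned index = mar_index(addr);
    return index >= 128u && index <= 159u;
}
```

Checking the linker map for the addresses of the large image buffers and running them through mar_index would show which MAR group has to be set for them to be cacheable.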
By default OMAPL138 EVM uses 32K L1P program cache and 32K L1D data cache. You can also use Cache_enable() API to enable all levels of cache in sysbios.
Please refer to section 5.6 and section 6.4 for all SYSBIOS Cache configuration options.
Here is where you set them in CCS5.x SYS/BIOS