Dear TI Experts,
Our image processing device was previously developed on a 500 MHz ARM11. On that platform, our image processing library is optimized well enough to do live evaluation on a video stream.
We have decided to develop a similar device on a 300 MHz C6748. A TI expert kindly advised us that, for a heavily math-based library like ours, the C6748 would be the best fit and would deliver competitive performance.
We have successfully run our library on the C6748, but unfortunately it is 6 times slower than on the ARM11. I applied some of TI's optimization suggestions, such as --opt_level (3), --program_level_compile, --opt_for_speed (5), and --call_assumptions (0). The speed improved a little, but it is nowhere near our expectation. We know that TI provides IMGLIB, DSPLIB, etc., but after reviewing their APIs we decided to keep our code as it is, since we would not use much of them.
We think there must be something wrong with the way we use the DSP's fixed-point and floating-point features. Do we need any special settings to enable these features, or are they already enabled by default in TI's compiler/linker?
One more thing: when I change the ABI (application binary interface) from eabi to coffabi, the speed improves quite a bit (about 2 to 3 times). However, the system becomes unstable with coffabi (algorithm execution suddenly goes wrong, garbage numbers appear, etc.). Does the ABI affect the speed?
We would really appreciate it if you could advise us on optimization approaches to improve our performance.
Thank you very much,
Best regards,
Tuyen Nguyen
