I am looking to port a TMS320F28335 application to the OMAP. As a first step, I ported a small body of floating-point- and pointer-intensive code to measure the improvement in execution time. My linker command file is just the default one created by the hello-world example app, so I know there is much to improve. Still, I was surprised that the OMAP took more than 3 times as many clock cycles as the 28335 to execute the identical code. My 28335 is running at 150 MHz, and the OMAP is running at its default clock, which I believe is 300 MHz (not yet verified).
My initial question is general: where should I start to improve the OMAP performance? I know the OMAP code is running from L2RAM, which is not the fastest memory, but the docs do not make clear how much faster L1 would be (and I haven't figured out the tcf syntax yet). I enabled full optimization, which did not help much. Is the bottleneck more likely the RAM, or should I investigate cache usage or something else?
Also, I am using the CCS debugger's clock-cycle counter to measure the execution times, which I assume is valid.
Thanks.