TMS320C6748: DSP Code execution slower than on OMAP-L138

Part Number: TMS320C6748
Other Parts Discussed in Thread: OMAP-L138, OMAPL138

I'm having a puzzling problem. We've started development on a product using the MityDSP-L138F eval boards from Critical Link. We don't need the ARM9 in the OMAP, so in our design we used the TMS320C6748. We've noticed that code executes roughly 27 times slower on the TMS320C6748 than it does on the OMAP-L138.

We have the PLLs configured for 456 MHz on PLL0_SYSCLK1. We wrote some test code that does nothing but turn on all the modules, configure the PLLs, and then run an assembly delay loop that toggles a GPIO pin so we can measure how fast code is executing.

The exact same build of the exact same code runs around 27 times faster on the OMAP than on the TMS320C6748. Our project needs the code to execute at the same speed as on the OMAP, and we've run out of ideas.

Thank you!

  • The team is notified. They will post their feedback directly here.

    BR
    Tsvetolin Shulev
  • Joshua,

    Based on your observation, I suspect there is some difference between the two setups that is causing this issue. Are you using the MityDSP for OMAPL138 benchmarking and your own platform for C6748 benchmarking? Can you please confirm that you are running the same binary on the C6748 and the OMAPL138? How are you benchmarking the code: with the CCS clock, or with the TSCL/TSCH registers on the DSP? Does your design use the same clock and DDR settings as the MityDSP?

    Is your code running from external memory or on-chip memory? If you are using two different projects to compile the code, have you checked that the compiler settings are the same? What are your DSP cache settings if you are running code from shared RAM or DDR? If you are using an OS, have you ensured that your GPIO toggle task has the highest priority and is not pending on some other ISR or thread completion?

    Regards,
    Rahul
  • Rahul,

    Cache settings were the issue. I was benchmarking the code using the TSCL/TSCH registers. The code is running from internal RAM; I used a JTAG debug probe to load a .out file into the DSP's internal RAM from CCS. The oversight was that each processor had a different bootloader that ran BEFORE I loaded my code over JTAG.
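For readers who want to reproduce a TSCL/TSCH measurement like the one described above, here is a minimal sketch. The helper names (`tsc_combine`, `cycles_to_ns`, `tsc_start`, `tsc_read`) are hypothetical, and `CPU_HZ` assumes the 456 MHz PLL0_SYSCLK1 setting mentioned in the question. The register access is guarded so the arithmetic helpers also compile on a host; it assumes the TI C6000 compiler and its `c6x.h` header.

```c
#include <stdint.h>

/* Assumption: CPU clock is the 456 MHz PLL0_SYSCLK1 from the question. */
#define CPU_HZ 456000000ull

/* Combine the two 32-bit halves of the time stamp counter. */
static uint64_t tsc_combine(uint32_t tscl, uint32_t tsch)
{
    return ((uint64_t)tsch << 32) | tscl;
}

/* Convert a cycle count to nanoseconds at CPU_HZ.
   The multiply overflows for counts above roughly 1.8e10 cycles. */
static uint64_t cycles_to_ns(uint64_t cycles)
{
    return (cycles * 1000000000ull) / CPU_HZ;
}

#ifdef _TMS320C6X            /* only built by the TI C6000 compiler */
#include <c6x.h>             /* declares the TSCL and TSCH control registers */

/* The counter is stopped after reset; any write to TSCL starts it. */
static void tsc_start(void)
{
    TSCL = 0;
}

/* Read TSCL first: that read latches the upper half into TSCH. */
static uint64_t tsc_read(void)
{
    uint32_t lo = TSCL;
    uint32_t hi = TSCH;
    return tsc_combine(lo, hi);
}
#endif
```

On the target, one would call `tsc_start()` once, take `tsc_read()` before and after the code under test, and pass the difference to `cycles_to_ns()`. Running the same measurement on both boards gives a cycle-level comparison that does not depend on GPIO toggle timing.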

    The bootloader on the MityDSP OMAPL138 left L1P and L1D in their default configuration as cache, while the bootloader that ran on the C6748 disabled the L1P and L1D caches. My benchmark code didn't explicitly configure the cache. Once I explicitly configured the L1 caches, the code executed at the same speed on both processors.
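Explicitly configuring the L1 caches at startup, rather than relying on whatever a bootloader left behind, might look like the sketch below. This is not the code from the thread. The register addresses and the 32 KB mode value are assumptions taken from the C674x megamodule register map and should be checked against the reference guide; the function names are hypothetical. The functions take register pointers so the logic can be exercised off-target.

```c
#include <stdint.h>

/* Assumed C674x megamodule addresses; verify before use. */
#define L1PCFG_ADDR  0x01840020u   /* L1P configuration register */
#define L1DCFG_ADDR  0x01840040u   /* L1D configuration register */

#define L1_MODE_MASK 0x7u          /* mode field, bits 2..0 */
#define L1_MODE_32K  0x4u          /* assumed encoding for 32 KB of cache */

/* Set the mode field of one L1 configuration register, preserving the
   other bits, then read the register back. On the target, the read-back
   makes the CPU wait until the mode change has taken effect. */
static uint32_t l1_set_mode(volatile uint32_t *cfg, uint32_t mode)
{
    uint32_t v = *cfg;
    v = (v & ~L1_MODE_MASK) | (mode & L1_MODE_MASK);
    *cfg = v;
    return *cfg;
}

/* Hypothetical helper: turn both L1 memories into 32 KB caches. */
static void l1_enable_all(volatile uint32_t *l1pcfg,
                          volatile uint32_t *l1dcfg)
{
    (void)l1_set_mode(l1pcfg, L1_MODE_32K);
    (void)l1_set_mode(l1dcfg, L1_MODE_32K);
}
```

On the DSP this would be called early in `main` as `l1_enable_all((volatile uint32_t *)L1PCFG_ADDR, (volatile uint32_t *)L1DCFG_ADDR);`. Note that any data placed in L1D SRAM is lost when that memory is switched to cache, so the linker command file should not allocate sections there.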

    Thank you,
    Joshua