Difference between the execution time of DSP code in Code Composer Studio and on the DSP/ARM system (OMAP-L137)

Other Parts Discussed in Thread: OMAP-L137, OMAPL138, SYSBIOS

Dear Sir/Madam,

I have designed a system that includes the ARM and DSP processors on the OMAP-L137 EVM. I developed and debugged the DSP side in Code Composer Studio and measured its execution time in clock cycles. Then I imported the DSP code into the ARM/DSP system and developed the required communication channel. When I measured the execution time of the DSP code in the DSP/ARM system, my main function took about 3-4 times longer than it does under Code Composer Studio.

What is the problem? I would be very thankful if you can help me.

Thanks,

Jone.

  • Dear Jone,

    So you are not getting the same result once the DSP code is flashed onto the OMAP-L137 EVM board?
    Yes, you can see some difference between running the code from CCS and booting from flash.

    Which boot mode are you using?
  • Dear Titus,
    I am booting from flash.
  • The ARM is booted from flash, not the DSP.
  • The ARM is booted from flash and the ARM loads and executes the code on the DSP.
  • Dear Jone,
    Which application do you want to boot: ARM or DSP?

    The OMAP-L137 is a DSP boot device (master): after reset, the DSP comes out of reset first and wakes up the ARM through the user bootloader (UBL) code.

    The OMAP-L138 is an ARM boot device (master): after reset, the ARM comes out of reset first and wakes up the DSP through the UBL code.

    In emulation boot mode, however, both the ARM and the DSP are enabled by the ROM bootloader (RBL).
    So your observation is correct and expected.

    If you want to boot ARM code on the OMAP-L137 EVM board, you have to write UBL code that enables the ARM core and runs the application.

    If you want to boot DSP code on the OMAP-L137 EVM board, you can run the application directly, and you should not see much delay compared with running from CCS.
  • Dear Titus,
    Thank you for your quick response.
    As you said, the OMAP-L137 is a DSP boot device. The ARM loads the DSP code onto the DSP, and the DSP starts to run my application. However, the execution is too slow.

    How can I resolve the problem? I want to run the DSP at 300 MHz.

    regards,
  • It seems that the DSP frequency is 45 MHz instead of 300 MHz.
  • Dear Jone,
    "As you said, the OMAP-L137 is a DSP boot device. The ARM loads the DSP code onto the DSP, and the DSP starts to run my application. However, the execution is too slow."

    Sorry, I am not able to understand.
    Which application needs to boot: ARM or DSP?

    "It seems that the DSP frequency is 45 MHz instead of 300 MHz."

    How did you measure the DSP clock?
  • Dear Titus,
    Our system is based on DSPLink communication. We modified an example from the EVM package. The DSP code is loaded by the ARM processor.
    To estimate the clock frequency, we toggle a GP pin high and low; the toggle rate gives an estimate of the frequency.

  • "To estimate the clock frequency, we toggle a GP pin high and low; the toggle rate gives an estimate of the frequency."

    You can use the OBSCLK pin to measure SYSCLK.
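
    For context, the arithmetic behind a pin-toggle estimate can be sketched as below. This is only an illustration: `estimate_cpu_hz` and the cycles-per-toggle figure are hypothetical, and the real cycle cost of each toggle depends on the compiled loop and on the bus latency to the GPIO registers. That uncertainty is one reason a dedicated clock-output pin gives a more direct reading.

    ```c
    #include <stdint.h>

    /*
     * Estimate the CPU clock from a square wave produced by toggling a pin
     * in a tight loop. One full period of the wave contains two toggles
     * (pin high, then pin low).
     *
     * pin_hz            - frequency measured on the pin with a scope
     * cycles_per_toggle - ASSUMED number of CPU cycles spent per toggle;
     *                     it has to come from the disassembly or from a
     *                     cycle counter, it is not a known constant
     */
    uint64_t estimate_cpu_hz(uint64_t pin_hz, uint32_t cycles_per_toggle)
    {
        return pin_hz * 2u * (uint64_t)cycles_per_toggle;
    }
    ```

    With an assumed 10 cycles per toggle, a 15 MHz square wave would correspond to 300 MHz and a 2.25 MHz square wave to 45 MHz, so an error in the assumed cycle count changes the estimate in direct proportion.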
  • It sounds like the only difference is the communication channel code. That code could be blocking for relatively long periods of time. Your CCS DSP-side project could be compiled into an image and loaded into the DSP from the ARM; that would be a better comparison. Alternatively, stub out the comm code in your imported DSP code.
  • Dear Titus,

    Thank you for your response

    The problem still exists. The main question is:

    Why does the DSP program take longer in the DSP/ARM system than when it is run standalone on the DSP from Code Composer Studio?

    The architecture of our system is as follows:

    • Data is exchanged between the ARM and the DSP through DSPLink.
    • A .out file stored on the ARM-side OS is loaded onto the DSP by the ARM, and the DSP then starts to run the program.

    The runtime of the program on the DSP is about 3-4 times longer than on the standalone system (the system without the ARM, run from Code Composer Studio).

    Regards,

    Jone

  • As Norman said, I also suspect that DSPLink is consuming some time communicating between the ARM and the DSP.

    processors.wiki.ti.com/.../DSPLink_FAQs

    processors.wiki.ti.com/.../DSPLink_Application_on_OMAP-L1x

    You can try running the ARM and the DSP at the maximum CPU clock of 456 MHz.
  • That is not the problem. I am sure that DSPLink is not the issue. The project in Code Composer Studio is an RTSC project. Could that be the issue? How can I use RTSC on Linux?
  • RTSC is a package used to build SYSBIOS-based applications.
  • Dear Titus,
    I know. How can I tell the compiler on the Linux system that I want to use the RTSC package to compile the DSP-side code?
  • To confirm:

    Scenario A
    ARM
    - Bare metal idle? Suspended? Linux?
    DSP
    - Custom application code
    - DSPBIOS
    - RTSC
    - Image is loaded via CCS/JTAG

    Scenario B
    ARM
    - Linux
    - Custom app that uses DSPLINK to load dsp.out into DSP
    - Custom app also communicates to DSP? Via what DSPLINK mechanism? MSGQ?
    DSP
    - Custom application code + DSPLINK comm code
    - DSPLINK comm code uses what DSPLINK mechanism?
    - DSPBIOS
    - DSPLINK
    - RTSC
    - Image is saved to dsp.out.
    - Main function takes about 3-4 times longer than in Scenario A?

    Scenario C
    - Same as Scenario B but loaded via CCS.

    Are you comparing Scenario A with Scenario B? Or are you comparing Scenario B with Scenario C?
  • Dear Wong,

    I compared the following two scenarios:

    Scenario 1:

    - I load dsp.out via CCS.

    - I measure the number of clock cycles elapsed in one particular function using the CCS facilities.

    - The ARM is not enabled.

    Scenario 2:

    - The ARM is enabled.

    - A custom app uses DSPLINK to load dsp.out into the DSP.

    - That particular function runs slower than in Scenario 1.

    - The DSP writes directly into the shared memory space with PROC_WRITE and reads from it with PROC_READ.

    - A message queue is used to synchronize the reads and writes between the ARM and the DSP.

    I am sure that DSPLink is not the cause of the slowness. Inside the function, the DSP does not communicate with the ARM at all, so the code runs exclusively on the DSP.

    How can I activate RTSC for the DSP in Scenario 2? I think it may be an issue. By the way, the map files in the two scenarios are the same.

    What do you think the problem is?

    Regards,

    Jone,

  • I assume that your "one particular function" is exactly identical between scenarios. So far, I have been guessing that your function had gotten some DSPLink calls added to it. Calls like MSGQ_get(), MSGQ_put().

    If your "one particular function" is exactly identical between scenarios then you do indeed have a puzzle.

    Maybe the DSP processor clock is different. On the CCS side, the clock is probably set up by the GEL script. The DSPLINK build usually defaults to whatever DSPBIOS has been configured with. I can't remember where that is configured.
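
    One way to check this is to read back the PLL controller settings in both scenarios and compute the resulting clock. The sketch below covers only the arithmetic. It assumes the convention that each register field stores the ratio minus one, and the function and parameter names are hypothetical. The 24 MHz input and the field values in the usage note are assumptions as well, so check them against the device datasheet and the GEL file.

    ```c
    #include <stdint.h>

    /*
     * Compute a PLL-derived clock from raw divider/multiplier fields.
     * ASSUMPTION: every field stores (ratio - 1), so a field value of 24
     * means "multiply by 25" and a field value of 1 means "divide by 2".
     */
    uint32_t pll_sysclk_hz(uint32_t oscin_hz,
                           uint32_t prediv_field,
                           uint32_t pllm_field,
                           uint32_t postdiv_field,
                           uint32_t sysdiv_field)
    {
        uint64_t hz = oscin_hz;
        hz /= (uint64_t)prediv_field + 1u;   /* pre-divider    */
        hz *= (uint64_t)pllm_field + 1u;     /* multiplier     */
        hz /= (uint64_t)postdiv_field + 1u;  /* post-divider   */
        hz /= (uint64_t)sysdiv_field + 1u;   /* SYSCLK divider */
        return (uint32_t)hz;
    }
    ```

    For example, `pll_sysclk_hz(24000000, 0, 24, 1, 0)` evaluates to 300000000. Feeding in the field values actually read from the registers in each scenario would show whether the two setups really run at the same clock.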

    Is your DSP code executing from DSP local memory or from shared DRAM? Access to shared memory is arbitrated by the HW. Accesses from the ARM could slow down accesses from the DSP side.

    No idea about RTSC. Not even sure if it is even available on the Linux host. Seems more of a Windows host Eclipse thing.

    That's all I got. Suggest you reduce your code to a simple test case and submit it to the TI guys.
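
    A reduced test case could time the function from inside the DSP code itself, so that the same measurement runs in both scenarios instead of relying on the CCS profiler in only one of them. The sketch below assumes the C64x+/C674x time-stamp counter `TSCL` declared in the TI compiler's `c6x.h`. `function_under_test`, `measure`, and the helper names are hypothetical, and the host fallback exists only so that the sketch builds off-target (its ticks are not CPU cycles).

    ```c
    #include <stdint.h>

    #ifdef _TMS320C6X
    #include <c6x.h>            /* declares the TSCL/TSCH control registers */
    static void     cycles_init(void) { TSCL = 0; }  /* any write starts the counter */
    static uint32_t cycles_now(void)  { return TSCL; }
    #else
    #include <time.h>           /* host fallback: clock() ticks, not CPU cycles */
    static void     cycles_init(void) { }
    static uint32_t cycles_now(void)  { return (uint32_t)clock(); }
    #endif

    /* Hypothetical stand-in for the "one particular function" being timed. */
    volatile uint32_t sink;
    void function_under_test(void)
    {
        uint32_t i;
        for (i = 0; i < 1000u; i++) {
            sink += i;
        }
    }

    /* Return the counter ticks that elapse during one call of fn. */
    uint32_t measure(void (*fn)(void))
    {
        uint32_t start, stop;
        cycles_init();
        start = cycles_now();
        fn();
        stop = cycles_now();
        return stop - start;    /* unsigned subtraction tolerates one wrap-around */
    }
    ```

    Calling `measure(function_under_test)` once when the image is loaded via CCS and once when it is loaded from the ARM gives two cycle counts for identical code. If the counts match but the wall-clock time differs, the clock setup is the likely suspect. If the counts themselves differ, the cause is more likely in the build or in memory and cache placement.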
  • The problem is solved. It was caused by the compiler version. I downloaded the Linux version of CCS 5.0.2 and ported its compiler to the OMAP-L137 tool set, which solved the problem.
    Jone,
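
    A mismatch like this can be spotted earlier by making each image report the compiler that built it. The sketch below assumes the TI toolchain's predefined `__TI_COMPILER_VERSION__` macro, which encodes the version as major*1000000 + minor*1000 + patch. The helper names are hypothetical, and the `__VERSION__` branch is only a fallback for GCC-style host compilers.

    ```c
    #include <stdio.h>

    /* Split a TI-style version number (major*1000000 + minor*1000 + patch). */
    void decode_ti_version(unsigned long v,
                           unsigned *major, unsigned *minor, unsigned *patch)
    {
        *major = (unsigned)(v / 1000000UL);
        *minor = (unsigned)((v / 1000UL) % 1000UL);
        *patch = (unsigned)(v % 1000UL);
    }

    /* Print the compiler that built this image, so two builds can be compared. */
    void print_compiler_version(void)
    {
    #ifdef __TI_COMPILER_VERSION__
        unsigned major, minor, patch;
        decode_ti_version(__TI_COMPILER_VERSION__, &major, &minor, &patch);
        printf("TI compiler %u.%u.%u\n", major, minor, patch);
    #elif defined(__VERSION__)
        printf("host compiler %s\n", __VERSION__);   /* GCC/Clang fallback */
    #else
        printf("compiler version unknown\n");
    #endif
    }
    ```

    Calling `print_compiler_version()` at startup in both the CCS build and the Linux build makes a toolchain difference visible in the log before any timing is compared.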