CCS/TMS320C6748: Compiler difference between C6748 and C6742

Part Number: TMS320C6748
Other Parts Discussed in Thread: TMS320C6742

Tool/software: Code Composer Studio

Hi,

We are using the C6748 target to develop an application. The measured execution time is around 120 us.

However, in the future our application will run on a C6742.

If we change the compilation option Device->Variant from C6748 to C6742 (--define compilation option), the execution time decreases from 120 us to 28 us on the same TMS320C6748 target.

What is the difference between the compilation options --define=6742 and --define=6748? In the future, when we use the TMS320C6742, which time measurement is the more reliable one, 120 us or 28 us?

Best regards,

Lucía

  • Hi Lucia,

    This has been forwarded to the corresponding expert. However, the response will be delayed because he is out of office.

    Best Regards,
    Yordan
  • Lucia,

    There has to be some other difference in the setup. The C6742 is a cut-down version of the C6748 (reduced peripheral set), so the DSP performance will be exactly the same. Check that the clock setup, the cache settings, and the code placement (L2 vs. DDR) all match between the two setups, and also make sure that the same code path is executed with both settings.

    What you are reporting is not possible unless you changed the compiler optimization level or some other setting is different, so please verify the setup and get back to us.

    Regards,
    Rahul
  • Rahul,

    You are right. The section configuration in the cmd file is different for the two DSPs.

    I have changed the sections from SHRAM to SHDSPL2RAM in the 6748 configuration, and now the execution time is similar in both builds.

    However, I would like to understand why the time improves when the sections are moved from SHRAM to SHDSPL2RAM. Could you explain? I checked the available documents, but I do not understand it.

    We need the fastest execution time. What is the best configuration?

    Thanks,

    Lucía

  • Lucia,

    Most SoCs don't support a flat memory model. The memory is organized as Level 1 (L1), Level 2 (L2), Level 3/shared memory, and external memory.

    Executing code from L1 and L2 is fastest because these memories are closest to the DSP. Execution from SHRAM and DDR takes the longest because accesses to these memories go through more interconnect bridges and memory controllers. The trade-off is that L1 and L2 memory is small, so you need to place critical sections in L2, while non-critical sections should be placed in SHRAM and DDR memory.
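
    As a rough sketch of how a critical routine could be steered into L2, the TI compiler's CODE_SECTION and DATA_SECTION pragmas can assign a function and its data to named sections, which the cmd file then maps to a memory range. The section names .fast_code and .fast_data, the function dot_product, and the coeffs table are hypothetical and only serve as an illustration; SHDSPL2RAM is the memory range name mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical section names. The cmd file would need matching entries in
 * its SECTIONS block, for example:
 *     .fast_code > SHDSPL2RAM
 *     .fast_data > SHDSPL2RAM
 * The pragmas are only understood by the TI compiler, so they are guarded. */
#ifdef __TI_COMPILER_VERSION__
#pragma CODE_SECTION(dot_product, ".fast_code")
#pragma DATA_SECTION(coeffs, ".fast_data")
#endif

int32_t dot_product(const int16_t *x, size_t n);

/* Illustrative coefficient table used by the time-critical kernel. */
int16_t coeffs[8] = {1, 2, 3, 4, 5, 6, 7, 8};

/* Time-critical kernel: multiply-accumulate of x against coeffs. */
int32_t dot_product(const int16_t *x, size_t n)
{
    int32_t acc = 0;
    size_t i;

    assert(x != NULL && n <= 8);
    for (i = 0; i < n; i++) {
        acc += (int32_t)x[i] * coeffs[i];
    }
    return acc;
}
```

    Everything that is not given a named section keeps the default placement from the cmd file, so only the hot routines and their data need to occupy the small L2 memory.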

    If your code base is large, you can use L1 and L2 memory as cache, which improves performance when executing from SHRAM and DDR memory. A warm cache can give the same performance as L2 memory, but if a lot of data is being moved in and out, there is a performance penalty associated with cache eviction, as on most processors.
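
    One way to see the cold versus warm cache effect is to time the same routine several times in a row and compare the first call with the later ones. The sketch below assumes the TSCL time-stamp counter declared in the TI compiler's c6x.h and falls back to clock() when built with another compiler; time_runs, sample_kernel, and kernel_result are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#ifdef __TI_COMPILER_VERSION__
#include <c6x.h>   /* declares the TSCL time-stamp counter register */
static void ticks_start(void) { TSCL = 0; }   /* a write starts the counter */
static uint32_t ticks_now(void) { return TSCL; }
#else
/* Host fallback so the sketch also builds off-target (coarse resolution). */
static void ticks_start(void) { }
static uint32_t ticks_now(void) { return (uint32_t)clock(); }
#endif

typedef void (*kernel_fn)(void);

/* Result of the sample kernel, kept volatile so the loop is not removed. */
volatile int32_t kernel_result;

/* Stand-in for the routine under test: sums the integers 0..999. */
void sample_kernel(void)
{
    int32_t acc = 0;
    int i;

    for (i = 0; i < 1000; i++) {
        acc += i;
    }
    kernel_result = acc;
}

/* Calls fn `runs` times and stores the elapsed ticks of each call.
 * ticks[0] is the first (cold) call, later entries are repeat calls. */
void time_runs(kernel_fn fn, uint32_t *ticks, int runs)
{
    int i;

    assert(fn != NULL && ticks != NULL && runs >= 1);
    ticks_start();
    for (i = 0; i < runs; i++) {
        uint32_t t0 = ticks_now();
        fn();
        ticks[i] = ticks_now() - t0;   /* unsigned math tolerates one wrap */
    }
}
```

    On the target, TSCL counts CPU cycles, so dividing a tick count by the CPU clock frequency in MHz gives microseconds. If the first entry is much larger than the rest, the routine is likely paying for cache misses on its first pass.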

    Introduction to DSP Optimization:
    e2echina.ti.com/.../Introduction-to-TMS320C6000-DSP-Optimization.pdf

    Demystifying DSP optimization:
    www.ti.com/.../spry281.pdf


    Regards,
    Rahul