C6745 Performance

Hello.

I have the following problem:

I have a C6745 DSP with the PLL configured for 300 MHz.

I tested the DSPF_sp_cfftr2_dit_cn function [N=512]. It takes 320 us (0.32 ms) to execute, measured with a Tektronix oscilloscope.

That is 96000 cycles at 300 MHz. I think that is too long.
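
The conversion from measured time to cycles can be sketched as below. This is only an illustration of the arithmetic; the function name `us_to_cycles` is made up for this example and is not part of DSPLIB or CCS.

```c
#include <stdint.h>

/* Convert a measured execution time into CPU cycles.
 * time_us: measured duration in microseconds (e.g. from an oscilloscope)
 * cpu_mhz: CPU clock in MHz, i.e. cycles per microsecond
 * Hypothetical helper, for illustration only. */
uint32_t us_to_cycles(uint32_t time_us, uint32_t cpu_mhz)
{
    return time_us * cpu_mhz;
}
```

With the values from this post, `us_to_cycles(320, 300)` gives 96000 cycles.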

1) How many cycles should the DSPF_sp_cfftr2_dit_cn function take?

2) How can I check DSP performance?

Thanks.

  • I used the Profile Clock in CCS and measured 97147 cycles for this function. Why does it take so long?
    I am using dsplib_c674x_3_4_0_0 without BIOS. I don't use external SDRAM.

    This is my C6745.cmd file:

    MEMORY
    {
    DSPL2ROM o = 0x00700000 l = 0x00100000 /* 1MB L2 Internal ROM */
    DSPL2RAM o = 0x00800000 l = 0x00040000 /* 256kB L2 Internal RAM */
    DSPL1PRAM o = 0x00E00000 l = 0x00008000 /* 32kB L1 Internal Program RAM */
    DSPL1DRAM o = 0x00F00000 l = 0x00008000 /* 32kB L1 Internal Data RAM */
    SHDSPL2ROM o = 0x11700000 l = 0x00100000 /* 1MB L2 Shared Internal ROM */
    SHDSPL2RAM o = 0x11800000 l = 0x00040000 /* 256kB L2 Shared Internal RAM */
    SHDSPL1PRAM o = 0x11E00000 l = 0x00008000 /* 32kB L1 Shared Internal Program RAM */
    SHDSPL1DRAM o = 0x11F00000 l = 0x00008000 /* 32kB L1 Shared Internal Data RAM */
    EMIFACS2 o = 0x60000000 l = 0x02000000 /* 32MB Async Data (CS2) */
    EMIFACS3 o = 0x62000000 l = 0x02000000 /* 32MB Async Data (CS3) */
    EMIFACS4 o = 0x64000000 l = 0x02000000 /* 32MB Async Data (CS4) */
    EMIFACS5 o = 0x66000000 l = 0x02000000 /* 32MB Async Data (CS5) */
    EMIFBSDRAM o = 0xC0000000 l = 0x10000000 /* 256MB SDRAM Data */
    }

    SECTIONS
    {
    .isr_vectors > SHDSPL2RAM
    .text > SHDSPL2RAM
    .stack > SHDSPL2RAM
    .bss > SHDSPL2RAM
    .cio > SHDSPL2RAM
    .const > SHDSPL2RAM
    .data > SHDSPL2RAM
    .switch > SHDSPL2RAM
    .sysmem > SHDSPL2RAM
    .far > SHDSPL2RAM
    .args > SHDSPL2RAM
    .ppinfo > SHDSPL2RAM
    .ppdata > SHDSPL2RAM

    /* COFF sections */
    .pinit > SHDSPL2RAM
    .cinit > SHDSPL2RAM

    /* EABI sections */
    .binit > SHDSPL2RAM
    .init_array > SHDSPL2RAM
    .neardata > SHDSPL2RAM
    .fardata > SHDSPL2RAM
    .rodata > SHDSPL2RAM
    .c6xabi.exidx > SHDSPL2RAM
    .c6xabi.extab > SHDSPL2RAM
    }

    Thanks.
  • Hi,

    Thanks for your post.

    In general, to benchmark the DSPLIB kernels, TI recommends the C674x Cycle Accurate Simulator, which is included in Code Composer Studio. After loading CCS, select Profile->Clock->Enable. This allows the kernel demonstration apps to display cycle counts accurately. The performance of the optimized C kernels should be better than or comparable to that of their ASM counterparts. The test report with cycle benchmarks for all the DSP kernel functions, as well as the DSPLIB user manual, can be found in the docs folder after installing the C674x DSPLIB, at the paths below:

    ~\ti\dsplib_c674x_3_2_0_1\docs\DSPLib_c674xTest_Report.html

    ~\ti\dsplib_c674x_3_2_0_1\docs\DSPLIB_Users_Manual.html

    The cycle count benchmarks for the following FFT kernels in the C674x DSPLIB are:

    DSPF_sp_cfftr2_dit_674LE_LE_COFF takes 4156 cycles (N=256)

    DSPF_sp_cfftr2_dit_674LE_LE_ELF takes 4156 cycles (N=256)

    You can improve DSP performance by customizing the kernels. The library provides the following features for this:

    • Full source for all kernels included with library

    • “Natural C” implementation for each kernel included

    • For kernels provided in ASM, corresponding optimized C version of kernel also included to allow customization

    • For kernels provided in C, source for legacy ASM versions (from C67x DSPLIB) included to serve as reference for customers

    • Test bench and demonstration app provided for each kernel

    • Compare performance between C, ASM, and Natural C

    • Try out modified kernels without rebuilding the entire DSPLIB

    Please refer to the C67x DSPLIB programmer's reference guide:

    http://www.ti.com/lit/ug/spru657c/spru657c.pdf

    Thanks & regards,

    Sivaraj K

  • Hello, Sivaraj.

    Thanks for your quick answer.
    I am using the C674x Cycle Accurate Simulator. DSPF_sp_cfftr2_dit_cn [N=512] takes 96827 cycles with L1 and L2 cache enabled,
    and 98441 cycles with L1 and L2 cache disabled. That is far from the 4156 cycles you quoted.
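
    As a rough sanity check on these numbers, the sketch below counts radix-2 butterflies, (N/2)*log2(N), and scales the quoted N=256 benchmark to N=512. It assumes the cycle count grows in proportion to the butterfly count, which ignores fixed overhead and cache effects. The helper names are made up for illustration and are not part of DSPLIB.

    ```c
    #include <stdint.h>

    /* Integer log2 for a power-of-two n (n >= 1). */
    unsigned ilog2(uint32_t n)
    {
        unsigned k = 0;
        while (n > 1) {
            n >>= 1;
            k++;
        }
        return k;
    }

    /* Number of butterflies in a radix-2 FFT of size n: (n/2) * log2(n). */
    uint32_t radix2_butterflies(uint32_t n)
    {
        return (n / 2) * ilog2(n);
    }

    /* Scale a reference cycle count measured at size n_ref to size n,
     * ASSUMING cost is proportional to the butterfly count. */
    uint32_t scale_cycles(uint32_t ref_cycles, uint32_t n_ref, uint32_t n)
    {
        uint64_t scaled = (uint64_t)ref_cycles * radix2_butterflies(n);
        return (uint32_t)(scaled / radix2_butterflies(n_ref));
    }
    ```

    Under that assumption, `scale_cycles(4156, 256, 512)` gives 9351 cycles, while 96827 cycles spread over `radix2_butterflies(512)` = 2304 butterflies is about 42 cycles per butterfly.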


    I am also testing the DSPF_sp_fir_gen function with:
    - Nh = 64
    - Nr = 128

    10/16*Nr*Nh + 55 = 10/16*128*64 + 55 = 5175
    My actual result is 6319 cycles.
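
    The comparison above can be written out as a small sketch. The formula 10/16*Nr*Nh + 55 is taken as given from this post; the function names are hypothetical and only illustrate the arithmetic.

    ```c
    #include <stdint.h>

    /* Expected cycles for DSPF_sp_fir_gen according to the formula
     * quoted in this post: 10/16 * Nr * Nh + 55.
     * Multiply before dividing to avoid integer truncation of 10/16. */
    uint32_t fir_gen_expected_cycles(uint32_t nr, uint32_t nh)
    {
        return (10u * nr * nh) / 16u + 55u;
    }

    /* Overhead of a measurement over the expected value, in whole percent.
     * Assumes measured >= expected and expected > 0. */
    uint32_t overhead_percent(uint32_t measured, uint32_t expected)
    {
        return (measured - expected) * 100u / expected;
    }
    ```

    For Nr = 128 and Nh = 64, `fir_gen_expected_cycles(128, 64)` gives 5175, and `overhead_percent(6319, 5175)` gives 22, so the measured FIR result is about 22% above the formula.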

    Maybe there is a problem in the DSPF_sp_cfftr2_dit_cn function [dsplib_c674x_3_4_0_0].
    Maybe there is an alternative function to cfftr2?
    Thanks.
  • Hello.

    I tested the DSPF_sp_fftSPxSP function and it works well. It takes 3407 cycles [N=256].

    But the DSPF_sp_cfftr2_dit_cn function doesn't work well. The spectrum after cfftr2 is not correct, and it takes 96827 cycles to execute. Maybe I am misunderstanding something.

    This is my main function.

    main.rar

    Thanks.