TMS320C6748: Does the C674x DSPLIB FFT benchmark depend on the MAXN parameter?

Part Number: TMS320C6748

Hi Champs,

I ran the C674x DSPLIB FFT benchmark on the LCDKC6748 and got the results below.  The cycle counts seem to depend on the parameter MAXN in DSPF_sp_fftSPxSP_d.c.
Is that expected?  Is there anything else I should do for the benchmarking?

  o Case 1: MAXN is set to 256 (default)

    DSPF_sp_fftSPxSP    Iter#: 1    Result Successful     N = 8      radix = 2    natC: 283     optC: 171
    DSPF_sp_fftSPxSP    Iter#: 2    Result Successful     N = 16     radix = 4    natC: 388     optC: 188
    DSPF_sp_fftSPxSP    Iter#: 3    Result Successful     N = 32     radix = 2    natC: 891     optC: 342
    DSPF_sp_fftSPxSP    Iter#: 4    Result Successful     N = 64     radix = 4    natC: 1635    optC: 538
    DSPF_sp_fftSPxSP    Iter#: 5    Result Successful     N = 128    radix = 2    natC: 4122    optC: 1221
    DSPF_sp_fftSPxSP    Iter#: 6    Result Successful     N = 256    radix = 4    natC: 8140    optC: 2239

  o Case 2: MAXN is set to 512

    DSPF_sp_fftSPxSP    Iter#: 1    Result Successful     N = 8      radix = 2    natC: 297      optC: 187
    DSPF_sp_fftSPxSP    Iter#: 2    Result Successful     N = 16     radix = 4    natC: 387      optC: 188
    DSPF_sp_fftSPxSP    Iter#: 3    Result Successful     N = 32     radix = 2    natC: 891      optC: 342
    DSPF_sp_fftSPxSP    Iter#: 4    Result Successful     N = 64     radix = 4    natC: 1635     optC: 538
    DSPF_sp_fftSPxSP    Iter#: 5    Result Successful     N = 128    radix = 2    natC: 4122     optC: 1221
    DSPF_sp_fftSPxSP    Iter#: 6    Result Successful     N = 256    radix = 4    natC: 8140     optC: 2239
    DSPF_sp_fftSPxSP    Iter#: 7    Result Successful     N = 512    radix = 2    natC: 20099    optC: 5358

  o Case 3: MAXN is set to 1024

    DSPF_sp_fftSPxSP    Iter#: 1    Result Successful     N = 8       radix = 2    natC: 306      optC: 191
    DSPF_sp_fftSPxSP    Iter#: 2    Result Successful     N = 16      radix = 4    natC: 387      optC: 188
    DSPF_sp_fftSPxSP    Iter#: 3    Result Successful     N = 32      radix = 2    natC: 890      optC: 343
    DSPF_sp_fftSPxSP    Iter#: 4    Result Successful     N = 64      radix = 4    natC: 1637     optC: 538
    DSPF_sp_fftSPxSP    Iter#: 5    Result Successful     N = 128     radix = 2    natC: 4169     optC: 1294
    DSPF_sp_fftSPxSP    Iter#: 6    Result Successful     N = 256     radix = 4    natC: 8262     optC: 2425
    DSPF_sp_fftSPxSP    Iter#: 7    Result Successful     N = 512     radix = 2    natC: 20328    optC: 5667
    DSPF_sp_fftSPxSP    Iter#: 8    Result Successful     N = 1024    radix = 4    natC: 40677    optC: 10976

  o Case 4: MAXN is set to 2048


    DSPF_sp_fftSPxSP    Iter#: 1    Result Successful     N = 8       radix = 2    natC: 301      optC: 191
    DSPF_sp_fftSPxSP    Iter#: 2    Result Successful     N = 16      radix = 4    natC: 387      optC: 188
    DSPF_sp_fftSPxSP    Iter#: 3    Result Successful     N = 32      radix = 2    natC: 900      optC: 343
    DSPF_sp_fftSPxSP    Iter#: 4    Result Successful     N = 64      radix = 4    natC: 1669     optC: 598
    DSPF_sp_fftSPxSP    Iter#: 5    Result Successful     N = 128     radix = 2    natC: 4271     optC: 1411
    DSPF_sp_fftSPxSP    Iter#: 6    Result Successful     N = 256     radix = 4    natC: 8384     optC: 2711
    DSPF_sp_fftSPxSP    Iter#: 7    Result Successful     N = 512     radix = 2    natC: 20724    optC: 6429
    DSPF_sp_fftSPxSP    Iter#: 8    Result Successful     N = 1024    radix = 4    natC: 41544    optC: 12425
    DSPF_sp_fftSPxSP    Iter#: 9    Result Successful     N = 2048    radix = 2    natC: 98854    optC: 28673

My tools and settings are listed below.

  - C674x DSPLIB: v3.4.0.0
  - CCS         : v7.2.0.00012
  - CGTools     : v7.4.21

  - L2 : 256KB SRAM
  - L1P:  32KB Cache
  - L1D:  32KB Cache

  - Build Configurations: Release

I have also attached the linker command (.cmd) file.

/* ======================================================================= */

-c

-heap  0x1000
-stack 0x1000

-lC:/ti/dsplib_c674x_3_4_0_0/packages/ti/dsplib/lib/dsplib.lib
-lC:/ti/dsplib_c674x_3_4_0_0/packages/ti/dsplib/lib/dsplib_cn.lib

/* MODULE MEM */
MEMORY
{
   L2SRAM (RWX)     : origin = 0x800000,  len = 0x40000
}

SECTIONS
{

    .kernel: {
      *.obj (.text:optimized) { SIZE(_kernel_size) }
    }

    .text:       load >> L2SRAM
    .text:touch: load >> L2SRAM
    
    GROUP (NEAR_DP)
    {
    .neardata
    .rodata
    .bss
    } load > L2SRAM
   
    .far:        load >> L2SRAM
    .fardata:    load >> L2SRAM
    .data:       load >> L2SRAM
    .switch:     load >> L2SRAM
    .stack:      load >  L2SRAM
    .args:       load >  L2SRAM align = 0x4, fill = 0 {_argsize = 0x200; }
    .sysmem:     load >  L2SRAM
    .cinit:      load >  L2SRAM
    .const:      load >  L2SRAM START(const_start) SIZE(const_size)
    .pinit:      load >  L2SRAM
    .cio:        load >> L2SRAM
    xdc.meta:    load >> L2SRAM, type = COPY
    .init_array: load >  L2SRAM
}

/* ======================================================================= */

Regards,
j-breeze

  • Hi,

    I've asked the factory team to have a look at this. Their feedback will be posted here.

    Best Regards,
    Yordan
  • Yes, MAXN determines how many iterations of the benchmark test will run. The final iteration uses the MAXN value as the number of input samples for the complex FFT. Most of the changes you mentioned for benchmarking on this platform are accurate. For the best benchmark results, data and code should be placed in L2.
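
As an illustration of how MAXN sets the iteration count, here is a minimal sketch that reproduces the N and radix columns of the logs above. It is not the code from DSPF_sp_fftSPxSP_d.c; the function names are hypothetical, and the rule "radix 4 when N is a power of 4, otherwise radix 2" is inferred from the posted results.

```c
#include <assert.h>
#include <stdio.h>

/* Radix reported for an FFT of size n (n must be a power of two, n >= 8).
 * Inferred rule: 4 when n is a power of 4, otherwise 2. */
int fft_radix(unsigned n)
{
    unsigned trailing_zeros = 0;

    assert(n >= 8 && (n & (n - 1)) == 0);
    while ((n & 1u) == 0) {
        n >>= 1;
        trailing_zeros++;
    }
    return (trailing_zeros % 2 == 0) ? 4 : 2;
}

/* Number of benchmark iterations for a given MAXN: N = 8, 16, ..., MAXN. */
int benchmark_iterations(unsigned maxn)
{
    int count = 0;
    unsigned n;

    for (n = 8; n <= maxn; n *= 2) {
        count++;
    }
    return count;
}

/* Print the schedule of sizes and radices that a given MAXN produces. */
void print_schedule(unsigned maxn)
{
    int iter = 1;
    unsigned n;

    for (n = 8; n <= maxn; n *= 2, iter++) {
        printf("Iter#: %d    N = %u    radix = %d\n", iter, n, fft_radix(n));
    }
}
```

Calling print_schedule(256) lists six iterations ending at N = 256, and print_schedule(2048) lists nine ending at N = 2048, matching Case 1 and Case 4.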

    Please note that for higher values of MAXN, such as 32K and 64K, the data, code and twiddle factors will not all fit inside the L2 memory. You will need to move some data or code sections to mDDR/DDR (external memory) and enable cache. Data and code access times will then be higher, which will have some impact on the benchmarks.
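
To make the size limit concrete, the following rough estimate checks whether the FFT buffers fit in the 256 KB of L2 SRAM declared in the cmd file. It is only a sketch: the number of buffers and the code size are assumptions supplied by the caller, not values taken from the DSPLIB test driver, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* 256 KB, from "len = 0x40000" in the MEMORY section of the cmd file. */
#define L2_SRAM_BYTES 0x40000u

/* Bytes needed for one buffer of n complex single-precision samples
 * (interleaved real and imaginary parts). */
size_t complex_buffer_bytes(size_t n)
{
    return n * 2u * sizeof(float);
}

/* Total bytes for num_buffers such buffers (input, outputs, twiddles, ...).
 * num_buffers is an assumption; check the test driver for the real count. */
size_t fft_data_bytes(size_t n, size_t num_buffers)
{
    assert(num_buffers > 0);
    return num_buffers * complex_buffer_bytes(n);
}

/* Returns 1 when data plus code_bytes of program fit in L2, else 0. */
int fits_in_l2(size_t n, size_t num_buffers, size_t code_bytes)
{
    return fft_data_bytes(n, num_buffers) + code_bytes <= L2_SRAM_BYTES;
}
```

With 4-byte floats, one buffer for N = 32K already takes 32768 * 8 = 262144 bytes, which is the whole of L2 before any code or other buffers are counted.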

    Regards,
    Rahul
  • Hi,

    I'm sorry I didn't make my question clear enough.  My question was whether the optC cycle counts for a given N should depend on the parameter MAXN, as they appear to below.
    Is that expected?

      - MAXN = 256  : N = 256  optC: 2239
      - MAXN = 512  : N = 256  optC: 2239
      - MAXN = 1024 : N = 256  optC: 2425
      - MAXN = 2048 : N = 256  optC: 2711

      - MAXN = 512  : N = 512  optC: 5358
      - MAXN = 1024 : N = 512  optC: 5667
      - MAXN = 2048 : N = 512  optC: 6429

    Could you please check it out again?

    Regards,
    j-breeze

  • J-Breeze,

    With all program and data loaded into L2 SRAM, the only thing that would affect operation from one pass to another is L1 cache. Since the program size is small, L1P is not likely to be a problem, but please check what the size of your total .text space is just to confirm that.

    The most likely cause of the numbers changing is that by increasing MAXN, you increase the distance between the several buffers that are being used. L1D is a 2-way set-associative cache, so it can readily hold two blocks of data of up to 16 KBytes each, provided they are placed at suitable addresses. When more than two buffers map to the same cache sets, they evict each other's lines and the cycle count goes up.
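
The effect of buffer spacing can be sketched with a simplified model of the L1D mapping. The 32 KB size and 2-way associativity come from this thread; the 64-byte line size is my assumption for the C674x L1D, and the layout (buffers of 2*MAXN floats placed back to back) is a guess about the test driver, not something confirmed here. Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define L1D_BYTES      (32u * 1024u)  /* L1D configured as 32 KB cache       */
#define L1D_WAYS       2u             /* 2-way set-associative               */
#define L1D_LINE_BYTES 64u            /* assumption: 64-byte cache line      */
#define L1D_SETS       (L1D_BYTES / (L1D_WAYS * L1D_LINE_BYTES))  /* 256 sets */

/* Set index that the cache line containing addr maps to. */
unsigned l1d_set_index(uint32_t addr)
{
    return (addr / L1D_LINE_BYTES) % L1D_SETS;
}

/* Counts how many of num_buffers buffers, spaced stride_bytes apart from
 * base, start in the same L1D set as the first buffer. A result above
 * L1D_WAYS means same-offset elements of those buffers compete for a set. */
unsigned buffers_sharing_first_set(uint32_t base, uint32_t stride_bytes,
                                   unsigned num_buffers)
{
    unsigned first_set = l1d_set_index(base);
    unsigned count = 0;
    unsigned i;

    assert(num_buffers > 0);
    for (i = 0; i < num_buffers; i++) {
        if (l1d_set_index(base + i * stride_bytes) == first_set) {
            count++;
        }
    }
    return count;
}
```

Under these assumptions, MAXN = 256 gives a spacing of 8 * 256 = 2 KB, so three consecutive buffers start in three different sets. MAXN = 2048 gives a spacing of 16 KB, which equals the size of one way, so all three buffers start in the same set and only two of them can be resident at once. This model looks only at buffer start addresses; the real FFT access pattern is more complicated.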

    The benchmark is a good way of showing what performance you can expect. If the numbers look good for the FFT size you need to run, then you would work on improving that one. In my opinion, there is not much to be gained by trying to analyze and optimize the different iterations of a generalized benchmark. But if you implement your FFT in your application and your numbers are closer to the higher optC result, then you will know there is room to optimize it.

    If I recall correctly, the Cache User Guide has some useful discussions about the details of cache access issues and what things you can do to improve how your algorithm operates.

    Regards,
    RandyP
  • Hi RandyP,

    Thank you for your advice.  I will take a look at the U/G.

    Regards,
    j-breeze