TMS320C6748: Does the C674x DSPLIB FFT benchmark depend on the MAXN parameter?

j-breeze

Part Number: TMS320C6748

Hi Champs,

I ran the C674x DSPLIB FFT benchmarks below on an LCDKC6748. The results seem to depend on the MAXN parameter in DSPF_sp_fftSPxSP_d.c.
Is that correct? Is there anything else I should do for the benchmarking?

o Case 1: MAXN is set to 256 (default)

    DSPF_sp_fftSPxSP    Iter#: 1    Result Successful    N = 8      radix = 2   natC: 283   optC: 171
    DSPF_sp_fftSPxSP    Iter#: 2   Result Successful    N = 16   radix = 4   natC: 388   optC: 188
    DSPF_sp_fftSPxSP    Iter#: 3   Result Successful    N = 32   radix = 2   natC: 891   optC: 342
    DSPF_sp_fftSPxSP    Iter#: 4   Result Successful    N = 64   radix = 4   natC: 1635   optC: 538
    DSPF_sp_fftSPxSP    Iter#: 5   Result Successful    N = 128   radix = 2   natC: 4122   optC: 1221
    DSPF_sp_fftSPxSP    Iter#: 6   Result Successful    N = 256   radix = 4   natC: 8140   optC: 2239

o Case 2: MAXN is set to 512

    DSPF_sp_fftSPxSP   Iter#: 1   Result Successful    N = 8      radix = 2   natC: 297      optC: 187
    DSPF_sp_fftSPxSP   Iter#: 2   Result Successful    N = 16   radix = 4   natC: 387      optC: 188
    DSPF_sp_fftSPxSP   Iter#: 3   Result Successful    N = 32   radix = 2   natC: 891      optC: 342
    DSPF_sp_fftSPxSP   Iter#: 4   Result Successful    N = 64   radix = 4   natC: 1635   optC: 538
    DSPF_sp_fftSPxSP   Iter#: 5   Result Successful    N = 128   radix = 2   natC: 4122   optC: 1221
    DSPF_sp_fftSPxSP   Iter#: 6   Result Successful    N = 256   radix = 4   natC: 8140   optC: 2239
    DSPF_sp_fftSPxSP   Iter#: 7   Result Successful    N = 512   radix = 2   natC: 20099   optC: 5358

o Case 3: MAXN is set to 1024

    DSPF_sp_fftSPxSP   Iter#: 1   Result Successful    N = 8       radix = 2   natC: 306      optC: 191
    DSPF_sp_fftSPxSP   Iter#: 2   Result Successful    N = 16      radix = 4   natC: 387      optC: 188
    DSPF_sp_fftSPxSP   Iter#: 3   Result Successful    N = 32      radix = 2   natC: 890      optC: 343
    DSPF_sp_fftSPxSP   Iter#: 4   Result Successful    N = 64      radix = 4   natC: 1637   optC: 538
    DSPF_sp_fftSPxSP   Iter#: 5   Result Successful    N = 128   radix = 2   natC: 4169   optC: 1294
    DSPF_sp_fftSPxSP   Iter#: 6   Result Successful    N = 256   radix = 4   natC: 8262   optC: 2425
    DSPF_sp_fftSPxSP   Iter#: 7   Result Successful    N = 512   radix = 2   natC: 20328   optC: 5667
    DSPF_sp_fftSPxSP   Iter#: 8   Result Successful    N = 1024   radix = 4   natC: 40677   optC: 10976

o Case 4: MAXN is set to 2048

    DSPF_sp_fftSPxSP   Iter#: 1   Result Successful    N = 8       radix = 2   natC: 301      optC: 191
    DSPF_sp_fftSPxSP   Iter#: 2   Result Successful    N = 16      radix = 4   natC: 387      optC: 188
    DSPF_sp_fftSPxSP   Iter#: 3   Result Successful    N = 32      radix = 2   natC: 900      optC: 343
    DSPF_sp_fftSPxSP   Iter#: 4   Result Successful    N = 64      radix = 4   natC: 1669   optC: 598
    DSPF_sp_fftSPxSP   Iter#: 5   Result Successful    N = 128   radix = 2   natC: 4271   optC: 1411
    DSPF_sp_fftSPxSP   Iter#: 6   Result Successful    N = 256   radix = 4   natC: 8384   optC: 2711
    DSPF_sp_fftSPxSP   Iter#: 7   Result Successful    N = 512   radix = 2   natC: 20724   optC: 6429
    DSPF_sp_fftSPxSP   Iter#: 8   Result Successful    N = 1024   radix = 4   natC: 41544   optC: 12425
    DSPF_sp_fftSPxSP   Iter#: 9   Result Successful    N = 2048   radix = 2   natC: 98854   optC: 28673

My tools and settings are listed below.

- C674x DSPLIB: v3.4.0.0
- CCS         : v7.2.0.00012
- CGTools     : v7.4.21

- L2 : 256KB SRAM
- L1P: 32KB Cache
- L1D: 32KB Cache

- Build Configurations: Release

I have also attached the linker command (.cmd) file.

/* ======================================================================= */

-c
-heap 0x1000
-stack 0x1000

-lC:/ti/dsplib_c674x_3_4_0_0/packages/ti/dsplib/lib/dsplib.lib
-lC:/ti/dsplib_c674x_3_4_0_0/packages/ti/dsplib/lib/dsplib_cn.lib

/* MODULE MEM */
MEMORY
{
   L2SRAM (RWX)     : origin = 0x800000, len = 0x40000
}

SECTIONS
{

    .kernel: {
      *.obj (.text:optimized) { SIZE(_kernel_size) }
    }

    .text:       load >> L2SRAM
    .text:touch: load >> L2SRAM

    GROUP (NEAR_DP)
    {
    .neardata
    .rodata
    .bss
    } load > L2SRAM

    .far:        load >> L2SRAM
    .fardata:    load >> L2SRAM
    .data:       load >> L2SRAM
    .switch:     load >> L2SRAM
    .stack:      load > L2SRAM
    .args:       load > L2SRAM align = 0x4, fill = 0 {_argsize = 0x200; }
    .sysmem:     load > L2SRAM
    .cinit:      load > L2SRAM
    .const:      load > L2SRAM START(const_start) SIZE(const_size)
    .pinit:      load > L2SRAM
    .cio:        load >> L2SRAM
    xdc.meta:    load >> L2SRAM, type = COPY
    .init_array: load > L2SRAM
}

/* ======================================================================= */

Regards,
j-breeze

over 6 years ago

Yordan Kovachev over 6 years ago

TI__Guru**** 161600 points

Hi,

I've asked the factory team to have a look at this. Their feedback will be posted here.

Best Regards,
Yordan

Rahul Prabhu over 6 years ago

TI__Guru** 114410 points

Yes, MAXN determines how many iterations of the benchmark test run. The final iteration uses MAXN as the number of input samples for the complex FFT. Most of the settings you mentioned for benchmarking on this platform are appropriate. For the best benchmark results, both data and code should be placed in L2.
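
As a rough sketch of this behavior (not the actual DSPF_sp_fftSPxSP_d.c source), a driver loop of the following shape would reproduce the iteration counts and radix values seen in the logs: N starts at 8 and doubles until it reaches MAXN, and the radix is 4 when N is a power of 4 and 2 otherwise. The function names `fft_radix` and `list_benchmark_passes` are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: radix 4 when N is a power of 4, otherwise radix 2.
   This matches the radix column in the posted logs (8 -> 2, 16 -> 4, ...). */
static int fft_radix(int n)
{
    int log2n = 0;
    while ((1 << log2n) < n) {
        log2n++;
    }
    return (log2n % 2 == 0) ? 4 : 2;
}

/* Assumed loop shape: one benchmark pass per power of two from 8 up to maxn.
   Prints one line per pass and returns the number of passes. */
static int list_benchmark_passes(int maxn)
{
    int iter = 0;
    int n;
    for (n = 8; n <= maxn; n *= 2) {
        iter++;
        printf("Iter#: %d  N = %d  radix = %d\n", iter, n, fft_radix(n));
    }
    return iter;
}
```

Calling `list_benchmark_passes(256)` lists six passes and `list_benchmark_passes(2048)` lists nine, which agrees with Case 1 and Case 4 in the original post.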

Please note that for larger values of MAXN, such as 32K and 64K, the data, code, and twiddle factors will not all fit in L2 memory. You will need to move some data or code sections to external memory (mDDR/DDR) and enable cache. Access times for that data and code will be higher, so this will have some impact on the benchmarks.
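
To put rough numbers on this, the sketch below estimates the buffer footprint of one N-point single-precision complex FFT, assuming three arrays of 2*N floats each (input, output, and twiddle factors). The real test driver allocates additional buffers, for example for the natural-C reference run, so treat this as a lower bound. The helper names are hypothetical.

```c
#include <stddef.h>

#define L2_SRAM_BYTES (256u * 1024u)  /* L2 size from the settings in this thread */

/* Bytes for one N-point complex single-precision buffer (re/im interleaved). */
static size_t complex_buf_bytes(size_t n)
{
    return 2u * n * sizeof(float);
}

/* Assumed minimum working set: input + output + twiddle table. */
static size_t fft_min_footprint_bytes(size_t n)
{
    return 3u * complex_buf_bytes(n);
}

/* Returns 1 if the assumed minimum working set fits in L2, else 0. */
static int fits_in_l2(size_t n)
{
    return fft_min_footprint_bytes(n) <= L2_SRAM_BYTES;
}
```

Under these assumptions, N = 2048 needs 48 KB and fits easily, while N = 32K needs 768 KB for the three buffers alone, so it cannot fit in 256 KB of L2 even before code is counted.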

Regards,
Rahul

j-breeze over 6 years ago in reply to Rahul Prabhu

Mastermind 7560 points

Hi,

I'm sorry I didn't make my question clear enough. My question is whether the optC cycle count for a given N should depend on MAXN, as it appears to below.
Is that correct?

- MAXN = 256 : N = 256 optC: 2239
- MAXN = 512 : N = 256 optC: 2239
- MAXN = 1024 : N = 256 optC: 2425
- MAXN = 2048 : N = 256 optC: 2711

- MAXN = 512 : N = 512 optC: 5358
- MAXN = 1024 : N = 512 optC: 5667
- MAXN = 2048 : N = 512 optC: 6429
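
To quantify the differences listed above, a small helper like the following (the name `percent_increase` is just for illustration) computes the relative increase in optC cycles over the smallest-MAXN baseline for the same N.

```c
/* Percentage increase of `cycles` over `baseline`,
   e.g. baseline 2239 (MAXN = 256) versus 2425 (MAXN = 1024) for N = 256. */
static double percent_increase(unsigned baseline, unsigned cycles)
{
    return 100.0 * ((double)cycles - (double)baseline) / (double)baseline;
}
```

With the figures above, N = 256 is about 8.3% slower at MAXN = 1024 and about 21.1% slower at MAXN = 2048, and N = 512 is about 5.8% and 20.0% slower respectively.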

Could you please check it out again?

Regards,
j-breeze

RandyP over 6 years ago in reply to j-breeze

TI__Guru* 84110 points

J-Breeze,

With all program and data loaded into L2 SRAM, the only thing that would affect operation from one pass to another is L1 cache. Since the program size is small, L1P is not likely to be a problem, but please check what the size of your total .text space is just to confirm that.

The most likely cause of the numbers changing is that increasing MAXN increases the distance between the several buffers being used. L1D is a 2-way set-associative cache, so it can readily hold two blocks of data of up to 16 KB each, provided they are placed at suitable addresses.
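
One way to explore this is to compute which L1D set a buffer address maps to. The sketch below uses the C674x L1D geometry for a 32 KB configuration (2 ways, 64-byte lines, so 256 sets and 16 KB per way). The buffer layout is an assumption made for illustration only: buffers of 2*MAXN floats placed back to back. The actual driver may add padding or alignment, so check the map file for the real addresses.

```c
#include <stdint.h>

/* C674x L1D geometry at 32 KB: 2 ways, 64-byte lines. */
#define L1D_BYTES      (32u * 1024u)
#define L1D_WAYS       2u
#define L1D_LINE_BYTES 64u
#define L1D_SETS       (L1D_BYTES / (L1D_WAYS * L1D_LINE_BYTES))  /* 256 sets */

/* Set index that a byte address maps to. */
static uint32_t l1d_set_index(uint32_t addr)
{
    return (addr / L1D_LINE_BYTES) % L1D_SETS;
}

/* Assumed layout (hypothetical): buffers of 2*maxn floats placed back to
   back, so consecutive buffers start 8*maxn bytes apart. */
static uint32_t buffer_stride_bytes(uint32_t maxn)
{
    return 2u * maxn * (uint32_t)sizeof(float);
}

/* Returns 1 if two addresses compete for the same L1D set. */
static int same_l1d_set(uint32_t a, uint32_t b)
{
    return l1d_set_index(a) == l1d_set_index(b);
}
```

Under that assumed layout, MAXN = 2048 gives a stride of 16 KB, so the starts of three or more consecutive buffers all map to the same set and compete for only two ways. With MAXN = 256 the stride is 2 KB and the buffer starts land in different sets.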

The benchmark is a good method of showing what performance you can expect. If the numbers look good for the FFT size you need to run, then you would work to improve that one. There is not much to be gained by trying to analyze and optimize the different iterations of a generalized benchmark, in my opinion. But if you implement your FFT in your application and your numbers are closer to the higher optC result, then you will know you could work on optimizing it.

If I recall correctly, the Cache User Guide has some useful discussions about the details of cache access issues and what things you can do to improve how your algorithm operates.

Regards,
RandyP

j-breeze over 6 years ago in reply to RandyP

Mastermind 7560 points

Hi RandyP,

Thank you for your advice. I will take a look at the U/G.

Regards,
j-breeze
