Part Number: AM5728
Tool/software: Code Composer Studio
Hi TI team and everyone!
I tested ti\processor_sdk_rtos_am57xx_5_00_00_15\demos\audio-benchmark-starterkit with CCS 8.2 on an AM5728 IDK.
I mapped all memory sections to DDR and configured the L2SRAM as cache as follows:
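CACHE_setL1PSize(CACHE_L1_32KCACHE);   // 32 KB L1P cache
CACHE_setL1DSize(CACHE_L1_32KCACHE);   // 32 KB L1D cache
CACHE_setL2Size(CACHE_128KCACHE);      // 128 KB of L2SRAM used as cache
CACHE_enableCaching(64);               // MAR64:  0x40000000-0x40FFFFFF
CACHE_enableCaching(128);              // MAR128: 0x80000000-0x80FFFFFF (DDR)
CACHE_enableCaching(129);              // MAR129: 0x81000000-0x81FFFFFF (DDR)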
The results look good:
DSPF_sp_fftSPxSP Iter#: 1 Intrinsic Successful SA Successful N = 8 radix = 2 natC: 1387 optC: 2953 SA: 355
DSPF_sp_fftSPxSP Iter#: 2 Intrinsic Successful SA Successful N = 16 radix = 4 natC: 606 optC: 2491 SA: 206
DSPF_sp_fftSPxSP Iter#: 3 Intrinsic Successful SA Successful N = 32 radix = 2 natC: 1273 optC: 6336 SA: 284
DSPF_sp_fftSPxSP Iter#: 4 Intrinsic Successful SA Successful N = 64 radix = 4 natC: 2326 optC: 13252 SA: 424
DSPF_sp_fftSPxSP Iter#: 5 Intrinsic Successful SA Successful N = 128 radix = 2 natC: 5378 optC: 34218 SA: 891
DSPF_sp_fftSPxSP Iter#: 6 Intrinsic Successful SA Successful N = 256 radix = 4 natC: 10577 optC: 71518 SA: 1600
However, after I inserted the DDR test below, the cycle counts went back up to the values I had measured without cache.
#define TEST_BUFF_SZ (8*1024*1024)   /* elements per buffer: 16 MB per short array */
#pragma DATA_ALIGN(a, CACHE_L2_LINESIZE)
#pragma DATA_ALIGN(b, CACHE_L2_LINESIZE)
#pragma DATA_ALIGN(c, CACHE_L2_LINESIZE)
static short a[TEST_BUFF_SZ], b[TEST_BUFF_SZ], c[TEST_BUFF_SZ];
int main(void)
{
    Board_init(boardCfg);

    for (i = 0; i < TEST_BUFF_SZ; i++)
    {
        a[i] = b[i] = i << 2;
    }

    TSCL = 0, TSCH = 0;

    /* Compute the overhead of calling _itoll(TSCH, TSCL) twice to get timing info */
    /* ---------------------------------------------------------------- */
    t_start = _itoll(TSCH, TSCL);
    t_stop = _itoll(TSCH, TSCL);
    t_overhead = t_stop - t_start;

    /* Time the DDR test loop */
    t_start = _itoll(TSCH, TSCL);
    for (i = 0; i < TEST_BUFF_SZ; i++)
    {
        c[i] = a[i] + b[i];
    }
    t_stop = _itoll(TSCH, TSCL);
    t_cn = (t_stop - t_start) - t_overhead;
    AUDIO_log("DDR test:%d,%d\n", t_cn, c[1]);

    /* Same cache configuration as in the first run */
    CACHE_setL1PSize(CACHE_L1_32KCACHE);
    CACHE_setL1DSize(CACHE_L1_32KCACHE);
    CACHE_setL2Size(CACHE_128KCACHE); // user-defined
    CACHE_enableCaching(64);
    CACHE_enableCaching(128);
    CACHE_enableCaching(129);
    CACHE_invL2((void *)0, 128*1024, CACHE_WAIT); // invalidate entire L2SRAM
    CACHE_invL2Wait();

    for (N = 8, k = 1; N <= MAXN; N = N * 2, k++)
    {
        // run benchmark
        ...
    }
}
DSPF_sp_fftSPxSP Iter#: 1 Intrinsic Successful SA Successful N = 8 radix = 2 natC: 5120 optC: 29480 SA: 4999
DSPF_sp_fftSPxSP Iter#: 2 Intrinsic Successful SA Successful N = 16 radix = 4 natC: 8438 optC: 58262 SA: 7156
DSPF_sp_fftSPxSP Iter#: 3 Intrinsic Successful SA Successful N = 32 radix = 2 natC: 25211 optC: 168056 SA: 16102
DSPF_sp_fftSPxSP Iter#: 4 Intrinsic Successful SA Successful N = 64 radix = 4 natC: 49016 optC: 348002 SA: 29614
DSPF_sp_fftSPxSP Iter#: 5 Intrinsic Successful SA Successful N = 128 radix = 2 natC: 133709 optC: 913610 SA: 69847
DSPF_sp_fftSPxSP Iter#: 6 Intrinsic Successful SA Successful N = 256 radix = 4 natC: 265388 optC: 1882187 SA: 140677
Could you please tell me why the benchmark performance dropped?
Thanks.
Regards.
Aither.