
TDA4VM: Peak DDR Bandwidth Calculation and DDR Performance Counter.

Part Number: TDA4VM

Hi,

We are using PSDK 08.06.00.11.

appPerfStatsDdrStatsPrintAll() ( vision_apps/utils/perf_stats/src/app_perf_stats_api.c ) reports a peak bandwidth that is too large, exceeding the TDA4VM EVM hardware specification (14.9 GB/s).

DDR performance statistics,
===========================
DDR: READ BW: AVG = 7555 MB/s, PEAK = 29782 MB/s
DDR: WRITE BW: AVG = 8623 MB/s, PEAK = 33800 MB/s
DDR: TOTAL BW: AVG = 16178 MB/s, PEAK = 63582 MB/s

The appPerfStatsDddrStatsUpdate() function that calculates the bandwidth looks like this:

====

...

appPerfStatsDdrStatsReadCounters(&val0, &val1, &val2, &val3, false);

uint64_t write_bytes = val0 * APP_PERF_DDR_BURST_SIZE_BYTES;
uint64_t read_bytes = val1 * APP_PERF_DDR_BURST_SIZE_BYTES;
..

uint32_t read_bw_peak = read_bytes/elapsed_time; /* in MB/s */
uint32_t write_bw_peak = write_bytes/elapsed_time; /* in MB/s */

====

When we print the above variables at the moments the peak bandwidth is reported too large,
elapsed_time is small while the DDR performance counter value is very large, so the calculated bandwidth comes out too high.

Example)
elapsed_time : 49 us , read_bytes : 3302400
read_bw_peak = 67395 MB/s
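
For reference, here is a minimal standalone sketch (not SDK code) that just reproduces the arithmetic from the example above; it only shows that the unit conversion is internally consistent, since a byte count divided by a duration in microseconds is numerically MB/s:

====

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Values taken from the example above */
    uint64_t read_bytes   = 3302400ULL; /* bytes counted in the window */
    uint64_t elapsed_time = 49ULL;      /* window length in microseconds */

    /* bytes per microsecond is numerically MB/s (10^6 bytes per 10^6 us) */
    uint64_t read_bw_peak = read_bytes / elapsed_time;

    printf("read_bw_peak = %llu MB/s\n", (unsigned long long)read_bw_peak); /* prints 67395 */

    return 0;
}

====

So the conversion itself is consistent; the question is about the inputs (the counter value and elapsed_time), not the arithmetic.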

In this case, should the DDR performance counter and peak values be considered normal?

According to the TDA4VM EVM hardware specification (14.9 GB/s), the peak should be at most about 730 KB in 49 us, right?
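(Worked out, assuming 14.9 GB/s means 14.9 x 10^9 bytes/s: 14.9 x 10^9 B/s x 49 x 10^-6 s ≈ 730,100 bytes ≈ 730 KB, whereas the example above reports 3,302,400 bytes in the same 49 us window, roughly 4.5x the theoretical maximum.)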

Any ideas why the reported peak would exceed the theoretical limit of the device?

Is there a way to accurately measure the peak DDR bandwidth?

  • Hi,

    May I know under what scenario you are seeing this issue?

    Is it while running any SDK out-of-the-box demo?

    Regards,

    Nikhil

  • We are running our ADAS applications on the ARM cores, along with TIDL, which also uses the DSP/MMA.

  • Hi,

    Let me check this internally and get back to you

    Regards,

    Nikhil

  • Hi,

    Could you make every variable involved here uint64_t?

    For example: elapsed_time, read_bw_peak, write_bw_peak, etc. (a sketch follows below)

    Regards,

    Nikhil
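
    For illustration only, a minimal sketch of that change (assuming val0, val1 and elapsed_time are the same values as in the appPerfStatsDddrStatsUpdate() snippet from the original post; note the cast on the counter value: if val0/val1 are 32-bit, the multiplication would otherwise still be performed in 32-bit arithmetic even when the result is stored in a uint64_t):

    uint64_t elapsed_time;   /* window length in microseconds */
    ...
    uint64_t write_bytes = (uint64_t)val0 * APP_PERF_DDR_BURST_SIZE_BYTES;
    uint64_t read_bytes  = (uint64_t)val1 * APP_PERF_DDR_BURST_SIZE_BYTES;
    ...
    uint64_t read_bw_peak  = read_bytes  / elapsed_time;  /* in MB/s */
    uint64_t write_bw_peak = write_bytes / elapsed_time;  /* in MB/s */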

  • I have tried making all the variables involved uint64_t, but the results are similar
    ( elapsed_time is small and the byte values are still too large ).

    uint64_t elapsed_time;
    ...
    uint64_t read_bw_peak = read_bytes/elapsed_time; /* in MB/s */
    uint64_t write_bw_peak = write_bytes/elapsed_time; /* in MB/s */

    ...

    printf("elapsed_time:%4lld, RD: %8lld, WR: %8lld,, RD PEAK: %6lld, WR PEAK: %6lld\n",elapsed_time,read_bytes,write_bytes,read_bw_peak,write_bw_peak);



    Result:

    [MCU2_1]     78.562996 s: elapsed_time:  46, RD:  1470464, WR:   992896,, RD PEAK:  31966, WR PEAK:  21584

  • Hi,

    Could you please do a reset using the API appPerfStatsDdrStatsResetAll() after every read?

    Regards,

    Nikhil


  • The above results are from calling appPerfStatsDdrStatsPrintAll(), and appPerfStatsDdrStatsResetAll() is already called at the end of that call path, as shown below.

    appPerfStatsDdrStatsPrintAll() ( vision_apps/utils/perf_stats/src/app_perf_stats_api.c ) 

    int32_t
    appPerfStatsPrintAll()
    {
        appPerfStatsCpuLoadPrintAll();
        appPerfStatsHwaLoadPrintAll();
        appPerfStatsDdrStatsPrintAll();
        appPerfStatsCpuStatsPrintAll();
        appPerfStatsResetAll();
        return 0;
    }

    void appPerfStatsResetAll()
    {
        appPerfStatsCpuLoadResetAll();
        appPerfStatsHwaLoadResetAll();
        appPerfStatsDdrStatsResetAll();
    }
  • Hi,

    [MCU2_1]     78.562996 s: elapsed_time:  46, RD:  1470464, WR:   992896,, RD PEAK:  31966, WR PEAK:  21584

    May I know at what interval this is being called?

    Could you send the full logs (i.e., the good logs above and below this one)?

    May I also know about the application you are running (i.e., the nodes and cores being used in the graph)?

    Regards,

    Nikhil

  • We call the following perf_periodic() function in our application (on the A72) at 1-second intervals.

    void perf_periodic(void)
    {
        app_perf_stats_ddr_stats_t ddr_stats = {0,};

        appPerfStatsDdrStatsGet(&ddr_stats);
        appPerfStatsDdrStatsResetAll();

        app_perf_stats_ddr_stats_t* ddr_load = &ddr_stats;

        printf("DDR: READ BW: AVG = %6d MB/s, PEAK = %6d MB/s",
               ddr_load->read_bw_avg,
               ddr_load->read_bw_peak);
        printf(" WRITE BW: AVG = %6d MB/s, PEAK = %6d MB/s",
               ddr_load->write_bw_avg,
               ddr_load->write_bw_peak);
        printf(" TOTAL BW: AVG = %6d MB/s, PEAK = %6d MB/s\n",
               ddr_load->read_bw_avg + ddr_load->write_bw_avg,
               ddr_load->write_bw_peak + ddr_load->read_bw_peak);
    }

    In appPerfStatsDddrStatsUpdate(), we added a printf that only logs when elapsed_time is 100 us or less, or on every 1000th call:

    static int __cnt;

    __cnt++;
    if (__cnt >= 1000) {
        __cnt = 0;
    }
    if (elapsed_time < 100 || __cnt == 0) {
        printf("elapsed_time:%d, RD: %lld, WR: %lld,, RD PEAK: %d, WR PEAK: %d\n",
               elapsed_time, read_bytes, write_bytes, read_bw_peak, write_bw_peak);
    }

    When we don't run the application, the interval (elapsed_time) is a constant 1000 us.
    When we run the application, it varies non-uniformly between <10 us and 1000 us:

    [MCU2_1] 534.238634 s: elapsed_time:46, RD: 875520, WR: 388992,, RD PEAK: 19033, WR PEAK: 8456
    [MCU2_1] 534.484371 s: elapsed_time:1000, RD: 5190784, WR: 1914880,, RD PEAK: 5190, WR PEAK: 1914
    [MCU2_1] 534.628432 s: elapsed_time:59, RD: 357312, WR: 145024,, RD PEAK: 6056, WR PEAK: 2458
    [MCU2_1] 534.634435 s: elapsed_time:50, RD: 249216, WR: 136704,, RD PEAK: 4984, WR PEAK: 2734
    [MCU2_1] 534.799468 s: elapsed_time:66, RD: 309376, WR: 201792,, RD PEAK: 4687, WR PEAK: 3057
    [MCU2_1] 534.802422 s: elapsed_time:50, RD: 150848, WR: 101120,, RD PEAK: 3016, WR PEAK: 2022
    [MCU2_1] 535.026412 s: elapsed_time:85, RD: 254272, WR: 132672,, RD PEAK: 2991, WR PEAK: 1560
    DDR: READ BW: AVG = 3824 MB/s, PEAK = 19033 MB/s WRITE BW: AVG = 2257 MB/s, PEAK = 8456 MB/s TOTAL BW: AVG = 6081 MB/s, PEAK = 27489 MB/s
    [MCU2_1] 535.242990 s: elapsed_time:48, RD: 3734080, WR: 975232,, RD PEAK: 77793, WR PEAK: 20317
    [MCU2_1] 535.304418 s: elapsed_time:42, RD: 222528, WR: 59520,, RD PEAK: 5298, WR PEAK: 1417
    [MCU2_1] 535.311405 s: elapsed_time:35, RD: 173376, WR: 6144,, RD PEAK: 4953, WR PEAK: 175
    [MCU2_1] 535.456401 s: elapsed_time:1000, RD: 3360256, WR: 1922560,, RD PEAK: 3360, WR PEAK: 1922
    [MCU2_1] 535.651410 s: elapsed_time:86, RD: 165120, WR: 37184,, RD PEAK: 1920, WR PEAK: 432
    [MCU2_1] 536.122421 s: elapsed_time:49, RD: 204928, WR: 104256,, RD PEAK: 4182, WR PEAK: 2127
    [MCU2_1] 536.130408 s: elapsed_time:31, RD: 42752, WR: 1728,, RD PEAK: 1379, WR PEAK: 55
    DDR: READ BW: AVG = 4018 MB/s, PEAK = 77793 MB/s WRITE BW: AVG = 2363 MB/s, PEAK = 20317 MB/s TOTAL BW: AVG = 6381 MB/s, PEAK = 98110 MB/s
    [MCU2_1] 536.241273 s: elapsed_time:52, RD: 1858432, WR: 1427072,, RD PEAK: 35739, WR PEAK: 27443
    [MCU2_1] 536.432405 s: elapsed_time:1000, RD: 3121856, WR: 1816896,, RD PEAK: 3121, WR PEAK: 1816
    [MCU2_1] 536.956418 s: elapsed_time:97, RD: 388480, WR: 74240,, RD PEAK: 4004, WR PEAK: 765
    [MCU2_1] 537.057412 s: elapsed_time:76, RD: 169920, WR: 40128,, RD PEAK: 2235, WR PEAK: 528
    [MCU2_1] 537.061412 s: elapsed_time:60, RD: 127104, WR: 16960,, RD PEAK: 2118, WR PEAK: 282

    Here are the graphs, nodes, and cores we are using (the node names cannot be revealed):

    GRAPH: graph_136 (#nodes = 5,
    NODE: A72-1:
    NODE: DSP-1:
    NODE: VPAC_LDC1:
    NODE: VPAC_LDC1:
    NODE: DSP-2:
    GRAPH: graph_137 (#nodes = 12,
    NODE: DSP_C7-1:
    NODE: DSP_C7-1:
    NODE: A72-2:
    NODE: DSP_C7-1:
    NODE: DSP_C7-1:
    NODE: A72-2:
    NODE: DSP_C7-1:
    NODE: DSP_C7-1:
    NODE: A72-2:
    NODE: DSP_C7-1:
    NODE: DSP_C7-1:
    NODE: A72-2:

  • Hi,

    Thank you for sharing the logs.

    I see a pattern in the issue here.

    The peak increases every time you read from the A72, i.e., in the first MCU2_1 log after the log from the A72, as shown below:

    DDR: READ BW: AVG = 4018 MB/s, PEAK = 77793 MB/s WRITE BW: AVG = 2363 MB/s, PEAK = 20317 MB/s TOTAL BW: AVG = 6381 MB/s, PEAK = 98110 MB/s
    [MCU2_1] 536.241273 s: elapsed_time:52, RD: 1858432, WR: 1427072,, RD PEAK: 35739, WR PEAK: 27443

    Just to confirm this, could you share the same logs without resetting the DDR stats after you have read from the application? i.e., do not call appPerfStatsDdrStatsResetAll().

    Regards,

    Nikhil

  • While we can't share the logs right now, we can see that the abnormal peak values only occur immediately after calling appPerfStatsDdrStatsResetAll().
    However, as you know, without calling reset, we can't measure the average and peak values over a certain period of time.

  • Hi,

    It seems to be an issue with the GTC. Could you modify the API that reads the GTC timer as mentioned in the thread below?

    (54) PROCESSOR-SDK-J721S2: The GTC read is abnormal - Processors forum - Processors - TI E2E support forums

    Regards,

    Nikhil

  • Hi,

    Please let me know which API related to the GTC timer should be modified.

  • Hi,

    The logic must be implemented in the appLogGetGlobalTimeInUsec() API.

    This logic ensures that the upper and lower 32-bit words of the GTC are read in sync.

    Please try the same at your end.

    Regards,

    Nikhil

  • Hi,

    I modified the code as follows, but the result is the same.
    The peak bandwidth value is still too large.

    #define GET_GTC_VALUE_LO32 (*(volatile uint32_t*)(GTC_BASE_ADDR + 0x8U))
    #define GET_GTC_VALUE_HI32 (*(volatile uint32_t*)(GTC_BASE_ADDR + 0xCU))

    uint64_t appLogGetGlobalTimeInUsec()
    {
        uint64_t cur_ts = 0; /* Returning ts in usecs */

        if (((uintptr_t)NULL != GTC_BASE_ADDR) &&
            (0 != mhzFreq))
        {
    #if 1 // modified
            uint32_t vct_lo, vct_hi, tmp_hi;
            uint64_t gtc_value64;

            do {
                vct_hi = GET_GTC_VALUE_HI32;
                vct_lo = GET_GTC_VALUE_LO32;
                tmp_hi = GET_GTC_VALUE_HI32;
            } while (vct_hi != tmp_hi);

            gtc_value64 = ((uint64_t) vct_hi << 32) | vct_lo;
            ...
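
    (For reference, the do/while re-read above guards against the low 32-bit word rolling over between the two reads: if the high word changes while the pair is being read, both words are read again, so the combined 64-bit value is consistent.)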

  • Hi,

    I believe the avg value of the DDR B/W is within the specified range, right?

    The thread below discusses the same issue:

    (+) TDA4VL-Q1: peak DDR bandwidth calculation - Processors forum - Processors - TI E2E support forums

    Please refer to the suggestion in that thread.

    Regards,

    Nikhil

  • Hi Nikhil,
    I read the thread you pointed me to, but I still don't understand why the peak bandwidth value can be greater than the h/w specification.
    Should I consider that peak bandwidth value as invalid and ignore it?

  • Hi,

    Do you have a way for me to reproduce this issue at my end? 

    Do you see this only when you call the API continuously in a short interval, or do you see this even when called with a long interval in between?

    Meanwhile, I would suggest taking the avg. values into consideration.

    Regards,

    Nikhil