Hi,
We are using PSDK 08.06.00.11.
appPerfStatsDdrStatsPrintAll() ( vision_apps/utils/perf_stats/src/app_perf_stats_api.c ) reports a peak bandwidth that is too large, exceeding the TDA4VM EVM hardware specification ( 14.9 GB/s ).
DDR performance statistics,
===========================
DDR: READ BW: AVG = 7555 MB/s, PEAK = 29782 MB/s
DDR: WRITE BW: AVG = 8623 MB/s, PEAK = 33800 MB/s
DDR: TOTAL BW: AVG = 16178 MB/s, PEAK = 63582 MB/s
The appPerfStatsDdrStatsUpdate() function that calculates the bandwidth looks like this:
====
...
appPerfStatsDdrStatsReadCounters(&val0, &val1, &val2, &val3, false);
uint64_t write_bytes = val0 * APP_PERF_DDR_BURST_SIZE_BYTES;
uint64_t read_bytes = val1 * APP_PERF_DDR_BURST_SIZE_BYTES;
..
uint32_t read_bw_peak = read_bytes/elapsed_time; /* in MB/s */
uint32_t write_bw_peak = write_bytes/elapsed_time; /* in MB/s */
====
When we print the above variables at the moments the peak bandwidth is too large,
elapsed_time is small while the DDR performance counter value is very large, so the calculated bandwidth comes out too high.
Example:
elapsed_time : 49 us , read_bytes : 3302400
read_bw_peak = 67395 MB/s
In this case, should the DDR performance counter and peak values be considered normal?
According to the TDA4VM EVM hardware specification (14.9 GB/s, i.e. ~14,900 bytes/us), at most about 730 KB could be transferred in 49 us, right?
Any ideas why the reported peak would exceed the theoretical limit of the device?
Is there a way to accurately measure the peak DDR bandwidth?
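For reference, a quick worked check of the 49 us case (a hypothetical standalone snippet, using the same convention as the SDK code where 1 MB/s = 1 byte/us):
====
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* ~14.9 GB/s spec limit expressed as bytes per microsecond. */
    const uint64_t spec_bytes_per_us = 14900U;
    const uint64_t window_us         = 49U;

    uint64_t max_bytes = spec_bytes_per_us * window_us;
    printf("max bytes in %llu us: %llu (~%llu KB)\n",
           (unsigned long long)window_us,
           (unsigned long long)max_bytes,
           (unsigned long long)(max_bytes / 1000U));
    /* Prints ~730100 bytes (~730 KB), far below the 3302400 bytes
       the counters reported for the same window. */
    return 0;
}
====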
Hi,
May I know under what scenarios you are seeing this issue?
Is it while running any SDK out-of-the-box demo?
Regards,
Nikhil
We are running our ADAS applications on the ARM (A72), along with TIDL, which also uses the DSP/MMA.
Hi,
Could you change every variable involved here to uint64_t?
For example, elapsed_time, read_bw_peak, write_bw_peak, etc.
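A minimal sketch of the change (a sketch only, using the variable names from app_perf_stats_api.c and assuming elapsed_time is in microseconds):
====
/* Cast before multiplying so the products cannot wrap in 32-bit
   arithmetic, and keep 64-bit types all the way to the division. */
uint64_t write_bytes = (uint64_t)val0 * APP_PERF_DDR_BURST_SIZE_BYTES;
uint64_t read_bytes  = (uint64_t)val1 * APP_PERF_DDR_BURST_SIZE_BYTES;

uint64_t read_bw_peak  = read_bytes  / elapsed_time; /* in MB/s */
uint64_t write_bw_peak = write_bytes / elapsed_time; /* in MB/s */
====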
Regards,
Nikhil
I have tried making all the variables involved uint64_t, but the results are similar
(elapsed_time is small and the read/write byte counts are too large).
...
Result:
[MCU2_1] 78.562996 s: elapsed_time: 46, RD: 1470464, WR: 992896,, RD PEAK: 31966, WR PEAK: 21584
Hi,
Could you please do a reset using the API appPerfStatsDdrStatsResetAll() after every read?
Regards,
Nikhil
Hi,
[MCU2_1] 78.562996 s: elapsed_time: 46, RD: 1470464, WR: 992896,, RD PEAK: 31966, WR PEAK: 21584
May I know at what interval this is being called?
Could you send the full logs (i.e. the logs above and below this line)?
May I also know about the application you are running (i.e. the nodes and cores used in the graphs)?
Regards,
Nikhil
We call the following perf_periodic() function from the application (on the A72) at 1-second intervals:
====
void perf_periodic(void)
{
    app_perf_stats_ddr_stats_t ddr_stats = {0,};

    appPerfStatsDdrStatsGet(&ddr_stats);
    appPerfStatsDdrStatsResetAll();

    app_perf_stats_ddr_stats_t* ddr_load = &ddr_stats;

    printf("DDR: READ BW: AVG = %6d MB/s, PEAK = %6d MB/s",
           ddr_load->read_bw_avg, ddr_load->read_bw_peak);
    printf(" WRITE BW: AVG = %6d MB/s, PEAK = %6d MB/s",
           ddr_load->write_bw_avg, ddr_load->write_bw_peak);
    printf(" TOTAL BW: AVG = %6d MB/s, PEAK = %6d MB/s\n",
           ddr_load->read_bw_avg + ddr_load->write_bw_avg,
           ddr_load->write_bw_peak + ddr_load->read_bw_peak);
}
====
In appPerfStatsDdrStatsUpdate(), printf is called only when elapsed_time is less than 100 us, or on every 1000th call:
====
static int __cnt;

__cnt++;
if (__cnt >= 1000)
{
    __cnt = 0;
}
if (elapsed_time < 100 || __cnt == 0)
{
    printf("elapsed_time:%d, RD: %lld, WR: %lld,, RD PEAK: %d, WR PEAK: %d\n",
           elapsed_time, read_bytes, write_bytes, read_bw_peak, write_bw_peak);
}
====
When we don't run the application, the interval (elapsed_time) is a constant 1000 us.
When we run the application, elapsed_time varies non-uniformly, anywhere from under 10 us up to 1000 us:
[MCU2_1] 534.238634 s: elapsed_time:46, RD: 875520, WR: 388992,, RD PEAK: 19033, WR PEAK: 8456
[MCU2_1] 534.484371 s: elapsed_time:1000, RD: 5190784, WR: 1914880,, RD PEAK: 5190, WR PEAK: 1914
[MCU2_1] 534.628432 s: elapsed_time:59, RD: 357312, WR: 145024,, RD PEAK: 6056, WR PEAK: 2458
[MCU2_1] 534.634435 s: elapsed_time:50, RD: 249216, WR: 136704,, RD PEAK: 4984, WR PEAK: 2734
[MCU2_1] 534.799468 s: elapsed_time:66, RD: 309376, WR: 201792,, RD PEAK: 4687, WR PEAK: 3057
[MCU2_1] 534.802422 s: elapsed_time:50, RD: 150848, WR: 101120,, RD PEAK: 3016, WR PEAK: 2022
[MCU2_1] 535.026412 s: elapsed_time:85, RD: 254272, WR: 132672,, RD PEAK: 2991, WR PEAK: 1560
DDR: READ BW: AVG = 3824 MB/s, PEAK = 19033 MB/s WRITE BW: AVG = 2257 MB/s, PEAK = 8456 MB/s TOTAL BW: AVG = 6081 MB/s, PEAK = 27489 MB/s
[MCU2_1] 535.242990 s: elapsed_time:48, RD: 3734080, WR: 975232,, RD PEAK: 77793, WR PEAK: 20317
[MCU2_1] 535.304418 s: elapsed_time:42, RD: 222528, WR: 59520,, RD PEAK: 5298, WR PEAK: 1417
[MCU2_1] 535.311405 s: elapsed_time:35, RD: 173376, WR: 6144,, RD PEAK: 4953, WR PEAK: 175
[MCU2_1] 535.456401 s: elapsed_time:1000, RD: 3360256, WR: 1922560,, RD PEAK: 3360, WR PEAK: 1922
[MCU2_1] 535.651410 s: elapsed_time:86, RD: 165120, WR: 37184,, RD PEAK: 1920, WR PEAK: 432
[MCU2_1] 536.122421 s: elapsed_time:49, RD: 204928, WR: 104256,, RD PEAK: 4182, WR PEAK: 2127
[MCU2_1] 536.130408 s: elapsed_time:31, RD: 42752, WR: 1728,, RD PEAK: 1379, WR PEAK: 55
DDR: READ BW: AVG = 4018 MB/s, PEAK = 77793 MB/s WRITE BW: AVG = 2363 MB/s, PEAK = 20317 MB/s TOTAL BW: AVG = 6381 MB/s, PEAK = 98110 MB/s
[MCU2_1] 536.241273 s: elapsed_time:52, RD: 1858432, WR: 1427072,, RD PEAK: 35739, WR PEAK: 27443
[MCU2_1] 536.432405 s: elapsed_time:1000, RD: 3121856, WR: 1816896,, RD PEAK: 3121, WR PEAK: 1816
[MCU2_1] 536.956418 s: elapsed_time:97, RD: 388480, WR: 74240,, RD PEAK: 4004, WR PEAK: 765
[MCU2_1] 537.057412 s: elapsed_time:76, RD: 169920, WR: 40128,, RD PEAK: 2235, WR PEAK: 528
[MCU2_1] 537.061412 s: elapsed_time:60, RD: 127104, WR: 16960,, RD PEAK: 2118, WR PEAK: 282
DDR: READ BW: AVG = 3916 MB/s, PEAK = 35739 MB/s WRITE BW: AVG = 2348 MB/s, PEAK = 27443 MB/s TOTAL BW: AVG = 6264 MB/s, PEAK = 63182 MB/s
[MCU2_1] 537.242993 s: elapsed_time:54, RD: 1509120, WR: 266240,, RD PEAK: 27946, WR PEAK: 4930
[MCU2_1] 537.290407 s: elapsed_time:59, RD: 148864, WR: 67968,, RD PEAK: 2523, WR PEAK: 1152
[MCU2_1] 537.298431 s: elapsed_time:76, RD: 596480, WR: 57472,, RD PEAK: 7848, WR PEAK: 756
[MCU2_1] 537.408395 s: elapsed_time:1000, RD: 3045376, WR: 973696,, RD PEAK: 3045, WR PEAK: 973
[MCU2_1] 537.492427 s: elapsed_time:84, RD: 458496, WR: 354304,, RD PEAK: 5458, WR PEAK: 4217
[MCU2_1] 537.494459 s: elapsed_time:92, RD: 388608, WR: 252480,, RD PEAK: 4224, WR PEAK: 2744
[MCU2_1] 537.934427 s: elapsed_time:76, RD: 434368, WR: 290816,, RD PEAK: 5715, WR PEAK: 3826
DDR: READ BW: AVG = 3893 MB/s, PEAK = 27946 MB/s WRITE BW: AVG = 2305 MB/s, PEAK = 6504 MB/s TOTAL BW: AVG = 6198 MB/s, PEAK = 34450 MB/s
[MCU2_1] 538.243349 s: elapsed_time:48, RD: 1440192, WR: 956224,, RD PEAK: 30004, WR PEAK: 19921
[MCU2_1] 538.311414 s: elapsed_time:56, RD: 181568, WR: 145728,, RD PEAK: 3242, WR PEAK: 2602
[MCU2_1] 538.379432 s: elapsed_time:1018, RD: 5685184, WR: 1938496,, RD PEAK: 5584, WR PEAK: 1904
[MCU2_1] 538.517391 s: elapsed_time:77, RD: 284032, WR: 104064,, RD PEAK: 3688, WR PEAK: 1351
DDR: READ BW: AVG = 3812 MB/s, PEAK = 30004 MB/s WRITE BW: AVG = 2228 MB/s, PEAK = 19921 MB/s TOTAL BW: AVG = 6040 MB/s, PEAK = 49925 MB/s
[MCU2_1] 539.243741 s: elapsed_time:43, RD: 726400, WR: 434176,, RD PEAK: 16893, WR PEAK: 10097
[MCU2_1] 539.355377 s: elapsed_time:1000, RD: 7586688, WR: 743296,, RD PEAK: 7586, WR PEAK: 743
[MCU2_1] 539.396412 s: elapsed_time:25, RD: 39488, WR: 2368,, RD PEAK: 1579, WR PEAK: 94
[MCU2_1] 539.608410 s: elapsed_time:66, RD: 124672, WR: 9152,, RD PEAK: 1888, WR PEAK: 138
[MCU2_1] 540.040418 s: elapsed_time:64, RD: 265408, WR: 68544,, RD PEAK: 4147, WR PEAK: 1071
DDR: READ BW: AVG = 3833 MB/s, PEAK = 16893 MB/s WRITE BW: AVG = 2240 MB/s, PEAK = 10097 MB/s TOTAL BW: AVG = 6073 MB/s, PEAK = 26990 MB/s
[MCU2_1] 540.248115 s: elapsed_time:49, RD: 3302400, WR: 1917568,, RD PEAK: 67395, WR PEAK: 39134
[MCU2_1] 540.281411 s: elapsed_time:43, RD: 185472, WR: 76608,, RD PEAK: 4313, WR PEAK: 1781
[MCU2_1] 540.324406 s: elapsed_time:1000, RD: 4516800, WR: 682752,, RD PEAK: 4516, WR PEAK: 682
[MCU2_1] 540.634413 s: elapsed_time:92, RD: 214592, WR: 62016,, RD PEAK: 2332, WR PEAK: 674
[MCU2_1] 540.638408 s: elapsed_time:67, RD: 102528, WR: 8832,, RD PEAK: 1530, WR PEAK: 131
[MCU2_1] 541.103426 s: elapsed_time:45, RD: 293120, WR: 49152,, RD PEAK: 6513, WR PEAK: 1092
DDR: READ BW: AVG = 4028 MB/s, PEAK = 67395 MB/s WRITE BW: AVG = 2394 MB/s, PEAK = 39134 MB/s TOTAL BW: AVG = 6422 MB/s, PEAK = 106529 MB/s
[MCU2_1] 541.244210 s: elapsed_time:68, RD: 639936, WR: 619264,, RD PEAK: 9410, WR PEAK: 9106
[MCU2_1] 541.297395 s: elapsed_time:1000, RD: 2790656, WR: 125568,, RD PEAK: 2790, WR PEAK: 125
[MCU2_1] 541.454406 s: elapsed_time:67, RD: 205952, WR: 87488,, RD PEAK: 3073, WR PEAK: 1305
[MCU2_1] 541.459410 s: elapsed_time:87, RD: 183040, WR: 62528,, RD PEAK: 2103, WR PEAK: 718
[MCU2_1] 541.618428 s: elapsed_time:59, RD: 418176, WR: 111296,, RD PEAK: 7087, WR PEAK: 1886
[MCU2_1] 541.880415 s: elapsed_time:77, RD: 403968, WR: 42048,, RD PEAK: 5246, WR PEAK: 546
[MCU2_1] 542.081418 s: elapsed_time:82, RD: 248192, WR: 379712,, RD PEAK: 3026, WR PEAK: 4630
[MCU2_1] 542.087464 s: elapsed_time:38, RD: 145024, WR: 83584,, RD PEAK: 3816, WR PEAK: 2199
[MCU2_1] 542.090407 s: elapsed_time:52, RD: 72128, WR: 2816,, RD PEAK: 1387, WR PEAK: 54
Here are the graphs, nodes, and cores we are using (the node names cannot be revealed):
GRAPH: graph_136 (#nodes = 5): NODE: A72-1, NODE: DSP-1, NODE: VPAC_LDC1, NODE: VPAC_LDC1, NODE: DSP-2
GRAPH: graph_137 (#nodes = 12): NODE: DSP_C7-1, NODE: DSP_C7-1, NODE: A72-2, NODE: DSP_C7-1, NODE: DSP_C7-1, NODE: A72-2, NODE: DSP_C7-1, NODE: DSP_C7-1, NODE: A72-2, NODE: DSP_C7-1, NODE: DSP_C7-1, NODE: A72-2
GRAPH: graph_138 (#nodes = 1): NODE: A72-3
GRAPH: graph_139 (#nodes = 12): NODE: DSP_C7-1, NODE: DSP_C7-1, NODE: DSP_C7-1, NODE: DSP_C7-1, NODE: DSP_C7-1, NODE: DSP_C7-1, NODE: DSP_C7-1, NODE: DSP_C7-1, NODE: DSP_C7-1, NODE: DSP_C7-1, NODE: A72-4, NODE: A72-5
GRAPH: graph_140 (#nodes = 1): NODE: DSP-1
GRAPH: graph_141 (#nodes = 1): NODE: A72-6
GRAPH: graph_142 (#nodes = 1): NODE: A72-8
GRAPH: graph_143 (#nodes = 16): NODE: VPAC_LDC1, NODE: DSP-1, NODE: DSP-1, NODE: VPAC_MSC1, NODE: DMPAC_DOF, NODE: DMPAC_DOF, NODE: VPAC_MSC1, NODE: DMPAC_DOF, NODE: DMPAC_DOF, NODE: VPAC_MSC1, NODE: DMPAC_DOF, NODE: DMPAC_DOF, NODE: VPAC_MSC1, NODE: DMPAC_DOF, NODE: DMPAC_DOF, NODE: A72-7
GRAPH: graph_144 (#nodes = 1): NODE: A72-12
GRAPH: graph_145 (#nodes = 2): NODE: A72-9, NODE: DISPLAY1
GRAPH: graph_146 (#nodes = 1): NODE: A72-1
GRAPH: graph_147 (#nodes = 1): NODE: A72-15
Hi,
Thank you for sharing the logs.
I see a pattern in the issue here.
The peak increases every time you read from the A72, i.e. in the first MCU2_1 log after the log from the A72, as shown below:
DDR: READ BW: AVG = 4018 MB/s, PEAK = 77793 MB/s WRITE BW: AVG = 2363 MB/s, PEAK = 20317 MB/s TOTAL BW: AVG = 6381 MB/s, PEAK = 98110 MB/s
[MCU2_1] 536.241273 s: elapsed_time:52, RD: 1858432, WR: 1427072,, RD PEAK: 35739, WR PEAK: 27443
Just to confirm this, could you share the same logs without resetting the DDR stats after reading from the application? That is, do not call appPerfStatsDdrStatsResetAll().
Regards,
Nikhil
While we can't share the logs right now, we can see that the abnormal peak values only occur immediately after calling appPerfStatsDdrStatsResetAll().
However, as you know, without calling reset, we can't measure the average and peak values over a certain period of time.
Hi,
It seems to be an issue with the GTC. Could you modify the API that reads the GTC timer as mentioned below?
Regards,
Nikhil
Hi,
The logic must be implemented in the appLogGetGlobalTimeInUsec() API.
This logic ensures that the upper and lower 32-bit words of the GTC are read in sync.
Please try the same at your end.
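Something along these lines (a sketch; it assumes the GTC count low and high 32-bit words are read individually, e.g. at GTC_BASE_ADDR + 0x8 and + 0xC):
====
/* Read HI, then LO, then HI again; retry if the high word changed,
   so a low-word rollover between the two 32-bit reads is never used. */
uint32_t vct_lo, vct_hi, tmp_hi;

do
{
    vct_hi = GET_GTC_VALUE_HI32;
    vct_lo = GET_GTC_VALUE_LO32;
    tmp_hi = GET_GTC_VALUE_HI32;
} while (vct_hi != tmp_hi);

uint64_t gtc_value64 = ((uint64_t)vct_hi << 32) | vct_lo;
====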
Regards,
Nikhil
Hi,
I modified the code as follows, but the result is the same.
The peak bandwidth value is still too large.
====
#define GET_GTC_VALUE_LO32 (*(volatile uint32_t*)(GTC_BASE_ADDR + 0x8U))
#define GET_GTC_VALUE_HI32 (*(volatile uint32_t*)(GTC_BASE_ADDR + 0xCU))

uint64_t appLogGetGlobalTimeInUsec()
{
    uint64_t cur_ts = 0; /* Returning ts in usecs */

    if (((uintptr_t)NULL != GTC_BASE_ADDR) && (0 != mhzFreq))
    {
#if 1 /* modified */
        uint32_t vct_lo, vct_hi, tmp_hi;
        uint64_t gtc_value64;

        do
        {
            vct_hi = GET_GTC_VALUE_HI32;
            vct_lo = GET_GTC_VALUE_LO32;
            tmp_hi = GET_GTC_VALUE_HI32;
        } while (vct_hi != tmp_hi);

        gtc_value64 = ((uint64_t)vct_hi << 32) | vct_lo;
        cur_ts = gtc_value64 / mhzFreq;
#else
        cur_ts = GET_GTC_VALUE64 / mhzFreq;
#endif
    }

    return cur_ts;
}
====
Hi,
I believe the avg value of the DDR B/W is within the specified range, right?
The thread below discusses the same issue.
Please refer to the suggestion in that thread.
Regards,
Nikhil
Hi Nikhil,
I read the thread you pointed me to, but I still don't understand why the peak bandwidth value can be greater than the h/w specification.
Should I consider that peak bandwidth value as invalid and ignore it?
Hi,
Do you have a way for me to reproduce this issue at my end?
Do you see this only when you call the API repeatedly at short intervals, or even when it is called with a long interval in between?
Meanwhile, I would suggest taking the avg. values into consideration.
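For instance, a hypothetical helper (not an SDK API) that treats peak samples above the theoretical limit of the device as invalid:
====
/* Hypothetical filter, not part of the SDK: peaks above the
   theoretical DDR limit (~14.9 GB/s ~= 14900 MB/s on the TDA4VM EVM)
   cannot be physically correct, so discard them. */
#define DDR_THEORETICAL_MAX_MBPS (14900U)

static inline uint32_t ddr_peak_sanitized(uint32_t peak_mbps)
{
    return (peak_mbps <= DDR_THEORETICAL_MAX_MBPS) ? peak_mbps : 0U;
}
====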
Regards,
Nikhil