
TDA4VM: Peak DDR Bandwidth Calculation and DDR Performance Counter.

Part Number: TDA4VM

Hi,

We are using PSDK 08.06.00.11.

appPerfStatsDdrStatsPrintAll() ( vision_apps/utils/perf_stats/src/app_perf_stats_api.c ) reports a peak bandwidth that is too large, exceeding the TDA4VM EVM hardware specification (14.9 GB/s).

DDR performance statistics,
===========================
DDR: READ BW: AVG = 7555 MB/s, PEAK = 29782 MB/s
DDR: WRITE BW: AVG = 8623 MB/s, PEAK = 33800 MB/s
DDR: TOTAL BW: AVG = 16178 MB/s, PEAK = 63582 MB/s

The appPerfStatsDddrStatsUpdate() function that calculates the bandwidth looks like this:

====

...

appPerfStatsDdrStatsReadCounters(&val0, &val1, &val2, &val3, false);

uint64_t write_bytes = val0 * APP_PERF_DDR_BURST_SIZE_BYTES;
uint64_t read_bytes = val1 * APP_PERF_DDR_BURST_SIZE_BYTES;
..

uint32_t read_bw_peak = read_bytes/elapsed_time; /* in MB/s */
uint32_t write_bw_peak = write_bytes/elapsed_time; /* in MB/s */

====

When we print the above variables at the moments the peak bandwidth is reported too large,
elapsed_time is small while the DDR performance counter value is very large, so the calculated bandwidth comes out too high.

Example)
elapsed_time : 49 us , read_bytes : 3302400
read_bw_peak = 67395 MB/s
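
For reference, here is a minimal standalone sketch (not SDK code) that just reproduces the arithmetic from the example above; it only shows that the unit conversion is internally consistent, since a byte count divided by a duration in microseconds is numerically MB/s:

====

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Values taken from the example above */
    uint64_t read_bytes   = 3302400ULL; /* bytes counted in the window */
    uint64_t elapsed_time = 49ULL;      /* window length in microseconds */

    /* bytes per microsecond is numerically MB/s (10^6 bytes per 10^6 us) */
    uint64_t read_bw_peak = read_bytes / elapsed_time;

    printf("read_bw_peak = %llu MB/s\n", (unsigned long long)read_bw_peak); /* prints 67395 */

    return 0;
}

====

So the conversion itself is consistent; the question is about the inputs (the counter value and elapsed_time), not the arithmetic.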

In this case, should the DDR performance counter and peak values be considered normal?

According to the TDA4VM EVM hardware specification (14.9 GB/s), the peak should be at most about 730 KB in 49 us, right?
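(Worked out, assuming 14.9 GB/s means 14.9 x 10^9 bytes/s: 14.9 x 10^9 B/s x 49 x 10^-6 s ≈ 730,100 bytes ≈ 730 KB, whereas the example above reports 3,302,400 bytes in the same 49 us window, roughly 4.5x the theoretical maximum.)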

Any ideas why the reported peak would exceed the theoretical limit of the device?

Is there a way to accurately measure the peak DDR bandwidth?

  • Hi,

    May I know under what scenario you are seeing this issue?

    Is it while running any SDK out-of-the-box demo?

    Regards,

    Nikhil

  • We are running our ADAS applications on the ARM cores, along with TIDL, which also uses the DSP/MMA.

  • Hi,

    Let me check this internally and get back to you

    Regards,

    Nikhil

  • Hi,

    Could you make every variable involved here uint64_t?

    For example: elapsed_time, read_bw_peak, write_bw_peak, etc. (a sketch follows below)

    Regards,

    Nikhil
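
    For illustration only, a minimal sketch of that change (assuming val0, val1 and elapsed_time are the same values as in the appPerfStatsDddrStatsUpdate() snippet from the original post; note the cast on the counter value: if val0/val1 are 32-bit, the multiplication would otherwise still be performed in 32-bit arithmetic even when the result is stored in a uint64_t):

    uint64_t elapsed_time;   /* window length in microseconds */
    ...
    uint64_t write_bytes = (uint64_t)val0 * APP_PERF_DDR_BURST_SIZE_BYTES;
    uint64_t read_bytes  = (uint64_t)val1 * APP_PERF_DDR_BURST_SIZE_BYTES;
    ...
    uint64_t read_bw_peak  = read_bytes  / elapsed_time;  /* in MB/s */
    uint64_t write_bw_peak = write_bytes / elapsed_time;  /* in MB/s */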

  • I have tried making all the variables involved uint64_t, but the results are similar
    ( elapsed_time is small and the byte values are still too large ).

    uint64_t elapsed_time;
    ...
    uint64_t read_bw_peak = read_bytes/elapsed_time; /* in MB/s */
    uint64_t write_bw_peak = write_bytes/elapsed_time; /* in MB/s */

    ...

    printf("elapsed_time:%4lld, RD: %8lld, WR: %8lld,, RD PEAK: %6lld, WR PEAK: %6lld\n",elapsed_time,read_bytes,write_bytes,read_bw_peak,write_bw_peak);



    Result:

    [MCU2_1]     78.562996 s: elapsed_time:  46, RD:  1470464, WR:   992896,, RD PEAK:  31966, WR PEAK:  21584

  • Hi,

    Could you please do a reset using the API appPerfStatsDdrStatsResetAll() after every read?

    Regards,

    Nikhil


  • The above results are from calling appPerfStatsDdrStatsPrintAll(), and appPerfStatsDdrStatsResetAll() is already called at the end of that call path, as shown below.

    appPerfStatsDdrStatsPrintAll() ( vision_apps/utils/perf_stats/src/app_perf_stats_api.c ) 

    int32_t
    appPerfStatsPrintAll()
    {
        appPerfStatsCpuLoadPrintAll();
        appPerfStatsHwaLoadPrintAll();
        appPerfStatsDdrStatsPrintAll();
        appPerfStatsCpuStatsPrintAll();
        appPerfStatsResetAll();
        return 0;
    }

    void appPerfStatsResetAll()
    {
        appPerfStatsCpuLoadResetAll();
        appPerfStatsHwaLoadResetAll();
        appPerfStatsDdrStatsResetAll();
    }
  • Hi,

    [MCU2_1]     78.562996 s: elapsed_time:  46, RD:  1470464, WR:   992896,, RD PEAK:  31966, WR PEAK:  21584

    May I know at what interval this is being called?

    Could you send the full logs (i.e., the good logs above and below this one)?

    May I also know about the application you are running (i.e., the nodes and cores being used in the graph)?

    Regards,

    Nikhil

  • We call the following perf_periodic() function in our application (on the A72) at 1-second intervals.

    void perf_periodic(void)
    {
        app_perf_stats_ddr_stats_t ddr_stats = {0,};

        appPerfStatsDdrStatsGet(&ddr_stats);
        appPerfStatsDdrStatsResetAll();

        app_perf_stats_ddr_stats_t* ddr_load = &ddr_stats;

        printf("DDR: READ BW: AVG = %6d MB/s, PEAK = %6d MB/s",
               ddr_load->read_bw_avg,
               ddr_load->read_bw_peak);
        printf(" WRITE BW: AVG = %6d MB/s, PEAK = %6d MB/s",
               ddr_load->write_bw_avg,
               ddr_load->write_bw_peak);
        printf(" TOTAL BW: AVG = %6d MB/s, PEAK = %6d MB/s\n",
               ddr_load->read_bw_avg + ddr_load->write_bw_avg,
               ddr_load->write_bw_peak + ddr_load->read_bw_peak);
    }

    In appPerfStatsDddrStatsUpdate(), we added a printf that only logs when elapsed_time is 100 us or less, or on every 1000th call:

    static int __cnt;

    __cnt++;
    if (__cnt >= 1000) {
        __cnt = 0;
    }
    if (elapsed_time < 100 || __cnt == 0) {
        printf("elapsed_time:%d, RD: %lld, WR: %lld,, RD PEAK: %d, WR PEAK: %d\n",
               elapsed_time, read_bytes, write_bytes, read_bw_peak, write_bw_peak);
    }

    When we don't run the application, the interval (elapsed_time) is a constant 1000 us.
    When we run the application, it varies non-uniformly between <10 us and 1000 us:

    [MCU2_1] 534.238634 s: elapsed_time:46, RD: 875520, WR: 388992,, RD PEAK: 19033, WR PEAK: 8456
    [MCU2_1] 534.484371 s: elapsed_time:1000, RD: 5190784, WR: 1914880,, RD PEAK: 5190, WR PEAK: 1914
    [MCU2_1] 534.628432 s: elapsed_time:59, RD: 357312, WR: 145024,, RD PEAK: 6056, WR PEAK: 2458
    [MCU2_1] 534.634435 s: elapsed_time:50, RD: 249216, WR: 136704,, RD PEAK: 4984, WR PEAK: 2734
    [MCU2_1] 534.799468 s: elapsed_time:66, RD: 309376, WR: 201792,, RD PEAK: 4687, WR PEAK: 3057
    [MCU2_1] 534.802422 s: elapsed_time:50, RD: 150848, WR: 101120,, RD PEAK: 3016, WR PEAK: 2022
    [MCU2_1] 535.026412 s: elapsed_time:85, RD: 254272, WR: 132672,, RD PEAK: 2991, WR PEAK: 1560
    DDR: READ BW: AVG = 3824 MB/s, PEAK = 19033 MB/s WRITE BW: AVG = 2257 MB/s, PEAK = 8456 MB/s TOTAL BW: AVG = 6081 MB/s, PEAK = 27489 MB/s
    [MCU2_1] 535.242990 s: elapsed_time:48, RD: 3734080, WR: 975232,, RD PEAK: 77793, WR PEAK: 20317
    [MCU2_1] 535.304418 s: elapsed_time:42, RD: 222528, WR: 59520,, RD PEAK: 5298, WR PEAK: 1417
    [MCU2_1] 535.311405 s: elapsed_time:35, RD: 173376, WR: 6144,, RD PEAK: 4953, WR PEAK: 175
    [MCU2_1] 535.456401 s: elapsed_time:1000, RD: 3360256, WR: 1922560,, RD PEAK: 3360, WR PEAK: 1922
    [MCU2_1] 535.651410 s: elapsed_time:86, RD: 165120, WR: 37184,, RD PEAK: 1920, WR PEAK: 432
    [MCU2_1] 536.122421 s: elapsed_time:49, RD: 204928, WR: 104256,, RD PEAK: 4182, WR PEAK: 2127
    [MCU2_1] 536.130408 s: elapsed_time:31, RD: 42752, WR: 1728,, RD PEAK: 1379, WR PEAK: 55
    DDR: READ BW: AVG = 4018 MB/s, PEAK = 77793 MB/s WRITE BW: AVG = 2363 MB/s, PEAK = 20317 MB/s TOTAL BW: AVG = 6381 MB/s, PEAK = 98110 MB/s
    [MCU2_1] 536.241273 s: elapsed_time:52, RD: 1858432, WR: 1427072,, RD PEAK: 35739, WR PEAK: 27443
    [MCU2_1] 536.432405 s: elapsed_time:1000, RD: 3121856, WR: 1816896,, RD PEAK: 3121, WR PEAK: 1816
    [MCU2_1] 536.956418 s: elapsed_time:97, RD: 388480, WR: 74240,, RD PEAK: 4004, WR PEAK: 765
    [MCU2_1] 537.057412 s: elapsed_time:76, RD: 169920, WR: 40128,, RD PEAK: 2235, WR PEAK: 528
    [MCU2_1] 537.061412 s: elapsed_time:60, RD: 127104, WR: 16960,, RD PEAK: 2118, WR PEAK: 282

    Here are the graphs, nodes, and cores we are using (the node names cannot be revealed):

    GRAPH: graph_136 (#nodes = 5,
    NODE: A72-1:
    NODE: DSP-1:
    NODE: VPAC_LDC1:
    NODE: VPAC_LDC1:
    NODE: DSP-2:
    GRAPH: graph_137 (#nodes = 12,
    NODE: DSP_C7-1:
    NODE: DSP_C7-1:
    NODE: A72-2:
    NODE: DSP_C7-1:
    NODE: DSP_C7-1:
    NODE: A72-2:
    NODE: DSP_C7-1:
    NODE: DSP_C7-1:
    NODE: A72-2:
    NODE: DSP_C7-1:
    NODE: DSP_C7-1:
    NODE: A72-2:

  • Hi,

    Thank you for sharing the logs.

    I see a pattern in the issue here.

    The peak increases every time you read from the A72, i.e., in the first MCU2_1 log after the log from the A72, as shown below:

    DDR: READ BW: AVG = 4018 MB/s, PEAK = 77793 MB/s WRITE BW: AVG = 2363 MB/s, PEAK = 20317 MB/s TOTAL BW: AVG = 6381 MB/s, PEAK = 98110 MB/s
    [MCU2_1] 536.241273 s: elapsed_time:52, RD: 1858432, WR: 1427072,, RD PEAK: 35739, WR PEAK: 27443

    Just to confirm this, could you share the same logs without resetting the DDR stats after you have read from the application? i.e., do not call appPerfStatsDdrStatsResetAll().

    Regards,

    Nikhil

  • While we can't share the logs right now, we can see that the abnormal peak values only occur immediately after calling appPerfStatsDdrStatsResetAll().
    However, as you know, without calling reset, we can't measure the average and peak values over a certain period of time.

  • Hi,

    It seems to be an issue with the GTC. Could you modify the API that reads the GTC timer as mentioned in the thread below?

    (54) PROCESSOR-SDK-J721S2: The GTC read is abnormal - Processors forum - Processors - TI E2E support forums

    Regards,

    Nikhil

  • Hi,

    Please let me know which API related to the GTC timer should be modified.

  • Hi,

    The logic must be implemented in the appLogGetGlobalTimeInUsec() API.

    This logic ensures that the upper and lower 32-bit words of the GTC are read in sync.

    Please try the same at your end.

    Regards,

    Nikhil

  • Hi,

    I modified the code as follows, but the result is the same.
    The peak bandwidth value is still too large.

    #define GET_GTC_VALUE_LO32 (*(volatile uint32_t*)(GTC_BASE_ADDR + 0x8U))
    #define GET_GTC_VALUE_HI32 (*(volatile uint32_t*)(GTC_BASE_ADDR + 0xCU))

    uint64_t appLogGetGlobalTimeInUsec()
    {
        uint64_t cur_ts = 0; /* Returning ts in usecs */

        if (((uintptr_t)NULL != GTC_BASE_ADDR) &&
            (0 != mhzFreq))
        {
    #if 1 // modified
            uint32_t vct_lo, vct_hi, tmp_hi;
            uint64_t gtc_value64;

            do {
                vct_hi = GET_GTC_VALUE_HI32;
                vct_lo = GET_GTC_VALUE_LO32;
                tmp_hi = GET_GTC_VALUE_HI32;
            } while (vct_hi != tmp_hi);

            gtc_value64 = ((uint64_t) vct_hi << 32) | vct_lo;
            ...
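
    (For reference, the do/while re-read above guards against the low 32-bit word rolling over between the two reads: if the high word changes while the pair is being read, both words are read again, so the combined 64-bit value is consistent.)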

  • Hi,

    I believe the avg value of the DDR B/W is within the specified range, right?

    The thread below discusses the same issue:

    (+) TDA4VL-Q1: peak DDR bandwidth calculation - Processors forum - Processors - TI E2E support forums

    Please refer to the suggestion in that thread.

    Regards,

    Nikhil

  • Hi Nikhil,
    I read the thread you pointed me to, but I still don't understand why the peak bandwidth value can be greater than the h/w specification.
    Should I consider that peak bandwidth value as invalid and ignore it?

  • Hi,

    Do you have a way for me to reproduce this issue at my end? 

    Do you see this only when you call the API continuously in a short interval, or do you see this even when called with a long interval in between?

    Meanwhile, I would suggest taking the avg. values into consideration.

    Regards,

    Nikhil