How to measure memory bandwidth usage on an EVMK2H?

Other Parts Discussed in Thread: 66AK2H14

I would like to measure the operational intensity (= #operations / #DRAM bytes transferred) of a piece of code running on the DSP cores of an EVMK2H evaluation board. For this, I need to know how many bytes are read from and written to memory, e.g., like this:

start_DRAM_counting();

do_work();

stop_DRAM_counting();

long DRAM_bytes_transferred = DRAM_bytes_read() + DRAM_bytes_written();

or

long DRAM_bytes_transferred = - DRAM_bytes_read() - DRAM_bytes_written();

do_work();

DRAM_bytes_transferred += DRAM_bytes_read() + DRAM_bytes_written();

How can I measure the memory bandwidth usage? Is there example code that shows how it works?

Thanks in Advance,   John

  • Hi John,

    One way to do that would be to use the DDR3 controller. It provides a set of performance counter registers that can be used to monitor or calculate the bandwidth and efficiency of the DDR traffic. The counters can be configured to count events such as the total number of SDRAM accesses, SDRAM activates, reads, writes, and so on. The Performance Counter 1 and 2 Registers (PERF_CNT_1 and PERF_CNT_2) act as two 32-bit counters that count events independently of each other.

    Please see the User's Guide for more information.

    Kind regards,

    one and zero
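A minimal sketch of how two samples of such a 32-bit event counter could be turned into a byte count and an operational intensity. The helper names are hypothetical, and `bytes_per_event` is an assumption that has to match the board setup: the counters count events (accesses), not bytes, so a value such as 64 would only apply to a 64-bit data bus with a burst length of 8.

```c
#include <stdint.h>

/* Difference between two samples of a free-running 32-bit counter.
 * Unsigned subtraction is modulo 2^32, so one wraparound between the
 * two samples is handled correctly. */
static uint32_t counter_delta(uint32_t before, uint32_t after)
{
    return (uint32_t)(after - before);
}

/* Convert an event count to bytes. bytes_per_event is an assumption
 * supplied by the caller (bus width times burst length). */
static uint64_t events_to_bytes(uint32_t events, uint32_t bytes_per_event)
{
    return (uint64_t)events * bytes_per_event;
}

/* Operational intensity = operations / DRAM bytes transferred.
 * Returns 0.0 when no DRAM traffic was observed. */
static double operational_intensity(uint64_t operations, uint64_t dram_bytes)
{
    if (dram_bytes == 0)
        return 0.0;
    return (double)operations / (double)dram_bytes;
}
```

On the target, `before` and `after` would be read from PERF_CNT_1 or PERF_CNT_2 around `do_work()`, with one counter configured for reads and the other for writes.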

  • I tried, but I get weird results. Even reading back the config register yields random values:

    if (__core_num() == 7) {
        hEmif->PERF_CNT_CFG = 0x00010002;

        for (unsigned i = 0; i < 3; i++)
            printf("config: %08X\n", hEmif->PERF_CNT_CFG);
    }

    results in:

    [core 7] config: 0000001E
    [core 7] config: 21010088
    [core 7] config: 21010088

    Am I missing something? Or does it interfere with the OpenCL runtime, from which I spawn DSP threads?

    Thanks in Advance, John
  • Hi,

    These counters are located directly within the DDR3 IP block, so they do not depend on whatever software you run.
    However, one thing that might spoil your result is the printf. Executing printf can take a significant amount of time and cycles.

    Kind regards,
    one and zero
  • Actually, I am not printing a counter, but the configuration register itself: it does not read back what I just wrote.

    I tried printing the counters too, but their values also looked fairly random.

    Thanks, John
  • ... try reading the counter in your loop and store the result in an array.
    Do the printf outside the loop.

    Kind regards,
    one and zero
  • It still does not read back the value that I just wrote to the configuration register:

    if (__core_num() == 7) {
        uint32_t values[5];

        hEmif->PERF_CNT_CFG = 0x00010002;

        for (unsigned i = 0; i < 5; i++)
            values[i] = hEmif->PERF_CNT_CFG;

        for (unsigned i = 0; i < 5; i++)
            printf("config: %08X\n", values[i]);

        printf("hEmif->PERF_CNT_CFG is at 0x%p\n", &hEmif->PERF_CNT_CFG);
    }

    yields:

    [core 7] config: 0C042180
    [core 7] config: 0C042180
    [core 7] config: 0C042180
    [core 7] config: 0C042180
    [core 7] config: 0C042180
    [core 7] hEmif->PERF_CNT_CFG is at 0x21010088

    Any clue?

    Thanks, John
  • Hi John,

    the config register has reserved bit fields so you won't necessarily read back the same value.
    Please try reading the counter register and see if the values now make more sense.

    kind regards,
    one and zero
  • I know, but even the nonreserved fields of the configuration register do not read back the bits that I wrote.
    I also tried reading hEmif->PERF_CNT_1, but it returns 0x0C042180 independent of the number of bytes I copy in between.

    Can I somehow verify that I am really talking to the DDR3 controller? Is there a register that should always read some predefined magic number or so? The base address (0x21010000) looks ok (according to the 66AK2H14 manual) but this does not feel like I am talking to the DDR3 controller.

    Thanks, John
  • You could try profiling your code. Among other things, the profiler gives detailed information about memory read/write on different sections (L1, L2, L3, DDR, etc). It has been a few years since I last used a profiler for C6000, but I remember it was quite accurate.
    I just found that there is no DDR3 memory controller at 0x21010000 (as mentioned in the 66AK2H14 manual), but there is one at 0x21020000. Reading the value from address 0x21020000 yields 0x40463401, and this looks like a Module ID and Revision Register (MIDR). The value returned from address 0x21010000 is different every time I try and does not look like an MIDR.

    Starting from base address 0x21020000, reading from PERF_CNT_TIM makes sense (it increases continuously, so it looks like a clock), but after setting PERF_CNT_CFG to 0x00020003, PERF_CNT_1 and PERF_CNT_2 still hardly increase after copying large amounts of data. Can anybody tell me if this is the right DRAM controller? Should I do something else?

    Thanks, John
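A minimal sketch of the sanity check described above: sample the first register at the candidate base address a few times and test whether the reads are stable and carry the same upper half as the value observed in this thread. The function name is hypothetical, and treating the upper 16 bits (0x4046) as a signature is an assumption inferred from the single value 0x40463401 reported here, not a field layout taken from the manual.

```c
#include <stddef.h>
#include <stdint.h>

/* Value read at the working controller base address in this thread. */
#define OBSERVED_MIDR 0x40463401u

/* Returns 1 if all samples are identical and share the upper 16 bits of
 * OBSERVED_MIDR, 0 otherwise. The upper-half comparison is an assumption
 * based on one observed value. */
static int looks_like_emif(const uint32_t *samples, size_t n)
{
    if (n == 0)
        return 0;
    for (size_t i = 1; i < n; i++) {
        if (samples[i] != samples[0])
            return 0; /* an ID register should not change between reads */
    }
    return (samples[0] >> 16) == (OBSERVED_MIDR >> 16);
}
```

Stability alone is not sufficient: the earlier reads of 0x0C042180 were stable too, but do not match the signature.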
  • Apparently, DDR3A is remapped to 1:2101:0000 (not even in the 32-bit address range of the DSP!). After mapping this area back into the 32-bit address space of the DSP (by adding an XMC MPAX entry), I can now finally use the performance counters.

    Please fix the documentation! From SPRS866E, table 6-1, part 11/13 (page 91, bottom entry), it is absolutely not clear that DDR3A can be remapped!
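A minimal sketch of how the two register values for such an XMC MPAX entry could be composed. The helper names are hypothetical, and the bit layout is an assumption that should be checked against the C66x CorePac User's Guide: the high register is taken to hold the logical base address in bits 31:12 and a segment size code in bits 4:0 (0x0B assumed to mean 4 KB), and the low register is taken to hold bits 35:12 of the physical address in bits 31:8 and permission bits in bits 7:0. The logical base used in the usage note is an arbitrary illustrative choice.

```c
#include <stdint.h>

/* Assumed segment size code for a 4 KB segment (size = 2^(code + 1)). */
#define MPAX_SEGSZ_4KB 0x0Bu

/* Compose the high register value: logical base address (bits 31:12)
 * plus segment size code (bits 4:0). Layout is an assumption. */
static uint32_t mpax_high(uint32_t logical_base, uint32_t segsz)
{
    return (logical_base & 0xFFFFF000u) | (segsz & 0x1Fu);
}

/* Compose the low register value: bits 35:12 of the 36-bit physical
 * address placed in bits 31:8, plus permission bits in bits 7:0.
 * Layout is an assumption. */
static uint32_t mpax_low(uint64_t physical_base, uint32_t perm)
{
    uint32_t raddr = (uint32_t)((physical_base >> 12) & 0xFFFFFFu);
    return (raddr << 8) | (perm & 0xFFu);
}
```

Under these assumptions, mapping physical 1:2101:0000 to a logical base of 0x21010000 with a 4 KB segment would use `mpax_high(0x21010000u, MPAX_SEGSZ_4KB)` and `mpax_low(0x121010000ULL, perm)`, where `perm` is the permission mask chosen for the entry.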