
Significance of printf in memory throughput test

Other Parts Discussed in Thread: SYSBIOS

Hi,

I'm working on measuring the memory throughput of writing to and reading from DDR3 on my C6670 board. I have an application (based on the Example 2 tasking application in the BIOS MCSDK 2.0 User Guide) loaded onto the four cores, with a slight variation so that each core writes to or reads from a different part of the DDR3 memory. The application carries out a series of tests, each writing or reading a different amount of data per write/read, and then works out the memory throughput achieved by each core.
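
For illustration, one write test is along the lines of the sketch below; the DDR3 base address, the per-core slice size, the 1 GHz CPU clock and the use of the time-stamp counter are placeholder assumptions rather than my exact setup.

    #include <c6x.h>      /* TSCL, TSCH, DNUM control registers and intrinsics */
    #include <stdint.h>

    /* Placeholder assumptions, not my exact values */
    #define DDR3_BASE    0x80000000u               /* start of DDR3             */
    #define SLICE_SIZE   (16u * 1024u * 1024u)     /* bytes owned by each core  */
    #define CPU_FREQ_HZ  1000000000.0              /* 1.0 GHz CPU clock         */

    /* Read the free-running 64-bit time-stamp counter.
     * (The counter must have been started once at boot by writing TSCL = 0.) */
    static uint64_t readTsc(void)
    {
        uint32_t lo = TSCL;    /* reading TSCL latches TSCH */
        uint32_t hi = TSCH;
        return _itoll(hi, lo);
    }

    /* Write bytesPerTest bytes into this core's own DDR3 slice and return the
     * achieved throughput in bytes/second (cache effects ignored for brevity). */
    double measureWriteThroughput(uint32_t bytesPerTest)
    {
        volatile uint32_t *dst =
            (volatile uint32_t *)(DDR3_BASE + (uint32_t)DNUM * SLICE_SIZE);
        uint32_t words = bytesPerTest / sizeof(uint32_t);
        uint32_t i;
        uint64_t t0, t1;

        t0 = readTsc();
        for (i = 0; i < words; i++) {
            dst[i] = i;                            /* plain 32-bit stores */
        }
        t1 = readTsc();

        return (double)bytesPerTest * CPU_FREQ_HZ / (double)(t1 - t0);
    }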

What is bothering me now is that a line of printf near the start of my application, within the task, seems to have a significant impact on the memory throughput measured by my first test (the subsequent tests behave more like what I expect). For example, without that line of printf, the memory throughput in the first test somehow suffers. This happens consistently, including after a power cycle.

Can anyone explain how this printf manages to do this? Initially I thought the processor needed some time to settle down before it could produce credible measurements, but putting seconds of delay before my first test does not seem to change anything.

This aside, is there a way to bring the four cores back into step before running each subsequent test? There is something called osWaitForAllCores() that my colleague used on Freescale; I wonder if there is something similar from TI?

Thank you in advance.

Regards,

Chiang

  • Firstly, there are printf() and System_printf(). I'm not sure what printf() might do outside of CCS, but in CCS it works with breakpoints -- and that should theoretically add almost no overhead or effect on the program. System_printf() is the "normal" printf in that it writes to a buffer that gets printed out later. Using System_printf() might reduce overhead.

    Also, I have noted that even though printf() doesn't necessarily affect the results of the code, in my experience it can temporarily use a lot of resources. It shouldn't have a lasting effect, though: across multiple tests it shouldn't really affect even the first one much.
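
    If buffered logging is enough for you, something along these lines keeps the output path cheap during the measurement -- just a sketch, and it assumes a SYS/BIOS application with xdc.runtime.SysMin configured as the System support proxy (the function names and units are made up):

    #include <xdc/runtime/System.h>

    /* System_printf() formats into SysMin's internal buffer instead of going
     * through the CIO breakpoint that printf() uses, so it is much cheaper to
     * call while a test is running. */
    void reportThroughput(int testId, unsigned int kbytesPerSec)
    {
        System_printf("Test %d: %u KB/s\n", testId, kbytesPerSec);
    }

    /* Nothing shows up on the console until the buffer is flushed, so call
     * this once after all the tests have finished. */
    void dumpResults(void)
    {
        System_flush();
    }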


    I have written core-synchronization code in software that I've found pretty useful (it's not intended to be very high performance, but it does synchronize the cores). It uses BIOS and IPC, but if you cannot use those, IPC's MultiProc could be replaced by any sort of core identifier, and BIOS's Cache routines could be replaced by writes to the cache invalidation/writeback registers (which CSL could certainly help with).



    #include <xdc/std.h>
    #include <ti/ipc/MultiProc.h>
    #include <ti/sysbios/hal/Cache.h>

    /* As a global variable: one flag per core, kept in shared MSMC SRAM */
    #pragma DATA_SECTION(syncLine, ".myMSMC")
    int syncLine[8] = {0,0,0,0, 0,0,0,0};

    void barrierSync(void)
    {
        int clusterSize   = MultiProc_getNumProcessors();
        int clusterBaseId = 0;
        int selfId        = MultiProc_self();
        int i;

        Cache_inv(syncLine, sizeof(int)*8, Cache_Type_ALL, TRUE);

        /* Phase 1: raise this core's flag, then walk the array until every
         * core's flag has been seen set to 1. */
        for (i = 0; i < clusterSize; ) {
            if (i == (selfId - clusterBaseId)) {
                syncLine[i] = 1;
                Cache_wbInv(syncLine, sizeof(int)*8, Cache_Type_ALL, TRUE);
                ++i;
            }
            if (i < clusterSize && syncLine[i] == 1) {
                ++i;
            }
            Cache_inv(syncLine, sizeof(int)*8, Cache_Type_ALL, TRUE);
        }

        /* Phase 2: clear this core's flag and wait for all flags to return
         * to 0, so the barrier can be reused for the next test. */
        for (i = 0; i < clusterSize; ) {
            if (i == (selfId - clusterBaseId)) {
                syncLine[i] = 0;
                Cache_wbInv(syncLine, sizeof(int)*8, Cache_Type_ALL, TRUE);
                ++i;
            }
            if (i < clusterSize && syncLine[i] == 0) {
                ++i;
            }
            Cache_inv(syncLine, sizeof(int)*8, Cache_Type_ALL, TRUE);
        }
    }


    And then put this in the linker command file (for example, create a file named "lnk.cmd"):

    SECTIONS
    {
        .myMSMC > MSMCSRAM
    }
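
    To line the cores up between tests, call barrierSync() from the task on every core; NUM_TESTS and runThroughputTest() below are just placeholders for your own test code:

    #define NUM_TESTS 4                         /* placeholder */

    extern void runThroughputTest(int testId);  /* placeholder for your test */

    /* Runs on every core: all cores enter each test together, so no core
     * starts measuring while the others are still busy elsewhere. */
    void benchmarkTask(void)
    {
        int test;
        for (test = 0; test < NUM_TESTS; test++) {
            barrierSync();              /* wait until every core is ready */
            runThroughputTest(test);
        }
        barrierSync();                  /* wait for the last test to finish on all cores */
    }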

  • Thanks Tim. The code is really useful. I think I'll give up trying to understand the weird behaviour I saw with this printf; synchronising the cores before carrying out my tests is the way forward.

    cheers,

    Chiang