TDA4VM: Questions about memory bandwidth

Part Number: TDA4VM


Hi TI,

I wrote a simple application to test the data rate of different memories on the TDA4VM SoC. It covers the following memories:

  1. 512 KB SRAM (Path: R5FSS -> Interconnect -> SRAM)
  2. 8 MB MSMC SRAM with ECC (Path: R5FSS -> Interconnect -> NAVSS -> VirtSS -> NBSS -> MSMC)
  3. DDRSS (Path: R5FSS -> Interconnect -> NAVSS -> VirtSS -> NBSS -> MSMC -> DDRSS)

The relevant source code is the following:

void printSpeed(uint64_t timeInFlight, uint64_t size)
{
    double dataInMbyte = ((double)size / (1024 * 1024));
    double timeInSeconds = ((double)timeInFlight / (1 * 1000 * 1000 * 1000));
    printf("Speed in MBytes/s: %.3f\n",dataInMbyte / timeInSeconds);
}

void memTest(void* src, void* dst, uint64_t size)
{
    GTCModule gtc;
    uint64_t startTimeSet, stopTimeSet;
    uint64_t startTimeCopy, stopTimeCopy;

    startTimeSet = gtc.getTimeInNanoSec();
    memset(src, 0xAB, size);
    stopTimeSet = gtc.getTimeInNanoSec();

    startTimeCopy = gtc.getTimeInNanoSec();
    memcpy(dst,src, size);
    stopTimeCopy = gtc.getTimeInNanoSec();

    perfPrintf("Memset ");
    printSpeed(stopTimeSet-startTimeSet,size);

    perfPrintf("Memcpy ");
    printSpeed(stopTimeCopy-startTimeCopy,size);
}


int main()
{
    uint64_t sizeDDR = 0x1000000;
    uint64_t sizeMSMC = 0x100000;
    uint64_t sizeSRAM = 0x10000;

    void *srcDDR_cached = (void *)0x81000000;
    void *dstDDR_cached = (void *)0x82000000;
    void *srcDDR_noncached = (void *)0x91000000;
    void *dstDDR_noncached = (void *)0x92000000;

    void *srcMSMC_cached = (void *)0x70200000;
    void *dstMSMC_cached = (void *)0x70300000;
    void *srcMSMC_noncached = (void *)0x70400000;
    void *dstMSMC_noncached = (void *)0x70500000;

    void *srcSRAM_noncached = (void *)0x3600000;
    void *dstSRAM_noncached = (void *)0x3610000;
    void *srcSRAM_cached = (void *)0x3640000;
    void *dstSRAM_cached = (void *)0x3650000;

    printf("\n*** Start Performance Tests ***\n");
    printf("\nDDR Cached:\n");
    memTest(srcDDR_cached,dstDDR_cached,sizeDDR);

    printf("\nDDR Non-Cached:\n");
    memTest(srcDDR_noncached,dstDDR_noncached,sizeDDR);

    printf("\nMSMC Cached:\n");
    memTest(srcMSMC_cached,dstMSMC_cached,sizeMSMC);

    printf("\nMSMC Non-Cached:\n");
    memTest(srcMSMC_noncached,dstMSMC_noncached,sizeMSMC);

    printf("\nSRAM Non-Cached:\n");
    memTest(srcSRAM_noncached,dstSRAM_noncached,sizeSRAM);

    printf("\nSRAM Cached:\n");
    memTest(srcSRAM_cached,dstSRAM_cached,sizeSRAM);

    printf("\n*** Tests finished ***\n");

    while(1);

}

The application runs on MCU2_0 (R5F subsystem in the Main Domain). Time is measured with the GTC module, which is clocked at 200 MHz, so the resolution is 5 ns. To configure the corresponding memory sections as cached or uncached, I changed gCslR5MpuCfg accordingly. I am using PSDK RTOS v08.00.
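To make the 5 ns resolution explicit, here is a minimal sketch of the tick-to-nanosecond conversion, assuming the counter increments at exactly 200 MHz. The names kGtcFrequencyHz, ticksToNanoSec and elapsedNanoSec are hypothetical helpers for illustration; they are not part of my GTCModule class or of the PSDK.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the GTC counter increments at 200 MHz, as stated above.
constexpr uint64_t kGtcFrequencyHz = 200000000ULL;
constexpr uint64_t kNanoSecPerSec  = 1000000000ULL;

// Convert raw counter ticks to nanoseconds (hypothetical helper).
// The division is split into whole seconds and a remainder so that
// the intermediate product cannot overflow 64 bits for large counts.
inline uint64_t ticksToNanoSec(uint64_t ticks)
{
    const uint64_t seconds   = ticks / kGtcFrequencyHz;
    const uint64_t remainder = ticks % kGtcFrequencyHz;
    return seconds * kNanoSecPerSec
         + (remainder * kNanoSecPerSec) / kGtcFrequencyHz;
}

// Elapsed time between two counter readings, in nanoseconds.
inline uint64_t elapsedNanoSec(uint64_t startTicks, uint64_t stopTicks)
{
    assert(stopTicks >= startTicks);
    return ticksToNanoSec(stopTicks - startTicks);
}
```

One tick corresponds to 1 / 200 MHz = 5 ns, so every measured interval is a multiple of 5 ns.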

These are the results:

*** Start Performance Tests ***

DDR Cached:
Memset Speed in MBytes/s: 198.476
Memcpy Speed in MBytes/s: 65.397

DDR Non-Cached:
Memset Speed in MBytes/s: 57.641
Memcpy Speed in MBytes/s: 23.441

MSMC Cached:
Memset Speed in MBytes/s: 309.797
Memcpy Speed in MBytes/s: 135.247

MSMC Non-Cached:
Memset Speed in MBytes/s: 52.850
Memcpy Speed in MBytes/s: 29.499

SRAM Non-Cached:
Memset Speed in MBytes/s: 108.184
Memcpy Speed in MBytes/s: 58.462

SRAM Cached:
Memset Speed in MBytes/s: 521.703
Memcpy Speed in MBytes/s: 211.492

*** Tests finished ***
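For reference, the ratio between the on-chip SRAM and the MSMC results can be computed directly from the numbers above. This is only a small sketch of that arithmetic; speedRatio and printSramVsMsmcRatios are hypothetical names and not part of the test application.

```cpp
#include <cassert>
#include <cstdio>

// Ratio of two throughput figures, both given in MBytes/s.
inline double speedRatio(double fasterMBps, double slowerMBps)
{
    assert(slowerMBps > 0.0);
    return fasterMBps / slowerMBps;
}

// Print the SRAM-to-MSMC ratio for each of the four measured cases,
// using the values from the results listed above.
inline void printSramVsMsmcRatios()
{
    std::printf("Non-cached memset: %.2f\n", speedRatio(108.184, 52.850));
    std::printf("Non-cached memcpy: %.2f\n", speedRatio(58.462, 29.499));
    std::printf("Cached memset:     %.2f\n", speedRatio(521.703, 309.797));
    std::printf("Cached memcpy:     %.2f\n", speedRatio(211.492, 135.247));
}
```

The non-cached ratios come out at roughly 2.0, the cached ones at roughly 1.6 to 1.7.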

I have a few questions about these results:

Q1: Are these numbers plausible?

Q2: Why exactly is MSMC SRAM (cached as well as uncached) so much slower than the SRAM connected directly to the interconnect? I understand that the paths to the two memories differ, and that the path to MSMC involves more subsystems such as NAVSS, VirtSS and NBSS. But I did not expect a difference of about a factor of 2.

Q3: Is there room to improve read/write times when accessing SRAM or DDR from MCU2_0?

Thanks for your help and best regards,

Felix