Hi TI,
I wrote a simple application to test the data rate of different memories on the TDA4VM SoC. It covers the following memories:
- 512 kB SRAM (Path: R5FSS -> Interconnect -> SRAM)
- 8 MB SRAM with ECC (Path: R5FSS -> Interconnect -> NAVSS -> VirtSS -> NBSS -> MSMC)
- DDRSS (Path: R5FSS -> Interconnect -> NAVSS -> VirtSS -> NBSS -> MSMC -> DDRSS)
The relevant source code is the following:
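#include <cstdint>
#include <cstdio>
#include <cstring>

// Prints throughput for `size` bytes moved in `timeInFlight` nanoseconds.
void printSpeed(uint64_t timeInFlight, uint64_t size)
{
    double dataInMbyte = ((double)size / (1024 * 1024));
    double timeInSeconds = ((double)timeInFlight / (1 * 1000 * 1000 * 1000));
    printf("Speed in MBytes/s: %.3f\n", dataInMbyte / timeInSeconds);
}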

// Times one memset of src and one memcpy from src to dst, then prints both rates.
void memTest(void* src, void* dst, uint64_t size)
{
    GTCModule gtc;
    uint64_t startTimeSet, stopTimeSet;
    uint64_t startTimeCopy, stopTimeCopy;

    startTimeSet = gtc.getTimeInNanoSec();
    memset(src, 0xAB, size);
    stopTimeSet = gtc.getTimeInNanoSec();

    startTimeCopy = gtc.getTimeInNanoSec();
    memcpy(dst, src, size);
    stopTimeCopy = gtc.getTimeInNanoSec();

    perfPrintf("Memset ");
    printSpeed(stopTimeSet - startTimeSet, size);
    perfPrintf("Memcpy ");
    printSpeed(stopTimeCopy - startTimeCopy, size);
}

int main()
{
    uint64_t sizeDDR = 0x1000000;   // 16 MiB
    uint64_t sizeMSMC = 0x100000;   // 1 MiB
    uint64_t sizeSRAM = 0x10000;    // 64 KiB

    void *srcDDR_cached = (void *)0x81000000;
    void *dstDDR_cached = (void *)0x82000000;
    void *srcDDR_noncached = (void *)0x91000000;
    void *dstDDR_noncached = (void *)0x92000000;
    void *srcMSMC_cached = (void *)0x70200000;
    void *dstMSMC_cached = (void *)0x70300000;
    void *srcMSMC_noncached = (void *)0x70400000;
    void *dstMSMC_noncached = (void *)0x70500000;
    void *srcSRAM_noncached = (void *)0x3600000;
    void *dstSRAM_noncached = (void *)0x3610000;
    void *srcSRAM_cached = (void *)0x3640000;
    void *dstSRAM_cached = (void *)0x3650000;

    printf("\n*** Start Performance Tests ***\n");
    printf("\nDDR Cached:\n");
    memTest(srcDDR_cached, dstDDR_cached, sizeDDR);
    printf("\nDDR Non-Cached:\n");
    memTest(srcDDR_noncached, dstDDR_noncached, sizeDDR);
    printf("\nMSMC Cached:\n");
    memTest(srcMSMC_cached, dstMSMC_cached, sizeMSMC);
    printf("\nMSMC Non-Cached:\n");
    memTest(srcMSMC_noncached, dstMSMC_noncached, sizeMSMC);
    printf("\nSRAM Non-Cached:\n");
    memTest(srcSRAM_noncached, dstSRAM_noncached, sizeSRAM);
    printf("\nSRAM Cached:\n");
    memTest(srcSRAM_cached, dstSRAM_cached, sizeSRAM);
    printf("\n*** Tests finished ***\n");

    while (1) {}    // park the core once the tests are done
}
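
To make the unit conversion in printSpeed easy to check on its own, here is a minimal sketch of the same arithmetic as a function that returns the value instead of printing it. The name speedMBytesPerSec and the zero-time guard are my additions for illustration and are not part of the test application. Note that the divisor is 1024 * 1024, so "MBytes/s" here means MiB/s.

```cpp
#include <cstdint>

// Hypothetical helper (not in the original application): same math as
// printSpeed, but returns the rate so it can be checked without a console.
// timeInFlightNs: elapsed time in nanoseconds; sizeBytes: bytes transferred.
double speedMBytesPerSec(uint64_t timeInFlightNs, uint64_t sizeBytes)
{
    if (timeInFlightNs == 0) {
        return 0.0;  // assumption: report 0 instead of dividing by zero
    }
    const double dataInMbyte = static_cast<double>(sizeBytes) / (1024.0 * 1024.0);
    const double timeInSeconds = static_cast<double>(timeInFlightNs) / 1e9;
    return dataInMbyte / timeInSeconds;
}
```

For example, 0x100000 bytes (1 MiB) moved in 1,000,000 ns (1 ms) comes out at about 1000 MBytes/s.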
The application runs on MCU2_0 (the R5F subsystem in the Main Domain). Time is measured with the GTC module, which is clocked at 200 MHz, so the resolution is 5 ns. To configure the corresponding memory sections as cached or uncached, I changed gCslR5MpuCfg accordingly. I am using PSDK RTOS v08.00.
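
For reference, the 5 ns figure follows directly from the 200 MHz clock. Below is a minimal sketch of a tick-to-nanosecond conversion under that assumption. The names kGtcFreqHz, tickPeriodNs and ticksToNanoSec are hypothetical, and this is not necessarily how getTimeInNanoSec() is implemented in my GTCModule wrapper.

```cpp
#include <cassert>
#include <cstdint>

// Assumed GTC input clock, taken from the description above (200 MHz).
constexpr uint64_t kGtcFreqHz = 200000000ULL;

// Length of one counter tick in nanoseconds (integer division).
constexpr uint64_t tickPeriodNs(uint64_t freqHz)
{
    return 1000000000ULL / freqHz;
}

// Hypothetical helper: convert a raw tick count to nanoseconds.
// Whole seconds and the remainder are converted separately so that
// ticks * 1e9 cannot overflow 64 bits for large counter values.
uint64_t ticksToNanoSec(uint64_t ticks, uint64_t freqHz)
{
    assert(freqHz != 0);
    const uint64_t wholeSeconds = ticks / freqHz;
    const uint64_t remainderTicks = ticks % freqHz;
    return wholeSeconds * 1000000000ULL
         + (remainderTicks * 1000000000ULL) / freqHz;
}

static_assert(tickPeriodNs(kGtcFreqHz) == 5, "200 MHz gives a 5 ns tick");
```

With these assumptions, one tick maps to 5 ns and 200,000,000 ticks map to exactly one second.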
These are the results:
*** Start Performance Tests ***
DDR Cached:
Memset Speed in MBytes/s: 198.476
Memcpy Speed in MBytes/s: 65.397
DDR Non-Cached:
Memset Speed in MBytes/s: 57.641
Memcpy Speed in MBytes/s: 23.441
MSMC Cached:
Memset Speed in MBytes/s: 309.797
Memcpy Speed in MBytes/s: 135.247
MSMC Non-Cached:
Memset Speed in MBytes/s: 52.850
Memcpy Speed in MBytes/s: 29.499
SRAM Non-Cached:
Memset Speed in MBytes/s: 108.184
Memcpy Speed in MBytes/s: 58.462
SRAM Cached:
Memset Speed in MBytes/s: 521.703
Memcpy Speed in MBytes/s: 211.492
*** Tests finished ***
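
To put the gap between the directly connected SRAM and MSMC into numbers, here is a small sketch that computes the ratios from the values printed above. The helper names speedRatio and printSramVsMsmc are made up for this post; the throughput figures are copied from the log.

```cpp
#include <cstdio>

// Ratio of two throughput figures, both in MBytes/s.
double speedRatio(double fasterMBps, double slowerMBps)
{
    return fasterMBps / slowerMBps;
}

// Prints SRAM-to-MSMC ratios using the measured values from the log above.
void printSramVsMsmc()
{
    printf("Cached memset:     %.2f\n", speedRatio(521.703, 309.797));
    printf("Cached memcpy:     %.2f\n", speedRatio(211.492, 135.247));
    printf("Non-cached memset: %.2f\n", speedRatio(108.184, 52.850));
    printf("Non-cached memcpy: %.2f\n", speedRatio(58.462, 29.499));
}
```

By this calculation the SRAM comes out roughly 1.6 to 1.7 times faster than MSMC for the cached cases and about 2 times faster for the non-cached cases.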
I have got a few questions concerning those results:
Q1: Are those numbers plausible?
Q2: Why exactly is MSMC SRAM (cached as well as uncached) so much slower than the SRAM directly connected to the interconnect? I understand that the paths to each memory differ, and that the path to MSMC involves more subsystems such as NAVSS, VirtSS and NBSS. But I did not expect a difference of up to a factor of 2.
Q3: Is there room for improving read/write time when accessing SRAM/DDR RAM from MCU2_0?
Thanks for your help and best regards,
Felix