I have the following hardware configuration: the DSP is a TMS320C6726B and the SDRAM is a Micron SDR SDRAM, MT48LC4M16A2 (1 Meg x 16 x 4 banks, 7 ns cycle time).
SYSCLK1 runs at 266 MHz and SYSCLK3 (the EMIF clock) at 133 MHz. All program and data are located in DSP on-chip memory except for one array of 6912 words.
The test uses a vector of 144 words in on-chip memory and an array of 6912 words in SDRAM. I profiled the processing time of the following code:
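#pragma DATA_SECTION (array, ".sdramsect")
float array[6912];    /* 48 rows x 144 columns, placed in external SDRAM */
float vector[144];    /* on-chip memory */

for (i = 0; i < 48; i++)
{
    r = 0;
    for (j = 0; j < 144; j++)
    {
        /* row stride is 144 so that all 6912 words are read once each */
        r += vector[j] * array[i*144 + j];
    }
    output[i] = r;
}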
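For anyone who wants to reproduce the measurement, here is a minimal self-contained sketch of the same kernel as a function. It assumes the array is a 48 x 144 row-major matrix (48 * 144 = 6912 words); the names `ROWS`, `COLS` and `matvec` are made up for this illustration and are not part of my project.

```c
#include <stddef.h>

#define ROWS 48   /* number of outputs (hypothetical name) */
#define COLS 144  /* length of vector and of each row (hypothetical name) */

/* Computes output[i] = sum over j of vector[j] * array[i*COLS + j].
 * On the target, `array` would point into SDRAM and the other two
 * buffers into on-chip memory. */
void matvec(const float *array, const float *vector, float *output)
{
    size_t i, j;

    for (i = 0; i < ROWS; i++) {
        float r = 0.0f;
        for (j = 0; j < COLS; j++) {
            r += vector[j] * array[i * COLS + j];
        }
        output[i] = r;
    }
}
```

Every element of the array is read exactly once, in address order, so the loop issues 6912 sequential reads and no writes to SDRAM.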
When both vector and array are stored in on-chip memory, the processing time is around 50 us.
When array is placed in SDRAM (cycle time 7 ns), the processing time jumps to about 1300 us (1.3 ms)!
All elements of array[6912] reside in SDRAM sequentially. There are no mixed READ/WRITE operations during profiling.
I probed the EMIF signals (RAS, CAS, WE, EM_BA[0:1], etc.) and found the following command sequence:
ACTIVE -> NOP -> READ -> NOP -> BURST TERMINATE -> ... READ -> NOP -> BURST TERMINATE, with each READ repeating on a 182 ns cycle.
The main reason for the slow access is that, most of the time, the DSP reads only ONE word per READ command and then waits before accessing the next word.
Q1: The EMIF registers are set up for continuous page burst (full-page READ), so I would expect words in the same row to be read together where possible. Why is only ONE word read each time? The register settings are:
EMIF_SDTIMR = 0x69114910;
EMIF_SDSRETR = 0x00000008;
EMIF_SDRCR = 0x0000081E;
EMIF_SDCR = 0x00004520;
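To make the register values easier to check, here is a rough sketch that decodes EMIF_SDCR and EMIF_SDRCR. The bit positions are an assumption based on my reading of the C672x EMIF register layout (NM at bit 14, CL at bits 11..9, IBANK at bits 6..4, PAGESIZE at bits 2..0, refresh rate in the low 13 bits of SDRCR); please verify them against the EMIF reference guide. The type and function names are hypothetical.

```c
#include <stdint.h>

/* ASSUMED field positions; not taken from a verified header file. */
typedef struct {
    unsigned nm;        /* bit 14: 1 = 16-bit data bus (assumed) */
    unsigned cl;        /* bits 11..9: CAS latency in EMIF clocks (assumed) */
    unsigned ibank;     /* bits 6..4: 0 = 1, 1 = 2, 2 = 4 banks (assumed) */
    unsigned pagesize;  /* bits 2..0: 0 = 256-word page (assumed) */
} sdcr_fields;

sdcr_fields decode_sdcr(uint32_t sdcr)
{
    sdcr_fields f;

    f.nm       = (sdcr >> 14) & 0x1u;
    f.cl       = (sdcr >> 9)  & 0x7u;
    f.ibank    = (sdcr >> 4)  & 0x7u;
    f.pagesize = sdcr & 0x7u;
    return f;
}

/* Refresh interval in ns (rounded down), assuming the low 13 bits of
 * SDRCR hold the interval in EMIF clock cycles. */
uint32_t refresh_interval_ns(uint32_t sdrcr, uint32_t emif_clk_mhz)
{
    uint32_t rr = sdrcr & 0x1FFFu;

    return (rr * 1000u) / emif_clk_mhz;
}
```

Under those assumptions, 0x00004520 decodes to a 16-bit bus, CAS latency 2, 4 banks and 256-word pages, which is consistent with the MT48LC4M16A2 geometry (4 banks, 256 columns, x16). 0x0000081E gives a refresh interval of 2078 EMIF clocks, roughly 15.6 us at 133 MHz, which fits the 64 ms / 4096-row refresh requirement.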
Q2: Why is each READ separated by 182 ns? At roughly 180 ns per word, reading all 6912 words takes 6912 * 180 ns = 1.244 ms, which accounts for almost all of the measured time. Thanks.
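As a rough sanity check on the numbers in Q2, the arithmetic can be written out as below. This is only a sketch using the figures quoted above (6912 reads, 182 ns per read, 133 MHz EMIF clock, 266 MHz CPU clock); the function names are made up for the illustration.

```c
#include <stdint.h>

/* Total time in ns for n_reads reads spaced period_ns apart. */
uint32_t total_read_time_ns(uint32_t n_reads, uint32_t period_ns)
{
    return n_reads * period_ns;
}

/* Whole clock cycles that fit in period_ns at clock_mhz (rounded down). */
uint32_t cycles_in_period(uint32_t period_ns, uint32_t clock_mhz)
{
    return (period_ns * clock_mhz) / 1000u;
}
```

With these inputs, 6912 reads at 182 ns come to 1,257,984 ns (about 1.26 ms), and one 182 ns period spans about 24 EMIF clocks at 133 MHz, or about 48 CPU clocks at 266 MHz. So each single-word read costs around 24 EMIF clocks, far more than the one clock per word that a burst within an open row would need.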