SDRAM access appears way too slow

Other Parts Discussed in Thread: TMS320C6726B

My hardware configuration: the DSP is a TMS320C6726B and the SDRAM is a Micron SDR SDRAM, MT48LC4M16A2 (1 Meg x 16 x 4 banks, 7ns cycle time).

SYSCLK1 runs at 266MHz and SYSCLK3 (the EMIF clock) runs at 133MHz. All program code and data are located in DSP on-chip memory except one 6912-word array.

A 144-word vector is in on-chip memory and the 6912-word array is in SDRAM. I profiled the processing time of the following code:

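#pragma DATA_SECTION(array, ".sdramsect")
float array[6912];        /* 48 rows x 144 columns, placed in SDRAM */

float vector[144];        /* on-chip */
float output[48];         /* on-chip */
float r;
int i, j;

for (i = 0; i < 48; i++)
{
    r = 0.0f;
    for (j = 0; j < 144; j++)
    {
        r += vector[j] * array[i*144 + j];   /* row stride is 144, so every element is read once */
    }
    output[i] = r;
}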

When both vector and array are in on-chip memory, the processing time is around 50us.

When I put array in SDRAM (7ns access time), the processing time jumps to about 1300us (1.3ms)!

All elements of array[6912] reside in SDRAM sequentially. There are no mixed READ/WRITE operations during profiling.

I probed the EMIF signals (RAS, CAS, WE, EM_BA[0:1], etc.) and found the following command sequence:

ACTIVE -> NOP -> READ -> NOP -> BURST TERMINATE -> ... -> READ -> NOP -> BURST TERMINATE, repeating with a 182ns period.

The main reason for the slow access is that, most of the time, the DSP accesses only ONE word and then waits extra time before accessing the next word.

Q1: The EMIF registers are set up for continuous page burst (full-page READ), so words in the same row should be read together where possible. Why does the EMIF read only ONE word at a time? The register settings are:

EMIF_SDTIMR  = 0x69114910;
EMIF_SDSRETR = 0x00000008;
EMIF_SDRCR   = 0x0000081E;
EMIF_SDCR    = 0x00004520;
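
For scale, here is a rough count of how many SDRAM rows the array spans. It assumes the 256-column page that Micron's datasheet gives for the x16 organization of this part, and that each 32-bit float occupies two 16-bit columns:

$$
\frac{6912 \times 4\ \text{bytes}}{2\ \text{bytes/column}} = 13824\ \text{columns},
\qquad
\frac{13824\ \text{columns}}{256\ \text{columns/row}} = 54\ \text{rows},
\qquad
\frac{256}{2} = 128\ \text{floats per row}
$$

Under that assumption, a row-sequential read could in principle stay within one open row for up to 128 consecutive floats.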

Q2: Why is each READ separated by 182ns? At roughly 180ns per word, reading all 6912 words takes 6912 x 180ns = 1.244ms! Thanks.

  • Joshua,

    If you have one 16-bit SDRAM on the EMIF bus, then each 32-bit word requires a 2-beat burst. This means the minimum possible access time is 14ns instead of 7ns, although that is still far below the access time you are currently seeing.
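
    A rough lower bound for fetching the whole array this way, ignoring ACTIVE, CAS latency, and refresh overhead (so the real figure will be higher):

    $$ t_{\text{word}} = 2 \times 7\,\text{ns} = 14\,\text{ns}, \qquad 6912 \times 14\,\text{ns} \approx 96.8\,\mu\text{s} $$

    At the actual 133MHz EMIF clock (about 7.5ns per beat) this becomes roughly 15ns per word and about 104us, still more than ten times below the ~1.24ms implied by the 182ns READ spacing.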

    Your code is being executed as it is written. The DSP is being instructed to read one word from vector and one word from array. The DSP will take that data and do its processing before attempting the next read.

    You can restructure the loops so they optimize better, enable compiler optimization and use #pragmas to give the compiler more information about your code, and use intrinsics such as _amem8 to read 64 bits at a time.
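
    A portable C sketch of the kind of loop restructuring meant here, assuming a 48 x 144 array with row stride 144. The restrict-qualified pointers tell the compiler the buffers do not overlap, and unrolling by two into independent accumulators lets consecutive multiply-adds proceed in parallel. It deliberately avoids TI-specific pragmas and _amem8 so it builds with any C99 compiler; the function names are hypothetical. Two accumulators change the order of the floating-point additions, so results can differ from the original loop in the last bits (the test data below are chosen so both orders are exact).

    ```c
    #include <assert.h>
    #include <stdio.h>

    #define ROWS 48   /* output length: rows of the 48 x 144 array */
    #define COLS 144  /* vector length and row stride of the array */

    /* Restructured loop (hypothetical name): restrict-qualified pointers
       and the inner loop unrolled by 2 into two independent accumulators.
       Requires COLS to be even. */
    static void matvec_unrolled(const float *restrict mat,
                                const float *restrict vec,
                                float *restrict out)
    {
        for (int i = 0; i < ROWS; i++) {
            const float *row = mat + i * COLS;
            float acc0 = 0.0f, acc1 = 0.0f;
            for (int j = 0; j < COLS; j += 2) {
                acc0 += vec[j] * row[j];
                acc1 += vec[j + 1] * row[j + 1];
            }
            out[i] = acc0 + acc1;
        }
    }

    /* Reference version with the same loop structure as the original post. */
    static void matvec_ref(const float *mat, const float *vec, float *out)
    {
        for (int i = 0; i < ROWS; i++) {
            float r = 0.0f;
            for (int j = 0; j < COLS; j++)
                r += vec[j] * mat[i * COLS + j];
            out[i] = r;
        }
    }

    int main(void)
    {
        static float mat[ROWS * COLS];
        float vec[COLS], ref[ROWS], fast[ROWS];

        /* Small multiples of 0.5, so both summation orders are exact. */
        for (int k = 0; k < ROWS * COLS; k++) mat[k] = (float)(k % 7) - 3.0f;
        for (int j = 0; j < COLS; j++) vec[j] = (float)(j % 5) * 0.5f;

        matvec_ref(mat, vec, ref);
        matvec_unrolled(mat, vec, fast);
        for (int i = 0; i < ROWS; i++)
            assert(ref[i] == fast[i]);
        printf("output[0] = %g\n", fast[0]);
        return 0;
    }
    ```

    On the DSP itself, the TI compiler's optimizer and intrinsics would do more than this portable version can; the sketch only shows the shape of the change.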

    The most efficient way to access external memory is to use the dMAX module to read a block of data into on-chip memory while the DSP is processing another block, then ping-pong between the two buffers. The dMAX will read data in longer bursts than the DSP will, even if the DSP is using the _amem8() intrinsic to read 8 bytes at a time. The dMAX Reference Guide discusses how to do Ping-Pong Data Buffering.
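
    A minimal sketch of the ping-pong structure, assuming the matrix is processed in blocks of 4 rows (576 floats). dma_start() and dma_wait() are hypothetical placeholders, not the dMAX API; on the C672x they would be replaced by a real dMAX transfer setup and completion check as described in the dMAX Reference Guide. Here they use a synchronous memcpy, so the sketch shows only the buffer bookkeeping (fetch block b+1 into one buffer while computing on block b in the other), not real overlap.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    #define ROWS 48
    #define COLS 144
    #define ROWS_PER_BLOCK 4                      /* hypothetical block size */
    #define BLOCK_FLOATS (ROWS_PER_BLOCK * COLS)  /* 576 floats per buffer   */
    #define NUM_BLOCKS (ROWS / ROWS_PER_BLOCK)    /* 12 blocks               */

    /* Two on-chip buffers: compute on one while the other is being filled. */
    static float pingpong[2][BLOCK_FLOATS];

    /* HYPOTHETICAL stand-ins for a dMAX transfer. On the DSP these would
       start an asynchronous SDRAM-to-on-chip copy and wait for it to finish;
       here memcpy completes immediately, so nothing actually overlaps. */
    static void dma_start(float *dst, const float *src, size_t n)
    {
        memcpy(dst, src, n * sizeof *dst);
    }
    static void dma_wait(void) { /* nothing to wait for in this stub */ }

    static void matvec_pingpong(const float *ext, const float *vec, float *out)
    {
        dma_start(pingpong[0], ext, BLOCK_FLOATS);       /* prime block 0 */
        for (int b = 0; b < NUM_BLOCKS; b++) {
            dma_wait();                                  /* block b is ready */
            if (b + 1 < NUM_BLOCKS)                      /* fetch block b+1 ... */
                dma_start(pingpong[(b + 1) & 1],
                          ext + (size_t)(b + 1) * BLOCK_FLOATS, BLOCK_FLOATS);
            const float *buf = pingpong[b & 1];          /* ... while computing b */
            for (int r = 0; r < ROWS_PER_BLOCK; r++) {
                float acc = 0.0f;
                for (int j = 0; j < COLS; j++)
                    acc += vec[j] * buf[r * COLS + j];
                out[b * ROWS_PER_BLOCK + r] = acc;
            }
        }
    }

    int main(void)
    {
        static float ext[ROWS * COLS];   /* plays the role of the SDRAM array */
        float vec[COLS], out[ROWS];

        for (int k = 0; k < ROWS * COLS; k++) ext[k] = (float)(k / COLS); /* row i holds i */
        for (int j = 0; j < COLS; j++) vec[j] = 1.0f;

        matvec_pingpong(ext, vec, out);
        for (int i = 0; i < ROWS; i++)
            assert(out[i] == (float)(i * COLS));        /* 144 copies of i */
        printf("out[47] = %g\n", out[47]);
        return 0;
    }
    ```

    The block size is a trade-off: larger blocks give the transfer engine longer bursts but need more on-chip memory for the two buffers.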

    The 182ns read burst period includes the DSP processing time plus the internal access time to get the address to the EMIF and to get the data from the EMIF. This could easily be 20-30 DSP cycles just for the word-by-word internal access time to the EMIF. The remainder could be improved with code optimization.
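
    Converting the 182ns READ spacing into clock cycles at the clock rates given in the question:

    $$ 182\,\text{ns} \times 266\,\text{MHz} \approx 48\ \text{DSP cycles}, \qquad 182\,\text{ns} \times 133\,\text{MHz} \approx 24\ \text{EMIF cycles} $$

    If 20-30 of those DSP cycles are internal access latency, roughly 18-28 cycles per element remain for the loop itself and any stalls.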

    Regards,
    RandyP
  • Randy,

    Is there any example C code for matrix multiplication that uses the dMAX to load the matrix into on-chip ping-pong buffers? Thanks.

    Josh
  • Josh,

    I doubt there is an existing implementation of exactly your project, but you can search the web, TI.com, and this forum for discussions or offerings of something similar.

    You can go to our Wiki and find the DSP Integration Workshop. It includes a discussion of using DMA to optimize processing. It does not use the dMAX or do matrix multiplication, but it shows you how to use DMA to transfer data for processing and to do that processing in portions. You can learn from those examples how to do a wide range of operations using the methods covered there.

    Regards,
    RandyP