Is my access time to SDRAM through dMAX reasonable?

Other Parts Discussed in Thread: TMS320C6726B

In the following thread, Randy suggested that I use DMA to access SDRAM in order to improve the access time:

https://e2e.ti.com/support/dsp/tms320c6000_high_performance_dsps/f/115/p/413541/1469170

 

I have implemented this and measured the transfer time.

For loading 6912 32-bit words from a 16-bit SDRAM (7 ns cycle time) on the EMIF bus, ignoring all overhead, the transfer time would be

6912 words x 7 ns x 2 (a 2-beat burst is needed to fetch each 32-bit word) = 96.8 us

However, the maximum quantum transfer size limit (QTSL) in a dMAX event entry is 16 elements.

Quantum transfer up to 16 elements: I measured 480 us.  <-- this is about 3 times faster than CPU access, but it is still much longer than the 96.8 us calculated above.

Quantum transfer up to 8 elements: I measured 840 us.

Quantum transfer up to 4 elements: I measured 1.44 ms.

Quantum transfer up to 1 element: I measured 5.24 ms.
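As a rough cross-check of the numbers above, here is a minimal C sketch (not part of the original post; all function names are hypothetical) that reproduces the ideal-time estimate and converts each measurement into an average time per 32-bit word and a number of quantum transfers. It assumes that one quantum transfer moves at most QTSL elements, so the number of quanta is a simple ceiling division.

```c
#include <assert.h>
#include <stdio.h>

/* Ideal transfer time in ns, ignoring all overhead
 * (SDRAM refresh, row activation, dMAX arbitration). */
unsigned long ideal_transfer_ns(unsigned long words, unsigned long beat_ns,
                                unsigned long beats_per_word)
{
    return words * beat_ns * beats_per_word;
}

/* Number of quantum transfers needed to move `words` elements,
 * assuming each quantum moves at most `qtsl_elems` elements. */
unsigned long quantum_count(unsigned long words, unsigned long qtsl_elems)
{
    assert(qtsl_elems > 0);
    return (words + qtsl_elems - 1) / qtsl_elems;
}

/* Average time per 32-bit word, in ns, for a measured total in us. */
double ns_per_word(double measured_us, unsigned long words)
{
    assert(words > 0);
    return measured_us * 1000.0 / (double)words;
}

/* Print one row per QTSL setting, using the measurements quoted above. */
void print_dmax_summary(void)
{
    static const unsigned long qtsl[] = { 16, 8, 4, 1 };
    static const double measured_us[] = { 480.0, 840.0, 1440.0, 5240.0 };
    const unsigned long words = 6912;
    const unsigned long ideal_ns = ideal_transfer_ns(words, 7, 2);
    size_t i;

    printf("ideal: %lu ns\n", ideal_ns);
    for (i = 0; i < sizeof qtsl / sizeof qtsl[0]; i++) {
        printf("QTSL %2lu: %4lu quanta, %6.1f ns/word, %.1fx ideal\n",
               qtsl[i], quantum_count(words, qtsl[i]),
               ns_per_word(measured_us[i], words),
               measured_us[i] * 1000.0 / (double)ideal_ns);
    }
}
```

Calling `print_dmax_summary()` from `main` prints the table. With the figures above, the 16-element case works out to roughly 69 ns per word, about five times the 14 ns per word assumed in the ideal estimate.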

I would like to know whether there is anything else that can be tuned to further reduce the access time. Thanks.

  • Added more info:
    The DSP is a TMS320C6726B and the SDRAM is a Micron SDR SDRAM, MT48LC4M16A2 – 1 Meg x 16 x 4 banks, cycle time 7 ns.
    SYSCLK1 is running at 266 MHz and SYSCLK3 for the EMIF is 133 MHz.
    All program code and data are located in DSP on-chip memory, except for one array of 6912 words.

    Related dMAX settings:
    DMAX1_LOMAX_EVENT_TBL_01 = 0xEE463303;
    // QTSL=11xxXXXXXXXb, quantum transfer may not move more than sixteen elements
    // SYNC=xx1xXXXXXXXb, The whole transfer is completed after receiving one synchronization
    // TCC=X1110XXXXXXb, Transfer Complete Code=14 for DTCR1 bit6
    // ATCINT=XX0xxxXXXXXb, CPU gets notified after completion of each frame disabled
    // TCINT=XXx1xxXXXXXb, Transfer Complete Interrupt enabled; to disable it with all other fields unchanged, clear bit 22 (0xEE063303).
    // RLOAD=XXxxx0XXXXXb, No reload of the active element counter, active SRC, and DST addresses when the transfer is completed
    // CC =XXX01xxXXXXb, COUNT2 =7-bits, COUNT1 = 8 bits, COUNT0 = 16 bits
    // ESIZE=XXXxx10XXXXb, 10b 32-bit element size; 01b 16-bit
    // PTE=XXXX33XXh, Pointer to Transfer Entry 1: 33h x 4 = CCh
    // ETYPE=XXXXXXxxx00011b. Event type: General purpose data transfer

    DMAX1_LOMAX_TRANSFER_ENTRY_1_ASRCADDR = (unsigned int) &abcd[0][0];
    DMAX1_LOMAX_TRANSFER_ENTRY_1_ADSTADDR = (unsigned int) &wxyz[0][0];

    DMAX1_LOMAX_TRANSFER_ENTRY_1_ACOUNT2 = (char) 0;
    DMAX1_LOMAX_TRANSFER_ENTRY_1_ACOUNT1 = (char) 0;
    DMAX1_LOMAX_TRANSFER_ENTRY_1_ACOUNT0 = (unsigned short) 0x1B00; // 12*576=6912=1B00h

    DMAX1_LOMAX_TRANSFER_ENTRY_1_DSTINDX0 = (short) 1;
    DMAX1_LOMAX_TRANSFER_ENTRY_1_SRCINDX0 = (short) 1;

    DMAX1_LOMAX_TRANSFER_ENTRY_1_COUNT2 = (char) 0;
    DMAX1_LOMAX_TRANSFER_ENTRY_1_COUNT1 = (char) 0;
    DMAX1_LOMAX_TRANSFER_ENTRY_1_COUNT0 = (unsigned short) 0x1B00; // 12*576=6912=1B00h
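To make the field comments above easier to check, here is a minimal C sketch that splits the event entry word 0xEE463303 into its fields. The bit positions are an assumption inferred from the comments in this post (QTSL in bits 31:30, SYNC in bit 29, TCC in 27:24, ATCINT in 23, TCINT in 22, RLOAD in 20, CC in 19:18, ESIZE in 17:16, PTE in 15:8, ETYPE in 4:0) and should be confirmed against the dMAX reference guide. The type and function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Fields of a general-purpose dMAX event entry.
 * Bit positions are assumed from the comments in the post. */
typedef struct {
    unsigned qtsl, sync, tcc, atcint, tcint, rload, cc, esize, pte, etype;
} dmax_event_fields;

/* Extract bits hi..lo (inclusive) from a 32-bit word; widths here are <= 8. */
uint32_t dmax_bits(uint32_t word, unsigned hi, unsigned lo)
{
    return (word >> lo) & ((UINT32_C(1) << (hi - lo + 1u)) - 1u);
}

dmax_event_fields dmax_decode_event(uint32_t entry)
{
    dmax_event_fields f;
    f.qtsl   = dmax_bits(entry, 31, 30);
    f.sync   = dmax_bits(entry, 29, 29);
    f.tcc    = dmax_bits(entry, 27, 24);
    f.atcint = dmax_bits(entry, 23, 23);
    f.tcint  = dmax_bits(entry, 22, 22);
    f.rload  = dmax_bits(entry, 20, 20);
    f.cc     = dmax_bits(entry, 19, 18);
    f.esize  = dmax_bits(entry, 17, 16);
    f.pte    = dmax_bits(entry, 15, 8);
    f.etype  = dmax_bits(entry, 4, 0);
    return f;
}

void dmax_print_event(uint32_t entry)
{
    dmax_event_fields f = dmax_decode_event(entry);
    printf("QTSL=%u SYNC=%u TCC=%u ATCINT=%u TCINT=%u RLOAD=%u "
           "CC=%u ESIZE=%u PTE=0x%02X (offset 0x%02X) ETYPE=%u\n",
           f.qtsl, f.sync, f.tcc, f.atcint, f.tcint, f.rload,
           f.cc, f.esize, f.pte, f.pte * 4u, f.etype);
}
```

Under that assumed layout, `dmax_decode_event(0xEE463303)` gives QTSL=3, SYNC=1, TCC=14, TCINT=1, CC=1, ESIZE=2, PTE=0x33 and ETYPE=3, which agrees with the comments.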
  • Hi,

    Thanks for your post.

    The above numbers look reasonable. Still, you can compare them against the SDRAM benchmark throughput numbers. The dMAX module performance is presented as the number of dMAX clocks required to complete a transfer, and the data presented can be used to determine the loading of dMAX when performing 1DN transfers.

    Basically, throughput to the SDRAM interface depends on the SDRAM settings and on the clock ratio between dMAX and the external memory interface (EMIF).

    Please refer to Case 4 in Section 4.9 of the C672x dMAX reference guide below, and see Table 4-21 (32-bit EMIF) and Table 4-22 (16-bit EMIF):

    http://www.ti.com/lit/ug/spru795d/spru795d.pdf

    Case 4 in the above guide describes a burst of sequential data being moved between sequential locations in the external memory.

    General tips for best performance:

    1. To get the highest throughput, use large QTSL values and maximize COUNT0.

    2. Small QTSL values work better if low latency is required.

    3. To achieve maximum performance, use burst transfers wherever possible. In general, burst-type transfers (where INDEX0 is equal to one) have the maximum throughput.
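As an illustration of tip 1, here is a minimal C sketch for changing only the QTSL field of an existing event entry word, which makes it easy to compare settings. It assumes QTSL occupies bits 31:30 and is encoded as 00b = 1, 01b = 4, 10b = 8, 11b = 16 elements. Only the 11b = 16 case is stated in this thread, so please verify the other codes against the dMAX reference guide. The function names are hypothetical.

```c
#include <stdint.h>

/* Map a maximum quantum size in elements to the 2-bit QTSL code.
 * Assumed encoding: 1 -> 00b, 4 -> 01b, 8 -> 10b, 16 -> 11b.
 * Returns -1 for unsupported sizes. */
int dmax_qtsl_code(unsigned max_elems)
{
    switch (max_elems) {
    case 1:  return 0;
    case 4:  return 1;
    case 8:  return 2;
    case 16: return 3;
    default: return -1;
    }
}

/* Return `entry` with only the QTSL field (assumed bits 31:30) replaced.
 * The entry is returned unchanged if the size is not supported. */
uint32_t dmax_with_qtsl(uint32_t entry, unsigned max_elems)
{
    int code = dmax_qtsl_code(max_elems);
    if (code < 0)
        return entry;
    return (entry & ~(UINT32_C(3) << 30)) | ((uint32_t)code << 30);
}
```

For example, `dmax_with_qtsl(0xEE463303, 8)` yields 0xAE463303 under the assumed encoding, leaving every other field of the entry as it was.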

    Thanks & regards,

    Sivaraj K
