EDMA performance on TMS320DM647

We have run some tests to establish DMA performance on our board. We are using a TMS320DM647 with DDR2, with a 25 MHz clock feeding PLL2 (the DDR2 PLL), and a 16M x 32 memory configuration. We expected the maximum bandwidth to be 25 MHz x 20 x 4 bytes = 2 GB/s; however, our test, which transfers 48 KB blocks from L2 to DDR2 using EDMA, achieves approximately 500 MB/s. A similar test using the same EDMA code to transfer from L2 to a host on the PCI bus achieved 80 MB/s, which seems reasonable.
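As a worked check of the arithmetic above, here is a minimal C sketch that reproduces the 2 GB/s figure and expresses a measured rate as a percentage of it. The function names are hypothetical, and the "x 20" factor is taken directly from the post without confirming how it splits between the PLL multiplier and the double data rate.

```c
#include <assert.h>
#include <stdint.h>

/* Theoretical peak in MB/s (1 MB = 10^6 bytes) from the figures in the post.
 * rate_multiplier is the post's "x 20" factor applied to the input clock. */
static uint32_t ddr2_peak_mb_per_s(uint32_t ref_clk_mhz,
                                   uint32_t rate_multiplier,
                                   uint32_t bus_bytes)
{
    return ref_clk_mhz * rate_multiplier * bus_bytes;
}

/* Measured throughput as a whole-number percentage of the peak (rounds down). */
static uint32_t percent_of_peak(uint32_t measured_mb_per_s,
                                uint32_t peak_mb_per_s)
{
    assert(peak_mb_per_s > 0u);
    return (measured_mb_per_s * 100u) / peak_mb_per_s;
}
```

Calling `ddr2_peak_mb_per_s(25u, 20u, 4u)` gives the 2000 MB/s used as the reference in this thread; `percent_of_peak()` then shows how far a measured figure falls short of it. This peak ignores refresh, bank activation and read/write turnaround overhead, so it is an upper bound rather than an achievable rate.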

Are there factors we missed in calculating the theoretical maximum for DDR2? Do you have tips for optimizing setup of the DMA to achieve good performance? Example code?

Here's the DDR2 setup code - called like this: Set_DDR2( dsp, 250000000 );

/****************************************************************************
 *
 * NAME
 *      Set_DDR2
 *
 * PURPOSE:
 *      Configure DDR2 to run at the specified frequency on a 32-bit bus.
 *
 * USAGE
 *      This routine can be called as:
 *
 *      Set_DDR2(dsp, ddr2_freq)
 *
 *      dsp       - (i) Device handle passed to the register access helpers.
 *      ddr2_freq - (i) Desired DDR2 memory clock frequency in Hz.
 *
 * RETURN VALUE
 *      NONE
 *
 * REFERENCE
 *
 ****************************************************************************/
static void Set_DDR2( dm647_t *dsp, int ddr2_freq )
{
    int iSdcfg,iCas;

    // ************************************************************************
    // SL DDR2 Memory timing info
    // Config 4 banks, page size=512word, (Clk>200 MHz CAS=4, Clk<=200 MHz CAS=3)
    //        32 bits data bus

    // Select CAS latency/SDCFG depending on whether the clock is above 200 MHz
    if (ddr2_freq>200000000)
    {
        // CAS 4 and config for CAS4, set bank=4 and Page=512
        iCas = 4;
        iSdcfg = 0x00570821;
    }
    else
    {
        // CAS 3 and config for CAS3, set bank=4 and Page=512
        iCas = 3;
        iSdcfg = 0x00570621;
    }
    // Setup for 250.0 MHz DDR (CAS4)
//#define DDR_SDCFG 0x00000821
#define DDR_SDRFC   0x000003DE
#define DDR_SDTIM1  0x24DB5B91
#define DDR_SDRIM2  0x0095C722
#define DDR_DMCCTL  0x50006405

    // ************************************************************************

    // Gives power to ddr2 just in case
    Set_PSC_State(dsp, DM647_PD0, DM647_LPSC_DDR2, DM647_PSC_ENABLE);

    //  *******************************************************
    // 1- DDR2 Module Initialization

    // ***** Set DDR2 drive strength to weak **********
    dm647_writereg( dsp, DM647_DDR_SDCFG_ADR, (iSdcfg | DM647_BOOT_UNLOCK));  //Unlock upper section
    dm647_writereg( dsp, DM647_DDR_SDCFG_ADR, iSdcfg );  //lock again

    // ***** Assert Reset for DDR2 Interface **********
    dm647_writereg( dsp, DM647_DDR_DMCCTL_ADR, 0x50006420 | ( iCas + 1));

    dm647_writereg( dsp, DM647_DDR_SDCFG_ADR, (iSdcfg | DM647_TIMUNLOCK));

    // Refresh Rate - Freq (Hz) * 3.96e-6 (sec) for Industrial Temp
    dm647_writereg( dsp, DM647_DDR_SDRFC_ADR, DDR_SDRFC);

    // set SDTIM registers timing from memory specs
    dm647_writereg( dsp, DM647_DDR_SDTIM1_ADR, DDR_SDTIM1);
    dm647_writereg( dsp, DM647_DDR_SDTIM2_ADR, DDR_SDRIM2);

    // Lock DDR Bank timing
    dm647_writereg( dsp, DM647_DDR_SDCFG_ADR, iSdcfg);

    // ***** Release Reset for DDR2 Interface **********
    // ReadLatency = CAS +1 with default bits
    dm647_writereg( dsp, DM647_DDR_DMCCTL_ADR, 0x50006400 | ( iCas + 1));


    // ***** Set DDR2 Controller Priority **********
    dm647_writereg( dsp, DM647_DDR_BPRIO_ADR, 0x00000080 );  //To avoid starvation, raise oldest cmd priority after 128 transfers

    Wait_Soft( 1500 );
}
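
One detail worth noting in the routine above: DDR_SDRFC is fixed at 0x3DE, which is the value the refresh comment gives for 250 MHz (250e6 x 3.96e-6 = 990), even though the function accepts other frequencies. A minimal sketch of deriving the count from the requested frequency follows. The helper and macro names are hypothetical, only the 3.96 us interval comes from the code comment, and any other fields in the SDRFC register are not handled here. SDTIM1 and SDTIM2 are also frequency dependent and are not covered.

```c
#include <assert.h>
#include <stdint.h>

/* Refresh interval from the comment in Set_DDR2: 3.96 us (industrial temp). */
#define REFRESH_INTERVAL_NS 3960u

/* Refresh count = clock frequency (Hz) x 3.96e-6 s, computed in 64-bit
 * integer math so the result is not affected by floating-point rounding. */
static uint32_t ddr2_refresh_count(uint32_t ddr2_freq_hz)
{
    assert(ddr2_freq_hz > 0u);
    return (uint32_t)(((uint64_t)ddr2_freq_hz * REFRESH_INTERVAL_NS)
                      / 1000000000u);
}
```

For 250000000 Hz this returns 990 (0x3DE), matching the hard-coded value; for 200000000 Hz it returns 792.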

 

Thanks,

Geoff

  • You will find a list of helpful Application Notes in the DM647 Product Folder. You can reach any Product Folder by doing a Part Number search in the upper-right corner of www.ti.com and then, in some cases, selecting the correct device from a short list. In the Product Folder, click on Technical Documents, and you will be moved down to the list of technical documents for the device, usually starting with the datasheet followed by a list of Application Notes.

    In particular for the DM647 and your questions, you will find there the Application Note titled TMS320DM648/7 SoC Architecture and Throughput Overview. It has the information you are looking for about throughput to different components of the DM647 and what affects that throughput.

     

    If this answers your question, please click "Verify Answer" on this post; if not, please reply with more information to help us answer your question.

  • Thanks very much for the reference to the app note, it was very helpful. However, we are still seeing less than half the performance we'd expect, even after reading the app note. Is the test code that was used to generate the quoted throughput results available? Our problem could be either our test code or our hardware.

  • Making repeated calls to DAT_copy (see below), the best we can achieve for L2 to DDR2 is about 821 MB/s, with BURST set to 32 (32 ACNT, 1 KB BCNT). The application note leads us to expect 2 GB/s (fig 11, page 17, Throughput of DMA for L2, DDR access). Increasing BURST to 64 or 128 makes the throughput plateau at 780 MB/s.

    #define BURST 32
    Uint32 DAT_copy(void *src, void *dst, Uint16 byteCnt ) {
        Uint32 chNum = 0;
        Uint32 tccNum = 0;

        chNum = _getFreeChannel(&tccNum);

        EDMA3_DRV_setTransferParams(DAT_EDMA3LLD_hEdma, chNum, BURST, byteCnt / BURST, 1, 0,
                EDMA3_DRV_SYNC_AB);
        EDMA3_DRV_setDestParams(DAT_EDMA3LLD_hEdma, chNum, (unsigned int)dst,
                EDMA3_DRV_ADDR_MODE_INCR,(EDMA3_DRV_FifoWidth)0);
        EDMA3_DRV_setSrcParams(DAT_EDMA3LLD_hEdma, chNum, (unsigned int)src,
                EDMA3_DRV_ADDR_MODE_INCR,(EDMA3_DRV_FifoWidth)0);
        EDMA3_DRV_setSrcIndex(DAT_EDMA3LLD_hEdma, chNum, BURST, 0);
        EDMA3_DRV_setDestIndex(DAT_EDMA3LLD_hEdma, chNum, BURST, 0);
        return _setupTransferOptions(chNum, tccNum);
    }

    What do we need to change to achieve your published bandwidth numbers?

     

    Geoff

  • I wish I had the source code for the benchmarks to give you, but I do not. I have tried to get it, without success.

    If you were to look at the DDR bus during your benchmarking, you would see large gaps, probably showing DDR activity about 40% of the time and no DDR activity the other 60% (if my math is right, ~800/2000 = .40).

    You are simultaneously benchmarking both the EDMA3 performance of a QDMA transfer and the DSP performance executing a series of EDMA3_DRV_* function calls. These EDMA3_DRV_* function calls are very convenient and easy to use, but they are not as efficient as using EDMA3_DRV_setParam() to write all the values from a struct in one function call; doing that will give you better performance if what you want to do is use QDMA.
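
    To illustrate the "all values from a struct" idea, here is a minimal sketch that fills one PaRAM set with the same parameters DAT_copy programs through five separate calls. The struct, macro and function names are hypothetical stand-ins, not the LLD's own types, and the field packing and OPT bit positions reflect my understanding of the EDMA3 PaRAM layout; they should be checked against the EDMA3 controller guide before use. The driver call that writes the struct is not shown because its signature does not appear in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of one 32-byte EDMA3 PaRAM set (eight 32-bit words). */
typedef struct {
    uint32_t opt;           /* transfer options                           */
    uint32_t src;           /* source address                             */
    uint32_t a_b_cnt;       /* BCNT in bits 31:16, ACNT in bits 15:0      */
    uint32_t dst;           /* destination address                        */
    uint32_t src_dst_bidx;  /* DSTBIDX in bits 31:16, SRCBIDX in 15:0     */
    uint32_t link_bcntrld;  /* BCNTRLD in bits 31:16, LINK in 15:0        */
    uint32_t src_dst_cidx;  /* DSTCIDX in bits 31:16, SRCCIDX in 15:0     */
    uint32_t ccnt;          /* CCNT in bits 15:0                          */
} param_set_t;

/* Assumed OPT bit positions; verify against the device documentation. */
#define OPT_SYNCDIM_AB (1u << 2)                       /* AB-synchronized  */
#define OPT_TCINTEN    (1u << 20)                      /* completion flag  */
#define OPT_TCC(n)     (((uint32_t)(n) & 0x3Fu) << 12) /* completion code  */
#define LINK_NULL      0xFFFFu                         /* no linked set    */

/* Same transfer shape as DAT_copy: ACNT = burst size, BCNT = number of
 * bursts, CCNT = 1, both B indexes equal to ACNT, no BCNT reload. */
static param_set_t build_copy_param(uint32_t src, uint32_t dst,
                                    uint16_t acnt, uint16_t bcnt,
                                    uint32_t tcc)
{
    param_set_t p;
    assert(acnt > 0u && bcnt > 0u);
    p.opt          = OPT_SYNCDIM_AB | OPT_TCINTEN | OPT_TCC(tcc);
    p.src          = src;
    p.a_b_cnt      = ((uint32_t)bcnt << 16) | acnt;
    p.dst          = dst;
    p.src_dst_bidx = ((uint32_t)acnt << 16) | acnt;
    p.link_bcntrld = LINK_NULL;
    p.src_dst_cidx = 0u;
    p.ccnt         = 1u;
    return p;
}
```

    Building the whole set in memory first means the DSP pays for one driver call per transfer instead of five, which is the overhead the paragraph above is pointing at.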

    And the real question is, what do you want to do here? Do you want to do QDMA copies from randomly different places to randomly different places? Do you want to transfer a certain amount of data in a certain amount of time after some event occurs, or otherwise within the course of your application? Or do you want to duplicate the app note?

    If you are just looking to duplicate the app note, then you should program up a series of DMA transfers (not QDMA) using a bunch of DMA channels and PARAMs, and make all of them really long, like the size of all available L2 space. Then start the transfers using a single write to ESR to trigger all of them at the same time, polling IPR for the highest numbered channel to finish since it should be the last one in the queue.
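
    The trigger-and-poll step described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the ESR and IPR registers are passed in as pointers so their addresses are left to the caller, only channels 0 to 31 are handled, and a poll limit is added so the loop cannot hang. Programming the PaRAM sets for each channel is assumed to have been done beforehand.

```c
#include <assert.h>
#include <stdint.h>

/* Bit mask covering channels first_ch .. first_ch + count - 1 (0 to 31). */
static uint32_t channel_mask(unsigned first_ch, unsigned count)
{
    uint32_t span;
    assert(count >= 1u && first_ch + count <= 32u);
    span = (count == 32u) ? 0xFFFFFFFFu : ((1u << count) - 1u);
    return span << first_ch;
}

/* Start every channel in the mask with a single ESR write, then poll IPR
 * until the bit for last_tcc is set.
 * Returns 1 on completion, 0 if max_polls was exhausted first. */
static int trigger_and_wait(volatile uint32_t *esr, volatile uint32_t *ipr,
                            uint32_t mask, unsigned last_tcc,
                            uint32_t max_polls)
{
    assert(last_tcc < 32u);
    *esr = mask;                                 /* one write starts all  */
    while (max_polls-- > 0u) {
        if ((*ipr & (1u << last_tcc)) != 0u) {
            return 1;                            /* last transfer is done */
        }
    }
    return 0;                                    /* gave up waiting       */
}
```

    For a benchmark, the cycle counter would be read immediately before the ESR write and again when the function returns 1. The pending bit typically has to be cleared before the next run, and channels above 31 would need the corresponding high registers; both are left out of this sketch.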