Problems with DMA to/from L1 SRAM

I've got my algorithm successfully using memcpy to copy some data to L1D SRAM and then out again using the C674x core in the DM8168. I have this working with L2 SRAM as well.

When I try the same thing with DMA (ECPY), the DMA to L2 memory is fine, but when I try and DMA the data back out of L1 SRAM, I find it is all zeroes.

Why is the data all zeroes? I've tried using memcpy to copy data into L1D and use DMA to copy data out but I still end up with zeroes. It appears that the DMA is doing one or both of the following:

1) Failing to copy any data into L1 SRAM.

2) Reading zeroes when it tries to copy data _out_ of L1D SRAM.

Anyone know why this isn't working? Edit: the DMA itself is doing something but it appears to be copying data to/from the wrong memory address and as a result the data being copied is all zeroes.

Thanks,
Ralph

P.S.

I am using the DMA offset of 0x3000 0000 as required due to the difference in memory maps between the L3 interconnect and the C674x memory map.
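
For readers following along, the offset arithmetic described in this P.S. can be sketched as below. This is only an illustration: the helper name `dsp_to_l3_addr` is hypothetical and not part of ECPY or any TI package, and the only facts taken from the thread are the 0x3000 0000 offset and the region origins listed in the .map file.

```c
#include <stdint.h>

/* Offset between the C674x view and the L3 interconnect view of the DSP's
 * internal memory, as quoted in the post (0x3000 0000). */
#define DSP_TO_L3_OFFSET 0x30000000u

/* Hypothetical helper (not a TI API): convert a C674x-side address of
 * internal memory into the address an L3 master such as the EDMA would use. */
static inline uint32_t dsp_to_l3_addr(uint32_t dsp_addr)
{
    return dsp_addr + DSP_TO_L3_OFFSET;
}
```

With the origins from the .map file, the L1D SRAM base 0x10f00000 becomes 0x40f00000 and the IRAM base 0x10800000 becomes 0x40800000.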

  • I know this might sound like a stupid suggestion, but are you sure L1 is not configured as cache?
    At least on C66x, loads from the L1D address range always return 0 when the accessed range is configured as cache.

  • Hi Clemens, thanks for your reply. I've checked the .map file and it seems the L1D is correctly configured as it has length of 0x7000 (28kB) and says that all of it is used up with the ".my_l1d_sect" section:

    ******************************************************************************
                   TMS320C6x Linker Unix v7.3.1                    
    ******************************************************************************
    >> Linked Fri Nov 29 14:04:21 2013

    OUTPUT FILE NAME:   <bin/servercom.xe674>
    ENTRY POINT SYMBOL: "ti_sysbios_family_c64p_Hwi0"  address: 995bf000


    MEMORY CONFIGURATION

             name            origin    length      used     unused   attr    fill
    ----------------------  --------  ---------  --------  --------  ----  --------
      IRAM                  10800000   00020000  0000c000  00014000  RW X
      L1DSRAM               10f00000   00007000  00007000  00000000  RW  
      OCMC_0                40300000   00040000  00000000  00040000  RW X
      OCMC_1                40400000   00040000  00000000  00040000  RW X
      DDR3_HOST             80000000   16c00000  00000000  16c00000  RWIX
      DDRALGHEAP            98000000   01400000  01400000  00000000  RWIX
      DDR3                  99500000   00c00000  000bf654  00b409ac  RWIX
      DDR3_SR1              9a100000   00100000  00100000  00000000  RWIX
      DDR3_SR0              9f700000   00200000  00200000  00000000  RWIX
      DDR3_SR2              b3d00000   0bc00000  0bc00000  00000000  RWIX


    SEGMENT ALLOCATION MAP

    run origin  load origin   length   init length attrs members
    ----------  ----------- ---------- ----------- ----- -------
    10800000    10800000    0000c000   00000000    rw-
      10800000    10800000    0000c000   00000000    rw- .INT_HEAP
    10f00000    10f00000    00007000   00000000    rw-
      10f00000    10f00000    00007000   00000000    rw- .my_l1d_sect
    98000000    98000000    01400000   00000000    rw-
      98000000    98000000    01400000   00000000    rw- .EXTALG_HEAP
    99500000    99500000    00063900   00063900    r-x
      99500000    99500000    00063900   00063900    r-x .text

    If I search in this file for "l1dArray" which is my L1 SRAM array, I find this entry:

    10f00000   l1dArray

    which tells me it has been placed correctly at the start of L1D where the SRAM area is (I believe the cache area always follows the SRAM area at a higher address).

    Ralph

  • Hi Ralph,

    Did you write your own DMA driver or are you using the EDMA driver package from TI ?

    Best,

    Ashish

  • I'm using the ECPY API from Framework Components. It has these functions:

    http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/fc/3_20_00_22/exports/framework_components_3_20_00_22/docs/html/ecpy_8h.html

    ECPY itself uses EDMA3 apparently.

    Thanks,
    Ralph

  • Ralph,

    I will request someone from the framework components team to help you.

    Best,

    Ashish

  • Okay thanks. The only difference between my memory sections for L1 or L2 is the lack of the "X" attr flag for L1 in the ".map" file that is generated - maybe this could be the problem?:

    MEMORY CONFIGURATION

             name            origin    length      used     unused   attr    fill
    ----------------------  --------  ---------  --------  --------  ----  --------
      IRAM                  10800000   00020000  0000c000  00014000  RW X
      L1DSRAM               10f00000   00007000  00007000  00000000  RW 

    "IRAM" is of course the standard section name for L2D SRAM.

  • When you configure the DMA transfer, the source and destination addresses need to be global addresses. In your earlier post you mention something like:

    I am using the DMA offset of 0x3000 0000 as required due to the difference in memory maps between the L3 interconnect and the C674x memory map

    Can you tell me exactly how you are configuring the src and dst addresses?

    Note that the CPU can typically use either global or processor-local addresses interchangeably, but EDMA needs global addresses only. So you may need to convert the addresses that you obtain via '&' to global addresses by adding the offset.

    Murat

  • Okay, let's say I'm just copying data from my shared DDR memory (the source) to the L1D SRAM (the destination).

    In this case, my source address is the address of the buffer that is passed to my "IUNIVERSAL_Fxns" .process function that is in my codec source code. The arguments to my process function are:

    (IUNIVERSAL_Handle handle, XDM1_BufDesc* inBufs, XDM1_BufDesc* outBufs, XDM1_BufDesc* inOutBufs, IUNIVERSAL_InArgs* universalInArgs, IUNIVERSAL_OutArgs* universalOutArgs)

    The address of the source buffer that I give to ECPY is given by this:

    XDAS_Int8* source = inBufs->descs[0].buf;

    I have the ARM-side code set up to allocate this source buffer from the Shared Memory 2 region in the DM8168 memory map.

    As for the destination, I have this in my codec source file:

    #pragma DATA_SECTION(l1dArray, ".my_l1d_sect")
    XDAS_Int8 l1dArray[28*1024];
    #define l1dArrayDma (l1dArray+0x30000000)

    The section ".my_l1d_sect" is defined in my codec server's link.cmd file:

    SECTIONS
    {
            .my_l1d_sect  > L1DSRAM
    }

    Now, the address I actually give to ECPY is "l1dArrayDma". When the array is allocated in L2D SRAM and I pass this address to ECPY, the DMA works fine, but with the array in L1D it doesn't work.

    Is this the information that you wanted to know?

    Thanks,
    Ralph

  • Hi Ralph,

    When the data (l1dArray) is located in L1D, have you tried passing the address of the buffer directly, without the address mapping applied by your macro (l1dArrayDma)?

    It looks like for L2D SRAM addresses you are able to assign the global src/dst addresses and EDMA works fine, but for L1D you need to use the global physical address that EDMA expects, based on the memory configuration of your device. That may just be the same unaltered address the CPU uses, or it may require adding the offset.

    Have you tried the following utility from the ECPY module (from ecpy.h)?

    extern cregister volatile unsigned int DNUM;
    static inline void *restrict EDMA_ADDR_LOC_TO_GLOB(void *restrict loc_addr)
    {
       unsigned int tmp = (unsigned int)loc_addr;

       if((tmp & 0xFF000000) == 0)
       {
          return (void *)((1 << 28) | (DNUM << 24) | tmp);
       } else return loc_addr;
    }

    Best regards,

    Murat
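
To see what the quoted utility does to the addresses in this thread, here is a host-side model of the same logic. It is a sketch only: `loc_to_glob_model` is a hypothetical name, and the `dnum` parameter stands in for the `DNUM` control register so the code can run off-target.

```c
#include <stdint.h>

/* Host-side model of the local-to-global conversion quoted above.
 * dnum replaces the DNUM control register (an assumption made purely so
 * the logic can be exercised on a PC). */
static uint32_t loc_to_glob_model(uint32_t loc_addr, uint32_t dnum)
{
    /* Only addresses whose top byte is zero are treated as local. */
    if ((loc_addr & 0xFF000000u) == 0) {
        return (1u << 28) | (dnum << 24) | loc_addr;
    }
    /* Anything else is assumed to be global already and passes through. */
    return loc_addr;
}
```

Under this model, a local address such as 0x00f00000 with `dnum` 0 maps to 0x10f00000, while an address that already has a non-zero top byte, such as the 0x10f00000 reported for l1dArray in the .map file, is returned unchanged.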

  • Hi Murat,

    I tried what you suggested but it still copies zeroes. Also, I couldn't find the EDMA_ADDR_LOC_TO_GLOB function in ecpy.h.

    I finally found the answer though. Your comment made me think maybe there was a function I'd missed in ecpy.h. I had a look at the header and found a function "ECPY_setDestinationMemoryMode". I added this to my DMA function:

    if(dstIsL1)
    {
        ECPY_setDestinationMemoryMode(dmaHandle0, INTMEMORY0);
    }

    AND IT WORKED!!!!!!!

    Frankly, I'm disappointed that no one at TI knew this. It's especially annoying now that TI no longer support the DM8168 through FAEs.

    Anyway, I've fixed it on my own after an absolutely epic struggle. Your suggestions spurred me on though, so thanks for at least having a conversation with me in this thread as it kept me thinking of new things to try.

    Ralph
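
The post does not show how the `dstIsL1` flag is computed. One possible way, sketched here under assumptions, is a range check against the L1DSRAM origin and length from the .map file, accepting the address with or without the 0x3000 0000 DMA offset. The helper names are hypothetical and not part of ECPY.

```c
#include <stdbool.h>
#include <stdint.h>

#define L1DSRAM_ORIGIN   0x10f00000u  /* L1DSRAM origin from the .map file */
#define L1DSRAM_LENGTH   0x00007000u  /* L1DSRAM length from the .map file */
#define DSP_TO_L3_OFFSET 0x30000000u  /* DMA offset quoted in the thread */

/* True when addr lies in [origin, origin + length). */
static bool in_range(uint32_t addr, uint32_t origin, uint32_t length)
{
    return addr >= origin && (addr - origin) < length;
}

/* Hypothetical helper: does addr fall inside L1D SRAM, whether it is
 * expressed as a DSP-side address or as an offset DMA address? */
static bool addr_in_l1d_sram(uint32_t addr)
{
    return in_range(addr, L1DSRAM_ORIGIN, L1DSRAM_LENGTH) ||
           in_range(addr, L1DSRAM_ORIGIN + DSP_TO_L3_OFFSET, L1DSRAM_LENGTH);
}
```

The result could then be used as the condition guarding the `ECPY_setDestinationMemoryMode` call shown in the post.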

  • Hi Ralph,

    First, I am glad to hear you got it to work, and I am sorry it took so long to get you there. I am the original author of ECPY_setDestinationMemoryMode(), so I probably should have thought of suggesting it, but it has been a long while. It looks like you most likely encountered a silicon issue, which you were able to work around by using this specialized API. There is some information about this obscure issue, which has affected some silicon and various revisions in the past:

    http://ap-fpdsp-swapps.dal.design.ti.com/index.php/GEM_Cache_Coherence_Bugs

    The reason you may still want to pay attention to the issue you encountered is that it might come back. The silicon defect may manifest when the same EDMA TC queue is used for submitting transfers that write to both L2/L1D and DDR (via separate transfers). In your case, calling ECPY_setDestinationMemoryMode() raises the priority of the transfers submitted through this handle, potentially helping you avoid the condition that triggers the silicon issue.

    This API was introduced to give the programmer the ability to work around potential silicon lockup conditions by configuring distinct DMA handles to use distinct TC queues (based on the memory-mode argument and via configuration), with the programmer using the same handle for all transfers to the same memory region.

    Murat

  • Hi Murat, I'll bear in mind that this could raise its head again. I think the link that you posted was an internal TI one. The only errata for the DM8168 that is vaguely related was this one:

    Advisory 2.0.49 DMA Queue Priority for DSP SDMA
    Revisions Affected: 2.0, 1.1, 1.0
    Details: The device does not support EDMA queue priority feature for C674x SDMA transactions due to an error in the mapping of open-core protocol (OCP) sideband signals during conversion to VBUS signals.
    Workaround: There is no workaround for this issue. It is recommended not to use the EDMA queue priority feature.

    Unless of course it's a C674x-specific bug.

    Ralph