
Using uDMA for getting real time SSI data

Other Parts Discussed in Thread: TM4C129ENCPDT

I am currently working on a project based on the TM4C129ENCPDT where I need to receive and process serial data from an SSI peripheral in real time. To do so, I am using the uDMA controller configured in ping-pong mode to move data from the SSI data register to a suitable buffer in internal RAM.

The ping-pong setup should ensure that data is continuously copied from the SSI with no data loss.

The SSI needs to run at a clock frequency of 16 MHz and my core clock frequency is set to 96 MHz. With an SSI receive FIFO depth of eight 16-bit units, this means that an empty receive FIFO will be filled in 16*8*(1/16000000 Hz) = 8*10^-6 s = 8 µs, or 768 core clock cycles.
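
For reference, the receive channel is configured roughly along the lines below (just a sketch assuming TivaWare driverlib with SSI0 on its default uDMA channel 10; buffer names and sizes are placeholders rather than my actual code):

     // Sketch of the SSI0 RX ping-pong setup (TivaWare driverlib). The uDMA
     // ISR (not shown) reloads whichever half just completed.
     #include <stdint.h>
     #include "inc/hw_memmap.h"
     #include "inc/hw_ssi.h"
     #include "driverlib/sysctl.h"
     #include "driverlib/ssi.h"
     #include "driverlib/udma.h"

     #define BUF_SIZE 256                               // placeholder size

     static uint8_t g_ControlTable[1024] __attribute__((aligned(1024)));
     static uint16_t g_PingBuf[BUF_SIZE];
     static uint16_t g_PongBuf[BUF_SIZE];

     void ConfigureSsiRxDma(void)
     {
         SysCtlPeripheralEnable(SYSCTL_PERIPH_UDMA);
         uDMAEnable();
         uDMAControlBaseSet(g_ControlTable);
         uDMAChannelAssign(UDMA_CH10_SSI0RX);

         // 16-bit items, fixed source (SSI data register), incrementing
         // destination, burst of 4 to match the half-full RX FIFO trigger.
         uDMAChannelControlSet(UDMA_CHANNEL_SSI0RX | UDMA_PRI_SELECT,
                               UDMA_SIZE_16 | UDMA_SRC_INC_NONE |
                               UDMA_DST_INC_16 | UDMA_ARB_4);
         uDMAChannelControlSet(UDMA_CHANNEL_SSI0RX | UDMA_ALT_SELECT,
                               UDMA_SIZE_16 | UDMA_SRC_INC_NONE |
                               UDMA_DST_INC_16 | UDMA_ARB_4);

         // Ping-pong: the primary structure fills the ping buffer, the
         // alternate structure fills the pong buffer.
         uDMAChannelTransferSet(UDMA_CHANNEL_SSI0RX | UDMA_PRI_SELECT,
                                UDMA_MODE_PINGPONG,
                                (void *)(SSI0_BASE + SSI_O_DR),
                                g_PingBuf, BUF_SIZE);
         uDMAChannelTransferSet(UDMA_CHANNEL_SSI0RX | UDMA_ALT_SELECT,
                                UDMA_MODE_PINGPONG,
                                (void *)(SSI0_BASE + SSI_O_DR),
                                g_PongBuf, BUF_SIZE);

         uDMAChannelEnable(UDMA_CHANNEL_SSI0RX);
         SSIDMAEnable(SSI0_BASE, SSI_DMA_RX);
     }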

What happens if another part of my code running in the processor core blocks RAM access for 768 core clock cycles? Based on actual testing, my best guess is that since uDMA access to RAM is always prioritised lower than processor core access to RAM, the uDMA will not be able to empty the SSI FIFO, and an SSI receive overrun (RXOR) will occur. Obviously, this receive overrun ruins my real-time capability.

It seems strange that such an overrun condition can arise when the presence of a ping-pong uDMA mode suggests that the uDMA could, or should, in fact be used for real-time purposes.

Does anyone have any enlightening thoughts on the matter? Am I understanding and using the architecture correctly? Could I do anything to let my uDMA outrank the processor core for RAM access?

  • Hello Simon,

    You are correct. The uDMA has a lower priority than the CPU and the LCD controller on the TM4C129 devices. A better way of handling this is to use the uDMA in burst-only mode, so that the DMA spends more time transferring data per control word fetch from the control table.
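
    In code this is just the burst attribute on the receive channel, something like the sketch below (TivaWare driverlib, assuming SSI0 RX; adjust the channel define for your instance):

         // Sketch: respond only to burst requests so that each control word
         // fetch from the control table moves a whole burst of items.
         // (Keep the transfer size a multiple of the burst size.)
         #include "driverlib/udma.h"

         void SetBurstOnly(void)
         {
             uDMAChannelAttributeEnable(UDMA_CHANNEL_SSI0RX, UDMA_ATTR_USEBURST);
         }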

    Another way to work around the arbitration issue is to use two banks of SRAM for the DMA alone. Since the banks are 4-way interleaved, the CPU can use the other two banks while the DMA fills one bank for ping and another bank for pong.

    Regards
    Amit
  • Hi Amit, thank you for getting back to me :-) I have verified that I am using the uDMA in burst-only mode, and I've been trying to figure out how to ensure that DMA data and CPU data are kept in separate SRAM banks, but I couldn't find any detailed information in the datasheet.

    I know that the SRAM memory map begins at address 0x2000.0000 and that the TM4C129ENCPDT has 256 kB of internal SRAM. Does this mean that I can assume the following memory map scheme?

    SRAM bank 1: 0x2000.0000 - 0x2000.FFFF (64kB)
    SRAM bank 2: 0x2001.0000 - 0x2001.FFFF (64kB)
    SRAM bank 3: 0x2002.0000 - 0x2002.FFFF (64kB)
    SRAM bank 4: 0x2003.0000 - 0x2003.FFFF (64kB)

    ALSO: I have an external SDRAM module connected to the TM4C129ENCPDT via the EPI interface. The SDRAM is currently clocked at 50% of the core clock, but if I set the EPI clock divider to 0, I am in fact able to run the SDRAM at the full core clock.
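
    For reference, the only change I make is the EPI clock divider, roughly as in the sketch below (TivaWare driverlib; a divider of 1 gives half the system clock, 0 gives the undivided system clock):

         // Sketch: run the EPI (and thus the SDRAM clock) at the full system
         // clock instead of half of it.
         #include "inc/hw_memmap.h"
         #include "driverlib/epi.h"

         void SetEpiFullSpeed(void)
         {
             EPIDividerSet(EPI0_BASE, 0);    // 0 = no division (96 MHz here)
         }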

    How is this even possible? My SDRAM module can handle up to 133 MHz, but the TM4C129ENCPDT datasheet specifies that external SDRAM modules can only be run at up to 50% of the clock. Do you know the reason behind this limitation and why I am able to bypass it?

    When I do run the external SDRAM at full core clock, my initial SSI receive overrun problem disappears (even though the SSI and uDMA only work on internal SRAM and thus should not be affected by the speed of the external SDRAM). Can you explain this?

  • Hello Simon,

    Did you check the clock on the CLK pin as 120 MHz? It may be possible for a few "lucky" devices, but by timing specification not every device is guaranteed to work at that frequency.

    As for the original issue: the SRAM mapping you describe is correct for the banks, and you must ensure that one of the banks is used exclusively by the uDMA.

    Regards
    Amit
  • Hi Amit,

    Yes, I used an analog oscilloscope to verify that the SDRAM clock was in fact running at the core clock frequency (96 MHz in my case).

    I'm using the GCC compiler and linker and I know that I can ensure that the uDMA variables are linked to a specific SRAM address using

         __attribute__ ((section(".udma_variables_sram_region"))) ,

    but I couldn't think of a way to keep other variables outside of this region except for specifically linking ALL other variables to another region using

         __attribute__ ((section(".other_variables_sram_region"))) .

    This seems rather drastic, and I'm not sure if it is the right way to do it. Do you have any suggestions?
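
    For completeness, the declarations currently look roughly like the sketch below (buffer names, sizes and the section name are placeholders; the section still has to be mapped to a fixed SRAM range in the linker script):

         // Sketch: pin the uDMA ping/pong buffers into a dedicated linker
         // section; everything else stays in the default .data/.bss sections.
         #include <stdint.h>

         #define BUF_SIZE 256

         __attribute__((section(".udma_variables_sram_region")))
         static uint16_t g_PingBuf[BUF_SIZE];

         __attribute__((section(".udma_variables_sram_region")))
         static uint16_t g_PongBuf[BUF_SIZE];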

    Regards,
    Simon

  • Hello Simon

    In the linker file you can reduce the size of the SRAM region so that the CPU code does not know that another bank is available. In the C code you can then use an address pointer for the DMA table.
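
    Something along the lines of the sketch below (only an illustration; the exact reserved range, region name and offsets depend on your linker file and buffer sizes): reduce the SRAM LENGTH in the linker file, for example from 256K to 192K, and then refer to the now-unused last 64KB from C through plain pointers.

         // Sketch: the linker file's SRAM region is shrunk (e.g. 256K -> 192K)
         // so nothing is ever placed in 0x2003.0000 - 0x2003.FFFF, and that
         // range is then used for the uDMA control table and buffers.
         #include <stdint.h>
         #include "driverlib/udma.h"

         #define DMA_RAM_BASE    0x20030000UL   /* hidden from the linker */

         /* Control table at the start of the reserved range (1024-byte aligned
            by construction), ping/pong buffers placed after it. */
         #define DMA_CTRL_TABLE  ((void *)DMA_RAM_BASE)
         #define DMA_PING_BUF    ((uint16_t *)(DMA_RAM_BASE + 0x400))
         #define DMA_PONG_BUF    ((uint16_t *)(DMA_RAM_BASE + 0x600))

         void UseReservedRam(void)
         {
             uDMAControlBaseSet(DMA_CTRL_TABLE);
             /* DMA_PING_BUF / DMA_PONG_BUF are then passed to
                uDMAChannelTransferSet() instead of linker-allocated arrays. */
         }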

    Regards
    Amit
  • Thank you Amit, that's a nice solution.

    In the end it turned out that the low-priority uDMA channel that we use to transfer data to external SDRAM was bottlenecking the high-priority uDMA channel that we use to get data from the SSI peripheral to internal SRAM.

    When we switched the low priority transfer from external SDRAM to internal SRAM, our timing problems disappeared, and the high priority transfer is able to run with no problems.
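
    For reference, "high" and "low" priority here just means the per-channel priority attribute, roughly as in the sketch below (TivaWare driverlib; the software channel for the bulk copy is an assumption for illustration):

         // Sketch: the SSI RX channel is marked high priority, the bulk
         // memory-to-memory copy (here assumed to run on the software channel)
         // is left at default (low) priority.
         #include "driverlib/udma.h"

         void SetChannelPriorities(void)
         {
             uDMAChannelAttributeEnable(UDMA_CHANNEL_SSI0RX,
                                        UDMA_ATTR_HIGH_PRIORITY);
             uDMAChannelAttributeDisable(UDMA_CHANNEL_SW,
                                         UDMA_ATTR_HIGH_PRIORITY);
         }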

    Regards
    Simon
  • Hello Simon,

    Did you change the high priority destination location as well?

    Regards
    Amit
  • Yes, I tried placing my ping and pong buffers in different SRAM banks, but it did not seem to make any difference, so I reverted that change.
  • Hello Simon

    That is strange. It should have worked, but the root cause you mention (the uDMA being held up by a low-priority transfer to the SDRAM) does make sense as well, since the uDMA core processes one transfer at a time.

    Regards
    Amit
  • Hi Amit,

    On second thought it may not be so strange after all. If you read the processor data sheet carefully, you'll notice that on page 610 it is mentioned that

    "The SRAM is implemented using four-way 32-bit wide interleaved SRAM banks (separate SRAM arrays)"

    I wasn't quite sure what the interleaving part meant, but after doing some research I ended up here. I think interleaving means that the memory map I suggested in a previous post is actually not correct, and that it should probably look more like this:

    SRAM bank 1 SRAM bank 2 SRAM bank 3 SRAM bank 4
    0x2000.0000 0x2000.0004 0x2000.0008 0x2000.000C
    0x2000.0010 0x2000.0014 0x2000.0018 0x2000.001C
    0x2000.0020 0x2000.0024 0x2000.0028 0x2000.002C
    0x2000.0030 0x2000.0034 ... ...

    This would explain why linking a ping buffer to an address in the 0x2000.0000 - 0x2000.FFFF range and a pong buffer to an address in the 0x2001.0000 - 0x2001.FFFF range did not solve my problem, as the ping and pong buffers would actually still be spread across multiple physical SRAM banks.
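
    If this interpretation is right, the physical bank a given word lands in is selected by address bits [3:2] rather than by the upper address bits, i.e. roughly as in the hypothetical helper below (only valid under my assumed interleaving):

         // Hypothetical helper: which physical SRAM bank (0..3) a 32-bit word
         // at 'addr' would live in, assuming word-level 4-way interleaving.
         #include <stdint.h>

         static inline uint32_t SramBankOfWord(uint32_t addr)
         {
             return (addr >> 2) & 0x3u;   // address bits [3:2] select the bank
         }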

    Regards
    Simon

  • Hello Simon,

    I ran a test on the SRAM memory banks and the accesses were in fact as per the address organization mentioned, unless I misread the data. Anyway, I have planned a few more tests to check the actual memory accesses with multiple initiators.

    Regards
    Amit