DMA (GPIO to memory) transfer speed too low on LX4F120H5QRF (TM4C1233H6PM)

I use DMA to read the state of the GPIOD port into memory. The DMA request is generated by Timer 0 (32-bit). Everything works until I raise the timer's trigger frequency to 20 MHz (the microcontroller runs at 80 MHz) with the following call:

TimerLoadSet(TIMER0_BASE, TIMER_A, 3);   /* period = 3 + 1 = 4 system clocks -> 80 MHz / 4 = 20 MHz */

After that, it looks like I am still sampling the GPIOD port at 10 MHz, not 20 MHz. I even tried accessing GPIOD through the AHB instead of the APB, but it makes no difference.
What is the maximum rate at which a GPIO port can be read through DMA?
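
A minimal sketch of this kind of setup, for reference (TivaWare driverlib assumed; the buffer size, data width, and channel details here are illustrative, not the exact code):

    #include <stdint.h>
    #include <stdbool.h>
    #include "inc/hw_memmap.h"
    #include "inc/hw_gpio.h"
    #include "driverlib/sysctl.h"
    #include "driverlib/timer.h"
    #include "driverlib/udma.h"

    #define SAMPLES 1000

    static uint32_t g_samples[SAMPLES];

    /* uDMA channel control table - the controller requires 1024-byte alignment. */
    static uint8_t g_ctl_table[1024] __attribute__((aligned(1024)));

    void capture_init(void)
    {
        SysCtlPeripheralEnable(SYSCTL_PERIPH_UDMA);
        SysCtlPeripheralEnable(SYSCTL_PERIPH_TIMER0);
        SysCtlPeripheralEnable(SYSCTL_PERIPH_GPIOD);

        uDMAEnable();
        uDMAControlBaseSet(g_ctl_table);

        /* One 32-bit read of GPIODATA per trigger (offset 0x3FC reads all
           eight pins), fixed source address, incrementing destination,
           one element per request (UDMA_ARB_1). */
        uDMAChannelControlSet(UDMA_CHANNEL_TMR0A | UDMA_PRI_SELECT,
                              UDMA_SIZE_32 | UDMA_SRC_INC_NONE |
                              UDMA_DST_INC_32 | UDMA_ARB_1);
        uDMAChannelTransferSet(UDMA_CHANNEL_TMR0A | UDMA_PRI_SELECT,
                               UDMA_MODE_BASIC,
                               (void *)(GPIO_PORTD_BASE + (GPIO_O_DATA + 0x3FC)),
                               g_samples, SAMPLES);
        uDMAChannelEnable(UDMA_CHANNEL_TMR0A);

        /* Timer0A timeout drives the uDMA request for this channel. */
        TimerConfigure(TIMER0_BASE, TIMER_CFG_PERIODIC);
        TimerLoadSet(TIMER0_BASE, TIMER_A, 7);   /* 80 MHz / (7+1) = 10 MHz */
        TimerEnable(TIMER0_BASE, TIMER_A);
    }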

  • I find your post clever & well constructed.    That said - would it not prove of value to first determine the maximum "GPIO Port sample rate" achieved "WITHOUT" the uDMA?   In this way insight may be gained as to "How & even If" the uDMA is "slowing your Port Read - thus reducing the overall transfer speed."

    I'd not suspect the uDMA to be equipped or able to "speed" a simple port read.   (a port read is what it is - although here, your use of the AHB should help.)    The ability of the uDMA to perform its transfer task "in the background" is where (I believe) its maximum value lies.    Further - use of the uDMA should maximize your "transfer rate" but I doubt it can (at all) increase the speed of your GPIO Port read...
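
    One way to take that baseline measurement (a sketch only - same port and TivaWare headers assumed as in the listing above; time the loop with a spare timer or a scope on a toggled pin):

        #include <stdint.h>
        #include "inc/hw_types.h"     /* HWREG() */
        #include "inc/hw_memmap.h"
        #include "inc/hw_gpio.h"

        #define SAMPLES 1000
        static uint32_t g_cpu_samples[SAMPLES];

        /* CPU-only capture: how fast can the core alone read the port? */
        void cpu_capture(void)
        {
            for (int i = 0; i < SAMPLES; i++)
            {
                g_cpu_samples[i] = HWREG(GPIO_PORTD_BASE + (GPIO_O_DATA + 0x3FC));
            }
        }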

  • Hello Tankist,

    When running the DMA, it takes at least 6 clock cycles to load and execute the first DMA instruction from the DMA transfer table. Running a timer to generate triggers will just flood the DMA with triggers that are not going to be processed. A clear statement of the end goal would be useful.

    Regards
    Amit
  • Amit Ashara said:
    Running a timer to generate triggers will just flood the DMA with triggers

    Hi Amit,

    That's valuable info - thanks for that.   While this post is open - might it make sense to describe the "proper" method to maximize transfer speed?

    It would appear that "subsequent" GPIO Port reads (after the first one) should await some "signal" from the uDMA - to avoid the flooding you note.   Is this something you'd care to briefly detail?   (and - would such a method prove "universal" in that "any port" (i.e. not just GPIO) may adopt it - when seeking an optimal transfer rate?)

  • Hello cb1

    The uDMA can queue up to one additional trigger while processing an existing trigger (as given in the datasheet). This depends on the CPU not starving the uDMA of SRAM bandwidth: if the CPU accesses the SRAM heavily, the uDMA's accesses to the transfer table and descriptors are delayed. It is tough to quantify the uDMA's delay, as it is specific to the application.

    Regards
    Amit
  • Hi Amit,

    Ok - thanks for that. Kindly note that neither poster nor I sought to "quantify" the delay - instead we're seeking the means to "optimize" the data transfer.

    I'm inexpert in TM4C uDMA use/management - yet I know other MCUs to provide "signals" which serve to automate a process - and point to an optimal transfer rate...
  • Hello cb1,

    I am not sure (in the absence of an observable signal) how it could be optimized. The closest we can get is to toggle a GPIO in a corresponding interrupt handler (if the condition is the same as the DMA trigger). As an example, in the case of a timer, it could be in output mode, toggling on match/timeout.

    Regards
    Amit
  • Hi Amit,

    I'm very much "out of my element" w/your device's µDMA - yet might the µDMA provide some interrupt or other signaling event upon completion of the "initial" transfer from GPIO to memory - so that the "next read" (of the GPIO port - in this post's case) could then proceed?

    Again pardon me - I don't (really) understand all of the mechanisms w/in your µDMA.
  • Hello cb1

    The uDMA does generate an interrupt upon completion of the full transfer, but not for every element. As an example, a transfer size of 64 elements with 4 per request will cause a completion interrupt at the end of the 64 elements, not after every 4 elements. The interrupt is routed to the peripheral requesting the transfer.

    Regards
    Amit
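
    In practice, catching that completion could look like the sketch below (assuming the timer-triggered capture from earlier; in BASIC mode the hardware disables the channel once the transfer size is exhausted, and the completion arrives on the triggering peripheral's vector):

        #include <stdbool.h>
        #include <stdint.h>
        #include "inc/hw_memmap.h"
        #include "driverlib/timer.h"
        #include "driverlib/udma.h"

        volatile bool g_buffer_done = false;

        /* Registered as the Timer0A interrupt handler. */
        void Timer0AIntHandler(void)
        {
            uint32_t status = TimerIntStatus(TIMER0_BASE, true);
            TimerIntClear(TIMER0_BASE, status);

            /* In BASIC mode the channel is disabled by hardware after the
               last element - that is the completion condition to test. */
            if (!uDMAChannelIsEnabled(UDMA_CHANNEL_TMR0A))
            {
                g_buffer_done = true;
                /* Re-arm with uDMAChannelTransferSet()/uDMAChannelEnable()
                   here if continuous capture is needed. */
            }
        }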
  • Hi Amit,

    Thank you - appreciated.

    Is it correct then to believe that after the initial "read" of a GPIO Port - all subsequent Port reads (w/in the µDMA construct) are "automated" by the µDMA - and not the responsibility of the "timer" as the poster suggested?
  • Hello cb1,

    The transfers of the elements within a burst are automated. The total number of requests needed to get through the transfer size depends on the DMA requests. In this example, the 4 elements are transferred automatically when a burst request is initiated; however, 16 requests (64 divided by 4) are still required. During the time frame of those 4 elements (load of the control word, transfer of 4 units, and write-back of the control word), at most one more burst request can be queued from the peripheral on that channel.

    Regards
    Amit
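
    For concreteness, Amit's 64-with-4-per-request example maps onto the channel control word like this (sketch only, reusing the illustrative names from the first listing; UDMA_ARB_4 sets the burst size, and whether a given peripheral actually issues burst requests is part-specific):

        /* 64 elements total, 4 per burst request: the hardware moves 4
           elements per request, so 64 / 4 = 16 requests reach completion. */
        uDMAChannelControlSet(UDMA_CHANNEL_TMR0A | UDMA_PRI_SELECT,
                              UDMA_SIZE_32 | UDMA_SRC_INC_NONE |
                              UDMA_DST_INC_32 | UDMA_ARB_4);
        uDMAChannelTransferSet(UDMA_CHANNEL_TMR0A | UDMA_PRI_SELECT,
                               UDMA_MODE_BASIC,
                               (void *)(GPIO_PORTD_BASE + (GPIO_O_DATA + 0x3FC)),
                               g_samples, 64);   /* one interrupt after all 64 */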
  • Thank you - appreciated - hope that this "back-forth" proves of value to o.p. and to others who may wander (someday) here.

    And (almost forgot) would poster's use of AHB bus speed his overall transfers?

  • Hello cb1,

    The DMA's bulk processing during a transaction is on the AHB. The only place where it makes a difference is the DMA reading from a peripheral on the AHB bus. If the element count per burst is 1 (which is fair to assume here), the impact of moving GPIO from the APB to the AHB bus may not be as large as expected. The user has to ascertain the right rate at which to trigger the DMA.

    Regards
    Amit
  • Thanks for your explanations, but I still don't have a solution. Let me enumerate the facts I have:

    1) The reason for opening this topic was to find the "narrowest" (speed-limiting) link in the GPIO-DMA-RAM chain.

    2) I need the highest possible sampling frequency for the GPIO scan. As I understand it, that is only achievable through DMA (interrupts take too much time).

    3) I don't use the DMA's burst mode (I could read the whole buffer in one request, but I need to be able to change the sampling frequency and must know the exact sampling frequency), so I transfer one word per timer event.

    4) Amit, you said the DMA takes 6 cycles at the start - so it doesn't take the same time for every transfer during the transaction?

  • Hello Tankist,

    It takes about 2 to 4 clock cycles per transfer, depending on the bus arbitration at the AHB bus matrix.

    Regards
    Amit
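
    Putting the two figures from this thread together gives a rough per-request budget for single-element transfers (a back-of-envelope only, assuming the control-word fetch is paid on every request):

        control-word load/execute : ~6 cycles
        element transfer          : ~2-4 cycles
        total per request         : ~8-10 cycles
        80 MHz / (8-10 cycles)    : roughly 8-10 MHz maximum trigger rate

    which is consistent with the 10 MHz plateau reported in the original post.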
  • I use an RTOS (FreeRTOS), and my CPU sleeps most of the time between system ticks (1 ms). With 10 MHz sampling, I fill a 1000-word buffer in 0.1 ms, so I think the CPU doesn't use the memory bus during that time.

    As I understand it, the theoretical maximum speed of the DMA is 2 system cycles per transfer. What is one system cycle with respect to the CPU clock (80 MHz)?

  • Hello Tankist

    The CPU and system clock are the same, so one system clock is the same as one CPU clock. Also, under an RTOS the CPU may not be asleep but in the idle task, which would still be accessing the SRAM. Do confirm that on the RTOS side, as the concept of sleep is often different at the SW and HW levels.
    In terms of the TM4C12x devices, sleep is a low-power state of the device which is entered when the CPU executes a WFI or WFE with a particular programming model and system settings at the register level.

    Regards
    Amit
  • My IDLE routine has a "wfi" instruction, and I have verified that it works (the CPU core sleeps during the idle function). I also configured SysCtlPeripheralClockGating(false) to let all the peripherals involved (timer, DMA, GPIO) keep running during the CPU sleep cycle. So I'm sure the CPU isn't holding the DMA back, but the problem still occurs.

    Any ideas?
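
    For reference, the idle/clock-gating arrangement described above reads roughly like this (a sketch assuming TivaWare and FreeRTOS built with configUSE_IDLE_HOOK = 1):

        #include <stdbool.h>
        #include "driverlib/sysctl.h"

        void clocking_init(void)
        {
            /* false: peripherals keep their run-mode clocks while the CPU
               sleeps, so the timer, uDMA and GPIO run on through WFI. */
            SysCtlPeripheralClockGating(false);
        }

        /* FreeRTOS idle hook: sleep the core until the next interrupt. */
        void vApplicationIdleHook(void)
        {
            __asm volatile ("wfi");
        }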

  • Hello Tankist,

    Note that you are already operating this feature of the device at its maximum; driving it at a higher rate is going to overload the hardware and cause problems. The simplest solution is to find the optimum load scenario for the timer trigger and the DMA.

    Regards
    Amit