speed issue with DMA



Hello

I have an EK-TM4C123GLX LaunchPad and am trying to write a program for an LED driver application. The program does what is intended, but it is too slow. I’m a newbie with Tiva controllers, so please don’t ignore my questions even if they look dumb.

I am using CCS v6 and TivaWare 2.1.0.12573. Let me explain where I’m struggling. To be more precise, I attached a few scope captures and the whole project.

Pic. 1 shows the whole frame. It consists of 8 bursts, with changing timing between the bursts. But this frame is too long for my application: it is 10.7 ms, and I need to reduce it to about 1 ms. As the whole frame depends on the shortest duration between bursts, I must reduce this shortest duration.

Now let's look at the first burst (pic. 2). It lasts 6 us, including the delay before the latch is toggled. That is good enough, though this delay could be reduced.

1)      Question: how can I reduce this delay? It consists of detecting that the SSI is no longer busy and toggling the signal itself (see procedure toggle_latch). I also experimented with the interrupt from the SSI module, but that gave an even bigger delay and more latch signals than I need, so I dropped it.

Then let's look at the first and second bursts (pic. 3). The time between them is 51 us (time between latch signals). I want to reduce it to 10 us, which will give me the correct time for the whole frame. So I must adjust the Timer 0 delay.

After adjusting the Timer 0 delay, the frame length is now acceptable (pic. 4). The problem is that the adjustment affects the whole frame length but not the shortest delays (pic. 5). It looks like the controller can't service Timer0IntHandler in such a short time. Only the 4th burst has the correct delay.

2)      Question: how can I optimize the ISR to run faster? I guess the DMA restart function is time consuming (618 cycles). Do I really need to run this function on every DMA restart? Isn't it enough to set some bits in registers to restart the DMA?

3)      Question: is my approach to this problem wrong? Is it possible to generate the whole frame as DMA tasks? Scatter-gather mode, any examples?

workspace.zip
  • Hello Liutauras,

    Thanks for the details, but maybe I am missing the big picture here. From the code, there are 16 bits in an SSI frame, which is transmitted at 15 MHz. How many bits does the LED cube require for RGB data every second, or in one frame (in which case you need to tell us the time of one frame)?

    Secondly, what is causing the distance between the toggles to change in pic. 1?

    Regards

    Amit

  • Hello Amit,

    OK, I probably did not explain the whole idea clearly, and just stuffed the post with details.

    You can see the whole frame in pic. 1, but because it can't be zoomed, the frame structure is not visible.

    So the idea is to send 64 bits, 8 times, for each cube level. If you look at the DMA code, you will see I'm sending 16 bits per SSI transfer and repeating it 4 times, 64 bits in total. Then I have a latch. This means the LEDs are on now. Then there is a pause. The pause determines how long the LEDs are on. During the pause I send another data burst (16 bits x 4). And again: latch -> LEDs on -> but a longer pause than before.

    The calculation is simple. 8x8 LEDs = 64 bits. Color depth is 24 bits, meaning 8 bits per color. That means I need to send these 64-bit bursts 8 times per frame. One frame services one cube level. An 8x8x8 cube has 8 levels. If I need a 100 Hz refresh rate for the cube, the duration of one frame (one level) must be 1.25 ms.

    Three colors, three SSI modules, working in parallel.

    I'm not sure what you mean by the second question, but if you are asking why the timing between latches gets longer and longer on every burst, this is the whole idea for controlling LED color. Timer 0 causes this changing delay.

    The key word is BAM: Bit Angle Modulation.
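To make the BAM idea concrete, here is a minimal sketch in C of how an 8-bit brightness value turns into on-time under the scheme described above: bit plane 0 is held for one base pause, and each following plane for twice as long. The function names, the `base_us` parameter and the LED-to-bit mapping in `bam_pack_plane` are assumptions for illustration, not taken from the attached project.

```c
#include <stdint.h>

/* Hold time of bit plane `bit` (0 = least significant bit):
 * every plane is shown twice as long as the previous one. */
static uint32_t bam_plane_time_us(uint32_t base_us, unsigned bit)
{
    return base_us << bit;
}

/* Total time one LED is lit within a frame for a given 8-bit brightness. */
static uint32_t bam_on_time_us(uint8_t brightness, uint32_t base_us)
{
    uint32_t total = 0;
    for (unsigned bit = 0; bit < 8; bit++) {
        if ((brightness >> bit) & 1u) {
            total += bam_plane_time_us(base_us, bit);
        }
    }
    return total;
}

/* Pack bit plane `bit` of 16 LEDs into one 16-bit SSI word.
 * Assumed mapping: LED i drives bit i of the word. */
static uint16_t bam_pack_plane(const uint8_t brightness[16], unsigned bit)
{
    uint16_t word = 0;
    for (unsigned i = 0; i < 16; i++) {
        word |= (uint16_t)(((brightness[i] >> bit) & 1u) << i);
    }
    return word;
}
```

With a 5 us base (an example value), a brightness of 255 stays lit for 1275 us and a brightness of 128 for 640 us; the on-time is simply brightness times the base pause.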

  • Hey Liutauras,

    I admit I did not completely understand the runtime calculation behind the LED cube. However, I see the problem of your DMA restart routine taking too long. I had a similar issue myself with a time-critical DSP application. Here's how I solved it:

    1. Make use of the Control Table. Keep yourself a pointer to re-assign proper values in few cycles.
    2. Move away from the TivaWare functions if things tend to be time-critical. Rather, write the registers directly to re-enable the DMA.

    It looks as follows. In the initialization routine, I assign myself pointers to the Control Structure of the DMA:

    // Assign pointers to the Control Structures of the DMA-Channels of the ADC-Modules.
    // These global pointers are used in the Interrupt routine to re-initialize the
    // DMA-Channels fast.
    pu16Sin1PriControlPointer = (uint16_t*)&pui8DMAControlTable[0x100 + 0x8];
    pu16Sin1AltControlPointer = (uint16_t*)&pui8DMAControlTable[0x300 + 0x8];
    	
    pu16Sin2PriControlPointer = (uint16_t*)&pui8DMAControlTable[0x1A0 + 0x8];
    pu16Sin2AltControlPointer = (uint16_t*)&pui8DMAControlTable[0x3A0 + 0x8];
    
    

    NOTE: The address of your desired data may of course vary! Those are the addresses of Channel 16 and 26, which is ADC1_SS2 and ADC0_SS2 respectively. You can look those up in the datasheet.
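The offsets in the pointer assignments above follow a fixed pattern. As a sketch, assuming the usual TM4C123 control table layout (32 channels, 16 bytes per channel structure, primary structures starting at offset 0x000, alternate structures at 0x200, and the control word at byte 8 of each structure), the index can be computed rather than looked up. The macro and function names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CTL_ENTRY_BYTES     16u     /* one channel control structure        */
#define CTL_ALT_TABLE_BASE  0x200u  /* alternate structures follow primary  */
#define CTL_WORD_OFFSET     0x8u    /* after source and destination end ptr */

/* Byte offset of a channel's control word inside the control table. */
static uint32_t ctl_word_offset(unsigned channel, bool alternate)
{
    uint32_t offset = channel * CTL_ENTRY_BYTES + CTL_WORD_OFFSET;
    if (alternate) {
        offset += CTL_ALT_TABLE_BASE;
    }
    return offset;
}
```

Channel 16 gives 0x108 (primary) and 0x308 (alternate); channel 26 gives 0x1A8 and 0x3A8, which matches the indices used above.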

    And in an interrupt routine, I reset my DMA (set up in Ping-Pong mode) as follows:

    // Re-Enable the DMA through its DMAENASET - register.
    // Value 0x4010000 is Bit/Channel 16 and 26. 
    HWREG(UDMA_BASE + 0x28) = 0x4010000;
    
    // Push the new values for the transfer into the Control Structure. 
    // Since we kept the pointer, it's as fast as one single assignment.
    // Value is the Channel Control Word. (see datasheet, DMACHCTL)
    *pu16Sin1PriControlPointer = 0x313;
    *pu16Sin2PriControlPointer = 0x313;


    In that way, my re-initialization of 2 DMA Channels is as fast as 3 pointer assignments. Hope that will help you out.
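The DMAENASET value is just one bit per channel. A small sketch of building it from a channel list instead of hard-coding the constant (the helper name is hypothetical):

```c
#include <stdint.h>

/* Build a DMAENASET-style mask: bit n enables channel n.
 * Channel numbers of 32 and above are ignored. */
static uint32_t udma_enable_mask(const unsigned *channels, unsigned count)
{
    uint32_t mask = 0;
    for (unsigned i = 0; i < count; i++) {
        if (channels[i] < 32u) {
            mask |= (uint32_t)1 << channels[i];
        }
    }
    return mask;
}
```

For channels 16 and 26 this returns 0x4010000, the constant written to DMAENASET above.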

    Sincerely,

     

    Michel

  • Hello Liutauras,

    Thanks for the explanation. It helps to have the numbers. To ensure all of us (all posters and readers) are on the same page, the requirement translates as follows:

    1.25 ms is the frame time for transmission, in which 64x8 bits have to be sent. Also, this has to be done for 3 colors, which is why you are using 3 SSIs.

    The Pause time is where the color will remain lit in the cube. With a 100 Hz refresh rate, the window for Pause is 10 ms - 1.25 ms. Is that correct?

    Regards

    Amit

  • Hello Michel, hello Amit,

    Thanks, guys, for the support.

    Michel, yes, this BAM algorithm is confusing at first look, but once you catch the idea, the whole picture becomes clear. And yes, as I see it, the problem is as you described: it takes too long to reload the DMA controller for the next transfer. The whole ISR (where I reload the DMA and toggle the latch) takes almost 800 CPU cycles. If the CPU runs at 80 MHz, then 800 cycles last 10 us, which is what I actually see on the scope as well. I really can't see any good reason why there must be so much work for the CPU. I already considered going straight to the registers, but to me it does not look so simple. This is my first project with Cortex, my first time with Tiva, and my first time with the CCS IDE. I still want to use TivaWare, but I could try to write my own procedure for the DMA restart inside the Timer 0 ISR. Please advise: will it be OK if I use the TivaWare procedures to initialize the DMA and the first transfer (when timing is not critical) and then go straight to the registers to restart the transfer? As I don't require any change in the DMA configuration, is it enough just to set bit 7 of DMAENASET to restart the transfer? Or are there more bits I must modify in my case?

    Amit, you are perfectly right with the first two statements, but not with the third.

    Yes, the Pause time is where all LEDs in the selected level remain lit. There will be 8 pauses of different length inside every frame. In my calculations, the first pause must be between the first burst of 64 bits and the second burst of 64 bits, and last for 5 us. Then between the second and third for 10 us; between the third and fourth for 20 us; between the fourth and fifth for 40 us; and so on until the eighth burst, every time twice as long as before. In total the whole frame will take 1.275 ms. After that follows the second frame, starting once again from the 5 us pause. To cover the whole cube I need 8 such frames. So 8 x 1.275 ms gives me 10.2 ms, that is, near a 100 Hz refresh rate for the whole cube. Of course 100 Hz is more than enough; 60 Hz will do the job. I just want to squeeze all the juice from the LaunchPad.

    In the current code, the best I can get is 15 us for the shortest pause (in pinout.h, change ISM_CLOCK_DIVIDER to 66666). This still gives the correct pause increment on every burst, but the whole frame will take 3.825 ms. The cube refresh rate is then only 32.6 Hz. This will definitely look nasty.
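The timing numbers above can be reproduced with a few lines of C. This is only a sketch of the arithmetic: it counts the pauses alone (as the calculation above does), and all function names are made up for illustration.

```c
#include <stdint.h>

/* Length of one frame: pauses of shortest, 2x, 4x, ... one per burst. */
static uint32_t frame_time_us(uint32_t shortest_pause_us, unsigned bursts)
{
    return shortest_pause_us * ((1u << bursts) - 1u);
}

/* Refresh rate of the whole cube when each level needs one frame. */
static double cube_refresh_hz(uint32_t shortest_pause_us, unsigned bursts,
                              unsigned levels)
{
    double cube_time_us = (double)frame_time_us(shortest_pause_us, bursts)
                          * (double)levels;
    return 1e6 / cube_time_us;
}

/* Duration of a number of CPU cycles at a given core clock. */
static double cycles_to_us(uint32_t cycles, uint32_t cpu_hz)
{
    return (double)cycles * 1e6 / (double)cpu_hz;
}
```

With a 5 us shortest pause and 8 bursts this gives 1275 us per frame and about 98 Hz for 8 levels; with 15 us it gives 3825 us and about 32.7 Hz. At 80 MHz, 800 cycles come out as 10 us, which is already twice the 5 us target.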

  • Hello Liutauras

    While I rework the numbers, please bear in mind:

    1. I still do not understand why 5 us cannot be met by the timer for the Pause.

    2. You may have to relax the refresh rate to 60 Hz if the SSI transfer cannot be made faster along with the uDMA.

    Regards

    Amit

  • Hey Liutauras,

    It is perfectly fine if you want to initialize the first settings with the TivaWare routines, but sadly it won't work to just set a single bit to re-initialize the DMA. This is because the DMA will reset its Control Structure once a transfer is done.

    It is therefore important to understand what the DMA actually needs and what it does. In the beginning, you first show the DMA where its control structure is located (uDMAControlTable). That's where the DMA reads its information from. This information is basically separated into two parts: the Channel Control and its Transfer info.

    The Control is a permanently saved info. It will not change unless you do so. It contains things like

    • Arbitration Size
    • Destination Address Increase
    • Source Address Increase
    • Bit-width of each transfer

    The Transfer information however is partly volatile. Those will only last for one transfer. After that, this DMA Channel is stalled and waits for another Transfer info. Now, what is a transfer? It is defined by the following things:

    • Transfer Mode
      • Basic, Automatic, PingPong, ScatterGather..
    • Destination Pointer (fixed)
    • Source Pointer (fixed)
    • Transfer Size

    As I observed, only the Transfer Size and Mode will be reset upon transfer completion. This means that after your DMA has completed n = Transfer Size movement operations, it will stall (reset its Channel Enable bit) and wait for new transfer information.

    You were right when you stated that this info is always the same for you. Well, lucky you! Then you can keep the pointers to the Control Structures of your SSI channels and just reload them every time in your ISR with the same value. This is exactly what I did in the example code I posted earlier.
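Because only the mode and size fields need rewriting, the reload value can be built from its parts instead of being hard-coded. A minimal sketch, assuming the DMACHCTL layout from the TM4C123 datasheet (XFERMODE in bits 2:0, XFERSIZE holding the item count minus one in bits 13:4, ARBSIZE starting at bit 14). The helper and enum names are hypothetical, and since only the low two ARBSIZE bits fit into a 16-bit write, arbitration sizes above 8 are out of scope here.

```c
#include <stdint.h>

/* Transfer modes as encoded in the XFERMODE field. */
enum xfer_mode {
    XFER_STOP     = 0,
    XFER_BASIC    = 1,
    XFER_AUTO     = 2,
    XFER_PINGPONG = 3
};

/* Low 16 bits of the channel control word.
 * items:    number of items to move, 1..1024
 * arb_log2: log2 of the arbitration size, 0..3 (1, 2, 4 or 8 items) */
static uint16_t chctl_low16(enum xfer_mode mode, unsigned items,
                            unsigned arb_log2)
{
    unsigned word = ((arb_log2 & 0x3u) << 14)
                  | (((items - 1u) & 0x3FFu) << 4)
                  | ((unsigned)mode & 0x7u);
    return (uint16_t)word;
}
```

For example, `chctl_low16(XFER_PINGPONG, 50, 0)` gives 0x313, the value used in the earlier ADC example.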

    Another very interesting option for your application would be the "Memory Scatter-Gather" mode of the DMA. Here, you could transfer the necessary data to ALL SSI modules in ONE transfer. The DMA works with "tasks" here, and each task would transfer the colour data to the corresponding SSI module, then the next task takes care of the next colour, etc.

    What is very attractive is that you would need only one or even no DMA re-initialization at all. Since the DMA goes from one task/transfer to another, you can send it in circles. However, I haven't tried that out yet, but I do think I could work up some quick-and-dirty working code for you.

    I hope that cleared things up for you.

    Sincerely,

    Michel

  • Hello,

    Amit, the problem is not the timer; the problem is the 800 cycles the CPU spends restarting the DMA transfer. You can debug my code and look at the counter. The ISR takes 10 us, but I want to be ready for the next transfer in 5 us.

    Michel, so basically, to restart the dma, I must copy transfer information (4 bytes) from control table to dma controller, and set some bytes in registers. Actually I restarting 6 dma channels every time I need a burst. And I am doing this step by step. This approach of mine forces the CPU to work and spend these 800 cycles.

    And your idea is to start another DMA channel, which will copy the 4 control bytes for every DMA channel. In total I will copy 4 bytes for each of 6 channels. So I need to activate the alternate control table and, in the primary control table, set up one DMA task to copy from an array, where I put my SSI DMA tasks, to the alternate table. My primary task must be like this: transfer size 24, transfer items, 8 arbitration size 24, priority default. My current SSI DMA transfers remain unchanged. Then just write bits to registers, and I’m done. And because a DMA transfer is quick, I will gain cpu time, compared to step-by-step copying. Right? At least it sounds sensible to me.

    I hope I will find time to test this idea this evening. Thank you, Michel, for your ideas and efforts.

  • Hey Liutauras,

    wowowow.. Hold on, Speedy. :-)

    "[...] to restart the dma, I must copy transfer information (4 bytes) from control table to dma controller"

    - No, you write the transfer information directly into the Control Table. That's why I kept pointers to it, and that's where the DMA works.

    "Actually I restarting 6 dma channels every time I need a burst."

    - Is there a reason you need the RX channels as well? Are you working with the answer of the LED driver? If you only want to send the colour information (as I thought you would), it simplifies the DMA operation to 3 channels: 1 TX channel per SSI module.

    "My primary task must be like this: transfer size 24, transfer items, 8 arbitration size 24, priority default."

    - Hm, careful here. The transfer size of your 'main' control structure would be the amount of data you transfer from your task list to your alternate control structure, which sums up to: Source End Pointer (4 bytes) + Destination End Pointer (4 bytes) + Control Word (4 bytes). So your transferred items are 12 bytes at 8 bits each.

    Peripheral Scatter-Gather is a quite crispy option.. Be sure to try the 'normal' setup first and then decide if further optimization would be worth it.

    "Then just write bits to registers, and I’m done."

    - Even the DMA has its limits and needs you to intervene sometimes. Solved intelligently, that would reduce to a little pointer arithmetic and take no more than a few cycles. Again, the only registers you have to write are the Channel Enable registers. All the rest (the interesting stuff) is managed through the Control Structure!

    "[...] I will gain cpu time [...]"

    - Quite certainly, yes. I'm at 10 Instructions to reset 2 DMA-Channels (ADC Operation). Compared to your 800 cycles that's.. a lot.

    Glad I could help out a little. Feel free to message me if you have trouble setting the DMA up. But again, think twice about heading straight for Peripheral Scatter-Gather. It could come at a combined price of hair-yanking and frustration.

    Sincerely,

     

    Michel

     

  • Hello Michel,

    I tested the DMA your way. Sure, it works! And it is fast: 127 cycles for the whole ISR.

    Here is what I did:

    Exposed my table to the main, in Peripherals.h

    extern uint8_t uDMAControlTable[512];

     

    in main.c

    defined and initialized pointers

           /*
            * pointers to controltable. [offset to channel number in control table +
            *                 offset to control word in channel control structure]
            */
           uint16_t *pRedTxCW = (uint16_t*)&uDMAControlTable[0xD0 + 0x8]; //Red Tx ch 13
           uint16_t *pRedRxCW = (uint16_t*)&uDMAControlTable[0xC0 + 0x8]; //Red Rx ch 12
           uint16_t *pGreenTxCW = (uint16_t*)&uDMAControlTable[0xF0 + 0x8]; //Green Tx ch 15
           uint16_t *pGreenRxCW = (uint16_t*)&uDMAControlTable[0xE0 + 0x8]; //Green Rx ch 14
           uint16_t *pBlueTxCW = (uint16_t*)&uDMAControlTable[0xB0 + 0x8]; // Blue Tx ch 11
           uint16_t *pBlueRxCW = (uint16_t*)&uDMAControlTable[0xA0 + 0x8]; // Blue Rx ch 10

     

    Inside the ISR, enabled the DMA channels

    /*
     * DMA channel enable set directly into dma register + offset to DMAENASET
     * value = channels 10 | 11 | 12 | 13 | 14 | 15
     */
          HWREG(UDMA_BASE + 0x28) = 0xFC00;

     

    Loaded the control table with the control word

          /*
           * Loading transfer mode (basic = 0x1) and transfer size (0x4) into control table for each used
           * channel, because only these 2 variables will be reset upon transfer completion, and
           * must be renewed for a new transfer. And arb. size probably must be renewed.
           * value = arb. size | transfer size - 1 | transfer mode
           * Transfer mode basic = 1; transfer size = 3; arb.size = 4. The value = 0x8032
           */

          *pBlueRxCW = 0x8032;
          *pBlueTxCW = 0x8032;
          *pRedRxCW = 0x8032;
          *pRedTxCW = 0x8032;
          *pGreenRxCW = 0x8032;
          *pGreenTxCW = 0x8032;

    And it runs very nicely.

    But I have lost synchronization with the latch signal. It is in the right place only for the first transfer, when TivaWare is used. It looks like the latch toggle fires too early. Probably SSIBusy() is not signaling correctly any more. Saleae sessions are attached; it is much easier to look at the sessions than at pictures. The Saleae software can be downloaded at https://www.saleae.com/downloads. But I guess you already know it.
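One thing that may be worth checking here: the low 16 bits of the channel control word can be unpacked field by field. The sketch below assumes the DMACHCTL layout from the TM4C123 datasheet (XFERMODE in bits 2:0, NXTUSEBURST in bit 3, XFERSIZE in bits 13:4, the low two ARBSIZE bits in bits 15:14). The struct and function names are made up for illustration.

```c
#include <stdint.h>

/* Fields visible in the low half of the channel control word. */
typedef struct {
    unsigned mode;            /* XFERMODE: 1 = Basic, 2 = Auto-Request, ... */
    unsigned next_use_burst;  /* NXTUSEBURST flag                          */
    unsigned items;           /* XFERSIZE + 1: items left to transfer      */
    unsigned arb_items;       /* arbitration size in items (1, 2, 4 or 8)  */
} chctl_fields_t;

/* Unpack a 16-bit value as written through the uint16_t pointers. */
static chctl_fields_t chctl_decode_low16(uint16_t word)
{
    chctl_fields_t f;
    f.mode           = word & 0x7u;
    f.next_use_burst = (word >> 3) & 0x1u;
    f.items          = ((word >> 4) & 0x3FFu) + 1u;
    f.arb_items      = 1u << ((word >> 14) & 0x3u);
    return f;
}
```

Under that layout, 0x8032 decodes to 4 items, an arbitration size of 4 and XFERMODE = 2. The datasheet lists 0x1 as Basic and 0x2 as Auto-Request, so the value written in the ISR may not select the mode named in its comment; Basic mode with the same size fields would be 0x8031. Whether this explains the early latch has not been verified here.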

    Regarding the reception channels, at the moment they are just placeholders. The drivers can report the status of the LEDs, so they are not pointless. But this is nice to have, not a must-have.

    Also, I have not tested the scatter-gather setup yet; I must somehow sort out the latch first. But it's late already…

    sessions.zip