DK-TM4C129X: Controlling relative Ethernet/µDMA bus priority

Part Number: DK-TM4C129X

I have an application which keeps a set of scatter-gather µDMA transfers continuously busy at about 30 Mbits/s.  The DMAs seem to function properly.  When I turn on transmission of the 30 Mbits/s via the TI-RTOS stack over Ethernet, the DMAs experience an increase in jitter of about 400 ns.  This is tolerable in my design, but not desirable, since it is a performance-limiting factor.  In an attempt to minimize the jitter caused by Ethernet operation, I set

/* Set the PBL (programmable burst length) field to 1 and select fixed-burst mode. */
HWREG(EMAC0_BASE + EMAC_O_DMABUSMOD) = BFM(HWREG(EMAC0_BASE + EMAC_O_DMABUSMOD), 1, EMAC_DMABUSMOD_PBL_M);
HWREG(EMAC0_BASE + EMAC_O_DMABUSMOD) |= EMAC_DMABUSMOD_FB;

"BFM" sets the value of the PBL field to 1.

With this setting, I still get the 400 ns of jitter.  I haven't yet looked at the stack source to see how these fields are treated; perhaps they are being overwritten frequently, nullifying my effort.

What is the DMA priority of the EMAC, relative to the µDMA?  Can it be set lower than the µDMA?

  • I checked, and the stack isn't overwriting my settings.
  • The DMA is in Ping-Pong mode, over SSI1. I have set it to be the only high-priority DMA, and still see at least 250 ns jitter on the SSI clock bursts, even with the Ethernet unplugged. What could be causing the delay, and how do I avoid it?
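
    For concreteness, the priority setting is roughly the line below (a sketch assuming the TivaWare driverlib call and the default TM4C129x SSI1 TX channel assignment; adjust if the channel is remapped):

    #include "driverlib/udma.h"

    /* Mark the SSI1 TX (pusher) channel as the only high-priority uDMA channel. */
    uDMAChannelAttributeEnable(UDMA_CH11_SSI1TX, UDMA_ATTR_HIGH_PRIORITY);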
  • Can you give us details on how you are measuring the jitter? Also, are you measuring latency (delay) or jitter (variation of latency)?

    Todd
  • Hi Leo,

    Could you please tell us which version of TI-RTOS you're on, too?

    Steve
  • Hi Todd and Steven:

    The attached oscilloscope trace, taken from a DK-TM4C129X board, shows

    1. a GPIO used to trigger 4 16-bit DMA transfers in a single arbiter-locked transfer to SSI1 as a "pusher" to force an SSI receive,
    2. the SSI1 clock, and
    3. the SSI1 data.

    The aqua jitter shadow between the oscilloscope cursors is 264 ns long.  Simultaneous with the Ping-Pong SSI output, there is a scatter-gather SSI input DMA which transfers 64 16-bit words.  I programmatically verify that an incrementing sequence I transmit over the SSI to a wire loopback makes it back via this DMA.

    The SG DMA is looped, continuously transferring a request from memory location dmaTrigger to UDMA_SWREQ.  The request is usually zero, but, if software loads a request value to the location, it triggers a one-shot DMA which does some atomic hardware work synchronously with the looped DMA and writes dmaTrigger back to zero.  The SG DMA also transfers EMACTIMSEC/EMACTIMNANO to memory after transferring a block from the SSI.  These gymnastics provide signal timing which ensures really good performance in the analog signal path.  The only arbiter lock greater than one transfer is the one that does the jittering SSI output transfer.
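
    To make the looped SG concrete, the dmaTrigger-to-UDMA_SWREQ hop looks roughly like the entry below.  This is a sketch assuming TivaWare's uDMATaskStructEntry macro and the hw_udma.h register address; the real table also contains the SSI block, timestamp, and dmaTrigger-clearing entries.

    #include <stdint.h>
    #include "inc/hw_udma.h"
    #include "driverlib/udma.h"

    static volatile uint32_t dmaTrigger;     /* software drops a channel-request mask here (usually zero) */

    tDMAControlTable sgTaskList[] =
    {
        /* Copy dmaTrigger to the uDMA software-request register.  When it is zero,
           this is a harmless write; when software loads a request, it fires the
           one-shot channel that does the atomic hardware work. */
        uDMATaskStructEntry(1, UDMA_SIZE_32,
                            UDMA_SRC_INC_NONE, (void *)&dmaTrigger,
                            UDMA_DST_INC_NONE, (void *)UDMA_SWREQ,
                            UDMA_ARB_1, UDMA_MODE_MEM_SCATTER_GATHER),
        /* ... SSI RX block, EMACTIMSEC/EMACTIMNANO snapshot, and zeroing of dmaTrigger ... */
    };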

    This scheme runs overnight with no errors: SSI in, SSI out, UDP out, dmaTrigger requests via HTTP every half second, and a UDP broadcast every 5 seconds.  The "dummy" tasks in the SG DMA are required to avoid a fixed 16-bit offset of the data in the SSI receive stream.  I am measuring jitter.  My hardware design will provide external hardware to adjust out the fixed portion of the latency in this periodic scheme.

    I am using tirtos_tivac_2_16_01_14.  If TI-RTOS is doing high-priority, long-arbiter-lock uDMA transfers, that might be the cause.  But then, I know of no uDMA usage registration scheme in TI-RTOS, so I assume that isn't being done, because the OS probably has no way to know that I'm not using a channel it might select.  Is TI-RTOS occasionally globally disabling the uDMA?  That might do it too.

    Thanks,

    Leo

  • Hi Leo,

    Thanks for the pic.  What does the SPI transfer look like without anything else running?  Can you replicate the issue using just SPI driverlib calls?  I'm trying to figure out the best way to handle this one.

    Todd
  • Here is the jitter with SSI RX disabled and the Ethernet cable unplugged:

    With SSI RX disabled and Ethernet plugged in, with or without high-speed UDP, there is no change from the above trace.  With SSI RX enabled and the Ethernet unplugged, there is no change from the above trace.  When I go back to full load, as shown in the very first trace in this thread, I'm now seeing about 148 ns.  This may be an address dependency, where moving a buffer somewhere and changing its alignment is causing access delay.

    I just constrained the alignment of my DMA data buffers to 256 bits, since I read in the manual that the uDMA uses "memory striping," which sounds like multi-bank wide accesses, so this may help.  The command structure table is of course already aligned.  I also constrained the task-list alignment to 128 bits, which is the task size: 4 words × 32 bits = 128 bits.  If tasks are being copied into the alternate structure using misaligned accesses, that will surely be time-consuming.  Now I see about 280 ns!  Trying multiple hardware resets and disconnecting the debugger doesn't change the observed jitter.  Neither does a complete power cycle.
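
    In code, the alignment constraints look something like this (GCC-style attributes shown purely as an illustration; the buffer names and sizes are placeholders, not my actual declarations):

    #include <stdint.h>
    #include <stdbool.h>
    #include "driverlib/udma.h"

    static uint16_t ssiRxBuf[64]                __attribute__((aligned(32)));    /* data buffers on 256-bit boundaries      */
    static uint16_t pusherBuf[4]                __attribute__((aligned(32)));
    static tDMAControlTable sgTaskList[8]       __attribute__((aligned(16)));    /* 128 bits = one 4-word task              */
    static tDMAControlTable dmaControlTable[64] __attribute__((aligned(1024)));  /* control-table alignment the uDMA needs  */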

    Given the speed at which the design runs, I have to use DMA, and I'm reluctant to shoehorn in the driverlib implementation.  That is the threshold of effort at which I am willing to live with the problem.  It is tolerable -- just a performance loss I would rather not have.  A reduction in jitter of t nanoseconds allows me to decrease my ADC sample interval by just that amount.  If there isn't a hardware reconfiguration trick I can try, it's probably time to give up on this one.

    One potential lead: I note that the SSI burst length is 256 ns, about the same length as the jitter, at least in this compile.  I wonder if there is timeslot-like behavior being excited in the SSI.  I haven't seen any configuration bits for the SSI that look like they would affect that kind of behavior.  I do have HSCLKEN set, because I think I must have it to work at 66 MHz.

    My current bet is that the timing of an off-boundary address alignment is enough to kick the SSI out of timeslot-like behavior.  I will save investigation of that for some time when the workload is smaller.

    Best regards,

    Leo

  • By moving the trigger to the SSI clock, I get this trace:

    Now, it looks highly probable that the SSI is cogging at its own transfer interval.  The real jitter, showing up as the transition width on the yellow trace, is less than 100 ns, while the transitions are separated by the SSI transfer time.  Perhaps the SSI TX needs to be a task which writes to an "End of transmission" bit, to keep the SSI from doing some sort of end of transmission timeout.

  • I note in hw_ssi.h a bit named "SSI_CR1_EOT" which is not documented in the manual. Is this perhaps a bit to be set to force the SSI to completely transmit all the contents of the TX FIFO, maybe preventing it from cogging the data transmission? What can be done to force the SSI to transmit data the instant it appears in the TX FIFO, instead of occasionally waiting for a full frame?
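
    In other words, something like the write below (assuming the hw_ssi.h names), if that is indeed what the bit is for:

    #include "inc/hw_types.h"
    #include "inc/hw_memmap.h"
    #include "inc/hw_ssi.h"

    HWREG(SSI1_BASE + SSI_O_CR1) |= SSI_CR1_EOT;   /* hoped-for "transmit everything in the TX FIFO now" behavior */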
  • The delay always occurs right after service of the SSI RX SG DMA interrupts. I attempted to write SSI_CR1_EOT with "1" in the ISR, but that doesn't change anything. If I force alignment of the pusher DMA data buffer to an odd halfword boundary, the no-load scenario shows the delay at a significantly lower frequency; word-boundary alignment gives a higher frequency. However, the average frequency of the problem with UDP packetization running is quite high, about 2300 Hz. When I have time, I will try an SG pusher DMA. Right now it is ping-pong, under the assumption that bus usage is better in that mode.
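
    For reference, the pusher channel is currently configured along these lines (driverlib ping-pong setup; the buffer names and the default TM4C129x SSI1 TX channel assignment are assumptions standing in for my actual code):

    #include <stdint.h>
    #include <stdbool.h>
    #include "inc/hw_memmap.h"
    #include "inc/hw_ssi.h"
    #include "driverlib/udma.h"

    static uint16_t pusherBufA[4], pusherBufB[4];   /* two ping-pong halves, four 16-bit words each */

    /* One arbiter-locked burst of four 16-bit words per half (UDMA_ARB_4), pushed into SSI1's data register. */
    uDMAChannelControlSet(UDMA_CH11_SSI1TX | UDMA_PRI_SELECT,
                          UDMA_SIZE_16 | UDMA_SRC_INC_16 | UDMA_DST_INC_NONE | UDMA_ARB_4);
    uDMAChannelControlSet(UDMA_CH11_SSI1TX | UDMA_ALT_SELECT,
                          UDMA_SIZE_16 | UDMA_SRC_INC_16 | UDMA_DST_INC_NONE | UDMA_ARB_4);
    uDMAChannelTransferSet(UDMA_CH11_SSI1TX | UDMA_PRI_SELECT, UDMA_MODE_PINGPONG,
                           pusherBufA, (void *)(SSI1_BASE + SSI_O_DR), 4);
    uDMAChannelTransferSet(UDMA_CH11_SSI1TX | UDMA_ALT_SELECT, UDMA_MODE_PINGPONG,
                           pusherBufB, (void *)(SSI1_BASE + SSI_O_DR), 4);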