This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

TM4C1294KCPDT: USB integrated DMA generating oversize full speed packets

Part Number: TM4C1294KCPDT

I have a project that uses the TivaWare/usblib/device library to drive the processor's USB engine as a bulk device with two endpoints (one in and the other out). I've had the code working correctly manipulating the USB FIFO directly. I'm trying to get the system working with the integrated DMA engine for sending data on the IN endpoint, but I get occasional oversize full speed packets (> 64 bytes). This failure usually happens after about 1000 blocks have been sent successfully (with a few hundred bytes in each block).

Where should I start looking to find the error? There is no obvious difference between the block that fails and any of the preceding blocks except that the last packet sent in the block is > 64 bytes.

  • Hello Peter,

    As Charles mentioned, we have not put together a USB + DMA example. From what I investigated, the USB library hooks for DMA are not well developed and their scope is limited. Because of the limited demand for this feature and the complexity of developing it further, we ultimately focused our efforts on more widely applicable areas where we can improve collateral and release app notes, so it is not something we will be looking to support in the near term.

    For your specific issue, there are not any hooks in place for DMA + Bulk mode. That is likely where this is breaking down: there are USBLibDMA APIs used to manage the DMA interface, but those are really only implemented for the MSC and Audio interfaces. Using DMA with Bulk or CDC is not natively supported by the USB library, and trying to use the DMA without adding those hooks typically results in bugs.

    I would guess the root cause of the issue lies in usbdbulk.c; if you add that file to your project, you'll be able to set breakpoints to debug individual functions.

    Best Regards,

    Ralph Jacobi

  • Thanks for the effort Charles. That is pretty much what I found.

    I've actually got the code substantially working - I get a few hundred to a few thousand successful bulk transfers before failure. Using a USB analyzer I can see the failure mode is a bulk transfer that ends with a >64-byte packet. The total number of bytes in the bad transfer is correct; it's just that the USB engine attempts to transfer too many bytes as the final packet. For example, a 400-byte block might be transferred as 3 x 64-byte packets then a 208-byte packet. That block should be sent as 6 x 64-byte packets then a 16-byte short packet to terminate the block. The failure is not a function of block length - I see many blocks of the same size as the failing block sent correctly prior to failure.
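
    For reference, the expected packetization is just the block length split at the 64-byte full-speed maximum. A tiny host-side C sketch of that arithmetic (hypothetical helper names, nothing TivaWare-specific):

    ```c
    #include <assert.h>
    #include <stdio.h>

    #define FS_MAX_PACKET 64u

    /* How many full 64-byte packets a block occupies. */
    static unsigned full_packets(unsigned block_len)
    {
        return block_len / FS_MAX_PACKET;
    }

    /* Length of the terminating short packet; 0 means a ZLP is needed
     * to terminate a transfer that is an exact multiple of 64 bytes. */
    static unsigned short_packet_len(unsigned block_len)
    {
        return block_len % FS_MAX_PACKET;
    }

    int main(void)
    {
        /* A 400-byte block: 6 full packets then a 16-byte short packet. */
        printf("%u full packets + %u-byte short packet\n",
               full_packets(400), short_packet_len(400));
        return 0;
    }
    ```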

    It's hard to imagine what could cause occasional oversize final packets to be generated by what seems like it could only be the silicon. I could be poking the DMA/endpoint registers at an unfortunate time, but again it's hard to imagine how I might cause that result. I have checked that the endpoint configuration registers are correct following the failure by inspecting them with the CCS debugger, without spotting any red flags.

  • Hi Ralph,

    I have code that is almost working, mostly bypassing the TI library code (that requires removing the DMA handling in USBDeviceIntHandlerInternal). I've stripped down the application code I'm working with to simplify generation of the data being sent across the wire. My current suspicion is that I'm managing to start a new DMA transfer before the previous USB transfer has completed.
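
    Something like this is the gating I have in mind, with the hardware access reduced to a flag so the logic stands alone (the names and the single-flag approach are hypothetical; on the real part the flag would be set when the DMA is armed and cleared in the endpoint TX-complete interrupt):

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical gate: only arm a new DMA transfer once the endpoint
     * interrupt has confirmed the previous USB transfer completed. */
    typedef struct {
        volatile bool tx_in_flight;  /* set when DMA armed, cleared in EP ISR */
    } tx_gate_t;

    static bool tx_gate_try_start(tx_gate_t *g)
    {
        if (g->tx_in_flight) {
            return false;            /* previous transfer still on the wire */
        }
        g->tx_in_flight = true;      /* arm the DMA transfer here on real HW */
        return true;
    }

    static void tx_gate_complete(tx_gate_t *g)
    {
        g->tx_in_flight = false;     /* call from the TX-complete interrupt */
    }

    int main(void)
    {
        tx_gate_t g = { false };
        assert(tx_gate_try_start(&g));   /* first start is allowed       */
        assert(!tx_gate_try_start(&g));  /* second is refused: in flight */
        tx_gate_complete(&g);
        assert(tx_gate_try_start(&g));   /* allowed again after complete */
        return 0;
    }
    ```

    If my failing code ever starts a DMA transfer where this gate would have refused one, that would explain the corrupted final packet.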

    If I get the issue sorted out I'll post at least some code fragments to drive bulk DMA transfers.

    PS: I've now written two replies to this thread using Firefox and haven't been able to send either of them (clicking the Reply button closes the reply edit window but doesn't actually send the reply). Desperation has driven me to try Edge.

  • Hi Peter,

    Okay, that would definitely help with making it simpler then. I wonder if there is a way you could count how many transfers are being started by the DMA versus how many USB transfers are completing, and then see if you get out of sync? Those would be two separate areas to measure, and you could output the results periodically. Just a thought about how to debug that based on your description of the assumed issue, which would be my initial guess as well. Do you think there could be NAKs or retries going on that would explain why the USB transfer didn't complete before the next DMA transfer was triggered? Just another thought about what could delay the bus a bit.
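
    A sketch of the counting I mean (the hook names are hypothetical; you would bump one counter where the DMA transfer is started and the other in the endpoint TX-complete interrupt, then dump both periodically over your serial debug output):

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical instrumentation: compare DMA starts against USB
     * completions.  If the transfers stay in lockstep, the difference
     * should never exceed one. */
    static volatile uint32_t g_dma_started;
    static volatile uint32_t g_usb_completed;

    static void dma_start_hook(void)    { g_dma_started++;   } /* at DMA arm */
    static void usb_complete_hook(void) { g_usb_completed++; } /* in TX ISR  */

    static uint32_t transfers_in_flight(void)
    {
        /* Unsigned subtraction stays correct across counter wraparound. */
        return g_dma_started - g_usb_completed;
    }

    int main(void)
    {
        dma_start_hook();
        assert(transfers_in_flight() == 1);
        usb_complete_hook();
        assert(transfers_in_flight() == 0);
        return 0;
    }
    ```

    If the in-flight count ever reads 2 or more, a new DMA transfer was started before the previous USB transfer finished.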

    Best Regards,

    Ralph Jacobi

  • Hi Ralph,

    The data blocks I'm sending already have a serial number, so I know with my simplified code that I'm missing the first few blocks altogether. I use a logic analyzer and waggle output lines so I know when the DMA code is doing business. I also monitor (using the logic analyzer) a serial debug output that prints diagnostic information, and I have a USB analyzer so I can see the data going across the wire. Both the logic analyzer (a Saleae Logic 16 Pro) and the USB analyzer (Ellisys USB Explorer) let me set an arbitrary time 0, so I can correlate the data collected by both to get a pretty good overview of what's going on from the outside. I can't see the communication between the USB's integrated DMA, the USB FIFO and the USB engine, so I'm struggling to figure out what I might be doing to cause the >64-byte packet issue.

    I do see NAKs and ZLPs in expected places. In principle the code holds off sending the next block until the last block has been processed - but there is a gap between "principle" and "practice" somewhere!

  • Hi Peter,

    I don't really have concrete ideas myself; I looked into this a bit from a datasheet standpoint just to be able to help brainstorm this better. I don't have much insight into the DMA <-> FIFO <-> USB engine inner workings either - we are probably looking at the same block diagrams.

    Which DMA Mode are you using then, 0 or 1? For mode 0, this segment is mentioned:

    For Rx endpoints operating in Request Mode 0, the DMA request line goes
    high when a data packet is available in the endpoint FIFO and normally goes low at the end of the
    cycle in which the 8th from last byte starts to be processed (which happens two transfers minus
    one clock cycle in advance of the transfer containing this byte). The request line also goes low if
    the CPU clears the RXRDY bit in the USB Receive Control and Status Endpoint n Low
    (USBRXCSRLn) register.

    That's about the only thing I read so far that made me think of something that could be related to this particular issue.

    Best Regards,

    Ralph Jacobi

  • Hi Ralph,

    I hauled out our Tiva C Series dev kit and had a play with the MSC example project. That works as expected and is using bulk DMA transfers, so at least the silicon works (expected really, but reassuring to have it confirmed). That did however give me some transfer timing to compare against my previous non-DMA solution, which turns out to be as fast as the DMA transfers in the example project and pretty close to as fast as can be achieved with Full Speed (12 Mb/s) USB. So that prompted me to put aside the DMA solution for now and look harder at where our real bottleneck is.

    Eventually I'll come back to the DMA code, because it unloads a fair chunk of work from the processor and eventually we'll want that processing power for other purposes, but for right now DMA isn't going to solve our problem. I'm pretty sure my issue is related to setting up another DMA transfer before the previous USB transfer has completed, but I haven't been able to spot the problem code, so I'll leave it as a problem for another day.

    Thanks for the hint toward MSC. That at least made me reassess what I was doing!

  • Hi Peter,

    Thanks for the update and I'm glad the MSC example helped you uncover the system issue better. I think one of the reasons why DMA hasn't been more widely requested is that it isn't needed to hit those speeds so the main benefit is simply the offloading.

    FYI this thread may lock before you get back around to picking up DMA again, but you'll be able to reference it with "Ask a related question" so we'll see the prior history here when you do pick up on the DMA investigations again. In the meantime I'll mark your latest reply as the 'resolution' for this particular thread.

    Best Regards,

    Ralph Jacobi