MSP430F5529: Under what conditions does the USB Module issue a stall in hardware?

Part Number: MSP430F5529

Hello all,

I'm trying to add MSP430 support to the tinyusb USB stack, because it's what I'm used to and I figured a port wouldn't take too long.


The main functionality is working, but I've reached a snag where sometimes EP0 stalls without having set the STALL bit in USB{O,I}EPCNF_0, or traversing any of the STALL/clear STALL code whatsoever. Since I can't duplicate this behavior consistently, I believe there's a race condition somewhere, and I need to know under what conditions the USB Module will issue a stall on EP0. This is not mentioned anywhere in the Family Reference Manual.

Below is an example trace I captured of when a stall occurs on EP0 (data available upon request):

From experimenting, I've determined that I can force a STALL to occur by doing the following:

  1. Decode the Setup Packet.
  2. If the direction bit is set, clear the DIR bit in USBCTL, clear the NAK bit in USBOEPCNT_0, and set the NAK bit in USBIEPCNT_0.
  3. Upon receiving an IN packet from the host after all these conditions are met, the USB Module will issue a STALL to the host.
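The three steps above can be sketched in C roughly as follows. This is a host-side illustration only: the `sim_` variables are hypothetical stand-ins for the real registers, the bit masks are assumptions that should be checked against the device header, and `ep0_in_reported_stall_state` merely encodes the register combination described in this post, not any documented hardware behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Plain variables standing in for the memory-mapped registers (hypothetical
 * names), so the sketch builds on a host PC rather than the target. */
static uint8_t sim_USBCTL;
static uint8_t sim_USBOEPCNT_0;
static uint8_t sim_USBIEPCNT_0;

/* Assumed bit positions; verify against the device header before reuse. */
#define SIM_DIR     0x01u  /* DIR in USBCTL */
#define SIM_NAK     0x80u  /* NAK in USB{O,I}EPCNT_0 */
#define REQ_DIR_IN  0x80u  /* bmRequestType bit 7: device-to-host */

/* True when the registers match the combination reported to precede a STALL. */
static bool ep0_in_reported_stall_state(void)
{
    return !(sim_USBCTL & SIM_DIR)
        && !(sim_USBOEPCNT_0 & SIM_NAK)
        && (sim_USBIEPCNT_0 & SIM_NAK);
}

/* Step 2 of the list: deliberately misconfigure EP0 for a control read. */
static void misconfigure_ep0_for_control_read(uint8_t bmRequestType)
{
    if (bmRequestType & REQ_DIR_IN) {
        sim_USBCTL      &= (uint8_t)~SIM_DIR;  /* DIR opposite to the bus direction */
        sim_USBOEPCNT_0 &= (uint8_t)~SIM_NAK;  /* OUT EP0 may respond */
        sim_USBIEPCNT_0 |= SIM_NAK;            /* IN EP0 keeps NAKing */
        assert(ep0_in_reported_stall_state()); /* postcondition of step 2 */
    }
}
```

On the target, the same three writes would go to the real USBCTL, USBOEPCNT_0, and USBIEPCNT_0 registers after decoding the SETUP packet; the STALL itself is then produced (or not) by the USB module on the next IN token.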

I'm not able to get an OUT xfer in the Status Phase to stall, even when I invert the directions in step 2.

My question is... are there other conditions that would cause the USB Module to STALL, or is this the only case? From staring at my code, I cannot currently find any code paths (even with interrupts) that would result in the above scenario, so my hope is that there are other combinations of cleared/set bits that can cause a stall when an IN packet is seen on EP0. Unfortunately this isn't documented in the manual, and I don't have access to a block diagram which could show me the STALL behavior.

  • Hi William,

    I have reached out to our USB guru to see if he can help with your question.

  • Hi Dennis,

    Thank you very much! I've made no progress on my end debugging the STALL behavior. If it's possible, could you forward this follow up question to your USB guru:

    Was the MSP430 USB Module designed in-house by TI, or was it purchased IP from a third party (analogous to how the Synopsys DesignWare OTG is popular)? Rationale: If it's a third party core, I can seek out an IP manual and technical support from the IP designer if necessary.

  • Hi William:

    Thank you for your interest in porting tinyusb to our device. I don't know much about the tinyusb stack, but I will try to provide some aid.

    • The MSP430 USB Module was designed in-house by TI.
    • As you may have seen in the Family User's Guide, the only thing that makes the USB module send out a STALL is setting the STALL bit in software.
    • Any other behavior that will cause another STALL would be related to the USB 2.0 spec defined by the USB org.
    • A quick browse through the spec can suggest a few things, such as "is the host halting the endpoint?"
      • Your mention of the DIR bit/packet mismatch causing a STALL is new to me, but seeing that you only get this behavior in one direction, I wouldn't chase that rabbit hole first.
    • What is the SETUP transaction that it is failing on? Is it truly an invalid request? Can you make it consistently fail on the same SETUP transaction, or is it a different one every time?
    • To throw something else your way: I don't think this is related, and it seems to suggest the opposite of your problem, and it was fixed many silicon revisions ago, but if you check the errata USB6 in this document: http://www.ti.com/lit/er/slaz314ac/slaz314ac.pdf
      • You will see that the STALL bit can be accidentally preemptively cleared. Depending on your silicon revision, this should no longer be relevant, but I figure it's worth checking.
      • Perhaps the STALL bit was set somewhere but it got cleared so you don't see it if you were debugging.
    • Does tinyusb use multiple threads? That could make for trickier debugging, since another thread could be messing with the STALL bit or halting the endpoint in some way that is not evident to the main thread.

    Let me know your thoughts.

  • Hello Wallace,

    These are some good questions and feedback :). Let's get started:

    >Thank you for your interest in porting tinyusb to our device. I don't know much about the tinyusb stack, but I will try to provide some aid.

    Tinyusb is what I'm used to using from doing various STM32/Synopsys DesignWare OTG ports, so I figured I'd do an MSP430 port since I'm actually more comfortable w/ MSP430 in general.

    > The MSP430 USB Module was designed in-house by TI.

    Alright, good to know. Not sure if you have access to a block diagram or schematic, but at least I don't have to contact a third party for more info :).

    >As you may have seen in the Family User's Guide, the only thing that makes the USB module send out a STALL is by setting the STALL bit in software.

    I should've been clearer. In the photo I attached, there are two STALLs. The STALL at index 1217215 is a legitimate protocol stall as defined in 8.5.3.4 of the USB 2.0 spec. TinyUSB unconditionally STALLs the control endpoint for that particular device request as unsupported. I set some watchpoints using mspdebug for the USB{O,I}EPCNF_0 regs, and during the STALL at index 1217215, I get a watchpoint hit, as well as a breakpoint hit indicating the TinyUSB device driver function `dcd_edpt_stall` was called.

    However, for the STALL at index 1259519, neither the watchpoint on USB{O,I}EPCNF_0 nor the breakpoint in `dcd_edpt_stall` is hit. Both are only hit for the protocol stall at index 1217215. This is why I think the hardware itself is stalling (and not notifying the software!).

    >Any other behavior that will cause another STALL would be related to the USB 2.0 spec defined by the USB org.

    >A quick browse through the spec can suggest a few things, such as "is the host halting the endpoint?"

    I will need to get back to you on both these points. Admittedly, I'm not familiar with all the conditions in which a device will STALL per the spec, just "unsupported control request" and "the device malfunctioned". And both these cases would be handled by setting the STALL bit in software.

    >Your mention of the DIR bit/packet mismatch causing a STALL is new to me, but seeing that you only get this behavior in one direction, I wouldn't chase that rabbit hole first.

    Ack. It's certainly possible (probable) I'm missing something.

    > What is the SETUP transaction that it is failing on? Is it truly an invalid request? Can you make it consistently fail on the same SETUP transaction, or is it a different one every time?

    Interesting question. All the requests my software fails on are valid requests. If a run of my USB demo STALLs, there are a number of SETUP packets it will STALL on. If I reset the demo from within the debugger, the STALL on the next run, if it STALLs, will normally be on the same packet, though I've seen two consecutive runs fail on different SETUP transactions. If I wait a while between a batch of demo runs, the packet on which the demo STALLs will change and/or the demo won't STALL on multiple runs for an indeterminate period of time. This makes me think of a race condition.

    I am able to modify the demo to deliberately and consistently cause a STALL on the same packet every run without touching the STALL bits. However, this isn't the source of the bug :). Perhaps I should upload an MCVE that deliberately causes a STALL without touching the STALL bits. Would that help?

    >To throw something else your way: I don't think this is related, and it seems to suggest the opposite of your problem, and it was fixed many silicon revisions ago, but if you check the errata USB6 in this document: http://www.ti.com/lit/er/slaz314ac/slaz314ac.pdf

    • You will see that the STALL bit can be accidentally preemptively cleared. Depending on your silicon revision, this should no longer be relevant, but I figure it's worth checking.
    • Perhaps the STALL bit was set somewhere but it got cleared so you don't see it if you were debugging.

    I will get back to you on the revision when I can check. Re: "the STALL bit got cleared", since a watchpoint doesn't trigger during these STALLs, I'm not sure the software is writing/reading the STALL bit at all.

    >Does tinyusb use multiple threads? That could make for trickier debugging so another thread could be messing with the STALL bit, or is making the endpoint halted in some way that is not evident to the main thread.

    TinyUSB uses one main thread, and interrupts. In my application, the Timer CCIFG0 interrupt and USB interrupts are enabled. I suspect some pathological interaction between interrupts, but seeing as the STALL appears to be happening without software intervention, I'm unsure of where to look.

    And that about covers everything for now :).

  • stall-mcve.zip

    I have attached a MCVE for MSP430F5529 in which I can deliberately trigger a STALL, seemingly without ever touching the STALL bits.

  • Hi William

    I have a similar issue with a project I am busy with using the MSP430F5529 but with the sample USB code developed for the Launchpad. I just wanted to verify if it is a similar issue.

    Does the micro "Stall" or is it only the USB?

  • >Does the micro "Stall" or is it only the USB?


    I'm afraid I don't understand the question. If by "Does the micro "Stall"" you mean "does the microcontroller crash?", no the microcontroller never crashes and continues servicing tasks. Only the USB STALLs.

  • Hi William:

    • Unfortunately I do not have access to a block diagram or schematic either.
    • I was talking with some others internally here who were involved with the F5529's USB software, and they suggested that another good resource is the official MSP430F5529 USB Developer's Package. Inside, there is our USB stack that handles some silicon errata regarding the USB. For production, we would not recommend using software that is not in the stack that we developed for this reason. However, as I understand, since this is more of a personal project, the USB stack is a good reference guide.
    • Although I can only give you my best guess, I surmise that it is uncommon for USB hardware IPs to send STALL handshakes on their own (even though this is what we are seeing here on the surface). Usually it is expected to be done by the software.
      • However, it is possible.

    • And I agree with you, I think it would be very difficult for someone to be familiar with all the conditions in which a USB device will STALL. Myself included. :)
    • The behavior you are describing (the inconsistency of which SETUP transaction triggers this issue), I think is our best lead so far.

    In your MCVE, I'm not really sure what the host is doing, so I could be off-base (and I'm not sure how close this is to your actual stack code, nor what ECHO_MODE is), but:

    • I would like to see if this behavior persists if we follow the User's Guide's instructions in 42.3.1.3 Control Read Transfer in http://www.ti.com/lit/ug/slau208q/slau208q.pdf
    • There are some differences such as clearing NAK for both IEP0 and OEP0.
    • There are also 2 NOTEs in this section that could be helpful to you. Both also seem to address potential timing issues.
    • For the data stage, if I were to ignore ECHO_MODE for a second, I indeed see that the DIR bit is set opposite of the bus direction (because the host presumably sends an IN token packet which is shown in your trace). I also notice that you clear the NAK for OEP0 but not IEP0. Let's try clearing the NAK for IEP0 since you get STALLs on your control read requests, but not your control write requests.
    • Maybe there is something in the status stage too, but there doesn't seem to be code for it in the MCVE, and it seems like we would usually get stuck before that point (unless the issue comes from a previous transaction, which is worth investigating if the above does not work).

    Let me know your findings.

    Thanks.

  • Hi Wallace:

    >Unfortunately I do not have access to a block diagram or schematic either.

    I see, was worth a try!

    >I was talking with some others internally here who were involved with the F5529's USB software, and they suggested that another good resource is the official MSP430F5529 USB Developer's Package. Inside, there is our USB stack that handles some silicon errata regarding the USB. For production, we would not recommend using software that is not in the stack that we developed for this reason. However, as I understand, since this is more of a personal project, the USB stack is a good reference guide.

    I have referred to the official USB stack while developing this port for a variety of reasons, including information on STALL behavior; I have not found any info on the STALL behavior, and it looks like I'm setting the DIR bit correctly when scheduling xfers in tinyUSB.

    >Although I can only give you my best guess, I surmise that it is uncommon for USB hardware IP's to send STALL handshakes on its own (even though this is what we are seeing here on the surface). Usually it is expected to be done by the software.
    >However, it is possible.
    A hardware-induced STALL is my best guess at the moment, and I was hoping someone who designed the IP (if they're still around) could confirm or deny. The actual bug in my application (the trace in my first post) is likely a race condition. But without knowing what causes the STALL, I can't home in on a possible culprit.

    >And I agree with you, I think it would be very difficult for someone to be familiar with all the conditions in which a USB device will STALL. Myself included. :)
    My current impression is that indeed a STALL is normally controlled in software, but hardware can STALL if things go wrong. USB doesn't forbid a functional STALL on EP 0, but it's not recommended.

    >The behavior you are describing (the inconsistency of which SETUP transaction triggers this issue), I think is our best lead so far.
    Ack.

    >In your MCVE, I'm not really sure what the host is doing, so I could be off-base, (and I'm not sure how close this is to your actual stack code, nor what ECHO_MODE is), but:
    You can ignore ECHO_MODE. I used it for debugging to show that these 200 lines of code are sufficient to get the USB core to respond to SETUP packets.

    The MCVE was me copying and pasting code from the stack, but only the bare minimum to get an xfer working (e.g. tinyUSB itself uses queues and interrupts to start IN and OUT xfers, I don't bother in the MCVE).

    >I would like to see if this behavior persists if we follow the User's Guide's instructions in 42.3.1.3 Control Read Transfer in www.ti.com/.../slau208q.pdf
    I'll take a look.

    >There are some differences such as clearing NAK for both IEP0 and OEP0.
    TinyUSB as of right now schedules SETUP transactions one stage at a time, sequentially. I.e. it will wait for the Data Phase to be done before scheduling a packet to send/receive in the Status Phase. For reasons unrelated to this bug, this behavior will change in the near future (i.e. a flag will control whether to schedule an OUT ZLP response early or not).

    >There are also 2 NOTEs in this section that could be helpful to you. Both also seem to address potential timing issues.
    Ack, maybe I missed something. I'll take a look.

    >For the data stage, if I were to ignore ECHO_MODE for a second, I indeed see  that the DIR bit is set opposite of the bus direction (because the host presumably sends an IN token packet which is shown in your trace). I also notice that you clear the NAK for OEP0 but not IEP0. Let's try clearing the NAK for IEP0 since you get STALLs on your control read requests, but not your control write requests.
    A few things here:

    • Just to reiterate, the DIR bit being set the opposite of intended is deliberate.
    • I've not been able to get the host (Windows 7) to do a control write xfer during enumeration. All control requests I've seen for my msp430 code (including tinyUSB demos and this MCVE) have been control reads. When I say "I'm not able to get an OUT xfer in the Status Phase to stall, even when I invert the directions in step 2.", as in my original post, I am referring to the Status Phase of a control read.
    • For Status Phase OUT ZLP, I have been unable to create a STALL. The DIR bit does not seem to matter during the Status Phase, but for consistency with how my tinyusb code is structured, I always set it to the proper direction when scheduling an xfer.
    • Clearing NAK for OEP0 but not IEP0 is required to get a stall. The USB core works fine if NAK is cleared for both.

    >Maybe there is something in the status stage too, but there doesn't seem to be code for it in the MCVE, and seems like we would usually get stuck before that point (unless the issue comes from a previous transaction, which is worth investigating if the above does not work).
    "Spooky action at a distance" would be interesting. The USB Module STALLs as soon as a SETUP packet is received, and Windows' response is to resend the SETUP packet. So we never get to the Status Phase.

    Because of this, I've only tested "setting the DIR bit to the opposite of intended and clearing one NAK bit but not the other" for IN and OUT xfers separately. E.g. if I'm corrupting IN xfers, the code to set up OUT xfers works fine; if I'm corrupting OUT xfers, IN xfers are scheduled correctly. Since the STALL causes a new IN xfer to be sent in response, I didn't bother setting up an OUT xfer. But I will try and see what happens.

    I think there are a few permutations of behavior I can check:

    • Do I schedule IN and OUT at the same time, or do it sequentially?
    • What value do I use for DIR during Data Phase? Status Phase?
    • Which NAK bits do I clear (IN, OUT, or both)?

    That's 2*4*3 = 24 combinations right there.
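To keep track of which permutations have been tried, the test matrix can be generated rather than written out by hand. The sketch below is only an illustration of the counting above; the struct, enum, and function names are hypothetical and not part of tinyUSB or the MCVE.

```c
#include <stddef.h>

/* Which EP0 NAK bits the experiment clears. */
enum nak_choice { CLEAR_IN_ONLY, CLEAR_OUT_ONLY, CLEAR_BOTH };

/* One row of the test matrix (all field names are illustrative). */
struct ep0_experiment {
    int schedule_together;  /* 0 = IN then OUT sequentially, 1 = both at once */
    int dir_data;           /* value of the DIR bit during the Data Phase */
    int dir_status;         /* value of the DIR bit during the Status Phase */
    enum nak_choice naks;   /* which NAK bits get cleared */
};

#define MAX_EXPERIMENTS 24

/* Fill `out` with every combination and return how many were written. */
static size_t build_experiments(struct ep0_experiment out[MAX_EXPERIMENTS])
{
    size_t n = 0;
    for (int together = 0; together <= 1; ++together)
        for (int dd = 0; dd <= 1; ++dd)
            for (int ds = 0; ds <= 1; ++ds)
                for (int nak = CLEAR_IN_ONLY; nak <= CLEAR_BOTH; ++nak)
                    out[n++] = (struct ep0_experiment){
                        together, dd, ds, (enum nak_choice)nak
                    };
    return n;
}
```

The two scheduling options, the four DIR settings (two phases with two values each), and the three NAK choices multiply out to the 24 rows the function returns.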

  • Hi William:

    OK, this sounds good. Thanks for sharing. I want to clarify, so my understanding of the current situation is that:

    • There is an overarching issue involving a STALL that is inconsistently occurring, even without software intervention. But none of us are sure why, and this is hard to diagnose.
    • The MCVE (and by proxy, the TinyUSB stack) intentionally reverses the DIR bit to cause a STALL, and does not STALL if both IEP and OEP NAKs are cleared.
      • However, clearing both NAKs is recommended by the User's Guide for control reads.
      • This MCVE may highlight that the USB core generates extra STALLs that are not commanded by software. This is entirely plausible. However, I can neither confirm nor deny with confidence at this point in time.
      • Since this behavior has not yet been encountered or reproduced in our official MSP430 USB Stack (because we do clear both NAKs), the stack is still our gold standard for you to compare against freely.

    Hopefully the documentation and suggestions we have provided so far are enough to help you find the solution you need. Unfortunately, there is only so much we can do to help diagnose a non-TI-provided software stack. And my apologies for not being able to confirm a list of USB core STALLs. If there is something related to the MSP430 USB stack we can help with, we would be glad to.

    If this helps you resolve the concerns we can help you with, please mark the thread as Resolved. Thanks!

  • Unfortunately, this will not solve my issue. There is a decent reason I'm not clearing both NAK bits at the same time in TinyUSB; it's an assumed invariant (right now) that the Data Phase and Status Phase will be scheduled sequentially, not in parallel. And clearing both NAK bits at the same time in parallel breaks this invariant.


    With the correct guard code to share data between threads, a STALL should never happen anyway- meaning it's a bug/race condition in my code. I'll keep looking into it to see if I can't find anything I missed.

  • Hi William:

    I was trying to understand what you meant by parallel, but I don't think I have figured it out. Even if it were to be scheduled in parallel (by multiple threads?), it still has to be processed sequentially on the device since you can't start the Status Phase until the Data Phase is finished. And it also cannot receive both phases in parallel since it is a serial bus and they both must go to the same endpoint. Maybe you can help me understand?

    If I may add, please notice that the instructions for control read requests are different for every stage. Clear both NAKs for the Setup Stage, only the IEP0 NAK for the Data Stage, and something different altogether for the Status Stage.


    Hope this provides some guidance.

  • >Maybe you can help me understand?


    Sure! My tinyUSB driver is completely interrupt driven for IN and OUT xfers. This includes notifying the stack that an xfer (possibly of multiple packets- it's up to the stack to decide how much to send/receive) has completed.

    In the case of Control Reads on EP 0, tinyUSB is designed in such a way that the OUT ZLP response will not be scheduled until it is absolutely sure the IN xfer has ended. In other words, the OUT EP 0 should not be invoked at all while the IN EP 0 transmit is in progress.

    A transfer is considered scheduled in my MSP430 driver when the NAK bit for the endpoint has been cleared, the corresponding EP interrupt has been enabled, and the CNT register has been programmed with an acceptable size. Clearing the NAK bit happens last. Eventually, the interrupt handler will notify the stack that the xfer has completed via an event queue, at which point (on EP 0) tinyUSB will schedule the OUT ZLP response.
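As a rough illustration of that ordering for an IN packet on EP 0, here is a host-side sketch. The `sim_` variables are hypothetical stand-ins for the real registers, and the bit masks and the 8-byte EP0 packet size are assumptions to verify against the device header and datasheet; this is not the actual tinyUSB driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Plain variables standing in for the memory-mapped registers (hypothetical
 * names), so the sketch builds on a host PC rather than the target. */
static uint8_t sim_USBIEPIE;            /* IN endpoint interrupt enables */
static uint8_t sim_USBIEPCNT_0 = 0x80;  /* starts out NAKing */

/* Assumed values; verify against the device header before reuse. */
#define SIM_NAK         0x80u  /* NAK in USBIEPCNT_0 */
#define SIM_EP0_IE      0x01u  /* EP0 bit in the interrupt enable register */
#define EP0_MAX_PACKET  8u     /* assumed EP0 buffer size in bytes */

/* Schedule one IN packet of `len` bytes on EP0; clearing NAK comes last. */
static void schedule_ep0_in(uint8_t len)
{
    assert(len <= EP0_MAX_PACKET);

    sim_USBIEPIE |= SIM_EP0_IE;                  /* 1. enable the EP interrupt */
    sim_USBIEPCNT_0 = (uint8_t)(SIM_NAK | len);  /* 2. program the size, still NAKing */
    sim_USBIEPCNT_0 &= (uint8_t)~SIM_NAK;        /* 3. clear NAK: xfer is now live */
}
```

Because NAK lives in the same register as the byte count, steps 2 and 3 could presumably collapse into a single write on real hardware; they are kept separate here only to mirror the order described above.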

    It looks like I missed something important: You're only expected to clear _both_ NAK bits during the Setup Phase. After reading a packet and clearing the SETUP Received Interrupt flag, does the core automatically set the NAK bits on both endpoints again? My reading of the manual is yes. Anyways, this changes a lot of stuff around in tinyUSB, and I need some time to think.

    Incidentally I can't find any indication that the official USB Stack from TI clears the NAK bit for IN and OUT EP0 when doing SETUP packet handling; NAK implicitly gets cleared in functions like "usbSendNextPacketOnIEP0", but I can't find the Setup packet handling code in the official stack.

  • I did a bit of testing myself, and here's what I found that may have contributed to some misconceptions of my own. Consider this a "post-mortem analysis":

    • Clearing SETUPIFG does not set NAK in USB{O,I}EPCNT_0. You must set these manually; otherwise, immediately upon reading USBVECINT and getting 0x20 (SETUP packet received) back, you have two live xfers on the control pipe just waiting for the host to send an IN or OUT packet.

      I wish this was more clear in the manual, though one could make the argument that "since it's never stated that SETUPIFG sets NAK on IN/OUT EP 0, assume that it doesn't." What threw me off is that in the Data Stage Transaction of the manual page you linked me, the first step is to explicitly clear the NAK bit; this made me think that SETUPIFG does it for you, because why would this be mentioned otherwise?

    • When SETUPIFG is cleared, and assuming a 16 MHz CPU clock, a ZLP IN xfer can (and pretty consistently does) complete in as few as the following two C statements. This threw me off, because it feels like that's far too short a duration for the xfer to complete. But then again, I don't have a schematic :). I have attached two C code snippets I used for testing below- assume a 16 MHz CPU clock. 

       
      // Snippet 1
      uint8_t oepcnt_b = USBOEPCNT_0;
      uint8_t iepcnt_b = USBIEPCNT_0;
      uint8_t usbifg_b = USBIFG;
      uint16_t curr_vector = USBVECINT;
      uint8_t iepcnt_a = USBIEPCNT_0; // NAK bit will be zero at 16 MHz clk.
      uint8_t oepcnt_a = USBOEPCNT_0;
      uint8_t usbifg_a = USBIFG;

      // Snippet 2
      uint8_t oepcnt_b = USBOEPCNT_0;
      uint8_t iepcnt_b = USBIEPCNT_0;
      uint8_t usbifg_b = USBIFG;
      uint16_t curr_vector = USBVECINT;
      uint8_t oepcnt_a = USBOEPCNT_0;
      uint8_t iepcnt_a = USBIEPCNT_0; // NAK will be set by this point, ZLP xfer succeeded!
      uint8_t usbifg_a = USBIFG;


    • As you probably realize by now, my driver isn't fast (and tinyUSB is meant more for simplicity than speed). This is why the Data Phase IN and Status Phase OUT are not scheduled "in parallel", but rather the Status Phase xfer isn't scheduled (see above post for what I mean by "scheduling") until we know the Data Phase (if any) has fully completed.

      "Both EP0 NAKs are never cleared at the same time" was an invariant that I assumed would hold throughout my driver because of how tinyUSB works. However, I realized tonight it's not necessary; tinyUSB waits passively until a setup packet arrives (i.e. only reacts to a SETUP packet if it's in the event queue). Since a well-behaved host won't send either an IN or OUT before a SETUP packet, it doesn't matter if both NAKs are cleared, because the host will never send/receive to/from EP 0. Since receiving a SETUP packet sets SETUPIFG, and SETUPIFG forces NAKs on EP0, as long as I manually set both EP0 NAKs immediately before clearing SETUPIFG, tinyUSB can't tell that I've been breaking invariants behind the scenes :).

      The random STALLs disappeared completely when I added code to clear EP0 NAKs every time a bus reset occurs, and to always SET EP0 NAKs before clearing SETUPIFG.[1][2] Ideally I should also be clearing EP0 NAKs after the Status Phase is over, but tinyUSB currently has no hooks to do that. I'll talk to the maintainer about this.
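A minimal host-side sketch of that fix might look like the following. The `sim_` variables are hypothetical stand-ins for the real registers, the bit masks are assumptions to check against the device header, and the handler names are made up for illustration; they are not tinyUSB's actual hooks.

```c
#include <assert.h>
#include <stdint.h>

/* Plain variables standing in for the memory-mapped registers (hypothetical
 * names), so the sketch builds on a host PC rather than the target. */
static uint8_t sim_USBIFG;
static uint8_t sim_USBOEPCNT_0;
static uint8_t sim_USBIEPCNT_0;

/* Assumed bit positions; verify against the device header before reuse. */
#define SIM_NAK       0x80u  /* NAK in USB{O,I}EPCNT_0 */
#define SIM_SETUPIFG  0x04u  /* SETUPIFG in USBIFG */

/* Bus reset: clear both EP0 NAKs so EP0 is ready for the first SETUP packet. */
static void on_bus_reset(void)
{
    sim_USBOEPCNT_0 &= (uint8_t)~SIM_NAK;
    sim_USBIEPCNT_0 &= (uint8_t)~SIM_NAK;
}

/* SETUP received: set both EP0 NAKs first, and only then clear SETUPIFG. */
static void on_setup_received(void)
{
    assert(sim_USBIFG & SIM_SETUPIFG);  /* only called with the flag pending */

    sim_USBOEPCNT_0 |= SIM_NAK;
    sim_USBIEPCNT_0 |= SIM_NAK;

    /* No transfer is live on EP0 once the flag stops forcing NAKs. */
    sim_USBIFG &= (uint8_t)~SIM_SETUPIFG;
}
```

In the real driver the flag may end up cleared by reading USBVECINT rather than by an explicit write; the point of the sketch is only the ordering, with both NAKs set before SETUPIFG goes away.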

    Tentatively, I mark my issue as resolved, but I still want to do testing w/ EP0 NAKs after the Status Phase, which could be a few days/weeks. Is it common courtesy to mark the issue as resolved anyway at this point?

    Footnotes

    1. Actually for the sake of full disclosure, adding just code to "always SET EP0 NAKs before clearing SETUPIFG" and keeping my code to set NAKs during a bus reset as-is is also sufficient to prevent the STALLs. But I should do this the right way :).

    2. Ironically enough, with the changes above, I was able to capture traces where the host just stopped sending packets after the intended protocol STALL. I wish I had captured a screenshot! They went away when I reset my Launchpad/debugger, however. I'm not about to debug the Windows USB host stack to figure out why it gave up :).

  • Hi William:

    Awesome, great to hear that you dug up something that can move you forward.

    • If you are unblocked from moving forward at this time, marking this thread as Resolved would be helpful for our thread owner to take it off his task list. If you run into issues later you can re-open this same thread and we will be notified or post a new one if it's not the same type of issue. Thanks for the conversation, I enjoyed it.
    • The MSP430 USB stack, as you can see, doesn't really follow the stages. It's neither right nor wrong; the software architect at the time decided to do it this way. So there is not a clear moment where the stack clears both NAKs as we are looking for. It's a very deliberate and controlled program flow, although I will draw your attention to the fact that functions like usbClearOEP0ByteCount and usbSendNextPacketOnIEP0 also clear the NAK even though it is not explicitly noted.
      • EP0 NAKs are also explicitly cleared in USB_reset(), as you mentioned a similar situation in your own code.
    • On all the other points you mentioned, I agree! Thanks for the "post-mortem analysis" that future engineers can use when they refer to this thread.

    Looking forward to hearing about the completion of this port!
