Problem with USB DFU bootloader on F28069

Other Parts Discussed in Thread: CONTROLSUITE, TMS320F28069, UNIFLASH

We have implemented USB DFU bootloader support on an 80MHz TMS320F28069 using the boot_loader code from controlSUITE (f2806x\v141\MWare\boot_loader\) and communicate with it using unmodified dfuprog (f2806x\v141\MWare\tools\dfuprog\).

Most of the time this works perfectly. However, about 20% of the time, dfuprog reports that it cannot find the target device.
For some host PCs, the behavior is even worse - suggesting that this is possibly related to a subtle timing issue.

I have traced the failure and found it consistently occurs when the bootloader returns a NAK in response to the DATA IN phase of a DFU_GETSTATUS command.

The two attached USB packet captures show the successful and failure cases:


In DFU_Good.pdf, transfer 1232 is a successful DFU_GET_STATUS, transfer 1233 is a successful DFU_REQUEST_TI command, and transfer 1234 is a successful DFU_DOWNLOAD.


In DFU_Fail.pdf, transfer 2017 is a NAK'd DFU_GET_STATUS, and 2018 is a NAK'd DFU_REQUEST_TI command. The operation aborts at this point.

These transfers are performed by the TIDFUDeviceOpen() function in tidfu.cpp.  I've attached the relevant code snippet.
The code does not check the return status of DFUMakeDeviceIdle() or CheckForTIProtocol(), and I see no code to perform a retry on NAK.
I have confirmed that this implementation is the same in the latest version (V151) in controlSUITE 3.3.9.
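
To illustrate the missing status checks, here is a minimal sketch of an open sequence that stops at the first failing step instead of continuing. The names `open_step_fn`, `run_open_steps`, `step_ok` and `step_fail` are hypothetical stand-ins and are not part of tidfu; only the pattern of checking every return value is the point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical step type: returns 0 on success, non-zero on failure. */
typedef int (*open_step_fn)(void *ctx);

/* Run each step in order and stop at the first failure.
 * Returns 0 if every step succeeded, otherwise the failing step's code.
 * If failed_index is non-NULL it receives the index of the failing step. */
int run_open_steps(const open_step_fn *steps, size_t count, void *ctx,
                   size_t *failed_index)
{
    size_t i;

    assert(steps != NULL);
    for (i = 0; i < count; i++) {
        int rc = steps[i](ctx);
        if (rc != 0) {
            if (failed_index != NULL) {
                *failed_index = i;
            }
            return rc;  /* do not touch the device after a failed step */
        }
    }
    return 0;
}

/* Stand-ins used only to exercise the pattern. */
int step_ok(void *ctx)   { (void)ctx; return 0; }
int step_fail(void *ctx) { (void)ctx; return -1; }
```

With a sequence such as `{ step_ok, step_fail, step_ok }`, the third step is never called and the caller learns which step failed.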

Is this a known issue? Any recommended fix? Any thoughts as to why the boot_loader code responds with NAK some of the time?

By the way, all F28069 code is built with CCS 6.1.1.  dfuprog is built with VS2013.

Update 20160404:

Based on additional USB traces, it appears the problem may occur when DFU_GETSTATUS is the first command sent to the bootloader following the USB SOF packet.


Bill

Attachments: DFU_Good.pdf, DFU_Fail.pdf, tidfu_snippet.cpp

  • Hi Bill,

    Is this only happening when trying to Download an Image? Or does this also happen when you are trying to enumerate (-e) or clear the flash (-c)?

    sal
  • This may also be a helpful thread for us. Putting it here for easy reference.

    e2e.ti.com/.../325729
  • Sal,
    Thanks. The thread does look useful. I'll read through it carefully.

    As for your first question, the problem occurs on any attempt by dfuprog to access the bootloader on the board, including just doing enumeration or clearing flash. The error always occurs when dfuprog calls TIDFUDeviceOpen(). That function is always able to obtain a valid device handle and successfully call GetDeviceDescriptor(), GetConfigDescriptor(), and FindDescriptor() for the first (and only) interface. Similarly, it successfully calls FindDescriptor() for the DFU descriptor.

    The problem occurs when TIDFUDeviceOpen() calls DFUMakeDeviceIdle(). In that function, DFUDeviceStatusGet() fails, due to the device NAKing the DATA IN phase of the DFU_GetStatus USB message.
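
For reference, the transfer in question can be sketched as follows. This is a minimal host-side illustration based on the USB DFU class specification (bRequest 3, a 6-byte IN data stage), not the tidfu implementation; the function and type names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DFU_REQ_GETSTATUS  0x03u  /* bRequest for DFU_GETSTATUS */
#define DFU_STATUS_LEN     6u     /* length of the GETSTATUS data stage */
#define DFU_STATE_IDLE     2u     /* bState value for dfuIDLE */
#define DFU_STATE_ERROR    10u    /* bState value for dfuERROR */

/* Fill an 8-byte control SETUP packet for DFU_GETSTATUS on one interface. */
void dfu_build_getstatus_setup(uint8_t setup[8], uint16_t interface_num)
{
    assert(setup != NULL);
    setup[0] = 0xA1u;                             /* IN, class, interface */
    setup[1] = DFU_REQ_GETSTATUS;                 /* bRequest */
    setup[2] = 0u;                                /* wValue low */
    setup[3] = 0u;                                /* wValue high */
    setup[4] = (uint8_t)(interface_num & 0xFFu);  /* wIndex low */
    setup[5] = (uint8_t)(interface_num >> 8);     /* wIndex high */
    setup[6] = DFU_STATUS_LEN;                    /* wLength low */
    setup[7] = 0u;                                /* wLength high */
}

typedef struct {
    uint8_t  status;           /* bStatus */
    uint32_t poll_timeout_ms;  /* bwPollTimeout, 24-bit little-endian */
    uint8_t  state;            /* bState */
    uint8_t  string_index;     /* iString */
} dfu_status_t;

/* Decode the data stage. Returns 0 on success, -1 if the buffer is short. */
int dfu_parse_status(const uint8_t *buf, size_t len, dfu_status_t *out)
{
    assert(buf != NULL && out != NULL);
    if (len < DFU_STATUS_LEN) {
        return -1;
    }
    out->status = buf[0];
    out->poll_timeout_ms = (uint32_t)buf[1]
                         | ((uint32_t)buf[2] << 8)
                         | ((uint32_t)buf[3] << 16);
    out->state = buf[4];
    out->string_index = buf[5];
    return 0;
}
```

The NAK described above is on the IN data stage, i.e. the device accepts the SETUP packet but never supplies these six bytes.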

    Since DFUDeviceStatusGet() returns an error (Unknown error), DFUMakeDeviceIdle() returns non-zero. However, TIDFUDeviceOpen() does not check the return status, and continues to try to access the device -- however, the device NAKs any subsequent transfers.

    Note that this does not always happen. The two packet traces I uploaded show one failure and one success. On one Win7 PC, I get success about 80% of the time. On a nearly identical PC running Win8.1, I get about 5% success. The USB traces show no obvious timing or protocol differences between the two.

    If you can help me understand what conditions would make the bootloader NAK the DFU_GetStatus message, that would be very useful.

    I have also tried inserting a sleep (up to 1000 ms) just before DFUMakeDeviceIdle() calls DFUDeviceStatusGet(), to make sure the device is idle, but this had no effect on the error. This leads me to suspect that the problem occurs between the command and data phases of that transfer.

    Bill
  • Hi Bill,

    Thank you for the information. I am still trying to reproduce the problem and get a little more familiar with USB DFU.

    NAKs can be a normal part of the USB transfers, but because the device only NAKs after the fail, that concerns me a little. This makes me think this will require some debugging of the embedded code.

    It is good you have found where it is failing in the Host PC code. You can modify the code to handle the errors and retry TIDFUDeviceOpen() if necessary. This will make the host PC code more robust. But we need to figure out why the embedded device is not responding successfully and why it is unable to recover.
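
A bounded retry on the host side could look roughly like the sketch below. It is a generic illustration: `dfu_op_fn`, `retry_operation` and `flaky_open` are hypothetical names, and the real TIDFUDeviceOpen() has a different signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation type: returns 0 on success, non-zero on failure. */
typedef int (*dfu_op_fn)(void *ctx);

/* Call op up to max_attempts times.
 * Returns 0 on the first success, otherwise the last error code.
 * If attempts_used is non-NULL it receives the number of calls made. */
int retry_operation(dfu_op_fn op, void *ctx, unsigned max_attempts,
                    unsigned *attempts_used)
{
    int rc = -1;
    unsigned n = 0u;

    assert(op != NULL);
    assert(max_attempts > 0u);
    while (n < max_attempts) {
        rc = op(ctx);
        n++;
        if (rc == 0) {
            break;
        }
        /* A real tool would likely close the handle and pause here. */
    }
    if (attempts_used != NULL) {
        *attempts_used = n;
    }
    return rc;
}

/* Stand-in that fails until its failure counter reaches zero. */
int flaky_open(void *ctx)
{
    unsigned *remaining_failures = (unsigned *)ctx;

    if (*remaining_failures > 0u) {
        (*remaining_failures)--;
        return -1;
    }
    return 0;
}
```

A retry like this only hardens the host tool. If the device keeps NAKing after the first failure, it does not address the cause on the embedded side.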

    I hope to have something in the middle of the week next week for you. At least some suggestions. In the meantime, please keep me informed of your debugging efforts and what you have tested and observed.

    Regards,
    sal
  • Bill,

    Can you please try a few different things for me?

    1) First, use Windows 8 to enumerate and connect to the embedded device. After it fails and you only receive NAKs, do not perform a reset, but then try to connect and enumerate with a Windows 7 machine.
    --Is the device recognized by the Windows 7 machine? Is it able to enumerate and is it visible in Device Manager? Are you able to run dfuprog successfully?

    2) After dfuprog fails and the PC only receives NAKs from the device, will a XRSn reset resolve the issue and get the device to start working again properly? Does it start working after a PORn? Do you need to reprogram the device using JTAG to get it working properly again?

    sal
  • Bill,

    Try this... In bl_usb.c in the boot_loader project, please replace USB0DeviceIntHandler with this implementation.

    __interrupt void
    USB0DeviceIntHandler(void)
    {
        unsigned long ulTxStatus = 0UL, ulGenStatus = 0UL;
        uint16_t usbis = 0U, txis = 0U;

        //
        // Get the current full USB interrupt status.
        //
        // Do-While to make sure that all status registers are cleared before continuing.
        // This eliminates the race condition which can cause the USB interrupt to stay high
        // and never get triggered again.
        //
        do
        {
            // Get the transmit interrupt status.
            txis = HWREGH(USB0_BASE + USB_O_TXIS);
            // Get the general interrupt status.
            usbis = (uint16_t)HWREGB(USB0_BASE + USB_O_IS);

            ulTxStatus |= txis;
            ulGenStatus |= usbis;
        }
        while((txis != 0x0000U) || (usbis != 0x0000U));

        //
        // Received a reset from the host.
        //
        if(ulGenStatus & USB_IS_RESET)
        {
            USBDeviceEnumResetHandler();
        }

        //
        // USB device was disconnected.
        //
        if(ulGenStatus & USB_IS_DISCON)
        {
            HandleDisconnect();
        }

        //
        // Handle end point 0 interrupts.
        //
        if(ulTxStatus & USB_TXIE_EP0)
        {
            USBDeviceEnumHandler();
        }

        PieCtrlRegs.PIEACK.all |= 0x10;
    }

    sal

  • Sal,
    Unfortunately, once the device has returned a NAK, I am unable to communicate with the bootloader even if I connect the board to a different PC. Windows does not detect the device and load a device driver until I cycle power on the board. Cycling board power restores normal operation. The application code is intact, and I can enter the bootloader and repeat the process using dfuprog.
  • Sal,
    This did not help - behavior is the same as with the unmodified code. However, it allowed me to discover a clue that suggests our application code is in some way inducing, or at least exacerbating, the problem.

    While testing this change, I found the following repeatable scenario:
    1. Use Uniflash to download the updated bootloader binary to the board, erasing entire flash
    2. Cycle power on the board - it comes up in the bootloader since no application is present
    3. Use dfuprog to connect to the bootloader and successfully enumerate the board, erase application flash, and download the application!
    4. Cycle power on the board - The bootloader jumps to the newly downloaded application and runs fine
    5. Instruct the application to enter the bootloader by jumping to AppUpdaterUSB() in bl_usb.c
    6. The UpdaterUSB() function runs, and enters its while loop waiting for a USB command
    7. Use dfuprog -e to connect to the bootloader - and encounter the intermittent problem with DFUDeviceStatusGet() when trying to enumerate.

    I repeated steps 1-7 multiple times. Step 3 never fails. Step 7 fails with the frequency I've described previously. When step 7 succeeds, I am also able to erase flash and reprogram the application successfully.

    In the application firmware, prior to branching to AppUpdaterUSB(), I call USBDCDCTerm() on the USB port and stop my periodic systick timer, which is the only active interrupt source at the time. Windows correctly detects the transition, uninstalls the CDC driver and installs the DFU driver.

    Can you think of any additional initialization that I should be performing prior to branching to AppUpdaterUSB()?

    Bill


    PS: I should also note that we are building dfuprog with its own local copies of tidfu.dll and tiusb.dll, all built from the same controlSUITE release. We are not relying on the copies installed with the device driver. In fact, my goal is to add the Microsoft WinUSB descriptors to the bootloader configuration so no driver inf is required at all.

  • Bill,

    I have still been unable to reproduce the problem.

    It seems like you are not having an issue when just using the bootloader, but you are having the problem when you branch from your application to the bootloader. Is this correct?

    In your application, are you using the USB as a CDC prior to a DFU? Are you sure PLL2 is configured and locked correctly?

    The difference in performance between Windows 8 and Windows 7 makes me believe that there is a race condition occurring in the embedded host code. This may be something to do with the DFU bootloader or something to do with your application setting up the bootloader.

    I suggest trying to debug this from the embedded host side. This will be a little more difficult because I cannot reproduce the problem, but we can try.

    Start the debug from USB0DeviceIntHandler(void) and then dig deeper into the usblib.

    After the fail, are there any USB interrupts being triggered? After the fail, set a breakpoint in USB0DeviceIntHandler(void) and see if the breakpoint ever gets reached by the bootloader. My first thought is that there is a race condition which is causing the USB interrupts to cease being fired and serviced.

    sal
  • Sal,
    Yes, based on the most recent testing, it appears the bootloader is behaving properly when I enter it directly on boot. I have only been able to reproduce the error when entering from the application. The PLLs "should" be configured properly -- the CDC interface is working correctly - in fact, a command received via the CDC interface is what directs the application to disconnect the CDC and branch to the bootloader.

    I agree with your race condition assessment - in multiple USB packet traces, I've seen no consistent or suspicious timing differences between the two systems - packet sequence is identical, timing is always within 10-20 uSec.

    Rather than try to finesse this, can you suggest a "brute-force" initialization sequence I could perform from the application code prior to branching to the bootloader? Time is not critical at this point.

    Bill
  • My suggestion is to look at the USB Device examples in controlSUITE. You may wish to look at F2806x and F2837x examples. They are very similar but there are a few differences.

    Have you tried disconnecting the USB PHY and then reconnecting the USB PHY? After getting the CDC packet and command to switch to DFU, see if you can disconnect and reconnect using SOFTCON. I would also suggest doing a complete re-initialization of the USB as can be seen in the bootloader example and other device examples.

    Let me know if this helps or if you have any other questions.

    sal
  • Sal,
    I've tried your suggestions, but they had no impact on the problem.
    Here is the complete sequence that occurs when entering the bootloader from the application code:

    0. The application code configures the USB port as a CDC serial interface, and is properly enumerated by Windows.
    1. The application code receives a command via the USB CDC port requesting to enter bootloader.
    2. Call USBDCDCTerm() (in usbdcdc.c) to terminate CDC operation. Eventually, this clears SOFTCON.
    3. At this point, Windows detects the CDC device removal.
    4. Application branches to AppUpdaterUSB() in bl_usb.c
    5. This function calls ConfigureUSBInterface(), which eventually sets SOFTCON.
    5a.(Note: I have also tried replacing ConfigureUSBInterface() with all of the USB and clock configuration performed in the main bootloader entry point - with no change in behavior.)
    6. At this point, Windows detects and enumerates the DFU device, and installs the correct driver
    6a. (I have tried unplugging and replugging the USB cable at this point, with no change in behavior)
    7. The bootloader then calls UpdaterUSB() which enters a forever loop that polls for the command flag from the USB interrupt handlers
    8. dfuprog is started on the Windows host, which calls _TIDFUDeviceOpen() in the tidfu library
    9. _TIDFUDeviceOpen() calls InitializeDeviceByIndex() and successfully locates the device
    10. It also successfully gets the device descriptor, configuration descriptor, and DFU functional descriptor
    11. It then calls DFUMakeDeviceIdle(), which calls DFUDeviceStatusGet()
    12. DFUDeviceStatusGet() calls Endpoint0Transfer(), which returns an error due to the device returning NAK.

    As a test, I disabled the AppCheck() function in the bootloader, and entered the bootloader via a system reset from the WDT. The error behavior was identical.

    I have determined that the NAK occurs when the DFU_GET_STATUS command is the first transfer following a USB SOF packet. When there is an intervening transaction, the DFU_GET_STATUS succeeds and everything works fine.

    On Win7, there is often another transaction between the SOF and the DFU_GET_STATUS command, while on Win8.1 there is rarely an intervening transaction. This appears to explain the difference in failure rates I am seeing between the two OS versions.

    To me, this suggests the problem is NOT in the bootloader, per se, but possibly in the low level USB interface timing. Can you provide a code sequence that will restore the USB interface to its power-up condition?

    Bill
  • Sal,
    I wanted to follow up and thank you for all your help on this issue. Once I was able to connect the debugger to the bootloader process after entering from the application, I found the root cause of the problem. As you suspected, the issue was one of improper initialization.

    The USB DFU bootloader reference code contains RAM-resident static variables that are initialized during the boot process. These variables are not re-initialized when the bootloader is re-entered from the application code. If the application does not overwrite the RAM containing these initialized variables, the bootloader functions properly. If, however, the application's RAM usage causes these variables to be overwritten, then the bootloader will fail when entered from the application.

    Since the bootloader and the application are separately linked projects, the RAM usage is not typically coordinated between the two. It is also not desirable to prevent the application from using a block of available RAM. I have reworked my code to avoid this collision, and everything is working well.

    I would suggest that the reference bootloader project be modified to eliminate the use of non-const initialized static variables. Instead, provide an initialization function that can be called on entry - whether from Power on reset, watchdog timer, or from an application.
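
As a rough sketch of what such an entry-time initialization could look like: the structure and names below (`bl_state_t`, `bl_state_init`, `bl_dispatch`, `bl_state_clobber`) are hypothetical and simplified, not the actual bl_usb.c code. The idea is that every piece of state, including the request handler table, is assigned at run time, so nothing depends on values that were loaded once at boot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BL_NUM_REQUESTS    7u  /* DFU class requests 0 (DETACH) .. 6 (ABORT) */
#define BL_REQ_GETSTATUS   3u
#define BL_STATE_DFU_IDLE  2u

typedef int (*bl_request_handler_fn)(uint16_t value, uint16_t length);

typedef struct {
    uint16_t status;
    uint16_t state;
    bl_request_handler_fn handlers[BL_NUM_REQUESTS];
} bl_state_t;

/* No initializer on purpose: bl_state_init() must set every field. */
static bl_state_t g_bl_state;

static int bl_handle_unsupported(uint16_t value, uint16_t length)
{
    (void)value; (void)length;
    return -1;
}

static int bl_handle_getstatus(uint16_t value, uint16_t length)
{
    (void)value; (void)length;
    return 0;
}

/* Call on every entry: power-on reset, watchdog reset, or from the app. */
void bl_state_init(void)
{
    size_t i;

    assert(BL_REQ_GETSTATUS < BL_NUM_REQUESTS);
    g_bl_state.status = 0u;
    g_bl_state.state = BL_STATE_DFU_IDLE;
    for (i = 0; i < BL_NUM_REQUESTS; i++) {
        g_bl_state.handlers[i] = bl_handle_unsupported;
    }
    g_bl_state.handlers[BL_REQ_GETSTATUS] = bl_handle_getstatus;
}

uint16_t bl_get_dfu_state(void)
{
    return g_bl_state.state;
}

/* Dispatch a class request through the table, rejecting bad indices. */
int bl_dispatch(uint16_t request, uint16_t value, uint16_t length)
{
    if (request >= BL_NUM_REQUESTS || g_bl_state.handlers[request] == NULL) {
        return -1;
    }
    return g_bl_state.handlers[request](value, length);
}

/* Test helper: mimics an application overwriting the bootloader's RAM. */
void bl_state_clobber(void)
{
    memset(&g_bl_state, 0xFF, sizeof g_bl_state);
}
```

Calling `bl_state_clobber()` followed by `bl_state_init()` leaves the state usable again, which is the property the reference code lacks when it is re-entered from an application.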

    Again, thanks for your help with this

    Kindest Regards,
    Bill
  • Bill,

    It is great to see you got it working and were able to debug the issue!!

    Uninitialized static global variables in this library are a problem, especially on F2806x, where the RAMs are not initialized by Boot ROM. I should have suggested that earlier, but I am glad you were able to debug the issue. Thank you for making the effort.

    I have made some fixes to other F2806x USB examples to have the linker initialize .ebss (static global variables) to 0x0000. I will file a bug to modify the linker command file for the bl_app and bootloader. But, I agree that the best fix would be to provide an initialization function to initialize the static globals to 0.

    If you have time, please provide the names and files of the static global variables that were causing the issue. This would be helpful.

    Regards,
    sal
  • Sal,
    The specific global that was causing the bootloader crash was g_pfnRequestHandlers[] in bl_usb.c. However, there are several other initialized statics in that file, such as g_sDFUStatus, g_sDFUProtocol, g_eDFUState, ...

    The map file shows the .cinit section contains a little under 256 bytes of initialized variables.

    Bill