TMS320DM8147: DSP reset/restart/reload fails after many successful iterations

Part Number: TMS320DM8147

The setup of interest here is a DM8147 that is running a custom uboot to load code across the PCIe bus to the DSP and then start the DSP.

The DSP has EDMA and McASPs active and runs 8 tasks. Everything works as expected.

What doesn't work 100% of the time is an asynchronous "soft" reboot. This happens when the ARM resets the DSP, code is reloaded across the PCIe bus, and the DSP is restarted. Are there special steps required to reset the DSP to its powered-down state?

Additionally, once the DSP has failed to come up correctly, a complete power-off/restart cycle is required.

Are there any additional steps required to soft reboot the DSP in the DM8147?

Thanks.

  • Based on my experience with other devices, I would suggest that before a DSP reboot the ARM issues some kind of "notice" to the DSP.  The DSP would react by disabling any associated EDMA/McASP operations that are on-going and falling into a spin-loop to wait to be rebooted.  The goal would be to avoid resetting the DSP while there's a memory transaction "in flight".  If the DSP can stop its activity and just spin, I think the issues will go away.
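A minimal sketch of the handshake suggested above, assuming a shared mailbox word that the ARM writes and the DSP polls. Everything here is hypothetical and simulated with plain variables: `quiesce_mailbox_t`, the `MBOX_*` values, `stop_mcasp`, and `stop_edma` are illustration names, not TI APIs. On real hardware the two stop functions would be replaced by the actual McASP and EDMA shutdown code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mailbox states (illustration only, not a TI API). */
enum { MBOX_IDLE = 0, MBOX_QUIESCE_REQ = 1, MBOX_QUIESCE_ACK = 2 };

typedef struct {
    volatile uint32_t command; /* written by the ARM, acknowledged by the DSP */
    bool mcasp_running;        /* stand-in for the real McASP state */
    bool edma_running;         /* stand-in for the real EDMA channel enables */
} quiesce_mailbox_t;

/* Stand-ins for the real peripheral shutdown code. */
static void stop_mcasp(quiesce_mailbox_t *m) { m->mcasp_running = false; }
static void stop_edma(quiesce_mailbox_t *m)  { m->edma_running = false; }

/* DSP side: call from the main loop. Returns true once the DSP has
 * quiesced and should do nothing but spin until it is reset. */
bool dsp_poll_quiesce(quiesce_mailbox_t *m)
{
    if (m->command != MBOX_QUIESCE_REQ)
        return m->command == MBOX_QUIESCE_ACK;

    /* Assumed ordering: stop the event source first, then the DMA. */
    stop_mcasp(m);
    stop_edma(m);
    m->command = MBOX_QUIESCE_ACK;
    return true;
}

/* ARM side: post the notice, then reset only after the acknowledgement. */
void arm_request_quiesce(quiesce_mailbox_t *m)
{
    m->command = MBOX_QUIESCE_REQ;
}

bool arm_safe_to_reset(const quiesce_mailbox_t *m)
{
    return m->command == MBOX_QUIESCE_ACK;
}
```

The point of the acknowledgement is that the ARM never asserts the DSP reset while the DSP might still have a memory transaction outstanding.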

  • Acknowledged. I will try your suggestion.

  • More details follow.....

    In Linux the following sequence works 100% of the time...

    On driver unload

    - send CLOSE command from Linux kernel driver to DSP via PCIe

    - DSP halts all McASP traffic

    - driver unloads

    On driver load

    - tell ARM to reset the DSP

    - load DSP code

    - start DSP code running

    - bring up PCIe interface

    The above sequence works 1000s of times without an issue.

    But, in Linux, when a soft reboot is performed, the driver is not unloaded by the operating system, so there is not a clean shutdown. On the DSP side we capture the PCIe RESET event via a GPIO interrupt and then do a McASP (and EDMA) shutdown, but it is possible there could be an EDMA PCIe bus-mastering transaction underway at the time of the PCIe RESET. Are there any registers we can look in to confirm a PCIe transaction error? And, is there a recovery mechanism?
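The concern about a transfer still being underway when the reset arrives can be made explicit in the shutdown path. The sketch below is an assumption-laden illustration, not the poster's code: `drain_transfers` polls a caller-supplied "pending transfers" reader a bounded number of times and reports whether the queue emptied. `read_pending_fn`, `drain_result_t`, and `fake_pending` are hypothetical names, and `fake_pending` only simulates hardware that completes one transfer per poll.

```c
#include <stdint.h>

/* Hypothetical callback that returns how many transfers are still pending. */
typedef uint32_t (*read_pending_fn)(void *ctx);

typedef enum { DRAIN_OK, DRAIN_TIMEOUT } drain_result_t;

/* Poll until nothing is pending or the poll budget runs out.
 * A bounded loop keeps the reset handler from spinning forever. */
drain_result_t drain_transfers(read_pending_fn read_pending, void *ctx,
                               uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; ++i) {
        if (read_pending(ctx) == 0)
            return DRAIN_OK;
    }
    return DRAIN_TIMEOUT;
}

/* Simulated hardware: reports the current count, then retires one transfer. */
uint32_t fake_pending(void *ctx)
{
    uint32_t *remaining = (uint32_t *)ctx;
    uint32_t now = *remaining;
    if (*remaining > 0)
        (*remaining)--;
    return now;
}
```

A `DRAIN_TIMEOUT` result is the interesting case: it marks a reset that arrived with a transaction that never completed, which is worth recording somewhere that survives the DSP reload.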

  • Andrew Elder said:

    More details follow.....

    In Linux the following sequence works 100% of the time...

    On driver unload

    - send CLOSE command from Linux kernel driver to DSP via PCIe

    - DSP halts all McASP traffic

    - driver unloads

    On driver load

    - tell ARM to reset the DSP

    - load DSP code

    - start DSP code running

    - bring up PCIe interface

    The above sequence works 1000s of times without an issue.

    That's great!  I'm glad to hear you've made substantial progress.  Thanks for the update.

    Andrew Elder said:
    But, in Linux, when a soft reboot is performed, the driver is not unloaded by the operating system, so there is not a clean shutdown. On the DSP side we capture the PCIe RESET event via a GPIO interrupt and then do a McASP (and EDMA) shutdown, but it is possible there could be an EDMA PCIe bus-mastering transaction underway at the time of the PCIe RESET. Are there any registers we can look in to confirm a PCIe transaction error? And, is there a recovery mechanism?

    I don't quite understand the scenario.  When you say a "soft reboot" are you referring to rebooting the entire chip, e.g. "reboot" command from Linux?  Even in that case, I would have thought some type of hardware reset would be invoked under the hood as part of the reboot which would clear out any kind of pending errors, etc.  Are you seeing an issue?  Is the issue perhaps on the other side of the PCIe bus, i.e. do we put the other device into a bad state due to a partially completed transaction?

  • 'soft reboot' = reboot command from Linux.

    Important note: the PCIe reset line is not connected to the DM8147 reset pin because on many PCs the PCIe configuration registers cannot be loaded with the correct Hardware IDs (PCI VEN and SUBSYS) fast enough. That is, the time between the PCIe reset line going high and the BIOS scanning the PCIe bus is shorter than the time uboot needs to configure the PCIe Hardware IDs. For this reason, the PCIe reset line is wired to a GPIO pin, and the DSP and ARM both "watch" for the PCIe reset line going low.

    So, in summary, rebooting the Linux host PC will not result in the hardware RESET pin on the DM8147 being asserted. 

  • See also advisory 3.0.66 here http://www.ti.com/lit/er/sprz343c/sprz343c.pdf, forcing us to boot from SPI flash.

  • Sorry, I'm still confused!  Was your initial note about "Linux soft reboot" referring to "Host PC reboot" or to "DM8147 reboot"?  Initially I thought you meant the DM8147, but now I think you're referring to the PC.  Is that correct?  So your concern is that when you reboot the PC there might be some kind of "in flight" PCI transaction that causes issues?  I don't know precisely how that's handled on this device.  I would expect that a pluggable interface like PCIe has various timeouts and other error conditions to handle this sort of issue (though I'm admittedly not an authority on PCIe!).

  • Brad Griffis said:

    Sorry, I'm still confused!  Was your initial note about "Linux soft reboot" referring to "Host PC reboot" or to "DM8147 reboot"?  Initially I thought you meant the DM8147, but now I think you're referring to the PC.  Is that correct?  So your concern is that when you reboot the PC there might be some kind of "in flight" PCI transaction that causes issues?  I don't know precisely how that's handled on this device.  I would expect that a pluggable interface like PCIe has various timeouts and other error conditions to handle this sort of issue (though I'm admittedly not an authority on PCIe!).

    Well, drat, I accidentally clicked the resolved button. Any way to undo that?

    I see the confusion now. Sorry I wasn't clear. The "Linux soft reboot" refers to "Host PC reboot". The DM8147 is not running Linux at all (although I realize it can).

  • Sometimes there's a "reject answer" button.  Perhaps it's not present though when you clicked "resolved" yourself.

    Related to this whole topic, are you aware that you can reset the entire chip via software by using the PRM_RSTCTRL register?  Maybe I'm headed the wrong direction here, but when you "see" that the Host PC is rebooting would it make sense to reboot yourself too?  That would definitely clear out any errors that are pending!
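For illustration, a software reset through PRM_RSTCTRL amounts to setting one bit in that register. The sketch below deliberately takes the register address as a parameter rather than hard-coding it, and the bit position in `RST_GLOBAL_WARM_SW` is an assumption that should be checked against the device TRM before use. The function name `request_warm_reset` is hypothetical.

```c
#include <stdint.h>

/* ASSUMPTION: bit 0 requests a global warm software reset.
 * Verify the bit position in the PRCM chapter of the device TRM. */
#define RST_GLOBAL_WARM_SW (1u << 0)

/* Set the warm-reset request bit. The caller supplies the mapped address of
 * PRM_RSTCTRL; on real hardware the chip resets shortly after this write. */
void request_warm_reset(volatile uint32_t *prm_rstctrl)
{
    *prm_rstctrl |= RST_GLOBAL_WARM_SW;
}
```

Passing the register as a pointer also lets the call be exercised against an ordinary variable on a development host.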

  • Brad Griffis said:
    Related to this whole topic, are you aware that you can reset the entire chip via software by using the PRM_RSTCTRL register?  Maybe I'm headed the wrong direction here, but when you "see" that the Host PC is rebooting would it make sense to reboot yourself too?  That would definitely clear out any errors that are pending!

    I was unaware of that option. I suspect we still end up in a race to load uboot and program the Hardware IDs before BIOS scans PCIe registers, but I will review further.

  • I have discussed this option internally and concluded that using a write to PRM_RSTCTRL to do a software reset of the DM8147 would result in the same "race" between uboot PCIe configuration and the host Linux PC BIOS reading back the PCIe configuration.

    Do you have any idea as to what registers we should examine to further research exactly why the DSP is locking up after a program reload?

  • Andrew Elder said:
    I have discussed this option internally and concluded that using a write to PRM_RSTCTRL to do a software reset of the DM8147 would result in the same "race" between uboot PCIe configuration and the host Linux PC BIOS reading back the PCIe configuration.

    That sounds reasonable.  I figured that might be the case, but wanted to at least make sure the option was considered.

    Andrew Elder said:
    Do you have any idea as to what registers we should examine to further research exactly why the DSP is locking up after a program reload?

    It sounds like you made some changes with respect to the DSP shutdown/reload process that had a major impact.  Are you still having issues?  Just less regularly perhaps?  Please refresh me on the current state of this issue in terms of what sort of symptoms you're seeing, how often, etc.

  • Yes, still having the issue. A manual driver unload/reload from Linux (no reboot) that resets the DSP and reloads the DSP code works 100% reliably. A host Linux PC "soft" reboot (i.e., not a "hard" power cycle), which asserts PCIe RESET at some asynchronous time, will run 100 times correctly and then fail. After the failure, recovery requires a power cycle of the host Linux PC.

    When the failure occurs, it looks like the DSP locks up hard. We have a 1-second LED heartbeat, and it stops flashing.

  • In the lockup scenario can you detect this has occurred from the ARM and issue a chip level reset through the PRM_RSTCTRL register?
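One way the ARM could detect the lockup, sketched under the assumption that the DSP increments a counter in ARM-visible memory alongside its LED heartbeat. The counter, the `hb_monitor_t` type, and `hb_check` are hypothetical; the ARM would call `hb_check` at a fixed interval with the latest counter value.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARM-side bookkeeping for a heartbeat counter that the DSP increments. */
typedef struct {
    uint32_t last_count;   /* last counter value seen */
    uint32_t stale_checks; /* consecutive checks with no change */
    uint32_t max_stale;    /* checks without change before declaring lockup */
} hb_monitor_t;

/* Returns true when the counter has not moved for max_stale consecutive
 * checks, i.e. the DSP is presumed locked up. */
bool hb_check(hb_monitor_t *mon, uint32_t current_count)
{
    if (current_count != mon->last_count) {
        mon->last_count = current_count;
        mon->stale_checks = 0;
        return false;
    }
    mon->stale_checks++;
    return mon->stale_checks >= mon->max_stale;
}
```

Requiring several stale checks in a row avoids a false alarm when the ARM samples twice within one heartbeat period.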

  • Hi Brad,

    I am Andrew's colleague. Issuing a chip-wide reset would make the issue go away, but unfortunately it won't help us: if we did that, the endpoint would no longer be enumerated by the root complex. We have verified that the EDMA transfer controller ERRSTAT and ERRDET registers are clear right before the DSP is taken out of reset on a "bad boot run", i.e. one where the DSP locks up. The DSP has some exception flags and a buserr register but they are all in the DSP's internal RAM and thus not visible from the ARM. Connecting the emulator after the fact to inspect the state of the DSP has failed for us in the past, because the link fails repeatedly and, if and when it finally succeeds, the DSP has been reset by the connection process.

    Thank you!

    --

    Delio
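The pre-release check described above can be sketched as a small gate that refuses to release the DSP from reset if any transfer controller reports an error. This is an illustration only: `tc_error_regs_t` holds snapshot copies of each controller's error status and error detail registers that the caller has already read, and both the type and `first_tc_with_error` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Snapshot of one transfer controller's error registers (caller-filled). */
typedef struct {
    uint32_t errstat; /* error status register value */
    uint32_t errdet;  /* error details register value */
} tc_error_regs_t;

/* Returns the index of the first controller with a non-zero error register,
 * or -1 if every controller is clean. */
int first_tc_with_error(const tc_error_regs_t *tcs, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (tcs[i].errstat != 0 || tcs[i].errdet != 0)
            return (int)i;
    }
    return -1;
}
```

Logging the returned index on every boot, good or bad, would at least confirm that the controllers are clean at the moment of release on the runs that later lock up.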

  • Delio Brignoli said:
    The DSP has some exception flags and a buserr register but they are all in the DSP's internal RAM and thus not visible from the ARM.

    Are you able to get any more data related to the bus error?  That might give some new clue or direction to this conversation.

  • Brad Griffis said:
    Are you able to get any more data related to the bus error?  That might give some new clue or direction to this conversation.

    Any suggestions how to do that? As best we can tell the DSP is "locked up".