AM3358: EMI induced SW Warm reset debug on BBB like design

Part Number: AM3358


I am looking for advice on how to debug a SW WARM reset caused by Electrical Fast Transient (EFT) or ESD events on the Ethernet port of a design based on the Beagle Bone Black EVM.

The PRM_RSTST bit 1 "Global_WARM_SW_RST" is being set only as a result of the EMI transient exposure dumping into DGND.

I assume some exception vector is forcing this SW reset but I was hoping there might be some other bread crumbs to follow as to the cause.

Q - are there any other registers that log more detail on cause of this SW WARM Global trip?

Q - are there any known weak areas in BBB design that I should consider along these lines?

Q - are there any AM3358 settings that may be helpful to give better transient immunity.

 Design overview relative to BBB EVM - Design is very close to this EVM

Not using any of the HDMI interface which is depopulated or HOST USB.

Field interfaces are just Ethernet 10/100 and USB0 device as a serial port for configuration.

Internal TTL UART connections to host device.

I am sourcing both AC in from host or USB and not using a battery or push button on the PMIC. I have followed the app notes along these lines, including what to do with unused circuits.

I am not experiencing any latched shut down modes as result, PMIC appears solid.

DDR3 is shifted to DDR3L, and VDD_3V3B is sourced by a buck LMZ10500SILT instead of an LDO to reduce heat dissipation.

I changed to the "D" version of TPS65217D for the DDR3L

MPU core running at 600MHz and DDR3L at 606MHz data rate also to conserve on power dissipation.

DDR3L memory is performing well with various performance tests.

The PCB is 8 layers, with high-speed traces routed tight to the DGND ground plane as controlled impedance, without significant impedance discontinuities in routing, and with digital interface group lengths made tighter than requirements.

Traces from MPU to DDR3L  and Net Phy are short at ~ 1"

This pcb build has worked well for me on other designs surviving EMI requirements beyond level 3 (IEC61000-4-X group).

 

  • Chris

    Someone should respond to you early next week, if not today.

    -- Paul
  • Noise from EFT and ESD events may be coupling into the signal connected to the nRESETIN_OUT pin. Several years ago another customer had a similar problem with their system. After describing my concern of noise coupling to this pin and asking a few questions related to their PCB layout, we found the problem was related to component placement.

     

    They connected a push-button reset switch and filter capacitor to the nRESETIN_OUT pin similar to S1 and C24 shown in the Beagle Bone Black schematic. However, they located the switch and capacitor together several inches from the processor. The capacitor provides a low-impedance path to ground for any transient noise coupled into the signal. The result was unexpected voltage transients being applied to the nRESETIN_OUT pin when noise was coupled into the signal. This occurred because the far end (relative to the processor) had a low-impedance path to ground via the capacitor, and the nRESETIN_OUT pin is connected to a high-impedance LVCMOS input buffer. Any current coupled into the signal would produce a voltage at the high-impedance end of the signal. They were able to resolve their issue by moving the filter capacitor to the near end of the signal. I cannot say for sure that you are having the same problem, but this is the first thing I would check.

     

    This type of problem is very system dependent and every system is different. Therefore, there can be many system variables that influence noise immunity. It is very difficult for TI to comprehend all of the possible contributions, so you will need to determine how noise is coupling into the processor and make the appropriate changes to resolve the problem.

     

    Regards,

    Paul

  • Paul -

    Please skim the system manual, http://www.ti.com/lit/ug/spruh73p/spruh73p.pdf - the PRM_RSTST contents start on page 1428.

    In particular the various bits and what they stand for.

    Q -  to my first question - Are there any other MPU registers that log detail beyond the PRM_RSTST register?  I assume there is not since you did not comment. 

    Bit 1 - stands for a SW WARM reset that could be due to a purposeful shut down or I suppose an exception type reset vector due to corrupted DDR3L code.

    Any thoughts along this line of what could cause a bit 1 reset?  More below on this

    Q - To my second question - thanks for your thoughts about BBB design issues.

    Note that the nRESETIN_OUT signal you talk about will set bit 5 of the PRM_RSTST register = EXTERNAL_WARM_RST

    If I push my HW reset button, I see bit 5 of the reset register set. I am NOT seeing this bit set during my ESD reset.

    So I do not believe that nRESETIN_OUT is toggling like your example.

    If the PMIC were to drive the PORZ low are any of the other inputs such as this nRESETIN_OUT signal also captured at the same time?

    I have these signals, PORZ and nRESETIN_OUT, set up the same as the BBB EVM, including the reset button, the pull-up resistor, the parallel 2.2nF cap, and the buffer from PMIC_PGOOD to nRESETIN_OUT.

    I only see bit 0 set if PORZ goes low.  I suspect that PORZ dominates and ignores nRESETIN_OUT.

    Q - to my third question, are there settings that would help?

    I understand that this is highly dependent on what is being affected.

    MORE DETAILS TO CONSIDER:

    Turns out that one ESD hit will start the ball rolling to reset but it takes 10-12 seconds for the OS to shut down.  It appears that in our Linux System Log there may be some clues about why the FW is shutting down.  It appears to be happening in a controlled fashion, not an exception vector.  We have the watchdog turned on and bit 4 has never been set.  I have captured several events with the system log and we are reviewing. So there may be some clues as to what HW sub circuit is being affected.  If we learn anything worth sharing I will post it here.  Perhaps it could help others.

    Thanks for your support    - Chris

  • I’m sorry for the confusion, bit 5 is the one that should be set if nRESETIN_OUT was the source. I looked at the register definition very quickly yesterday and missed this detail.

     

    I discussed your issue with a co-worker that has spent more time looking at the various reset functions in AM335x. He said bit 1 is set when software initiates a reset by writing 1 to bit 0 of the PRM_RSTCTRL register. Therefore, we assume one of the exception vector interrupt service routines in your OS is writing the bit. You may need to review all of the exception vector routines to understand which ones can initiate a warm reset by writing to bit 0 of PRM_RSTCTRL. If possible, you may need to insert debug statements in these routines or temporarily modify these routines such that you can determine which is initiating the reset. This may provide some insight into where the noise is coupling into your system.
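
    The bit assignments discussed so far in this thread can be collected into a small decoder for a raw PRM_RSTST value. This is only a sketch: the bit positions (0, 1, 4, 5) are the ones mentioned in the posts above, the text labels are informal descriptions rather than official field names, and how the raw value is read from the device is left out entirely.

```python
# Bit positions as described in this thread; the labels are informal
# descriptions, not the official field names from the reference manual.
PRM_RSTST_FLAGS = {
    0: "cold reset (PORz)",
    1: "global warm software reset",
    4: "watchdog reset",
    5: "external warm reset (nRESETIN_OUT)",
}


def decode_prm_rstst(value):
    """Return descriptions of the reset flags set in a raw PRM_RSTST value."""
    return [
        label
        for bit, label in sorted(PRM_RSTST_FLAGS.items())
        if value & (1 << bit)
    ]


if __name__ == "__main__":
    # Example raw values: cold reset only, SW warm reset only, external warm reset only.
    for raw in (0x01, 0x02, 0x20):
        print(f"0x{raw:02X}: {decode_prm_rstst(raw)}")
```

    Logging the decoded value early in boot, before anything clears the register, makes it easier to tell a software-initiated warm reset (bit 1) apart from a watchdog or external reset after each ESD hit.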

     

    If PORz is driven low, you should read the value provided in the “RESET” column of the PRM_RSTST Register Field Descriptions table.

     

    We are not aware of any register settings that would help with this issue without understanding how noise is coupled into the processor.

     

    I have one other hardware related suggestion. I assume you are using a crystal circuit attached to the internal oscillator as the device reference clock source. If so, can you temporarily remove the crystal circuit and replace it with a 1.8V LVCMOS oscillator that sources a reference clock with fast rise/fall edges? We have seen cases where noise couples into the crystal circuit as its output slowly rises/falls through the input buffer switching threshold. This can create glitches on the internal clock if the noise is large enough and occurs just as the signal crosses the threshold. Glitches can produce a clock with higher than expected toggle frequency which can over-clock the attached circuits. Therefore, a glitch on the internal clocks may cause circuits to misbehave. Using a clock reference with fast rise/fall edges minimizes the chance of this occurring.

     

    Regards,

    Paul

  • I don't think you are seeing the tie between bit 0 and bit 5 I was asking about.
    Our PORZ is driven off the PMIC PGOOD signal -> bit 0
    This PGOOD then through a buffer also drives nRESETIN_OUT in parallel -> bit 5
    (see sys manual figure 8-21, BBB EVM and my application have the same relationship)

    I thought that since the two signals transition ~ together I would capture both bits in the PRM_RSTST register.
    But in reality the buffer puts a bit of a delay on the nRESETIN_OUT signal and PORZ wins.
    Or possibly during a PORZ event all other reset inputs to this reset register are just not captured.
    In other words if you experience a PORZ the register will not record any other simultaneous flag.
    The reason to understand this is to help resolve what is driving reset and focus my HW mods.

    I am NOT resetting on an exception vector!
    It turns out that Linux is shutting down in a controlled fashion after a single ESD hit.
    This takes about 10-12 seconds and then we reset.
    In our Sys Log I can find the start of the shut down as udhcpd [1269], received SIGTERM.
    So we are digging in to see if we can follow what called this SIGTERM.

    Each time the numbers in the udhcpd brackets are changing.
    I understand that this points to what called the SIGTERM.
    Hopefully that may point us to what event and possibly then what HW circuit is triggering the shut down.
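
    A minimal sketch of how the captured system logs could be scanned for the first SIGTERM report, so the lines just before it can be reviewed. The line layout matched here is an assumption based on the wording quoted above and may need adjusting for the real log format. In conventional syslog output the bracketed number is normally the PID of the process that wrote the line, so it would be expected to differ from boot to boot.

```python
import re

# Matches lines such as "udhcpd [1269], received SIGTERM". The exact layout
# of a real log line is an assumption and may need adjusting.
SIGTERM_LINE = re.compile(
    r"(?P<proc>[\w.-]+)\s*\[(?P<pid>\d+)[\]}].*SIGTERM", re.IGNORECASE
)


def find_sigterm_events(lines):
    """Yield (line_number, process_name, pid) for each line reporting a SIGTERM."""
    for number, line in enumerate(lines, start=1):
        match = SIGTERM_LINE.search(line)
        if match:
            yield number, match.group("proc"), int(match.group("pid"))


if __name__ == "__main__":
    sample = [
        "eth0: link up",                    # hypothetical filler line
        "udhcpd [1269], received SIGTERM",  # wording quoted in this thread
    ]
    for event in find_sigterm_events(sample):
        print(event)
```

    The first event's line number gives a starting point: whatever was logged in the seconds before it is the most likely place to find what requested the shut down.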

    Yes I am using crystals both on MPU and Net Phy.
    Also related to this I may have the crystal cap size off on these test units.
    Other designs that I have done and that were immune did use an oscillator. Here I was trying to reduce the power budget where possible.
    This may not be too hard to hack in and try.
  • Chris Wells said:
    In our Sys Log I can find the start of the shut down as udhcpd [1269], received SIGTERM.
    So we are digging in to see if we can follow what called this SIGTERM.

    A couple comments on this:

    1. How come you're using udhcpd?  The recent SDKs all use systemd, which includes systemd-networkd for handling DHCP.  I don't expect it relates to this issue, but as a general comment you should make sure you're not using both systemd-networkd and udhcpd.

    2. Do you have a full Linux console log somewhere?  Or are you not able to capture it when running these tests?

    It doesn’t matter if you assert nRESETIN_OUT at the same time as or slightly after PORz, because PORz is the global reset that returns everything in the device back to a known state. So it is not possible for the register to record the assertion of nRESETIN_OUT.

    I’m not able to help with software. Hopefully, you can track down the source.

    Regards,
    Paul
    Brad - As a HW focused engineer I will have to forward the udhcpd versus systemd-networkd topic to our FW group.  I don't think our FW group is leveraging your SDK like you are expecting.

    I have not attempted to capture the console log during ESD exposure so far.  I have a non-isolated TTL TXRX/USB interface that may affect our PC host during exposure.

    I may be able to add a layer of USB isolation (TTL to RS232 adapter hooked to RS232/ISO USB adapter) to the console interface, float the PC, and capture.  I can trigger the event even at 1 kV, so that may work out.  I will give it a go and share the console details and the sys log as well.

    We tracked down the problem to EMI toggling the boot sequence line R3 (LCD_DATA2) of the AM3358.

    We have the same boot sequence circuitry as BBB EVM but with a jumper instead of a push button.

    We had this IO also set up to call a shut down and reboot process in Linux.

    So basically, by setting the jumper (or, on the BBB, pushing and holding the button) the unit would shift boot from eMMC over to uSD/USB.

    This trace was acting like an antenna, and with the weak 100K PU resistor the signal was toggling due to EMI exposure.

    Killing this shut down process via the console eliminated the problem.

    In FW we will disconnect this process from this input. In HW we will strengthen the PU from 100K to 10K and add a small cap from the signal to DGND to keep the signal stable.
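
    As a rough illustration of why the 100K to 10K pull-up change helps: in a simplified model, the voltage that a coupled noise current develops on the line scales with the pull-up resistance, and the added capacitor forms an RC low-pass with it. The 10 uA noise current and the 1 nF capacitor below are hypothetical values chosen only for the arithmetic. Neither figure comes from this design.

```python
def induced_voltage(noise_current_a, pullup_ohms):
    """Voltage developed across the pull-up by a coupled noise current (V = I * R)."""
    return noise_current_a * pullup_ohms


def rc_time_constant(pullup_ohms, cap_farads):
    """Time constant of the pull-up and filter capacitor (tau = R * C)."""
    return pullup_ohms * cap_farads


if __name__ == "__main__":
    noise_current = 10e-6  # hypothetical coupled current, in amps
    filter_cap = 1e-9      # hypothetical "small cap", in farads

    for pullup in (100e3, 10e3):
        volts = induced_voltage(noise_current, pullup)
        tau_us = rc_time_constant(pullup, filter_cap) * 1e6
        print(f"PU {pullup / 1e3:.0f}K: {volts:.2f} V disturbance, tau = {tau_us:.0f} us")
```

    With these assumed numbers the same coupled current produces a ten times smaller disturbance with the 10K pull-up, and the capacitor then slows whatever remains. The real values would need to be checked against the boot strap requirements of the pin.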

    Key in figuring this out was that the MPU reset cause register PRM_RSTST was only showing us bit 1 being set = SW WARM reset.

    Through the Linux system log and the console output we could see this orderly shut down.

    In our other products we had a user reset button that would call a shut down process which was different than the EXTERNAL reset push button S2 on the BBB EVM.

    Thanks all for your support, hope this helps someone else someday :-)

  • I'm glad you were able to find the source of the issue.

    Regards,
    Paul