AM335 boot failure - temperature related

Daniel O'Neill

Other Parts Discussed in Thread: AM3352, SYSCONFIG

Hi,

We have an AM3352 on a board that doesn't boot below -10 °C, or after it has been soaked at +50 °C and the processor is then given a quick spray with freeze spray.

Symptoms:

All power supplies are at the correct voltage more than 50 ms before reset is released.
The XTAL frequency is 25 MHz. This is not an XTAL start-up issue (I originally thought it was).
SYSBOOT[5] is tied high, and 25 MHz is observed on CLKOUT1 as soon as reset is released.
The board is configured for the NAND/NANDI2C/MMC0/UART0 boot sequence.
All SYSBOOT pins are tied high or low, either directly or through a 10k resistor, except SYSBOOT15, which is floating (this causes wrong clock detection, but it is handled in code).
UART0_TX floats at 2 V when failing, and no 'C's or U-Boot data are observed. When passing, it floats at 2 V until it is configured to output 'C's or U-Boot data.
The tracing data registers (0x4030CE40 etc.) all read 0x0BAD0BAD over JTAG when failing.
The JTAG emulator is an XDS510USB Plus, and we are accessing the CS_DAP_DebugSS target.
The unit does not appear to recover unless it is power cycled or reset.
A reset while in the failing temperature range fails to boot in the same manner.
The unit keeps operating in the failing temperature range if it is not turned off before entering it.

It looks like the processor has a condition that is not being met to release the internal reset.

The AM3352 has power, a clock and a released reset line. What else could affect the start-up of the processor?

Are there any other suggestions for debugging this issue?

Cheers,
Daniel

over 10 years ago

Biser Gatchev-XID over 10 years ago

Hi Daniel,

What is the temperature grade of your components? I also think that this test is quite far from the recommendations:

Daniel O'Neill said:
being soaked at +50 degrees before the processor is given a quick spray with freeze spray.

This is practically a thermal shock you are exposing the device to.

Daniel O'Neill over 10 years ago in reply to Biser Gatchev-XID

Hi Biser,

The parts are rated -40 to +105 °C.

I agree that the freeze spray is not ideal, but it allows localised temperature control to isolate the issue. In this case, spraying the micro caused failures, whereas spraying the XTAL, DDR and NAND did not.

Do you have any suggestions on what could cause this?

Cheers,
Daniel

Biser Gatchev-XID over 10 years ago in reply to Daniel O'Neill

So I haven't heard of low-temperature issues with the AM335x. The proper test would be to place the entire board in a thermal test chamber and see whether the issue appears. There is a wiki page on this: http://processors.wiki.ti.com/index.php/AM335x_Thermal_Considerations

peaves over 10 years ago in reply to Biser Gatchev-XID

Tell us more about the board that has this problem. Is it a production board you purchased, or a custom board design?

If this is a production board, you should contact the manufacturer and discuss the issue with them.

If it is a custom board design, how many boards were built in the same lot as this board?

I have seen cases where intermittent solder joints open when the device is cold or hot. In a few of these cases, I have seen the symptoms change when pressing down on the AM335x device.

You may need to perform a "head in pillow dye test" to confirm your reflow process is correct. You can google "head in pillow dye test" if you are not familiar with this procedure.

Regards,
Paul

Daniel O'Neill over 10 years ago in reply to peaves

This is a custom 6-layer board design with NAND and DDR3 as its primary memories. We have built 5 boards of this version, of which 3 have been temperature tested, and all 3 show the failure.

We have an earlier prototype that we will look to test ASAP.

I have tried applying pressure while it is failing, and there is no change in behaviour. As we only have 5 boards, I'm reluctant to do a dye penetrant test this early in the investigation.

The testing is being conducted in a temperature chamber; as I said, the freeze spray was used to isolate the issue to the processor. At -10 °C the unit boots normally; at -15 °C it fails to boot.

Cheers,
Daniel

JJD over 10 years ago in reply to Daniel O'Neill

Daniel, I see that you have connected JTAG through the DAP. Can you connect to the Cortex-A8? If so, can you read the PC? Also, what is the value of the CONTROL_STATUS register (0x44E10040)? What do you measure on the VDD_MPU and VDD_CORE rails?

Thanks,
James

Yani Dubin over 10 years ago in reply to JJD

Hi James,

I am the software guy on this project.

Previously we had been unable to connect to the Cortex-A8 core, so we had no results for this. However, I identified a fix today (the Linux equivalent of the one described at http://e2e.ti.com/support/development_tools/code_composer_studio/f/81/p/352673/1243485#1243485), and we are now able to connect to this core.

I also came across references to the EMU0/EMU1 pin requirements at boot. These pins were being pulled low, so I had the hardware guys remove the connected components for testing. I expect the TAP was disabled, hence the inability to read sane values from the DAP.

Can anyone confirm whether having EMU0/EMU1 pulled low (~10k) during POR is a design flaw that we need to resolve? That is, is it harmless, or is the TRM's "Reserved (do not use)" entry for EMU[1:0] = 0X there for a good reason?

I can now connect to both the DAP and the Cortex-A8 in this state, which suggests that we are not being held in internal reset as Daniel previously suggested, and that the software is running.

Here are my observations (in a temperature-controlled oven at -30 °C):

* I can enable HALT on reset and have the debugger trap at 0x20000, presumably the start address of the bootloader in internal ROM.

* The PC is looping over the 3 addresses 0x000213F{4,8,C}. Does this indicate anything useful when lined up with the internal bootloader disassembly (as below)?

000213f4: 045CF8D1 LDREQB PC, [R12], #--2257
000213f8: D0FB07C0 RSCLES R0, R11, R0, ASR #15
000213fc: F8D1E008 LDMNVFD R1, {R3, R13, R14, PC}^

* The CONTROL_STATUS register reads 0x00000333, and CONTROL_SYSCONFIG is 0x0000002A (so sysboot[7:0] is quite different from the 0x33 I would have expected for the NAND/NANDI2C/MMC0/UART0 configuration Daniel mentioned above). When I attempted to manually set sysboot[7:0] to 0x33 during halt on reset, it read back as 0x22, and when allowed to run, the code got stuck in the same place.

* Breakpoints are not working. Setting H/W breakpoints on the above addresses results in the PC stepping over them; the breakpoint does not trigger, nor is the count incremented.
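The debugger decoded the three words in the second observation as 32-bit ARM instructions, which is why the mnemonics look like nonsense (a load into PC, an NV-condition LDM). Read as Thumb-2, which is our assumption about how the ROM code is built, they form a tight polling loop. The minimal C sketch below decodes only the handful of Thumb encodings that appear in this dump (it is not a general disassembler), taking the low halfword of each little-endian word as the instruction at the lower address. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Words as printed by the debugger at 0x000213F4 (copied from the thread). */
const uint32_t rom_dump_base = 0x000213F4u;
const uint32_t rom_dump[3] = {0x045CF8D1u, 0xD0FB07C0u, 0xF8D1E008u};

/* Only the encodings that occur in this dump; everything else is T_UNKNOWN. */
typedef enum { T_UNKNOWN, T_LDR_W_IMM, T_LSLS_IMM, T_B_COND, T_B } thumb_kind;

typedef struct {
    thumb_kind kind;
    unsigned size;   /* instruction size in bytes: 2 or 4 */
    unsigned rd;     /* Rd, or Rt for loads */
    unsigned rn;     /* Rn for loads, Rm for shifts */
    uint32_t imm;    /* load offset or shift amount */
    unsigned cond;   /* condition field of B<cond>; 0 means EQ */
    uint32_t target; /* branch target address */
} thumb_insn;

/* Little-endian: the halfword at the lower address is the low 16 bits. */
uint16_t halfword_at(const uint32_t *words, uint32_t base, uint32_t addr) {
    uint32_t w = words[(addr - base) / 4u];
    return (addr & 2u) ? (uint16_t)(w >> 16) : (uint16_t)(w & 0xFFFFu);
}

/* Decode the instruction at addr whose first halfword is hw; next is the
   following halfword, used only by 32-bit encodings. */
thumb_insn decode_thumb(uint32_t addr, uint16_t hw, uint16_t next) {
    thumb_insn in = {T_UNKNOWN, 2u, 0u, 0u, 0u, 0u, 0u};
    if ((hw & 0xFFF0u) == 0xF8D0u) {
        /* LDR.W Rt, [Rn, #imm12] (encoding T3, 32-bit) */
        in.kind = T_LDR_W_IMM;
        in.size = 4u;
        in.rn = hw & 0xFu;
        in.rd = (next >> 12) & 0xFu;
        in.imm = next & 0xFFFu;
    } else if ((hw & 0xF800u) == 0x0000u && (hw & 0x07C0u) != 0u) {
        /* LSLS Rd, Rm, #imm5 (T1); imm5 == 0 would be MOVS instead */
        in.kind = T_LSLS_IMM;
        in.imm = (hw >> 6) & 0x1Fu;
        in.rn = (hw >> 3) & 0x7u;
        in.rd = hw & 0x7u;
    } else if ((hw & 0xF000u) == 0xD000u && ((hw >> 9) & 0x7u) != 0x7u) {
        /* B<cond> label (T1): signed 8-bit halfword offset from addr + 4 */
        int32_t imm8 = (int32_t)(hw & 0xFFu);
        if (imm8 & 0x80) imm8 -= 0x100;
        in.kind = T_B_COND;
        in.cond = (hw >> 8) & 0xFu;
        in.target = addr + 4u + (uint32_t)(imm8 * 2);
    } else if ((hw & 0xF800u) == 0xE000u) {
        /* B label (T2): signed 11-bit halfword offset from addr + 4 */
        int32_t imm11 = (int32_t)(hw & 0x7FFu);
        if (imm11 & 0x400) imm11 -= 0x800;
        in.kind = T_B;
        in.target = addr + 4u + (uint32_t)(imm11 * 2);
    }
    return in;
}
```

Decoded this way, 0x000213F4 is LDR.W R0, [R1, #0x45C], 0x000213F8 is LSLS R0, R0, #31 and 0x000213FA is BEQ 0x000213F4, with B 0x00021410 at 0x000213FC once the loop exits. So the code loads a word, tests bit 0, and branches back while that bit is clear. R1 was not reported; if it held 0x44E00000, the load address would be 0x44E0045C, which we believe is CM_IDLEST_DPLL_CORE, but that should be confirmed against the TRM.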

We have not attempted to connect trace, so our JTAG header does not bring it out. Theoretically, it might be possible to bring out EMU[4:0] if that is needed to determine how we got into this state.

I will confirm whether the H/W guys have measured VDD_MPU and VDD_CORE.

Regards,
Yani.

Yani Dubin over 10 years ago in reply to Yani Dubin

Daniel has confirmed that VDD_MPU and VDD_CORE are both at 1.1 V (they share the same supply).

I also did a comparison test. On a good boot (20 °C), CONTROL_STATUS and CONTROL_SYSCONFIG read exactly the same. Also, I got completely confused earlier: sysboot is in CONTROL_STATUS, not CONTROL_SYSCONFIG, and the 0x33 I expected to see was indeed there.

On a failed boot, Tracing Vector 1 [0x4030CE40] = 0x6, while Tracing Vectors 2 and 3 are all zero. This indicates that it has not entered the main boot routine, so I believe the issue occurs fairly early in initialisation. Can anyone comment on this? What would the bootloader be doing at this point?

JJD over 10 years ago in reply to Yani Dubin

Thanks for the info.

That code is looping, waiting for the CORE DPLL to lock. So at those cold temperatures, the PLL is not locking for some reason.
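In C, the ROM loop James describes amounts to polling a DPLL status register until its lock bit is set. The sketch below is a hedged illustration, not the ROM's code: the register address and the meaning of bit 0 are assumptions to check against the AM335x TRM, and unlike the ROM (which spins forever, as seen at -15 °C) it gives up after a bounded number of polls. The register read goes through a function pointer so the logic can be exercised on a host with a simulated register; all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumptions (check the AM335x TRM): CM_IDLEST_DPLL_CORE address, and
   bit 0 (ST_DPLL_CLK) reading 1 once the CORE DPLL is locked. */
#define CM_IDLEST_DPLL_CORE 0x44E0045Cu
#define ST_DPLL_CLK_LOCKED  0x1u

typedef uint32_t (*reg_read_fn)(uint32_t addr);

/* On target hardware, a plain volatile MMIO read. Never call this on a host. */
uint32_t mmio_read(uint32_t addr) {
    return *(volatile const uint32_t *)(uintptr_t)addr;
}

/* Poll until the DPLL reports lock. The max_polls bound is our addition,
   so that a failed lock can be reported instead of hanging. */
bool wait_for_dpll_lock(reg_read_fn read_reg, uint32_t addr, unsigned max_polls) {
    for (unsigned i = 0; i < max_polls; ++i) {
        if (read_reg(addr) & ST_DPLL_CLK_LOCKED) {
            return true;
        }
    }
    return false;
}

/* Host-side stand-in for the status register: reports lock on the
   sim_lock_after-th read, or never if sim_lock_after is 0. */
unsigned sim_reads = 0;
unsigned sim_lock_after = 0;

uint32_t sim_read(uint32_t addr) {
    (void)addr;
    ++sim_reads;
    return (sim_lock_after != 0u && sim_reads >= sim_lock_after) ? ST_DPLL_CLK_LOCKED : 0u;
}
```

On the board, `wait_for_dpll_lock(mmio_read, CM_IDLEST_DPLL_CORE, n)` would be the equivalent call; the bounded loop is a design choice for code you control, since the ROM's unbounded wait is what left the unit silent.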

I think you need to do 2 things before moving on:
1. EMU0/1 should be pulled high on the board. As you have experienced, this is the mode that allows you to connect JTAG. Pulling them low should not harm the processor; it just sets a different JTAG scan path. For now, the best thing to do is to remove the pull-downs (there are internal pull-ups). If you eventually spin the board, we recommend external pull-ups.
2. SYSBOOT[15:14] needs to be set to match your input clock. Earlier I think you said your input clock is 25 MHz, so SYSBOOT[15:14] should be 10b (these bits are reflected in CONTROL_STATUS[23:22]).
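Applying that field layout to the CONTROL_STATUS value Yani read (0x00000333, identical on good and failed boots) shows the problem: bits [7:0] give the expected 0x33, but bits [23:22] are 00b, which the ROM treats as a 19.2 MHz input. A minimal decoding sketch follows; the 00b and 10b entries come from this thread, while the 01b = 24 MHz and 11b = 26 MHz entries are taken from the TRM's SYSBOOT table as we recall it and should be verified. Function names are hypothetical.

```c
#include <stdint.h>

/* Field positions as described in this thread: CONTROL_STATUS (0x44E10040)
   bits [7:0] mirror SYSBOOT[7:0], bits [23:22] mirror SYSBOOT[15:14]. */
#define CONTROL_STATUS_ADDR 0x44E10040u
#define SYSBOOT_LOW_MASK    0xFFu
#define SYSBOOT_CLK_SHIFT   22u
#define SYSBOOT_CLK_FIELD   0x3u

uint32_t sysboot_low(uint32_t control_status) {
    return control_status & SYSBOOT_LOW_MASK;
}

unsigned sysboot_clk_code(uint32_t control_status) {
    return (unsigned)((control_status >> SYSBOOT_CLK_SHIFT) & SYSBOOT_CLK_FIELD);
}

/* Input clock the ROM assumes for each SYSBOOT[15:14] code.
   00b and 10b are from the thread; 01b and 11b are assumptions from the TRM. */
uint32_t assumed_input_clock_hz(unsigned clk_code) {
    static const uint32_t table[4] = {19200000u, 24000000u, 25000000u, 26000000u};
    return table[clk_code & SYSBOOT_CLK_FIELD];
}
```

With 0x00000333 the clock code is 0 (19.2 MHz assumed); with SYSBOOT[15:14] strapped to 10b the register would read 0x00800333 and the code would be 2 (25 MHz), matching the fitted crystal.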

The ROM reads these bits to configure all the PLLs in the system. If the ROM reads the wrong value (on your system, it will assume a 19.2 MHz input clock if it reads 00b), all of the PLLs in the system will be configured incorrectly, which is especially bad because the peripheral clocks will run at the wrong speed. Furthermore, since the ROM assumes a 19.2 MHz clock, the MPU PLL will be configured for a 19.2 MHz input, while in reality your input clock is 25 MHz. The MPU is therefore running faster than expected for the OPP selected during boot (1.1 V), and it fails at cold temperatures because you are running the MPU (as well as other clocks) out of spec.
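The size of the overclock follows from the DPLL's multiplier stage, Fout = Fin × M / (N + 1): if M and N are chosen for 19.2 MHz but 25 MHz is applied, every derived clock runs 25 / 19.2 ≈ 1.30 times too fast (post-dividers divide further but do not change the ratio). The sketch below shows this arithmetic. M = 125, N = 3 and the 115200-baud UART figure are illustrative numbers only, not the ROM's actual settings or a measurement from this board; function names are hypothetical.

```c
/* Multiplier stage of an AM335x-style DPLL: Fout = Fin * M / (N + 1).
   Post-dividers divide this further and do not affect the ratio below. */
double dpll_out_hz(double fin_hz, unsigned m, unsigned n) {
    return fin_hz * (double)m / (double)(n + 1u);
}

/* A clock whose dividers were chosen for assumed_fin_hz, but which is fed
   actual_fin_hz, runs at intended_hz scaled by the ratio of the two inputs. */
double actual_hz(double intended_hz, double assumed_fin_hz, double actual_fin_hz) {
    return intended_hz * (actual_fin_hz / assumed_fin_hz);
}
```

For example, with the hypothetical M = 125 and N = 3, 19.2 MHz gives 600 MHz but 25 MHz gives 781.25 MHz; likewise, a UART divisor computed for 115200 baud against a 19.2 MHz assumption would actually run at about 150000 baud.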

Try those 2 changes and let us know the results.

Regards,
James

Yani Dubin over 10 years ago in reply to JJD

Hi James,

Yes, that makes perfect sense. Thanks for your insights.

I decided to prove this out to give us confidence ahead of our board spin. I used JTAG and HALT on reset to stop the internal bootloader, set SYSBOOT[15:14] to 10b, and let it run. This way the boot code sets the correct PLL values and runs within spec.
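The JTAG workaround amounts to a read-modify-write of the clock-select field while the ROM is halted. The sketch below shows only the bit manipulation, assuming (as the thread implies but does not state) that the value patched from the debugger is CONTROL_STATUS at 0x44E10040 and that bits [23:22] are writable at that point. Because an earlier write in this thread read back differently from what was written, reading the register back after the write is worthwhile. The helper name is hypothetical.

```c
#include <stdint.h>

/* SYSBOOT[15:14] as mirrored in CONTROL_STATUS[23:22] (per this thread). */
#define SYSBOOT_CLK_SHIFT 22u
#define SYSBOOT_CLK_MASK  (0x3u << SYSBOOT_CLK_SHIFT)
#define SYSBOOT_CLK_25MHZ 0x2u /* 10b */

/* Return control_status with bits [23:22] replaced by clk_code; all other
   bits are preserved, so SYSBOOT[7:0] (0x33 here) is left alone. */
uint32_t with_sysboot_clk(uint32_t control_status, unsigned clk_code) {
    return (control_status & ~SYSBOOT_CLK_MASK)
         | (((uint32_t)clk_code & 0x3u) << SYSBOOT_CLK_SHIFT);
}
```

Starting from 0x00000333, `with_sysboot_clk(0x00000333u, SYSBOOT_CLK_25MHZ)` yields 0x00800333. This only confirms the diagnosis on the bench; the lasting fix is strapping SYSBOOT[15:14] to 10b on the board, as James describes.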

As hoped, we get 'C's coming out at the correct frequency, showing that the board is now operational.

Regards,
Yani.

Daniel O'Neill over 10 years ago in reply to JJD

Thanks for your assistance identifying this issue so that we can move into a board spin with confidence. Much appreciated.

Daniel
