AM1705: ROM Bootloader not loading Block 2 if Block 1 has errors

Jeremy Evans

Part Number: AM1705

We recently updated our NAND boot process on the AM1705 so that we have redundant boot images to fall back on if areas of the NAND become damaged. To this end, we now have two copies of the initial User Bootloader (UBL), stored in AIS format at blocks 1 and 2 of the NAND chip. Our UBL has some useful low-level diagnostics, but in a normal boot it simply loads one of two U-Boot copies from either the primary or secondary area of NAND. We then use some U-Boot scripting to do the same for the two Linux kernel image copies, and once the OS is running it uses a UBIFS NAND partition.

We chose this route due to a couple of field failures (in the U-Boot image's ECC); the idea being that UBIFS can maintain itself pretty well once the board is fully up and running, but we don't really have the same option for the boot-level binaries. We have a Linux script that checks the redundant images against each other, plus it looks at any creeping ECC failures over time. If any discrepancies are found, it re-programs the affected image using the data from the good copy.
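To make the checking logic concrete, here is a minimal Python sketch of the decision the script makes for one pair of images. It is not our actual script: the `CopyRead` record, the `copy_to_rewrite` function and the `bitflip_limit` threshold are names and values made up for illustration, and it assumes the NAND read path can report corrected bitflips and uncorrectable errors per copy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CopyRead:
    """Result of reading one image copy (hypothetical record, not a real API)."""
    data: bytes
    corrected_bits: int = 0      # bitflips that ECC corrected during the read
    uncorrectable: bool = False  # True if the read hit an uncorrectable ECC error


def copy_to_rewrite(primary: CopyRead, secondary: CopyRead,
                    bitflip_limit: int = 2) -> Optional[str]:
    """Return "primary", "secondary", or None if neither copy needs rewriting.

    bitflip_limit is an assumed threshold for "creeping" ECC failures.
    """
    if primary.uncorrectable and secondary.uncorrectable:
        raise RuntimeError("no good copy left to repair from")
    if primary.uncorrectable:
        return "primary"
    if secondary.uncorrectable:
        return "secondary"

    if primary.data != secondary.data:
        # Comparison alone cannot say which copy is right, so fall back on
        # the ECC statistics and distrust the copy that needed more correction.
        if primary.corrected_bits == secondary.corrected_bits:
            raise RuntimeError("copies differ and ECC gives no hint which is good")
        if primary.corrected_bits > secondary.corrected_bits:
            return "primary"
        return "secondary"

    # Copies match: refresh the worse one only once bitflips reach the limit.
    if max(primary.corrected_bits, secondary.corrected_bits) < bitflip_limit:
        return None
    if primary.corrected_bits >= secondary.corrected_bits:
        return "primary"
    return "secondary"
```

For example, two identical clean reads return `None`, while a copy with an uncorrectable read is always the one selected for rewriting.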

I noticed relatively early that there was a potential weakness in this scheme relating to the on-chip ROM bootloader (RBL). If we found a damaged Block 1 and happened to lose power after erasing it but before refreshing it with a good copy (from Block 2), the RBL seemed to abort the boot when it encountered a blank Block 1, rather than trying to load data from successive blocks, as I had assumed. Further testing has shown that the RBL will only check Block 2 for our second UBL copy if Block 1 is marked bad. I have tried corrupting the AIS image's CRC, introducing ECC errors and simply damaging the magic word at the start of the block. All of these result in a non-bootable board, even though there is a perfectly good UBL image stored at Block 2.
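The behaviour I am seeing can be written down as a small Python model. This is only my reading of the test results, not RBL source: `Block`, `rbl_pick_block` and the `valid_ais` flag are invented for illustration, and scanning every block in the list is an assumption, since I do not know how far the real RBL searches.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Block:
    marked_bad: bool = False
    valid_ais: bool = False  # True only if magic word, CRC and ECC all check out


def rbl_pick_block(blocks: Sequence[Block], first: int = 1) -> Optional[int]:
    """Model of the observed search; returns the booted block, or None for a hang.

    Blocks marked bad are skipped. The first block that is NOT marked bad is
    the only one tried: if its image is invalid (or the block is blank), the
    boot is abandoned instead of moving on to the next block.
    """
    for index in range(first, len(blocks)):
        if blocks[index].marked_bad:
            continue
        return index if blocks[index].valid_ais else None
    return None


# Block 0 is unused here; blocks 1 and 2 hold the two UBL copies.
nand = [Block(), Block(valid_ais=True), Block(valid_ais=True)]
print(rbl_pick_block(nand))   # healthy board boots from Block 1

nand[1] = Block()             # Block 1 erased or corrupted, not marked bad
print(rbl_pick_block(nand))   # model reports a hang (None)

nand[1] = Block(marked_bad=True)
print(rbl_pick_block(nand))   # only now is Block 2 used
```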

Is my understanding of the RBL 'abort boot' conditions correct? I realise that we can't change the RBL behaviour, but we may have to look more carefully at how our NAND checking script deals with Block 1 if only a limited number of conditions lead to Block 2 being checked when Block 1 contains errors.

Is there source code for the RBL available to check my suspicions?

over 6 years ago

0 Yordan Kovachev over 6 years ago

TI__Guru**** 161600 points

Hi,

Which TI SDK are you using?

Best Regards,
Yordan

0 Jeremy Evans over 6 years ago in reply to Yordan Kovachev

Prodigy 110 points

We used 03.20.00.14 to get the project going back in the day, although it doesn't come up in day-to-day use.

0 JJD over 6 years ago in reply to Jeremy Evans

TI__Guru* 89365 points

Jeremy, your understanding of the abort mechanism is correct. The RBL will only reach the redundant boot image if the block is marked bad. An erased block or an otherwise bad image will hang the RBL. It is best to mark the block bad, using an independent byte write if supported, as soon as you notice the corruption. Then you can proceed to erase and reflash the image. This should reduce the window of time that exposes you to boot failures.
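As a rough illustration of that ordering, here is a Python sketch against a stand-in NAND object. Everything in it is hypothetical: `repair_block`, `RecordingNand` and its methods are made-up names, not a TI or Linux MTD API. It also does not model whether your driver permits erasing a block that is marked bad, or whether the erase clears the marker, so check both on your platform.

```python
from typing import Dict, List, Tuple


class RecordingNand:
    """In-memory stand-in for a NAND driver that records the call order."""

    def __init__(self, contents: Dict[int, bytes]) -> None:
        self.contents = dict(contents)
        self.bad = set()
        self.calls: List[Tuple[str, int]] = []

    def mark_bad(self, block: int) -> None:
        self.calls.append(("mark_bad", block))
        self.bad.add(block)

    def read_block(self, block: int) -> bytes:
        self.calls.append(("read", block))
        return self.contents[block]

    def erase(self, block: int) -> None:
        self.calls.append(("erase", block))
        self.contents[block] = b""

    def program(self, block: int, data: bytes) -> None:
        self.calls.append(("program", block))
        self.contents[block] = data


def repair_block(nand: RecordingNand, damaged: int, good: int) -> bool:
    """Repair `damaged` from `good`, marking it bad before anything else."""
    nand.mark_bad(damaged)           # first, so the RBL skips this block
    image = nand.read_block(good)    # fetch the known-good copy
    nand.erase(damaged)
    nand.program(damaged, image)
    return nand.read_block(damaged) == image  # read back to verify


nand = RecordingNand({1: b"damaged-ubl", 2: b"good-ubl"})
print(repair_block(nand, damaged=1, good=2))
print([name for name, _ in nand.calls])
```

The point of the ordering is that the quick mark-bad write happens before the slower read, erase and program steps.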

Regards,
James

0 Jeremy Evans over 6 years ago in reply to JJD

Prodigy 110 points

Thanks for confirming my suspicions, James. We'll look at our approach to checking/repairing the block a bit more carefully now.
