We have recently updated our NAND boot process on the AM1705 so that we have redundant boot images to fall back on if areas of the NAND become damaged. To this end, we now have 2 copies of the initial User Bootloader (UBL), stored in AIS format at Blocks 1 & 2 of the NAND chip. Our UBL has some useful low-level diagnostics, but in a normal boot it simply loads one of 2 U-Boot copies from either the primary or secondary area of NAND. We then use some U-Boot scripting to do the same for the 2 copies of the Linux kernel image; once the OS is running, it uses a UBIFS NAND partition.
We chose this route due to a couple of field failures (in the U-Boot image's ECC); the idea being that UBIFS can maintain itself pretty well once the board is fully up and running, but we don't really have the same option for the boot-level binaries. We have a Linux script that checks the redundant images against each other and also watches for creeping ECC failures over time. If any discrepancies are found, it re-programs the affected image using the data from the good copy.
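To illustrate the comparison step described above, here is a minimal Python sketch. It is not the actual script: the function names are hypothetical, the images are passed in as plain byte strings rather than read from NAND, and it assumes a reference SHA-256 digest recorded at programming time is available to decide which copy is the good one.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of an image as a hex string."""
    return hashlib.sha256(data).hexdigest()


def plan_repair(primary: bytes, secondary: bytes, expected_digest: str) -> str:
    """Decide which redundant copy, if any, should be rewritten.

    expected_digest is an assumed reference value (hypothetical), e.g. stored
    when the images were first programmed.
    """
    primary_ok = sha256_hex(primary) == expected_digest
    secondary_ok = sha256_hex(secondary) == expected_digest

    if primary_ok and secondary_ok:
        return "none"               # both copies intact, nothing to do
    if primary_ok:
        return "rewrite_secondary"  # refresh secondary from primary
    if secondary_ok:
        return "rewrite_primary"    # refresh primary from secondary
    return "both_bad"               # no good copy left to restore from
```

Comparing each copy against a reference, rather than only against each other, is one way to tell which of two differing copies is the damaged one.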
I noticed relatively early that there was a potential weakness in this scheme relating to the on-chip ROM bootloader (RBL). If we found a damaged Block 1 and happened to lose power after erasing it but before refreshing it with a good copy (from Block 2), the RBL seemed to abort boot when it encountered the blank Block 1, rather than trying to load data from successive blocks, as I had assumed. Further testing has shown that the RBL will only check Block 2 for our 2nd UBL copy if Block 1 is marked bad. I have tried introducing errors in the AIS image's CRC, introducing ECC errors, and simply damaging the magic word at the start of the block; all of these result in a non-bootable board, even though there is a perfectly good UBL image stored at Block 2.
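The behaviour observed in these tests can be summarised as a small Python model. This is only a sketch of what the board appeared to do, not the RBL's actual logic: the state names and function are hypothetical, and the case where Block 1 is marked bad and Block 2 is also unusable was not tested, so the model simply assumes boot fails there.

```python
from enum import Enum
from typing import Optional


class BlockState(Enum):
    VALID = "valid"            # good AIS image present
    BLANK = "blank"            # erased, nothing programmed
    CORRUPT = "corrupt"        # bad CRC, ECC errors or damaged magic word
    MARKED_BAD = "marked_bad"  # bad-block marker set


def observed_rbl_choice(block1: BlockState, block2: BlockState) -> Optional[int]:
    """Return the block the RBL appeared to boot from, or None if boot aborted."""
    if block1 is BlockState.VALID:
        return 1
    # Block 2 was only tried when Block 1 carried a bad-block marker.
    if block1 is BlockState.MARKED_BAD and block2 is BlockState.VALID:
        return 2
    # Blank or corrupt Block 1 aborted boot, even with a good Block 2.
    # (Assumption: MARKED_BAD Block 1 with an unusable Block 2 also fails.)
    return None
```

For example, `observed_rbl_choice(BlockState.BLANK, BlockState.VALID)` returns `None`, which is the power-loss-after-erase case described above.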
Is my understanding of the RBL 'abort boot' conditions correct? I realise that we can't change the RBL behaviour, but we may have to look more carefully at how our NAND checking script deals with Block 1 if there is only a limited number of conditions under which Block 2 is checked when Block 1 contains errors.
Is there source code for the RBL available to check my suspicions?