I've read the documentation (Using the AM17xx Bootloader; SPRABA4B – November 2010) for this part, but I have a question as to the detailed behavior of the NAND boot mode.
I'm working on a product that will likely stay in service longer than the expected data retention of our NAND part (10 years). It is installed where power can be lost at any moment, so there is never a window in which a rewrite is guaranteed to finish uninterrupted. I want to make all parts of the bootloader "self-healing," so that each one gets rewritten once reading it produces more than a certain number of ECC corrections.
This is pretty easy if you just reserve some blocks at the beginning of NAND. After you boot, try to read the AIS script. If you get more ECC corrections than you like, copy it to the next available block, verify the copy, then mark the first block bad (by clearing a bit in the "test bytes" area of that block's spare bytes). This should be safe against a power cut at any point in the process.
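To make that ordering concrete, here is a minimal sketch in C that runs against a tiny in-memory stand-in for the NAND rather than real hardware. Everything in it is an assumption made for illustration: the `sim_block` structure, the sizes, the `ECC_THRESHOLD` value, and the function names are hypothetical, and the "bad" flag only models the marker in the spare bytes. The point is the order of operations: the old block is retired only after the new copy has been verified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_BOOT_BLOCKS 4   /* hypothetical: blocks reserved for the AIS script */
#define BLOCK_SIZE      64  /* tiny, for the simulation only */
#define ECC_THRESHOLD   3   /* hypothetical: corrections tolerated before relocating */

typedef struct {
    uint8_t data[BLOCK_SIZE];
    bool    bad;        /* models the bad-block marker in the spare bytes */
    int     ecc_fixes;  /* corrections a read of this block would report */
} sim_block;

static sim_block nand[NUM_BOOT_BLOCKS];

/* Put every simulated block into the erased, good state. */
void sim_reset(void)
{
    for (int i = 0; i < NUM_BOOT_BLOCKS; i++) {
        memset(nand[i].data, 0xFF, BLOCK_SIZE);
        nand[i].bad = false;
        nand[i].ecc_fixes = 0;
    }
}

/* First block at or after `from` that is not marked bad, or -1 if none. */
static int first_good_block(int from)
{
    assert(from >= 0);
    for (int i = from; i < NUM_BOOT_BLOCKS; i++)
        if (!nand[i].bad)
            return i;
    return -1;
}

/* Returns the block that holds the image afterwards, or -1 on failure. */
int heal_boot_image(void)
{
    int cur = first_good_block(0);
    if (cur < 0)
        return -1;
    if (nand[cur].ecc_fixes <= ECC_THRESHOLD)
        return cur;                       /* still healthy, nothing to do */

    for (int next = first_good_block(cur + 1); next >= 0;
         next = first_good_block(next + 1)) {
        /* Step 1: copy. Until step 3, the old block is still the boot source. */
        memcpy(nand[next].data, nand[cur].data, BLOCK_SIZE);
        nand[next].ecc_fixes = 0;

        /* Step 2: verify. A failed copy retires the target, not the source. */
        if (memcmp(nand[next].data, nand[cur].data, BLOCK_SIZE) != 0) {
            nand[next].bad = true;
            continue;
        }

        /* Step 3: only now mark the old block bad. */
        nand[cur].bad = true;
        return next;
    }
    return -1;                            /* no spare block left */
}
```

A power cut before step 3 leaves the old block as the first good block, and a cut after it leaves the verified copy as the first good block, which is why the sequence looks safe as long as the RBL skips blocks that are marked bad.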
However, it's likely that the first block is not actually bad and could be reused once erased. Is there any order of operations that can erase that first block and write the AIS script back to it without leaving the system unbootable at some point? Or, at the very least, one that shrinks the window of vulnerability as much as possible?
Obviously, as soon as you erase the block, it is no longer "bad" according to the RBL documentation. What does the RBL do if it sees a "good" block with no magic number? Does it fail completely or try the next block? If it tries the next block, you could write page 0 last (if your NAND allows it) to shrink the window in which damage is possible. If the NAND allows partial page programming, you could theoretically write just the magic number last and never be unbootable at all.
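The page-0-last idea can be checked with a small simulation. The sketch below is in C and rests on exactly the behavior this question asks about: it assumes the RBL skips a good block whose page 0 lacks the magic number and moves on to the next block. That is a hypothesis, not documented behavior. The names, the sizes, and the one-word-per-page model are made up for illustration, and `MAGIC` should be treated as a placeholder to be checked against SPRABA4B. A power cut is modeled only between NAND operations, not in the middle of an erase or a page program.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_BLOCKS      2            /* block 0 = primary, block 1 = backup copy */
#define PAGES_PER_BLOCK 4            /* tiny, for the simulation only */
#define MAGIC           0x41504954u  /* placeholder for the AIS magic word */
#define ERASED          0xFFFFFFFFu

static uint32_t nand[NUM_BLOCKS][PAGES_PER_BLOCK];  /* one word stands in for a page */

static uint32_t image_word(int page)
{
    return page == 0 ? MAGIC : 0x1000u + (uint32_t)page;
}

static void write_full_image(int blk)
{
    for (int p = 0; p < PAGES_PER_BLOCK; p++)
        nand[blk][p] = image_word(p);
}

static bool block_complete(int blk)
{
    for (int p = 0; p < PAGES_PER_BLOCK; p++)
        if (nand[blk][p] != image_word(p))
            return false;
    return true;
}

/* HYPOTHESIZED RBL: boot the first block whose page 0 holds the magic. */
static int rbl_pick_block(void)
{
    for (int b = 0; b < NUM_BLOCKS; b++)
        if (nand[b][0] == MAGIC)
            return b;
    return -1;
}

/* Erase and rewrite block 0; power is "cut" after `budget` NAND operations. */
static void rewrite_block0(int budget, bool page0_last)
{
    int ops = 0;
    assert(budget >= 0);

    if (ops++ >= budget) return;
    for (int p = 0; p < PAGES_PER_BLOCK; p++)       /* erase */
        nand[0][p] = ERASED;

    if (!page0_last) {
        if (ops++ >= budget) return;
        nand[0][0] = MAGIC;                         /* naive order: magic first */
    }
    for (int p = 1; p < PAGES_PER_BLOCK; p++) {
        if (ops++ >= budget) return;
        nand[0][p] = image_word(p);
    }
    if (page0_last) {
        if (ops++ >= budget) return;
        nand[0][0] = MAGIC;                         /* magic goes in last */
    }
}

/* True if the board boots a complete image for every possible cut point. */
bool always_bootable(bool page0_last)
{
    for (int budget = 0; budget <= PAGES_PER_BLOCK + 1; budget++) {
        write_full_image(0);
        write_full_image(1);
        rewrite_block0(budget, page0_last);
        int pick = rbl_pick_block();
        if (pick < 0 || !block_complete(pick))
            return false;
    }
    return true;
}
```

Under that hypothesis, `always_bootable(true)` holds for every cut point, while `always_bootable(false)` fails as soon as the magic is present but the remaining pages are not. If the RBL instead gives up at the first good block without a magic number, even the page-0-last order leaves a window that runs from the erase to the final page program.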
Does anybody know the exact details of the NAND boot algorithm so I can try to do the best job I can here? The description in SPRABA4B is nowhere near detailed enough about what happens in error cases.
(Also, what is the behavior when an uncorrectable ECC error occurs? Does it fail out or try the next block?)