We're using the OMAPL138 microprocessor with Linux, booting with UBL 1.65 followed by U-Boot. On about 2.5% of the boxes that go out to the field, UBL prints the following two conflicting messages and the boot process halts. These boxes are then RMA'd back to us.
Valid magicnum, 0x55424CBB, found in block 0x00000006.
No valid boot image found!
It should be impossible for this code to print both of these messages in the same boot. The code that prints them is in nandboot.c. blockNum has a value of 6, which is the block where the magic number is expected.
for(count=blockNum; count <= DEVICE_NAND_UBL_SEARCH_END_BLOCK; count++)
{
    Uint32 magicNum;
    if(NAND_readPage(hNandInfo,count,0,rxBuf) != E_PASS)
    {
        continue;
    }
    magicNum = ((Uint32 *)rxBuf)[0];
    /* Valid magic number found */
    if (magicNum == UBL_MAGIC_BINARY_BOOT)
    {
        blockNum = count;
        DEBUG_printString("Valid magicnum, ");
        DEBUG_printHexInt(magicNum);
        DEBUG_printString(", found in block ");
        DEBUG_printHexInt(blockNum);
        DEBUG_printString(".\r\n");
        break;
    }
}
// Never found valid header in any page 0 of any of searched blocks
if (count > DEVICE_NAND_UBL_SEARCH_END_BLOCK)
{
    static void (*APPEntry)(void);
    DEBUG_printString("No valid boot image found!");
}
The only way that this could happen is if the break; (located after displaying the Valid magicnum found message) has been corrupted. We suspect that there is a read disturbance when reading another block that corrupts the break; statement. Do you believe this is possible?
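To make the dependency on the loop counter concrete, here is a minimal host-side sketch of the same search. It is not the UBL code: mock_read_page is a hypothetical stand-in for NAND_readPage, SEARCH_END_BLOCK is an assumed value standing in for DEVICE_NAND_UBL_SEARCH_END_BLOCK, and find_boot_block is a made-up name. The result is recorded in a separate variable, so the "not found" path is taken only when no header ever matched, not merely because count ran past the end.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the UBL definitions. Only the magic number
   and block 6 come from the log above; the rest are assumptions. */
#define E_PASS                0u
#define UBL_MAGIC_BINARY_BOOT 0x55424CBBu
#define SEARCH_END_BLOCK      32u  /* assumed value of DEVICE_NAND_UBL_SEARCH_END_BLOCK */

/* Mock NAND: only this block carries a valid header in page 0. */
static uint32_t mock_good_block = 6u;

/* Mock of NAND_readPage: fills buf with the first word of page 0. */
static uint32_t mock_read_page(uint32_t block, uint8_t *buf)
{
    uint32_t magic = (block == mock_good_block) ? UBL_MAGIC_BINARY_BOOT
                                                : 0xFFFFFFFFu;
    memcpy(buf, &magic, sizeof magic);
    return E_PASS;
}

/* Returns the block holding a valid header, or -1 if none was found.
   The outcome is carried by found_block, never inferred from count. */
static int find_boot_block(uint32_t start_block)
{
    uint8_t  rx_buf[4];
    int      found_block = -1;
    uint32_t count;

    for (count = start_block; count <= SEARCH_END_BLOCK; count++) {
        uint32_t magic;

        if (mock_read_page(count, rx_buf) != E_PASS) {
            continue;
        }
        memcpy(&magic, rx_buf, sizeof magic);
        if (magic == UBL_MAGIC_BINARY_BOOT) {
            found_block = (int)count;
            break;
        }
    }
    return found_block;
}
```

With the mock as written, find_boot_block(6) returns 6, and it returns -1 once mock_good_block is moved outside the search range. This only changes how the result is reported; it says nothing about why the break would fail to take effect in the field.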
We have talked about several ways to work around this problem.
1) Add another break statement after the first one so that if the first one is corrupted the second one will break out of the for loop when count is 6 rather than allowing count to reach 33.
2) Place a second copy of the UBL in an unused block (/dev/mtd8), which is located at address 0x8220000, and jump to the second copy if the conflict is noticed. The jump apparently didn't work, because the system hung when it was executed. I then thought of copying the second UBL to RAM and jumping to RAM. The problem is that when I added debug code to read the first 5 bytes of all 4096 blocks, only blocks 1, 6, 7, and 8 did not return E_FAIL. I don't understand why NAND_readFlash fails to read the other blocks. Am I understanding correctly that address 0x8220000 is located at block 1029? In any case, I couldn't try this idea because I can't read block 1029 with NAND_readFlash.
3) Modify the protection on /dev/mtd1, where the UBL is located, so that I can erase and write it. The firmware would periodically read the UBL, compare it with a good copy, and rewrite it if it has been corrupted. I was able to find where to enable erase/write of /dev/mtd1 in board-da850-evm.c. But when I erase, write, sync, and reboot, the UBL does not execute. The system just hangs. I did a hexdump of /dev/mtd1, saved a copy, and converted it to binary. Then I erased /dev/mtd1, did a nandwrite -p /dev/mtd1 mtd_copy.bin, did a hexdump again, and compared it with the original dump. I did a sync and waited 2 minutes before entering reboot. But the system hangs (displays no debug messages) at the point where it should execute the UBL.
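On the block-number question in item 2, the arithmetic can be checked with a small sketch. It assumes 128 KiB (0x20000) erase blocks and that 0x8220000 is a byte offset from the start of the NAND; neither is confirmed above, so the block size should be checked against the NAND datasheet. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed erase-block size: 128 KiB. With 4096 blocks this would be a
   512 MiB part; verify against the actual NAND datasheet. */
#define ASSUMED_BLOCK_SIZE 0x20000u

/* Block index containing a given byte offset from the start of NAND. */
static uint32_t nand_block_of(uint32_t byte_offset, uint32_t block_size)
{
    return byte_offset / block_size;
}

/* Byte offset at which a given block starts. */
static uint32_t nand_block_start(uint32_t block, uint32_t block_size)
{
    return block * block_size;
}
```

Under that assumption, nand_block_of(0x8220000, ASSUMED_BLOCK_SIZE) gives 1041 (0x411), and nand_block_start(1029, ASSUMED_BLOCK_SIZE) gives 0x80A0000, so block 1029 would not be where /dev/mtd8 starts. A different block size would change both results.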
I've been blocked in every direction I've turned. If you have any suggestions for resolving this UBL problem I would appreciate your help. If you see flaws in my understanding, please correct me.
Thanks,
Mark Stolp (801 303 3427)
ClearOne, Inc.