UBL reports that it finds the magic number but then reports it couldn't find it.

Mark Stolp

Other Parts Discussed in Thread: OMAPL138, OMAP-L132

We're using the OMAPL138 microprocessor with Linux, booting through UBL 1.65 and then U-Boot. On about 2.5% of the boxes we ship to the field, UBL prints the following two conflicting messages and the boot process halts. These boxes are then RMA'd back to us.

Valid magicnum, 0x55424CBB, found in block 0x00000006.

No valid boot image found!

It should be impossible for this code to print both of these messages. The code that prints them is in nandboot.c. blockNum has a value of 6, which is the block where the magic number is expected.

for(count=blockNum; count <= DEVICE_NAND_UBL_SEARCH_END_BLOCK; count++)
{
    Uint32 magicNum;

    if(NAND_readPage(hNandInfo,count,0,rxBuf) != E_PASS)
    {
        continue;
    }

    magicNum = ((Uint32 *)rxBuf)[0];

    /* Valid magic number found */
    if (magicNum == UBL_MAGIC_BINARY_BOOT)
    {
        blockNum = count;
        DEBUG_printString("Valid magicnum, ");
        DEBUG_printHexInt(magicNum);
        DEBUG_printString(", found in block ");
        DEBUG_printHexInt(blockNum);
        DEBUG_printString(".\r\n");
        break;
    }
}

// Never found valid header in any page 0 of any of searched blocks
if (count > DEVICE_NAND_UBL_SEARCH_END_BLOCK)
{
    static void (*APPEntry)(void);

    DEBUG_printString("No valid boot image found!");
}

The only way that this could happen is if the break; (located after displaying the Valid magicnum found message) has been corrupted. We suspect that there is a read disturbance when reading another block that corrupts the break; statement. Do you believe this is possible?
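The reasoning above can be checked with a small host-side model of the loop. This is only a sketch: fake_read_magic() is a hypothetical stand-in for NAND_readPage(), and SEARCH_END_BLOCK is an assumed value chosen for illustration, not taken from the UBL headers. It shows that a single pass through the posted loop prints one message or the other, never both.

```c
#include <stdint.h>

#define SEARCH_END_BLOCK   33u          /* assumed value, for illustration only */
#define MAGIC_BINARY_BOOT  0x55424CBBu  /* magic number shown in the log above */

#define MSG_FOUND      1u  /* "Valid magicnum ... found in block ..." */
#define MSG_NOT_FOUND  2u  /* "No valid boot image found!" */

/* Hypothetical stand-in for NAND_readPage(): only magic_block holds the magic. */
static uint32_t fake_read_magic(uint32_t block, uint32_t magic_block)
{
    return (block == magic_block) ? MAGIC_BINARY_BOOT : 0xFFFFFFFFu;
}

/* Mirrors the control flow of the posted loop and returns a bitmask of the
 * messages that one pass would print. */
static unsigned search_once(uint32_t start_block, uint32_t magic_block)
{
    unsigned messages = 0;
    uint32_t count;

    for (count = start_block; count <= SEARCH_END_BLOCK; count++) {
        if (fake_read_magic(count, magic_block) == MAGIC_BINARY_BOOT) {
            messages |= MSG_FOUND;
            break;              /* count stays <= SEARCH_END_BLOCK here */
        }
    }

    /* Only reached with count past the end if the loop never hit break. */
    if (count > SEARCH_END_BLOCK) {
        messages |= MSG_NOT_FOUND;
    }
    return messages;
}
```

In this model, search_once(6, 6) yields only MSG_FOUND, while search_once(7, 6) yields only MSG_NOT_FOUND. So besides a corrupted break, both messages could also appear in sequence if the search ran a second time starting past block 6, for example from code outside the excerpt. Whether UBL 1.65 does that is not established here.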

We have talked about several ways to work around this problem.

1) Add another break statement after the first one so that if the first one is corrupted the second one will break out of the for loop when count is 6 rather than allowing count to reach 33.

2) Place a second copy of the UBL in an unused block (/dev/mtd8), which is located at address 0x8220000, and jump to the second copy if the conflict is detected. The jump apparently didn't work, because the system hung when it was executed. I then thought of copying the second UBL to RAM and jumping to RAM. The problem is that when I added debug code to read the first 5 bytes of all 4096 blocks, only blocks 1, 6, 7, and 8 did not return E_FAIL. I don't understand why NAND_readFlash fails to read the other blocks. Am I understanding correctly that address 0x8220000 is located at block 1029? In any case, I couldn't try this idea because I can't read block 1029 with NAND_readFlash.

3) Modify the protection for /dev/mtd1, where the UBL is located, so that I can erase and write it. The firmware would periodically read the UBL, compare it with a known-good copy, and rewrite it if it has been corrupted. I was able to find where to enable erase/write of /dev/mtd1 in board-da850-evm.c. But when I erase, write, sync, and reboot, the UBL does not execute; the system just hangs. I did a hexdump of /dev/mtd1, saved a copy, and converted it to binary. Then I erased /dev/mtd1, ran nandwrite -p /dev/mtd1 mtd_copy.bin, did a hexdump again, and compared it with the original dump. I did a sync and waited 2 minutes before entering reboot. But the system hangs (displaying no debug messages) at the point where it should execute the UBL.
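For the block-number question in item 2, the arithmetic can be sketched as below. The geometry is an assumption, not something confirmed in this thread: 2048-byte pages and 64 pages per block, giving 128 KiB blocks, which is consistent with the 4096 blocks mentioned above on a 4 Gbit part. It also assumes that 0x8220000 is a byte offset from the start of the NAND device. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed NAND geometry (not confirmed in this thread). */
#define PAGE_SIZE_BYTES   2048u
#define PAGES_PER_BLOCK   64u
#define BLOCK_SIZE_BYTES  (PAGE_SIZE_BYTES * PAGES_PER_BLOCK)  /* 0x20000 = 128 KiB */

/* Block index that contains the given byte offset from the start of NAND. */
static uint32_t nand_offset_to_block(uint32_t byte_offset)
{
    return byte_offset / BLOCK_SIZE_BYTES;
}

/* Byte offset of the first page of the given block. */
static uint32_t nand_block_to_offset(uint32_t block)
{
    return block * BLOCK_SIZE_BYTES;
}
```

Under these assumptions, nand_offset_to_block(0x8220000) gives 1041 (0x411), and block 1029 would start at offset 0x80A0000. If the real block size or the base of the address differs, the result changes accordingly.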

I've been blocked in every direction I've turned. If you have any suggestions for resolving this UBL problem I would appreciate your help. If you see flaws in my understanding, please correct me.

Thanks,

Mark Stolp (801 303 3427)

ClearOne, Inc.

over 4 years ago

0 Mark Stolp over 4 years ago

Prodigy 40 points

1) Power cycling the unit does not resolve this problem. The NAND flash is corrupted until we reprogram it using the OMAP boot from uart dip switch settings.

2) We are using the MT29F4G08ABADAWP NAND part. It uses SLC technology, which is not supposed to be susceptible to read disturbance until it reaches one million reads.

3) /dev/mtd1 is write protected in the field. So, we don't believe any rogue firmware is writing to a location in the UBL.

0 Mark Stolp over 4 years ago in reply to Mark Stolp

Prodigy 40 points

1) Our NAND part is powered by 3.3v. Would you recommend 1.8v to reduce the possibility of read disturbance?

0 Hong Guan64 over 4 years ago in reply to Mark Stolp

TI__Guru 70980 points

Hi Mark,
The AIS image generated by the U-Boot build can be flashed to NAND and booted by the on-chip ROM Boot Loader (RBL).
Please refer to the section <Flashing the images to NAND> in this file:
github.com/.../README.da850
Similar information on NAND flashing and booting with U-Boot is also described here:
software-dl.ti.com/.../Foundational_Components_U-Boot.html
Best,
-Hong

0 Mark Stolp over 4 years ago in reply to Hong Guan64

Prodigy 40 points

Yes, I've seen that information before. What I want to do is reprogram the UBL from a Linux prompt, then reboot and execute the new copy of the UBL. When I erase /dev/mtd1 and write a new UBL, the system does not boot up, and I don't understand why. Is there a surefire way to do it?

0 Hong Guan64 over 4 years ago in reply to Mark Stolp

TI__Guru 70980 points

Hi Mark,
Have you reviewed "Advisory 2.3.24 Boot: ECC Data Error in Spare Area Causes NAND Boot Failure" in the OMAPL138 errata?
www.ti.com/.../sprz301

Best,
-Hong

0 Mark Stolp over 4 years ago

Prodigy 40 points

I would need more detailed instructions on how to apply the patch mentioned in the errata. Right now I run a batch file called burnboot.bat, which invokes installation of the UBL and U-Boot. Do you have new versions of the install executable? I build the UBL and U-Boot using a cross compiler running in Ubuntu inside VirtualBox. Does my build process for the UBL and U-Boot need to change? Can you explain what I need to do?

Thanks,

Mark

0 Hong Guan64 over 4 years ago in reply to Mark Stolp

TI__Guru 70980 points

Hi Mark,
The OMAPL138 errata include a note on how to apply the workaround for "Advisory 2.3.24 Boot: ECC Data Error in Spare Area Causes NAND Boot Failure":
>>>
The software patch is available as a pre-built file with the latest version of the AIS tool
that is used to generate the NAND flash boot image. The Using the OMAP-L132/L138
Bootloader Application Report (Literature number: SPRAB41) provides a link to the
install package for the AIS tool which includes the following in the install directory:
prebuilt patch files, the GUI AIS generation tool (AISGEN.exe, version 1.11 or later),
command-line AIS generation tool and an example INI file.
<<<<
In fact, the UBL approach has not been updated since NAND boot became supported in the Linux SDK.
Have you run HW diagnostic tests to check for a HW issue, given that the failure occurs on only 2.5% of the boards?
Best,
-Hong

0 RonB over 4 years ago in reply to Hong Guan64

TI__Mastermind 30706 points

Hi Mark,

I recommend that you read the entire erratum on page 34 of this document:

https://www.ti.com/lit/er/sprz301m/sprz301m.pdf

It has more details about the fix and how the newer AISgen supports that fix (patch).

As Hong points out above, we no longer support UBL. We recommend that you switch to the newer boot method and apply the errata patch to see if it helps with your NAND boot problem.
