Can't load environment (and u-boot) from NAND by SPL - bug suspected.

Part Number: AM5728


Hello,

We have a very strange situation with NAND on a custom board (SDK 08_02_00_04).

I can load both SPL and u-boot via UART.

I can program NAND with this u-boot; reads and writes all work fine.

Now I erase the NAND chip, program it with SPL and u-boot, run 'saveenv' from the u-boot running in RAM (loaded via UART), switch the boot configuration and power-cycle the board.

The board boots from NAND, I see the SPL header, and it prints:
U-Boot SPL 2021.01.10516-01 (Apr 11 2023 - 10:41:48 +0300)
DRA752-GP ES2.0
Trying to boot from NAND
Loading Environment from NAND...
omap-elm: uncorrectable ECC errors
....... (2 messages of this kind)
OK
UBI: Loading VolId #1
UBI: Loading VolId #1
UBI warning: Failed
SPL: failed to boot from all boot devices
 ### ERROR ### Please RESET the board ###

So, here are several questions:
1. I see a lot of debug() calls in the u-boot tree which could really help, but I failed to find how to activate them. How is that done?
2. How can it be that my SPL/u-boot loaded via UART works fine with NAND, but SPL loaded from NAND fails (ECC error?)?
3. Why does SPL mention UBI if it failed to load the environment? Why does UBI appear at all when SPL should be loading the u-boot binary?

Please, help me!!!!

UPDATE1: After several days of trying to trace the error, I see the following:
1. Control reaches the function nand_read_page() in the file
/opt/ti-processor-sdk-linux-rt-am57xx-evm-08_02_00_04/board-support/u-boot-2021.01+gitAUTOINC+44a87e3ab8-g44a87e3ab8/drivers/mtd/nand/raw/am335x_spl_bch.c
with eccsize=512, eccbytes=14, eccsteps=4, oobpos=2050, which is correct.

2. Then the omap_correct_data_bch() function is called, from the file
/opt/ti-processor-sdk-linux-rt-am57xx-evm-08_02_00_04/board-support/u-boot-2021.01+gitAUTOINC+44a87e3ab8-g44a87e3ab8/drivers/mtd/nand/raw/omap_gpmc.c
At this point the two arrays calc_ecc and read_ecc are ready. I print them out and see THEY ARE IDENTICAL(!!!!) (after byte reordering).

3. Nevertheless (they are identical!), omap_correct_data_bch() calls the elm_check_error() function from the file
/opt/ti-processor-sdk-linux-rt-am57xx-evm-08_02_00_04/board-support/u-boot-2021.01+gitAUTOINC+44a87e3ab8-g44a87e3ab8/drivers/mtd/nand/raw/omap_elm.c
The syndrome passed to this function is actually calc_ecc (again, it is correct!). The syndrome is loaded into the ELM by the elm_load_syndromes() function (in the same file), the ELM is run, and it produces location_status=3, which means 3 errors were detected.
BUT THERE ARE NO ERRORS!!!

4. The only thing I noticed here that looks like a bug to me is the code of the function elm_load_syndromes(). The corresponding fragment looks like:
if (bch_type == BCH_8_BIT || bch_type == BCH_16_BIT) {
.....
          val = syndrome[12] | (syndrome[13] << 8) |  (syndrome[14] << 16) | (syndrome[15] << 24);
BUT!!!
The ECC for BCH8 has 14 bytes, so bytes syndrome[14] and syndrome[15] will be garbage, no?
I tried removing the use of the extra bytes, but the result is still location_status=3.
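
To make clear what I mean by not using the extra bytes, here is a minimal standalone sketch. It is not the U-Boot code: pack_syndrome_words() and SYNDROME_WORDS are hypothetical names used only for illustration, and I am assuming the same little-endian packing as in the fragment above. It builds the 32-bit values from exactly `len` ECC bytes and pads the rest with zeros instead of reading past the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SYNDROME_WORDS 4 /* four 32-bit words = room for 16 bytes */

/*
 * Hypothetical helper (not part of U-Boot): pack `len` ECC bytes into
 * 32-bit words, lowest byte first, the same order as the fragment above.
 * Bytes at index >= len are treated as zero, so nothing beyond the
 * caller's buffer is ever read.
 */
static void pack_syndrome_words(const uint8_t *syndrome, size_t len,
                                uint32_t words[SYNDROME_WORDS])
{
    assert(len <= SYNDROME_WORDS * 4);

    for (size_t w = 0; w < SYNDROME_WORDS; w++) {
        uint32_t val = 0;

        for (size_t b = 0; b < 4; b++) {
            size_t idx = w * 4 + b;

            if (idx < len)
                val |= (uint32_t)syndrome[idx] << (8 * b);
        }
        words[w] = val;
    }
}
```

With len=14, the last word is built only from syndrome[12] and syndrome[13], and its upper 16 bits are zero.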

TI!!!! PLEASE!!!! COMMENT!!!!

UPDATE2: I inserted 'return 0' just before error message "uncorrectable ECC errors":

/* check if correctable */
location_status = readl(&elm_cfg->error_location[poly].location_status);
return 0; <-------- my return.

if (!(location_status & ELM_LOCATION_STATUS_ECC_CORRECTABLE_MASK)) {
printf("%s: uncorrectable ECC errors\n", DRIVER_NAME);
return -EBADMSG;
}

Now SPL loads u-boot, and u-boot starts and works fine!
I still see a lot of messages about badLocation_status (my printf), but everything works.
The loaded u-boot is fully functional. It also programs and reads NAND without a problem.
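
To interpret the raw location_status values that my printf shows, I use the small standalone sketch below. The two mask values are my assumption from reading the ELM header and register description (correctable flag in bit 8, error count in the low 5 bits), so please verify them against your tree. struct elm_status and decode_location_status() are hypothetical names, not U-Boot functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED values: taken from my reading of the ELM header, please verify. */
#define ELM_LOCATION_STATUS_ECC_CORRECTABLE_MASK 0x100u
#define ELM_LOCATION_STATUS_ECC_NB_ERRORS_MASK   0x1Fu

/* Hypothetical container for the decoded fields. */
struct elm_status {
    bool correctable;       /* ELM reports the errors as correctable */
    unsigned int nb_errors; /* value of the error-count field */
};

/* Hypothetical helper: split a raw location_status into its two fields. */
static struct elm_status decode_location_status(uint32_t location_status)
{
    struct elm_status st;

    st.correctable =
        (location_status & ELM_LOCATION_STATUS_ECC_CORRECTABLE_MASK) != 0;
    st.nb_errors = location_status & ELM_LOCATION_STATUS_ECC_NB_ERRORS_MASK;
    return st;
}
```

If these masks are right, a raw value of 3 decodes as "correctable flag not set, error count field = 3", which would match the "uncorrectable ECC errors" message.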

I am really really lost here. ANY HELP WILL BE HIGHLY APPRECIATED!!!

TI people, please, please, help!!!!!