AM3517 EVM NAND flash memory, UBIFS ECC errors at boot time

Other Parts Discussed in Thread: AM3517

Hello,

I'm trying to convert an existing filesystem from JFFS2 to UBIFS format by following the detailed steps at processors.wiki.ti.com/index.php/UBIFS_Support. The build process went well: my Linux workstation runs Ubuntu 10.04 and I used mkfs.ubifs version 1.5. I flashed the NAND memory chip with U-Boot and set the environment variables to point to the UBIFS file system.

Note that the instructions tell the user to switch to BCH8 ECC before flashing the UBI image, but without giving the command explicitly. I set nandecc to BCH8_sw anyway before flashing.

Now this is where it hurts: the kernel reports ECC errors when it boots. Things start nicely as I power my EVM board on:

Starting kernel ...

Uncompressing Linux... done, booting the kernel.
[ 0.000000] Linux version 2.6.37 (marc@Linux) (gcc version 4.3.3 (Sourcery G++ Lite 2009q1-203) ) #40 Thu Aug 9 16:18:44 EDT 2012
[ 0.000000] CPU: ARMv7 Processor [411fc087] revision 7 (ARMv7), cr=10c53c7f
[ 0.000000] CPU: VIPT nonaliasing data cache, VIPT nonaliasing instruction cache
[ 0.000000] Machine: OMAP3517/AM3517 EVM
[ 0.000000] Memory policy: ECC disabled, Data cache writeback
[ 0.000000] AM3517 ES1.0 (l2cache iva sgx neon isp )
[ 0.000000] SRAM: Mapped pa 0x40200000 to va 0xfe400000 size: 0x10000
[ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 65024
[ 0.000000] Kernel command line: console=ttyO2,115200n8 noinitrd ip=off mem=256M rootwait=1 rw ubi.mtd=4,2048 root=ubi0:rootfs init=/init rootfstype=ubifs
(...)

Two seconds later, the boot process hits several UBI snags: ECC errors (-74). Please note that I got rid of the mtdoops message in subsequent attempts, but I still don't know why I get ECC errors.

[ 2.306152] mtdoops: mtd device (mtddev=name/number) must be supplied
[ 2.314270] omap2-nand driver initializing
[ 2.319305] NAND device: Manufacturer ID: 0x2c, Chip ID: 0xbc (Micron )
[ 2.326477] Creating 5 MTD partitions on "omap2-nand.0":
[ 2.332122] 0x000000000000-0x000000080000 : "xloader-nand"
[ 2.348632] 0x000000080000-0x000000240000 : "uboot-nand"
[ 2.362884] 0x000000240000-0x000000280000 : "params-nand"
[ 2.376098] 0x000000280000-0x000000780000 : "linux-nand"
[ 2.391693] 0x000000780000-0x000020000000 : "rootfs-nand"
[ 2.672912] UBI: attaching mtd4 to ubi0
[ 2.677093] UBI: physical eraseblock size: 131072 bytes (128 KiB)
[ 2.683776] UBI: logical eraseblock size: 126976 bytes
[ 2.689483] UBI: smallest flash I/O unit: 2048
[ 2.694427] UBI: sub-page size: 512
[ 2.699310] UBI: VID header offset: 2048 (aligned 2048)
[ 2.705657] UBI: data offset: 4096
[ 2.711425] UBI error: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 0:0, read 64 bytes
[ 2.721862] [<c003ff68>] (unwind_backtrace+0x0/0xec) from [<c026174c>] (ubi_io_read+0x1b4/0x248)
[ 2.731140] [<c026174c>] (ubi_io_read+0x1b4/0x248) from [<c0261d90>] (ubi_io_read_ec_hdr+0x6c/0x3c0)
[ 2.741058] [<c0261d90>] (ubi_io_read_ec_hdr+0x6c/0x3c0) from [<c0265824>] (ubi_scan+0x1ec/0xc04)
[ 2.750457] [<c0265824>] (ubi_scan+0x1ec/0xc04) from [<c025bfbc>] (ubi_attach_mtd_dev+0x680/0xdec)
[ 2.759948] [<c025bfbc>] (ubi_attach_mtd_dev+0x680/0xdec) from [<c0021854>] (ubi_init+0x1c8/0x2dc)
[ 2.769439] [<c0021854>] (ubi_init+0x1c8/0x2dc) from [<c00353b8>] (do_one_initcall+0xc8/0x1a0)
[ 2.778533] [<c00353b8>] (do_one_initcall+0xc8/0x1a0) from [<c0008690>] (kernel_init+0x94/0x14c)
[ 2.787811] [<c0008690>] (kernel_init+0x94/0x14c) from [<c003b1b0>] (kernel_thread_exit+0x0/0x8)

(...)

The kernel carries on and initializes the remaining modules. Once done, it crashes and burns when it tries to mount the root file system.


[ 27.844604] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[ 27.853393] [<c003ff68>] (unwind_backtrace+0x0/0xec) from [<c035d39c>] (panic+0x54/0x180)
[ 27.862030] [<c035d39c>] (panic+0x54/0x180) from [<c0009168>] (mount_block_root+0x1d0/0x210)
[ 27.870971] [<c0009168>] (mount_block_root+0x1d0/0x210) from [<c00092ec>] (prepare_namespace+0x88/0x1bc)
[ 27.880981] [<c00092ec>] (prepare_namespace+0x88/0x1bc) from [<c0008708>] (kernel_init+0x10c/0x14c)
[ 27.890533] [<c0008708>] (kernel_init+0x10c/0x14c) from [<c003b1b0>] (kernel_thread_exit+0x0/0x8)

What could be the problem? Is it because the NAND device might have bad blocks? I can't guarantee that it is free of them.

Thanks!

Yves
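
As a sanity check on the partition table in the boot log above, the size of the "rootfs-nand" partition and the number of physical eraseblocks (PEBs) it holds can be worked out from the printed address range. This is a minimal Python sketch, assuming the 128 KiB eraseblock size that UBI reports in the same log; the function name is made up for illustration.

```python
# Hypothetical helper: derive partition size and PEB count from an MTD
# address range as printed by the kernel ("start-end : name").
PEB_SIZE = 131072  # 128 KiB, from "UBI: physical eraseblock size"

def partition_geometry(start, end, peb_size=PEB_SIZE):
    """Return (size_in_bytes, number_of_pebs) for the range [start, end)."""
    size = end - start
    if size % peb_size:
        raise ValueError("partition is not a whole number of eraseblocks")
    return size, size // peb_size

# "rootfs-nand" range taken from the log: 0x000000780000-0x000020000000
size, pebs = partition_geometry(0x000000780000, 0x000020000000)
print(f"{size} bytes = {size / 2**20:.1f} MiB, {pebs} PEBs")
```

Under these assumptions the partition holds 4036 PEBs (about 504 MiB), which agrees with the "number of good PEBs: 4036" line in a later log in this thread.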

  • Yves,

    There are some issues with the BCH8 algorithm in the 2.6.37 kernel. Could you please send me the following files at renjith.thomas@pathpartnertech.com? Are you using the EVM or your own custom board?

    arch/arm/mach-omap2/board-3517evm.c
    arch/arm/mach-omap2/board-flash.c
    arch/arm/mach-omap2/gpmc.c
    drivers/mtd/nand/nand_base.c
    drivers/mtd/nand/omap2.c

  • Hello Renjith,

    Renjith Thomas said:

    There are some issues with the BCH8 algorithm in the 2.6.37 kernel. (...) Are you using the EVM or your own custom board?

    Here are some details about the hardware: I use LogicPD's eXperimenter board and AM3517 SOM combo for software development. The target is an in-house designed data-acquisition prototype, which also uses the AM3517 SOM as-is, and our prototype main board remains compatible with the eXperimenter board.

    Our prototype has one peculiar feature: it uses the GPMC interface to control an FPGA chip. The FPGA connects to the GPMC device_select 1 and it is programmed to behave like a 2Kb static RAM memory at signal-level. It doesn't require any ECC processing at all.

    Although the existing NAND driver remains unchanged, the gpmc.c source file has been patched. The patch changes the scope of a timing configuration function from private (static) to public. That was required to allow finer control over GPMC timings for FPGA access.  

    Are the bug fixes to the BCH8 related issues a work in progress? If they are resolved, then it would be nice to add those fixes to the kernel sources.

    Thanks!

    Yves 

  • Yves,

    I have solved these issues on another platform, DM81xx. Can you send the files that I requested to the email address in my previous post? I can modify them and send them back. I can't upload them here as my Silverlight plug-in is not working properly.

  • Yves McDonald said:

    Note that the instructions tell the user to switch to BCH8 ECC before flashing the UBI image, but without giving the command explicitly. I set nandecc to BCH8_sw anyway before flashing.

    A-ha! I notice that the procedure was modified: reference to BCH8 ECC was removed:


    Since we copy the data to NAND, Empty/Erase the required RAM. Then, get the UBIFS image to U-Boot

    NOTE

    On flashing UBIFS image from U-Boot, make sure that ECC selected is in sync with Linux

    Now, that means that if the kernel image was flashed with HW 1 as the selected NAND ECC option, then I should also flash UBI images with HW 1 and forget about the BCH8_sw option. Correct?

    Yves

  • Your understanding is correct. But there are some practical issues with the NAND driver support in the 2.6.37 kernel.

    nand hwecc 0 selects 1-bit Hamming code

    nand hwecc 1 selects hw BCH4

    nand hwecc 2 selects hw BCH8

    1. Hamming code should work easily.

    2. BCH4 has some hardware issues. This applies only to silicon revision 1.0. Please check the errata for details.

    3. The BCH8 read implementation is not correct. It will not always work.

  • Renjith Thomas said:

    2. BCH4 has some hardware issues. This applies only to silicon revision 1.0. Please check the errata for details.

    I did check my AM3517 SoC silicon revision against the latest errata (http://www.ti.com/lit/er/sprz306d/sprz306d.pdf) and the ECC algorithm (BCH4) I selected is said to be faulty. So in my case, I might as well reflash both kernel and its root filesystem with Hamming code ECC. 

    • The good news: the ECC errors are gone
    • The bad news: invalid UBI block format error shows up.

    Although I created a UBI block format for which the smallest flash I/O unit is 2 KiB, the UBI driver still uses 512 as the VID header offset, not 2048. Where is this error coming from? Are the kernel NAND driver patches going to fix that?

    [ 2.684112] UBI: attaching mtd4 to ubi0
    [ 2.688415] UBI: physical eraseblock size: 131072 bytes (128 KiB)
    [ 2.695037] UBI: logical eraseblock size: 126976 bytes
    [ 2.700714] UBI: smallest flash I/O unit: 2048
    [ 2.705688] UBI: sub-page size: 512
    [ 2.710601] UBI: VID header offset: 2048 (aligned 2048)
    [ 2.716888] UBI: data offset: 4096
    [ 3.589508] UBI error: validate_ec_hdr: bad VID header offset 512, expected 2048
    [ 3.597351] UBI error: validate_ec_hdr: bad EC header

    FYI: mkfs.ubifs parameters are: -m 2KiB -e 124KiB -c 4378 -F. ubinize parameters are: -p 128KiB -m 2KiB -s 2048 -O 2048

    Thanks,

    Yves
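
The relationship between the VID header offset, the data offset and the logical eraseblock (LEB) size in the logs above can be checked with a little arithmetic. The Python sketch below assumes that both UBI headers are 64 bytes long and that the data offset is the end of the VID header rounded up to the minimum I/O unit; both are stated from memory of the UBI on-flash layout, and the helper names are illustrative only.

```python
# Sketch of how UBI derives the data offset and LEB size from the flash
# geometry. The 64-byte header size and the rounding rule are assumptions
# about the UBI on-flash layout, not taken from this thread.
UBI_HDR_SIZE = 64  # assumed size of the EC header and of the VID header

def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

def ubi_layout(peb_size, min_io, vid_hdr_offset):
    """Return (data_offset, leb_size) for one physical eraseblock."""
    data_offset = align_up(vid_hdr_offset + UBI_HDR_SIZE, min_io)
    leb_size = peb_size - data_offset
    return data_offset, leb_size

for vid in (2048, 512):
    data_offset, leb_size = ubi_layout(131072, 2048, vid)
    print(f"VID header offset {vid}: data offset {data_offset}, "
          f"LEB size {leb_size} bytes ({leb_size // 1024} KiB)")
```

With a VID header offset of 2048 this gives a data offset of 4096 and a 126976-byte (124 KiB) LEB, which matches the log and the 124 KiB LEB size passed to mkfs.ubifs. An offset of 512 would give a data offset of 2048 and a 129024-byte (126 KiB) LEB, so the image and the kernel have to agree on one layout or the other.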

  • The ECC errors may not be completely over. From the logs it looks like the UBI volume itself is not created because of the sub-page size. This can be fixed by disabling sub-page support in the NAND driver. After this is fixed, the UBI volume will be able to mount.

    Once you are able to mount, you might still see errors in UBI if you flash the image from U-Boot. Anyway, let's wait and see.

  • Renjith Thomas said:

    The ECC errors may not be completely over. From the logs it looks like the UBI volume itself is not created because of the sub-page size. Once you are able to mount, you might still see errors in UBI if you flash the image from U-Boot. Anyway, let's wait and see.

    Correct: I redid my maths, swapped the AM3517 SOM on my eXperimenter board to rule out any potential errors caused by bad NAND blocks, and did more experiments. In addition, I dropped U-Boot in favor of the Linux-based flash_eraseall and ubiformat tools to flash the UBI image. All of this improved the boot process a bit, but UBIFS still fails to work.

    From the kernel boot trace below, we can see that the kernel keeps booting. It also starts the UBI background thread, but the latter keeps running "torture" tests (ugh!) on PEB #122 and #4034 forever, or until I turn the board off before it actually destroys these PEBs with an excessive number of erase cycles.

    Also, ubi_io_read() returns ECC errors (-74) and many UNCORRECTED_ERRORs. 

    Interestingly, PEB #122, when multiplied by the PEB size (128K), equals the size of the UBI image, as if the first free block continually failed the test. PEB #4034 is one of the last free PEBs of the UBI volume. I don't know if that is just a coincidence or if it really has something to do with free blocks.

    I haven't tried turning off sub-paging yet, but that would be the next step.

    [ 1.374664] 0x000000780000-0x000020000000 : "rootfs-nand"
    [ 1.648162] mtd: Giving out device 4 to rootfs-nand
    [ 1.662322] UBI: attaching mtd4 to ubi0
    [ 1.666503] UBI: physical eraseblock size: 131072 bytes (128 KiB)
    [ 1.673187] UBI: logical eraseblock size: 129024 bytes
    [ 1.678894] UBI: smallest flash I/O unit: 2048
    [ 1.683898] UBI: sub-page size: 512
    [ 1.688720] UBI: VID header offset: 512 (aligned 512)
    [ 1.694885] UBI: data offset: 2048
    [ 1.701110] UNCORRECTED_ERROR default
    [ 1.704925] UNCORRECTED_ERROR default
    [ 1.708862] UBI error: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 1:0, read 64 bytes
    [ 1.719299] [<c003ff68>] (unwind_backtrace+0x0/0xec) from [<c027b0f4>] (ubi_io_read+0x1b4/0x248)
    [ 1.728607] [<c027b0f4>] (ubi_io_read+0x1b4/0x248) from [<c027b738>] (ubi_io_read_ec_hdr+0x6c/0x3c0)
    [ 1.738525] [<c027b738>] (ubi_io_read_ec_hdr+0x6c/0x3c0) from [<c027f1cc>] (ubi_scan+0x1ec/0xc04)
    [ 1.747924] [<c027f1cc>] (ubi_scan+0x1ec/0xc04) from [<c0275964>] (ubi_attach_mtd_dev+0x680/0xdec)
    [ 1.757415] [<c0275964>] (ubi_attach_mtd_dev+0x680/0xdec) from [<c0021868>] (ubi_init+0x1c8/0x2dc)
    [ 1.766876] [<c0021868>] (ubi_init+0x1c8/0x2dc) from [<c00353b8>] (do_one_initcall+0xc8/0x1a0)
    [ 1.776000] [<c00353b8>] (do_one_initcall+0xc8/0x1a0) from [<c0008690>] (kernel_init+0x94/0x14c)
    [ 1.785278] [<c0008690>] (kernel_init+0x94/0x14c) from [<c003b1b0>] (kernel_thread_exit+0x0/0x8)
    [ 1.794830] UNCORRECTED_ERROR default
    [ 1.798645] UNCORRECTED_ERROR default
    [ 1.802581] UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes from PEB 1:512, read 512 bytes
    [ 1.813323] [<c003ff68>] (unwind_backtrace+0x0/0xec) from [<c027b0f4>] (ubi_io_read+0x1b4/0x248)
                        (...) cutting out some of the kernel 'noise' here


    [ 3.637573] UNCORRECTED_ERROR default
    [ 3.641387] UNCORRECTED_ERROR default
    [ 3.645324] UBI error: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 4034:0, read 64 bytes
    [ 3.656036] [<c003ff68>] (unwind_backtrace+0x0/0xec) from [<c027b0f4>] (ubi_io_read+0x1b4/0x248)

    [ 3.731262] UNCORRECTED_ERROR default
    [ 3.735076] UNCORRECTED_ERROR default

    [ 3.739074] UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes from PEB 4034:512, read 512 bytes
    [ 3.750091] [<c003ff68>] (unwind_backtrace+0x0/0xec) from [<c027b0f4>] (ubi_io_read+0x1b4/0x248)
    [ 3.759368] [<c027b0f4>] (ubi_io_read+0x1b4/0x248) from [<c027b1f8>] (ubi_io_read_vid_hdr+0x70/0x544)

    [ 3.826202] UNCORRECTED_ERROR default
    [ 3.830047] UNCORRECTED_ERROR default
    [ 3.833984] UBI error: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 4035:0, read 64 bytes
    [ 3.844665] [<c003ff68>] (unwind_backtrace+0x0/0xec) from [<c027b0f4>] (ubi_io_read+0x1b4/0x248)

    [ 3.919860] UNCORRECTED_ERROR default
    [ 3.923675] UNCORRECTED_ERROR default
    [ 3.927612] UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes from PEB 4035:512, read 512 bytes
    [ 3.938629] [<c003ff68>] (unwind_backtrace+0x0/0xec) from [<c027b0f4>] (ubi_io_read+0x1b4/0x248)

    [ 4.015228] UBI: max. sequence number: 3
    [ 4.048645] UBI: attached mtd4 to ubi0
    [ 4.052703] UBI: MTD device name: "rootfs-nand"
    [ 4.058441] UBI: MTD device size: 504 MiB
    [ 4.063720] UBI: number of good PEBs: 4036
    [ 4.068695] UBI: number of bad PEBs: 0
    [ 4.073364] UBI: number of corrupted PEBs: 0
    [ 4.078063] UBI: max. allowed volumes: 128
    [ 4.082977] UBI: wear-leveling threshold: 4096
    [ 4.087890] UBI: number of internal volumes: 1
    [ 4.092590] UBI: number of user volumes: 1
    [ 4.097259] UBI: available PEBs: 0
    [ 4.101989] UBI: total number of reserved PEBs: 4036
    [ 4.107238] UBI: number of PEBs reserved for bad PEB handling: 80
    [ 4.113677] UBI: max/mean erase counter: 1/0
    [ 4.118164] UBI: image sequence number: 477784687
    [ 4.123748] UBI: background thread "ubi_bgt0d" started, PID 383
    [ 4.131835] UNCORRECTED_ERROR default
    [ 4.135681] UNCORRECTED_ERROR default
    [ 4.139648] UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes from PEB 4034:512, read 512 bytes
    [ 4.150726] [<c003ff68>] (unwind_backtrace+0x0/0xec) from [<c027b0f4>] (ubi_io_read+0x1b4/0x248)

    [ 4.239776] UNCORRECTED_ERROR default
    [ 4.243652] UNCORRECTED_ERROR default
    [ 4.247619] UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes from PEB 122:512, read 512 bytes
    [ 4.258605] [<c003ff68>] (unwind_backtrace+0x0/0xec) from [<c027b0f4>] (ubi_io_read+0x1b4/0x248)

    [ 4.333587] davinci_mdio davinci_mdio: davinci mdio revision 1.5
    [ 4.339965] davinci_mdio davinci_mdio: detected phy mask fffffffe
    [ 4.347198] UBI: run torture test for PEB 122
    [ 4.351898] nand_erase_nand: start = 0x0000016c0000, len = 131072
    [ 4.420471] nand_erase_nand: start = 0x0000016c0000, len = 131072
    [ 4.429260] davinci_mdio: probed
    [ 4.432647] davinci_mdio davinci_mdio: phy[0]: device ffffffff:00, driver SMSC LAN8710/LAN8720
    [ 4.444671] usbcore: registered new interface driver cdc_ether
    [ 4.470245] usbcore: registered new interface driver dm9601
    [ 4.476257] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
    [ 6.496093] ehci-omap ehci-omap.0: OMAP-EHCI Host Controller
    [ 6.504455] ehci-omap ehci-omap.0: new USB bus registered, assigned bus number 1
    [ 6.555267] nand_erase_nand: start = 0x0000016c0000, len = 131072
    [ 6.622863] UBI: PEB 122 passed torture test, do not mark it a bad
    [ 6.629455] nand_erase_nand: start = 0x0000016c0000, len = 131072
    [ 6.637664] UNCORRECTED_ERROR default
    [ 6.641510] UNCORRECTED_ERROR default
    [ 6.645446] UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes from PEB 4034:512, read 512 bytes

             (...)
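
To see at a glance which PEBs are affected in a trace like the one above, the ECC error lines can be tallied per PEB. This is a small Python sketch, not part of any UBI tooling; the regular expression is written against the exact message format shown in this log and the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
# "UBI error: ubi_io_read: error -74 (ECC error) while reading 64 bytes
#  from PEB 1:0, read 64 bytes"
ECC_ERROR = re.compile(
    r"ubi_io_read: error -74 \(ECC error\) while reading (\d+) bytes "
    r"from PEB (\d+):(\d+)"
)

def failing_pebs(log_text):
    """Count ECC read errors per PEB number in a kernel log."""
    counts = Counter()
    for match in ECC_ERROR.finditer(log_text):
        counts[int(match.group(2))] += 1
    return counts

# A few lines copied from the trace above.
sample = """\
[ 1.708862] UBI error: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 1:0, read 64 bytes
[ 3.645324] UBI error: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 4034:0, read 64 bytes
[ 3.739074] UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes from PEB 4034:512, read 512 bytes
[ 4.247619] UBI error: ubi_io_read: error -74 (ECC error) while reading 512 bytes from PEB 122:512, read 512 bytes
"""

for peb, count in sorted(failing_pebs(sample).items()):
    print(f"PEB {peb}: {count} ECC error(s)")
```

Feeding the full dmesg output to failing_pebs() instead of the short sample would show whether the errors cluster on a few PEBs or are spread across the whole partition.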

    Renjith Thomas said:

     This can be fixed by disabling sub-page in NAND driver. After this is fixed, UBI volume will be able to mount

    Easier said than done ;-)... Is it enough to rebuild the UBIFS filesystem with the sub-page size set equal to the minimum I/O size and the VID/data offsets adjusted accordingly, or does it really mean hacking the kernel?

    Thanks,

    Yves
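
The PEB numbers in the trace above can be tied back to absolute NAND addresses. The Python sketch below assumes the "rootfs-nand" start address from the partition table and the 128 KiB eraseblock size reported by UBI; the helper name is hypothetical.

```python
# Hypothetical helper: translate a PEB number inside an MTD partition into
# an absolute NAND address.
ROOTFS_START = 0x000000780000  # start of "rootfs-nand" in the partition table
PEB_SIZE = 131072              # 128 KiB physical eraseblock

def peb_address(peb, partition_start=ROOTFS_START, peb_size=PEB_SIZE):
    """Absolute NAND address of the first byte of the given PEB."""
    return partition_start + peb * peb_size

print(hex(peb_address(122)))          # address of PEB 122
print(122 * PEB_SIZE / 2**20, "MiB")  # space covered by PEBs 0..121
```

PEB 122 maps to 0x16c0000, the same address that nand_erase_nand reports during the torture test, and the 122 PEBs before it cover 15.25 MiB.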

  • Yves,

    Can you send me the files that I mentioned before at renjith.thomas@pathpartnertech.com? I'll modify them and send them back to you.

    arch/arm/mach-omap2/board-3517evm.c 
    arch/arm/mach-omap2/board-flash.c
    arch/arm/mach-omap2/gpmc.c
    drivers/mtd/nand/nand_base.c
    drivers/mtd/nand/omap2.c