This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

OMAP-L138: NAND ECC Issue

Part Number: OMAP-L138
Other Parts Discussed in Thread: OMAPL138,

Hi All,

         We are trying to interface Winbond W29N04GV 512 MB NAND flash with Omapl138 LCDK, In u-boot I am trying write a pattern from ram location to NAND flash that write is getting fail, I added some prints and found that that data will write into the flash correctly but while verifying that data the ECC calculation and correction is happening in correction stage the we are setting the corresponding bits in NANDFCR to calculate the ECC and checking the NANDFSR register for the errors it is returning 0x01 in ECC_STATE bit field.

         We are using 4-bit ECC in both kernel and u-boot, In kernel we are trying to flash a UBI filesystem, after flashing we cannot able to attach the ubifs we are getting some error like bit flipped in PEB 0. We cannot rectify what the issue is please can anyone share me the configurations or any driver patchs for kernel and u-boot to solve this issue. 

Thank You

Deepak H M 

  • Hi,

    First could you give more details on how do you interface the LCDK OMAP-L138 and the NAND flash (if possible post the schematic)?
    Also which Processor SDK Linux are you using?

    Best Regards,
    Yordan

  • Hi,

    I am trying to access the nand flash in both u-boot and kernel, In both the case I observed that, the generation of ECC fails when I try to write the data into the flash and the ECC value is generating while reading the data from flash.

    In kernel I am using 1-bit ECC with 8 data lines, I added some prints for ECC value and I found as below.

    root@omapl138-lcdk:~# nandwrite -p /dev/mtd4 helo.txt

    Writing data to block 0 at offset 0x0

    Ecc_val = 0

    Ecc_val = 0

    Ecc_val = 0

    Ecc_val = 524296

    Ecc_val = 524296

    Ecc_val = 0

    Ecc_val = 0

    Ecc_val = 0

    root@omapl138-lcdk:~# nanddump /dev/mtd4 -f dumphelo.txt -l 3000

    ECC failed: 0

    ECC corrected: 0

    Number of bad blocks: 0

    Number of bbt blocks: 0

    Block size 131072, page size 2048, OOB size 64

    Dumping data starting at 0x00000000 and ending at 0x00000bb8...

    Ecc_val = 91227504

    Ecc_val = 94373280

    Ecc_val = 265293776

    Ecc_val = 5246895

    Ecc_val = 197133375

    ECC: 3 uncorrectable bitflip(s) at offset 0x00000000

    ECC: 2 uncorrectable bitflip(s) at offset 0x00000800

    Ecc_val = 134219776

    Ecc_val = 99747314

    Ecc_val = 0

    root@omapl138-lcdk:~# nanddump /dev/mtd4 -f dumphelo.txt -l 3000

    ECC failed: 5

    ECC corrected: 0

    Number of bad blocks: 0 Number of bbt blocks: 0

    Block size 131072, page size 2048, OOB size 64

    Dumping data starting at 0x00000000 and ending at 0x00000bb8...

    Ecc_val = 91227504

    Ecc_val = 94373280

    Ecc_val = 265293776

    Ecc_val = 5246895

    ECC: 3 uncorrectable bitflip(s) at offset 0x00000000

    Ecc_val = 197133375

    Ecc_val = 134219776

    Ecc_val = 99747314

    Ecc_val = 0

    ECC: 2 uncorrectable bitflip(s) at offset 0x00000800

    In the 1st read the ECC Failed is zero if I run the nanddump command again and again the ECC Failed will keep on increasing. And I observed that if I change the data in helo.txt file and try to write into the flash the ECC value will be remain same as above that is not changing.

    I observed that the data written into the flash is correct if I read back the data from the flash and if I compare both the files like helo.txt and dumphelo.txt both are same the only issue is ECC. So that I am getting that bitflip error.

    I u-boot also I tried to read and write into the flash but It is giving some error as shown below,

    => nand write 0xcd000000 0x0 0x1000

    NAND write: device 0 offset 0x0, size 0x1000

    0 bytes written: ERROR

    => nand read 0xcd000000 0x0 0x1000

    NAND read: device 0 offset 0x0, size 0x1000

    NAND read from offset 0 failed -74

    0 bytes read: ERROR

    =>

    I added some prints in U-boot code and I found at the after writing into the flash the verifying of data is happening at that time the data is read from the flash and loading required values into NANDF* register to verify the ECC after verifying if we read NANDFSR register the register is returning Value=0x01 i.e. Errors cannot be corrected (5 or more)” ECC_STATE field.

    (Note: I am using 4-bit ECC in u-boot with eight data lines in u-boot).

    mercury_v1_OMAP-L138_NAND-Interface.pdf

    The attached document is schematic how NAND flash is interfaced with LCDK OMAP-L138. I am using the “linux-4.14.79+gitAUTOINC+e669d52447-ge669d52447” processor SDK linux and “u-boot-2018.01+gitAUTOINC+313dcd69c2-g62981096f2” u-boot. Please figure out what is the issue It is a very important interface we needed as soon as possible.

    Thank you

    Deepak.H.M

  • Hi Yordan,

    I Missed one Information, In the schematic they showed 16-bit data line but we are using only 8 -bit lines.

    Thank You

  • Deepak,

    Can you please indicate how you are reading and writing to the NAND. Using Uboot NAND driver or Linux  driver? 

    Please not that the ECC layout expected by ROM when booting Uboot and NAND ECC Layout used by Linux driver on this device is different. Please refer to this wiki to see how the NAND ECC layout is used in the two components:

    http://processors.wiki.ti.com/index.php/DM365_Nand_ECC_layout

    Please ensure that you are programming the NAND and ECC correctly. If you are using 8 bit NAND then look at the DA850 EVM or OMAPL138 EVM configuration as default setting may be 16 bit NAND as that is what is used on the TI LCDK platform. 

    Also, please check that there is a know issue reported with mtd_stresstest on this device with latest NAND driver with LCPD-10777 which may also relate to use of NAND ECC With UBIFS filesystem on NAND.

    Regards,

    Rahul

  • Hi Rahul thanks for your response,

    I saw the link and found the patch in it, I observed few things, that patch is not supported or belongs to my kernel my kernel version is 4.14 and that patch supports kernel release v2.6.33. moreover that patch is for 4 Bit ECC our board is configured for 1-bit ECC. I can't found structure for ECC positions, ECC -bytes and oobfree for 1-bit ECC. I added some prints in kernel to make sure that ECC offset and obbfree values are correct or not for 4-bit ECC as per the patch I got the correct values i.e. eccpos  offset=24 and length=40, for oobfree legnth=22 and offset=2. Please give me the configuration for 1-bit ECC not for 4-bit please share me the layout for 1-bit ECC. Because our NAND device support only 1 bit ECC.

    Note: We are not using NAND to store u-boot or kernel only we are using it for storage i.e. filesystem and we are using ubifs filesystem.

    Thank you

    Deepak.H.M

  • Hi, Deepak,

    Sorry for the slow response. It took me some time to understand the issue. When you write to the NAND, did you erase the NAND first before writing to it? If not, could you try erase it first?

    Thanks!

    Rex 

  • Hi Rex,

    Thanks for your replay. Ya, I am erasing before writing into flash, if I read the erased flash I am getting "ff" in all the regions correctly. The issue is with ECC, the parity in NANDFnECC  is not generating while writing. At the time of reading I am getting  some values but not sure weather it is correct or not. I tried 4-bit ECC also I observed that there is a parity mismatch while writing and reading i.e. the values in (NAND4BITECC1 - (NAND4BITECC4) is different while writing and reading. Irrespective of ECC data write and read is happening properly. Did I missed any configuration or I need any patch for the latest davinchi_nand.c driver.

    Thank you

    Deepak. H M

  • Hi, Deepak,

    I browse through the code and this is what I see in the code. The DTS file configures 4 bit ECC. When you tried with 4-bit ECC, is NANDF4ECC generated? Could you also check NANDFCR register is the corresponding bits are set? I also see that the patch Rahul mentioned is implemented in davinci_nand.c and all the #ifdef are defined in include/configs/omapl138_lcdk.h. So, 4-bit should work. I'll need some time to poke around 1-bit ECC.

    Rex

  • Hi, Deepak,

    It looks to me that the U-Boot only has code for 4-bit ECC, not 1-bit ECC. You will need to write those functions for 1-bit ECC calculation and correction functions.

    Rex

  • Hi Rex,



    Sorry for the late replay, As you said in dts 4-bit ECC is mentioned, but I changed it to 1 bit because our NAND supports only 1-bit ECC, and NANDF4ECC register is for 1-bit ECC register for CS5 and ours is NANDF2ECC the parity is not generated in this register but if read back from the NAND the parity is generating so that is the issue.

    Thank You

    Deepak H M

  • In u-boot I have not verified 1-bit ECC, I ll try and I ll update you.

    Thank You

    Deepak

  • Hi, Deepak,

    U-boot has its own dts file. I would suggest both u-boot and kernel have the same configuration. 

    Rex

  • Hi Rex,

    Sorry for the late replay, We replaced the NAND flash from WINBOND to MICRON i.e. MT29F4G08ABADA part number which is 8-bit NAND flash can support 4-bit ecc we changed the device tree nodes for 4-bit ECC still we are getting ECC errors. Can I know the davinci driver is working proper or not.

    Thank you

    Deepak H M

  • Hi, Deepak,

    Yes, we ran NAND ECC tests. Below are the snippets from automation test results running 4.14.79 kernel (PLSDK 5.2)

    NAND ECC Test

    root@omapl138-lcdk:~# 
    - 21:48:57 [INFO] Host: uname -a
    - 21:48:57 [INFO] Target: 
    uname -a
    Linux omapl138-lcdk 4.14.79-g3438de3474 #1 PREEMPT Wed Nov 14 23:04:38 UTC 2018 armv5tejl GNU/Linux
    root@omapl138-lcdk:~# 
    :::::::::::::::::
    <<<test_start>>>
    tag=NAND_M_MODLTP: starting NAND_M_MODULAR_MTD_NANDECCTEST (source "mtd_common.sh"; install_modules.sh "nand"; rmmod.sh "mtd_nandecctest"; part=`get_mtd_partition_number.sh "nand"`; do_cmd printout_mtdinfo /dev/mtd$part; do_cmd flash_eraseall -q /dev/mtd$part; do_cmd sleep 1; dmesg -c > /dev/null; do_cmd insmod.sh "mtd_nandecctest"; dmesg; dmesg | grep -E "not\s*ok" && (rmmod.sh "mtd_subpagetest";exit 1) || (rmmod.sh "mtd_subpagetest";exit 0) )
    ULAR_MTD_NANDECCTEST stime=1542056841
    cmdline="source "mtd_common.sh"; install_modules.sh "nand"; rmmod.sh "mtd_nandecctest"; part=`get_mtd_partition_number.sh "nand"`; do_cmd printout_mtdinfo /dev/mtd$part; do_cmd flash_eraseall -q /dev/mtd$part; do_cmd sleep 1; dmesg -c > /dev/null; do_cmd insmod.sh "mtd_nandecctest"; dmesg; dmesg | grep -E "not\s*ok" && (rmmod.sh "mtd_subpagetest";exit 1) || (rmmod.sh "mtd_subpagetest";exit 0) "
    contacts=""
    analysis=exit
    <<<test_output>>>
    incrementing stop
    |TRACE LOG|Inside do_cmd:CMD=modprobe -r mtd_nandecctest|
    |TRACE LOG|Inside do_cmd:CMD=lsmod | grep 'mtd_nandecctest ' && die mtd_nandecctest should not be seen in lsmod || exit 0|
    |TRACE LOG|Inside do_cmd:CMD=printout_mtdinfo /dev/mtd2|
    |TRACE LOG|Inside do_cmd:CMD=mtdinfo /dev/mtd2|
    mtd2
    Name:                           free-space
    Type:                           nand
    Eraseblock size:                131072 bytes, 128.0 KiB
    Amount of eraseblocks:          4087 (535691264 bytes, 510.9 MiB)
    Minimum input/output unit size: 2048 bytes
    Sub-page size:                  512 bytes
    OOB size:                       64 bytes
    Character device major/minor:   90:4
    Bad blocks are allowed:         true
    Device is writable:             true
    
    |TRACE LOG|Inside do_cmd:CMD=flash_eraseall -q /dev/mtd2|
    flash_eraseall has been replaced by `flash_erase <mtddev> 0 0`; please use it
    |TRACE LOG|Inside do_cmd:CMD=sleep 1|
    |TRACE LOG|Inside do_cmd:CMD=insmod.sh mtd_nandecctest|
    |TRACE LOG|Inside do_cmd:CMD=depmod -a|
    |TRACE LOG|Inside do_cmd:CMD=modprobe mtd_nandecctest|
    mtd_nandecctest: ok - no-bit-error-256
    mtd_nandecctest: ok - single-bit-error-in-data-correct-256
    mtd_nandecctest: ok - single-bit-error-in-ecc-correct-256
    __nand_correct_data: uncorrectable ECC error
    mtd_nandecctest: ok - double-bit-error-in-data-detect-256
    __nand_correct_data: uncorrectable ECC error
    mtd_nandecctest: ok - single-bit-error-in-data-and-ecc-detect-256
    __nand_correct_data: uncorrectable ECC error
    mtd_nandecctest: ok - double-bit-error-in-ecc-detect-256
    mtd_nandecctest: ok - no-bit-error-512
    mtd_nandecctest: ok - single-bit-error-in-data-correct-512
    mtd_nandecctest: ok - single-bit-error-in-ecc-correct-512
    __nand_correct_data: uncorrectable ECC error
    mtd_nandecctest: ok - double-bit-error-in-data-detect-512
    __nand_correct_data: uncorrectable ECC error
    mtd_nandecctest: ok - single-bit-error-in-data-and-ecc-detect-512
    __nand_correct_data: uncorrectable ECC error
    mtd_nandecctest: ok - double-bit-error-in-ecc-detect-512
    mtd_nandecctest         3234  0
    nand_ecc                3837  2 nand,mtd_nandecctest
    [32m[  179.563249] [0m[33mmtd_nandecctest[0m: ok - no-bit-error-256
    [32m[  179.574302] [0m[33mmtd_nandecctest[0m: ok - single-bit-error-in-data-correct-256
    [32m[  179.622836] [0m[33mmtd_nandecctest[0m: ok - single-bit-error-in-ecc-correct-256
    [32m[  179.675959] [0m[33m__nand_correct_data[0m[31m: uncorrectable ECC error[0m
    [32m[  179.680040] [0m[33mmtd_nandecctest[0m: ok - double-bit-error-in-data-detect-256
    [32m[  179.685238] [0m[33m__nand_correct_data[0m[31m: uncorrectable ECC error[0m
    [32m[  179.731281] [0m[33mmtd_nandecctest[0m: ok - single-bit-error-in-data-and-ecc-detect-256
    [32m[  179.755919] [0m[33m__nand_correct_data[0m[31m: uncorrectable ECC error[0m
    [32m[  179.759997] [0m[33mmtd_nandecctest[0m: ok - double-bit-error-in-ecc-detect-256
    [32m[  179.765190] [0m[33mmtd_nandecctest[0m: ok - no-bit-error-512
    [32m[  179.815200] [0m[33mmtd_nandecctest[0m: ok - single-bit-error-in-data-correct-512
    [32m[  179.835432] [0m[33mmtd_nandecctest[0m: ok - single-bit-error-in-ecc-correct-512
    [32m[  179.863556] [0m[33m__nand_correct_data[0m[31m: uncorrectable ECC error[0m
    [32m[  179.931403] [0m[33mmtd_nandecctest[0m: ok - double-bit-error-in-data-detect-512
    [32m[  179.980163] [0m[33m__nand_correct_data[0m[31m: uncorrectable ECC error[0m
    [32m[  179.984245] [0m[33mmtd_nandecctest[0m: ok - single-bit-error-in-data-and-ecc-detect-512
    [32m[  180.089594] [0m[33m__nand_correct_data[0m[31m: uncorrectable ECC error[0m
    [32m[  180.093683] [0m[33mmtd_nandecctest[0m: ok - double-bit-error-in-ecc-detect-512
    |TRACE LOG|Inside do_cmd:CMD=modprobe -r mtd_subpagetest|
    |TRACE LOG|Inside do_cmd:CMD=lsmod | grep 'mtd_subpagetest ' && die mtd_subpagetest should not be seen in lsmod || exit 0|
    <<<execution_status>>>
    initiation_status="ok"
    duration=24 termination_type=exited termination_id=0 corefile=no
    cutime=335 cstime=992
    <<<test_end>>>
    INFO: ltp-pan reported all tests PASS
    

    NAND Func BCH8 ECC test

    <<<test_start>>>LTP: starting NAND_S_FUNC_ECC_2K_BCH8_8ERRS_NO_OOB_ERR (source "common.sh"; nandecc_tests.sh -r "0:0:0xFF"~"1:1:0xFF"~"2:2:0xFF"~"3:3:0xFF")
    
    tag=NAND_S_FUNC_ECC_2K_BCH8_8ERRS_NO_OOB_ERR stime=1542056874
    cmdline="source "common.sh"; nandecc_tests.sh -r "0:0:0xFF"~"1:1:0xFF"~"2:2:0xFF"~"3:3:0xFF""
    contacts=""
    analysis=exit
    <<<test_output>>>
    
    ::::::::::::

    |TRACE LOG|Dumpping nand...|
    |TRACE LOG|Inside do_cmd:CMD=nanddump -l 2048 -f /tmp/ltp-XXXXhkt2Ig/testfile_nanddump.corrected /dev/mtd2 |
    ECC failed: 0
    ECC corrected: 0
    Number of bad blocks: 0
    Number of bbt blocks: 4
    Block size 131072, page size 2048, OOB size 64
    Dumping data starting at 0x00000000 and ending at 0x00000800...
    ECC: 4 corrected bitflip(s) at offset 0x00000000
    result is 0
    |TRACE LOG|Check if the errors are got corrected|
    |TRACE LOG|Inside do_cmd:CMD=hexdump -C /tmp/ltp-XXXXhkt2Ig/testfile_nanddump.original > /tmp/ltp-XXXXhkt2Ig/original |
    |TRACE LOG|Inside do_cmd:CMD=hexdump -C /tmp/ltp-XXXXhkt2Ig/testfile_nanddump.corrected > /tmp/ltp-XXXXhkt2Ig/corrected |
    |TRACE LOG|diff /tmp/ltp-XXXXhkt2Ig/original /tmp/ltp-XXXXhkt2Ig/corrected |
    |TRACE LOG|Nand ECC Test Pass|
    |TRACE LOG|Inside do_cmd:CMD=flash_erase -q /dev/mtd2 0 0|
    <<<execution_status>>>
    initiation_status="ok"
    duration=47 termination_type=exited termination_id=0 corefile=no
    cutime=366 cstime=1469
    <<<test_end>>>

  • Hi Rex,

    Can I know how to run this test. I ll also try and update you

  • Hi, Deepak,

    I extract the command out of the test result logs. 

    Non-OOB test

    - 21:49:00 [INFO] Host: cd /opt/ltp
    - 21:49:00 [INFO] Host: ./runltp -P omapl138-lcdk -f ddt/nand_mtdtests -s "NAND_M_MODULAR_MTD_NANDECCTEST "
    - 01:02:03 [INFO] Host: ./runltp -P omapl138-lcdk -f ddt/nand_ecc_tests -s "NAND_S_FUNC_ECC_2K_BCH8_8ERRS_NO_OOB_ERR "

    OOB Test

    - 01:13:16 [INFO] Host: ./runltp -P omapl138-lcdk -f ddt/nand_ecc_tests -s "NAND_S_FUNC_ECC_2K_BCH8_8ERRS_W_OOB_ERR "
    

    Rex

  • Hi Rex,

    I tried these 3 test 2 of them got failed i.e. 

    - 01:02:03 [INFO] Host: ./runltp -P omapl138-lcdk -f ddt/nand_ecc_tests -s "NAND_S_FUNC_ECC_2K_BCH8_8ERRS_NO_OOB_ERR " : Failed 

    - 01:13:16 [INFO] Host: ./runltp -P omapl138-lcdk -f ddt/nand_ecc_tests -s "NAND_S_FUNC_ECC_2K_BCH8_8ERRS_W_OOB_ERR " :  Failed 

    nand_logs.log






    I attached the logs kindly check this and update me if you find anything wrong. 

    Thank you

    Deepak

  • Hi, Deepak,

    I ran the tests using PLSDK 6.x, and it fails on me as well. We'll need to look into the cause. At the mean time, could you roll back to PLSDK 5.x which I ran successfully.

    Rex

  • Hi, Deepak,

    I checked our latest system test result from PLSDK 6.1. The test was successful. Below are the snippets from that test result:

    I can't explain why it failed on me but successful in our test farm.

    root@omapl138-lcdk:~# 
    - 11:40:45 [INFO] Sleeping 15 secs to allow systemd to finish starting processes...
    - 11:41:00 [INFO] Disconnected serial from omapl138-lcdk
    - 11:41:00 [INFO] Connected to omapl138-lcdk via serial 
    - 11:41:00 [INFO] Host: uname -a
    - 11:41:00 [INFO] Target: 
    uname -a
    Linux omapl138-lcdk 4.19.59-gb7ab997cac #1 PREEMPT Tue Aug 20 08:17:10 UTC 2019 armv5tejl GNU/Linux
    root@omapl138-lcdk:~# 
    root@omapl138-lcdk:~# 
    - 11:40:45 [INFO] Sleeping 15 secs to allow systemd to finish starting processes...
    - 11:41:00 [INFO] Disconnected serial from omapl138-lcdk
    - 11:41:00 [INFO] Connected to omapl138-lcdk via serial 
    - 11:41:00 [INFO] Host: uname -a
    - 11:41:00 [INFO] Target: 
    uname -a
    Linux omapl138-lcdk 4.19.59-gb7ab997cac #1 PREEMPT Tue Aug 20 08:17:10 UTC 2019 armv5tejl GNU/Linux
    root@omapl138-lcdk:~# 

    ::::::::::::::::::::::::::::::

    |TRACE LOG|Dumpping nand...|
    |TRACE LOG|Inside do_cmd:CMD=nanddump -l 2048 -f /tmp/ltp-XXXXjbjeuC/testfile_nanddump.corrected /dev/mtd3 |
    ECC failed: 0
    ECC corrected: 0
    Number of bad blocks: 0
    Number of bbt blocks: 4
    Block size 131072, page size 2048, OOB size 64
    Dumping data starting at 0x00000000 and ending at 0x00000800...
    ECC: 4 corrected bitflip(s) at offset 0x00000000
    result is 0
    |TRACE LOG|Check if the errors are got corrected|
    |TRACE LOG|Inside do_cmd:CMD=hexdump -C /tmp/ltp-XXXXjbjeuC/testfile_nanddump.original > /tmp/ltp-XXXXjbjeuC/original |
    |TRACE LOG|Inside do_cmd:CMD=hexdump -C /tmp/ltp-XXXXjbjeuC/testfile_nanddump.corrected > /tmp/ltp-XXXXjbjeuC/corrected |
    |TRACE LOG|diff /tmp/ltp-XXXXjbjeuC/original /tmp/ltp-XXXXjbjeuC/corrected |
    |TRACE LOG|Nand ECC Test Pass|
    |TRACE LOG|Inside do_cmd:CMD=flash_erase -q /dev/mtd3 0 0|
    <<<execution_status>>>
    initiation_status="ok"
    duration=27 termination_type=exited termination_id=0 corefile=no
    cutime=354 cstime=1693
    <<<test_end>>>
    INFO: ltp-pan reported all tests PASS
    LTP Version: 20180118
            
           ###############################################################"
            
                Done executing testcases."
                LTP Version:  20180118
                Result log is in the /tmp/tmp.1NsqUk "
           ###############################################################"
           
    Test Start Time: Tue Aug 20 07:34:03 2019
    -----------------------------------------
    Testcase                       Result     Exit Value
    --------                       ------     ----------
    NAND_S_FUNC_ECC_2K_BCH8_8ERRS_ PASS       0    
    
    -----------------------------------------------
    Total Tests: 1
    Total Skipped Tests: 0
    Total Failures: 0
    Kernel Version: 4.19.59-gb7ab997cac
    Machine Architecture: armv5tejl
    Hostname: omapl138-lcdk