Bad CRC Data loading kernel and u-boot migration strategy

Hi all,

We are seeing strange behaviour in our DM355-based product. After some time in service, some units fail during the boot process. The error appears at the u-boot stage, just after loading the kernel image from NAND.

The traces I see in u-boot are these:

## Booting image at 80700000 ...
   Image Name:   Linux-2.6.10_mvl401
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    1594772 Bytes =  1.5 MB
   Load Address: 80008000
   Entry Point:  80008000
   Verifying Checksum ... Bad Data CRC
DM355 EVM #

The issue is the same one that other users have reported:

http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/100/t/134817.aspx

I have found some references to ECC errors with old versions of the u-boot loader:

http://e2e.ti.com/support/embedded/linux/f/354/t/73788.aspx

I realize that I should probably migrate to a newer u-boot. Would it be possible to use a newer u-boot (version 2009.3, for instance) and keep the UBL, the flash partitions and the kernel version?

Details of my platform:

dm355 based.

DVSDK 1.3 (montavista-pro.4.0.1)
Kernel: 2.6.10
u-boot: 1.2.0
NAND: Micron MT29F2G08AADWP (NAND 256MiB 3,3V 8-bit)

SoC (Processor): TMS320 DM355ZCE 270


Flash partitions:
0x00000000-0x001e0000 : "bootloader"  (/dev/mtd0)
0x001e0000-0x00200000 : "params" (/dev/mtd1)
0x00200000-0x00700000 : "kernel"  (/dev/mtd2)
0x00700000-0x00c00000 : "initrd"  (/dev/mtd3)
0x00c00000-0x01000000 : "Persistent" (/dev/mtd4 jffs2)
0x01000000-0x08300000 : "RootFS" (/dev/mtd5 yaffs2)
0x08300000-0x10000000 : "StoragePool_01" (/dev/mtd6 yaffs2)

Boot parameters:
bootcmd=setenv bootargs video=dm355fb:output=$(videostd); nand read 0x82000000 0x700000 0x500000;nboot 0x80700000 0 0x200000;bootm
bootargs=mem=123M console=ttyS0,115200n8 initrd=0x82000000,5M ip=off root=/dev/mtdblock5 video=dm355fb:output=pal
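
As a quick sanity check on the layout above, the NAND offsets used in bootcmd can be compared against the partition table. This is a minimal Python sketch: the table and the offsets are copied from this post, and the helper name `find_partition` is only illustrative.

```python
# Partition table copied from the post above: (start, end, name).
PARTITIONS = [
    (0x00000000, 0x001E0000, "bootloader"),
    (0x001E0000, 0x00200000, "params"),
    (0x00200000, 0x00700000, "kernel"),
    (0x00700000, 0x00C00000, "initrd"),
    (0x00C00000, 0x01000000, "Persistent"),
    (0x01000000, 0x08300000, "RootFS"),
    (0x08300000, 0x10000000, "StoragePool_01"),
]

def find_partition(offset):
    """Return (name, size) of the partition that contains a NAND offset."""
    for start, end, name in PARTITIONS:
        if start <= offset < end:
            return name, end - start
    raise ValueError(f"offset {offset:#x} is outside the flash map")

# Offsets from bootcmd:
#   nand read 0x82000000 0x700000 0x500000   (initrd)
#   nboot 0x80700000 0 0x200000              (kernel)
initrd = find_partition(0x700000)
kernel = find_partition(0x200000)
print(initrd, kernel)

# The kernel in the boot log is 1594772 bytes plus a 64-byte image header.
# Staged at 0x80700000, it must end before the initrd copy at 0x82000000.
kernel_end_in_ram = 0x80700000 + 64 + 1594772
print(kernel_end_in_ram <= 0x82000000)
```

With the values from this post, both offsets fall at the start of the expected partition, and the staged kernel does not overlap the initrd copy in RAM.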


Thanks in advance,


Paco

  • Paco,

    I feel that it's easier to fix the issue in your current u-boot than to migrate to a newer u-boot.

    Can you confirm which ECC algorithm your current u-boot uses? Is it 1-bit or 4-bit? If you are unsure, please execute "nand dump 0x200000" and share the logs. The ECC algorithm can be guessed from the logs.

  • Hi Renjith,

    We have two boards, one with 1-bit ECC (NAND MT29F2G08AADWP:D) and the other with 4-bit ECC (NAND MT29F2G08ABAEAWP:E).

    This dump is from the 4-bit ECC board:
    # nand dump 0x200000
    Page 00200000 dump:
        27 05 19 56 c0 df 79 9e  50 89 61 92 00 18 56 7c
        80 00 80 00 80 00 80 00  7e 5a ad 1d 05 02 02 00
        4c 69 6e 75 78 2d 32 2e  36 2e 31 30 5f 6d 76 6c
        34 30 31 00 00 00 00 00  00 00 00 00 00 00 00 00
        00 00 a0 e1 00 00 a0 e1  00 00 a0 e1 00 00 a0 e1
        00 00 a0 e1 00 00 a0 e1  00 00 a0 e1 00 00 a0 e1
        02 00 00 ea 18 28 6f 01  00 00 00 00 7c 56 18 00
        01 70 a0 e1 00 80 a0 e3  00 20 0f e1 03 00 12 e3
        01 00 00 1a 17 00 a0 e3  56 34 12 ef 00 20 0f e1
        c0 20 82 e3 02 f0 21 e1  00 00 00 00 00 00 00 00
        c8 00 8f e2 7e 30 90 e8  01 00 50 e0 0a 00 00 0a
        00 50 85 e0 00 60 86 e0  00 c0 8c e0 00 20 82 e0
        00 30 83 e0 00 d0 8d e0  00 10 96 e5 00 10 81 e0
        04 10 86 e4 0c 00 56 e1  fa ff ff 3a 00 00 a0 e3
        04 00 82 e4 04 00 82 e4  04 00 82 e4 04 00 82 e4
        03 00 52 e1 f9 ff ff 3a  28 00 00 eb 0d 10 a0 e1
        01 28 8d e2 02 00 54 e1  14 00 00 2a 01 05 84 e2
        05 00 50 e1 11 00 00 9a  02 50 a0 e1 05 00 a0 e1
        07 30 a0 e1 07 0b 00 eb  7f 00 80 e2 7f 00 c0 e3
        00 10 85 e0 59 2f 8f e2  50 30 9f e5 03 30 82 e0
        00 3f b2 e8 00 3f a1 e8  00 3f b2 e8 00 3f a1 e8
        03 00 52 e1 f9 ff ff 3a  c8 00 00 eb 00 f0 85 e0
        04 00 a0 e1 07 30 a0 e1  f6 0a 00 eb 57 00 00 ea
        30 01 00 00 7c 56 18 00  b4 da 18 00 00 80 00 80
        00 00 00 00 10 56 18 00  70 56 18 00 b4 ea 18 00
        90 02 00 00 00 00 00 00  00 00 00 00 00 00 00 00
        08 30 a0 e3 4e 00 00 ea  01 39 44 e2 ff 30 c3 e3
        3f 3c c3 e3 03 00 a0 e1  20 89 a0 e1 08 89 a0 e1
        01 92 88 e2 12 10 a0 e3  03 1b 81 e3 01 29 83 e2
        08 00 51 e1 0c 10 81 23  09 00 51 e1 0c 10 c1 23
        04 10 80 e4 01 16 81 e2  02 00 30 e1 f7 ff ff 1a
        1e 10 a0 e3 03 1b 81 e3  2f 2a a0 e1 02 1a 81 e1
        02 01 83 e0 04 10 80 e4  01 16 81 e2 00 10 80 e5
        0e f0 a0 e1 0e c0 a0 e1  e2 ff ff eb 00 00 a0 e3
        9a 0f 07 ee 17 0f 08 ee  10 0f 11 ee 05 0a 80 e3
        30 00 80 e3 12 00 00 eb  00 00 a0 e3 17 0f 08 ee
        0c f0 a0 e1 0e b0 a0 e1  17 0f 07 ee f0 ff ff eb
        17 0f 07 ee ac 00 00 eb  0b f0 a0 e1 0e c0 a0 e1
        d0 ff ff eb 00 00 a0 e3  10 0f 07 ee 10 0f 05 ee
        30 00 a0 e3 02 00 00 eb  00 00 a0 e3 10 0f 05 ee
        0c f0 a0 e1 0d 00 80 e3  00 10 e0 e3 10 3f 02 ee
        10 1f 03 ee 10 0f 01 ee  0e f0 a0 e1 00 00 00 00
        00 80 85 e0 04 10 a0 e1  0d 3e b5 e8 0d 3e a1 e8
        0d 3e b5 e8 0d 3e a1 e8  0d 3e b5 e8 0d 3e a1 e8
        0d 3e b5 e8 0d 3e a1 e8  08 00 55 e1 f5 ff ff 3a
        6a 00 00 eb 51 00 00 eb  00 00 a0 e3 07 10 a0 e1
        04 f0 a0 e1 1c c0 8f e2  10 6f 10 ee 00 10 9c e5
        04 20 9c e5 06 10 21 e0  02 00 11 e1 03 f0 8c 00
        14 c0 8c e2 f8 ff ff ea  00 06 56 41 e0 ff ff ff
        4c 00 00 ea 4b 00 00 ea  0e f0 a0 e1 00 00 00 00
        00 f0 00 00 0e f0 a0 e1  0e f0 a0 e1 0e f0 a0 e1
        00 70 00 41 00 fe f8 ff  44 00 00 ea 43 00 00 ea
        0e f0 a0 e1 00 72 80 41  00 ff ff ff b0 ff ff ea
        35 00 00 ea 0e f0 a0 e1  00 70 00 00 00 f0 00 00
        0e f0 a0 e1 0e f0 a0 e1  0e f0 a0 e1 00 a1 01 44
        e0 ff ff ff a6 ff ff ea  2b 00 00 ea 47 00 00 ea
        10 b1 01 69 f0 ff ff ff  a1 ff ff ea 26 00 00 ea
        42 00 00 ea 00 60 05 69  00 f0 ff ff a8 ff ff ea
        20 00 00 ea 58 00 00 ea  00 00 02 00 00 00 0f 00
        97 ff ff ea 1c 00 00 ea  38 00 00 ea 00 00 05 00
        00 00 0f 00 92 ff ff ea  17 00 00 ea 33 00 00 ea
        00 00 06 00 00 00 0f 00  8d ff ff ea 12 00 00 ea
        2e 00 00 ea 00 00 07 00  00 00 0f 00 88 ff ff ea
        0d 00 00 ea 23 00 00 ea  00 00 00 00 00 00 00 00
        0e f0 a0 e1 0e f0 a0 e1  0e f0 a0 e1 00 00 00 00
        00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
        0c 30 a0 e3 ae ff ff ea  ff ff ff ea 10 0f 11 ee
        0d 00 c0 e3 10 0f 01 ee  00 00 a0 e3 17 0f 07 ee
        17 0f 08 ee 0e f0 a0 e1  30 00 a0 e3 01 00 00 ea
        70 00 a0 e3 ff ff ff ea  10 0f 01 ee 00 00 a0 e3
        10 0f 07 ee 10 0f 05 ee  0e f0 a0 e1 00 00 00 00
        00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
        10 30 a0 e3 96 ff ff ea  00 10 a0 e3 1e 1f 07 ee
        15 1f 07 ee 1f 1f 07 ee  9a 1f 07 ee 0e f0 a0 e1
        01 28 a0 e3 20 b0 a0 e3  30 3f 10 ee 06 00 33 e1
        09 00 00 0a 23 19 a0 e1  07 10 01 e2 01 2b a0 e3
        12 21 a0 e1 01 09 13 e3  a2 20 82 10 23 36 a0 e1
        03 30 03 e2 08 b0 a0 e3  1b b3 a0 e1 3f 10 cf e3
        02 20 81 e0 0b 30 91 e6  02 00 31 e1 fc ff ff 1a
        15 1f 07 ee 16 1f 07 ee  9a 1f 07 ee 0e f0 a0 e1
        00 10 a0 e3 10 0f 07 ee  0e f0 a0 e1 35 1f 09 ee
        1f 1c a0 e3 e0 10 81 e3  5e 1f 07 ee 01 11 91 e2
        fc ff ff 3a 20 10 51 e2  fa ff ff 5a 0e f0 a0 e1
        00 00 a0 e1 00 00 a0 e1  00 00 a0 e1 00 00 a0 e1
        0d c0 a0 e1 00 d8 2d e9  04 b0 4c e2 01 30 40 e2
        a3 05 a0 e1 00 30 e0 e3  a0 00 b0 e1 01 30 83 e2
        fc ff ff 1a 03 00 a0 e1  00 a8 9d e8 0d c0 a0 e1
        00 d8 2d e9 04 b0 4c e2  00 30 91 e5 00 00 50 e3
        02 09 a0 03 00 00 a0 13  03 30 60 e0 09 00 81 e8
        00 a8 9d e8 0d c0 a0 e1  00 d8 2d e9 04 b0 4c e2
        03 01 80 e2 00 a8 9d e8  0d c0 a0 e1 00 d8 2d e9
        04 b0 4c e2 01 01 80 e2  00 a8 9d e8 0d c0 a0 e1
        00 d8 2d e9 04 b0 4c e2  03 01 80 e2 00 a8 9d e8
        0d c0 a0 e1 00 d8 2d e9  04 b0 4c e2 01 01 80 e2
        00 a8 9d e8 0d c0 a0 e1  00 d8 2d e9 04 b0 4c e2
        10 20 9f e5 00 30 a0 e3  01 30 83 e2 02 00 53 e1
        fc ff ff 9a 00 a8 9d e8  ff 01 00 00 0d c0 a0 e1
        00 d8 2d e9 04 b0 4c e2  f1 ff ff eb 04 30 9f e5
        14 30 93 e5 00 a8 9d e8  00 00 c2 01 0d c0 a0 e1
        10 d8 2d e9 04 b0 4c e2  04 d0 4d e2 ff 40 00 e2
        0a 00 54 e3 01 00 00 1a  0d 00 a0 e3 f6 ff ff eb
        08 30 9f e5 00 40 83 e5  10 68 9d e9 ea ff ff ea
        00 00 c2 01 0d c0 a0 e1  10 d8 2d e9 04 b0 4c e2
        04 d0 4d e2 00 40 a0 e1  00 00 d0 e5 00 00 50 e3
        10 a8 9d 09 ff 00 00 e2  e7 ff ff eb 01 00 f4 e5
        00 00 50 e3 fa ff ff 1a  10 a8 9d e9 0d c0 a0 e1
        00 d8 2d e9 04 b0 4c e2  a1 22 a0 e1 00 00 52 e3
        0b 00 00 da 00 30 a0 e3  04 30 80 e4 01 20 42 e2
        03 00 52 e1 04 30 80 e4  04 30 80 e4 04 30 80 e4
        04 30 80 e4 04 30 80 e4  04 30 80 e4 04 30 80 e4
        f2 ff ff ea 10 00 11 e3  00 30 a0 13 04 30 80 14
        04 30 80 14 04 30 80 14  04 30 80 14 08 00 11 e3
        00 30 a0 13 04 30 80 14  04 30 80 14 04 00 11 e3
        00 30 a0 13 04 30 80 14  02 00 11 e3 00 30 a0 13
        01 30 c0 14 01 30 c0 14  01 00 11 e3 00 30 a0 13
        00 30 c0 15 00 a8 9d e8  0d c0 a0 e1 00 d8 2d e9
        04 b0 4c e2 a2 e1 a0 e1  00 00 5e e3 00 c0 a0 e1
        12 00 00 da 01 30 d1 e4  01 30 cc e4 01 e0 4e e2
        00 00 5e e3 01 30 d1 e4  01 30 cc e4 01 30 d1 e4
        01 30 cc e4 01 30 d1 e4  01 30 cc e4 01 30 d1 e4
        01 30 cc e4 01 30 d1 e4  01 30 cc e4 01 30 d1 e4
        01 30 cc e4 01 30 d1 e4  01 30 cc e4 eb ff ff ea
        04 00 12 e3 07 00 00 0a  01 30 d1 e4 01 30 cc e4
        01 30 d1 e4 01 30 cc e4  01 30 d1 e4 01 30 cc e4
        01 30 d1 e4 01 30 cc e4  02 00 12 e3 01 30 d1 14
        01 30 cc 14 01 30 d1 14  01 30 cc 14 01 00 12 e3
        00 30 d1 15 00 30 cc 15  00 a8 9d e8 0d c0 a0 e1
        00 d8 2d e9 04 b0 4c e2  00 a8 9d e8 0d c0 a0 e1
    OOB:
        ff ff ff ff ff ff 24 d6
        00 36 59 45 52 1f 28 42
        ff ff ff ff ff ff 4d da
        2b 98 f1 e5 75 df 6b 37
        ff ff ff ff ff ff d0 2b
        6c f4 72 ac 11 d1 ab 9e
        ff ff ff ff ff ff 58 05
        a0 d2 aa a9 fc 76 dc e4
    DM355 EVM #

    Regarding the ECC algorithm, how can I tell which one is in use? Which logs should I look at?
    Thanks
    Paco
  • Hi Paco,

    Are you facing the issue on both boards, or only on the 1-bit one?

    Francisco Javier Cabello Torres said:
    OOB:
        ff ff ff ff ff ff 24 d6
        00 36 59 45 52 1f 28 42

    In the OOB area, you can see that this algorithm uses 10 bytes of ECC for every 512 bytes of data. That many ECC bytes are generated by the 4-bit ECC algorithm. If it were 1-bit, only 3 bytes of ECC would be generated for every 512 bytes.
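
The reasoning above can be sketched in Python. The OOB bytes are copied from the dump earlier in the thread. Splitting the 64 OOB bytes into four equal chunks (one per 512-byte subpage) and treating the leading 0xff bytes as padding is a heuristic assumed here for illustration, not something read from the driver source.

```python
# OOB bytes copied from the "nand dump 0x200000" output in this thread.
OOB_HEX = """
ff ff ff ff ff ff 24 d6 00 36 59 45 52 1f 28 42
ff ff ff ff ff ff 4d da 2b 98 f1 e5 75 df 6b 37
ff ff ff ff ff ff d0 2b 6c f4 72 ac 11 d1 ab 9e
ff ff ff ff ff ff 58 05 a0 d2 aa a9 fc 76 dc e4
"""

PAGE_SIZE = 2048     # data bytes per NAND page
SUBPAGE_SIZE = 512   # data bytes covered by one ECC block

def ecc_bytes_per_subpage(oob, subpages):
    """Heuristic: count the bytes that follow the leading 0xff padding in each chunk."""
    chunk_len = len(oob) // subpages
    counts = []
    for i in range(subpages):
        chunk = oob[i * chunk_len:(i + 1) * chunk_len]
        padding = len(chunk) - len(chunk.lstrip(b"\xff"))
        counts.append(chunk_len - padding)
    return counts

oob = bytes.fromhex("".join(OOB_HEX.split()))
counts = ecc_bytes_per_subpage(oob, PAGE_SIZE // SUBPAGE_SIZE)
print(counts)  # [10, 10, 10, 10]
```

Ten ECC bytes per 512-byte subpage matches the 4-bit case described above; a 3-byte result would point to 1-bit ECC instead. The heuristic would undercount if an ECC byte at the start of a chunk happened to be 0xff.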

  • Hi,

    My apologies. I think both NANDs use 4-bit ECC. Forget about 1-bit ECC.

    Thanks,

    Paco

  • Paco,

    I didn't understand. Does this mean that your problem is resolved?

  • Hi Renjith,

    No, the problem is not resolved. I was just telling you that both NANDs use 4-bit ECC.

    How should I proceed now?

    Thanks,

    Paco

  • Paco,

    Can you enable more debug logs in the u-boot NAND driver? See whether the NAND driver calls the correct() function when ECC errors are found. If ECC errors are found, they should generally be printed during the read function itself. I'm not sure about your particular u-boot revision, though.

  • Hi Renjith,

    The problem is that I can't see any error traces while u-boot is reading the kernel image, but I still get 'Bad Data CRC'.

    If I download the same kernel image to memory, the unit boots. Once booted, from the Linux console I am able to read the kernel image from NAND and the content is correct (I checked the md5).


    Do you want me to enable ECC error traces? I can't see any error during boot process, just the 'Bad Data CRC'.


    Thanks


    Paco

  • Paco,

    Can you check whether you are reading the complete kernel image? Use the "nand read" and "bootm" commands to verify this. I'm asking because the same CRC failure occurs if the complete image is not read. Since you are not seeing any error logs, I suspect this could be one of the reasons.

    If the size is verified, the next step is to enable the ECC error traces.
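
The size check suggested above can be sketched on a host PC. The 64 header bytes below are copied from the nand dump posted earlier in this thread; the field layout is the big-endian legacy uImage header written by mkimage. The function names are illustrative, and `data_crc_ok` assumes the whole image has been saved to the host.

```python
import struct
import zlib

# First 64 bytes of the "nand dump 0x200000" output posted in this thread.
HEADER_HEX = """
27 05 19 56 c0 df 79 9e 50 89 61 92 00 18 56 7c
80 00 80 00 80 00 80 00 7e 5a ad 1d 05 02 02 00
4c 69 6e 75 78 2d 32 2e 36 2e 31 30 5f 6d 76 6c
34 30 31 00 00 00 00 00 00 00 00 00 00 00 00 00
"""

HEADER_FMT = ">7I4B32s"   # legacy uImage header: 7 words, 4 bytes, 32-byte name
HEADER_LEN = struct.calcsize(HEADER_FMT)  # 64
IH_MAGIC = 0x27051956

def parse_header(raw):
    """Decode the fields needed for the size and CRC checks."""
    (magic, hcrc, time, size, load, entry, dcrc,
     os_id, arch, img_type, comp, name) = struct.unpack(HEADER_FMT, raw[:HEADER_LEN])
    if magic != IH_MAGIC:
        raise ValueError("not a legacy uImage header")
    return {"size": size, "load": load, "entry": entry, "dcrc": dcrc,
            "name": name.rstrip(b"\0").decode("ascii")}

def data_crc_ok(image):
    """Recompute the CRC32 of the payload that follows the header."""
    hdr = parse_header(image)
    payload = image[HEADER_LEN:HEADER_LEN + hdr["size"]]
    if len(payload) != hdr["size"]:
        return False  # image was truncated: fewer bytes than the header claims
    return (zlib.crc32(payload) & 0xFFFFFFFF) == hdr["dcrc"]

hdr = parse_header(bytes.fromhex("".join(HEADER_HEX.split())))
KERNEL_PARTITION_SIZE = 0x00700000 - 0x00200000  # from the flash map in the first post
total_len = HEADER_LEN + hdr["size"]
print(hdr["name"], hdr["size"], total_len <= KERNEL_PARTITION_SIZE)
```

For the dumped header, the payload size is 0x18567c bytes, so the number of bytes read from NAND must be at least 64 bytes more than that for the data CRC to have a chance of matching.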

  • Hi Renjith,

    I think the whole kernel image is loaded to RAM, because I have compared the size of the kernel with the number of bytes read from NAND.

    I will enable the ECC error traces but, taking this post into account, it seems that u-boot 1.2.0 has a wrong implementation of ECC. That was the reason for my original question about a strategy for migrating to a newer u-boot.

    Thanks,

    Paco

  • Paco,

    Then you can try one more thing. Erase one page of the NAND and write it with some known data, say a 0xaa55aa55 pattern. Then change one or two bytes of that data to 0 and, without erasing, do a nand write of the page once more. This will cause bit flips in the page and will corrupt the ECC as well. Now try to read back the same page using the "nand read" command and see whether it reports an ECC error. If it doesn't, then, as you said, the current ECC implementation is a disaster.
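
Why this test injects errors can be shown with a small host-side model. It assumes the usual NAND behaviour that programming without an erase can only clear bits (1 to 0), so the stored data becomes the bitwise AND of the old and new contents. The helper names and the alternating 0xAA/0x55 byte pattern are illustrative choices, and the model ignores the OOB/ECC bytes, which are affected in the same way on a real device.

```python
PAGE_SIZE = 2048

def nand_program(old, new):
    """Model a program operation without erase: bits can only go from 1 to 0."""
    return bytes(a & b for a, b in zip(old, new))

def bit_flips(a, b):
    """Count the bits that differ between two buffers."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Page as first written after the erase.
original = bytes([0xAA, 0x55]) * (PAGE_SIZE // 2)

# Same page with one byte forced to zero (index 10 holds 0xAA, i.e. four 1 bits).
modified = bytearray(original)
modified[10] = 0x00

stored = nand_program(original, bytes(modified))
print(bit_flips(original, stored))  # 4
```

Zeroing a single 0xAA or 0x55 byte already flips four bits, so zeroing two bytes inside the same 512-byte block gives eight flipped bits. It may therefore be worth starting with a single byte, or clearing only one bit, when checking whether errors are reported and corrected.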

  • Hello all,

    I would like to know if anyone knows how to reproduce this problem. We have the same problem, but we don't know how to reproduce it. We think that it is already fixed, but we need to test it.

    Best regards

  • Hi,

    I have enabled some traces in the file arm926ejs/davinci/nand.c and it seems that there are errors at the following addresses:

    Reading BBT:

    nand.c: VT: Fixing error at address 000001ba
    nand.c: VT: Fixing error at address 000000b4
    nand.c: VT: Fixing error at address 000000ac
    nand.c: VT: Fixing error at address 0000004d
    nand.c: VT: Fixing error at address 000000c0

    Reading initrd:
    nand.c: VT: Fixing error at address 00000148
    nand.c: VT: Fixing error at address 00000016

    Reading kernel:
    nand.c: VT: Fixing error at address 00000056
    nand.c: VT: Fixing error at address 000001ba

    I have enabled traces in the function nand_davinci_4bit_calculate_ecc.

    There seem to be a lot of errors.

    Looking at the code, I realise that it only fixes an error if its address is lower than 512, but my NAND has 2048-byte pages. Does that mean that it won't fix errors in the whole NAND page?

    Thanks,

    Paco

  • Paco,

    I haven't seen the code that you are referring to. But in general the driver reads the page in 512-byte subpages: it calculates and corrects the ECC for one 512-byte subpage and then proceeds with the next 512 bytes. If you are seeing many errors, it is possible that there is an ECC algorithm mismatch between read and write.
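
A small Python sketch of the subpage arithmetic described above. It assumes that the addresses printed in the traces are offsets within the 512-byte subpage being corrected; the traces do not say which subpage each error belongs to, so the loop simply shows every possibility. The function name is illustrative.

```python
PAGE_SIZE = 2048
SUBPAGE_SIZE = 512

# Error addresses copied from the "Reading kernel" traces above.
LOGGED_ADDRESSES = [0x056, 0x1BA]

def page_offset(subpage_index, error_address):
    """Map an error address within one 512-byte subpage to an offset in the full page."""
    if not 0 <= error_address < SUBPAGE_SIZE:
        raise ValueError("address is outside the subpage")
    return subpage_index * SUBPAGE_SIZE + error_address

for sub in range(PAGE_SIZE // SUBPAGE_SIZE):
    print(sub, [hex(page_offset(sub, addr)) for addr in LOGGED_ADDRESSES])
```

Under that assumption, a check for addresses lower than 512 does not by itself limit correction to the first quarter of the page, as long as the correction runs once per subpage.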

  • Renjith,

    I checked the code and it reads 512 bytes at a time, so it does correct the whole 2048-byte page.

    Thanks,

    Paco

  • Paco,

    I lost track of the thread as it has been a long time. So, is the issue resolved or still pending?

  • Renjith,

    I was able to fix the errors using a modified version of u-boot 1.3.5. I didn't modify kernel 2.6.10, but I feel that ECC is working properly at the kernel level.

    I have a question for you. U-boot is able to fix some errors while reading from NAND. Should we rewrite the faulty blocks in order to clear the errors? If I don't, a new error may appear in the same block and the number of errors that ECC can fix will be exceeded. Let me know what the correct way to proceed is.

    Thanks!

    Paco

  • Paco,

    Glad that you were able to fix the issue. What you are asking about is already taken care of by the flash file system; this is known as bad block management and wear leveling. Depending on the number and frequency of errors, the data will be moved to a different page and the original page will be marked as bad. This happens over a period of time and is handled to a good extent by the FFS (JFFS, UBI, YAFFS, etc.). This feature is not available in u-boot, but for the file system partitions it is taken care of.

  • Hi Paco,

    What U-boot changes did you have to make to resolve this issue?  I've run into the same issue with the u-boot in the v5.1 drop from TI.   Did you end up changing the # of bits for ECC?

    Regards,

    Edwin Bland

  • I'm also having the same issues but using a new u-boot.  Any other tips for how to fix this error? 


    I'm getting:

    Read error
    Verifying Checksum ... Bad Data CRC
    ERROR: can't get kernel image

    This seems to randomly occur on systems that have been booting fine for months, and on systems that have only been booted a handful of times, and everything in between.

    We're using a DM368, with UBL version 1.65 from the PSP 03.21.00.04 release, U-Boot version 2013.04, and kernel 2.6.32.17 with some patches to update it.

    We updated to the newest UBL and U-Boot a while back to try to eliminate these issues. The kernel version stayed as is because of in-place changes and the requirement to keep using DVSDK 4.02 for the existing code base.