
SLC Nand files are nullified if powered down after writing

Other Parts Discussed in Thread: AM3505

Hi

This is simple to reproduce on my board: I run a file update from a PC, during which a file (about 5 KB) is uploaded and rewritten to the flash. I cut the power 0.5-2 seconds after the operation. The system then boots up with the file at 0 bytes.

I suspect it may be the "UBIFS unstable bits" issue described here: http://www.linux-mtd.infradead.org/doc/ubifs.html#L_unstable_bits

I have an AM3505 with a Micron NAND MT29F8G08ADBDAH4, which has generally worked fine with BCH4 support.

Using SDK 04.02.00.07.

I reckon UBIFS should be very stable and protected from these kinds of problems.

I am wondering whether kernel 2.6.37 has been updated with related stability patches.

This is a really serious customer- and production-facing issue for my company.

Please advise,

Kind regards

Yakir

  • Since you are using UBIFS, I can assume that this happens under Linux? Please let me know how you are uploading the file.

    How is the 0.5-2 seconds delay related to the problem?

    Does this happen with any file?

    I'm not familiar with the AM35x SDK, but if you are using both U-Boot and Linux to manipulate the contents of the NAND flash, make sure that you are using the X-Loader, U-Boot and Linux from the same SDK. The ECC scheme should be the same.

    Best regards,
    Miroslav

  • #include <stdio.h>
    #include <unistd.h>

    // update:

    fopen(...);

    fwrite(...);

    fclose(...);

    sync(); // flush cached file data and file system metadata to flash

    // done...
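
The outline above can be fleshed out into a compilable helper. This is only a minimal sketch under stated assumptions: the function name `write_file_durably` and its parameters are hypothetical, and it uses `fsync()` on the single file instead of the system-wide `sync()` shown above. `fflush()` moves the stdio buffer into the kernel, and `fsync()` asks the kernel to push that file's cached data to the storage device.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path and flush them to storage.
 * Returns 0 on success, -1 on any error. */
int write_file_durably(const char *path, const void *data, size_t len)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;

    int ok = fwrite(data, 1, len, f) == len;  /* copy into the stdio buffer */
    ok = ok && fflush(f) == 0;                /* stdio buffer -> kernel cache */
    ok = ok && fsync(fileno(f)) == 0;         /* kernel cache -> storage device */

    if (fclose(f) != 0)                       /* always close, even after an error */
        ok = 0;
    return ok ? 0 : -1;
}
```

Note that opening with "wb" truncates the existing file first, so a power cut between the truncation and the `fsync()` can still leave an empty or partial file. The flush only shortens that window.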

  • Yes, I am under Linux (TI's 2.6.37). To put it simply, I am just copying the file from a RAM partition to the flash.

    The delay is just a figure of speech: I mean that the problem shows up when I power down shortly after writing a file. Therefore, I suspect the power-down is related to the write operation.

    Yes, it did happen with other files in similar scenarios.

    In the AM35x SDK, X-Loader uses 1-bit Hamming, while U-Boot uses BCH4, as Linux does. Again, UBIFS is Linux only. I haven't done any experiments in U-Boot; I am less concerned about such problems there.

    Given the experiments I've done with brownouts, I doubt that UBIFS writes to flash are atomic.

    Regards

    Yakir

  • Wolfgang,

    Thanks for the proposal. Unfortunately, I am using OS commands like "cp" or "mv" which go down to the file system to perform the necessary ops. Therefore, if the FS is not robust enough, it should be patched with an improved procedure I guess.

    Regards

  • If you are operating with OS commands, there is a command named "sync".

    The behaviour of the file system is intentional. It's a balance between caching, performance and safety.

    If you want your data to be on disk immediately after writing, you can

    a) use the "sync" command.

    b) mount the file system with the option "sync".

    Note that option b) incurs a performance penalty (and is not used in practice).

    regards

    Wolfgang
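
For a program that rewrites a file itself rather than calling "cp" or "mv", a commonly used POSIX pattern is to write the new contents to a temporary file, `fsync()` it, and then `rename()` it over the old file, so that after a power cut the name refers to either the complete old file or the complete new one. The following is a minimal sketch of that pattern, not something proposed in this thread; the function name `replace_file_atomically` and its parameters are hypothetical, and the temporary path is assumed to be on the same file system as the target.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: replace path with len bytes of data by writing
 * tmp_path first and renaming it over path. Returns 0 on success, -1 on error. */
int replace_file_atomically(const char *path, const char *tmp_path,
                            const void *data, size_t len)
{
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = data;
    size_t left = len;
    while (left > 0) {                 /* write() may write fewer bytes than asked */
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            close(fd);
            unlink(tmp_path);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    if (fsync(fd) != 0) {              /* new data must reach storage before rename */
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    if (close(fd) != 0 || rename(tmp_path, path) != 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

The old file is never truncated in this scheme, which is the step that can leave a 0-byte file behind. Whether the rename itself has reached the flash at the moment of a power cut still depends on the file system's write-back, so a `sync` afterwards may still be wanted.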

  • Wolfgang,


    From the documentation I saw, UBIFS should be robust against brownouts. Your comment may be valid, but the MTD doc says the following:

    ...

    The solution is to teach UBIFS to erase-cycle any LEB which could potentially be written to when the power cut happened. This is not only about the journal LEBs, but also LPT, log, master and orphan LEBs. This means that the valid data from this LEB has to be read (and only once!) and then it should be written back to this LEB using the atomic LEB change UBI operation. This has to be done even if the LEB looks all-right - no corruptions, all 0xFFs at the end.

    ....

    After that, they provide a git source for the UBIFS version that contains the fix. Since I am using a relatively old Linux kernel, I am looking for advice on which patch I should apply to the /fs section.

    Again, as far as I understand, it is not about "what I can do with it" but about "how the kernel should do it".

    Regards

    Yakir

    PS. I have addressed the mtd doc in the initial post.

  • Yakir,

    I doubt that what you are seeing is the MTD issue with incomplete writes.

    If a write is interrupted by a power-down, only 512 or 2048 bytes of data will be invalid, nothing more. And the write time per sector is very short, so it is unlikely to trigger this.

    Have you tried adding a sync command after your cp/mv? Does the behaviour change?

    regards

    Wolfgang

  • Thanks a lot Wolfgang


    I've tried your suggestions (mount -o sync, and running sync directly for partitions that I don't want to mount that way), and so far I have failed to reproduce the bug. It is looking good. Much appreciated.

    Regards

    Yakir