Linux/AM3354: NAND loses data after reboot

Part Number: AM3354

Tool/software: Linux

Hi

   We've been using the AM3354 for a while on many different products. The Processor SDK we use is 03.02.00.05, and the Linux kernel version is 4.4.

   This happens quite rarely, but once in a while we run into a board with this problem (it has happened 3 times so far):

   1. Use the board for file operations or something similar. Nothing special, just copying or reading files.

   2. Type the "reboot" command in the Linux terminal console to issue a reboot. This is also a very common operation here.

   3. The AM3354 then fails to boot.

      After dumping the entire NAND flash onto an MMC card, I found that the entire NAND flash reads as 0xFF, as if it had been chip-erased.

      But there is no way we could have erased it by accident, because all I did was reboot.

      Besides, even if I really wanted to erase the entire SLC, it would take about a minute, because NAND erase is quite slow.
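For anyone wanting to confirm the "all 0xFF" observation on a dump file, here is a minimal sketch in Python. The function name `scan_dump` and the 128 KiB block size are assumptions for illustration (the real erase-block size should come from the datasheet); it only counts blocks that read as fully erased and reports the first byte that is not 0xFF.

```python
def scan_dump(path, block_size=128 * 1024):
    """Scan a raw NAND dump block by block.

    Returns (total_blocks, erased_blocks, first_non_ff_offset), where the
    offset is None if every byte in the file is 0xFF.
    block_size is an assumed erase-block size, not taken from the thread.
    """
    total = 0
    erased = 0
    first_non_ff = None
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += 1
            # lstrip removes only leading 0xFF bytes; nothing left means all 0xFF
            remainder = chunk.lstrip(b"\xff")
            if not remainder:
                erased += 1
            elif first_non_ff is None:
                first_non_ff = offset + (len(chunk) - len(remainder))
            offset += len(chunk)
    return total, erased, first_non_ff
```

If `erased == total`, the dump is consistent with a fully erased chip, although a read path that returns 0xFF for other reasons would look the same in the file.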

    The NAND flash model is Spansion S34ML08G101TF100.

    Also, this problem shouldn't be related to a specific Processor SDK, because we have a product running on an earlier SDK with the 3.2.0 kernel, and we found this problem there nevertheless.

   Any ideas as to why this kind of thing could happen?

Thank you.

yan dong

  

  • We have never seen this happen. Can you capture the boot messages and the reboot messages on a failing board? I know this might take a while, but I might see something in there. I am checking with our hardware folks to see if they can think of a reason.

    Steve K.
  • Thank you, Steve.

        We were logged in using telnet, so when my colleague issued the reboot command, the telnet session shut down immediately. We were not able to see any reboot messages at that time.

        As for the boot messages, there's just "CCCCCC" being printed on the serial console after reboot, because the NAND data is lost and the AM3354 can no longer start in NAND boot mode.

       Please keep me informed once you have any updates. Thank you!

    yandong
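A repeating "C" on the console is consistent with the AM335x boot ROM having fallen through to UART boot, where it waits for an XMODEM transfer ("C" is the character an XMODEM-CRC receiver sends to request data). As a rough sketch, assuming the serial capture is available as raw bytes, a log could be classified like this; the function name and the `min_run` threshold are hypothetical.

```python
def looks_like_uart_boot_wait(console_bytes, min_run=3):
    """Return True if the capture is nothing but a run of ASCII 'C'.

    Whitespace is ignored. min_run is an arbitrary threshold chosen for
    this sketch so that a single stray 'C' is not counted.
    """
    stripped = bytes(b for b in console_bytes if b not in b" \r\n\t")
    return len(stripped) >= min_run and set(stripped) == {ord("C")}
```

A capture that contains any bootloader banner or kernel text would return False, which at least shows that some stage was still loaded from NAND.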

       

  • Hi Steve,

        Any updates?

  • I cannot re-create this on a TI EVM. Are you able to get an EVM and try it out?

    Also for your custom board can you verify that you have the NAND timings correct in the GPMC registers?

    Steve K.
  • Hi Steve

    I don't have an EVM at hand. I suppose the EVM would be OK and that something is wrong with our own custom board.

    It's quite hard to re-create the problem as well. So far I haven't found a way to re-create it at will.

    The recorded steps that led to the problem were:

    1. Copy files using FTP over an Ethernet connection to the board.

    2. Reboot the board using the telnet connection.

    So far I have tried rebooting the board repeatedly as soon as it starts up, but after hundreds of reboots I wasn't able to re-create the problem by rebooting alone.

    So I would at least suspect that the reboot itself is not the cause.

    The problem could be related to file writes over FTP, or maybe some ESD issue. I don't know.

    I will check the GPMC timing.

  • Our NAND timing settings are the default parameters from the kernel.

    On our 3.2 kernel, they are:

    static struct gpmc_timings am335x_nand_timings = {
        /* All values are in ns, except sync_clk, which is in ps */
        .sync_clk = 0,

        .cs_on = 0,
        .cs_rd_off = 44,
        .cs_wr_off = 44,

        .adv_on = 6,
        .adv_rd_off = 34,
        .adv_wr_off = 44,
        .we_off = 40,
        .oe_off = 54,

        .access = 64,
        .rd_cycle = 82,
        .wr_cycle = 82,

        .wr_access = 40,
        .wr_data_mux_bus = 0,
    };

     

    On the 4.4 kernel, they are (extracted from am335x-evm.dts):

    nand-bus-width = <8>;
    gpmc,device-width = <1>;
    gpmc,sync-clk-ps = <0>;
    gpmc,cs-on-ns = <0>;
    gpmc,cs-rd-off-ns = <44>;
    gpmc,cs-wr-off-ns = <44>;
    gpmc,adv-on-ns = <6>;
    gpmc,adv-rd-off-ns = <34>;
    gpmc,adv-wr-off-ns = <44>;
    gpmc,we-on-ns = <0>;
    gpmc,we-off-ns = <40>;
    gpmc,oe-on-ns = <0>;
    gpmc,oe-off-ns = <54>;
    gpmc,access-ns = <64>;
    gpmc,rd-cycle-ns = <82>;
    gpmc,wr-cycle-ns = <82>;
    gpmc,wait-on-read = "true";
    gpmc,wait-on-write = "true";
    gpmc,bus-turnaround-ns = <0>;
    gpmc,cycle2cycle-delay-ns = <0>;
    gpmc,clk-activation-ns = <0>;
    gpmc,wait-monitoring-ns = <0>;
    gpmc,wr-access-ns = <40>;
    gpmc,wr-data-mux-bus-ns = <0>;

     


    As you can see, the 3.2 and 4.4 kernels are actually using the very same NAND timing (the default timing).
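The comparison above can be checked mechanically. The sketch below copies the values from the two listings in this post into Python dictionaries and maps each 3.2 struct field to its 4.4 device-tree property name; the helper names `dt_name` and `differences` are made up for this illustration.

```python
# Values copied from the 3.2 board-file struct quoted above.
KERNEL_3_2 = {
    "sync_clk": 0, "cs_on": 0, "cs_rd_off": 44, "cs_wr_off": 44,
    "adv_on": 6, "adv_rd_off": 34, "adv_wr_off": 44,
    "we_off": 40, "oe_off": 54,
    "access": 64, "rd_cycle": 82, "wr_cycle": 82,
    "wr_access": 40, "wr_data_mux_bus": 0,
}

# Numeric timing properties copied from the 4.4 DTS fragment quoted above.
KERNEL_4_4 = {
    "gpmc,sync-clk-ps": 0, "gpmc,cs-on-ns": 0,
    "gpmc,cs-rd-off-ns": 44, "gpmc,cs-wr-off-ns": 44,
    "gpmc,adv-on-ns": 6, "gpmc,adv-rd-off-ns": 34, "gpmc,adv-wr-off-ns": 44,
    "gpmc,we-on-ns": 0, "gpmc,we-off-ns": 40,
    "gpmc,oe-on-ns": 0, "gpmc,oe-off-ns": 54,
    "gpmc,access-ns": 64, "gpmc,rd-cycle-ns": 82, "gpmc,wr-cycle-ns": 82,
    "gpmc,bus-turnaround-ns": 0, "gpmc,cycle2cycle-delay-ns": 0,
    "gpmc,clk-activation-ns": 0, "gpmc,wait-monitoring-ns": 0,
    "gpmc,wr-access-ns": 40, "gpmc,wr-data-mux-bus-ns": 0,
}


def dt_name(field):
    """Map a struct field such as 'cs_rd_off' to 'gpmc,cs-rd-off-ns'."""
    unit = "ps" if field == "sync_clk" else "ns"
    return "gpmc," + field.replace("_", "-") + "-" + unit


def differences(old, new):
    """List (field, old_value, new_value) for every field that differs."""
    return [
        (field, value, new.get(dt_name(field)))
        for field, value in old.items()
        if new.get(dt_name(field)) != value
    ]


print(differences(KERNEL_3_2, KERNEL_4_4))
```

An empty list means every field present in the 3.2 struct has the same value in the 4.4 device tree; the extra 4.4 properties have no counterpart in the struct as quoted.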

    I'm not sure: do these timing parameters usually need to be changed for each different NAND flash chip?

    I do have the datasheet for our Spansion flash:

    3034.S34ML08G1.pdf

    But I didn't find any timing details in it,

    and I don't know how to change the timing according to the datasheet either.

  • You usually have to change timings when you change NAND flash. I suggest you get more details from your NAND vendor.

    Steve K.
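When comparing the device-tree values with a NAND datasheet, it helps to know what the GPMC actually ends up using, since the nanosecond values are converted to whole GPMC clock ticks. The sketch below assumes a 100 MHz GPMC functional clock (10 ns per tick) and rounds up, which is my understanding of how the kernel's GPMC driver converts them; both points should be verified for a given board.

```python
def ns_to_ticks(ns, fclk_hz=100_000_000):
    """Convert a timing in ns to GPMC clock ticks, rounding up.

    fclk_hz is an assumed GPMC functional clock; 100 MHz gives 10 ns ticks.
    """
    tick_ps = 10**12 // fclk_hz
    return (ns * 1000 + tick_ps - 1) // tick_ps


def effective_ns(ns, fclk_hz=100_000_000):
    """Return the timing that results after rounding to whole ticks."""
    tick_ps = 10**12 // fclk_hz
    return ns_to_ticks(ns, fclk_hz) * tick_ps / 1000


# A few of the default values quoted earlier in the thread.
for name, ns in [("cs-rd-off", 44), ("we-off", 40), ("oe-off", 54),
                 ("access", 64), ("rd-cycle", 82)]:
    print(name, ns, "ns ->", ns_to_ticks(ns), "ticks =", effective_ns(ns), "ns")
```

Under these assumptions the 82 ns read cycle becomes 9 ticks, that is, 90 ns. The minimums to compare against (read/write cycle time, access time, pulse widths) have to come from the NAND vendor's AC timing table.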
  • Thanks for the tip, Steve.

    I doubt that timing is what caused this.

    If the timing were wrong, I think there would be a constant problem when using the flash:

    whenever I tried to write or read it, there would be some chance of failure.

    But right now it works completely fine in daily use,

    except for the total data loss.

    What really puzzles me is that, even if the timing is wrong,

    it seems hardly possible for a wrong timing to accidentally perform the NAND command sequence needed to erase the entire chip.

    This problem is really beyond my knowledge now...

    yandong
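To put a number on the reasoning above: on ONFI-style NAND, erase is done per block, with a setup command (0x60), the row address cycles, and a confirm command (0xD0). The sketch below lists those bus cycles and counts how many would be needed for the whole device. The geometry is an assumption for illustration (8192 blocks of 64 pages, 3 row address cycles, which is what 1 GiB of 128 KiB blocks would give) and should be checked against the datasheet.

```python
ERASE_SETUP = 0x60    # block erase, first command cycle
ERASE_CONFIRM = 0xD0  # block erase, second command cycle


def block_erase_cycles(block, pages_per_block=64, row_cycles=3):
    """Return the command/address bus cycles that erase one block."""
    row = block * pages_per_block  # row address of the block's first page
    addr = [(row >> (8 * i)) & 0xFF for i in range(row_cycles)]
    return ([("cmd", ERASE_SETUP)]
            + [("addr", a) for a in addr]
            + [("cmd", ERASE_CONFIRM)])


def full_chip_erase_cycle_count(num_blocks=8192, pages_per_block=64,
                                row_cycles=3):
    """Count the bus cycles needed to erase every block once."""
    return sum(len(block_erase_cycles(b, pages_per_block, row_cycles))
               for b in range(num_blocks))


print(block_erase_cycles(1))
print(full_chip_erase_cycle_count())
```

With the assumed geometry, wiping the device takes 8192 correctly formed five-cycle sequences, each with a different block address, which supports the doubt that marginal timing alone could produce that by accident.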