MSP430F5437: MSP430F5437 / CC2520 - OSAL - NV issue

Part Number: MSP430F5437
Other Parts Discussed in Thread: Z-STACK, CC2538

Are there any known issues with the reliability of OSAL NV item writes? I am seeing data corruption that is not just random bit flips: it looks like data ended up in the wrong physical location in flash. From osal_nv_write (osal_nv.c), it appears that when an item's data changes, the item is moved to a new physical location and written there with the new value. It looks like a simple wear-leveling algorithm. I suspect an issue there, possibly when power is cut in the middle of that function.

The problem is not easy to reproduce: we have seen it only once in the lab. There are a number of failures in the field, but they occur infrequently. We are still working on a procedure that triggers the problem repeatably in the lab.

Our copy of OSAL_Nv.c is dated 2010-08-19, revision 23457, according to the file itself. Where can I find the source tree for this so I can see what has changed since the version we have?

  • An interesting pattern of corruption was found. In the data read back from a larger NV item, the OSAL NV item headers of smaller items appear within the payload area. I was able to decode them, and even the data checksums match.



    Raw Data:
    1A0F1000490AFFFF
    D164F4EBBACB82B7
    DA04ADD2DEA60096

    id (LE)               len (LE)    CKSUM (LE)      STATUS (LE)
    1A0F                  1000        490A            FFFF
    element (TEK1)        16 bytes    0x0a49 cksum    erased

    Data:
    D164F4EBBACB82B7
    DA04ADD2DEA60096


    Raw Data:
    880F10000000FFFF
    0000000000000000
    0000000000000000

    id (LE)               len (LE)    CKSUM (LE)      STATUS (LE)
    880F                  1000        0000            FFFF
    element (TEK111)      16 bytes    0x0000 cksum    erased

    Data:
    0000000000000000
    0000000000000000


    Raw Data:
    890F10000000FFFF
    0000000000000000
    0000000000000000

    id (LE)               len (LE)    CKSUM (LE)      STATUS (LE)
    890F                  1000        0000            FFFF
    element (TEK112)      16 bytes    0x0000 cksum    erased

    Data:
    0000000000000000
    0000000000000000


    Raw Data:
    000F10000000FFFF
    0000000000000000
    0000000000000000

    id (LE)               len (LE)    CKSUM (LE)      STATUS (LE)
    000F                  1000        0000            FFFF
    ??                    16          0x0000 cksum    erased

    Payload Data:
    0000000000000000
    0000000000000000


    Raw Data:
    010F10000000FFFF
    0000000000000000
    0000000000000000

    id (LE)               len (LE)    CKSUM (LE)      STATUS (LE)
    010F                  1000        0000            FFFF
    transceiver params    16 bytes    0x0000          erased

    Payload Data:
    0000000000000000
    0000000000000000


    Raw Data:
    100F10005D09FFFF
    65F8DF4CE552DE3C

    id (LE)               len (LE)    CKSUM (LE)      STATUS (LE)
    100F                  1000        5D09            FFFF
    GEK1                  16 bytes    0x095D          erased

    Payload Data (incomplete due to it surpassing the length of the NVM block):
    65F8DF4CE552DE3C
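
For anyone decoding similar dumps, the header layout used above (id, len, checksum, status, each 16-bit little-endian) can be sketched in C roughly as follows. The type and function names here (nv_hdr_t, nv_decode_hdr, nv_byte_sum) are hypothetical and are not taken from osal_nv.c. The checksum is assumed to be a plain 16-bit sum of the payload bytes; that assumption does reproduce the 0x0A49 of the TEK1 record, but it has not been checked against the OSAL source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the 8-byte item header seen in the dump. */
typedef struct {
    uint16_t id;    /* NV item id */
    uint16_t len;   /* payload length in bytes */
    uint16_t chk;   /* payload checksum */
    uint16_t stat;  /* item status, 0xFFFF = still in erased state */
} nv_hdr_t;

/* Read a 16-bit little-endian value from a byte buffer. */
static uint16_t rd_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

/* Decode the four little-endian header fields from 8 raw bytes. */
static nv_hdr_t nv_decode_hdr(const uint8_t raw[8])
{
    nv_hdr_t h;
    assert(raw != NULL);
    h.id   = rd_le16(raw + 0);
    h.len  = rd_le16(raw + 2);
    h.chk  = rd_le16(raw + 4);
    h.stat = rd_le16(raw + 6);
    return h;
}

/* Assumed checksum: 16-bit wrapping sum of the payload bytes. */
static uint16_t nv_byte_sum(const uint8_t *buf, size_t len)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint16_t)(sum + buf[i]);
    }
    return sum;
}
```

Feeding the first record through this sketch gives id 0x0F1A, len 16, chk 0x0A49, stat 0xFFFF, and the byte sum of its 16 payload bytes is also 0x0A49.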

  • Hi Adam,

    What Z-Stack release are you using? This hardware platform was deprecated, so there have been no relevant changes to osal_nv.c since the revision you are using.

    As you mentioned, there could definitely be issues if you lose power during an NV operation. To debug this further, though, we will need more information about where the point of failure is. Can you explain a little more about what you mean by data corruption? Is it only happening as shown in your second post, or is there more?
  • Thank you for the response.

    The data above is the corruption we see in the larger 64-byte items. We also have smaller 16-byte items in NV, and it looks like the 64-byte item was placed on top of a few of the smaller 16-byte items. We can even see the NV item headers of the smaller items in the data area of the larger item. In the smaller items, we have observed failures where the data ends up as all zeros. I think the smaller items just don't span a large enough region of flash to contain other items. So I *think* these are all basically the same type of data corruption.

    We are currently struggling to reproduce this in the lab and pinpoint the exact timing of the failure. The problem with the units from the field is that we can't connect JTAG to them to dump the NV flash area completely. As a starting point, we only have some of the item headers that ended up within the data area of the 64-byte NV items.
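
One way to hunt for embedded headers in a payload read back from a field unit is a brute-force scan, sketched below in C. This is only an illustration under stated assumptions: the function names are hypothetical, the 8-byte header layout is the one decoded earlier in this thread, and a candidate is accepted only if its status is 0xFFFF, its length is non-zero, its payload fits in the buffer, and a plain 16-bit byte sum of the payload matches the stored checksum (the byte-sum rule is an assumption, not confirmed from osal_nv.c).

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit little-endian value at p. */
static uint16_t le16_at(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

/*
 * Scan buf for plausible embedded NV item headers.
 * Writes up to max_out candidate offsets into out[] and returns the
 * total number of candidates found (which may exceed max_out).
 */
static size_t nv_scan_for_headers(const uint8_t *buf, size_t len,
                                  size_t *out, size_t max_out)
{
    size_t found = 0;

    for (size_t off = 0; off + 8 <= len; off++) {
        uint16_t item_len = le16_at(buf + off + 2);
        uint16_t chk      = le16_at(buf + off + 4);
        uint16_t stat     = le16_at(buf + off + 6);

        if (item_len == 0 || stat != 0xFFFF) {
            continue;               /* not a plausible header */
        }
        if (off + 8 + (size_t)item_len > len) {
            continue;               /* payload truncated, cannot verify */
        }

        uint16_t sum = 0;           /* assumed checksum: byte sum */
        for (size_t i = 0; i < item_len; i++) {
            sum = (uint16_t)(sum + buf[off + 8 + i]);
        }
        if (sum != chk) {
            continue;
        }

        if (found < max_out) {
            out[found] = off;
        }
        found++;
    }
    return found;
}
```

Because all-zero payloads sum to 0x0000, zeroed items pass the checksum test trivially. Hits on such items are weaker evidence than a hit on a record with non-trivial data such as TEK1.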

    We are still trying to reproduce this further. Is there a particular weak point in the OSAL NV code where a power cut could cause similar corruption? One scenario I was considering: a page erase is interrupted after the page header has been updated, so the payload area is not erased but the header indicates the page is free to use. Any other suggestions would be helpful.

  • Hi Adam,
    Did you check your Vcc?
    With my CC2538, Vcc can be from 1.9V to 3.3V for normal operation, BUT for flash writes, it must be greater than 2.4V (Vcc-min + 35%). The Z-Stack firmware does not allow flash writes at Vcc lower than this.
  • We've been looking into power cuts and low voltage, without any luck reproducing the issue in the lab. There is a check in the OSAL write function that should leave the old data in place, rather than produce the pattern of corruption noted above.

    Are there any known issues or weak points?

    I noticed that a later version of Z-Stack has a more recent version of the OSAL_NV.c file. That file has a number of whitespace changes, some renamed variables, and a couple of other things.

    The only thing I noticed that *might* be interesting is that a new member was added to the osalNvHdr_t struct. In examining the change myself, I didn't see a meaningful change in operation, but I might be missing something. Was that member added to resolve a specific issue?