
TMS570LS1114: FEE Data lost using Ti Fee 01.19.03

Part Number: TMS570LS1114

We ran into an issue where data was lost using Ti Fee 01.17.02.

We then checked whether the latest version, Ti Fee 01.19.03, fixes the issue.

After a review of the code we noticed the following problems in the implementation.

  1. When the size of a block is changed, for instance in the course of a software update, the block size is not detected correctly.

    As an example, we had a block that was 40 bytes in an old software version. Its size was increased to 48 bytes for a newer software release. During TI_FeeInternal_UpdateBlockOffsetArray the size from the configuration (48 bytes) is used, while the block in flash is actually smaller (40 bytes). This causes the offset of the next block to be miscalculated, which in turn causes that block's size to be misinterpreted. As a result, the location where the following block is searched for is effectively "random", and blocks may be skipped. The data of these blocks is still in flash but is then no longer interpreted.
  2. The second problem we saw is related to a power cut while a block is being written.

    When the block is in the Start Program Block state (0xFFFFFFFFFFFF0000), the offset of the next block seems to be calculated incorrectly. I have attached a FEE dump.

    At offset 0x8B0 you can find the block (block ID 0x607) that is in the programming state. The block size (0x38) had already been written; however, there is no new block at the correct offset of the next block (0x900). Instead, the next block seems to start at 0x8D0, which is 32 bytes after the start of the block in the programming state. This does not seem to be correct behaviour: it cannot be known whether data had already been written at this address for the block starting at 0x8B0, so that region should probably have been skipped.

    /cfs-file/__key/communityserver-discussions-components-files/312/2555.iccpd4_5F00_1115.zip
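The offset arithmetic behind both observations can be sketched in C. This is a simplified model, not the TI FEE code: the 24-byte header size is only inferred from the dump figures (0x900 - 0x8B0 - 0x38 = 0x18), and the header layout used here (data size in bytes 10..11, little endian) as well as the helper names are hypothetical.

```c
#include <stdint.h>

/* Simplified model, NOT the real TI FEE layout (assumption):
 * block = 24-byte header + data; the header stores the data size
 * in bytes 10..11 (little endian). The 24 is inferred from the dump:
 * 0x900 - 0x8B0 - 0x38 = 0x18. */
#define HDR_SIZE 24u

/* Hypothetical helper: data size as recorded in the block header in flash. */
static uint16_t header_data_size(const uint8_t *sector, uint32_t block_off)
{
    return (uint16_t)(sector[block_off + 10u] | (sector[block_off + 11u] << 8));
}

/* Next header offset when trusting the size stored in flash. */
static uint32_t next_offset_from_header(const uint8_t *sector, uint32_t block_off)
{
    return block_off + HDR_SIZE + header_data_size(sector, block_off);
}

/* Next header offset when trusting the current configuration instead. */
static uint32_t next_offset_from_config(uint32_t block_off, uint16_t configured_size)
{
    return block_off + HDR_SIZE + configured_size;
}
```

With a 40-byte block written by the old software at offset 0, the header-based walk finds the next header at 64, while the configuration-based walk (48 bytes) lands at 72, 8 bytes into the next block's header. The same arithmetic gives 0x8B0 + 0x18 + 0x38 = 0x900 for the block in problem 2.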
  • Hello Matthias,

    I have forwarded your post to our software team so they can address the points you have made regarding the FEE operation.
  • Regarding problem 1 (block size changed in the course of a software update): if I understand correctly, the block size was changed in the FEE config file? I am not sure whether the FEE driver supports changing the block size; I need to double-check.

    Regarding problem 2 (power cut while writing a block): we need the ti_fee_cfg.c / ti_fee_cfg.h file information to decode this binary. Could you please send it to us? If you cannot post it here, please send it to me as a private message.

  • Hello Prathap,

    I have sent you a PM with the configuration attached.

    Best regards, Matthias
  • We have identified a possible problem with TI_FeeInternal_PollFlashStatus. In the current implementation, the function is called in many more places than in the old release. The function contains a loop with an upper limit of 0xFFFF0000U iterations.

    In the worst case, the function might therefore not return in a timely manner.

    Can you give us more information on why this polling is necessary? Shouldn't this be handled by the Fee state machine, so that no polling is needed? Were there issues with the flash programming process that made the new calls to TI_FeeInternal_PollFlashStatus necessary?
  • Hi Matthias,

    During any active Flash Module operation (active commands such as erase, program, verify, etc.) on a particular bank (here Bank 7), any read or write access to that bank will stall your system. By "stall" we mean that the CPU is halted and interrupts are frozen.
    We must avoid getting into this situation, because it is more critical than a regular polling delay.

    TI_FeeInternal_PollFlashStatus is called precisely to avoid this situation. In the older version of the driver we missed a few places, which left a small window in which this scenario could occur. The latest version, 1.19.03, closes this gap.

    I do not see this as a problem, because the upper limit is never reached unless there is a real problem with the flash hardware. Issues of this type are usually monitored through a windowed watchdog or RTOS task priorities.
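The pattern described above (wait until the bank is idle before touching it, with an upper bound on the wait) can be sketched as follows. The busy check is a hypothetical callback standing in for the driver's register read; only the bounded-loop idea comes from this thread, and the limit is a parameter here rather than the driver's 0xFFFF0000U.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hook standing in for reading the Flash Module busy status. */
typedef bool (*flash_busy_fn)(void *ctx);

typedef enum { POLL_DONE, POLL_TIMEOUT } poll_result_t;

/* Busy-wait until the bank is idle or max_polls status reads have been made.
 * The caller accesses the bank only after POLL_DONE, avoiding the stall. */
static poll_result_t poll_flash_idle(flash_busy_fn is_busy, void *ctx, uint32_t max_polls)
{
    for (uint32_t i = 0u; i < max_polls; i++) {
        if (!is_busy(ctx)) {
            return POLL_DONE;
        }
    }
    return POLL_TIMEOUT;
}

/* Test double: reports "busy" for the number of polls held in *ctx. */
static bool countdown_busy(void *ctx)
{
    uint32_t *remaining = (uint32_t *)ctx;
    if (*remaining == 0u) {
        return false;
    }
    (*remaining)--;
    return true;
}
```

The trade-off discussed in this thread is visible in the signature: a large max_polls makes a timeout practically unreachable on healthy hardware, but a single call can then block for a long time if the hardware misbehaves.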
  • Coming back to the original issue:

    Problem 1: FEE block data length change

    Changing the length of a block once it has been written to the flash bank is not recommended, and the FEE driver does not support it by design. (There is a workaround, but it is quite cumbersome; I will come to that later, if you really need it.)

    As you already pointed out, during FEE init, while parsing for valid blocks, the block offset address is calculated from the size in the config file (48 bytes, new). If the data in the FEE sector was written with the old configuration (40 bytes), the next block header will naturally be missed, and that block will eventually be skipped.

    Workaround 1:

    How to change the block size (TI strongly recommends against it: fix all sizes up front and do not change them once written):

    1) With the old software (FEE configuration), make sure you read the particular block (say Block 10) whose data length has to change.

    2) With the old software, invalidate Block 10.

    3) Initiate a virtual sector copy, so that only valid blocks move to the other virtual sector, and that virtual sector becomes active.

        How do you initiate a virtual sector copy? Is there a separate API?

        a) There is no separate API, but keep writing blocks other than Block 10 until the end of the current active virtual sector is reached; the next write will then initiate a copy operation.

    4) The new active virtual sector will contain all blocks except Block 10.

    5) Update your software with the new FEE config (Block 10 data length changed).

    6) Perform FEE init, followed by a write to Block 10.

    7) From here on, everything will work smoothly.

    Again, TI recommends freezing the block configuration up front and not changing it during the lifetime of the product. Adding new blocks is easier, but changing the block configuration is difficult, as the steps above show.
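Step 3 relies on the virtual sector copy carrying over only valid blocks. A toy in-memory model of that filtering follows; the types and names are hypothetical, the real header format is not modelled, and each block is assumed to appear only once in the sector.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of a virtual sector entry (hypothetical, not the real header). */
typedef struct {
    uint16_t id;    /* block number */
    uint16_t size;  /* data length stored with the block */
    uint8_t  valid; /* 0 once the block has been invalidated (step 2) */
} blk_t;

/* Step 3: copy only valid blocks into the new virtual sector, so the
 * invalidated Block 10 with its old length is left behind.
 * Returns the number of blocks copied into dst. */
static size_t copy_valid_blocks(const blk_t *src, size_t n, blk_t *dst)
{
    size_t out = 0u;
    for (size_t i = 0u; i < n; i++) {
        if (src[i].valid) {
            dst[out++] = src[i];
        }
    }
    return out;
}
```

Once Block 10 is absent from the active sector, no stale header with the old length remains for init to misparse, which is why steps 5 and 6 can then write it again with its new length.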

    Workaround 2:

    1) Back up all block data and store it in flash.

    2) Erase/format the FEE.

    3) Install the new software, initialize the FEE, and restore the block data.
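If the size of a block differs between the backup and the new configuration, step 3 has to map the old data onto the new block length. A minimal sketch, assuming (hypothetically) that the new layout only appends fields at the end of the old one and that a fill byte is an acceptable default for those fields:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical restore helper: copy the backed-up bytes that still fit and
 * fill any newly appended tail with a default value. */
static void restore_resized(const uint8_t *backup, size_t old_len,
                            uint8_t *out, size_t new_len, uint8_t fill)
{
    size_t n = (old_len < new_len) ? old_len : new_len; /* bytes carried over */
    memcpy(out, backup, n);
    if (new_len > n) {
        memset(out + n, fill, new_len - n); /* defaults for appended fields */
    }
}
```

For the 40-to-48-byte example, the first 40 bytes come from the backup and the last 8 take the fill value.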

     

    Problem 2: Power cut while writing a block

    As I mentioned already, the block size is interpreted from the configuration file.

    The question is whether the FEE config file used when the block was written is the same as the one used for the current FEE init. If a different config file was used, this type of issue can occur. If the same config file was used and the issue still occurs as described above, we must dig deeper.

    Can you confirm this, please?

  • Problem 1 - block sizes)

    Why use the configuration size when the flash contains different information? We prepared a fix that always uses the size from flash. This was previously already the case for unconfigured blocks and should not create an issue. The only open question is what to do when we read a block whose stored size differs from the currently configured size. Currently we get no information on whether all of the data (48 bytes) or only part of it (40 bytes) was read after the update, so we cannot react accordingly in the upper software layers. Most of the defined blocks have a static size; the problem is confined to a small number of blocks carrying data structures that might change size between software versions.
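The open question about reporting a size mismatch could be handled by classifying each read against the configured size and passing the result up to the caller. This is a hypothetical sketch of the proposed fix, not part of TI FEE; size_fit_t and classify_read are illustrative names.

```c
#include <stdint.h>

typedef enum {
    FIT_EXACT,   /* stored size equals configured size */
    FIT_SHORTER, /* e.g. 40 bytes stored, 48 configured: tail not in flash */
    FIT_LONGER   /* stored block longer than configured: extra bytes dropped */
} size_fit_t;

/* Decide how many bytes to hand to the caller and report the mismatch,
 * so that upper layers can react (e.g. by filling defaults). */
static size_fit_t classify_read(uint16_t stored_size, uint16_t configured_size,
                                uint16_t *bytes_to_copy)
{
    if (stored_size < configured_size) {
        *bytes_to_copy = stored_size;  /* caller must supply the missing tail */
        return FIT_SHORTER;
    }
    *bytes_to_copy = configured_size;  /* never overrun the caller's buffer */
    return (stored_size == configured_size) ? FIT_EXACT : FIT_LONGER;
}
```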

    Problem 2 - reset during flash operation)

    The problem here is that the flash driver does not finish a block write; in some cases not even an 8-byte write inside the header completes. Currently there is no information on whether the header is valid: there is only one state, "Start Program Block", which is left once both header and data have been written. This means that we cannot use the size from the header in this situation; in fact, we cannot even use the block ID, since it might be wrong. In my opinion there needs to be another state, something like "Header Written", to make sure that the data in the header is valid and usable. Our workaround would be to copy all data found up to this point to a new sector and erase the corrupted one.
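A sketch of how init could treat a header under the proposed extra state. The empty (all 0xFF), Start Program Block (0xFFFFFFFFFFFF0000) and corrupt (all zeros) values are taken from this thread. The "Header Written" value 0xFFFFFFFFFF000000 and its mask are hypothetical, chosen so that the state is reached from Start Program Block by clearing further bits (programming can only turn 1s into 0s); the sketch also assumes any later state keeps those bits cleared.

```c
#include <stdint.h>

#define ST_EMPTY            0xFFFFFFFFFFFFFFFFULL /* from the thread */
#define ST_START_PROGRAM    0xFFFFFFFFFFFF0000ULL /* from the thread */
#define ST_HEADER_WRITTEN   0xFFFFFFFFFF000000ULL /* hypothetical new state */
#define ST_CORRUPT          0x0000000000000000ULL /* all-zeros header */
#define HEADER_WRITTEN_MASK 0x0000000000FF0000ULL /* hypothetical marker bits */

typedef enum { HDR_EMPTY, HDR_UNTRUSTED, HDR_USABLE, HDR_CORRUPT } hdr_trust_t;

/* Only trust block ID and size once the "header written" bits are cleared.
 * A header still in Start Program Block (or half-programmed) is untrusted,
 * so init would copy the valid data found so far and erase the sector. */
static hdr_trust_t classify_header(uint64_t status)
{
    if (status == ST_EMPTY) {
        return HDR_EMPTY;
    }
    if (status == ST_CORRUPT) {
        return HDR_CORRUPT;
    }
    if ((status & HEADER_WRITTEN_MASK) == 0u) {
        return HDR_USABLE;
    }
    return HDR_UNTRUSTED;
}
```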

    Problem 3 - TI_FeeInternal_PollFlashStatus)

    Fortunately we are using a watchdog, but this behaviour is still less than desirable. Wouldn't it be possible to poll the flash status only a small number of times when necessary and then leave TI_Fee_MainFunction? The TiFee state machine should continue the current job on the next call to TI_Fee_MainFunction. Our workaround for this issue is to move TiFee into its own task.
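The alternative proposed here (a small polling budget per call, with the job resumed on the next call) might look like the following. It sketches the idea only and is not TI FEE code: fee_job_t, fee_main_step, POLLS_PER_CALL and the busy callback are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { JOB_IDLE, JOB_WAIT_FLASH, JOB_DONE } job_state_t;

typedef struct {
    job_state_t state;
    bool (*is_busy)(void *ctx); /* hypothetical flash-busy hook */
    void *ctx;
} fee_job_t;

#define POLLS_PER_CALL 4u /* hypothetical per-call polling budget */

/* One main-function call: at most POLLS_PER_CALL status reads. If the flash
 * is still busy, return without touching the bank; resume on the next call. */
static void fee_main_step(fee_job_t *job)
{
    if (job->state != JOB_WAIT_FLASH) {
        return;
    }
    for (uint32_t i = 0u; i < POLLS_PER_CALL; i++) {
        if (!job->is_busy(job->ctx)) {
            job->state = JOB_DONE; /* the next job step may now access the bank */
            return;
        }
    }
}

/* Test double: reports "busy" for the number of polls held in *ctx. */
static bool countdown_busy(void *ctx)
{
    uint32_t *remaining = (uint32_t *)ctx;
    if (*remaining == 0u) {
        return false;
    }
    (*remaining)--;
    return true;
}
```

With the flash busy for 10 polls, the job completes on the third call instead of holding the caller inside a single call; the bank is still never accessed while the operation is active.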

    Problem 4 - No error handling during initialization

    There are some situations during initialization that are not handled as errors; instead, the flash driver tries to continue normal operation. When no more blocks are found in a flash sector (empty header found), the driver checks whether there is actually more data later in the sector and continues parsing there. This should not be an expected situation, and some error handling should occur. It might be possible to continue parsing at the address found, but after the driver has finished parsing, I think the information should be copied to a new sector and the corrupted sector should be erased. The same is true if a corrupt block (all-zeros header) is found. This cannot be expected behaviour.
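The two conditions described here can be expressed as explicit checks that init could use to trigger a copy-and-erase. This is a hypothetical sketch of the proposal, modelling the sector as an array of 64-bit words; the function names are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ERASED_WORD 0xFFFFFFFFFFFFFFFFULL /* empty header / blank flash */

/* words: sector content as 64-bit words; empty_idx: word where the parser
 * found an empty header. Returns true if any non-blank word follows it,
 * i.e. the sector should be copied and then erased. */
static bool data_after_empty_header(const uint64_t *words, size_t n, size_t empty_idx)
{
    for (size_t i = empty_idx; i < n; i++) {
        if (words[i] != ERASED_WORD) {
            return true;
        }
    }
    return false;
}

/* An all-zeros status word marks a corrupt block header. */
static bool header_is_corrupt(uint64_t status)
{
    return status == 0u;
}
```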

  • Matthias Weyh said:

    Problem 1 - block sizes)

    - The Hercules FEE driver is designed to take the size information this way. Of course, there are pros and cons. I do not know the history behind this implementation; the FEE reads the size from the EEP block header only if some blocks were previously written but have since been deleted from the configuration. This covers a different use case: a bootloader and an application that each use a different configuration. There is no support for the same block having a different length from one version of the application software to the next.

    - TI_Fee_Read(uint16 BlockNumber,uint16 BlockOffset,uint8* DataBufferPtr,uint16 Length)

    The read API has BlockOffset and Length parameters that can be used to read a subset of the data in a block.
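As an illustration of reading a subset, the sketch below is a stand-in with the assumed BlockOffset/DataBufferPtr/Length semantics, operating on an in-memory copy of one block's data in place of BlockNumber. It is not the real TI_Fee_Read, whose return type and job handling are not reproduced here.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for TI_Fee_Read's offset/length handling (assumed semantics).
 * Returns 0 on success, 1 if the requested window exceeds the block. */
static int fake_fee_read(const uint8_t *block_data, uint16_t block_len,
                         uint16_t BlockOffset, uint8_t *DataBufferPtr, uint16_t Length)
{
    if ((uint32_t)BlockOffset + Length > block_len) {
        return 1;
    }
    memcpy(DataBufferPtr, &block_data[BlockOffset], Length);
    return 0;
}
```

For a block that grew from 40 to 48 bytes, reading offset 40 with length 8 returns only the bytes added in the new layout, while offset 0 with length 40 returns the old portion.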

    - Today, TI FEE supports only static block size configuration. I do not think we will support a dynamic/variable block size feature any time soon, since it was never a requirement.

    Matthias Weyh said:

    Problem 2 - reset during flash operation)

    - In version 1.19.03 we removed the check for "Start Program Block" during FEE init, so the size of this incomplete block is not used.

    Matthias Weyh said:

    Problem 3 - TI_FeeInternal_PollFlashStatus)

    - I understand the challenge; your workaround is good. As I mentioned earlier, our implementation avoids a more critical case: bank access (further FEE operations) during an active flash operation, which will stall the system.

    - Except for erase, all other commands take very little time. TI_Fee_MainFunction is the function that usually performs the erase, so we recommend calling it from a low-priority task or the idle task.

    Matthias Weyh said:

    Problem 4 - No error handling during initialization

    - I am not sure whether you are using #define TI_FEE_USEPARTIALERASEDSECTOR.
    --- By default, TI_FEE_USEPARTIALERASEDSECTOR is STD_OFF. This means that if any partially erased bits are identified after a sector erase operation, Fee will issue another erase command until the sector is erased successfully with no partially erased bits.
    --- If TI_FEE_USEPARTIALERASEDSECTOR is STD_ON, the sector is allowed to be used even though the erase operation failed. The idea is: if one or a few bits are faulty, why waste the entire sector? The FEE driver will skip these locations and continue writing blocks.
    - A patch of 0xFFFFFFFF in the active virtual sector occurs when there is a partially erased bit: as an initial step of a write operation, a blank check is carried out to ensure empty space, and that check failed. The driver therefore skips that location and writes the block further on. That is why we do not stop at 0xFFFFFFFF; we continue to the end of the active virtual sector to find any other valid blocks. This does not take much time, since we perform a blank check rather than a read-and-compare against 0xFFFFFFFF.
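The skip-on-failed-blank-check behaviour described above can be modelled roughly as follows. This is a hypothetical sketch: the byte-wise blank check, the alignment step and the function names are assumptions, not the driver's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Blank check: true if every byte in [off, off + len) reads as erased (0xFF). */
static int is_blank(const uint8_t *sector, size_t off, size_t len)
{
    for (size_t i = 0u; i < len; i++) {
        if (sector[off + i] != 0xFFu) {
            return 0;
        }
    }
    return 1;
}

/* Hypothetical write-side search: step through the sector in `align`-byte
 * units from `start` and return the first offset whose `len` bytes pass the
 * blank check, skipping spots with a partially erased bit. -1 if none fits. */
static long find_blank_slot(const uint8_t *sector, size_t sector_len,
                            size_t start, size_t len, size_t align)
{
    for (size_t off = start; off + len <= sector_len; off += align) {
        if (is_blank(sector, off, len)) {
            return (long)off;
        }
    }
    return -1;
}
```

Because a skipped region still reads almost entirely as 0xFF, a parser that stopped at the first blank-looking area would miss the blocks written after it, which is why init keeps scanning to the end of the active virtual sector.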