TMS570LC4357: FLASH ECC Fault Detection

Part Number: TMS570LC4357
Other Parts Discussed in Thread: TMS570LS3137

Back with the FLASH ECC issue.

We have just discovered a weird behaviour when injecting faults in the FLASH ECC (not the FLASH itself, but in the ECC data) through the Linker option --ecc:ecc_error.

My expectation was that the ESM would report both a single-bit error (detected and corrected) and an error of two or more bits (detected). Instead, the behaviour depends entirely on the error mask.

We have tried these masks (hex values), and obtained the following results:

01 -> No Flag (assumed corrected, no way to detect it)
02 -> No Flag (assumed corrected, no way to detect it)
03 -> Flagged
07 -> Flagged
0a -> Flagged
0f -> Flagged
1a -> No Flag
2a -> No Flag
30 -> DAbort
aa -> DAbort
af -> Flagged
ea -> DAbort
f1 -> No Flag
fa -> DAbort
ff -> Flagged

"Flagged" means ESM Group 2, channel 3 flag becomes "1".

I may assume that ESM Group 2, channel 3 only gets flagged when more than 1 bit is wrong. But I don't understand when the DABORT is triggered, and even less why a "03" mask and a "30" mask produce different behaviour, when both introduce 2 wrong bits.
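
Counting the flipped bits in each mask makes the puzzle concrete. The sketch below only tabulates the results reported above next to each mask's bit count; `flipped_bits` and `print_trials` are illustrative names, not part of any TI tool.

```c
#include <stddef.h>
#include <stdio.h>

/* One row per experiment listed above: injected ECC mask and observed result. */
struct trial { unsigned mask; const char *observed; };

static const struct trial trials[] = {
    {0x01, "No Flag"}, {0x02, "No Flag"}, {0x03, "Flagged"}, {0x07, "Flagged"},
    {0x0a, "Flagged"}, {0x0f, "Flagged"}, {0x1a, "No Flag"}, {0x2a, "No Flag"},
    {0x30, "DAbort"},  {0xaa, "DAbort"},  {0xaf, "Flagged"}, {0xea, "DAbort"},
    {0xf1, "No Flag"}, {0xfa, "DAbort"},  {0xff, "Flagged"},
};

/* Number of set bits in the low 8 bits of the mask, i.e. flipped ECC bits. */
static int flipped_bits(unsigned mask)
{
    int n = 0;
    for (mask &= 0xffu; mask != 0u; mask >>= 1)
        n += (int)(mask & 1u);
    return n;
}

/* Print mask, flipped-bit count and observed result for every experiment. */
static void print_trials(void)
{
    size_t i;
    for (i = 0; i < sizeof trials / sizeof trials[0]; i++)
        printf("0x%02x  %d bit(s)  %s\n", trials[i].mask,
               flipped_bits(trials[i].mask), trials[i].observed);
}
```

Masks 0x03, 0x0a and 0x30 each flip two bits, and 0x07 and 0x1a each flip three, yet the outcomes differ within each pair, so the bit count alone does not predict the behaviour.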

Does anybody know the answer, or can anyone point us to the right documentation explaining how the ECC module works? I've seen in the manual "The Cortex R5F CPU may generate speculative fetches to any location within the Flash memory space. A speculative fetch to a location with invalid ECC, which is subsequently not used, will not create an abort, but will set the ESM flags for a correctable or uncorrectable error", but we always inject the error in the same place, which gets executed, so that does not sound like a reasonable explanation.

Thank you in advance.

  • Hi Txema,

    I would like to verify your code.

    Could you send me your complete project as a zip file by private message, so that I can debug your issue more easily?

    --

    Thanks & regards,
    Jagadish. 

  • Hi Jagadish.

    Obviously I cannot send you my full code. Can you tell me what specifically you want to check?

    Fault injection is done through Code Composer Linker options.

    Then the code just looks for the status of the ESM registers to flag any occurrence so it can be reported through telemetry.
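
    A minimal sketch of that kind of check, written against a plain status word so it runs off-target. It assumes that channel n of a group maps to bit n of that group's status register; the function names are illustrative, and on the target the argument would come from the ESM Group 2 status register.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Flash ECC fault is reported on ESM Group 2, channel 3 (see above). */
    #define ESM_GROUP2_FLASH_ECC_CHANNEL 3u

    /* Assumption: channel n is bit n of the group's 32-bit status word. */
    static bool esm_channel_flagged(uint32_t status, unsigned channel)
    {
        return channel < 32u && ((status >> channel) & 1u) != 0u;
    }

    /* True when the Group 2 status word shows the flash ECC channel set. */
    static bool flash_ecc_flagged(uint32_t esm_group2_status)
    {
        return esm_channel_flagged(esm_group2_status,
                                   ESM_GROUP2_FLASH_ECC_CHANNEL);
    }
    ```

    The telemetry code would call `flash_ecc_flagged()` with the value read from the register and report the result.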

    Bad ECC injection is done in a function called "BSP_1Hz", so the fault is detected every second.

    Do you have any idea why different masks produce such different results? Why does flipping bits 0 and 1 (mask 0x03) get flagged in the ESM module as a Group 2 Channel 3 fault (Flash ECC), while flipping bits 4 and 5 (mask 0x30) produces a DABORT? That behaviour is completely unrelated to my code.

    Thank you.

  • Hi Txema,

    Apologies for the delay.

    I need some time to create a project based on the issue you describe. Once I have created it, I will debug it and update you.

    --

    Thanks & regards,
    Jagadish.

  • Hi Jagadish.

    That sounds great, thank you.

    If it helps, this is how I'd reproduce it:

    1) Write some code that calls a function every so often (say 1 second)

    2) Check in the .map file the address of that function

    3) Configure the linker to inject an ECC error at that address, with different masks (as described in my message)

    4) Set a breakpoint in the dabort() function, so you know if it gets called

    5) Set a breakpoint in the 1s function, so you can trigger the fault stepping execution

    6) Read the status of the ESM module through the debugger

    Thank you.

  • Hi Txema,

    Please refer to the Syndrome Table on page 344 of the device TRM.

  • Hello Wang.

    Yes, I saw that table, but I don't know how to interpret it:

    1. "The bit in error can be a bit among the 64 data bits or a bit among the 8 ECC check bits" -> What is that 64-bit data? And where in the table are the 8 ECC check bits?

    2. Table 7-3 maps the 8-bit syndrome to 4 outcomes: E0x, Dxx, D and M. The first two are correctable, while D and M are not

    3. I don't know the relationship between the "mask" used in fault injection and this table

    4. Neither do I know why sometimes the ECC flags the error, sometimes it does not and sometimes DABORT gets called

    Could you help me understand these questions so I can close the cause-effect loop?

    Thank you so much.

  • 1. "The bit in error can be a bit among the 64 data bits or a bit among the 8 ECC check bits" -> What is that 64-bit data? And where in the table are the 8 ECC check bits?

    If ECC[0] is flipped (mask 0x01), the value of CELL[0, 1] of the syndrome table is E00 --> single-bit ECC error. The error is detected and corrected when reading.

    If the flipped ECC bits are 0x1A, the value of CELL[1, A] of the syndrome table is D22 --> single-bit data error. It is corrected when reading.

    2. Table 7-3 maps the 8-bit syndrome to 4 outcomes: E0x, Dxx, D and M. The first two are correctable, while D and M are not

    Table 7-3 in the TRM is not correct (for little-endian). Please use the table in the TMS570LS3137 TRM. You are correct: D is detectable but not correctable. M means neither detectable nor correctable.

    3. I don't know the relationship between the "mask" used in fault injection and this table

    The mask value is the XOR of the correct ECC value and the corrupted ECC value. You can use this value to look up the corresponding cell in the syndrome table.

    For example, if MASK = XY (hex value), the value of cell (row Y, col X) of the syndrome table indicates whether the error is correctable or not.
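
    The lookup rule above can be sketched as follows. This only computes the mask and the row/column indices; the syndrome table contents themselves must be taken from the TRM. The type and function names are illustrative.

    ```c
    #include <stdint.h>

    /* Position of a cell in the syndrome table. */
    struct syndrome_cell {
        unsigned row;  /* Y: low nibble of the mask  */
        unsigned col;  /* X: high nibble of the mask */
    };

    /* Mask = XOR of the correct ECC byte and the corrupted ECC byte. */
    static uint8_t ecc_mask(uint8_t correct_ecc, uint8_t corrupted_ecc)
    {
        return (uint8_t)(correct_ecc ^ corrupted_ecc);
    }

    /* Split MASK = XY (hex) into the table indices (row Y, col X). */
    static struct syndrome_cell mask_to_cell(uint8_t mask)
    {
        struct syndrome_cell cell;
        cell.col = (unsigned)(mask >> 4) & 0x0fu;
        cell.row = (unsigned)mask & 0x0fu;
        return cell;
    }
    ```

    For mask 0x1A this gives column 1, row A, which is the cell quoted above as D22; for mask 0x01 it gives column 0, row 1, quoted above as E00.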

    4. Neither do I know why sometimes the ECC flags the error, sometimes it does not and sometimes DABORT gets called

    Please use the flash module register (FMC_ECC) to inject ECC/data faults in diag mode 7. I haven't tried injecting ECC errors through the linker.