TMS570LS3137 (rev B) - SRAM ECC - creating faults



Hello denizens of TI-land,

We recently enabled SRAM ECC for our app.  To verify that it's working, I wanted to inject some faults and observe the ECC response.

To do so, I've been following the sensible recommendations of the ECC app note (spna126) and initialization app note (spna106a).  These say to do this: Disable ECC, modify some bits, re-enable ECC, try to read back the modified value.  Makes sense.  Suggestions vary on whether the modified bits should be the data bits or the ECC bits; ideally I'd be able to test both ways.  However, I have run into problems.

- If I modify the data bits with ECC disabled, and then re-enable ECC, I read back the modified data bits with no corrections or errors.  Using the debugger (sketchy, I know), I notice that when I modify the data bits, the ECC bits get updated, even though ECC checking is disabled -- so that's why the reads proceed normally once ECC is re-enabled.  Perhaps there's a separate bit somewhere that controls whether ECC bits are computed and updated on writes, independently of whether it's checked?

- If I modify the ECC bits, then I do get errors.  The first time, everything works as expected.  Single-bit errors increase the error counter (one of the RAMOCCUR values); double-bit errors cause an immediate abort.  However, _after_ I trigger a single-bit error, subsequent attempts to trigger any kind of error by modifying ECC bits seem to be ignored entirely (no errors, no aborts).

One hypothesis is that this could be related to our use of the MPU.  In particular, the SRAM bank has been set to type "normal".  Maybe for this kind of testing it should be set to "strongly ordered" or the like?  My understanding of the ARM memory model is admittedly slightly fuzzy, but I didn't think this part had the sort of data cache that would get in the way.

Anyway, here is what my code currently looks like:

=====

// Edit ECC memory directly to create an error.
void CorruptEcc(int bits, volatile void *pointer) {
  RAMCTRL1_bit()->ECC_WR_EN = 1;
  RAMCTRL2_bit()->ECC_WR_EN = 1;

  // Prepare to write the ECC bank, which is offset from the RAM bank;
  // the same 8 parity bits are repeated 8 times for every 64-bit word.
  bits = bits | (bits << 8) | (bits << 16) | (bits << 24);
  uint32_t address = (reinterpret_cast<uint32_t>(pointer) + 0x400000) & ~7;

  // Do this all in asm to avoid memory access while ECC is off.
  asm("mrc p15, #0, r0, c1, c0, #1\n\t"   // Load Aux Control Reg.
           "and r2, r0, #0xF3FFFFFF\n\t"       // Disable SRAM ECC
           "dsb\n\t"                           // Memory barrier voodoo
           "mcr p15, #0, r2, c1, c0, #1\n\t"   // Store ACR back
           "isb\n\t"                           // Instruction barrier voodoo

           "mrc p15, #0, r1, c15, c0, #0\n\t"  // Load 2nd Aux Control
           "and r2, r1, #0xFFFFFFFD\n\t"       // Disable BTCM Read-Modify-Write.
           "dsb\n\t"
           "mcr p15, #0, r2, c15, c0, #0\n\t"  // Store 2AC back
           "isb\n\t"

           "ldr r2, [%1]\n\t"                  // Load from memory
           "eor r2, r2, %0\n\t"                // XOR with bits
           "str r2, [%1]\n\t"                  // Store it back

           "dsb\n\t"
           "mcr p15, #0, r1, c15, c0, #0\n\t"  // Restore saved 2AC
           "mcr p15, #0, r0, c1, c0, #1\n\t"   // Restore saved ACR
           "isb\n\t"
           : : "r"(bits), "r"(address) : "r0", "r1", "r2", "memory");
}

=====
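
The address and mask arithmetic in CorruptEcc can be checked on a host, apart from the hardware access. This is only a sketch: the 0x400000 offset and the 64-bit alignment are taken from the code above, while `EccAddress` and `ReplicateEccBits` are hypothetical helper names that do not appear in the original. Using unsigned arithmetic also avoids relying on how a signed `bits << 24` behaves when bit 7 is set.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: map a data address to the 64-bit-aligned address of
// its ECC word, using the 0x400000 offset from CorruptEcc above.
inline uint32_t EccAddress(uint32_t data_address) {
  return (data_address + 0x400000u) & ~UINT32_C(7);
}

// Hypothetical helper: copy an 8-bit flip mask into all four byte lanes of a
// 32-bit word, matching the "bits | bits << 8 | ..." line in CorruptEcc.
inline uint32_t ReplicateEccBits(uint32_t bits) {
  assert(bits <= 0xFFu);  // only 8 ECC bits exist per 64-bit word
  return bits * UINT32_C(0x01010101);
}
```

For example, `EccAddress(0x0800000C)` yields `0x08400008`, and `ReplicateEccBits(0x81)` yields `0x81818181`.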

To recap, my questions are:

- Should it be possible to modify either data bits _or_ ECC bits to generate a fault?

- Is there a secondary control to change whether ECC is _generated_, independently of it being checked?

- Can you think of any reason the code above would successfully generate a fault once, but not twice?

Bonus question:

- There are two SRAM banks, which I understand are interleaved into each word (both data and ECC).  How are faults "attributed"?  If there's an ECC fault (correctable or not) in a particular word, which RAM's fault counters should get incremented?

Thanks!

-- egnor

  • Egnor,

    Please find my answers embedded below (TI >>)

    - Should it be possible to modify either data bits _or_ ECC bits to generate a fault?

    TI >> Yes

    - Is there a secondary control to change whether ECC is _generated_, independently of it being checked?

    TI >> No

    - Can you think of any reason the code above would successfully generate a fault once, but not twice?

    TI >> Just need a clarification: are you repeating the entire code above every time you want to insert an error?

    Bonus question:

    - There are two SRAM banks, which I understand are interleaved into each word (both data and ECC).  How are faults "attributed"?  If there's an ECC fault (correctable or not) in a particular word, which RAM's fault counters should get incremented?

    TI >> If you are referring to the even (B0) and odd (B1) TCM interfaces, then there is one fault counter per interface. Within an interface there is only one fault counter, independent of how the banks are internally interleaved, which maps directly to the logical address.

    Hercules forum support
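
For the bonus question, here is a minimal sketch of per-interface attribution. It assumes (this is not stated in the thread) that B0 and B1 alternate on 64-bit words, so that address bit 3 selects the interface; that should be checked against the device's technical reference manual. `TcmInterface`, `InterfaceForAddress`, and `ToyFaultCounters` are hypothetical names, and the counters are plain variables, not the device's registers.

```cpp
#include <cstdint>

enum class TcmInterface { kB0, kB1 };

// ASSUMPTION: even 64-bit words go to B0 and odd ones to B1, i.e. address
// bit 3 picks the interface. Not confirmed by the thread.
inline TcmInterface InterfaceForAddress(uint32_t address) {
  return (address & 0x8u) ? TcmInterface::kB1 : TcmInterface::kB0;
}

// Toy stand-in for "one fault counter per interface".
struct ToyFaultCounters {
  uint32_t b0 = 0;
  uint32_t b1 = 0;

  // Attribute a corrected error to the interface that owns the address.
  void RecordSingleBitError(uint32_t address) {
    if (InterfaceForAddress(address) == TcmInterface::kB0) {
      ++b0;
    } else {
      ++b1;
    }
  }
};
```

Under that assumption, errors at 0x08000008 and 0x08000018 would both count against B1, and an error at 0x08000000 against B0.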

  • Yes, I am re-running that entire code every time I wish to simulate a fault.

  • Egnor,

    One correction to the earlier answer to your question below:

    - Should it be possible to modify either data bits _or_ ECC bits to generate a fault?

    TI >> You cannot modify the data bits to create a fault. The reason is that the CPU generates the corresponding ECC bits for the modified data even when ECC is disabled in the auxiliary control register. Once you enable ECC again, a read from the location with the modified data finds matching ECC and hence performs no correction. You will simply read back the modified data.

    TI Forum Support
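
The behavior described in this correction can be pictured with a small host-side model. This is a toy under stated assumptions, not the hardware: `ToyCheckBits` is a byte-wise XOR chosen only for illustration (it is NOT the SECDED code the CPU uses), and `ToyEccWord` and its methods are hypothetical names. The only point is that a data write always regenerates the check bits, so only a direct write to the check bits leaves a mismatch.

```cpp
#include <cstdint>

// Toy check code for illustration only (NOT the real ECC algorithm):
// XOR of the eight data bytes.
inline uint8_t ToyCheckBits(uint64_t data) {
  uint8_t check = 0;
  for (int i = 0; i < 8; ++i) {
    check = static_cast<uint8_t>(check ^ (data >> (8 * i)));
  }
  return check;
}

struct ToyEccWord {
  uint64_t data = 0;
  uint8_t check = 0;  // consistent with data == 0

  // Data writes regenerate the check bits whether or not checking is enabled.
  void WriteData(uint64_t value) {
    data = value;
    check = ToyCheckBits(value);
  }

  // Models a direct write to the ECC space (as with ECC_WR_EN set).
  void FlipCheckBits(uint8_t mask) { check = static_cast<uint8_t>(check ^ mask); }

  // What a checked read would see: do data and check bits still agree?
  bool ReadIsConsistent() const { return check == ToyCheckBits(data); }
};
```

In this model, calling `WriteData` with a flipped data bit still leaves `ReadIsConsistent()` true, while `FlipCheckBits(0x01)` makes it false.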

  • Okay, that matches my experience (can't write data bits without implicitly writing ECC) -- glad to know I'm not insane!

    Please consider updating application report spna126, "ECC Handling in TMSx70-Based Microcontrollers"; section 3.4, "Introducing an Error into RAM ECC", suggests changing the data bits.  It should probably be revised to suggest changing the ECC bits (after setting the appropriate bits to make them writable).

    ===> I am still confuzzled, however, by my inability to inject additional faults after the first!  This applies to _both_ SRAM ECC and Flash ECC.  (My fault injection procedure is different for Flash; I use the F021 API to write bad ECC data.)  In both cases, the _first_ fault (single-bit or double-bit) works precisely as expected; however, subsequent faults go completely uncorrected and unreported.  This is mysterious and disconcerting.  Do you have any idea why this might happen?  Does continuing to operate after a single-bit correction event (which ends up setting an ESM flag, triggering nERROR, etc.) somehow disable ECC?