Hello denizens of TI-land,
We recently enabled SRAM ECC for our app. To verify that it's working, I wanted to inject some faults and observe the ECC response.
To do so, I've been following the sensible recommendations of the ECC app note (SPNA126) and the initialization app note (SPNA106A). Both describe the same procedure: disable ECC, modify some bits, re-enable ECC, then try to read back the modified value. Makes sense. Suggestions vary on whether the modified bits should be the data bits or the ECC bits; ideally I'd be able to test both ways. However, I have run into two problems.
- If I modify the data bits with ECC disabled, and then re-enable ECC, I read back the modified data with no corrections or errors reported. Using the debugger (sketchy, I know), I can see that writing the data bits also updates the ECC bits, even though ECC checking is disabled, which would explain why the reads proceed normally once ECC is re-enabled. Perhaps there's a separate bit somewhere that controls whether ECC bits are computed and updated on writes, independently of whether they are checked on reads?
- If I modify the ECC bits, then I do get errors. The first time, everything works as expected. Single-bit errors increase the error counter (one of the RAMOCCUR values); double-bit errors cause an immediate abort. However, _after_ I trigger a single-bit error, subsequent attempts to trigger any kind of error by modifying ECC bits seem to be ignored entirely (no errors, no aborts).
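To make these two observations concrete, here is a minimal host-side sketch of how I picture the behavior. It is a toy model, not the real hardware: ToyEcc is a generic extended-Hamming SECDED code chosen purely for illustration (I assume the actual check-bit matrix on the part is different), and ToyRam, WriteData, XorEcc, Read, and InjectionDemo are hypothetical names. The one assumption it bakes in is what the debugger showed: every data write regenerates the ECC byte, whether or not checking is enabled.

```cpp
#include <cassert>
#include <cstdint>

enum class EccStatus { kOk, kSingleBit, kDoubleBit };

// XOR of all bits in v.
inline unsigned Parity(uint64_t v) {
  unsigned p = 0;
  for (; v != 0; v &= v - 1) p ^= 1u;
  return p;
}

// Toy extended-Hamming encoder (assumption: NOT the real hardware code).
// Bits 0..6 are Hamming check bits, bit 7 is overall parity.
inline uint8_t ToyEcc(uint64_t data) {
  unsigned syndrome = 0;
  unsigned position = 0;
  for (int i = 0; i < 64; ++i) {
    // Data bits occupy the non-power-of-two codeword positions 3, 5, 6, 7, ...
    do { ++position; } while ((position & (position - 1)) == 0);
    if ((data >> i) & 1u) syndrome ^= position;
  }
  const unsigned overall = Parity(data) ^ Parity(syndrome);
  return static_cast<uint8_t>(syndrome | (overall << 7));
}

// Classify a (data, stored ECC) pair the way a SECDED checker would.
inline EccStatus Check(uint64_t data, uint8_t stored_ecc) {
  const unsigned diff = ToyEcc(data) ^ stored_ecc;
  if (diff == 0) return EccStatus::kOk;
  // Odd overall parity means an odd number of flipped bits: treat as one.
  return Parity(diff) ? EccStatus::kSingleBit : EccStatus::kDoubleBit;
}

// One 64-bit word plus its ECC byte.
struct ToyRam {
  uint64_t data = 0;
  uint8_t ecc = ToyEcc(0);
  bool check_enabled = true;

  // Assumption from the debugger session: writes always regenerate ECC.
  void WriteData(uint64_t v) { data = v; ecc = ToyEcc(v); }
  // Direct modification of the ECC byte, as via the ECC address space.
  void XorEcc(uint8_t mask) { ecc ^= mask; }
  EccStatus Read() const {
    return check_enabled ? Check(data, ecc) : EccStatus::kOk;
  }
};

// Walks through both injection strategies on the toy model.
inline void InjectionDemo() {
  ToyRam ram;
  ram.WriteData(0x0123456789ABCDEFull);

  // Strategy 1: flip a data bit with checking off. ECC follows the write,
  // so nothing is detected after checking is turned back on.
  ram.check_enabled = false;
  ram.WriteData(ram.data ^ 0x10);
  ram.check_enabled = true;
  assert(ram.Read() == EccStatus::kOk);

  // Strategy 2: flip ECC bits directly.
  ram.XorEcc(0x01);  // one flipped check bit
  assert(ram.Read() == EccStatus::kSingleBit);
  ram.XorEcc(0x80);  // a second flipped check bit
  assert(ram.Read() == EccStatus::kDoubleBit);
}
```

In this model, flipping data bits through normal writes can never produce a fault, while flipping one or two ECC bits gives a single-bit or double-bit result respectively. That matches what I see the first time on the hardware, but it does nothing to explain why later attempts are ignored.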
One hypothesis is that this could be related to our use of the MPU. In particular, the SRAM bank has been set to type "normal". Maybe for this kind of testing it should be set to "strongly ordered" or the like? My understanding of the ARM memory model is admittedly slightly fuzzy, but I didn't think this part had the sort of data cache that would get in the way.
Anyway, here is what my code currently looks like:
=====
// Edit ECC memory directly to create an error.
// `bits` is an 8-bit mask that gets XORed into the stored ECC byte.
void CorruptEcc(int bits, volatile void *pointer) {
  RAMCTRL1_bit()->ECC_WR_EN = 1;
  RAMCTRL2_bit()->ECC_WR_EN = 1;
  // Prepare to write the ECC bank, which is offset from the RAM bank;
  // the same 8 parity bits are repeated 8 times for every 64-bit word.
  // Note: the single 32-bit store below only touches four of those copies.
  const uint32_t ubits = static_cast<uint32_t>(bits);
  const uint32_t pattern = ubits | (ubits << 8) | (ubits << 16) | (ubits << 24);
  const uint32_t address = (reinterpret_cast<uint32_t>(pointer) + 0x400000) & ~7u;
  // Do this all in asm to avoid memory access while ECC is off.
  asm volatile(
      "mrc p15, #0, r0, c1, c0, #1\n\t"   // Load Aux Control Reg (ACR).
      "and r2, r0, #0xF3FFFFFF\n\t"       // Clear bits 26-27: disable SRAM ECC.
      "dsb\n\t"                           // Let pending memory accesses finish.
      "mcr p15, #0, r2, c1, c0, #1\n\t"   // Store ACR back.
      "isb\n\t"                           // Make the new setting take effect.
      "mrc p15, #0, r1, c15, c0, #0\n\t"  // Load 2nd Aux Control (2AC).
      "and r2, r1, #0xFFFFFFFD\n\t"       // Clear bit 1: disable BTCM Read-Modify-Write.
      "dsb\n\t"
      "mcr p15, #0, r2, c15, c0, #0\n\t"  // Store 2AC back.
      "isb\n\t"
      "ldr r2, [%1]\n\t"                  // Load the ECC word.
      "eor r2, r2, %0\n\t"                // XOR with the replicated mask.
      "str r2, [%1]\n\t"                  // Store it back.
      "dsb\n\t"
      "mcr p15, #0, r1, c15, c0, #0\n\t"  // Restore saved 2AC.
      "mcr p15, #0, r0, c1, c0, #1\n\t"   // Restore saved ACR.
      "isb\n\t"
      : : "r"(pattern), "r"(address) : "r0", "r1", "r2", "memory");
}
=====
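In case it helps to double-check my arithmetic, here is the same address and mask math pulled out into small functions that can be compiled on a host. This is only a sketch: the 0x400000 offset and the three constants come straight from the code above, while the names (kEccOffset, EccAddressFor, ReplicateByte, and the mask constants) are made up for illustration, and 0x08001234 is just an example address I picked.

```cpp
#include <cstdint>

// Offset from a data word to its ECC word, as used in CorruptEcc.
constexpr uint32_t kEccOffset = 0x400000;

// Address of the ECC word for the 64-bit data word containing data_address.
constexpr uint32_t EccAddressFor(uint32_t data_address) {
  return (data_address + kEccOffset) & ~UINT32_C(7);
}

// Copy an 8-bit mask into all four bytes of a 32-bit word.
constexpr uint32_t ReplicateByte(uint8_t mask) {
  return UINT32_C(0x01010101) * mask;
}

// Bits cleared by the two `and` instructions in the asm.
constexpr uint32_t kSramEccCheckBits = (UINT32_C(1) << 26) | (UINT32_C(1) << 27);
constexpr uint32_t kBtcmRmwBit = UINT32_C(1) << 1;

// The literal masks in the asm are the complements of those bits.
static_assert(static_cast<uint32_t>(~kSramEccCheckBits) == 0xF3FFFFFFu,
              "ACR mask clears bits 26 and 27");
static_assert(static_cast<uint32_t>(~kBtcmRmwBit) == 0xFFFFFFFDu,
              "2AC mask clears bit 1");

// Example address (an assumption, not taken from my linker map).
static_assert(EccAddressFor(0x08001234u) == 0x08401230u,
              "ECC address is offset and 8-byte aligned");
static_assert(ReplicateByte(0x03) == 0x03030303u,
              "mask is repeated in every byte");
```

So a two-bit mask of 0x03 aimed at a variable at 0x08001234 should end up as an XOR of 0x03030303 at 0x08401230, if I have the layout right.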
To recap, my questions are:
- Should it be possible to modify either data bits _or_ ECC bits to generate a fault?
- Is there a secondary control to change whether ECC is _generated_, independently of it being checked?
- Can you think of any reason the code above would successfully generate a fault once, but not twice?
Bonus question:
- There are two SRAM banks, which I understand are interleaved into each word (both data and ECC). How are faults "attributed"? If there's an ECC fault (correctable or not) in a particular word, which bank's fault counters get incremented?
Thanks!
-- egnor