RM48L952: ECC detects double bit errors but won't respond to single bit errors

Part Number: RM48L952
Other Parts Discussed in Thread: NOWECC, HALCOGEN

I am trying out ECC on my device and am having some issues. I use HALCoGen to generate the ECC code and nowECC to inject errors. When I inject a double-bit error, everything works as I would expect: the CPU takes a prefetch abort as soon as the 64-bit word containing the error is fetched. If I inject only a single-bit error, no correction or exception seems to happen. This picture shows the issue. The upper highlighted part shows the PUSH instruction at the beginning of a function; nowECC modified this instruction so that it pushes R5, which it should not. In the lower highlighted region, you can see that R5 is not popped off, which causes problems. The registers displayed suggest that an uncorrectable ECC error was detected, but it may not refer to this section of flash.

Can anyone explain why this is happening?

  • Westin,

    Can you put the deliberate single-bit error in a piece of code that is not part of the data abort or prefetch abort handler? Also, how is the CPU's CP15 coprocessor set up for handling TCM ECC errors?

    Regards,
    Sunil
  • Sunil,

    I apologize for the confusion. This is not in the abort handler. The test I have is for exercising the abort handlers: this function executes some code that causes an exception if a variable has been set beforehand.

    I'm just using the HALCoGen code to initialize it. Here are the CP15 register values related to TCM:

    CP15_TCM_BTCM_REGION 0x08000039
    CP15_TCM_ATCM_REGION 0x00000039
    CP15_TCM_TCM_SELECTION 0x00000000
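To make the CP15 region values above easier to read, here is a small decoder sketch. It assumes the Cortex-R4 TCM Region Register layout (base address in bits [31:12], size code in bits [6:2], enable in bit 0) and a size encoding where code 3 means 4 KB and each higher code doubles the size; please check both assumptions against the Cortex-R4 TRM. `tcm_region_t` and `decode_tcm_region` are hypothetical names.

```c
#include <stdint.h>

/* Decoded view of a Cortex-R4 ATCM/BTCM Region Register value.
 * Assumed layout: bits [31:12] base, bits [6:2] size code, bit 0 enable. */
typedef struct {
    uint32_t base;      /* region base address */
    uint32_t size_code; /* raw 5-bit size field */
    uint32_t size_kb;   /* size in KB under the assumed encoding */
    int      enabled;   /* 1 if the region is enabled */
} tcm_region_t;

static tcm_region_t decode_tcm_region(uint32_t reg)
{
    tcm_region_t r;
    r.base      = reg & 0xFFFFF000u;
    r.size_code = (reg >> 2) & 0x1Fu;
    /* Assumed encoding: code 3 = 4 KB, each step doubles; lower codes treated as 0 here. */
    r.size_kb   = (r.size_code >= 3u) ? (4u << (r.size_code - 3u)) : 0u;
    r.enabled   = (int)(reg & 1u);
    return r;
}
```

Under those assumptions, 0x08000039 (BTCM) decodes to an enabled region based at 0x08000000 with size code 14 (8 MB), and 0x00000039 (ATCM) to an enabled region based at 0x00000000 with the same size code.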

    Here is the code HALCoGen generates to set it up.

    /* Enable CPU Event Export */
    /* This allows the CPU to signal any single-bit or double-bit errors detected
    * by its ECC logic for accesses to program flash or data RAM.
    */
    _coreEnableEventBusExport_();
    /* USER CODE BEGIN (9) */
    /* USER CODE END */

    /* Enable response to ECC errors indicated by CPU for accesses to flash */
    flashWREG->FEDACCTRL1 = 0x000A060AU;

    /* USER CODE BEGIN (10) */
    /* USER CODE END */

    /* Enable CPU ECC checking for ATCM (flash accesses) */
    _coreEnableFlashEcc_();

  • Westin,

    In your original post you stated:

    "The upper highlighted part shows the push command at the beginning of a function. nowECC modified this line to push r5 when it should not. In the lower highlighted region, you can see it does not pop r5 off which causes some issues."

    I think the function was supposed to push only R3 and LR to the stack. When you changed the ECC code using nowECC, the CPU's single-bit error correction mechanism corrected the opcode to also include R5 in the PUSH instruction. There is a single-bit difference between the following two instructions:
    PUSH {R3, LR}
    PUSH {R3, R5, LR}
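The one-bit difference between these two instructions can be checked with a short sketch. It assumes the function is compiled as 16-bit Thumb, where PUSH is encoded as 0xB400 with bit 8 selecting LR and bits [7:0] holding the R0-R7 mask; in the 32-bit ARM encoding (STMDB SP!, 0xE92D4008 vs 0xE92D4028) the differing bit is also bit 5. `thumb_push` and `hamming_distance16` are hypothetical helper names.

```c
#include <stdint.h>

/* 16-bit Thumb PUSH: 1011 010M rrrr rrrr
 * (0xB400 base, bit 8 = LR included, bits [7:0] = R0-R7 mask). */
static uint16_t thumb_push(uint8_t low_reg_mask, int include_lr)
{
    return (uint16_t)(0xB400u | (include_lr ? 0x0100u : 0u) | low_reg_mask);
}

/* Number of bit positions in which two instruction halfwords differ. */
static unsigned hamming_distance16(uint16_t a, uint16_t b)
{
    uint16_t diff = (uint16_t)(a ^ b);
    unsigned count = 0;
    while (diff) {
        count += diff & 1u;
        diff >>= 1;
    }
    return count;
}
```

With R3 as mask bit 3 and R5 as mask bit 5, PUSH {R3, LR} encodes as 0xB508 and PUSH {R3, R5, LR} as 0xB528; they differ only in bit 5, so flipping that single bit in flash yields a valid but different instruction.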

    So, I think the ECC logic is working correctly. Do you see an ECC correction indicated in the FMC register: FEDCSTATUS.bit 1? This would also be reflected in the ESM group1 status register bit 6.

    Regards,
    Sunil
  • I agree that the function is supposed to push only R3 and LR to the stack. If I don't modify the bits with nowECC, that is what happens. How could ECC be working if it "corrects" the instruction into something it was not supposed to be, when there was only a single-bit error?

    I couldn't find the register FEDCSTATUS. Where do I find it? When I typed it into the Expressions tab the same way I did in the image above, the debugger said it couldn't find it. I looked in the ESM registers. The register the debugger calls "Stat1" is all zeros. Is that the ESM group 1 status register?

    Thanks,

    Westin

  • Does anybody have any ideas on this? It seems very strange to me. I looked at the FCORERRADD register and found that the address of the correctable error is 0x34590, as expected.
  • Westin,

    I had a typo in the register name I mentioned. The correct name is FEDACSTATUS, at address 0xFFF8_701C.

    Can you also post the contents of the other FCOR* registers along with the FEDACSTATUS register?

    Regards,

    Sunil

  • The FEDACSTATUS register can be seen in the image in my original post, but when I looked again the value was different. I'm not sure what would have changed it. The value is now 0x102 when stopped on the corrupted push {r3, r5, lr} instruction.

    This image shows the registers.
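The value 0x102 can be split into the two FEDACSTATUS flags discussed in this thread. This sketch decodes only those two bits, using the meanings given in Sunil's replies (bit 1: correctable error, bit 8: uncorrectable error) and the address 0xFFF8_701C; all other bits are ignored. The macro and function names are hypothetical, and `fedacstatus_read` is only meaningful on the RM48 target, not on a host PC.

```c
#include <stdint.h>

#define FEDACSTATUS_ADDR      0xFFF8701Cu  /* address given in this thread */
#define FEDAC_CORRECTABLE_BIT (1u << 1)    /* correctable (single-bit) error flag */
#define FEDAC_UNCORR_BIT      (1u << 8)    /* uncorrectable error flag */

/* Hypothetical helper: read the live register on the target only. */
static inline uint32_t fedacstatus_read(void)
{
    return *(volatile const uint32_t *)(uintptr_t)FEDACSTATUS_ADDR;
}

static int fedac_has_correctable(uint32_t status)
{
    return (status & FEDAC_CORRECTABLE_BIT) != 0u;
}

static int fedac_has_uncorrectable(uint32_t status)
{
    return (status & FEDAC_UNCORR_BIT) != 0u;
}
```

Decoded this way, 0x102 has both flags set, while the 0x00000002 value seen right after power-up has only the correctable flag set.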

  • Westin,

    Can you put a breakpoint at address 0x3459C and then read out the flash module registers when execution stops there? Please also check the ESM status registers. The FEDACSTATUS register shows a detected uncorrectable error (bit 8). Can you clear this bit in the prefetch abort handler?

    Regards,
    Sunil

  • Okay, I hit the breakpoint on the first pass after powering up, so no exceptions have been thrown. Here is the state of the registers.

    FlashWrapper
    FRdCntl 0x00000411 Read Control Register [Memory Mapped]
    FSpRd 0x00000000 Special Read Control Register [Memory Mapped]
    FEdacCtrl1 0x000A060A Error Correction Control Register1 [Memory Mapped]
    FEdacCtrl2 0x00000000 Error Correction Control Register2 [Memory Mapped]
    FCorErrCnt 0x00000000 Error Correction Counter Register [Memory Mapped]
    FCorErrAddr 0x00034590 Correctable Error Address [Memory Mapped]
    FCorErrPos 0x00000000 Correctable Error Position Register [Memory Mapped]
    FEdacStat 0x00000002 Error Status Register [Memory Mapped]
    FUncErrAddr 0x00000000 Un-correctable Error Address [Memory Mapped]
    FEdacSDis 0x00000000 Error Detection Sector Disable [Memory Mapped]
    FPprimAddrTag 0x000345B0 Primary Address Tag Register [Memory Mapped]
    FReduAddrTag 0x000345B0 Redundant Address Tag Register [Memory Mapped]
    FBnkProt 0x00000000 Bank Protection Register [Memory Mapped]
    FBnkSec 0x00000000 Bank Sector Enable Register [Memory Mapped]
    FBusy 0x0000007C Bank Busy Register [Memory Mapped]
    FBnkAcc 0x0000000F Bank Access Control Register [Memory Mapped]
    FBnkFallback 0x0000FFFF Bank Fallback Power Register [Memory Mapped]
    FBnkPmpRdy 0x007C80FF Bank/Pump Ready Register [Memory Mapped]
    FPmpAcc1 0x00C80001 Pump Access Control Register 1 [Memory Mapped]
    FPmpAcc2 0x00000000 Pump Access Control Register 2 [Memory Mapped]
    FMdlAcc 0x00000007 Module Access Control Register [Memory Mapped]
    FMdlStat 0x00000000 Module Status Register [Memory Mapped]
    FEmuDatMsw 0x00000000 EEPROM Emulation Data MSW Register [Memory Mapped]
    FEmuDatLsw 0x00000000 EEPROM Emulation Data LSW Register [Memory Mapped]
    FEmuEcc 0x00000000 EEPROM Emulation ECC Register [Memory Mapped]
    FLock 0x00000000 Flash Lock Register [Memory Mapped]
    FEmuAddr 0x00000000 EEPROM Emulation Address [Memory Mapped]
    FDiagCtrl 0x000A0000 Diagnostic Control Register [Memory Mapped]
    FRawDataH 0x00000000 Uncorrected Raw Data High [Memory Mapped]
    FRawDataL 0x00000000 Uncorrected Raw Data Low [Memory Mapped]
    FRawEcc 0x00000000 Uncorrected Raw ECC [Memory Mapped]
    FParOvr 0x00005400 Parity Override [Memory Mapped]
    FEdacSDis2 0x00000000 Error Detection Sector Disable Register 2 [Memory Mapped]
    Esm
    IflErrPinSet1 0x00000000 Influence Error Pin Set/Status Register 1 [Memory Mapped]
    IflErrPinClr1 0x00000000 Influence Error Pin Clear/Status Register 1 [Memory Mapped]
    IntEnaSet1 0x00000000 Interrupt Enable Set/Status Register 1 [Memory Mapped]
    IntEnaClr1 0x00000000 Interrupt Enable Clear/Status Register 1 [Memory Mapped]
    IntLvlSet1 0x00000000 Interrupt Level Set/Status Register 1 [Memory Mapped]
    IntLvlClr1 0x00000000 Interrupt Level Clear/Status Register 1 [Memory Mapped]
    Stat1 0x00000040 Status Register 1 [Memory Mapped]
    Stat2 0x00000000 Status Register 2 [Memory Mapped]
    Stat3 0x00000000 Status Register 3 [Memory Mapped]
    ErrPinStat 0x00000001 Error Pin Status Register [Memory Mapped]
    IntOffstHgh 0x00000000 Interrupt Offset High Register [Memory Mapped]
    IntOffstLow 0x00000000 Interrupt Offset Low Register [Memory Mapped]
    LtCnt 0x00003FFF Low-Time Counter Register [Memory Mapped]
    LtCntPre 0x00003FFF Low-Time Counter Preload Register [Memory Mapped]
    ErrKey 0x00000000 Error Key Register [Memory Mapped]
    ShdwStat2 0x00000000 Status Shadow Register [Memory Mapped]
    IflErrPinSet4 0x00000000 Influence Error Pin Set/Status Register 4 [Memory Mapped]
    IflErrPinClr4 0x00000000 Influence Error Pin Clear/Status Register 4 [Memory Mapped]
    IntEnaSet4 0x00000000 Interrupt Enable Set/Status Register 4 [Memory Mapped]
    IntEnaClr4 0x00000000 Interrupt Enable Clear/Status Register 4 [Memory Mapped]
    IntLvlSet4 0x00000000 Interrupt Level Set/Status Register 4 [Memory Mapped]
    IntLvlClr4 0x00000000 Interrupt Level Clear/Status Register 4 [Memory Mapped]
    Stat4 0x00000000 Status Register 4 [Memory Mapped]
  • Thanks for the register content. The FEDACSTATUS register shows that a correctable error occurred where a zero was read as a one from address 0x34590. Also this error is indicated in the ESM group1 status register bit 6. So the CPU's single-bit error signaling mechanism is verified.
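The cross-check described here can be written as a small sketch using the dump values (FEDACSTATUS = 0x00000002, ESM Stat1 = 0x00000040). The bit positions come from this thread (FEDACSTATUS bit 1, ESM group 1 status bit 6); the macro and function names are hypothetical.

```c
#include <stdint.h>

#define FEDAC_CORRECTABLE_BIT (1u << 1)  /* flash wrapper: correctable error flag */
#define ESM_G1_FLASH_CORR_BIT (1u << 6)  /* ESM group 1 channel for the same event */

/* Returns 1 when both the flash wrapper and the ESM report the single-bit error. */
static int single_bit_error_signaled(uint32_t fedacstatus, uint32_t esm_stat1)
{
    return ((fedacstatus & FEDAC_CORRECTABLE_BIT) != 0u) &&
           ((esm_stat1 & ESM_G1_FLASH_CORR_BIT) != 0u);
}
```

With the dumped values, both conditions hold. With the all-zero "Stat1" reported earlier in the thread, the check would fail even if FEDACSTATUS showed the flag.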

    Now, do you see R5 actually pushed onto the stack, or was the instruction corrected to push only R3 and LR? At the same breakpoint, you can load known values into R3 and R5, review the stack content, and see whether both are pushed onto the stack.

    Regards,
    Sunil
  • I'm pretty sure the corrupted instruction is being executed, not just displayed in the debugger. I set a breakpoint on the PUSH instruction and ran to it. At this point, SP was 0x08000EC8. I stepped over it and SP changed to 0x08000EBC, a decrease of 12 bytes, which indicates to me that three registers are being pushed onto the stack. Additionally, PC goes to 0 after the function pops from the stack, because R5 is zero and its saved value gets popped into PC.
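Both observations follow from how PUSH and POP use a full-descending stack, with the lowest-numbered register stored at the lowest address. This simplified model assumes the function's epilogue is POP {R3, PC} (the uncorrupted counterpart of PUSH {R3, LR}) and uses a word-indexed array instead of real addresses; all names and the sample register values are hypothetical.

```c
#include <stdint.h>

/* Words pushed, from the SP values before and after the PUSH. */
static uint32_t words_pushed(uint32_t sp_before, uint32_t sp_after)
{
    return (sp_before - sp_after) / 4u;
}

/* Word-indexed model of a full-descending stack. */
typedef struct {
    uint32_t mem[16]; /* stack memory, one entry per 32-bit word */
    uint32_t sp;      /* index of the top word; 16 means empty */
} stack_model_t;

/* Corrupted prologue: PUSH {R3, R5, LR}. */
static void push_r3_r5_lr(stack_model_t *s, uint32_t r3, uint32_t r5, uint32_t lr)
{
    s->sp -= 3u;
    s->mem[s->sp + 0u] = r3;  /* lowest address */
    s->mem[s->sp + 1u] = r5;
    s->mem[s->sp + 2u] = lr;  /* highest address */
}

/* Uncorrupted epilogue: POP {R3, PC}. Returns the value loaded into PC. */
static uint32_t pop_r3_pc(stack_model_t *s, uint32_t *r3)
{
    *r3 = s->mem[s->sp + 0u];
    uint32_t pc = s->mem[s->sp + 1u]; /* this slot holds the saved R5, not LR */
    s->sp += 2u;
    return pc;
}
```

The SP change of 0x08000EC8 - 0x08000EBC = 12 bytes gives three words. In the model, pushing R5 = 0 and then executing POP {R3, PC} loads PC from R5's slot, so PC becomes 0 and the saved LR is left on the stack.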

  • Westin,

    Can you please put a breakpoint at 0x3459c and then check the stack contents? I want to rule out any interaction between debug state and the ECC correction logic. Also, can you read out the CPU's correctable fault location register (CFLR)?
  • Is CFLR the same as FCorErrAddr in the FlashWrapper group in CCS? That value is zero when I break at address 0x3459C.
    The call stack shown in CCS breaks once I step over the PUSH instruction. The frame below the current one says "0x00000000 (no symbols are defined)".
  • Is there anything else I should look into?

    Thanks
  • Westin,

    I will write a test to check this behavior and get back to you with my results.

    Regards,
    Sunil
  • Okay, thank you.