TMS570LS1224: FEE, RAM and FLASH ECC Error Handling and Test with SafeTI Diagnostic Library

Part Number: TMS570LS1224
Other Parts Discussed in Thread: HALCOGEN

Hello all, 

I need to handle FEE, RAM and flash ECC errors so that the device is brought into a safe state when such errors occur. Therefore I need to verify that ECC errors are detected, using the test functions from the SafeTI Diagnostic Library 2.2.0 on a TMS570LS1224 LaunchPad.

I use these functions to test the handling of ECC errors:

b_Result = SL_SelfTest_FEE (FEE_ECC_TEST_MODE_1BIT_FAULT_INJECT, false, &fee_stResult);

I get an ESM group 1 exception on channel 35; nERROR lights up until esmREG->EKR = 0x5U; is executed. After a restart by executing systemREG1->SYSECR = 0x0000C000; nERROR lights up again and the test can't be repeated. Only pressing the reset button or a power cycle clears nERROR.

Why does nERROR light up again after restarting with systemREG1->SYSECR = 0x0000C000?

b_Result = SL_SelfTest_FEE (FEE_ECC_TEST_MODE_2BIT_FAULT_INJECT, false, &fee_stResult);

I get an ESM group 1 exception on channel 36; nERROR lights up until esmREG->EKR = 0x5U; is executed. After a restart by executing systemREG1->SYSECR = 0x0000C000; the test can be repeated -> works as expected.

b_Result = SL_SelfTest_SRAM(SRAM_ECC_1BIT_FAULT_INJECTION, true, &SelfTestResult);

I get an ESM group 1 exception on channel 26 and channel 28; nERROR lights up until esmREG->EKR = 0x5U; is executed. After a restart by executing systemREG1->SYSECR = 0x0000C000; the test can be repeated -> works as expected.

b_Result = SL_SelfTest_SRAM(SRAM_ECC_2BIT_FAULT_INJECT, true, &SelfTestResult);

I do not get any exception so far, but nERROR lights up for a short moment. Shouldn't I get into the group 2 notification or the dabort handler?

Any help is appreciated.
Best Regards
Jens

 

 

  • Hi Jens,

    Thank you for your question. The persistent nERROR assertion after the software reset in the 1-bit FEE test is caused by the Flash Error Detection and Correction Status Register (FEDACSTATUS) not being cleared before the reset. When a single-bit correctable error is detected in diagnostic mode, the Diagnostic Correctable Error Status Flag (D_COR_ERR, bit 3) in FEDACSTATUS is set.

    ESM channel 35 corresponds to FMC correctable ECC errors on Bank 7 accesses. Writing 0x5 to the ESM error key register (esmREG->EKR = 0x5U) and performing a software reset (systemREG1->SYSECR = 0x0000C000) does not automatically clear the FEDACSTATUS flags. After the restart, the error condition is evaluated again and nERROR is re-asserted because D_COR_ERR is still set.

    Before performing the software reset, explicitly clear the FEDACSTATUS flags by writing a 1 to the D_COR_ERR bit. Only a hardware reset (power cycle or reset button) returns all flash controller registers to their default state, which explains why those methods work.
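    FEDACSTATUS and EESTATUS flags are cleared by writing 1 to them (the clearing code in this thread relies on that), so the value written decides which flags get cleared. The following is a host-side model for illustration only, not target code; `w1c_reg_t` and the helper names are hypothetical. It shows that writing 0x8 clears only bit 3, while a read-modify-write such as `REG |= 0x8` writes back every flag that is currently set and therefore clears all of them.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Host-side model of a write-1-to-clear (W1C) status register.
     * Illustration only: on the target, the hardware does the clearing. */
    typedef struct {
        uint32_t flags; /* flags currently set by the (simulated) hardware */
    } w1c_reg_t;

    /* A write clears exactly the bits that are 1 in 'value'; 0 bits are ignored. */
    static void w1c_write(w1c_reg_t *reg, uint32_t value)
    {
        assert(reg != NULL);
        reg->flags &= ~value;
    }

    static uint32_t w1c_read(const w1c_reg_t *reg)
    {
        assert(reg != NULL);
        return reg->flags;
    }

    /* Models "REG |= mask": the read returns every set flag, so the
     * write-back clears all of them, not just 'mask'. */
    static void w1c_or_assign(w1c_reg_t *reg, uint32_t mask)
    {
        w1c_write(reg, w1c_read(reg) | mask);
    }
    ```

    Starting from flags 0x0B, `w1c_write(&r, 0x8u)` leaves 0x03, whereas `w1c_or_assign(&r, 0x8u)` leaves 0x00.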

    The 2-bit FEE error test works correctly because it triggers ESM Group 1 channel 36 (FMC uncorrectable error). The Diagnostic Uncorrectable Error Flag (D_UNC_ERR, bit 12) is set in FEDACSTATUS [3], but the handling differs from correctable errors, allowing proper reset and repetition.

    The SRAM 1-bit error tests correctly trigger ESM Group 1 exceptions:

    • Channel 26: B0TCM (even bank) correctable ECC error.
    • Channel 28: B1TCM (odd bank) correctable ECC error.

    These are handled as ESM interrupts and can be properly cleared and repeated.
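    A minimal sketch of how the group 1 channels seen in this thread could be told apart in a notification callback. `esm_g1_describe` is a hypothetical helper, and the mapping only covers the channels discussed here (26, 28, 35, 36); it is not a complete device table.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical helper: describe the ESM group 1 channels discussed in
     * this thread. Other channels fall through to a generic description;
     * check the TMS570LS1224 datasheet for the full assignment. */
    static const char *esm_g1_describe(uint32_t channel)
    {
        assert(channel < 64u); /* group 1 channels are 0..63 */
        switch (channel) {
        case 26u: return "B0TCM (even bank) correctable ECC error";
        case 28u: return "B1TCM (odd bank) correctable ECC error";
        case 35u: return "FMC correctable ECC error (channel 35, FEE test)";
        case 36u: return "FMC uncorrectable ECC error (channel 36, FEE test)";
        default:  return "other ESM group 1 channel";
        }
    }
    ```

    On the target, a helper like this could be called from the HALCoGen group 1 notification callback with the channel number that callback receives.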

    SRAM 2-Bit Error Test Issue

    Your expectation of a Group 2 notification is incorrect, although the data abort is expected. According to the TMS570LS1224 error response table, 2-bit (uncorrectable) SRAM ECC errors trigger Group 3 ESM events, not Group 2.

    Error Source                      Error Response                ESM Channel
    B0TCM (even) ECC double error     Abort (CPU), ESM => nERROR    3.3
    B1TCM (odd) ECC double error      Abort (CPU), ESM => nERROR    3.5
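    The channel numbers in this table are bit positions in the ESM group 3 status register (esmREG->SR1[2], which the handler later in this thread loads from 0xFFFFF520). Below is a host-testable sketch of the classification that handler performs, also returning the write-1-to-clear value for the matched flag. The enum and function names are hypothetical, and bit 7 (flash ATCM) is taken from that handler's comments rather than from this table.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Bit n of ESMSR3 (esmREG->SR1[2]) is group 3 channel n. */
    #define ESM_G3_B0TCM_UNCORR (1u << 3) /* 3.3: B0TCM (even) double-bit error */
    #define ESM_G3_B1TCM_UNCORR (1u << 5) /* 3.5: B1TCM (odd) double-bit error  */
    #define ESM_G3_FLASH_UNCORR (1u << 7) /* 3.7: flash (ATCM) uncorrectable    */

    typedef enum { G3_NONE, G3_B0TCM, G3_B1TCM, G3_FLASH, G3_OTHER } g3_source_t;

    /* Classify an ESMSR3 value in the same order the handler tests it and
     * return in *clear_mask the value to write back to clear that flag. */
    static g3_source_t esm_g3_classify(uint32_t sr3, uint32_t *clear_mask)
    {
        assert(clear_mask != NULL);
        if (sr3 & ESM_G3_B0TCM_UNCORR) { *clear_mask = ESM_G3_B0TCM_UNCORR; return G3_B0TCM; }
        if (sr3 & ESM_G3_B1TCM_UNCORR) { *clear_mask = ESM_G3_B1TCM_UNCORR; return G3_B1TCM; }
        if (sr3 & ESM_G3_FLASH_UNCORR) { *clear_mask = ESM_G3_FLASH_UNCORR; return G3_FLASH; }
        *clear_mask = 0u;
        return (sr3 == 0u) ? G3_NONE : G3_OTHER;
    }
    ```

    The clear masks 0x08, 0x20 and 0x80 are the same values the handler writes back to ESMSR3.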

    Expected Behavior: A 2-bit SRAM ECC fault should:

    1. Generate a CPU data abort exception
    2. Trigger ESM Group 3 channels 3 or 5 (depending on which bank)
    3. Assert nERROR pin directly 

    The brief nERROR pulse you observe suggests the error is being detected, but you may not be seeing the exception because:

    • Your data abort handler may not be properly configured
    • The ESM Group 3 interrupt handler may not be set up
    • The fault occurs in a context where the abort cannot be properly serviced

    Recommendations:

    1. For the FEE 1-bit test: clear the FEDACSTATUS register before the software reset:

      // Clear the diagnostic correctable error flag
      flashWREG->FEDACSTATUS = 0x8; // Clear D_COR_ERR (bit 3)
      // Then perform software reset
      systemREG1->SYSECR = 0x0000C000;
    2. For the SRAM 2-bit test:

      • Implement a data abort handler to catch the CPU abort
      • Configure ESM Group 3 channels 3 and 5 handlers
      • Verify that ESM Group 3 error handling is enabled in your ESM configuration
      • Check that the abort exception vector is properly configured in your startup code
    3. Enable ECC properly: Ensure ECC is enabled in the Flash wrapper by writing '1010' to the EDACEN bits (bits 3-0 of FEDACCTRL1) before enabling ECC in the CPU.

    4. ESM configuration: The Error Signaling Module monitors all device errors and determines whether an interrupt is generated or the nERROR pin is triggered. Verify that your ESM group configuration matches the expected error routing.
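    One way to check the end state of point 3 is to read both enables back. This is a sketch under assumptions: EDACEN is bits 3:0 of FEDACCTRL1 as stated above, while the Cortex-R4 ACTLR bit positions (25 for ATCM, 26 and 27 for B0TCM/B1TCM) are recalled from the Cortex-R4 TRM and should be checked there. The values are passed in so the logic runs on a host; the order of the two writes can only be confirmed by reading the startup code. `ecc_status_t` and `ecc_check` are hypothetical names.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* EDACEN is bits 3:0 of FEDACCTRL1; '1010' (0xA) enables flash ECC. */
    #define FEDACCTRL1_EDACEN_MASK 0xFu
    #define FEDACCTRL1_EDACEN_ON   0xAu

    /* Assumed Cortex-R4 ACTLR bits (verify in the Cortex-R4 TRM). */
    #define ACTLR_ATCMPCEN  (1u << 25) /* ECC check on ATCM (flash) */
    #define ACTLR_B0TCMPCEN (1u << 26) /* ECC check on B0TCM (RAM)  */
    #define ACTLR_B1TCMPCEN (1u << 27) /* ECC check on B1TCM (RAM)  */

    typedef struct {
        bool flash_wrapper_ecc; /* EDACEN == 1010            */
        bool cpu_flash_ecc;     /* ATCMPCEN set              */
        bool cpu_ram_ecc;       /* B0TCMPCEN and B1TCMPCEN set */
    } ecc_status_t;

    /* On the target, fedacctrl1 would come from flashWREG->FEDACCTRL1 and
     * actlr from "mrc p15, #0, r0, c1, c0, #1". */
    static void ecc_check(uint32_t fedacctrl1, uint32_t actlr, ecc_status_t *out)
    {
        assert(out != NULL);
        out->flash_wrapper_ecc =
            (fedacctrl1 & FEDACCTRL1_EDACEN_MASK) == FEDACCTRL1_EDACEN_ON;
        out->cpu_flash_ecc = (actlr & ACTLR_ATCMPCEN) != 0u;
        out->cpu_ram_ecc = (actlr & (ACTLR_B0TCMPCEN | ACTLR_B1TCMPCEN))
                           == (ACTLR_B0TCMPCEN | ACTLR_B1TCMPCEN);
    }
    ```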

    Thanks and Regards,

    Ira

  • Hello Ira,

    Thank you very much for your prompt reply. I tried to clear D_COR_ERR (bit 3) with flashWREG->FEDACSTATUS = 0x8; but without success. Your hint did put me on the right track, though: I compared the flashWREG contents before and after calling
    b_Result = SL_SelfTest_FEE (FEE_ECC_TEST_MODE_1BIT_FAULT_INJECT, false, &fee_stResult); 

    Before the call, flashWREG->EESTATUS was 0x0; after calling FEE_ECC_TEST_MODE_1BIT_FAULT_INJECT it was 0x0b.

    So I cleared EE_D_UNC_ERR by executing flashWREG->EESTATUS &= 0xb; // Clear EE_D_UNC_ERR (Bit 12) and after that nERROR did not light up after restarting with systemREG1->SYSECR = 0x0000C000; and I can repeat the FEE_ECC_TEST_MODE_1BIT_FAULT_INJECT test as well.

    But I wonder why EE_D_UNC_ERR ("An uncorrectable error was detected in diagnostic mode 1. This means two or more bits in the data or ECC field have been found in error, or one or more bits in the address have been found in error.") shows up with the FEE_ECC_TEST_MODE_1BIT_FAULT_INJECT test. I would expect it with FEE_ECC_TEST_MODE_2BIT_FAULT_INJECT.
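    A quick decode of the value 0x0B may help here: 0x0B is binary 1011, i.e. bits 0, 1 and 3, while bit 12 would be 0x1000. Assuming bit 3 of EESTATUS is the diagnostic correctable-error flag (as D_COR_ERR is in FEDACSTATUS, per the reply above), the flag actually cleared by `EESTATUS &= 0xb` is the correctable one, which would fit a 1-bit test. The struct and function names below are hypothetical.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Bit positions as stated in this thread. Bit 3 in EESTATUS is
     * assumed to mirror D_COR_ERR (bit 3) in FEDACSTATUS. */
    #define EESTATUS_D_COR_ERR (1u << 3)  /* diagnostic correctable error   */
    #define EESTATUS_D_UNC_ERR (1u << 12) /* EE_D_UNC_ERR, 0x1000           */

    typedef struct {
        bool correctable;   /* bit 3 set  */
        bool uncorrectable; /* bit 12 set */
        uint32_t w1c_value; /* what "EESTATUS &= 0xB" writes back */
    } eestatus_info_t;

    static void eestatus_decode(uint32_t eestatus, eestatus_info_t *out)
    {
        assert(out != NULL);
        out->correctable = (eestatus & EESTATUS_D_COR_ERR) != 0u;
        out->uncorrectable = (eestatus & EESTATUS_D_UNC_ERR) != 0u;
        /* "&=" reads the register, ANDs with 0xB and writes the result;
         * on a write-1-to-clear register each 1 written clears that flag. */
        out->w1c_value = eestatus & 0xBu;
    }
    ```

    Note also that `&= 0xb` can never clear bit 12, since 0xB has no bit 12 set.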

    In HALCoGen I can configure ESM group 1 channels 0 to 63, but there is no configuration tab for the group 2 and group 3 channels. How do I configure or enable group 2 channels? VIM channel 0 (ESM High) is always active and mapped to FIQ; it can't be changed because it is grayed out.

    "Enable ECC properly: Ensure ECC is enabled in the Flash wrapper by writing '1010' to the EDACEN bits (bits 3-0 of FEDACCTRL1) before enabling ECC in the CPU" how do I check this (EDACEN before ECC in the CPU)?


     I have a _dabort handler but it does not get called:

        .extern custom_dabort   
        .weak _dabort   
        .type _dabort, %function

    _dabort:
            stmfd   r13!, {r0 - r12, lr}@ push registers and link register on to stack

            ldr     r12, esmsr3         @ ESM Group3 status register
            ldr     r0,  [r12]
            tst     r0,  #0x8           @ check if bit 3 is set, this indicates uncorrectable ECC error on B0TCM
            bne     ramErrorFound
            tst     r0, #0x20           @ check if bit 5 is set, this indicates uncorrectable ECC error on B1TCM
            bne     ramErrorFound2

    noRAMerror:
            tst     r0, #0x80           @ check if bit 7 is set, this indicates uncorrectable ECC error on ATCM
            bne     flashErrorFound

            bl      custom_dabort       @ custom data abort handler required
                                        @ If this custom handler is written in assembly, all registers used in the routine
                                        @ and the link register must be saved on to the stack upon entry, and restored before
                                        @ return from the routine.

            ldmfd   r13!, {r0 - r12, lr}@ pop registers and link register from stack
            subs    pc, lr, #8          @ restore state of CPU when abort occurred, and branch back to instruction that was aborted

    ramErrorFound:
            ldr     r1, ramctrl         @ RAM control register for B0TCM TCRAMW
            ldr     r2, [r1]
            tst     r2, #0x100          @ check if bit 8 is set in RAMCTRL, this indicates ECC memory write is enabled
            beq     ramErrorReal
            mov     r2, #0x20
            str     r2, [r1, #0x10]     @ clear RAM error status register

            mov     r2, #0x08
            str     r2, [r12]           @ clear ESM group3 channel3 flag for uncorrectable RAM ECC errors
            mov     r2, #5
            str     r2, [r12, #0x18]    @ The nERROR pin will become inactive once the LTC counter expires

            ldmfd   r13!, {r0 - r12, lr}
            subs    pc, lr, #4          @ branch to instruction after the one that caused the abort
                                        @ this is the case because the data abort was caused intentionally
                                        @ and we do not want to cause the same data abort again.

    ramErrorFound2:
            ldr     r1, ram2ctrl        @ RAM control register for B1TCM TCRAMW
            ldr     r2, [r1]
            tst     r2, #0x100          @ check if bit 8 is set in RAMCTRL, this indicates ECC memory write is enabled
            beq     ramErrorReal
            mov     r2, #0x20
            str     r2, [r1, #0x10]     @ clear RAM error status register

            mov     r2, #0x20
            str     r2, [r12]           @ clear ESM group3 flags channel5 flag for uncorrectable RAM ECC errors
            mov     r2, #5
            str     r2, [r12, #0x18]    @ The nERROR pin will become inactive once the LTC counter expires

            ldmfd   r13!, {r0 - r12, lr}
            subs    pc, lr, #4          @ branch to instruction after the one that caused the abort
                                        @ this is the case because the data abort was caused intentionally
                                        @ and we do not want to cause the same data abort again.


    ramErrorReal:
            b       ramErrorReal        @ branch here forever as continuing operation is not recommended

    flashErrorFound:
            ldr     r1, flashbase
            ldr     r2, [r1, #0x6C]     @ read FDIAGCTRL register

            mov     r2, r2, lsr #16
            and     r2, r2, #0xF        @ keep only the value of FDIAGCTRL bits 19:16
            cmp     r2, #5              @ check if bits 19:16 are 5, this indicates diagnostic mode is enabled
            bne     flashErrorReal
            mov     r2, #1
            mov     r2, r2, lsl #8      
            
            str     r2, [r1, #0x1C]     @ clear FEDACSTATUS error flag

            mov     r2, #0x80
            str     r2, [r12]           @ clear ESM group3 flag for uncorrectable flash ECC error
            mov     r2, #5
            str     r2, [r12, #0x18]    @ The nERROR pin will become inactive once the LTC counter expires

            ldmfd   r13!, {r0 - r12, lr}
            subs    pc, lr, #4          @ branch to instruction after the one that caused the abort
                                        @ this is the case because the data abort was caused intentionally
                                        @ and we do not want to cause the same data abort again.


    flashErrorReal:
            b       flashErrorReal      @ branch here forever as continuing operation is not recommended
            
    esmsr3:      .word 0xFFFFF520    
    ramctrl:     .word 0xFFFFF800    
    ram2ctrl:    .word 0xFFFFF900    
    ram1errstat: .word 0xFFFFF810    
    ram2errstat: .word 0xFFFFF910    
    flashbase:   .word 0xFFF87000    


    Best regards
    Jens
     

  • Hello Ira,

    I just want to come back to the RAM ECC 2-bit errors. I still haven't been able to get a notification when I execute SL_SelfTest_SRAM(SRAM_ECC_2BIT_FAULT_INJECT, true, &SelfTestResult);

    But something must happen, since I see the nERROR LED light up for a short period.

    As you said:

    For SRAM 2-bit test:

    • Implement a data abort handler to catch the CPU abort
      -> the data abort handler is implemented
    • Configure ESM Group 3 channels 3 and 5 handlers
      -> How do I do that?
    • Verify that ESM Group 3 error handling is enabled in your ESM configuration
      -> I cannot configure ESM Group 3 in HALCoGen since it is grayed out
    • Check that the abort exception vector is properly configured in your startup code
      -> abort exception vector is properly configured in my startup code

    Any help is appreciated.
    Best Regards
    Jens


  • Hi Jens,

    We have an internal AI tool that can analyze issues related to this controller against our whole database. I got some useful suggestions from it; could you please try them as a first step? If they don't help, I will dig further to resolve this issue.

    ESM Group 3 cannot generate CPU interrupts — it is hardwired to assert the nERROR pin only. This is by design: Group 3 errors are classified as critical, non-maskable hardware faults. HALCoGen grays it out precisely because there is nothing to configure in software. You will never receive an ESM callback for a Group 3 event, and you don't need one.
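    For group 2, as I understand the ESM, errors always raise the high-level ESM interrupt (VIM channel 0, which HALCoGen fixes to FIQ, as noted earlier in the thread), so there is nothing per channel to enable; the handler reads the interrupt offset register IOFFHR to find the source. Below is a host-testable sketch of that decode, assuming the offset encoding given in the comment (verify it in the TMS570LS12x TRM); `esm_vec_t` and `esm_high_decode` are hypothetical names. Group 3 does not appear in the decode, consistent with it having no interrupt path.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        uint32_t group;   /* 1 or 2; 0 if nothing is pending */
        uint32_t channel; /* channel number within the group */
    } esm_vec_t;

    /* Assumed IOFFHR encoding (verify in the TMS570LS12x TRM):
     *   0        no pending high-level interrupt
     *   1..32    group 1, channels 0..31
     *   33..64   group 2, channels 0..31
     *   65..96   group 1, channels 32..63
     * Group 3 has no entry because it has no interrupt path. */
    static void esm_high_decode(uint32_t ioffhr, esm_vec_t *out)
    {
        assert(out != NULL);
        out->group = 0u;
        out->channel = 0u;
        if (ioffhr == 0u || ioffhr > 96u) {
            return; /* nothing pending, or outside the assumed range */
        }
        uint32_t vec = ioffhr - 1u;
        if (vec < 32u) {
            out->group = 1u;
            out->channel = vec;
        } else if (vec < 64u) {
            out->group = 2u;
            out->channel = vec - 32u;
        } else {
            out->group = 1u;
            out->channel = vec - 32u; /* 64..95 -> channels 32..63 */
        }
    }
    ```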

    Regarding ESM Group 3 channels 3 and 5: according to the error response table earlier in this thread (and the checks in the _dabort handler above), channel 3 reports uncorrectable ECC errors on B0TCM (even bank) and channel 5 on B1TCM (odd bank), so which one fires depends on the bank holding the faulty word.

    The hardware response chain for a 2-bit ECC error is:

    2-bit ECC error detected
    ├──► ESM Group 3, channel 3 (B0TCM) or 5 (B1TCM) → asserts nERROR pin (LED lights up)
    └──► Bus error → CPU data abort exception

    Since your abort handler is implemented but not triggering a visible notification, the most likely causes are:

    1. Your Abort Handler Is Not Clearing the Required Flags

    The data abort handler for a 2-bit ECC error must perform all of the following steps, or execution will not return cleanly to the self-test function:

    void dataAbortHandler(void) {
    // 1. Read DFSR to determine the fault type (bits [3:0] and S bit [10])
    // 2. Read DFAR to get the faulting address (synchronous aborts)
    // 3. Clear the ESM Group 3 error flag by writing 1 to its bit in esmREG->SR1[2]
    // 4. Clear the double-bit error flag in RAMERRSTATUS (the _dabort handler above writes 0x20 to do this)
    // 5. Return to R14_abt - 4 (the instruction after the faulting one); R14_abt - 8 would re-execute the faulting access and abort again
    }

    uint32 g3Status = esmREG->SR1[2];
    if (g3Status != 0U) {
        // ESM Group 3 error confirmed; log/flag it here
        esmREG->SR1[2] = g3Status; // write 1s to clear exactly the flags that were set
    }

    After a successful test, the injected double-bit error should have been flagged in RAMERRSTATUS and the flag cleared again by the handler. If it is still set, the abort handler isn't completing correctly.
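    Step 1 can be made concrete with a small decode of the DFSR value. This is a sketch: the 5-bit fault status is DFSR[10] followed by DFSR[3:0], bit 11 is the write/read indicator, and the constants are a few ARMv7 PMSA encodings (check the Cortex-R4 TRM for the full list). `dfsr_info_t` and `dfsr_decode` are hypothetical names, and which encoding the injected 2-bit error actually produces is not stated in this thread.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* A few ARMv7 PMSA fault status encodings (5-bit FS value). */
    #define FS_ALIGNMENT       0x01u
    #define FS_SYNC_EXTERNAL   0x08u
    #define FS_ASYNC_EXTERNAL  0x16u
    #define FS_SYNC_PARITY_ECC 0x19u

    typedef struct {
        uint32_t fs; /* fault status: DFSR[10] as bit 4, DFSR[3:0] as bits 3:0 */
        bool write;  /* DFSR[11] (WnR): the aborted access was a write */
    } dfsr_info_t;

    /* On the target, the raw value comes from "mrc p15, #0, r0, c5, c0, #0"
     * and the faulting address (DFAR) from "mrc p15, #0, r0, c6, c0, #0". */
    static void dfsr_decode(uint32_t dfsr, dfsr_info_t *out)
    {
        assert(out != NULL);
        out->fs = (dfsr & 0xFu) | ((dfsr >> 6) & 0x10u);
        out->write = ((dfsr >> 11) & 1u) != 0u;
    }
    ```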

    2. The MPU May Be Interfering

    If the Memory Protection Unit is enabled in your configuration, it can trigger its own data abort when the diagnostic test accesses the RAM test region, before the ECC fault even fires. Disable it before calling the self-test:

    _mpuDisable_();
    retVal = SL_SelfTest_SRAM(SRAM_ECC_2BIT_FAULT_INJECT, true, &SelfTestResult);

    3. The Abort Handler May Not Be Returning to the Right Address

    The faulting instruction is at R14_ABT − 8. If your handler returns to the wrong address, execution won't resume inside SL_SelfTest_SRAM, and the function will never populate SelfTestResult with a valid result.
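    The return-address arithmetic as a sketch, assuming ARM (not Thumb) state: the abort sets LR_abt to the faulting instruction + 8, so `subs pc, lr, #8` retries the faulting access and `subs pc, lr, #4` resumes at the next instruction, which is what the HALCoGen-style handler earlier in this thread does for an intentionally injected error. The function names are hypothetical.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* ARM state: on a data abort, LR_abt = address of aborted instruction + 8. */
    static uint32_t dabort_faulting_addr(uint32_t lr_abt)
    {
        assert(lr_abt >= 8u);
        return lr_abt - 8u; /* target of "subs pc, lr, #8": retry the access */
    }

    static uint32_t dabort_skip_addr(uint32_t lr_abt)
    {
        assert(lr_abt >= 8u);
        return lr_abt - 4u; /* target of "subs pc, lr, #4": next instruction */
    }
    ```

    For example, with LR_abt = 0x1008 the faulting instruction is at 0x1000 and the next instruction at 0x1004.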

    --

    Thanks & regards,
    Jagadish.