TMS570LS1224: FEE, RAM and FLASH ECC Error Handling and Test with SafeTI Diagnostic Library

Jens Benner

Part Number: TMS570LS1224
Other Parts Discussed in Thread: HALCOGEN

Hello all,

I need to handle FEE, RAM and flash ECC errors so that the device can be brought into a safe state when they occur. To do that, I need to verify that ECC errors are detected, using the test functions from the SafeTI Diagnostic Library 2.2.0 on a TMS570LS1224 LaunchPad.

I use these functions to test the handling of ECC errors:

b_Result = SL_SelfTest_FEE (FEE_ECC_TEST_MODE_1BIT_FAULT_INJECT, false, &fee_stResult);

I get an ESM group 1 exception on channel 35. nError lights up until esmREG->EKR = 0x5U; is executed. After restarting by executing systemREG1->SYSECR = 0x0000C000; nError lights up again and the test can't be repeated. Only pressing the reset button or a power cycle clears nError.

Why does nError light up again after restarting with systemREG1->SYSECR = 0x0000C000?

b_Result = SL_SelfTest_FEE (FEE_ECC_TEST_MODE_2BIT_FAULT_INJECT, false, &fee_stResult);

I get an ESM group 1 exception on channel 36. nError lights up until esmREG->EKR = 0x5U; is executed. After restarting by executing systemREG1->SYSECR = 0x0000C000; the test can be repeated -> works as expected.

b_Result = SL_SelfTest_SRAM(SRAM_ECC_1BIT_FAULT_INJECTION, true, &SelfTestResult);

I get an ESM group 1 exception for channel 26 and channel 28. nError lights up until esmREG->EKR = 0x5U; is executed. After restarting by executing systemREG1->SYSECR = 0x0000C000; the test can be repeated -> works as expected.

b_Result = SL_SelfTest_SRAM(SRAM_ECC_2BIT_FAULT_INJECT, true, &SelfTestResult);

I do not get any exception so far, but nError lights up for a short moment. I expected to end up in the group 2 notification or the dabort handler?

Any help is appreciated.
Best Regards
Jens

4 months ago

0 Ira Thete 4 months ago

TI__Expert 4725 points

Hi Jens,

Thank you for your question. The persistent nERROR assertion after software reset for the 1-bit FEE error test is caused by the Flash Error Detection and Correction Status Register (FEDACSTATUS) not being properly cleared before reset. When a single-bit correctable error is discovered using diagnostic mode, the Diagnostic Correctable Error Status Flag (D_COR_ERR, bit 3) in the FEDACSTATUS register is set.

This is because the ESM channel 35 corresponds to FMC correctable ECC errors on Bank 7 accesses. When you clear the ESM error key register (esmREG->EKR = 0x5U) and perform a software reset (systemREG1->SYSECR = 0x0000C000), the FEDACSTATUS register flags are not automatically cleared. Upon restart, the system re-evaluates the error condition and re-asserts nERROR because the D_COR_ERR flag remains set.

Before performing the software reset, you must explicitly clear the FEDACSTATUS register flags. Write to the appropriate bits in FEDACSTATUS to clear the D_COR_ERR flag. Only a hardware reset (power cycle or reset button) clears all flash controller registers to their default state, which explains why those methods work.
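
A minimal sketch of the clearing step, assuming the FEDACSTATUS flags are write-1-to-clear (an assumption to confirm in the TRM). It models the register as a plain variable so the effect can be checked off target; w1c_write and clear_d_cor_err are hypothetical helper names, not part of HALCoGen or the SafeTI library.

```c
#include <stdint.h>

/* Bit 3 of FEDACSTATUS, as named in the reply above. */
#define D_COR_ERR_MASK (1u << 3)

/* Model of a write-1-to-clear (W1C) status register: every bit written
 * as 1 clears that flag, bits written as 0 are left untouched.
 * ASSUMPTION: the flash status flags behave this way. */
static uint32_t w1c_write(uint32_t current, uint32_t written)
{
    return current & ~written;
}

/* Returns the status value left after acknowledging only D_COR_ERR. */
uint32_t clear_d_cor_err(uint32_t status)
{
    return w1c_write(status, D_COR_ERR_MASK);
}
```

On target, the equivalent would be a single write of the mask to the status register, followed by a read-back to confirm the flag is gone before triggering the software reset.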

The 2-bit FEE error test works correctly because it triggers ESM Group 1 channel 36 (FMC uncorrectable error). The Diagnostic Uncorrectable Error Flag (D_UNC_ERR, bit 12) is set in FEDACSTATUS [3], but the handling differs from correctable errors, allowing proper reset and repetition.

The SRAM 1-bit error tests correctly trigger ESM Group 1 exceptions:

Channel 26: B0TCM (even bank) correctable ECC error.
Channel 28: B1TCM (odd bank) correctable ECC error.

These are handled as ESM interrupts and can be properly cleared and repeated.
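
To relate the channel numbers in this thread to status bits, here is a small sketch. It assumes the Group 1 status is split into two 32-bit words, channels 0 to 31 in the first and channels 32 to 63 in the second (in the HALCoGen headers these appear to be esmREG->SR1[0] and esmREG->SR4[0], but please verify that). esm_g1_pos_t and esm_g1_locate are hypothetical names.

```c
#include <stdint.h>

/* Position of an ESM Group 1 channel inside the status words. */
typedef struct {
    unsigned word;   /* 0: channels 0-31, 1: channels 32-63 (assumed split) */
    uint32_t mask;   /* single-bit mask inside that word */
} esm_g1_pos_t;

/* Maps a Group 1 channel number (0-63) to its status word and bit mask. */
esm_g1_pos_t esm_g1_locate(unsigned channel)
{
    esm_g1_pos_t pos;
    pos.word = channel / 32u;
    pos.mask = (uint32_t)1u << (channel % 32u);
    return pos;
}
```

With this mapping, channel 35 lands in the second word at bit 3 and channel 36 at bit 4, while channels 26 and 28 stay in the first word.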

SRAM 2-Bit Error Test Issue

The expectation of a Group 2 notification is incorrect, but a data abort is expected. According to the TMS570LS1224 error response table, 2-bit SRAM ECC errors (uncorrectable) trigger Group 3 ESM events, not Group 2.

Error Source	Error Response	ESM Channel
B0 TCM (even) ECC double error	Abort (CPU), ESM => nERROR	3.3
B1 TCM (odd) ECC double error	Abort (CPU), ESM => nERROR	3.5

Expected Behavior: A 2-bit SRAM ECC fault should:

- Generate a CPU data abort exception
- Trigger ESM Group 3 channel 3 or 5 (depending on which bank)
- Assert the nERROR pin directly
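
Since Group 3 flags can only be polled, a decoder for the Group 3 status word can help when checking what happened. The sketch below takes the bit positions for channels 3 and 5 from the table above and bit 7 (flash ATCM) from the comments in the _dabort handler quoted later in this thread; the enum and function names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    G3_NONE,        /* no Group 3 flag set */
    G3_B0TCM,       /* channel 3: B0TCM (even bank) double-bit ECC error */
    G3_B1TCM,       /* channel 5: B1TCM (odd bank) double-bit ECC error */
    G3_FLASH_ATCM,  /* channel 7: uncorrectable flash ECC error on ATCM */
    G3_OTHER        /* some other Group 3 channel */
} esm_g3_source_t;

/* Classifies a value read from the ESM Group 3 status register.
 * If several flags are set, the first match in this order wins. */
esm_g3_source_t esm_g3_source(uint32_t sr3)
{
    if (sr3 & (1u << 3)) return G3_B0TCM;
    if (sr3 & (1u << 5)) return G3_B1TCM;
    if (sr3 & (1u << 7)) return G3_FLASH_ATCM;
    return (sr3 != 0u) ? G3_OTHER : G3_NONE;
}
```

On target, the argument would be the value read from the Group 3 status register (esmREG->SR1[2] in the HALCoGen naming used elsewhere in this thread).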

The brief nERROR pulse you observe suggests the error is being detected, but you may not be seeing the exception because:

- Your data abort handler may not be properly configured
- The ESM Group 3 interrupt handler may not be set up
- The fault occurs in a context where the abort cannot be properly serviced

Recommendations:

For FEE 1-bit test: Clear FEDACSTATUS register before software reset:

// Clear the diagnostic correctable error flag
flashWREG->FEDACSTATUS = 0x8; // Clear D_COR_ERR (bit 3)

// Then perform software reset
systemREG1->SYSECR = 0x0000C000;

For SRAM 2-bit test:
- Implement a data abort handler to catch the CPU abort
- Configure ESM Group 3 channels 3 and 5 handlers
- Verify that ESM Group 3 error handling is enabled in your ESM configuration
- Check that the abort exception vector is properly configured in your startup code

Enable ECC properly: Ensure ECC is enabled in the Flash wrapper by writing '1010' to the EDACEN bits (bits 3-0 of FEDACCTRL1) before enabling ECC in the CPU.

ESM Configuration: The Error Signaling Module monitors all device errors and determines whether an interrupt is generated or the nERROR pin is triggered. Verify that your ESM group configurations match the expected error routing.
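
One way to check the EDACEN setting, as a sketch: read FEDACCTRL1 and compare its low nibble against '1010'. The field position and value are the ones stated above; flash_ecc_enabled and the macro names are hypothetical, and the register value is passed in as a parameter so the check can be tried off target.

```c
#include <stdbool.h>
#include <stdint.h>

#define EDACEN_MASK    0xFu  /* bits 3-0 of FEDACCTRL1 */
#define EDACEN_ENABLED 0xAu  /* '1010' */

/* Returns true when the EDACEN field of the given FEDACCTRL1 value
 * holds the '1010' pattern; all other bits are ignored. */
bool flash_ecc_enabled(uint32_t fedacctrl1)
{
    return (fedacctrl1 & EDACEN_MASK) == EDACEN_ENABLED;
}
```

On target this could be called with the register read (for example flashWREG->FEDACCTRL1, assuming that is the field name in your HALCoGen headers) just before the code that enables ECC checking in the CPU.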

Thanks and Regards,

Ira

0 Jens Benner 4 months ago in reply to Ira Thete

Intellectual 340 points

Hello Ira,

Thank you very much for your prompt reply. I tried to clear D_COR_ERR (bit 3) by calling flashWREG->FEDACSTATUS = 0x8; but without success. Your hint put me on the right track, though: I compared the flashWREG contents before and after calling
b_Result = SL_SelfTest_FEE (FEE_ECC_TEST_MODE_1BIT_FAULT_INJECT, false, &fee_stResult);

Before the call, flashWREG->EESTATUS was 0x0; after running FEE_ECC_TEST_MODE_1BIT_FAULT_INJECT it was 0x0b.

So I cleared EE_D_UNC_ERR by executing flashWREG->EESTATUS &= 0xb; // Clear EE_D_UNC_ERR (Bit 12). After that, nError did not light up after restarting with systemREG1->SYSECR = 0x0000C000; and I can also repeat the FEE_ECC_TEST_MODE_1BIT_FAULT_INJECT test.

But I wonder why EE_D_UNC_ERR ("An uncorrectable error was detected in diagnostic mode 1. This means two or more bits in the data or ECC field have been found in error, or one or more bits in the address have been found in error.") happens with the FEE_ECC_TEST_MODE_1BIT_FAULT_INJECT test. I would expect this with FEE_ECC_TEST_MODE_2BIT_FAULT_INJECT.
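
A quick check of the numbers quoted above, as a sketch: 0x0b has bits 0, 1 and 3 set, and bit 12 (0x1000) is not among them, so the latched flag may not have been EE_D_UNC_ERR. Also, &= 0xb applied to 0x0b leaves the value unchanged arithmetically; if the flags clear anyway, that would be consistent with write-1-to-clear behaviour on the write-back, which is an assumption here and should be checked in the TRM. The helper names are hypothetical.

```c
#include <stdint.h>

/* Returns 1 if the given bit position is set in value, otherwise 0. */
int bit_is_set(uint32_t value, unsigned bit)
{
    return (int)((value >> bit) & 1u);
}

/* Pure arithmetic result of "value &= mask", with no register side effects. */
uint32_t and_mask(uint32_t value, uint32_t mask)
{
    return value & mask;
}
```

For the reported value, bit_is_set(0x0b, 12) is 0 and and_mask(0x0b, 0xb) is still 0x0b.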

In HALCoGen I can configure ESM Group 1 channels 0 to 63, but there is no configuration tab for the Group 2 and Group 3 channels. How do I configure or enable Group 2 channels? VIM channel 0 (ESM High) is always active and mapped to FIQ. It can't be changed because it is grayed out.

"Enable ECC properly: Ensure ECC is enabled in the Flash wrapper by writing '1010' to the EDACEN bits (bits 3-0 of FEDACCTRL1) before enabling ECC in the CPU": how do I check this (EDACEN before ECC in the CPU)?

I have a _dabort handler but it does not get called:

.extern custom_dabort
.weak _dabort
.type _dabort, %function

_dabort:
stmfd r13!, {r0 - r12, lr}@ push registers and link register on to stack

ldr r12, esmsr3 @ ESM Group3 status register
ldr r0, [r12]
tst r0, #0x8 @ check if bit 3 is set, this indicates uncorrectable ECC error on B0TCM
bne ramErrorFound
tst r0, #0x20 @ check if bit 5 is set, this indicates uncorrectable ECC error on B1TCM
bne ramErrorFound2

noRAMerror:
tst r0, #0x80 @ check if bit 7 is set, this indicates uncorrectable ECC error on ATCM
bne flashErrorFound

bl custom_dabort @ custom data abort handler required
@ If this custom handler is written in assembly, all registers used in the routine
@ and the link register must be saved on to the stack upon entry, and restored before
@ return from the routine.

ldmfd r13!, {r0 - r12, lr}@ pop registers and link register from stack
subs pc, lr, #8 @ restore state of CPU when abort occurred, and branch back to instruction that was aborted

ramErrorFound:
ldr r1, ramctrl @ RAM control register for B0TCM TCRAMW
ldr r2, [r1]
tst r2, #0x100 @ check if bit 8 is set in RAMCTRL, this indicates ECC memory write is enabled
beq ramErrorReal
mov r2, #0x20
str r2, [r1, #0x10] @ clear RAM error status register

mov r2, #0x08
str r2, [r12] @ clear ESM group3 channel3 flag for uncorrectable RAM ECC errors
mov r2, #5
str r2, [r12, #0x18] @ The nERROR pin will become inactive once the LTC counter expires

ldmfd r13!, {r0 - r12, lr}
subs pc, lr, #4 @ branch to instruction after the one that caused the abort
@ this is the case because the data abort was caused intentionally
@ and we do not want to cause the same data abort again.

ramErrorFound2:
ldr r1, ram2ctrl @ RAM control register for B1TCM TCRAMW
ldr r2, [r1]
tst r2, #0x100 @ check if bit 8 is set in RAMCTRL, this indicates ECC memory write is enabled
beq ramErrorReal
mov r2, #0x20
str r2, [r1, #0x10] @ clear RAM error status register

mov r2, #0x20
str r2, [r12] @ clear ESM group3 flags channel5 flag for uncorrectable RAM ECC errors
mov r2, #5
str r2, [r12, #0x18] @ The nERROR pin will become inactive once the LTC counter expires

ldmfd r13!, {r0 - r12, lr}
subs pc, lr, #4 @ branch to instruction after the one that caused the abort
@ this is the case because the data abort was caused intentionally
@ and we do not want to cause the same data abort again.

ramErrorReal:
b ramErrorReal @ branch here forever as continuing operation is not recommended

flashErrorFound:
ldr r1, flashbase
ldr r2, [r1, #0x6C] @ read FDIAGCTRL register

mov r2, r2, lsr #16
tst r2, #5 @ check if bits 19:16 are 5, this indicates diagnostic mode is enabled
beq flashErrorReal
mov r2, #1
mov r2, r2, lsl #8

str r2, [r1, #0x1C] @ clear FEDACSTATUS error flag

mov r2, #0x80
str r2, [r12] @ clear ESM group3 flag for uncorrectable flash ECC error
mov r2, #5
str r2, [r12, #0x18] @ The nERROR pin will become inactive once the LTC counter expires

ldmfd r13!, {r0 - r12, lr}
subs pc, lr, #4 @ branch to instruction after the one that caused the abort
@ this is the case because the data abort was caused intentionally
@ and we do not want to cause the same data abort again.

flashErrorReal:
b flashErrorReal @ branch here forever as continuing operation is not recommended

esmsr3: .word 0xFFFFF520
ramctrl: .word 0xFFFFF800
ram2ctrl: .word 0xFFFFF900
ram1errstat: .word 0xFFFFF810
ram2errstat: .word 0xFFFFF910
flashbase: .word 0xFFF87000

Best regards
Jens

0 Jens Benner 4 months ago in reply to Jens Benner

Intellectual 340 points

Hello Ira,

I just want to come back to the RAM ECC 2Bit Errors. I still was not able to get a notification when I execute SL_SelfTest_SRAM(SRAM_ECC_2BIT_FAULT_INJECT, true, &SelfTestResult);

But something must happen since I see the nError LED light up for a short period.

As you said:

For SRAM 2-bit test:

- Implement a data abort handler to catch the CPU abort
-> A data abort handler is implemented
- Configure ESM Group 3 channels 3 and 5 handlers
-> How do I do that?
- Verify that ESM Group 3 error handling is enabled in your ESM configuration
-> I cannot configure ESM Group 3 in HALCoGen since it is grayed out
- Check that the abort exception vector is properly configured in your startup code
-> The abort exception vector is properly configured in my startup code

Any help is appreciated.
Best Regards
Jens

0 jagadish gundavarapu 4 months ago in reply to Jens Benner

TI__Guru* 78176 points

Hi Jens,

We have an internal AI tool that can analyze issues related to this controller against our whole database, and I got some useful suggestions from it. Could you please try them as a first step? If they don't help, I will dig further to resolve this issue.

ESM Group 3 cannot generate CPU interrupts — it is hardwired to assert the nERROR pin only. This is by design: Group 3 errors are classified as critical, non-maskable hardware faults. HALCoGen grays it out precisely because there is nothing to configure in software. You will never receive an ESM callback for a Group 3 event, and you don't need one.

Regarding ESM Group 3 channel 5: the error response table posted earlier in this thread and the bit 5 check in your _dabort handler both map it to B1TCM (odd bank) double-bit errors, so check it alongside channel 3, which covers B0TCM (even bank).

The hardware response chain for a 2-bit ECC error is:

2-bit ECC error detected
│
├──► ESM Group 3, Channel 3 → asserts nERROR pin (LED lights up).
│
└──► Bus Error → CPU Data Abort Exception.

Since your abort handler is implemented but not triggering a visible notification, the most likely causes are:

1. Your Abort Handler Is Not Clearing the Required Flags

The data abort handler for a 2-bit ECC error must perform all of the following steps, or execution will not return cleanly to the self-test function:

void dataAbortHandler(void) {
    // 1. Read DFSR to determine fault type (bits [3:0] and S-bit [10])
    // 2. Read DFAR to get the faulting address (synchronous aborts)

    // 3. Clear the ESM Group 3 error flags in esmREG->SR1[2]
    uint32 group3Flags = esmREG->SR1[2];
    if (group3Flags != 0U) {
        // ESM Group 3 error confirmed: log/flag it here
        esmREG->SR1[2] = group3Flags; // Write 1 to each set flag to clear it
    }

    // 4. Clear RAMERRSTATUS flags (DRDE bit 4 and DWDE bit 3)
    // 5. Return to R14_abt - 8 (the faulting instruction address)
}

After a successful test, RAMERRSTATUS should read 0x18 (DRDE=1, DWDE=1) [6]. If you're not seeing this, the abort handler isn't completing correctly.

2. The MPU May Be Interfering

If the Memory Protection Unit is enabled in your configuration, it can trigger its own data abort when the diagnostic test accesses the L2RAM test region — before the ECC fault even fires. Disable it before calling the self-test:

_mpuDisable_();
retVal = SL_SelfTest_SRAM(SRAM_ECC_2BIT_FAULT_INJECT, true, &SelfTestResult);
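
If the MPU is needed afterwards, it should be restored once the test returns. The sketch below shows that pattern with simulated stand-ins so it can be run off target: mpu_disable_stub, mpu_enable_stub and sram_2bit_test_stub are hypothetical placeholders for _mpuDisable_(), its assumed counterpart _mpuEnable_(), and the SL_SelfTest_SRAM call.

```c
#include <stdbool.h>

static bool mpu_enabled = true;          /* simulated MPU state */
static bool test_saw_mpu_off = false;    /* recorded by the test stub */

/* Hypothetical stand-ins for the real MPU and self-test calls. */
static void mpu_disable_stub(void) { mpu_enabled = false; }
static void mpu_enable_stub(void)  { mpu_enabled = true; }
static bool sram_2bit_test_stub(void)
{
    test_saw_mpu_off = !mpu_enabled;     /* remember the MPU state during the test */
    return true;
}

/* Runs the self-test with the MPU off and restores the previous MPU state. */
bool run_sram_test_without_mpu(void)
{
    bool was_enabled = mpu_enabled;
    mpu_disable_stub();
    bool ok = sram_2bit_test_stub();
    if (was_enabled) {
        mpu_enable_stub();
    }
    return ok;
}

/* Accessors used to inspect the simulated state. */
bool mpu_is_enabled(void)       { return mpu_enabled; }
bool test_ran_with_mpu_off(void) { return test_saw_mpu_off; }
```

Restoring only when the MPU was enabled beforehand keeps the wrapper safe to call from configurations that never enable it.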

3. The Abort Handler May Not Be Returning to the Right Address

The faulting instruction is at R14_ABT − 8. If your handler returns to the wrong address, execution won't resume inside SL_SelfTest_SRAM, and the function will never populate SelfTestResult with a valid result.
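
The two return options can be written down as plain arithmetic. This sketch assumes ARM state (4-byte instructions), where the abort-mode link register points 8 bytes past the faulting instruction; it mirrors the two returns in the _dabort handler posted earlier in this thread (subs pc, lr, #8 to retry, subs pc, lr, #4 to skip). The type and function names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    ABORT_RETRY,  /* re-execute the instruction that aborted */
    ABORT_SKIP    /* continue with the instruction after it */
} abort_action_t;

/* Computes the address to return to from a data abort in ARM state,
 * given the abort-mode link register value. */
uint32_t abort_return_address(uint32_t lr_abt, abort_action_t action)
{
    return (action == ABORT_RETRY) ? (lr_abt - 8u) : (lr_abt - 4u);
}
```

For an intentionally injected fault, retrying the access would hit the same corrupted location again unless it has been repaired first, which is why the posted handler skips to the next instruction in its diagnostic branches.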

Thanks & regards,
Jagadish.
