TMS570LC4357: Cache ECC and ESM Group 3 channel 9

Gael Le Moing

Hi all,

I am trying to understand how the Cache ECC works in the Cortex-R5F, coupled with the TMS570LC and the ESM, in order to be able to properly manage uncorrectable ECC errors occurring in the system.

In the Cortex-R5 documentation, it is written that the cache can be configured as "Do not generate aborts, force write through, enable hardware recovery" (see chapter 8.5 of Cortex-R5 TRM).

When configured this way, uncorrectable ECC errors in cache are silent and data is reloaded from the L2 memory (in fact, these are not uncorrectable errors...?). As the region operates in write through mode, no line can be dirty so no data in cache can be lost.

That would be perfect to make uncorrectable ECC errors in data cache invisible to software and to increase the system availability (note: in our system, ESM nERROR pin is routed to the system validity logic.).

But then I saw that ESM group 3 channel 9 is raised when "data cache data/tag/dirty RAM fatal errors" are signalled by the Cortex-R5 through the Event bus, which would drive the nERROR pin to its fault state (in our system, this makes the system unavailable).

Can anyone confirm this statement or explain to me if the ESM group 3.9 is not triggered if the cache is configured "Do not generate abort, force write-through, enable hardware recovery"? As far as I understand, the event bus of the Cortex-R5 will signal all events, regardless of the configuration for aborts, but I'd like the experts to correct me if I'm wrong.

If the Cortex-R5 signals the event regardless of the cache abort configuration, is there a way to prevent it from being signalled on the event bus (for example, an event bus configuration register)?

I am looking for a configuration of the Cortex-R5 and the TMS570 that makes cache ECC errors silent while the nERROR pin stays connected to our system validity logic. Is there one?

Thanks.

Gael

over 7 years ago

QJ Wang (TI Guru) over 7 years ago

Hello Gael,

When ECC checking is enabled, hardware recovery is always enabled. When a 1-bit ECC error is detected, the corrected data is then reloaded from the L2 memory system. If an uncorrectable error is detected, an abort is always generated because data might have been lost. It is expected that such a situation can be fatal to the software process running.

If one of the force write-through settings is enabled (CEC=b010 or b110), memory behaves as write-through and the cache lines can never be dirty, so the error can always be recovered.
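For reference, the CEC setting mentioned above is a 3-bit field in the Cortex-R5 ACTLR. Below is a minimal sketch of updating that field, assuming CEC occupies ACTLR[5:3] (please verify this against the TRM revision for your core). The helper names are hypothetical. On target, the value would be read and written through CP15 (MRC/MCR p15, 0, <Rt>, c1, c0, 1) rather than passed around as a plain integer.

```c
#include <stdint.h>

/* Assumption: CEC is ACTLR[5:3]; check the Cortex-R5 TRM for your revision. */
#define ACTLR_CEC_SHIFT 3u
#define ACTLR_CEC_MASK  (0x7u << ACTLR_CEC_SHIFT)

/* Return a copy of actlr with the CEC field replaced by the low 3 bits of cec.
 * All other ACTLR bits are left unchanged. */
static inline uint32_t actlr_with_cec(uint32_t actlr, uint32_t cec)
{
    return (actlr & ~ACTLR_CEC_MASK) | ((cec & 0x7u) << ACTLR_CEC_SHIFT);
}

/* Extract the current CEC field from an ACTLR value. */
static inline uint32_t actlr_get_cec(uint32_t actlr)
{
    return (actlr & ACTLR_CEC_MASK) >> ACTLR_CEC_SHIFT;
}
```

A read-modify-write like this keeps the other ACTLR control bits intact. Which CEC encoding to pass in should be taken from the TRM table, not from this sketch.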

If the abort is disabled, the SW is not notified of the errors. The Event is still generated and exported to ESM.

How did you generate the 2-bit uncorrectable cache ECC error (event value 0x60/0x61)? I'd like to run a test on my bench. Thanks.

Gael Le Moing (Expert) over 7 years ago in reply to QJ Wang

Hi QJ.

Sorry, that was not clear to me: so you confirm that even if aborts are disabled, the error is still forwarded to the ESM, which will flag a fatal group 3 error?

I haven't generated a fatal ECC error yet; my assumptions are purely theoretical, based on the documentation. Sooner or later I will need to test the cache ECC feature (I will also have to embed a test in our software).

As far as I can see in the Cortex-R5 documentation, there is an AXI slave access to cache RAM that can be enabled in the ACTLR (bit 24). With that, we may be able to access the cache data RAM and inject new data while ECC is disabled in order to generate 1-bit and 2-bit errors. To be tested.

There is also the DR2B bit in the Secondary Auxiliary Control Register (even if note b indicates that it is available only on cores with parity checking enabled, the description of the bit indicates that it is intended for when ECC checking is enabled). This can help too, but in that case the documentation doesn't indicate whether the error is injected in the data RAM or in the tag or dirty RAM.
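A minimal sketch of that first step (enabling AXI slave access to the cache RAMs), assuming the bit position (24) is as I read it in the TRM. The macro and function names are hypothetical, and the helpers only compute the new register value. On target, the ACTLR would be read with MRC p15, 0, <Rt>, c1, c0, 1 and written back with the matching MCR.

```c
#include <stdint.h>

/* Bit position as read from the Cortex-R5 TRM in this post; verify before use. */
#define ACTLR_AXI_SLAVE_CACHE_RAM_EN (1u << 24)

/* Return actlr with AXI slave access to the cache RAMs enabled. */
static inline uint32_t actlr_enable_axi_cache_ram_access(uint32_t actlr)
{
    return actlr | ACTLR_AXI_SLAVE_CACHE_RAM_EN;
}

/* Return actlr with AXI slave access to the cache RAMs disabled again,
 * e.g. once the error injection is done. */
static inline uint32_t actlr_disable_axi_cache_ram_access(uint32_t actlr)
{
    return actlr & ~ACTLR_AXI_SLAVE_CACHE_RAM_EN;
}

/* Report whether the enable bit is set: 1 if set, 0 otherwise. */
static inline int actlr_axi_cache_ram_access_enabled(uint32_t actlr)
{
    return (actlr & ACTLR_AXI_SLAVE_CACHE_RAM_EN) != 0u;
}
```

The actual injection (which addresses map to the data, tag and dirty RAMs, and how to corrupt one or two bits) is not covered here and still has to be worked out on the bench.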

If you do some testing, I would be interested in knowing what we can or can't do and how you did it.

Thanks,

QJ Wang (TI Guru) over 7 years ago in reply to Gael Le Moing

Gael,

Yes, when the CPU detects an ECC error (1-bit or 2-bit), it signals this error on a dedicated "Event" bus. This "Event" bus is enabled by setting the "X" bit of the PMCR register (ARM). The ECC error event exported onto the Event bus is first captured by the EPC module, which then generates an error signal to the ESM module. Even if the abort is disabled (CEC field of the ACTLR register), the ECC error event is forwarded to the ESM.

I will do a test in the next couple of weeks.
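To make the point concrete, here is a small sketch of the behaviour described above. It assumes the X (export enable) bit is PMCR bit 4, as in the ARMv7 PMU register layout, and models the forwarding rule purely as stated in this thread. The function names are hypothetical. On target, the PMCR is accessed with MRC/MCR p15, 0, <Rt>, c9, c12, 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: X (export enable) is PMCR bit 4. */
#define PMCR_X_EXPORT_ENABLE (1u << 4)

/* Return pmcr with event export onto the Event bus enabled. */
static inline uint32_t pmcr_enable_event_export(uint32_t pmcr)
{
    return pmcr | PMCR_X_EXPORT_ENABLE;
}

/* Model of the rule described in this thread: an ECC error event reaches
 * the ESM (via the EPC) whenever event export is enabled, whether or not
 * the abort is enabled in the CEC field. */
static inline bool ecc_event_reaches_esm(uint32_t pmcr, bool abort_enabled)
{
    (void)abort_enabled; /* the abort setting does not gate the event export */
    return (pmcr & PMCR_X_EXPORT_ENABLE) != 0u;
}
```

In other words, disabling the abort only hides the error from software; by this description it does not stop the event from being exported.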
