TMDX570LC43HDK: Error management

Christopher Garcia

Part Number: TMDX570LC43HDK
Other Parts Discussed in Thread: TMS570LC4357

We are using the TMS570LC4357 microcontroller for space applications with very strict constraints. One of the most complex requirements we must meet is the so-called "last-survivor policy": if an anomaly (hardware or software) is detected during flight, the equipment shall report it immediately and shall continue, as far as possible, to carry out its tasks, data processing, and functionality. To comply with this requirement, we use the ESM module for error detection, but we still have to decide which recovery action to perform. In most cases we simply collect the error data and send it to another subsystem via telemetry, but in some cases we do not know what the correct decision is. For those cases we are considering two alternatives:

Try to continue, as we do for the other error sources, or
Try to perform a recovery action that at least leaves the system in a well-known state
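To make the two alternatives concrete, here is a minimal host-side sketch of the flow we have in mind: every error is queued for telemetry first, and only then is the chosen alternative applied. All names (handle_error, tm_pending, safe_state_pending, the queue depth) are hypothetical and are not part of any TI or HALCoGen API; the "recovery" here is only a flag that the application would act on.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TM_QUEUE_LEN 8u            /* assumed depth of the telemetry queue */

typedef enum {
    ERR_ACTION_CONTINUE,           /* report and carry on, as for other sources */
    ERR_ACTION_RECOVER             /* report, then go to a well-known state */
} err_action_t;

typedef struct {
    uint8_t  group;                /* ESM group (1..3) */
    uint8_t  channel;              /* ESM channel within the group */
    uint32_t fault_addr;           /* fault address if one is available, else 0 */
} err_report_t;

static err_report_t tm_queue[TM_QUEUE_LEN];
static size_t       tm_count;
static bool         safe_state_requested;

/* Queue a report for the telemetry link; the report is dropped if the queue is full. */
static bool tm_enqueue(const err_report_t *report)
{
    if (tm_count >= TM_QUEUE_LEN) {
        return false;
    }
    tm_queue[tm_count++] = *report;
    return true;
}

/* Number of reports waiting to be sent. */
size_t tm_pending(void)
{
    return tm_count;
}

/* True once any error has asked for the well-known state. */
bool safe_state_pending(void)
{
    return safe_state_requested;
}

/* Notify first, then apply the alternative selected for this error source. */
err_action_t handle_error(const err_report_t *report, err_action_t action)
{
    if (report == NULL) {
        return ERR_ACTION_CONTINUE;
    }
    (void)tm_enqueue(report);
    if (action == ERR_ACTION_RECOVER) {
        safe_state_requested = true;
    }
    return action;
}
```

Keeping the notification independent of the chosen action means the report still goes out even when the decision for a given source later changes from "continue" to "recover".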

In particular, we don't know how to proceed with the following error signals:

ACP d-cache invalidate; ESM Group 1; Channel 47
CPU Interconnect Subsystem - Global error; ESM Group 1; Channel 52
CPU Interconnect Subsystem - Global Parity Error; ESM Group 1; Channel 53
Cortex-R5F Core - All fatal bus error events; ESM Group 2; Channel 3
EPC - Uncorrectable Error; ESM Group 2; Channel 21
DMA - ECC uncorrectable error; ESM Group 1; Channel 3
EMIF 64-bit Bridge I/F ECC uncorrectable error; ESM Group 1; Channel 84
L2FMC - parity error; ESM Group 2; Channel 17
L2FMC - double-bit ECC error (error due to implicit OTP reads); ESM Group 2; Channel 19
L2FMC - uncorrectable error; ESM Group 3; Channel 13
L2RAMW - Uncorrectable error Type B; ESM Group 2; Channel 7
L2RAMW - double-bit ECC uncorrectable error; ESM Group 3; Channel 3
L2RAMW - Uncorrectable error Type A; ESM Group 3; Channel 14
L2RAMW - Address/Control parity error; ESM Group 3; Channel 15
VIM RAM - ECC uncorrectable error; ESM Group 1; Channel 15
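For reference, this is roughly how we intend to hold the list above in software: a constant table keyed by ESM group and channel, with the action left undecided until we know how each source should be handled. The group and channel numbers are the ones listed above; the type and function names (esm_entry_t, esm_lookup, esm_table_size) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { ACTION_UNDECIDED, ACTION_CONTINUE, ACTION_RECOVER } action_t;

typedef struct {
    uint8_t     group;    /* ESM group */
    uint8_t     channel;  /* ESM channel within the group */
    const char *source;   /* description, as in the list above */
    action_t    action;   /* still to be decided for every entry */
} esm_entry_t;

static const esm_entry_t esm_table[] = {
    { 1u, 47u, "ACP d-cache invalidate",                             ACTION_UNDECIDED },
    { 1u, 52u, "CPU Interconnect Subsystem - Global error",          ACTION_UNDECIDED },
    { 1u, 53u, "CPU Interconnect Subsystem - Global Parity Error",   ACTION_UNDECIDED },
    { 2u,  3u, "Cortex-R5F Core - All fatal bus error events",       ACTION_UNDECIDED },
    { 2u, 21u, "EPC - Uncorrectable Error",                          ACTION_UNDECIDED },
    { 1u,  3u, "DMA - ECC uncorrectable error",                      ACTION_UNDECIDED },
    { 1u, 84u, "EMIF 64-bit Bridge I/F ECC uncorrectable error",     ACTION_UNDECIDED },
    { 2u, 17u, "L2FMC - parity error",                               ACTION_UNDECIDED },
    { 2u, 19u, "L2FMC - double-bit ECC error (implicit OTP reads)",  ACTION_UNDECIDED },
    { 3u, 13u, "L2FMC - uncorrectable error",                        ACTION_UNDECIDED },
    { 2u,  7u, "L2RAMW - Uncorrectable error Type B",                ACTION_UNDECIDED },
    { 3u,  3u, "L2RAMW - double-bit ECC uncorrectable error",        ACTION_UNDECIDED },
    { 3u, 14u, "L2RAMW - Uncorrectable error Type A",                ACTION_UNDECIDED },
    { 3u, 15u, "L2RAMW - Address/Control parity error",              ACTION_UNDECIDED },
    { 1u, 15u, "VIM RAM - ECC uncorrectable error",                  ACTION_UNDECIDED }
};

/* Number of entries in the table. */
size_t esm_table_size(void)
{
    return sizeof esm_table / sizeof esm_table[0];
}

/* Return the entry for (group, channel), or NULL if the source is not listed. */
const esm_entry_t *esm_lookup(uint8_t group, uint8_t channel)
{
    size_t i;
    for (i = 0u; i < esm_table_size(); i++) {
        if (esm_table[i].group == group && esm_table[i].channel == channel) {
            return &esm_table[i];
        }
    }
    return NULL;
}
```

A linear search is enough for fifteen entries, and a NULL result lets the caller fall back to the default "report and continue" handling used for the other sources.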

Once an error has been triggered, can we retry the same operation and expect it to succeed (especially for cache and transaction errors), or is the error permanent? Can we write a known data pattern (for example 0x0000) to the faulting data address, so that the next access to that address returns a valid value? Is there any other alternative?
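The control flow we are asking about could be sketched as below: retry the access a bounded number of times and, if it keeps failing, overwrite the location with a known pattern and read it back. This is a host-side model only. The sim_word_t type, sim_read, sim_write, and the retry limit are hypothetical stand-ins, and the assumption that a rewrite clears the fault is exactly what we are unsure about on the real device.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_RETRIES   3u            /* assumed retry budget */
#define SCRUB_PATTERN 0x00000000u   /* known pattern written on give-up */

typedef enum {
    ACCESS_OK,               /* first read succeeded */
    ACCESS_OK_AFTER_RETRY,   /* a retry succeeded, data preserved */
    ACCESS_SCRUBBED,         /* location rewritten, original data lost */
    ACCESS_FAILED            /* still failing after the rewrite */
} access_result_t;

/* Host-side stand-in for one protected memory word. */
typedef struct {
    uint32_t value;
    unsigned transient_faults;  /* number of upcoming reads that will fail */
    bool     corrupted;         /* stored data is bad until rewritten */
} sim_word_t;

/* Simulated read: fails while the word is corrupted or a transient fault remains. */
static bool sim_read(sim_word_t *w, uint32_t *out)
{
    if (w->corrupted) {
        return false;
    }
    if (w->transient_faults > 0u) {
        w->transient_faults--;
        return false;
    }
    *out = w->value;
    return true;
}

/* Simulated write. Model assumption: a full-word write clears the corruption. */
static void sim_write(sim_word_t *w, uint32_t v)
{
    w->value = v;
    w->corrupted = false;
}

/* Read, retry up to MAX_RETRIES times, then scrub with SCRUB_PATTERN and read back. */
access_result_t read_with_recovery(sim_word_t *w, uint32_t *out)
{
    unsigned attempt;

    if (sim_read(w, out)) {
        return ACCESS_OK;
    }
    for (attempt = 0u; attempt < MAX_RETRIES; attempt++) {
        if (sim_read(w, out)) {
            return ACCESS_OK_AFTER_RETRY;
        }
    }
    sim_write(w, SCRUB_PATTERN);
    if (sim_read(w, out)) {
        return ACCESS_SCRUBBED;
    }
    return ACCESS_FAILED;
}
```

Note that in the scrubbed case the caller gets the pattern, not the original data, so the application would still have to decide whether a zeroed value is acceptable at that address.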

Thanks in advance,

Christopher García Ambrozaitis

over 4 years ago