AM6422: Injecting a DED error into the A53 L2 data cache causes an "Unhandled Exception in EL3" and the system stops running

Part Number: AM6422

Hello, experts...

I followed your suggestion to inject a DED (double-bit error detection) error into the L2 data cache as a functional safety self-test. The steps are:

1. Set A53SS ECC aggregator in force_n_row mode on vectors of L2 Data Cache.

2. Read or write data from external memory (DDR) to trigger an event.

However, as soon as I enable that mode in step 1, even without performing step 2, an "Unhandled Exception in EL3" message appears and the system stops running.

Is this EL3 exception expected behavior? If it is unavoidable, how can I bring the system back to normal operation afterwards?

  • I suspect this is expected behavior. A double-bit error in the A53 L2 cache is not recoverable; it is detect-only. The L2 cache is one shared, monolithic resource. Double-bit error detection can only report that an error occurred; there is no mechanism to roll back to the moment before the error was detected.

      Pekka

  • Here is a summary from Arm:

    Asynchronous external aborts are typically fatal, as the memory location which caused the abort is unknown. That is, the memory system now contains a corrupted memory location that can be consumed at a later point. 

    That being said, if the asynchronous external abort can be attributed to a 2-bit ECC error in the L2 data RAM, software could in theory combine the information from L2MERRSR_EL1 with the corresponding L2 tag data to calculate the physical address of the corrupted memory location. If this physical address can be correlated to an application, then software could terminate that particular application.

    If no correlation can be made, then the state of the system is no longer valid and the error is likely considered fatal. 

    What "fatal" means is specific to your end application; it may mean a system reset.
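
As a rough illustration of the first step Arm describes (checking whether the abort points at a 2-bit error in the L2 data RAM), here is a minimal sketch in C that decodes a raw L2MERRSR_EL1 value. The bit positions (Fatal at bit 63, Valid at bit 31, RAMID in bits 30:24, the two 8-bit error counters) and the RAMID value 0x11 for the L2 data RAM are assumptions based on the Cortex-A53 TRM as I recall it; please verify them against the TRM revision for your core. The type and function names are hypothetical. Reading the register itself (an MRS from a sufficiently privileged exception level) and the tag lookup needed to rebuild the physical address are not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED L2MERRSR_EL1 layout (check the Cortex-A53 TRM before relying on it). */
#define L2MERRSR_FATAL_BIT      63u   /* set on an uncorrectable (2-bit) error */
#define L2MERRSR_OTHER_SHIFT    40u   /* [47:40] other error count */
#define L2MERRSR_REPEAT_SHIFT   32u   /* [39:32] repeat error count */
#define L2MERRSR_VALID_BIT      31u   /* register holds a recorded error */
#define L2MERRSR_RAMID_SHIFT    24u   /* [30:24] RAM identifier */
#define L2MERRSR_RAMID_MASK     0x7Fu
#define L2MERRSR_RAMID_L2_DATA  0x11u /* ASSUMED encoding for L2 data RAM */

/* Hypothetical container for the decoded fields. */
typedef struct {
    bool    valid;
    bool    fatal;
    uint8_t ramid;
    uint8_t repeat_count;
    uint8_t other_count;
} l2_err_info;

/* Hypothetical classification used to pick a recovery policy. */
typedef enum {
    L2_ERR_NONE,           /* nothing recorded */
    L2_ERR_NONFATAL,       /* recorded, but not flagged fatal */
    L2_ERR_FATAL_L2_DATA,  /* 2-bit error in L2 data RAM: address lookup may be possible */
    L2_ERR_FATAL_OTHER     /* fatal error in another RAM: treat system state as invalid */
} l2_err_class;

/* Split a raw 64-bit register value into its fields. */
l2_err_info l2merrsr_decode(uint64_t raw)
{
    l2_err_info e;
    e.valid        = ((raw >> L2MERRSR_VALID_BIT) & 1u) != 0u;
    e.fatal        = ((raw >> L2MERRSR_FATAL_BIT) & 1u) != 0u;
    e.ramid        = (uint8_t)((raw >> L2MERRSR_RAMID_SHIFT) & L2MERRSR_RAMID_MASK);
    e.repeat_count = (uint8_t)((raw >> L2MERRSR_REPEAT_SHIFT) & 0xFFu);
    e.other_count  = (uint8_t)((raw >> L2MERRSR_OTHER_SHIFT) & 0xFFu);
    return e;
}

/* Decide which branch of the summary above applies. */
l2_err_class l2merrsr_classify(const l2_err_info *e)
{
    if (!e->valid) {
        return L2_ERR_NONE;
    }
    if (!e->fatal) {
        return L2_ERR_NONFATAL;
    }
    return (e->ramid == L2MERRSR_RAMID_L2_DATA) ? L2_ERR_FATAL_L2_DATA
                                                : L2_ERR_FATAL_OTHER;
}
```

Only the L2_ERR_FATAL_L2_DATA case leaves room for the address correlation described above; in every other fatal case, and whenever that correlation fails, the application-defined fatal handling (for example a system reset) would apply.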

      Pekka