TDA4AH-Q1: About "SError" CPU exception

Part Number: TDA4AH-Q1
Other Parts Discussed in Thread: TDA4VH

When running our proprietary OS on TDA4AH/TDA4VH, a CPU exception occurs frequently.
The Exception Syndrome Register (ESR) value is 0xBF000002, from which I understand that the cause of the CPU exception is an SError.
AArch64 ESR decoder

In the case of an SError, the information in the ISS field of the ESR is IMPLEMENTATION DEFINED.
According to the Technical Reference Manual for the Cortex-A72, the contents of the ISS field indicate a "Slave error".
ARM Cortex-A72 MPCore Processor Technical Reference Manual r0p3
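
For reference, the field split behind this reading can be sketched as below. This is only a minimal illustration: the EC/IL/ISS positions are the architectural ESR_ELx layout, while the mapping of ISS[1:0] to an error source is an assumption based on my reading of the Cortex-A72 TRM, and the function name `decode_esr` is hypothetical.

```python
# Minimal sketch: split an AArch64 ESR_ELx value into its fields.
EC_SERROR = 0x2F  # architectural exception class for an SError interrupt

# ASSUMPTION: meaning of ISS[1:0] when IDS=1, taken from a reading of the
# Cortex-A72 TRM; verify against the TRM revision for your silicon.
A72_SERROR_SOURCE = {
    0b00: "Decode error",
    0b01: "ECC error",
    0b10: "Slave error",
}


def decode_esr(esr):
    """Return the EC, IL and ISS fields, plus SError details if applicable."""
    ec = (esr >> 26) & 0x3F   # ESR[31:26] exception class
    il = (esr >> 25) & 0x1    # ESR[25] instruction length
    iss = esr & 0x1FFFFFF     # ESR[24:0] instruction specific syndrome
    fields = {"EC": ec, "IL": il, "ISS": iss}
    if ec == EC_SERROR:
        ids = (iss >> 24) & 0x1  # ISS[24]: implementation defined syndrome
        fields["IDS"] = ids
        if ids:
            fields["source"] = A72_SERROR_SOURCE.get(iss & 0x3, "Reserved")
    return fields


print(decode_esr(0xBF000002))
```

With the value above, EC is 0x2F (SError), IDS is set, and the low two bits of the ISS are 0b10, which is where the "Slave error" reading comes from.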


[Questions]
1) When the "SError" CPU exception occurs, is there any IMPLEMENTATION DEFINED information that TDA4AH/TDA4VH records in the ISS field of the ESR?
2) If the answer to 1) is "No", the information in the ISS field is considered to conform to the Cortex-A72 specification (i.e. a Slave error has occurred).
    - Which devices or peripherals does "Slave" refer to?
    - Are there any registers that should be checked when investigating the Slave error?

Thanks.

  • Hello, 

    Thanks for your question. We are assigning this to our expert.

    Regards,
    Sarabesh S.

  • Hello,

    Scanning the information in this thread, I suspect the decoder you are referencing does not have enough information to give a proper decoding.  The decoder says that bit 24 is IDS (implementation-defined syndrome); however, the A72 defines only a couple of implementation-specific errors around instructions, and these do not map to what was provided.  More context is needed to fully decode an ESR.  A different decoder I tried treats bit 24 as ISV (instruction syndrome valid).  Even in that case I don't think the decoder has enough information, but I think it's a better guess than what the web decoder gave.  Looking at it, I'd tend to guess the L2 memory system tables might have an issue.

     

    I often run Linux and QNX on TDA4VH and I rarely see this error.  I'd guess something in your MMU or memory setup is not per specification, or, if it's a custom board, perhaps there is some kind of voltage issue.

    I would suggest running something like ETM hardware processor trace and looking at the activity around the abort.  If the address range is systematic, then turning the PMU counters for memory errors on and off would provide another level of clues.

    Note that you should ENSURE your L2 cache memory timings are adjusted from their defaults.  If your custom OS did not do this, I can see it being a possible root cause (along with some memory attribute misconfiguration).  I recall a customer using a custom OS who was missing critical patches in this area.  You should ensure you are setting the L2 data RAM timings to 4 (not the default of 2).

    https://review.trustedfirmware.org/plugins/gitiles/TF-A/trusted-firmware-a/+/refs/heads/master/plat/ti/k3/common/k3_helpers.S#108

    Also I would strongly recommend enabling ECC for each cluster.

    https://review.trustedfirmware.org/plugins/gitiles/TF-A/trusted-firmware-a/+/refs/heads/master/plat/ti/k3/common/k3_helpers.S#117

    A quick way to test is using a JTAG debugger like TRACE32.
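
If the L2 control register (L2CTLR_EL1) value is read out through the debugger, the two settings above can be sanity-checked with something like the sketch below. The bit positions are assumptions taken from my reading of the Cortex-A72 TRM and the TF-A `cortex_a72.h` definitions (data RAM latency in bits [2:0] with encoding 0b011 meaning 4 cycles, data inline ECC enable in bit 20, ECC and parity enable in bit 21); please verify them against the TRM before relying on the result. The helper name `check_l2ctlr` is hypothetical.

```python
# Minimal sketch: inspect a raw L2CTLR_EL1 value read from a Cortex-A72 cluster.
# ASSUMPTION: all field positions and encodings below come from a reading of
# the Cortex-A72 TRM / TF-A headers and should be double-checked.
DATA_RAM_LATENCY_MASK = 0x7          # L2CTLR_EL1[2:0]
DATA_RAM_LATENCY_4_CYCLES = 0b011    # assumed encoding for 4 cycles
DATA_INLINE_ECC_ENABLE = 1 << 20     # assumed bit 20
ECC_AND_PARITY_ENABLE = 1 << 21      # assumed bit 21


def check_l2ctlr(value):
    """Report whether the latency and ECC settings match the recommendation."""
    latency_field = value & DATA_RAM_LATENCY_MASK
    return {
        "data_ram_latency_field": latency_field,
        "data_ram_latency_is_4_cycles": latency_field == DATA_RAM_LATENCY_4_CYCLES,
        "data_inline_ecc_enabled": bool(value & DATA_INLINE_ECC_ENABLE),
        "ecc_and_parity_enabled": bool(value & ECC_AND_PARITY_ENABLE),
    }


# Example with a made-up register value that has all three settings applied.
print(check_l2ctlr(ECC_AND_PARITY_ENABLE | DATA_INLINE_ECC_ENABLE | 0b011))
```

Note that under these assumptions a latency of "4" refers to 4 cycles, which is written as the field value 0b011, not the literal value 4.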

    Regards,
    Richard W.