TMS570LC4357: ESM errors

Alex Spivak

Part Number: TMS570LC4357
Other Parts Discussed in Thread: TMS570LS3137, HALCOGEN

Hello,

I'm using the TMS570LC43 HDK board with software that I ported from the TMS570LS3137, and I am getting some ESM errors on power-up:

1. ESM Group 1 bit 47 - ACP d-cache invalid.

2. ESM Group 1 bit 52 - CPU Interconnect Subsystem-Global Error appears sometimes after power-up (intermittent).

3. After I connect with the emulator (XDS560V2 STM USB), I get ESM Group 1 bit 31 and Group 2 bit 2 (CCM-R5F self-test and CCM-R5F CPU compare error). I do not reset the processor on connect, but the error appears as soon as the debugger connects. I transmit the ESM status registers over CAN and monitor them; there I see errors (1) and (2) but not this one, while the ESM registers viewed in the debugger do show it, so I know it is raised on connect.

After those errors are cleared in the ESM High Interrupt routine, the code keeps running with no errors.

Any suggestions as to why I get those errors and how I can avoid them?

Thanks,

Alex

over 6 years ago

0 QJ Wang over 6 years ago

TI__Guru**** 200256 points

Hi Alex,

The TMS570LC4357 is an ARM Cortex-R5F device with D-cache and I-cache, whereas the TMS570LS3137 is an ARM Cortex-R4F device without cache.

On the TMS570LC4357, the L1 caches, L2 flash, SRAM, and peripheral memories are ECC-protected.

On the TMS570LS3137, the flash and SRAM are ECC-protected, and the peripheral memories are parity-protected.

The startup code (sys_startup.c) contains workarounds for several errata. When porting code from LS31x to LC43x, please take the differences between the two devices into account.

Please refer to the HL_sys_startup.c generated through HALCOGen, and the HL_sys_startup.c used in Safety Diagnostic Lib example.

0 Alex Spivak over 6 years ago in reply to QJ Wang

Intellectual 930 points

Hi Wang,

OK, your solution resolves issues (1) and (2): on startup, the esmInit() function is called to clear all the ESM status registers, and I don't observe those errors anymore. However, issue (3) is not part of startup. I should be able to connect with the debugger without resetting, so that I can continue running from wherever the debugger pauses, with no errors. Any ideas on how to resolve that?
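To make the channel numbers in this thread concrete, here is a minimal sketch of how a channel such as 47 or 52 maps onto a 32-bit status word, and of what clearing all flags at startup amounts to. It assumes 32 channels per status word and write-1-to-clear behaviour. The `esm_group_status_t` type and the `esm_*` helpers are hypothetical names for illustration, not the HALCoGen API, and the registers are simulated in plain memory rather than memory-mapped.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one ESM group's status flags: two 32-bit words
 * covering channels 0-31 and 32-63. Real hardware uses memory-mapped
 * registers; this layout is an assumption for illustration only. */
typedef struct {
    uint32_t status[2];
} esm_group_status_t;

/* Index of the word holding a channel's flag (0 for 0-31, 1 for 32-63). */
static unsigned esm_word_index(unsigned channel)
{
    assert(channel < 64u);
    return channel / 32u;
}

/* Single-bit mask for a channel inside its status word. */
static uint32_t esm_bit_mask(unsigned channel)
{
    assert(channel < 64u);
    return (uint32_t)1u << (channel % 32u);
}

/* Returns nonzero if the channel's flag is currently set. */
static int esm_is_set(const esm_group_status_t *g, unsigned channel)
{
    return (g->status[esm_word_index(channel)] & esm_bit_mask(channel)) != 0u;
}

/* Emulates a write-1-to-clear register write: every bit set in `value`
 * is cleared, every other bit is left unchanged. */
static void esm_w1c_write(esm_group_status_t *g, unsigned word, uint32_t value)
{
    assert(word < 2u);
    g->status[word] &= ~value;
}

/* Clears every pending flag in the group, as a startup init would. */
static void esm_clear_all(esm_group_status_t *g)
{
    esm_w1c_write(g, 0u, 0xFFFFFFFFu);
    esm_w1c_write(g, 1u, 0xFFFFFFFFu);
}
```

Under these assumptions, channel 47 lands in the second word at bit 15 and channel 52 at bit 20. Clearing at startup hides flags raised during power-up, but does nothing for a flag raised later, such as the one seen on debugger connect.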

0 QJ Wang over 6 years ago in reply to Alex Spivak

TI__Guru**** 200256 points

Hello Alex,

Issue #3 is explained in the device errata:

DEVICE#56: nERROR assertion on debugger connect

Severity: 4-Low
Expected Behavior: No errors should be detected when connecting to the device by JTAG
Issue: Sometimes a CPU compare error (ESM Group 2 channel 2) is generated when the debugger connects to device.
Condition: Upon a debugger initially connecting to the device.
Implication(s): The nError pin will toggle upon initial connection with the debugger.
Workaround(s): Clear the nERROR by writing 0x5 to the ESMEKR key register in the ESM module and ignore the nERROR pin toggle which happens immediately upon the debugger connecting.
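The workaround in the errata comes down to a single register write. The sketch below shows that write against a simulated register block so it can be compiled and run on a host. The `esm_regs_t` type and `esm_release_nerror()` are hypothetical names; only the key value 0x5 and the ESMEKR register come from the errata text. On the target the pointer would refer to the real ESM register block.

```c
#include <assert.h>
#include <stdint.h>

/* Key value named in the errata workaround quoted above. */
#define ESM_EKR_RESET_KEY 0x5u

/* Hypothetical stand-in for the ESM register block. Only the error key
 * register (ESMEKR) is modeled; the real block has many more registers. */
typedef struct {
    volatile uint32_t EKR;
} esm_regs_t;

/* Writes the reset key to the error key register, which per the errata
 * clears the nERROR indication. */
static void esm_release_nerror(esm_regs_t *esm)
{
    assert(esm != 0);
    esm->EKR = ESM_EKR_RESET_KEY;
}
```

The HALCoGen-generated ESM driver may already provide an equivalent helper, so it is worth checking the generated ESM source before adding a new one.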

0 Alex Spivak over 6 years ago in reply to QJ Wang

Intellectual 930 points

Hi Wang,

I see now that it is in the errata. Our hardware has a jumper to disable the external watchdog when we debug. We will now need to modify the hardware and add a jumper that stops the nERROR pin from resetting the processor when we debug. Not convenient, but manageable.

Thanks,

Alex

0 Sunil Oak over 6 years ago in reply to Alex Spivak

TI__Mastermind 49120 points

Alex,

You should also not see this error on connect if the entire SRAM is initialized using the built-in hardware memory initialization method before any RAM access is made (even any stack access).

Also from a functional safety perspective, it is not recommended to wire nERROR to a reset. nERROR is driven active during several diagnostic checks and this should not result in a reset of the processor.

Regards, Sunil

0 Alex Spivak over 6 years ago in reply to Sunil Oak

Intellectual 930 points

Hi Sunil,

The SRAM is initialized as the first step in the _c_int00() function. I send out the ESM registers on CAN and there is no problem on power-up. The problem appears only after I connect with the emulator without resetting. It looks like the JTAG connection disturbs the CPU compare function and drives nERROR active.

So we connected nERROR to the reset line through a timing filter and a power-up delay. That way the power-up tests do not cause a reset. However, since an active nERROR indicates a critical processor failure, we assumed that attempting a hard reset of the processor to resolve it may be a good idea. What would you suggest we do with the nERROR pin instead?

Thanks,

Alex

0 Sunil Oak over 6 years ago in reply to Alex Spivak

TI__Mastermind 49120 points

Hi Alex,

If you halt the CPU in debug mode after the CCM self-tests are completed (start them quite early after power-up) then there should be no cause for nERROR, although there will be no CPU compare function after that point.

As for handling nERROR, this depends on the end equipment and the safety function being implemented. Errors detected on-chip that are connected to Error Signaling Module groups 2 and 3 are indicated on nERROR by default. The application can choose selected ESM group 1 errors to also assert nERROR.

One such ESM group 3 error is a double-bit ECC error on a read-modify-write access to the CPU SRAM. A group 3 error is not signaled to the CPU (no interrupt) and is only signaled on nERROR. The actual system response to a condition where one of the CPU-accessed SRAM locations has a double-bit ECC error is entirely up to the system designer. This error could be due to a permanent or a transient fault. One response could be to force the device to reset (your current approach), review the ESM status registers to identify an SRAM fault, and run the start-up checks again, including the CPU SRAM self-test (PBIST). The PBIST would fail if there is a permanent fault in the SRAM, and pass if the fault was transient. In case of a permanent fault, the application would be stuck in this reset -> nERROR loop, which is probably not a safe state for the system you are working on.
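The reset-then-diagnose response described above can be sketched as a small decision function. This is only an illustration under stated assumptions: `boot_action_t` and `decide_boot_action()` are hypothetical names, the group 3 status is passed in as a plain value read after reset, and the PBIST outcome is reduced to a pass/fail flag. What "safe state" means is left to the system designer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Possible outcomes of the post-reset check (hypothetical names). */
typedef enum {
    BOOT_NORMAL,     /* no group 3 error recorded, self-test passed      */
    BOOT_RECOVERED,  /* group 3 error recorded, self-test passed:
                        treat the fault as transient and continue        */
    BOOT_SAFE_STATE  /* self-test failed: likely permanent fault, so do
                        not keep cycling through reset                   */
} boot_action_t;

/* esm_group3_status:    group 3 status value read after the reset.
 * sram_selftest_passed: result of the SRAM self-test run at startup.   */
static boot_action_t decide_boot_action(uint32_t esm_group3_status,
                                        bool sram_selftest_passed)
{
    if (!sram_selftest_passed) {
        /* A failing self-test points to a permanent fault; another
         * reset would only re-enter the reset -> nERROR loop. */
        return BOOT_SAFE_STATE;
    }
    if (esm_group3_status != 0u) {
        /* An error was recorded before the reset but the memory now
         * tests good, which suggests a transient fault. */
        return BOOT_RECOVERED;
    }
    return BOOT_NORMAL;
}
```

Separating the decision from the register access keeps the policy testable on a host, and makes the permanent-fault branch explicit instead of leaving it to an endless reset loop.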

Do you route the nERROR signal back to nPORRST or nRST? Asserting nPORRST resets most of the error status registers (the ESM group 3 status register is not reset unless there is a real power down/up cycle), so the application won't be able to identify the cause of the previous nERROR.

Hope this helps.

Regards, Sunil

0 Alex Spivak over 6 years ago in reply to Sunil Oak

Intellectual 930 points

Hi Sunil,

I have all CCM self-tests running on startup. They do not seem to be related to the nERROR I get when connecting with the debugger. Interestingly, on the TMS570LS31x the CCM-R4F self-test would fail when running with the debugger, but on the TMS570LC4357 all the self-tests pass with the debugger connected (after resetting with the debugger).

We route the nERROR signal to nPORRST, and the device that routes it has a register that we can read to see whether nERROR occurred. This device also resets the processor up to 3 times and then holds the processor in reset until the next power cycle.

I guess that, since there is a documented erratum on this issue, we can live with it, as we will need to update our hardware anyway.
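The bounded-retry policy described here (reset up to 3 times, then hold in reset) can be modeled in a few lines. This is a sketch of the policy only, not of the actual supervisor device: `reset_supervisor_t`, `supervisor_on_nerror()`, and `MAX_RESET_ATTEMPTS` are hypothetical names, and the state is assumed to be cleared only by a power cycle.

```c
#include <stdbool.h>

/* Reset budget taken from the description above. */
#define MAX_RESET_ATTEMPTS 3u

/* Hypothetical supervisor state, assumed to be cleared by a power cycle. */
typedef struct {
    unsigned attempts;      /* resets issued since the last power cycle */
    bool     hold_in_reset; /* latched once the budget is exhausted     */
} reset_supervisor_t;

/* Called for each nERROR event. Returns true if a reset pulse should be
 * issued. Once the budget is used up, latches hold_in_reset and returns
 * false for this and every later event. */
static bool supervisor_on_nerror(reset_supervisor_t *s)
{
    if (s->hold_in_reset) {
        return false;
    }
    if (s->attempts < MAX_RESET_ATTEMPTS) {
        s->attempts++;
        return true;
    }
    s->hold_in_reset = true;
    return false;
}
```

Capping the retries is what breaks the reset -> nERROR loop for a permanent fault: the fourth event latches the hold state instead of issuing another reset.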

Thanks,

Alex
