TMS570LS3137: ESM interruption

liu peng

Hello, our chip triggered an ESM interrupt during operation. The trigger source is the RAM even bank (B0TCM). The safety manual requires the system to be shut down when the memory self-check reports an anomaly. Are there any hardware or software solutions that can avoid or reduce such memory self-check anomalies? What are the possible causes of these anomalies?

9 months ago

jagadish gundavarapu 9 months ago

TI__Guru* 77156 points

Hi Liu,

Can you please clear the ESM error once and check whether it occurs again after clearing? I mean, is it a one-time error or a repeating one?

You can clear it by following the E2E FAQ below:

[FAQ] TMS570LC4357: How to set ESM error pin to High (TI E2E support forums)
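
For reference, clearing usually comes down to two register writes: clear the channel's flag in the ESM group-1 status register (a write-1-to-clear register) and write the key 0x5 to the ESM error key register to request release of the nERROR pin. The sketch below shows that sequence against a host-side model of the registers so it can run off-target. The struct layout, the helper names, and the write-1-to-clear emulation are assumptions for illustration; check register names and key values against the TMS570LS3137 technical reference manual before porting.

```c
#include <stdint.h>

/* Host-side model of the two ESM registers involved (ASSUMED layout,
 * for illustration only; on the device these are memory-mapped). */
typedef struct {
    uint32_t sr1; /* group 1 status, channels 0..31, write-1-to-clear */
    uint32_t ekr; /* error key register */
} esm_model_t;

/* Key value that requests an nERROR pin reset (ASSUMED; verify in the TRM). */
#define ESM_EKR_RESET_ERROR_PIN 0x5u

/* Emulates write-1-to-clear: flags whose bit is set in 'mask' are
 * cleared, every other flag is left as it was. On target this would be
 * a plain store of 'mask' to the status register. */
static void esm_sr1_write(esm_model_t *esm, uint32_t mask)
{
    esm->sr1 &= ~mask;
}

/* Clear one group-1 channel flag, then ask the ESM to release nERROR.
 * Returns 0 on success, -1 if the channel is outside the modeled range. */
int esm_clear_group1_channel(esm_model_t *esm, unsigned channel)
{
    if (channel >= 32u) {
        return -1; /* channels above 31 are not modeled here */
    }
    esm_sr1_write(esm, 1u << channel);
    esm->ekr = ESM_EKR_RESET_ERROR_PIN;
    return 0;
}
```

Reading the status register again after clearing shows whether the flag comes back, which answers the one-time vs. repeating question.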

--
Thanks & regards,
Jagadish.

liu peng 9 months ago in reply to jagadish gundavarapu

Prodigy 70 points

Because this fault occurred while the product was operating in the field, we are not allowed to perform a clear operation. However, the board operates normally after a restart, so the error most likely occurred only once. Now we want to know how to avoid this problem.

jagadish gundavarapu 9 months ago in reply to liu peng

Hi Liu,

We have not come across this issue before.

Which ESM error are you getting: a RAM single-bit error or a double-bit error?
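
One way to answer this from firmware is to snapshot the ESM status registers and map the RAM ECC channels to an error type. The sketch below assumes the TMS570LS3137 ESM channel assignment as we read it from the datasheet (group 1 channels 26 and 28 for B0TCM and B1TCM correctable errors, group 3 channels 3 and 5 for the uncorrectable ones). These channel numbers, and all names in the code, are assumptions to be checked against the device datasheet.

```c
#include <stdint.h>

/* ESM channel numbers ASSUMED from the TMS570LS3137 datasheet's ESM
 * channel assignment table; verify before use. */
#define ESM_G1_B0TCM_CORRECTABLE   26u /* even bank, single-bit (corrected) */
#define ESM_G1_B1TCM_CORRECTABLE   28u /* odd bank, single-bit (corrected)  */
#define ESM_G3_B0TCM_UNCORRECTABLE 3u  /* even bank, double-bit             */
#define ESM_G3_B1TCM_UNCORRECTABLE 5u  /* odd bank, double-bit              */

typedef enum {
    RAM_ECC_NONE = 0,
    RAM_ECC_SINGLE_BIT,
    RAM_ECC_DOUBLE_BIT
} ram_ecc_kind_t;

typedef struct {
    ram_ecc_kind_t kind;
    int even_bank; /* nonzero if B0TCM is flagged for the reported kind */
    int odd_bank;  /* nonzero if B1TCM is flagged for the reported kind */
} ram_ecc_event_t;

static int esm_flag(uint32_t status, unsigned channel)
{
    return (int)((status >> channel) & 1u);
}

/* Classify a RAM ECC event from snapshots of the group-1 and group-3
 * status words. Uncorrectable (double-bit) errors take priority. */
ram_ecc_event_t classify_ram_ecc(uint32_t g1_status, uint32_t g3_status)
{
    ram_ecc_event_t ev = { RAM_ECC_NONE, 0, 0 };
    int even = esm_flag(g3_status, ESM_G3_B0TCM_UNCORRECTABLE);
    int odd  = esm_flag(g3_status, ESM_G3_B1TCM_UNCORRECTABLE);

    if (even || odd) {
        ev.kind = RAM_ECC_DOUBLE_BIT;
    } else {
        even = esm_flag(g1_status, ESM_G1_B0TCM_CORRECTABLE);
        odd  = esm_flag(g1_status, ESM_G1_B1TCM_CORRECTABLE);
        if (even || odd) {
            ev.kind = RAM_ECC_SINGLE_BIT;
        }
    }
    ev.even_bank = even;
    ev.odd_bank = odd;
    return ev;
}
```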

We also need to check the corresponding error address register (single-bit or double-bit, depending on the error).

Once we have the address, we can look it up in the map file and check under what conditions that address is accessed. We can then write code that accesses the location and see whether it is really corrupted. If it is corrupted at the hardware level, we should get an ESM error on every access to that address.
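
The map-file step can be scripted once symbol addresses and sizes have been extracted from the linker map. The sketch below assumes that extraction has already been done into a small table (the table format, entry names, and addresses are hypothetical) and finds which symbol, if any, contains the logged error address.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per symbol, HYPOTHETICALLY extracted from the linker map
 * file: name, start address, size in bytes. */
typedef struct {
    const char *name;
    uint32_t start;
    uint32_t size;
} map_entry_t;

/* Return the name of the symbol whose [start, start + size) range
 * contains 'addr', or NULL if the address falls outside every entry. */
const char *map_lookup(const map_entry_t *table, size_t count, uint32_t addr)
{
    for (size_t i = 0; i < count; ++i) {
        /* Subtraction form avoids overflow when start + size would wrap. */
        if (addr >= table[i].start && addr - table[i].start < table[i].size) {
            return table[i].name;
        }
    }
    return NULL;
}
```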

--
Thanks & regards,
Jagadish.

liu peng 9 months ago in reply to jagadish gundavarapu

It is a single-bit error: when we read the register, we can see the channel that triggered the interrupt, as shown in the following figure.

This error cannot be reproduced. After a power cycle, the chip operates normally.
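
The repeated-access check suggested earlier in the thread can be written so that a one-off upset and a defective cell give different answers. The sketch below is built around a hypothetical probe callback that performs one access to the suspect address, reports whether a fresh correctable-error flag was raised, and clears that flag. A correctable error is corrected in the data returned to the CPU; whether the stored word is also repaired is device behaviour you should confirm. If it is not, the probe should rewrite the word before re-reading, otherwise an upset still sitting in the cell would look persistent. The `sim_probe` stand-in exists only so the logic can be exercised off-target.

```c
#include <stdint.h>

/* HYPOTHETICAL probe: performs one access to 'addr' and returns nonzero
 * if that access raised a fresh correctable-error flag. The callback is
 * responsible for clearing the flag before returning. */
typedef int (*ecc_probe_fn)(uint32_t addr, void *ctx);

typedef enum {
    FAULT_NOT_REPRODUCED, /* no access flagged: consistent with a one-off upset */
    FAULT_INTERMITTENT,   /* some accesses flagged */
    FAULT_PERSISTENT      /* every access flagged: points to a hardware defect */
} fault_class_t;

/* Access 'addr' the given number of times and classify the result. */
fault_class_t classify_fault(ecc_probe_fn probe, void *ctx,
                             uint32_t addr, unsigned accesses)
{
    unsigned hits = 0;
    for (unsigned i = 0; i < accesses; ++i) {
        if (probe(addr, ctx)) {
            ++hits;
        }
    }
    if (hits == 0) {
        return FAULT_NOT_REPRODUCED;
    }
    return (hits == accesses) ? FAULT_PERSISTENT : FAULT_INTERMITTENT;
}

/* Host-side stand-in for the probe, for off-target testing only:
 * ctx points to a bit pattern and each call consumes one bit
 * (1 = this access was flagged). */
typedef struct {
    uint32_t pattern;
    unsigned pos;
} sim_probe_t;

int sim_probe(uint32_t addr, void *ctx)
{
    sim_probe_t *s = (sim_probe_t *)ctx;
    (void)addr; /* the simulated result does not depend on the address */
    int hit = (int)((s->pattern >> (s->pos % 32u)) & 1u);
    s->pos++;
    return hit;
}
```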

jagadish gundavarapu 8 months ago in reply to liu peng

Hi Liu Peng,

Apologies for the delay. It is difficult to pin down the exact root cause of this kind of issue, but I used TI's internal AI tool to look for possible root causes and solutions, and it returned some useful information. Please verify this information on your side.

Hardware Solutions:

Use DMA for memory scrubbing to detect and correct single-bit errors efficiently (1)
Enable the IP1ECCERREN register's IP1_ECC_KEY field to properly record single-bit errors in the EPC module when using DMA (1)
Ensure proper memory protection and access configurations in the MPU settings (1)

Software Solutions:

Implement periodic memory scrubbing using either CPU reads or DMA transfers to detect errors early (1)
Enable Error Detection and Correction (EDAC) mode with proper configuration:
- Set the single-bit error threshold appropriately (e.g., threshold = 1 for immediate notification) (1)
- Enable error profiling (1)
- Configure proper error detection and correction settings in the control registers (1)
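
A CPU-based version of the periodic scrubbing in this list can be very small: read each word so the ECC logic checks it (and, for a single-bit error, returns corrected data), then write the value back so the stored word is rewritten with a fresh ECC code. The sketch below scrubs a region incrementally, a fixed number of words per call, so it can run from a periodic task. The region bounds, the per-call budget, the 64-bit ECC word size, and all names are assumptions. On the target, each read/write pair should run with interrupts (and any DMA writing that region) held off; the host sketch leaves that out.

```c
#include <stddef.h>
#include <stdint.h>

/* State for an incremental scrubber over one RAM region. */
typedef struct {
    volatile uint64_t *base; /* region start, 64-bit aligned (ECC word size ASSUMED) */
    size_t words;            /* region length in 64-bit words, must be > 0 */
    size_t next;             /* index to resume from on the next call */
    unsigned long passes;    /* completed full passes over the region */
} scrubber_t;

/* Scrub up to 'budget' words, starting where the previous call stopped.
 * Each word is read (on target, the ECC check happens in hardware) and
 * written back, so a corrected single-bit error is stored again with a
 * fresh ECC code. On target, wrap each read/write pair in a critical
 * section so no other writer can change the word in between. */
void scrub_step(scrubber_t *s, size_t budget)
{
    for (size_t i = 0; i < budget; ++i) {
        uint64_t value = s->base[s->next];
        s->base[s->next] = value;
        if (++s->next == s->words) {
            s->next = 0;
            ++s->passes;
        }
    }
}
```

Spreading the scrub over many small calls keeps the worst-case time per call bounded; the price is that one full pass over the region takes several task periods.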

Possible Causes of Memory Self-Check Anomalies:

Hardware-related:
- Power supply issues affecting memory stability (2)
- Environmental factors (temperature, electromagnetic interference)
- Physical memory cell degradation
Software-related:
- Improper memory protection configurations (1)
- Cache-related issues (as seen in some cases where disabling cache resolved memory issues) (3)
- Incorrect ECC configuration settings (1)

Preventive Measures:

Implement proper error handling routines for both single-bit and double-bit errors
Use SECDED (Single Error Correction, Double Error Detection) capabilities (1)
Regular memory scrubbing to detect and correct errors before they accumulate (1)
Proper system initialization and configuration of error detection mechanisms (1)
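
The split between single-bit and double-bit handling can be expressed as a small policy function. In this sketch (all names hypothetical), a double-bit error always requests shutdown, while single-bit errors are counted against a threshold. Whether any single-bit error may be tolerated at all is a decision for your safety case; with the threshold set to 1, the policy requests shutdown on the first error, matching the shut-down-on-anomaly requirement from the safety manual in the original question.

```c
#include <limits.h>

/* Actions the policy can request (HYPOTHETICAL names). */
typedef enum {
    ECC_ACTION_CONTINUE, /* log and keep running */
    ECC_ACTION_SHUTDOWN  /* enter the safe-state / shutdown sequence */
} ecc_action_t;

typedef struct {
    unsigned single_bit_count;     /* correctable errors seen so far */
    unsigned single_bit_threshold; /* shut down once the count reaches this */
} ecc_policy_t;

/* Decide what to do for one reported RAM ECC error. */
ecc_action_t ecc_policy_on_error(ecc_policy_t *p, int is_double_bit)
{
    if (is_double_bit) {
        return ECC_ACTION_SHUTDOWN; /* data is not recoverable */
    }
    if (p->single_bit_count < UINT_MAX) {
        ++p->single_bit_count; /* saturate instead of wrapping */
    }
    return (p->single_bit_count >= p->single_bit_threshold)
               ? ECC_ACTION_SHUTDOWN
               : ECC_ACTION_CONTINUE;
}
```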

Monitoring and Debugging:

Use the EPC (Error Profiling and Control) module to track error occurrences (1)
Monitor ESM flags for both single-bit and double-bit errors (1)
Implement proper error logging and reporting mechanisms
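
For error logging, a fixed-size ring buffer that keeps the most recent records is a common choice on a microcontroller, since it needs no allocation and never blocks. The record fields, capacity, and names below are assumptions. In a case like this one, where a power cycle clears the evidence, the buffer would typically be copied to non-volatile memory before shutdown so the record survives.

```c
#include <stddef.h>
#include <stdint.h>

#define ERRLOG_CAPACITY 8u /* ASSUMED size; choose to fit your RAM budget */

/* One logged event (HYPOTHETICAL fields). */
typedef struct {
    uint32_t timestamp;   /* e.g. a free-running tick count */
    uint8_t  esm_group;   /* 1, 2 or 3 */
    uint8_t  esm_channel;
    uint32_t address;     /* error address from the RAM wrapper, if available */
} errlog_record_t;

typedef struct {
    errlog_record_t rec[ERRLOG_CAPACITY];
    size_t head;      /* slot the next record goes into */
    size_t count;     /* number of valid records, up to ERRLOG_CAPACITY */
    uint32_t dropped; /* records overwritten because the buffer was full */
} errlog_t;

/* Append a record, overwriting the oldest one when full. Call from a
 * single context, or with interrupts masked, to avoid racing updates. */
void errlog_push(errlog_t *el, errlog_record_t r)
{
    if (el->count == ERRLOG_CAPACITY) {
        ++el->dropped;
    } else {
        ++el->count;
    }
    el->rec[el->head] = r;
    el->head = (el->head + 1u) % ERRLOG_CAPACITY;
}

/* Return the i-th oldest record (0 = oldest); i must be < count. */
const errlog_record_t *errlog_get(const errlog_t *el, size_t i)
{
    size_t oldest = (el->head + ERRLOG_CAPACITY - el->count) % ERRLOG_CAPACITY;
    return &el->rec[(oldest + i) % ERRLOG_CAPACITY];
}
```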

For your specific case where the safety manual requires system shutdown on abnormal memory self-check, you should:

Configure the ESM module to properly detect and report errors
Implement periodic memory scrubbing using DMA for efficiency (1)
Set appropriate error thresholds and enable error profiling (1)
Implement a proper shutdown sequence when critical errors are detected
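
A shutdown sequence is easier to review when the steps and their order are explicit. The sketch below runs a table of hypothetical step callbacks in a fixed order and keeps going even if a step reports failure, so that a problem in, say, saving the log cannot prevent later steps from running. The step names, their order, and the host stubs (which only record the order they ran in) are assumptions to be replaced by what your safety concept prescribes; on the target, the final step would normally not return.

```c
#include <stddef.h>

/* One step of the shutdown sequence (HYPOTHETICAL). Returns 0 on success. */
typedef struct {
    const char *name;
    int (*run)(void *ctx);
} shutdown_step_t;

/* Run every step in order. A failing step does not abort the sequence.
 * Returns the number of steps that reported failure. */
unsigned run_shutdown(const shutdown_step_t *steps, size_t count, void *ctx)
{
    unsigned failures = 0;
    for (size_t i = 0; i < count; ++i) {
        if (steps[i].run(ctx) != 0) {
            ++failures;
        }
    }
    return failures;
}

/* Host-side stubs that only record the order they ran in. On target they
 * would drive outputs to their safe state, save the error log to
 * non-volatile memory, and stop normal operation. */
typedef struct { int order[8]; size_t n; } trace_t;

static int trace_step(void *ctx, int id, int result)
{
    trace_t *t = (trace_t *)ctx;
    if (t->n < 8u) {
        t->order[t->n++] = id;
    }
    return result;
}

int stub_outputs_safe(void *ctx) { return trace_step(ctx, 1, 0); }
int stub_save_log(void *ctx)     { return trace_step(ctx, 2, -1); } /* simulated failure */
int stub_hold(void *ctx)         { return trace_step(ctx, 3, 0); }
```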

--
Thanks & regards,
Jagadish.