TMS570LS3137: ESM interruption

Part Number: TMS570LS3137

Hello, our chip triggered the ESM interrupt during operation. The trigger source is RAM even bank (B0TCM). The safety manual requires the system to be shut down when the memory self-check reports an abnormality. Are there any hardware or software measures that can avoid or reduce such abnormalities? What are the possible causes of memory self-check anomalies?

  • Hi Liu,

    Can you please clear the ESM error once and check whether it occurs again afterwards? That is, is it a one-time event or does it repeat?

    You can clear it by following this E2E FAQ:

    [FAQ] TMS570LC4357: How to set ESM error pin to High (TI E2E support forums, Arm-based microcontrollers forum)

    --
    Thanks & regards,
    Jagadish.

  • Since this fault occurred while the product was operating on site, we are not allowed to perform a clear operation. However, the board operates normally after a restart, so it most likely occurred only once. We now want to know how to avoid this problem.

  • Hi Liu,

    We have never come across this issue before.

    Which ESM error are you getting: a RAM single-bit error or a double-bit error?

    We also need to check the corresponding error address register, single-bit or double-bit depending on the error.

    Once we have the address, we can look it up in the map file and work out under what condition that address is accessed. We can then write code that accesses it and see whether that location is really corrupted. If it is corrupted at the hardware level, we should get an ESM error on every access to that address.
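    A minimal host-side sketch of that check, assuming the suspect address has already been read from the error address register. The names `classify_fault`, `ecc_read_fn` and the `stub_*` functions are hypothetical and not part of any TI library, and `0x08001000` in the usage note is only an example address. On the target, the read hook would perform the access and then check and clear the relevant ESM or RAM status flag.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical hook: accesses one location and returns true if that
     * access raised a RAM ECC error. It is a callback here so that the
     * decision logic can be exercised on a host without the hardware. */
    typedef bool (*ecc_read_fn)(uintptr_t addr, void *ctx);

    typedef enum { FAULT_NONE, FAULT_TRANSIENT, FAULT_PERSISTENT } fault_kind_t;

    /* Access the same address several times. An error on every access
     * suggests corruption at the hardware level; an error on only some
     * accesses suggests a one-time (soft) event. */
    fault_kind_t classify_fault(uintptr_t addr, unsigned attempts,
                                ecc_read_fn read_word, void *ctx)
    {
        assert(read_word != NULL && attempts > 0u);
        unsigned errors = 0u;
        for (unsigned i = 0u; i < attempts; ++i) {
            if (read_word(addr, ctx)) {
                errors++;
            }
        }
        if (errors == 0u) {
            return FAULT_NONE;
        }
        return (errors == attempts) ? FAULT_PERSISTENT : FAULT_TRANSIENT;
    }

    /* Host-side stand-ins for the hardware access (assumptions for the demo). */
    bool stub_always_faulty(uintptr_t addr, void *ctx)
    {
        (void)addr;
        (void)ctx;
        return true;                 /* every access reports an error */
    }

    bool stub_faulty_once(uintptr_t addr, void *ctx)
    {
        unsigned *calls = ctx;       /* counts how often the hook was called */
        (void)addr;
        return (*calls)++ == 0u;     /* only the first access reports an error */
    }

    bool stub_healthy(uintptr_t addr, void *ctx)
    {
        (void)addr;
        (void)ctx;
        return false;                /* no access reports an error */
    }
    ```

    For example, `classify_fault(0x08001000u, 8u, stub_always_faulty, NULL)` yields `FAULT_PERSISTENT`, while the same call with `stub_faulty_once` and a zeroed counter yields `FAULT_TRANSIENT`.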

    --
    Thanks & regards,
    Jagadish.

  • It is a single-bit error: when we read the register, we see the channel that triggered the interrupt, as shown in the following figure.

    This error cannot be reproduced. After a power cycle, the chip operates normally.

  • Hi Liu Peng,

    Apologies for the delay. It is difficult to determine the exact root cause for this kind of issue. However, I used our (TI) internal AI tool to look for possible root causes and solutions, and I got some useful information. Please verify this information.

    1. Hardware Solutions:
    • Use DMA for memory scrubbing to detect and correct single-bit errors efficiently (1)
    • Enable the IP1ECCERREN register's IP1_ECC_KEY field to properly record single-bit errors in the EPC module when using DMA (1)
    • Ensure proper memory protection and access configurations in the MPU settings (1)
    2. Software Solutions:
    • Implement periodic memory scrubbing using either CPU reads or DMA transfers to detect errors early (1)
    • Enable Error Detection and Correction (EDAC) mode with proper configuration:
      • Set single-bit error threshold appropriately (e.g., threshold = 1 for immediate notification) (1)
      • Enable Error profiling (1)
      • Configure proper error detection and correction settings in the control registers (1)
    3. Possible Causes of Memory Self-Check Anomalies:
    • Hardware-related:
      • Power supply issues affecting memory stability (2)
      • Environmental factors (temperature, electromagnetic interference)
      • Physical memory cell degradation
    • Software-related:
      • Improper memory protection configurations (1)
      • Cache-related issues (as seen in some cases where disabling cache resolved memory issues) (3)
      • Incorrect ECC configuration settings (1)
    4. Preventive Measures:
    • Implement proper error handling routines for both single-bit and double-bit errors
    • Use SECDED (Single Error Correction, Double Error Detection) capabilities (1)
    • Regular memory scrubbing to detect and correct errors before they accumulate (1)
    • Proper system initialization and configuration of error detection mechanisms (1)
    5. Monitoring and Debugging:
    • Use the EPC (Error Profiling Controller) module to track error occurrences (1)
    • Monitor ESM flags for both single-bit and double-bit errors (1)
    • Implement proper error logging and reporting mechanisms
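    As a rough illustration of the CPU-read scrubbing and the single-bit error threshold mentioned in the list, here is a host-runnable sketch. All names (`scrubber_t`, `scrub_step`, `sbe_monitor_t`, `sbe_record`) are hypothetical, and the counter only models threshold behaviour in software; it is not the device's register interface. The sketch assumes 64-bit ECC words and assumes that reading a word and writing it back is enough to trigger the ECC check and refresh the stored value.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* State of an incremental scrubber over one RAM region. */
    typedef struct {
        volatile uint64_t *base;  /* start of the region to scrub       */
        size_t words;             /* region length in 64-bit words      */
        size_t next;              /* index of the next word to scrub    */
    } scrubber_t;

    /* Scrub up to `chunk` words and return how many were scrubbed.
     * Intended to be called periodically so that the whole region is
     * covered over time without one long blocking pass.
     * Note: the read and write-back are not atomic, so on a target the
     * caller must guard against concurrent writers (interrupts, DMA). */
    size_t scrub_step(scrubber_t *s, size_t chunk)
    {
        assert(s != NULL && s->base != NULL && s->words > 0u);
        size_t done = 0u;
        while (done < chunk && done < s->words) {
            uint64_t value = s->base[s->next];  /* read triggers the ECC check */
            s->base[s->next] = value;           /* write back the same data    */
            s->next = (s->next + 1u) % s->words;
            done++;
        }
        return done;
    }

    /* Software model of a single-bit error counter with a threshold. */
    typedef struct {
        uint32_t threshold;    /* notify when this many errors were seen */
        uint32_t occurrences;  /* errors seen since the last notification */
    } sbe_monitor_t;

    /* Record one corrected error. Returns true when the threshold is
     * reached, then restarts counting. A threshold of 1 notifies on
     * every single corrected error. */
    bool sbe_record(sbe_monitor_t *m)
    {
        assert(m != NULL && m->threshold > 0u);
        m->occurrences++;
        if (m->occurrences >= m->threshold) {
            m->occurrences = 0u;
            return true;
        }
        return false;
    }
    ```

    A periodic task could call `scrub_step` with a small `chunk` on each tick and call `sbe_record` from the single-bit error handler. A DMA-based scrubber would replace the loop, which is not shown here.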

    For your specific case where the safety manual requires system shutdown on abnormal memory self-check, you should:

    1. Configure the ESM module to properly detect and report errors
    2. Implement periodic memory scrubbing using DMA for efficiency (1)
    3. Set appropriate error thresholds and enable error profiling (1)
    4. Implement a proper shutdown sequence when critical errors are detected
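    One way to sketch the decision behind step 4 is a small policy function that maps the error type to an action. The names (`ram_err_t`, `action_t`, `policy_t`, `decide_action`) are hypothetical, and the strict mode reflects the requirement described earlier in this thread, not wording taken from the safety manual itself. The actual shutdown sequence is application-specific and is left out.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef enum { RAM_ERR_NONE, RAM_ERR_SINGLE_BIT, RAM_ERR_DOUBLE_BIT } ram_err_t;
    typedef enum { ACT_CONTINUE, ACT_LOG_AND_CONTINUE, ACT_SAFE_SHUTDOWN } action_t;

    typedef struct {
        bool shutdown_on_single_bit;  /* strict: any RAM error stops the system   */
        uint32_t single_bit_limit;    /* corrected errors tolerated when not strict */
    } policy_t;

    /* Decide what to do after a RAM error report.
     * `corrected_total` is the number of corrected single-bit errors seen
     * so far, including the one being handled now. */
    action_t decide_action(const policy_t *p, ram_err_t err, uint32_t corrected_total)
    {
        assert(p != NULL);
        switch (err) {
        case RAM_ERR_NONE:
            return ACT_CONTINUE;
        case RAM_ERR_DOUBLE_BIT:
            return ACT_SAFE_SHUTDOWN;      /* uncorrectable: data cannot be trusted */
        case RAM_ERR_SINGLE_BIT:
            if (p->shutdown_on_single_bit || corrected_total > p->single_bit_limit) {
                return ACT_SAFE_SHUTDOWN;
            }
            return ACT_LOG_AND_CONTINUE;   /* corrected: record it and keep running */
        default:
            return ACT_SAFE_SHUTDOWN;      /* unknown error type: fail safe */
        }
    }
    ```

    With `shutdown_on_single_bit` set, a single corrected error already returns `ACT_SAFE_SHUTDOWN`. With it cleared and `single_bit_limit` set to 3, the first three corrected errors return `ACT_LOG_AND_CONTINUE` and the fourth returns `ACT_SAFE_SHUTDOWN`.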

    --
    Thanks & regards,
    Jagadish.