TMS570LS3137 and ESM 3.7 error

Other Parts Discussed in Thread: TMS570LS3137, TMS570LS1227

Hello, 

Can you describe the microcontroller's behavior on an ESM 3.7 error in more detail?
See SPNS162C, chapter 6.19 "Reset / Abort / Error Sources" and Table 6-35, "ESM Channel Assignments".

I found that it can be caused by an unprogrammed part of the flash together with the prefetch unit (see https://e2e.ti.com/support/microcontrollers/hercules/f/312/t/228184).

But there are still a few serious problems:

  • When I try to read from uninitialized flash, it causes ESM 2.4, not ESM 3.7. Why? What is the difference?
  • The documentation doesn't describe what FMC is. Flash memory controller?
  • The documentation doesn't describe what FMC Bus1 and Bus2 are. They are not described in the datasheet (SPNS162C), the TRM (SPNU499B), or the ARM documentation (ARM DDI 0406C.b, ARM DDI 0363E, and others). Can you describe them better?
  • How can we catch this error? It activates the external nERROR signal, but nothing inside the MCU (abort etc.).
  • The main problem is the documentation. This MCU is designed for safety, and we need to cover this case in our own approval documentation, but the TI documents don't give us the material (what Bus1 and Bus2 are, etc.).

Have a nice day,

Jiri

  • We are in the process of updating the TRM for the TMS570LS3137. The TRM for the TMS570LS1227 is more up to date and the flash wrapper is the same. The address parity error (ESM2.4) means that the address coming from the CPU to the flash had a parity error. The flash uncorrectable error (ESM3.7) means the data read from the flash had a multi-bit error.
  • And is polling the ESMEPSR register the only way the application can detect this error?

  • Thanks for the information, but it is somewhat contrary to what we observe.

    When I try to read (LDR or LDM instruction) flash without correct ECC, it throws an ESM 2.4 error. This is clearly an uncorrectable ECC data error and has nothing to do with the "address". We tried it.

    This is a quote from the TMS570LS1227 datasheet for ESM 2.4:
    "FMC - uncorrectable address parity error on accesses to main flash"

    And this is for ESM 3.7:
    "FMC - uncorrectable ECC error: ATCM and Flash OTP interfaces (does not include address parity error and errors on accesses to Bank 7 data memory)"

    And one point is clear: we can't have even one sector without correct ECC.

  • I created a project that did not program the ECC of the unused flash. That code then did an intentional read of an unprogrammed location in the ATCM. The result was a data abort with ESM group 3 channel 7 bit set. What exactly are you doing that creates the ESM2.4 error?
  • Thanks for the support. I must apologize a little. You are right: access to unprogrammed memory generates an ESM 3.7 error.

    But there is one significant difference. When I try to access unprogrammed flash, the MCU sets ESM 3.7 and forces an abort at the same time (data abort or prefetch abort, depending on the type of access). But we have cases of ESM 3.7 without an abort.

    And this raises the question of what the difference is between "FMC uncorrectable error - Bus1 accesses (does not include address parity error)", listed as User/Privilege, Abort (CPU), ESM => 3.7, and "FMC uncorrectable error - Bus2 accesses (does not include address parity error and EEPROM bank accesses)", listed as User/Privilege, ESM => nERROR, 3.7.

    The documentation for the TMS570LS1227 doesn't contain an option to activate ESM 3.7 without an MCU abort. The documentation for the TMS570LS3137 does contain this option; see SPNS162C, Table 6-36, "Reset/Abort/Error Sources".

  • A read of a flash location in the ATCM space will generate ESM 3.7. If that read is used by the CPU, it will generate an abort. The Cortex R4 has the ability to initiate a read in anticipation that the value will be used by the CPU. (It looks at the instructions in the pipeline before they get to the execute stage.) If this speculative read has an uncorrectable ECC error, ESM 3.7 is set. If the value read is not used by the CPU (perhaps it was a conditional load instruction and in the execute stage the condition was not true) then no abort is generated. Therefore we recommend that all of the main flash (ATCM space) have proper ECC programmed.

    The bus2 accesses are the reads of OTP or direct reads of the ECC for banks 0 or 1.
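
The behavior described in this reply (ESM 3.7 is flagged on any uncorrectable read, but an abort is taken only if the CPU actually uses the value) can be written down as a small decision model. This is only a host-side sketch for reasoning about the cases; `flash_read_outcome` and `model_flash_read` are hypothetical names, not part of any TI or ARM API.

```c
#include <stdbool.h>

/* Hypothetical model of the outcome of one read of main flash through the
 * ATCM, following the explanation in the reply above. Not a TI API. */
typedef struct {
    bool esm_3_7_set;  /* ESM group 3 channel 7 flagged */
    bool cpu_abort;    /* data or prefetch abort taken by the CPU */
} flash_read_outcome;

/* ecc_uncorrectable: the word read has an uncorrectable ECC error.
 * value_consumed:    the CPU actually uses the value; false for a
 *                    speculative read whose result is discarded. */
static flash_read_outcome model_flash_read(bool ecc_uncorrectable,
                                           bool value_consumed)
{
    flash_read_outcome out = { false, false };
    if (ecc_uncorrectable) {
        out.esm_3_7_set = true;         /* flagged whether or not the value is used */
        out.cpu_abort = value_consumed; /* abort only if the value is used */
    }
    return out;
}
```

The case `model_flash_read(true, false)` is the one reported earlier in the thread: ESM 3.7 set with no abort.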
  • Thanks for the information.
    This means that we need to handle the situation where the flash is fully programmed but fails on a speculative read. In this case, the MCU activates the nERROR signal, and this signal forces the remaining HW into a "safe" state.
    Because we need to know about it in SW, we have three choices:
    1) Route this signal directly or indirectly back to an MCU interrupt pin (sometimes one more lost pin).
    2) Poll the ESMEPSR register (sometimes this could be too slow).
    3) Don't use the nERROR signal to set the HW into a "safe" state. Impossible; it conflicts with a "safe" design.

    There is no clear solution. Is there any other source of nERROR activation that is not routed to an interrupt or abort inside the MCU?

    Best regards,
    Jiri Dobry
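
Choice 2 in the post above (polling) could look roughly like the following. This is a minimal sketch, not TI code: the `esm_status_view` struct, its field names, and the helper functions are hypothetical, and the polarity of the error pin status bit is an assumption that should be checked against the TRM. On the target, the two words would be read from the ESM group 3 status register and from ESMEPSR via the device header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal view of the two ESM status words used here. The struct and field
 * names are hypothetical; the real register layout comes from the device
 * header and the TRM. */
typedef struct {
    volatile uint32_t sr3;   /* ESM group 3 status, one bit per channel */
    volatile uint32_t epsr;  /* ESM error pin status register (ESMEPSR) */
} esm_status_view;

#define ESM_G3_FMC_UNCORRECTABLE 7u /* ESM 3.7, as quoted from the datasheet */
#define ESM_EPSR_PIN_INACTIVE    1u /* ASSUMPTION: bit 0 set = nERROR not asserted */

/* True if the given group 3 channel flag is latched (channel must be < 32). */
static bool esm_group3_channel_set(const esm_status_view *esm, unsigned channel)
{
    return ((esm->sr3 >> channel) & 1u) != 0u;
}

/* True if the error pin is reported as asserted (see assumption above). */
static bool esm_error_pin_active(const esm_status_view *esm)
{
    return (esm->epsr & ESM_EPSR_PIN_INACTIVE) == 0u;
}

/* True if an FMC uncorrectable ECC error (ESM 3.7) is pending. */
static bool esm_flash_ecc_fault_pending(const esm_status_view *esm)
{
    return esm_group3_channel_set(esm, ESM_G3_FMC_UNCORRECTABLE);
}
```

Passing the registers by pointer keeps the decoding testable on a host. How often the poll has to run to be fast enough is the open point raised in the post.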
  • If you have programmed the entire program flash with proper ECC, speculative fetches will not cause an ESM 3.7 error.

    All group 3 errors will activate nERROR with no guarantee of an abort or interrupt since these errors are severe enough that you can no longer count on proper operation of the CPU.
  • Bob, I must amend your statement a little, because this sentence: "If you have programmed the entire program flash with proper ECC, speculative fetches will not cause an ESM 3.7 error." is correct only without a HW failure.

    I understand that after a group 3 error the MCU does not guarantee functionality. The problem is when the SW continues without knowing about this situation. This code normally refreshes the internal and external WDT. Anything else (including a reset) is better than this situation.
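
One way to address the watchdog concern in this post is to make the refresh conditional on the ESM group 3 status, so that software which keeps running after a group 3 error stops refreshing the WDT and lets it expire. The sketch below assumes the caller reads the group 3 status word itself; `wdt_gate` and `wdt_gate_step` are hypothetical names. As the replies in this thread point out, CPU operation cannot be relied on after a group 3 error, so this can only supplement the external handling of nERROR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical gate that decides whether the watchdog may be refreshed. */
typedef struct {
    uint32_t refresh_count; /* number of refreshes allowed so far */
    bool latched_fault;     /* set once any group 3 flag has been seen */
} wdt_gate;

/* Call once per watchdog service period with the current ESM group 3
 * status word. Returns true if the caller should refresh the watchdog.
 * Once a fault has been seen, refreshes are withheld permanently, even if
 * the status word later reads as zero. */
static bool wdt_gate_step(wdt_gate *gate, uint32_t esm_group3_status)
{
    if (esm_group3_status != 0u) {
        gate->latched_fault = true;
    }
    if (gate->latched_fault) {
        return false; /* withhold the refresh so the watchdog can expire */
    }
    gate->refresh_count++;
    return true;
}
```

The fault is latched in the gate on purpose: a refresh should not resume just because the status flag was cleared somewhere else.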

  • You are correct that my statement only applies if there is no hardware failure. A safe system must be able to go to a safe state based on the nERROR signal without proper CPU operation. Take for example the single point failure of the flash pump read voltage. If it fails, you will get an uncorrectable error on instruction execution, the CPU vectors to an abort routine and gets another uncorrectable error. Even the abort routine cannot be executed. Since it is an uncorrectable error, the nERROR pin will activate.