TMS570LC4357:FLASH ECC Problem

Part Number: TMS570LC4357


Hi, Gundavarapu

I have several questions about the ECC functionality of the flash.

1. Flash ECC is an error-correction mechanism for bit flips in flash data. Are SPI ECC and SCI ECC the same thing?

2. Flash ECC can be tested by the following method, which says that the CPU error handler can catch the error. Could you please explain how to tell that an ECC error has occurred? Is reading the FEDAC_GBLSTATUS register the only way?

 

3. When an ECC error occurs, how do you know where it occurred?

  • Hi Sam,

    1. Flash ECC is an error-correction mechanism for bit flips in flash data. Are SPI ECC and SCI ECC the same thing?

    No, they are not the same.

    Flash ECC applies to all of the flash memory in the controller.

    The device has 4MB of flash, and this flash has a corresponding ECC memory that stores the ECC of the written data. The ECC acts as verification bits for your data, so if the data is corrupted, error events are generated.

    SPI ECC and SCI ECC, by contrast, are specific to the corresponding peripheral.

    For example, MibSPI has a RAM that holds the received data and the data to be transmitted, and this RAM has its own ECC protection.
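To make the idea of "verification bits" concrete, here is a toy single-error-correct, double-error-detect (SECDED) code in C. It is only a sketch: it protects 4 data bits with a Hamming(7,4) code plus one overall parity bit, which is an assumption chosen for readability and is not the code the flash wrapper actually uses. All function and type names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { ECC_OK, ECC_CORRECTED, ECC_UNCORRECTABLE } ecc_result_t;

/* Read the bit at position pos (0..7) of an 8-bit codeword. */
static unsigned get_bit(uint8_t word, unsigned pos)
{
    return (word >> pos) & 1u;
}

/* Encode 4 data bits. Bits 1..7 form a Hamming(7,4) codeword with parity
 * at positions 1, 2 and 4; bit 0 holds an overall parity bit. */
uint8_t secded_encode(uint8_t data)
{
    assert(data < 16u);
    unsigned d0 = data & 1u, d1 = (data >> 1) & 1u;
    unsigned d2 = (data >> 2) & 1u, d3 = (data >> 3) & 1u;
    unsigned p1 = d0 ^ d1 ^ d3; /* covers positions 3, 5, 7 */
    unsigned p2 = d0 ^ d2 ^ d3; /* covers positions 3, 6, 7 */
    unsigned p4 = d1 ^ d2 ^ d3; /* covers positions 5, 6, 7 */
    uint8_t cw = (uint8_t)((p1 << 1) | (p2 << 2) | (d0 << 3) | (p4 << 4) |
                           (d1 << 5) | (d2 << 6) | (d3 << 7));
    unsigned overall = 0;
    for (unsigned pos = 1; pos <= 7; pos++) {
        overall ^= get_bit(cw, pos);
    }
    return (uint8_t)(cw | overall); /* all 8 bits now have even parity */
}

/* Decode a codeword. A single flipped bit is corrected; two flipped bits
 * are reported as uncorrectable and data_out is left untouched. */
ecc_result_t secded_decode(uint8_t cw, uint8_t *data_out)
{
    assert(data_out != 0);
    unsigned syndrome = 0, overall = 0;
    for (unsigned pos = 0; pos <= 7; pos++) {
        if (get_bit(cw, pos)) {
            syndrome ^= pos; /* position 0 contributes nothing */
            overall ^= 1u;
        }
    }
    if (overall == 0u && syndrome != 0u) {
        return ECC_UNCORRECTABLE; /* even parity but non-zero syndrome */
    }
    ecc_result_t result = ECC_OK;
    if (overall == 1u) {
        cw ^= (uint8_t)(1u << syndrome); /* syndrome 0 means the parity bit */
        result = ECC_CORRECTED;
    }
    *data_out = (uint8_t)(get_bit(cw, 3) | (get_bit(cw, 5) << 1) |
                          (get_bit(cw, 6) << 2) | (get_bit(cw, 7) << 3));
    return result;
}
```

Flipping one bit of an encoded word yields `ECC_CORRECTED` with the original data, while flipping two bits yields `ECC_UNCORRECTABLE`. These two outcomes mirror the correctable and uncorrectable flash events discussed in this thread.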

    2. Flash ECC can be tested by the following method, which says that the CPU error handler can catch the error. Could you please explain how to tell that an ECC error has occurred? Is reading the FEDAC_GBLSTATUS register the only way?

    Not only through registers. You will also get ESM events for flash errors, depending on the type of error.

    For correctable errors in flash you will get ESM 1.4.

    For uncorrectable errors you will get ESM 2.3 or ESM 3.13.
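The mapping above can be sketched as a small classifier in C. This is a hedged illustration, not library code: it assumes that channel n of an ESM group appears as bit n of that group's status word, and that the caller has already read the three status words. The struct, enum, and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    FLASH_ECC_NONE,
    FLASH_ECC_CORRECTABLE,
    FLASH_ECC_UNCORRECTABLE
} flash_ecc_event_t;

/* Snapshot of the three ESM group status words (hypothetical container). */
typedef struct {
    uint32_t group1;
    uint32_t group2;
    uint32_t group3;
} esm_status_t;

/* Assumption: channel n is reported in bit n of its group's status word. */
#define ESM_G1_CH4_FLASH_CORRECTABLE  (1u << 4)
#define ESM_G2_CH3_UNCORRECTABLE      (1u << 3)
#define ESM_G3_CH13_UNCORRECTABLE     (1u << 13)

/* Uncorrectable events take priority over correctable ones. */
flash_ecc_event_t classify_flash_ecc(const esm_status_t *status)
{
    assert(status != 0);
    if ((status->group2 & ESM_G2_CH3_UNCORRECTABLE) != 0u ||
        (status->group3 & ESM_G3_CH13_UNCORRECTABLE) != 0u) {
        return FLASH_ECC_UNCORRECTABLE;
    }
    if ((status->group1 & ESM_G1_CH4_FLASH_CORRECTABLE) != 0u) {
        return FLASH_ECC_CORRECTABLE;
    }
    return FLASH_ECC_NONE;
}
```

On the target, the three words would come from the ESM status registers. Here they are plain parameters so that the logic can be checked on a host.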

    You can download our diagnostic library from the link below:

    SAFETI_DIAG_LIB Driver or library | TI.com

    This diagnostic library contains example code for all the diagnostics on the device.

    The flash-related diagnostics are under the following API name:

    SL_SelfTest_Flash

    3. When an ECC error occurs, how do you know where it occurred?

    The EPC module helps capture the address of ECC errors. Please refer to the EPC chapter of the reference manual for more details.

    --
    Thanks & regards,
    Jagadish.

  • Hi, Gundavarapu

    My goal is to test the functionality of FLASH ECC. If a single-bit error or double-bit error occurs, I want to know where it happened and then record it (single-bit errors need to be counted).
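The recording requirement described above (count single-bit errors, note where each error occurred) could be sketched as a small in-memory log. This is only an illustration: the type and function names are hypothetical, the capacity is arbitrary, and the address is passed in by the caller together with a flag saying whether it is trustworthy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ECC_LOG_CAPACITY 8u /* arbitrary size for this sketch */

typedef enum { ECC_ERR_1BIT, ECC_ERR_2BIT } ecc_err_kind_t;

typedef struct {
    ecc_err_kind_t kind;
    uint32_t address;
    bool address_valid; /* false when no address could be captured */
} ecc_log_entry_t;

typedef struct {
    ecc_log_entry_t entries[ECC_LOG_CAPACITY];
    size_t used;
    uint32_t single_bit_count;
    uint32_t double_bit_count;
} ecc_log_t;

void ecc_log_init(ecc_log_t *log)
{
    assert(log != NULL);
    memset(log, 0, sizeof *log);
}

/* Always updates the counters; stores the entry only while space remains.
 * Returns true if the entry was stored. */
bool ecc_log_record(ecc_log_t *log, ecc_err_kind_t kind,
                    uint32_t address, bool address_valid)
{
    assert(log != NULL);
    if (kind == ECC_ERR_1BIT) {
        log->single_bit_count++;
    } else {
        log->double_bit_count++;
    }
    if (log->used >= ECC_LOG_CAPACITY) {
        return false;
    }
    log->entries[log->used].kind = kind;
    log->entries[log->used].address = address;
    log->entries[log->used].address_valid = address_valid;
    log->used++;
    return true;
}
```

The counters keep running even when the entry table is full, so the single-bit total stays accurate after the log saturates.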

    First, I need to simulate FLASH ECC errors. Based on your reply in another forum, I downloaded your FEE_ECC_Errors_TEST_LC4357.zip project.

    I have some questions about this project that I don't quite understand.

    -------------------------------------------- TEST DIAGMODE = 7------------------------------------------

    As shown in the figure below, I executed the code at the horizontal line in the figure.

    Since mode 7 is used, I expected that PortAErrStat would not be 0, the ESM Stat1 register would not be 0, Group1_Error would be set, and the ERROR light would not be on. The result is the opposite, as shown in the figure below.

    Can you explain what is happening here?

    Also, what are Port A and Port B in the figure below?

    What problems will arise if the ECC error interrupt for Port B cannot be triggered?

    ---------------------------------------------TEST DIAGMODE = 5----------------------------------------------

    Perform the self-test shown in the figure below.

    The results are shown below.

    The selected register states are all as expected, but I noticed that regardless of whether the parameter passed is FLASH_ECC_TEST_MODE_1BIT or FLASH_ECC_TEST_MODE_2BIT, the Group2Error variable is set. Why is the Group1Error variable not set when the parameter passed is FLASH_ECC_TEST_MODE_1BIT?
    In diagnostic mode 5, is the address tag error in Group3 always triggered?

  • Hi, Gundavarapu

    Please reply as soon as possible, thank you.

  • Hi Sam,

    Apologies for the delayed response. I was off for a few days for personal reasons and didn't have time to work on this issue.

    Are you still stuck with this issue?

    --
    Thanks & regards,
    Jagadish.

  • Hi Gundavarapu,

    My question is still unresolved. Please help me with an answer, thank you very much.

  • Hi Sam,

    I think you downloaded the code from the link below, right?

    TMS570LC4357: Problems when trying to run diag mode 7 for Port A and B for 1Bit and 2bit error injection - Arm-based microcontrollers forum - Arm-based microcontrollers - TI E2E support forums

    Is this the only code you downloaded?

    If yes, then as mentioned in that comment, this code only works for Port A. So, can you please try Port A first?

    --
    Thanks & regards,
    Jagadish.

  • I executed the code as shown below.

    The register values are as follows:

    The values are as follows:

    Is this right? Why doesn't a 1-bit ECC error trigger an interrupt for group 1?

    In addition, please explain the concept of PORT_A and PORT_B, thank you.

  • Hello, please reply as soon as possible, thank you

  • Hi Sam,

    My apologies for the delayed response. I was off for a few days and didn't have time to work on your issue.

    Are you still facing this issue?

    --
    Thanks & regards,
    Jagadish.

  • Yes, please help me with that. Thank you.

  • Hi Sam,

    Apologies for the delayed response!

    In addition, please explain the concept of PORT_A and PORT_B, thank you.

    Each of these ports corresponds to one flash interface to the memory bank. The two ports are used internally by different masters (CPU, DMA, etc.).

    If you are testing ECC correction on CPU flash reads, use Port A (DIAG_BUF_SEL = 0x0).

    If you are testing ECC behavior on DMA flash reads, use Port B (DIAG_BUF_SEL = 0x4).

    So, in our example it is always better to use Port A, since only the CPU reads the memory area.
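The port choice described above can be written as a tiny helper. This is a sketch under assumptions: it only returns the DIAG_BUF_SEL values quoted in this reply and does not show where the field sits inside its register. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Which bus master performs the flash read under test (hypothetical enum). */
typedef enum { FLASH_READER_CPU, FLASH_READER_DMA } flash_reader_t;

/* Field values as quoted in the reply above. */
#define DIAG_BUF_SEL_PORT_A 0x0u
#define DIAG_BUF_SEL_PORT_B 0x4u

/* Pick the DIAG_BUF_SEL value that matches the master doing the read. */
uint32_t diag_buf_sel_for(flash_reader_t reader)
{
    assert(reader == FLASH_READER_CPU || reader == FLASH_READER_DMA);
    return (reader == FLASH_READER_DMA) ? DIAG_BUF_SEL_PORT_B
                                        : DIAG_BUF_SEL_PORT_A;
}
```

Since the example project reads flash with the CPU only, it would always take the `FLASH_READER_CPU` branch and therefore Port A.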

    --
    Thanks & regards,
    Jagadish.

  • Hello, Gundavarapu

    Thank you for your reply about the concept of port A and port B.

    As shown in my operation, I executed the FLASH_ECC_TEST_MODE_1BIT test on Port A. According to your previous reply, the interrupt of channel 4 of group 1 should be triggered. May I ask why I see Group2Error set here?

  • Hi Sam,

    Apologies for the delayed response. Due to a lot of other priority issues, I didn't have time to work on this one.

    As shown in my operation, I executed the FLASH_ECC_TEST_MODE_1BIT test on Port A. According to your previous reply, the interrupt of channel 4 of group 1 should be triggered. May I ask why I see Group2Error set here?

    You are right; this should not trigger group 2 errors. Could you share your code so that I can quickly verify and test it on my end?

    --
    Thanks & regards,
    Jagadish.

  • Hi

    I've attached the code—it's actually your code—and would appreciate it if you could take a look. Thank you.


    FEE_ECC_Errors_TEST_LC4357_1127.rar

  • Hi, Gundavarapu

    Could you please respond as soon as possible? Thank you.

  • Hi Sam,

    I now understand why you are getting the ESM2.3 error in your code. It is caused by improper ECC values in flash.

    Please refer to the thread below:

    TMS570LC4357: Uniflash tool could not erase the flash memory correctly. - Arm-based microcontrollers forum - Arm-based microcontrollers - TI E2E support forums

    This is why reading any flash area triggers the uncorrectable ESM flag, that is, ESM2.3.

    So what I did was uncheck the Auto ECC generation option in CCS.

    I then generated the ECC with the linker CMD file for the entire flash, including its unused areas:

    After this modification, triggering a 1-bit ECC error gives me only ESM1.4, and no longer ESM2.3.

    I am attaching the project I tested on my end; please go through it.

    7024.FEE_ECC_Errors_TEST_LC4357.zip

    --
    Thanks & regards,
    Jagadish.

  • Hi, Gundavarapu

    Please attach your new code (FEE_ECC_Errors_TEST_LC4357.rar) so I can investigate whether subsequent changes you made caused the discrepancy between our results.

  • https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/908/FEE_5F00_ECC_5F00_Errors_5F00_TEST_5F00_LC4357.7z

  • Hi, Gundavarapu

    1. Regarding the failure to trigger the 1-bit error correction interrupt

    The reason is that the function epcEnableSERREvent() was not called.

    2. Does a 2-bit error cause an abort?

    3. Are the interrupts triggered by a 2-bit error limited to the contents shown in the figure below?

    4. I called the function SL_SelfTest_Flash(FLASH_ECC_TEST_MODE_2BIT, TRUE, &failInfoFlash, Port_Var), and Group2Error was set. How can I obtain the error address? According to the manual, the error involves the following registers. However, in practice, the values of these registers are 0. Why is that?

  • Hi Sam,

    2. Does a 2-bit error cause an abort?

    Yes, a 2-bit (uncorrectable) Flash ECC error typically causes a CPU abort.

    The behavior depends on whether the error occurs during:

    • Instruction fetch (prefetch): Results in a Prefetch Abort exception
    • Data read: Results in a Data Abort exception

    3. Are the interrupts triggered by a 2-bit error limited to the contents shown in the figure below?

    The figure indicates Group2 channel 3 for "Cortex-R5F Core - All fatal bus error events [Commonly caused by improper or incomplete ECC values in Flash]" and Group3 channel 13 for "L2FMC - uncorrectable error." Both could potentially be triggered depending on where the error is detected in the system.

    4. I called the function SL_SelfTest_Flash(FLASH_ECC_TEST_MODE_2BIT, TRUE, &failInfoFlash, Port_Var), and Group2Error was set. How can I obtain the error address? According to the manual, the error involves the following registers. However, in practice, the values of these registers are 0. Why is that?

    The flash ECC error address registers may be cleared automatically after they are read or after the error status is cleared. According to the documentation, there is an ERR_STATUS_CLR register (Error Status Clear Register) that clears the error status.

    Try reading the registers immediately in the ESM interrupt handler, before any clearing operations.
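The ordering suggested above (capture first, clear afterwards) can be sketched against a simulated register block. Everything here is a stand-in: `fake_flash_regs_t` and its fields are hypothetical and do not correspond to real device registers, and the model simply assumes that clearing the status also zeroes the latched address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated error registers (hypothetical layout, for host testing only). */
typedef struct {
    volatile uint32_t err_status;
    volatile uint32_t err_address;
} fake_flash_regs_t;

typedef struct {
    uint32_t status;
    uint32_t address;
} ecc_snapshot_t;

/* Model assumption: clearing the status also resets the latched address. */
static void fake_clear_error(fake_flash_regs_t *regs)
{
    regs->err_status = 0u;
    regs->err_address = 0u;
}

/* Handler body: copy the error information out, then clear the source. */
ecc_snapshot_t handle_ecc_event(fake_flash_regs_t *regs)
{
    assert(regs != NULL);
    ecc_snapshot_t snapshot;
    snapshot.status = regs->err_status;   /* step 1: capture */
    snapshot.address = regs->err_address;
    fake_clear_error(regs);               /* step 2: clear */
    return snapshot;
}
```

If the two steps were swapped, the snapshot in this model would contain only zeros, which matches the symptom described in the question.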

    --
    Thanks & regards,
    Jagadish.

  • Hi, Gundavarapu

    1. The `SL_SelfTest_Flash(FLASH_ECC_TEST_MODE_2BIT, TRUE, &failInfoFlash, Port_Var)` function simulates a 2-bit error, but it remains stuck in a `while` loop and doesn't trigger an abort exception. What could be the reason for this?

    2. I looked at the manual again carefully and discovered the following:

    2.1 What do "interconnect and RAM IP modules" mean? Does it imply that the content captured by the EPC does not include flash ECC 2-bit errors?

    2.2 Could you please explain what the following terms described in section 2-4 mean: CPU0 Correctable Error, CPU SCR Correctable ECC for PS_SCR_M I/F, CPU SCR Correctable ECC for DMA I/F, and L2RAMW RMW Correctable Error?

    3. What is the principle behind writing 0x45 to the FEMU_ECC register to simulate a 2-bit flash ECC error?

    4. The screenshots below show the register contents that changed when I performed the 1-bit and 2-bit simulations. Which register are you referring to when you mention the ERR_STATUS_CLR register?

    Registers that change due to a 1-bit error:

    Registers that change due to a 2-bit error:

    5. Additionally, I found that the FUNCERRADD register is not declared or described in the manual.

  • Hi Sam,

    1. The `SL_SelfTest_Flash(FLASH_ECC_TEST_MODE_2BIT, TRUE, &failInfoFlash, Port_Var)` function simulates a 2-bit error, but it remains stuck in a `while` loop and doesn't trigger an abort exception. What could be the reason for this?

    A flash 2-bit ECC error does not always cause an abort exception directly.

    • On Hercules devices like the TMS570LC4357, a 2-bit Flash ECC error typically triggers an ESM (Error Signaling Module) event rather than directly causing a data abort exception
    • The error handling path is Flash ECC error → ESM → Potential NMI or error pin assertion

    It will definitely trigger an ESM event and assert the error pin, but whether an abort occurs depends on several factors, chiefly where exactly the error occurs.

    2.1 What do "interconnect and RAM IP modules" mean? Does it imply that the content captured by the EPC does not include flash ECC 2-bit errors?

    "Interconnect and RAM IP modules" refers to:

    • Interconnect: The internal bus interconnect that connects different modules (CPU, DMA, peripherals) within the SoC
    • RAM IP modules: The various RAM blocks in the device (SRAM etc.) that have their own ECC protection

    Yes, the EPC (Error Profiling Controller) does NOT capture Flash ECC 2-bit errors directly. The statement in the manual indicates that the EPC specifically captures uncorrectable faults from "interconnect and RAM IP modules".

    2.2 Could you please explain what the following terms described in section 2-4 mean: CPU0 Correctable Error, CPU SCR Correctable ECC for PS_SCR_M I/F, CPU SCR Correctable ECC for DMA I/F, and L2RAMW RMW Correctable Error?

    1. CPU0 Correctable Error: Single-bit ECC errors detected and corrected during CPU0 operations (instruction fetch or data access)

    2. CPU SCR Correctable ECC for PS_SCR_M I/F:

      • SCR most likely refers to "Switched Central Resource", TI's term for the bus interconnect, rather than to a memory
      • PS_SCR_M I/F refers to the master interface through which the peripheral-side interconnect accesses the CPU-side interconnect
      • This captures correctable ECC errors on this specific interconnect interface

    3. CPU SCR Correctable ECC for DMA I/F:

      • Similar to the above, but for the DMA (Direct Memory Access) controller's interface to the CPU-side interconnect
      • Captures correctable ECC errors on DMA transfers through this interface

    4. L2RAMW RMW Correctable Error:

      • L2RAMW: Level 2 RAM Wrapper
      • RMW: Read-Modify-Write operation
      • This captures correctable ECC errors during read-modify-write cycles to L2 RAM

    3. What is the principle behind writing 0x45 to the FEMU_ECC register to simulate a 2-bit flash ECC error?

    The main aim is to introduce errors in two or more bits of either the ECC or the data.

    I created an Excel sheet for ECC calculations on this device.

    0028.ECC_calculation_Big_Endian_Devices(LC4357).xlsx

    If you provide the address and data values, it gives the corresponding ECC value.

    You can verify the inputs there and compare the ECC and data against this Excel sheet.
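To illustrate the principle, the following C sketch compares an expected ECC byte with an injected one and counts how many bits differ. It assumes the data is left untouched and only the ECC byte is overridden. The names are hypothetical, and the expected ECC would have to come from a calculation such as the spreadsheet above; no real ECC value is assumed here.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    INJECT_NO_ERROR,      /* ECC matches, nothing to detect */
    INJECT_CORRECTABLE,   /* exactly one ECC bit differs */
    INJECT_UNCORRECTABLE, /* exactly two ECC bits differ */
    INJECT_BEYOND_SECDED  /* three or more bits: outside SECDED guarantees */
} inject_class_t;

/* Count the bits that differ between the expected and the injected ECC. */
unsigned count_flipped_bits(uint8_t expected_ecc, uint8_t injected_ecc)
{
    uint8_t diff = (uint8_t)(expected_ecc ^ injected_ecc);
    unsigned flipped = 0;
    while (diff != 0u) {
        flipped += diff & 1u;
        diff = (uint8_t)(diff >> 1);
    }
    assert(flipped <= 8u);
    return flipped;
}

/* Classify an injection by the number of ECC bits it flips. */
inject_class_t classify_injection(uint8_t expected_ecc, uint8_t injected_ecc)
{
    switch (count_flipped_bits(expected_ecc, injected_ecc)) {
    case 0u: return INJECT_NO_ERROR;
    case 1u: return INJECT_CORRECTABLE;
    case 2u: return INJECT_UNCORRECTABLE;
    default: return INJECT_BEYOND_SECDED;
    }
}
```

Checking a planned injection value this way, against the ECC computed for the target address and data, shows whether it really differs in two bits before it is written to the emulation register.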

    4. The screenshots below show the register contents that changed when I performed the 1-bit and 2-bit simulations. Which register are you referring to when you mention the ERR_STATUS_CLR register?

    The ERR_STATUS_CLR register is part of the Flash ECC/Error Log Registers block. Here's the specific information:

    For TMS570LC4357 (similar to dual-core devices with M3 and C28x):

    Flash ECC Error Status Clear Register (ERR_STATUS_CLR):

    • M3 side offset: 0x400FA614 (base 0x400FA600 + offset 0x14)
    • C28x side offset: 0x4300 + 0xA (word offset)
    • Size: 4 bytes (32-bit register)
    • Type: R/W0-1 (Read/Write, writing 0 has no effect, writing 1 clears the bit)
    • Protection: MWRALLOW (M3) / EALLOW (C28x)

    Purpose: This register is used to clear the error status flags in the ERR_STATUS register, including:

    • FAIL_0_CLR (bit 0): Clear the FAIL_0 flag
    • FAIL_1_CLR (bit 1): Clear the FAIL_1 flag
    • UNC_ERR_CLR (bit 2): Clear the uncorrectable error flag

    In your screenshots: the ERR_STATUS_CLR register should be in the Flash wrapper register space. The registers shown in your register dump include:

    • EpcCntrl (0x0000050A) - EPC Control Register
    • CamAvailStat (0x0000001F) - CAM Index Available Status
    • CAM_Content0 (0x00020008) - This appears to be capturing an error address

    These are EPC (Error Profiling Controller) registers, not Flash wrapper registers. The ERR_STATUS_CLR is in a different register block specifically for Flash ECC control.

    5. Additionally, I found that the FUNCERRADD register is not declared or described in the manual.

    1. Undocumented Flash Wrapper Register:

      • This could be an internal Flash wrapper register that captures the Function Error Address
      • "FUNC" might stand for "Function" or "Functional"
      • It may be a silicon-specific register not fully documented in public TRMs

    --
    Thanks & regards,
    Jagadish.

  • Hi, Gundavarapu

    Since the EPC module cannot capture the address where the FLASH ECC 2-bit error occurred, and I haven't found the definition for the ERR_STATUS_CLR register, is there another way to determine the address where the 2-bit error happened? (When a 2-bit error occurs, the FUNCERRADD register appears to be 0.)

  • Hi Sam,

    As far as I know, there is no hardware-supported way to determine the exact flash address of a 2-bit ECC error.

    For flash 2-bit ECC errors, the EPC does not latch the failing address.

    I am checking again with my senior colleague for confirmation. I am waiting for his reply and will confirm based on his input.

    --
    Thanks & regards,
    Jagadish.

  • Hi, Gundavarapu

    Thank you.

    Suppose I have a module that reads the entire Flash data by traversing it.

    The occurrence of 2-bit errors can be categorized into two scenarios:

    1. The error occurred in the code section.

    1.1 In this case, do both traversing the flash and executing the code cause the 2-bit error to be raised?

    1.2 After an error occurs, it may result in data address anomalies or generate erroneous interrupts. Is that correct?

    2. The error occurred elsewhere.

    2.1 In this case, a 2-bit error can only be triggered by traversing and reading the flash.

    2.2 After an error occurs, it will only trigger an interrupt.

    Is my above statement correct?

    Note: The chip's internal flash stores multiple application programs. "Code area" here means the flash region of the currently running application; the other regions hold application programs that are not currently running.
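The traversal module mentioned above might look like the following C sketch, which reads a region one 64-bit word at a time so that each word passes through the ECC check. The breadcrumb variable is an addition for illustration: it records the address about to be read, so that an error handler could log an approximate location. Whether that is precise enough on real hardware is not established in this thread, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address of the word about to be read; updated before every read. */
static volatile uintptr_t g_scan_breadcrumb;

/* Accessor an error handler could call to learn the last scan position. */
uintptr_t scan_breadcrumb(void)
{
    return g_scan_breadcrumb;
}

/* Read every 64-bit word in the region and fold it into an XOR checksum.
 * On the target, each read is what would trigger the ECC check. */
uint64_t scan_region(const volatile uint64_t *start, size_t doublewords)
{
    assert(start != NULL);
    uint64_t checksum = 0;
    for (size_t i = 0; i < doublewords; i++) {
        g_scan_breadcrumb = (uintptr_t)&start[i]; /* record before reading */
        checksum ^= start[i];
    }
    return checksum;
}
```

The XOR checksum only serves to give the reads a visible result. A region that is never executed, such as an inactive application image, would be exercised by such a scan alone.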

  • Hi Sam,

    Is my above statement correct?

    Most of your understanding is correct, but I want to clarify a little further:

    Whether it is data or code, a 2-bit error is triggered as follows:

    Whenever the core fetches data or an instruction from memory, ECC validation is performed. That is, the ECC is read from memory along with the data (which includes instructions), and both are passed to the SECDED module for error detection, as shown below.

    After this check, ESM interrupts and ESM flags are generated according to the ECC error detected.

    --
    Thanks & regards,
    Jagadish.

  • Hi, Gundavarapu

    Thank you for your explanation.

    Next, please have your colleague QJ Wang explain whether it is possible to capture the address where the Flash ECC 2-bit error occurred.

  • Next, please have your colleague QJ Wang explain whether it is possible to capture the address where the Flash ECC 2-bit error occurred.

      Could you please confirm this from your end?

  • No, the flash address where a 2-bit ECC error occurs is not captured.

  • Hi, QJ Wang

    When a 2-bit error occurs, can we determine which sector it originated from?

  • As far as I know, we cannot tell which flash sector caused the problem.