DRA829J: Overview of ECC coverage

Part Number: DRA829J

Hi TI-Team,

We are working on a DO-178 project and require a consolidated overview of the SoC components protected by ECC.

The available details in the TRM are fragmented, making it difficult to form a complete picture.

Specifically, we need clarification on:

  • Which SoC components are covered by ECC protection (e.g., memories, interconnects, MMRs, etc.).
  • How these components map to the ECC RAM IDs (ECC aggregators).
  • Whether automatic correction (e.g., hardware write-back after error detection) is implemented for each ECC-protected RAM.

Could you please provide such an overview, or point us to the appropriate documentation, code references, or other resources where this information is available?

Regards,
Dmitry

  • Hi Dmitry,

    Which SoC components are covered by ECC protection (e.g., memories, interconnects, MMRs, etc.).

    The Software Diagnostics Library (SDL), which ships as part of the TI RTOS SDK, lists the ECC aggregators present across the SoC that can be used for ECC protection. You can find this list in the sdl_ecc.h file in sdl/src/sdl/.

    Additionally, each IP section in the TRM calls out ECC if ECC protection is available for that IP.

    How these components map to the ECC RAM IDs (ECC aggregators).

    The ECC aggregator improves system reliability by protecting memories in various device modules and subsystems. It is connected to the ECC-capable memory and interconnect components of a module or subsystem and provides access to control and monitor them.

    The ECC aggregator also supports error injection for diagnostic purposes: users can deliberately introduce errors to check that the error detection, correction, and handling paths respond as expected. The configurable parameters include the ECC aggregator type, subtype (RAM ID), error type, and injection location, so test scenarios can be tailored to specific requirements.

    As stated in the FAQ "[FAQ] TDA4VM: Types of ECC Aggregators, RAM IDs and Error Injection", these RAM IDs (endpoints) are of different types and are listed in the sdlr_soc_ecc_aggr.h file in sdl/include/soc/$SOC/.

    Whether automatic correction (e.g., hardware write-back after error detection) is implemented for each ECC-protected RAM.

    Single Error Correction (SEC): For single-bit errors, the ECC aggregator can both detect and correct the error, ensuring the integrity of the data. Single-bit error injection is supported for all endpoints and checkers. Single-bit error correction is not supported for parity and redundancy checker types.

    Double Error Detection (DED): For double-bit errors, the ECC aggregator can detect the error but cannot correct it. Instead, it signals the error to the Central Processing Unit (CPU) via ESM interrupts. This is supported for all RAM wrapper types and for interconnect EDC checker endpoints. Parity and redundancy checker types do not support double-bit ECC error injection.
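
    The support rules above can be summarized as a small lookup table. The following C sketch only restates those rules as given in this reply. All type and function names (endpoint_type_t, endpoint_caps, injection_supported) are hypothetical and are not SDL identifiers; the actual endpoint types and RAM IDs are defined in the SDL headers mentioned above.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; NOT the SDL identifiers. */
typedef enum {
    EP_RAM_WRAPPER,
    EP_INTERCONNECT_EDC,
    EP_INTERCONNECT_PARITY,
    EP_INTERCONNECT_REDUNDANCY
} endpoint_type_t;

typedef enum { ERR_SINGLE_BIT, ERR_DOUBLE_BIT } error_type_t;

typedef struct {
    bool inject_single;   /* single-bit error injection supported */
    bool correct_single;  /* single-bit error correction supported */
    bool inject_double;   /* double-bit error injection/detection supported */
} endpoint_caps_t;

/* Capabilities per endpoint type, as described in the reply above. */
endpoint_caps_t endpoint_caps(endpoint_type_t type)
{
    switch (type) {
    case EP_RAM_WRAPPER:
    case EP_INTERCONNECT_EDC:
        return (endpoint_caps_t){ true, true, true };
    case EP_INTERCONNECT_PARITY:
    case EP_INTERCONNECT_REDUNDANCY:
    default:
        /* Parity/redundancy checkers: single-bit injection only. */
        return (endpoint_caps_t){ true, false, false };
    }
}

/* Returns true if a self-test may inject the given error type. */
bool injection_supported(endpoint_type_t type, error_type_t err)
{
    endpoint_caps_t caps = endpoint_caps(type);
    return (err == ERR_SINGLE_BIT) ? caps.inject_single : caps.inject_double;
}
```

    A self-test could call injection_supported() before attempting an injection, for example to skip double-bit tests on parity checkers.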

    Regards,

    Josiitaa

  • Hi Josiitaa,

    I will need to take some time to evaluate the Software Diagnostics Library (SDL) and will then come back to you on whether it is sufficient. One of the main concerns here is documentation availability, which is crucial for projects that target DO-178.

    Regarding automatic correction (in memory), the main driver of my question is the following extract from PRIUL1D (chapter 8.2.4.1.4):

    It raises the question - which memories are automatically corrected by HW (by writing back corrected data) and which are not. I will appreciate your further clarification here.

    Regards.
    Dmitry

  • Hi Dmitry,

    I will need to take some time to evaluate the Software Diagnostics Library (SDL) and will then come back to you on whether it is sufficient. One of the main concerns here is documentation availability, which is crucial for projects that target DO-178.

    You can refer to the SDL user guide: https://software-dl.ti.com/jacinto7/esd/processor-sdk-rtos-jacinto7/latest/exports/docs/sdl/sdl_docs/userguide/j721e/overview.html

    It raises the question - which memories are automatically corrected by HW (by writing back corrected data) and which are not. I will appreciate your further clarification here.

    This description is specific to the DDR SDRAM, which supports inline ECC. This is a unique case where ECC aggregators are not involved.

    Regards,

    Josiitaa

  • Hi Josiitaa,

    Thank you, this seems to be very useful.

    This description is specific to the DDR SDRAM, which supports inline ECC. This is a unique case where ECC aggregators are not involved.

    Do I understand correctly that, apart from DDR, the ECC wrappers/aggregators not only correct the value that is read but also write the corrected value back to the memory itself, for all other memories, possibly including registers and internal FIFOs?

    Regards.
    Dmitry

  • Hi Dmitry,

    Do I understand correctly that, apart from DDR, the ECC wrappers/aggregators not only correct the value that is read but also write the corrected value back to the memory itself, for all other memories, possibly including registers and internal FIFOs?

    When performing error injection for self-tests using the ECC aggregators, it is not necessary to recover or repair the injected memory location, because the RAM itself is not actually corrupted. The application callback is sufficient to handle the injected error. For any real time errors, the single bit errors are automatically corrected by TI's ECC mechanism.

    Regards,

    Josiitaa

  • Hi Josiitaa,

    For any real time errors, the single bit errors are automatically corrected by TI's ECC mechanism.

    I still do not fully understand whether only the read data is corrected, or whether the data in memory itself is corrected as well (i.e., the corrected value is written back to its origin by hardware).

    The reason for this question is simple. Assume there is a single-bit error in a word in SRAM. The CPU reads it and receives the corrected value, but the hardware does not correct the bit in memory itself (assumption). Then a second bit gets corrupted in the same word. If the first one was not corrected by hardware, we are in trouble: the word now holds an uncorrectable error.
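
    The scenario above can be modelled with a toy SECDED code. The sketch below assumes an extended Hamming(8,4) code purely for illustration; it is not the code width or the implementation used by the DRA829J hardware, and all names are hypothetical. The write_back flag models the two behaviours in question: correcting only the value returned to the CPU, or also repairing the stored word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { ECC_OK, ECC_CORRECTED, ECC_UNCORRECTABLE } ecc_status_t;

/* XOR of all eight bits of v (1 if an odd number of bits is set). */
static unsigned parity8(uint8_t v)
{
    v ^= (uint8_t)(v >> 4);
    v ^= (uint8_t)(v >> 2);
    v ^= (uint8_t)(v >> 1);
    return v & 1u;
}

/* Encode 4 data bits. Bits 1..7 are Hamming positions 1..7
 * (p1, p2, d1, p4, d2, d3, d4); bit 0 is the overall parity bit. */
uint8_t secded_encode(uint8_t nibble)
{
    assert(nibble < 16);
    unsigned d1 = nibble & 1u, d2 = (nibble >> 1) & 1u;
    unsigned d3 = (nibble >> 2) & 1u, d4 = (nibble >> 3) & 1u;
    unsigned p1 = d1 ^ d2 ^ d4;   /* checks positions 3, 5, 7 */
    unsigned p2 = d1 ^ d3 ^ d4;   /* checks positions 3, 6, 7 */
    unsigned p4 = d2 ^ d3 ^ d4;   /* checks positions 5, 6, 7 */
    uint8_t cw = (uint8_t)((p1 << 1) | (p2 << 2) | (d1 << 3) | (p4 << 4) |
                           (d2 << 5) | (d3 << 6) | (d4 << 7));
    return (uint8_t)(cw | parity8(cw));   /* force even overall parity */
}

/* Decode a stored word; on success *nibble is the data and *fixed the
 * repaired codeword. Outputs are untouched on ECC_UNCORRECTABLE. */
ecc_status_t secded_decode(uint8_t stored, uint8_t *nibble, uint8_t *fixed)
{
    unsigned syndrome = 0;
    for (unsigned pos = 1; pos <= 7; ++pos)
        if ((stored >> pos) & 1u)
            syndrome ^= pos;              /* XOR of set positions */
    ecc_status_t st = ECC_OK;
    uint8_t cw = stored;
    if (parity8(stored)) {                /* odd parity: single-bit error */
        cw ^= (uint8_t)(1u << syndrome);  /* syndrome 0 means bit 0 itself */
        st = ECC_CORRECTED;
    } else if (syndrome != 0) {           /* even parity, bad syndrome */
        return ECC_UNCORRECTABLE;
    }
    *nibble = (uint8_t)(((cw >> 3) & 1u) | (((cw >> 5) & 1u) << 1) |
                        (((cw >> 6) & 1u) << 2) | (((cw >> 7) & 1u) << 3));
    *fixed = cw;
    return st;
}

/* Read path; write_back models hardware that repairs the stored word. */
ecc_status_t ecc_read(uint8_t *cell, bool write_back, uint8_t *nibble)
{
    uint8_t fixed = 0;
    ecc_status_t st = secded_decode(*cell, nibble, &fixed);
    if (st == ECC_CORRECTED && write_back)
        *cell = fixed;
    return st;
}

/* Upset bit_a, read, upset bit_b, read again; returns the second status. */
ecc_status_t two_upsets(uint8_t data, unsigned bit_a, unsigned bit_b,
                        bool write_back, uint8_t *out)
{
    uint8_t cell = secded_encode(data);
    cell ^= (uint8_t)(1u << bit_a);
    (void)ecc_read(&cell, write_back, out);
    cell ^= (uint8_t)(1u << bit_b);
    return ecc_read(&cell, write_back, out);
}
```

    In this model, two_upsets(0xA, 3, 6, false, &out) ends with ECC_UNCORRECTABLE, while the same call with write_back set to true ends with ECC_CORRECTED and the original data, which is exactly why the write-back behaviour matters.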

  • Hi Dmitry, 

    I still do not fully understand whether only the read data is corrected, or whether the data in memory itself is corrected as well (i.e., the corrected value is written back to its origin by hardware).

    Single bit errors are corrected inline, in the memory itself.

    Regards,

    Josiitaa

  • Hi Josiitaa,

    Is this valid for all ECC-protected components/memories except DDR SDRAM?

    Regards.
    Dmitry

  • Hi Dmitry, 

    Yes, please refer to individual IP sections in the TRM for further details.

    Regards,

    Josiitaa