AM6546: ECC Testing

Part Number: AM6546

Hi,

I have been trying to test the ECC mechanism for external memory for safety certification purposes. For wider coverage, we would like to run this test at runtime on the A53 cores under the VxWorks RTOS. Below are my findings, my current test scenario, and my questions.

According to your safety manual (SPRUIJ5), the ECC sanity test should be run on the R5F (safety) cores, and I designed and implemented it as advised in the application report (SPRACM1). It works on the happy path as follows:

"Start -> Enable ECC W/O poisoning -> Prime DDR -> Check if there is failure -> restart -> Enable ECC with 1-bit poisoning -> Check given address is corrected -> restart -> Enable ECC with 2-bit poisoning -> Check interrupt is called -> restart -> Enable ECC W/O poisoning -> continue with other applications."
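
The sequence above can be tracked across restarts with a small phase state machine. The following is a minimal sketch in C, not code from SPRACM1: the phase names and both functions are hypothetical, and it assumes the current phase is kept in storage that survives the restart.

```c
#include <stdbool.h>

/* Phases of the ECC sanity test. Names are hypothetical, not from SPRACM1. */
typedef enum {
    ECC_PHASE_PRIME,       /* ECC on, no poisoning: prime DDR, check for failures */
    ECC_PHASE_POISON_1BIT, /* ECC on, 1-bit poisoning: expect a corrected read */
    ECC_PHASE_POISON_2BIT, /* ECC on, 2-bit poisoning: expect the error interrupt */
    ECC_PHASE_DONE,        /* ECC on, no poisoning: continue with applications */
    ECC_PHASE_FAILED       /* a check did not match its expectation */
} ecc_phase_t;

/* Return the phase to run after the next restart, given whether the
 * check of the current phase passed. A failure is sticky. */
ecc_phase_t ecc_next_phase(ecc_phase_t current, bool check_passed)
{
    if (!check_passed) {
        return ECC_PHASE_FAILED;
    }
    switch (current) {
    case ECC_PHASE_PRIME:       return ECC_PHASE_POISON_1BIT;
    case ECC_PHASE_POISON_1BIT: return ECC_PHASE_POISON_2BIT;
    case ECC_PHASE_POISON_2BIT: return ECC_PHASE_DONE;
    case ECC_PHASE_DONE:        return ECC_PHASE_DONE;
    default:                    return ECC_PHASE_FAILED;
    }
}

/* Poisoning is requested only in the two injection phases. */
bool ecc_phase_uses_poisoning(ecc_phase_t phase)
{
    return phase == ECC_PHASE_POISON_1BIT || phase == ECC_PHASE_POISON_2BIT;
}
```

Keeping the transition logic separate from the hardware access makes it possible to check on a host that the flow always ends with poisoning disabled.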

Besides that, my findings regarding the Quasi-Dynamic Group 3 rules are as follows:
- “Quasi-Dynamic Group 3” Register Rules
-- No Outstanding DDR Commands: 
--- No AXI port has outstanding address or data beats
--- No reads, no writes, no refresh-triggered accesses

-- No Refresh in Progress:
--- Group-3 registers must not be written while a refresh is active.

-- ALL AXI Ports Must Be QUIESCENT:
--- R5F must NOT be executing from DDR
--- A53 must NOT be running or accessing DDR
--- DMA engines must NOT be reading/writing DDR
--- No cached store-misses targeting DDR
--- Why? -> Because an ECCCFG1 write affects the NEXT ACCESS, so all masters must be silent.

-- Writes must COMPLETE before access resumes:
--- Then one and only one DDR access should follow when doing a forced ECC error.

-- You MUST restore original value
--- Because Group-3 register changes are persistent until rewritten.

-- NO reset or retraining is required
--- This is why TI uses these registers for ASO in-field diagnostic injection.
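
The rules above suggest a fixed ordering for a single forced ECC error: confirm that everything is quiescent, save the register, write it, perform exactly one DDR access, then restore the original value. A minimal sketch of that ordering in C follows. All names (`ecc_inject_ops_t`, `ecc_inject_once`, the `sim_*` helpers) are hypothetical, and a small simulated controller stands in for the real hardware so the sequence can be exercised on a host. Any value written through it is a placeholder, not a real ECCCFG1 encoding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical platform hooks; none of these names come from TI software. */
typedef struct {
    bool     (*masters_quiescent)(void *ctx); /* no AXI port has DDR traffic outstanding */
    bool     (*refresh_idle)(void *ctx);      /* no refresh in progress */
    uint32_t (*read_cfg)(void *ctx);          /* read the Group 3 register */
    void     (*write_cfg)(void *ctx, uint32_t value); /* returns once the write completed */
    void     (*single_access)(void *ctx);     /* exactly one DDR access to the test address */
    void      *ctx;
} ecc_inject_ops_t;

typedef enum { ECC_INJECT_OK, ECC_INJECT_BUSY } ecc_inject_status_t;

/* Force one ECC error, then restore the register to its original value. */
ecc_inject_status_t ecc_inject_once(const ecc_inject_ops_t *ops, uint32_t poison_cfg)
{
    /* Refuse to touch the register unless the preconditions hold. */
    if (!ops->masters_quiescent(ops->ctx) || !ops->refresh_idle(ops->ctx)) {
        return ECC_INJECT_BUSY;
    }
    uint32_t saved = ops->read_cfg(ops->ctx);
    ops->write_cfg(ops->ctx, poison_cfg); /* affects the next access */
    ops->single_access(ops->ctx);         /* one and only one access */
    ops->write_cfg(ops->ctx, saved);      /* changes persist until rewritten */
    return ECC_INJECT_OK;
}

/* Simulated controller, for host-side testing of the ordering only. */
typedef struct {
    bool     busy;               /* some master still has traffic pending */
    bool     refreshing;         /* a refresh is active */
    uint32_t cfg;                /* stand-in for the Group 3 register */
    unsigned access_count;       /* number of DDR accesses performed */
    uint32_t cfg_at_last_access; /* register value seen by the last access */
} sim_ddr_t;

static bool sim_quiescent(void *ctx) { return !((sim_ddr_t *)ctx)->busy; }
static bool sim_refresh_idle(void *ctx) { return !((sim_ddr_t *)ctx)->refreshing; }
static uint32_t sim_read_cfg(void *ctx) { return ((sim_ddr_t *)ctx)->cfg; }
static void sim_write_cfg(void *ctx, uint32_t value) { ((sim_ddr_t *)ctx)->cfg = value; }
static void sim_single_access(void *ctx)
{
    sim_ddr_t *sim = ctx;
    sim->access_count++;
    sim->cfg_at_last_access = sim->cfg;
}

/* Bind the simulated controller to the hook table. */
ecc_inject_ops_t sim_make_ops(sim_ddr_t *sim)
{
    ecc_inject_ops_t ops = {
        sim_quiescent, sim_refresh_idle, sim_read_cfg,
        sim_write_cfg, sim_single_access, sim
    };
    return ops;
}
```

On real hardware the precondition checks and the register write are not atomic, so the caller would still have to guarantee that no master can start DDR traffic in between. The sketch only captures the ordering.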

According to these:
1- Is my current test scenario correct?
2- Is it possible or practical to leave the 1-bit poisoning active after the ECC test completes at runtime? What would happen in the worst case?
3- How does the poisoning mechanism work? Is it possible to apply different patterns? Is it possible to change the poison address at runtime? How is it decided which bit will be altered?
4- Is it possible to change the ECC config register at runtime? Can we initialize and run the ECC 1-bit poisoning test periodically on the A53 cores?

Thanks in advance. 

Kind regards, 
Hamdi Ertan Yasar

  • Hello Hamdi,

    1. Your testing scenario and other conditions look correct.

    2. It is not practical to leave the 1-bit poisoning active after the ECC test completes, because it will continuously generate false interrupts. At runtime you want interrupts to be generated only for real scenarios.

    3. When 1-bit is selected, you select the location of the error by writing the mask of the corrected data portion to the DDRCTL_ECCBITMASKn registers. With 2-bit errors, controlling the position of the error is not possible, primarily because 2-bit errors are only detected, not corrected.

    4. It is possible to change the config at runtime. However, this is not advisable. As per safety standards, any diagnostic (in this case ECC) should be tested before you run the actual application, or during test intervals when the application is considered inactive.

    Let us know if you have any other questions.

    Regards,

    Neelima

  • Hi Neelima,

    Thank you for your time and the answers.

    I have a question regarding DDRCTL_ECCBITMASKn. In the TRM (SPRUID7E, p. 6242), these registers are listed as "Read Only" and are described as indicating which bits were corrected. Is it possible that the TRM is not up to date? I have SPRUID7E (April 2018, revised December 2019), downloaded from the product page.

    Kind regards,

    Hamdi Ertan 

  • Hi Hamdi,

    Apologies. You are correct: the ECCBITMASK register is a read-only register that indicates the position of the error for 1-bit errors.
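
For a 1-bit error, the position can be recovered from such a mask by locating the single set bit. The helper below is a minimal sketch in C under stated assumptions: the function name is hypothetical, the mask registers are assumed to have already been read into an array of 32-bit words, and word 0 is assumed to hold bits 0 to 31. The actual layout of the DDRCTL_ECCBITMASKn registers should be taken from the TRM.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the single set bit across `count` 32-bit mask words,
 * or -1 if no bit or more than one bit is set.
 * Assumption: word 0 holds bits 0..31, word 1 holds bits 32..63, and so on. */
int ecc_single_bit_position(const uint32_t *mask_words, size_t count)
{
    int position = -1;

    for (size_t w = 0; w < count; ++w) {
        uint32_t word = mask_words[w];
        if (word == 0u) {
            continue;
        }
        /* Reject a second non-zero word, or a word with several bits set. */
        if (position >= 0 || (word & (word - 1u)) != 0u) {
            return -1;
        }
        int bit = 0;
        while ((word >> bit) != 1u) {
            ++bit;
        }
        position = (int)(w * 32u) + bit;
    }
    return position;
}
```

Returning -1 for anything other than exactly one set bit lets the caller treat an unexpected mask as a test failure instead of reporting a wrong position.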

    The correct register is DDRCTL_ECCPOISONPAT, which can be used to set the bit for poisoning. More details are in the TRM; SPRUID7E is the correct version. Note that, in addition to this, DDRCTL_ADVECCINDEX may also need to be programmed. The default selection would be to set the error in the first beat of the burst. The controller performs ECC at a per-beat level.

    Regards,

    Neelima

  • Hi Neelima,

    Thank you for the update.

    Could you please confirm whether the advanced ECC mechanism is supported on AM6546? I am asking because in SPRACM1 – AM65x DDR ECC Initialization and Testing, Table 1 (page 4) states that Advanced ECC is not supported. I could not find a similar statement in the TRM, so I wanted to double-check to be certain.

    Best regards,
    Hamdi Ertan

  • Hi Neelima,

    Are there any updates on your side? I would like to kindly remind you of my last question.

    Regards,

    Hamdi Ertan

  • Hello Hamdi,

    Sorry again for the delay in responding. Yes, SPRACM1 is correct: advanced ECC mode is not supported. Please refer to https://www.ti.com/lit/an/spracm1/spracm1.pdf for details on how to enable and test ECC.

    Thanks,
    Neelima