AM6442: ECC double bit error handling

Mateusz Szalkowski

Part Number: AM6442

Hello,

I'm enabling ECC for the DDRSS on an AM6442 processor. I applied the modifications to U-Boot according to the documentation, and I can verify that it detects both single-bit and double-bit errors using the ddrss command in U-Boot.

The only problem I have now is the reaction to a double-bit error: all suggested methods seem to reboot the SoC right after detecting the error, while I also need to log the event somehow and show it to the user after reboot. I've seen that it's possible to detect whether the ESM was the reset cause, but it does not seem possible to distinguish exactly which ESM event caused it.

Is it possible to, e.g., configure the event in the ESM as low priority to prevent the SoC from rebooting and allow the event to be logged first? Or at least to detect after reboot that ECC specifically was the reason for the reset?

Also, is there any example of DDR error injection on the Linux side to test this, similar to the ddrss command in U-Boot?

Best regards,
Mateusz

11 months ago

Nihar Potturu 11 months ago

Hello Mateusz,

Mateusz Szalkowski said:
The only problem I have now is the reaction to a double-bit error: all suggested methods seem to reboot the SoC right after detecting the error, while I also need to log the event somehow and show it to the user after reboot. I've seen that it's possible to detect whether the ESM was the reset cause, but it does not seem possible to distinguish exactly which ESM event caused it.

Can you modify the abort handler to log the error first and then trigger a reset from software, instead of using the ESM to trigger the reset? (I have attached a rough idea below.)

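/* Rough idea only: the helper functions below are placeholders, not SDK APIs. */
void abort_handler(void)
{
    if (is_ddr_ecc_double_bit_error()) {  /* you can check the ESM event for this */
        log_error();                       /* log the error first */
        reset_soc();                       /* then reset the SoC from software */
    } else {
        usual_abort_handler();             /* usual abort handling */
    }
}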
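
One way to make the "log the error" step survive the reset is to write a small record into memory that is not cleared by the reset and to validate it on the next boot. The following is only a minimal sketch in plain C under stated assumptions: the record layout, the magic value, and the names ecc_log_store and ecc_log_load are made up for illustration and are not part of any TI SDK. Where the record would live (for example a reserved RAM region) and whether that memory keeps its contents across the reset type you use are assumptions you would need to verify on your board.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary marker chosen for this sketch; it only has to be unlikely in random RAM. */
#define ECC_LOG_MAGIC 0x4543434Cu

/* Hypothetical record layout; the fields are illustrative, not an SDK structure. */
typedef struct {
    uint32_t magic;       /* marks the slot as holding a record */
    uint32_t event_id;    /* which error event was seen (numbering is up to the caller) */
    uint64_t fault_addr;  /* address reported for the error, if one is available */
    uint32_t checksum;    /* guards against a partially written or stale record */
} ecc_log_record_t;

/* Simple XOR checksum over the payload fields. */
static uint32_t ecc_log_checksum(const ecc_log_record_t *r)
{
    return r->magic ^ r->event_id ^ (uint32_t)r->fault_addr ^
           (uint32_t)(r->fault_addr >> 32) ^ 0xA5A5A5A5u;
}

/* Called from the error handler just before the software reset. */
void ecc_log_store(ecc_log_record_t *slot, uint32_t event_id, uint64_t fault_addr)
{
    assert(slot != NULL);
    slot->event_id = event_id;
    slot->fault_addr = fault_addr;
    slot->magic = ECC_LOG_MAGIC;
    slot->checksum = ecc_log_checksum(slot);
}

/* Called early after boot. Returns 1 and copies the record to *out if a valid
 * record is present, otherwise returns 0. A valid record is invalidated after
 * it is read, so it is reported only once. */
int ecc_log_load(ecc_log_record_t *slot, ecc_log_record_t *out)
{
    assert(slot != NULL && out != NULL);
    if (slot->magic != ECC_LOG_MAGIC || slot->checksum != ecc_log_checksum(slot)) {
        return 0;
    }
    *out = *slot;
    slot->magic = 0u;
    return 1;
}
```

In the handler above, ecc_log_store would stand in for the "log the error" step, and the boot code would call ecc_log_load to decide whether to show the event to the user. On real hardware the record would also need to be pushed out of the cache before the reset, which this sketch does not show.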

Regards,

Nihar Potturu.

Mateusz Szalkowski 11 months ago in reply to Nihar Potturu

Hello Nihar,

Thanks for the suggestion. After some initial tests using the ESM to handle WDT events, it seems that this might be possible. However, looking at the documentation, every option suggested there ends in a board reboot anyway, triggered either by the ESM or by a synchronous abort.

In our architecture we use Linux on the A53 along with an RTOS on the R5. All events that are added to the ESM at the bootloader device tree level seem to restart the SoC, and if the double-bit ECC event is not added there, the SoC reboots because of a synchronous abort (at least during tests in U-Boot). So even if we try to handle it, e.g., on the R5 side (as is done now for the R5 WDT event), we cannot do anything if the double-bit error occurs before the R5 is up (that is, before the Linux kernel is up). Any suggestions on how to handle such a scenario?

In the meantime, I would like to try the modified handler approach you suggested, but I haven't been able to test it with our architecture. I tried to port the ddr_ecc_test example from the MCU+ SDK to run on the R5 configured as a Linux remoteproc, but it doesn't seem to work: either the whole system hangs and resets right after the R5 core starts if DDR is configured in the syscfg file, or the R5 simply never enters the ECC event callback if DDR is not configured in the syscfg file. Is there any other way to test this, e.g., from the Linux side?

Regards,
Mateusz
