LAUNCHXL-RM42: Reliable (spurious?) ECC error on reset: ESMSR3 == 0x00000008

Part Number: LAUNCHXL-RM42
Other Parts Discussed in Thread: HALCOGEN

We're trying to bring up a LAUNCHXL-RM42. With the standard sys_startup.c provided by HALCoGen, the MCU hangs in _c_int00 because a bit in ESMSR3 is set. It hangs in this for loop:

    if ((esmREG->SR1[2]) != 0U)
    {
/* USER CODE BEGIN (24) */
/* USER CODE END */
    /*SAFETYMCUSW 5 C MR:NA <APPROVED> "for(;;) can be removed by adding "# if 0" and "# endif" in the user codes above and below" */
    /*SAFETYMCUSW 26 S MR:NA <APPROVED> "for(;;) can be removed by adding "# if 0" and "# endif" in the user codes above and below" */
    /*SAFETYMCUSW 28 D MR:NA <APPROVED> "for(;;) can be removed by adding "# if 0" and "# endif" in the user codes above and below" */
        for(;;)
        {
        }/* Wait */                 
/* USER CODE BEGIN (25) */
/* USER CODE END */
    }

The value of ESMSR3 is 0x00000008, which indicates an ESM Group 3 Channel 3 error, "RAM even bank (B0TCM) - ECC uncorrectable error".
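
For readers checking the arithmetic: 0x00000008 has only bit 3 set, which is why it maps to channel 3 of ESM group 3. The following is a minimal host-side sketch of that decoding, assuming bit n of the status word corresponds to channel n. The function name `esm_decode_channels` is hypothetical and is not part of the HALCoGen-generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a HALCoGen API).
 * Writes the channel number of every set bit in an ESM status word into
 * `channels` (capacity 32) and returns how many channels were flagged.
 * Assumes bit n of the status word corresponds to channel n. */
size_t esm_decode_channels(uint32_t status, uint8_t channels[32])
{
    size_t count = 0;

    assert(channels != NULL);

    for (uint8_t bit = 0; bit < 32U; ++bit)
    {
        if ((status & (UINT32_C(1) << bit)) != 0U)
        {
            channels[count++] = bit;
        }
    }
    return count;
}
```

Called with the value read from ESMSR3 here, the helper reports a single flagged channel, channel 3.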

I modified the debug configuration by disabling Target > Auto Run and Launch Options > Auto Run Options > On program load or restart and observed that even before any code has run, the value of ESMSR3 is 0x00000008.

This test project was created following the directions in How to Create A HALCoGen-Based Project For CCS. I enabled only the "GIO" module in HALCoGen and did not change any module settings. I did not add any user code.

I'd like to know why the MCU is signaling an ECC error even before any code has run. It seems unlikely that I have a bad chip. We would prefer not to ignore ECC errors since this is for a safety-critical application.

Thanks for your help,

Sam Birch
Innovative Electronics, LLC

  • Hi Sam,

    Most likely the SRAM is not initialized before using it. Does power cycling the board help? Do you know which statement causes the SRAM ECC issue?

  • What is the procedure for initializing the SRAM? I would expect the startup code generated by HALCoGen to initialize the SRAM for me. Is this not the case?

    As to your questions:

    For a "warm" reset (e.g. after programming), ESMSR3 is set to 0x00000008 even before the first instruction executes, as described above.

    If I unplug the LAUNCHXL board, plug it back in, and then attach the debugger, I find the MCU executing ramErrorReal in dabort.s:

    ramErrorReal:
            b       ramErrorReal        @ branch here forever as continuing operation is not recommended

    At this point if I check ESMSR3, it is still reading as 0x00000008.

    To determine how the chip gets into ramErrorReal, I disabled the if ((esmREG->SR1[2]) != 0U) check in sys_startup.c and set a breakpoint at the beginning of _c_int00. By stepping through the code, I determined that _dabort is being triggered from the second-to-last line of checkRAMECC in sys_startup.c:

        /* Compute correct ECC */
        tcramA1bit = tcramA1_bk;
        tcramB1bit = tcramB1_bk;
        tcramA2bit = tcramA2_bk; /* <== This line */
        tcramB2bit = tcramB2_bk;
    }

    Keep in mind that this project consists solely of code generated by HALCoGen, and I have not removed any code generated by HALCoGen nor added any code of my own.

    Thanks for your help,
    -sam

    Sam Birch
    Innovative Electronics, LLC

  • Thanks Sam

    Reading the corrupted data (the statements below, in checkRAMECC) sets ESM 3.3 and generates a data abort.

    /* Read the corrupted data to generate double bit error */
    ramread = tcramA2bit;
    ramread = tcramB2bit;

    The _dabort exception handler then clears ESM 3.3, which was set intentionally.

    Writing the backup data to tcramA2bit should not generate a data abort or a 2-bit ECC error.

    If checkRAMECC() is not called in sys_startup.c, does the project generate the same error (ESM 3.3 is set)?

  • If I comment out checkRAMECC() in sys_startup.c, execution sometimes proceeds to main() without ESM 3.3 being set. This seems to only occur on a "cold" reset, when I plug the LAUNCHXL board into my PC and connect the debugger without resetting the MCU. However, after a "warm" reset (when I program the board and enter a debug session or when I use the System Reset button in the debugger), execution still gets stuck in this for(;;) loop:

        if ((esmREG->SR1[2]) != 0U)
        {
    /* USER CODE BEGIN (24) */
    /* USER CODE END */
        /*SAFETYMCUSW 5 C MR:NA <APPROVED> "for(;;) can be removed by adding "# if 0" and "# endif" in the user codes above and below" */
        /*SAFETYMCUSW 26 S MR:NA <APPROVED> "for(;;) can be removed by adding "# if 0" and "# endif" in the user codes above and below" */
        /*SAFETYMCUSW 28 D MR:NA <APPROVED> "for(;;) can be removed by adding "# if 0" and "# endif" in the user codes above and below" */
            for(;;)
            {
            }/* Wait */                 
    /* USER CODE BEGIN (25) */
    /* USER CODE END */
        }

    and ESM 3.3 is set.

    The only way I can reliably get to main is by disabling both checkRAMECC() and the code block quoted above.
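
The SAFETYMCUSW comments in the quoted block describe how the for(;;) loop can be disabled from within the USER CODE regions, so the change survives regeneration by HALCoGen. The following is a minimal host-side sketch of that placement, intended for debugging only. The variable `fake_esm_sr3` and the function `esm_group3_check` are hypothetical stand-ins so the example compiles off-target; on the real part the check reads esmREG->SR1[2] inside _c_int00.

```c
#include <stdint.h>

/* Hypothetical stand-in for esmREG->SR1[2], preset to the value seen in this thread. */
static volatile uint32_t fake_esm_sr3 = 0x00000008U;

/* Hypothetical wrapper around the startup check. Returns the status it observed. */
uint32_t esm_group3_check(void)
{
    uint32_t status = fake_esm_sr3;

    if (status != 0U)
    {
/* USER CODE BEGIN (24) */
#if 0   /* debug only: compiles out the wait loop below */
/* USER CODE END */
        for(;;)
        {
        }/* Wait */
/* USER CODE BEGIN (25) */
#endif
/* USER CODE END */
    }
    return status;
}
```

With the preprocessor lines in place the function returns instead of hanging, but the error flag itself is untouched, so this only hides the symptom and is not appropriate for a safety-critical build.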

    Is it normal for ESM 3.3 to be set after a warm reset? This would make debugging fairly annoying, as I'd have to power-cycle the board each time.

    Thanks,
    -sam

    Sam Birch
    Innovative Electronics, LLC

  • Hi Sam,

    "The only way I can reliably get to main is by disabling both checkRAMECC() and the code block quoted above."

    In checkRAMECC(), a 2-bit ECC fault is injected into SRAM, so ESM 3.3 being set is expected. However, this error flag should be cleared in the _dabort handler. If ESM 3.3 is not cleared, you will see this issue after a warm reset.

    Does your project have this file? 

    dabort.asm

  • The sys_intvecs.asm:

    If a data abort occurs, execution should jump to the data abort handler, _dabort, which is defined in dabort.asm.

  • My project has dabort.s since I'm using the GNU toolchain in CCS. Could that be causing problems?

    Anyway, I do see _dabort defined in dabort.s and I see how it checks to see whether the ECC fault is real or synthesized. Since the fault occurs after writes to ECC RAM have been disabled again, it treats the fault as real and jumps to ramErrorReal.
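
The decision described above can be modeled off-target as a single test on the RAM control register value: if ECC memory writes are still enabled, the fault is attributed to the self-test, otherwise it is treated as real. The following is a rough sketch of that logic only, not a transcription of dabort.s. The names `classify_ram_fault` and `ECC_WRITE_ENABLE_MASK` are hypothetical, and the mask value is an assumption that should be checked against the generated dabort.s and the device reference manual.

```c
#include <stdint.h>

/* ASSUMPTION: placeholder mask for the "ECC memory write enable" bit in the
 * RAM control register. Verify the real bit position before relying on it. */
#define ECC_WRITE_ENABLE_MASK  0x00000100U

typedef enum
{
    RAM_FAULT_SELF_TEST,  /* fault raised while ECC writes were enabled (injected) */
    RAM_FAULT_REAL        /* fault raised with ECC writes disabled */
} ram_fault_kind;

/* Hypothetical model of the handler's decision, given a snapshot of the
 * RAM control register taken at the time of the abort. */
ram_fault_kind classify_ram_fault(uint32_t ramctrl_value)
{
    if ((ramctrl_value & ECC_WRITE_ENABLE_MASK) != 0U)
    {
        return RAM_FAULT_SELF_TEST;
    }
    return RAM_FAULT_REAL;
}
```

Under this model, an abort on the write to tcramA2bit, which happens after ECC writes have been disabled again, falls into the RAM_FAULT_REAL branch, matching the jump to ramErrorReal observed here.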

    Thanks for sticking with this. I can send you my entire project if that would be helpful. Let me know.
    -sam

    Sam Birch
    Innovative Electronics, LLC

  • I made a new project using the TI toolchain (v20.2.6.LTS) and the behavior is the same. Still getting stuck in the loop on line 243 of sys_startup.c because ESMSR3.3 is set.

    I'm not doing anything strange here; I followed the instructions in SPNA121B and then hit F11. Would you expect that to work?

    Cheers,
    -sam

    Sam Birch
    Innovative Electronics, LLC

  • Here is my test project that exhibits this issue:

    https://f002.backblazeb2.com/file/iellc-file-share/74ee3393/ff5e8331f95d4ebebb85d970fdf3bde1

    And here's what the debugger shows after I hit F11 and then pause execution:

    Screenshot of CCS debugger with execution paused on line 245 of sys_startup.c and Esm.Stat3 value 0x00000008

  • I built that test project using CCS 11.1 under Windows 10 and was able to debug it on a LAUNCHXL-RM42 without seeing any ECC errors.

    On starting, the debug session pauses at the start of _c_int00(), and I can reach main() successfully by either:

    a. Using Step Over (F6) until execution reaches main().

    b. Setting a breakpoint on main() and using Resume (F8), which then hits the breakpoint.

    Therefore, I can't see any obvious error in the test project.

    Do you only have one LAUNCHXL-RM42? If so, there might be a fault on that launchpad.

  • Chester,

    Thanks for checking. I'm also using CCS 11.1 under Windows 10. I only have one LAUNCHXL-RM42, but it looks like I'll be picking up another one to rule out a hardware issue.

    Cheers,
    -sam

    Sam Birch
    Innovative Electronics, LLC

  • Thanks for checking, Chester. I am not able to download the project because of the TI firewall.

  • Hi all,

    I wanted to follow up and say that this was indeed a problem with that board. I ordered a second LAUNCHXL-RM42 and it works flawlessly every time. So, I guess it goes to show that sometimes it really is the hardware.

    Thanks again for all your help,
    -sam

    Sam Birch
    Innovative Electronics, LLC