TMS570LC4357: L2RAMW - double bit ECC uncorrectable error when Compiling/Linking with GCC

Part Number: TMS570LC4357
Other Parts Discussed in Thread: HALCOGEN

Hello,

I'm seeing a Group 3 Channel 3 ESM error during initialization (address 0xFFFFF520 has a value of 0x00000008). I've noticed a few patterns:
* It only happens after a hard reset (turning the PSU off/on). Once I clear the flag, a debugger reset does not seem to trigger it again.
* It only happens when the binary was compiled/linked with arm-none-eabi-gcc. If I switch the HALCoGen layer to TI and compile with CCS, this does not happen.
* If I clear the flag at the beginning of _c_int00() with esmREG->SR1[2] = 0xFFFFFFFF; the MCU runs as expected with no observable issues.
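
For reference, here is how the status value above maps to a channel number. This is a minimal host-side sketch, not HALCoGen code: the helper names esm_lowest_flagged_channel() and esm_write_one_to_clear() are hypothetical, and it assumes that bit n of the status word corresponds to channel n and that the status register clears a bit when a 1 is written to it.

```c
#include <stdint.h>

/* Hypothetical helper: returns the lowest channel whose bit is set in an
 * ESM status word, or -1 if no bit is set.
 * Assumption: bit n of the status word corresponds to channel n. */
int esm_lowest_flagged_channel(uint32_t status)
{
    for (unsigned int ch = 0u; ch < 32u; ++ch) {
        if (((status >> ch) & 1u) != 0u) {
            return (int)ch;
        }
    }
    return -1;
}

/* Hypothetical host-side model of a write-1-to-clear status register:
 * every bit written as 1 is cleared, every bit written as 0 is kept.
 * Assumption: the real register behaves this way. */
uint32_t esm_write_one_to_clear(uint32_t status, uint32_t written)
{
    return status & ~written;
}
```

With the value I read back, esm_lowest_flagged_channel(0x00000008) gives 3, which matches Channel 3, and writing 0xFFFFFFFF in this model clears every flag in the group.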

I've dug through the disassembly and stepped through initialization instruction by instruction and I'm not seeing a functional difference between the TI and GCC generated instructions before the ESM Group3 self test. I haven't been able to identify a root cause.

Does the binary I'm flashing need to adhere to certain requirements before code starts to execute? Can you clarify what gcc flags I should be using? I'm wondering if this is a linking issue. 

If not, any debugging tips?

Thanks,
Brad

  • Hello Brad,

    The start-up code intentionally causes a double-bit RAM error in order to verify the error signaling and response, via the checkRAM. This causes a data abort, and the abort handler identifies whether the abort was caused intentionally or by a real double-bit error on a RAM access.

    If the error is not intentionally generated by the start-up code, can you check if there is any possibility of your application reading a variable without first initializing it? Do you perform an auto-initialization of the CPU RAM as part of your start-up sequence?

    Regards,
    Sunil
  • Hi Sunil,

    Thanks for the quick response. I haven't modified the startup code produced by HALCoGen, other than to instrument it to debug this issue.

    For the sake of debugging I've stripped the application code down to just blinking an LED inside of int main(void) in HALCoGen's HL_sys_main.c file. I've run all the C code through both the Clang Analyzer and the Coverity static analyzer and neither detected reading any uninitialized variables. I'm not using many variables to blink the LED, so this was easy to verify through manual code inspection and testing.

    Where would I find the intentional ECC error and check? _memInit() is being called, though my testing/instrumentation seems to suggest the error is happening before then.

    Best,
    Brad
  • Hi Sunil,

    I've identified the root cause, and have new questions. How I got here is a long story so I'll skip to the end.

    When I changed the HALCoGen layer's toolchain from TI to GCC, HALCoGen changed the function declaration of getResetSource() as follows:

    -resetSource_t getResetSource(void);
    +resetSource_t getResetSource(void) __attribute__((naked));

    As a result, GCC did not emit a return for the function's return statement. When the Program Counter reached the end of the function, there was no BX LR instruction to return to _c_int00(). The Program Counter simply continued into the next function in HL_system.c, which is systemGetConfigValue(). That function is never called anywhere in the code, so it should be functionally dead code. Once the Program Counter reached the end of that function, it finally returned to _c_int00(). In hindsight, the ECC uncorrectable error seems appropriate.

    Can you clarify why this attribute was added for GCC? Does it address a concern that could be addressed some other way? The only other place the naked attribute was added during the toolchain conversion is _c_int00().

    Thanks,
    Brad

  • Hi Brad,

    I will have to check with the software team on why this attribute was specified for the getResetSource function only when choosing the GCC toolset.

    Regards,
    Sunil
  • Thanks Sunil, I appreciate it.

    In case I wasn't clear in my last post: when I delete that attribute, the function returns as expected and the ECC error no longer occurs.

    I did some research and found this:

    "This attribute allows the compiler to construct the requisite function declaration, while allowing the body of the function to be assembly code. The specified function will not have prologue/epilogue sequences generated by the compiler. Only basic asm statements can safely be included in naked functions (see Basic Asm). While using extended asm or a mixture of basic asm and C code may appear to work, they cannot be depended upon to work reliably and are not supported."

    The key phrase in that section is "will not have prologue/epilogue sequences." GCC appears to be behaving correctly given that attribute in the declaration. It's also worth noting that the function only contains C code, which the GCC documentation says is not safe when that attribute is used.
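
    To illustrate what the documentation describes, here is a minimal sketch of a naked function that is written the supported way: the body is a single basic asm statement and it supplies its own return. The function read_answer() is a hypothetical stand-in, not HALCoGen's getResetSource(). The naked variant is only compiled when targeting ARM state with GCC (an assumption made so that the same file also builds on a host machine, where a plain C fallback is used instead).

```c
#include <stdint.h>

#if defined(__GNUC__) && defined(__arm__) && !defined(__thumb__)
/* Naked: the compiler generates no prologue or epilogue, so the body
 * contains only basic asm and must perform the return itself. */
__attribute__((naked)) uint32_t read_answer(void)
{
    __asm__("mov r0, #42\n\t"   /* the return value is passed back in r0 */
            "bx  lr");          /* explicit return; without it execution
                                   runs on into whatever code follows */
}
#else
/* Fallback for other targets: an ordinary C function, for which the
 * compiler generates the return sequence. */
uint32_t read_answer(void)
{
    return 42u;
}
#endif
```

    In both variants read_answer() hands 42 back to its caller. Dropping the bx lr line from the naked variant would reproduce the kind of fall-through I described above.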

    Best,
    Brad

  • Thanks Brad. I caught that as well. I looked up information about this attribute and found that a function declared with it needs its own manual context save and restore, along with an explicit return instruction.

    Stay tuned until I hear back from the software team about why this attribute was defined for the getResetSource() function only when the GCC tool-set is chosen.

    Regards,
    Sunil
  • Hi Brad,

    The software team has confirmed this to be a bug, and it is being addressed in the next update to HALCoGen. In the meantime, you will need to manually remove the "naked" attribute from the getResetSource() declaration in the HL_system.h file.

    Regards,
    Sunil
  • Hi Sunil,

    I appreciate the quick turnaround on this. The confirmation will help us move forward, and we look forward to seeing that HALCoGen update.

    Thanks,
    Brad