MCU-PLUS-SDK-AM243X: My M4 core doesn't like the CCS build process (Still)

Part Number: MCU-PLUS-SDK-AM243X
Other Parts Discussed in Thread: SYSCONFIG

Hi,

This is a follow-up to my previous thread, linked below, which was marked solved even though I was told I'd get a response and never did.
https://e2e.ti.com/support/microcontrollers/arm-based-microcontrollers-group/arm-based-microcontrollers/f/arm-based-microcontrollers-forum/1354881/mcu-plus-sdk-am243x-my-m4-core-doesn-t-like-the-ccs-build-process

Why does data verification not work with the M4 core? I want to enable it because while it allows me to execute an app on the core, it corrupts the shared memory across all cores.

My usual workaround is to include no-op statements like i; in different parts of the code until it runs, but eventually I reach a dead end where I cannot get the apps to run at all.

  • Hi Tron,

    Thanks for your query.

    Why does data verification not work with the M4 core? I want to enable it because while it allows me to execute an app on the core, it corrupts the shared memory across all cores.

    From the past discussion on another E2E ticket, it seems that the issue is random and does not occur on all EVMs.

    Can you please confirm how many EVMs you have tested, and whether the issue is seen on each EVM?

    Regards,

    Tushar

  • No, you've misunderstood.

    The issue is only seemingly random for a given build. The build will encounter the same problem when flashed to any of the ~10 EVMs I've tried. That part is consistent and in April you also confirmed this yourself when you replicated the problem from my binaries provided via DM.

    The only way I've found to bypass the issue is to:

    1. disable verification for the M4 core as you previously suggested as a temporary fix, and to

    2. add no-op junk code to create offsets in different locations of my code based on trial and error, to ensure those memory locations that are being corrupted by the M4 are not actively used. With my current code, even this is not working and I'd like to remove these from the code.

    I suspect the build doesn't pass verification because the CCS build process for the M4 core is unreliable. The code I use to establish structures in shared memory works for the R5 cores, but when built for and executed by the M4 core it only mostly works and generally leads to glitchy memory corruption.

  • Hi Tron,

    Thanks for the above clarification.

    1. disable verification for the M4 core as you previously suggested as a temporary fix, and to

    So, I assume it works every time when verification is disabled but doesn't work when verification is enabled.

    Can you please try with the latest CCS and let us know the result?

    To get the latest CCS v12.8.1, please visit CCSTUDIO 

    Regards,

    Tushar

  • If verification is disabled, the app will execute but the build for the M4 core has glitches and corrupts locations in memory.

    If verification is enabled, then I get the following error, which is why I suspect these two issues are related: the root issue is whatever is causing verification to fail and if we solve that, the memory corruption issue will be solved also.

    BLAZAR_Cortex_M4F_0: File Loader: Verification failed: Values at address 0x000101E0 do not match Please verify target memory and memory map.
    BLAZAR_Cortex_M4F_0: GEL: File: M4_Governor.out: a data verification error occurred, file load failed.
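
    For context, a load-time data verification step conceptually writes the image to target memory, reads it back, and reports the first address where the two differ. The sketch below is a host-side illustration of that comparison only. It is not how CCS implements it, and `first_mismatch`, `mismatch_address`, and the 0x00010000 base address are assumptions chosen so that the mismatch lands on the address in the log above.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Assumed base address of the loaded section (illustrative only). */
    #define ASSUMED_LOAD_BASE 0x00010000u

    /* Hypothetical helper: returns the offset of the first byte where the
     * read-back buffer differs from the expected image, or -1 if they match. */
    static long first_mismatch(const uint8_t *expected,
                               const uint8_t *readback,
                               size_t len)
    {
        assert(expected != NULL && readback != NULL);
        for (size_t i = 0; i < len; i++) {
            if (expected[i] != readback[i]) {
                return (long)i;
            }
        }
        return -1;
    }

    /* Converts a mismatch offset into an absolute target address. */
    static uint32_t mismatch_address(long offset)
    {
        return ASSUMED_LOAD_BASE + (uint32_t)offset;
    }
    ```

    If the read-back buffer differs from the image at offset 0x1E0, `first_mismatch` returns 0x1E0 and `mismatch_address` gives 0x000101E0. A mismatch of this kind can mean the target memory did not retain what was written, for example because the region is not writable, is not mapped as expected, or was modified between the write and the read-back.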

  • Also, we are all using the latest CCS.

  • Hello Tron,

    Thanks for the above details.

    Can you please provide sample code to replicate the issue at our end?

    Regards,

    Tushar

  • Hi Tushar,

    I'd need to send you the source code for our entire proprietary project. I can't create sample code because changing individual characters and lines of the codebase causes the M4 build to corrupt differently, or not at all.

    I've just upgraded to MCU+ v10 and SysConfig 1.20 which now allows me to turn on verification with the build error mentioned previously, but sadly the M4 corruption is still present (although this upgrade alone changes which locations in memory are corrupted by the M4 execution).

    Is it possible to have a video call with someone and share my screen as we work through it? It seems this is the root cause of a lot of our problems, from IPC issues, eQEP issues, SPI issues, shared memory corruption and the M4 core itself hard faulting - I suspect all of these issues are caused by the same memory corruption issue, just different locations of memory at different times.

  • As an example, I have a function that pre-fills default settings for the app. It's an array of structs, and each element has a string called 'name'. There are ~50 elements in the array.

    At the moment the first and 21st strings of the array are being corrupted, so I've purposefully started filling at i=1 so the first element should always be empty:

    When I reboot, the memory is all zeros:

    When I reach the 21st item (line 279), this code executes. It's a simple line. The same form as the 20 elements before and the same code that's worked fine for 2 years until today:

    But, instead of the chars shown being written to the 21st array element, strcpy suddenly writes junk chars to the 0th element instead:

    If I step into the function to see precisely when the glitch happens, I can immediately see that the dest being used is the location of the 0th array element, and the src is pointing to a region of memory where the code resides. The execution is not using the src and dest that were passed to it according to the code.

    Here's something like what I'd expect to see inside the strcpy function:

    It's not clear to me how this can happen unless there's an issue in the built output.

    This is just one example of a glitch we're experiencing. If we modify any of this code, no matter where, then a different glitch will appear elsewhere. This is code that we use across all R5 cores, also, and only the M4 core has this issue interpreting the code as we've defined it.
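
    The pattern described above (an array of structs with a `name` string, filled from index 1 so that element 0 stays empty) can be sketched as follows. This is a minimal illustration, not the project's code: `setting_t`, `NAME_LEN`, `setting_set_name`, and `setting_is_blank` are hypothetical names, and the sizes are assumptions. It does not address the root cause; it only shows how a deliberately empty element can act as a canary that is checked right after each write.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>

    #define SETTING_COUNT 50   /* assumed: the post mentions roughly 50 elements */
    #define NAME_LEN      32   /* assumed field size */

    typedef struct {
        char name[NAME_LEN];
        int  value;
    } setting_t;               /* hypothetical layout */

    /* Bounded copy into table[index].name; always NUL-terminates.
     * Returns 0 on success, -1 on a bad argument or out-of-range index. */
    static int setting_set_name(setting_t *table, size_t count,
                                size_t index, const char *name)
    {
        if (table == NULL || name == NULL || index >= count) {
            return -1;
        }
        snprintf(table[index].name, NAME_LEN, "%s", name);
        return 0;
    }

    /* Returns 1 if the element is still all zeros, 0 otherwise. */
    static int setting_is_blank(const setting_t *s)
    {
        static const setting_t zero = { {0}, 0 };
        return memcmp(s, &zero, sizeof zero) == 0;
    }
    ```

    Calling `setting_is_blank(&table[0])` immediately after each `setting_set_name` call would flag the first write that lands in element 0, which narrows down the exact statement where the wrong destination is used.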

  • I've noticed something interesting. If I power cycle the LP, then the corruption happens. I can see the disassembly looks like this:

    If I don't power cycle, but I halt and debug the app for the M4 core again, then it looks like this - and there's no corruption:

    Months ago I had a significant issue with IPC, where the IPC notify signal between cores would happen but the receiving core would not receive the message and would hard fault instead. I'd also noticed that if I ran debug a second time without power cycling then the issue would disappear, just like here.

    That's not a solution, though, as I need to modify sciclient each time the LP boots, so I need to power cycle every time. The way I 'solved' the IPC issue was to not use IPC for that core, and instead I created my own IPC implementation using shared memory just to get around this issue.

    Here are those threads for context:

    * https://e2e.ti.com/support/microcontrollers/arm-based-microcontrollers-group/arm-based-microcontrollers/f/arm-based-microcontrollers-forum/1319712/lp-am243-xtasknotifyfromisr-and-xtasknotifywait-causing-task-to-crash

    * https://e2e.ti.com/support/microcontrollers/arm-based-microcontrollers-group/arm-based-microcontrollers/f/arm-based-microcontrollers-forum/1346002/lp-am243-ipc-notify-interrupt-failing-on-r5f0_1

    I've tried to use TI Clang v3.2.2.LTS instead of v2.1.2 and while the disassembled instructions are slightly different, the outcome is the same: the first run will glitch, a second debug without power cycling will use the correct instructions and there will be no glitch.

    I've also tried disabling optimisations, but that doesn't help. There's still a difference between running debug after a power cycle and subsequent debug sessions without power cycling - the latter work but the former don't.
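
    For readers unfamiliar with the shared-memory workaround mentioned above, a single-slot mailbox is one common way to pass messages between cores without the IPC driver. The sketch below is an assumption-laden illustration and not the implementation used in this project: `mailbox_t`, `mailbox_post`, and `mailbox_poll` are hypothetical names, and it assumes the structure sits in memory that both cores see coherently (for example a non-cached region), which C11 atomics alone do not guarantee on a multi-core device.

    ```c
    #include <assert.h>
    #include <stdatomic.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define MAILBOX_PAYLOAD 32   /* assumed payload size in bytes */

    typedef struct {
        atomic_uint seq;                    /* bumped by the producer per message */
        uint8_t     payload[MAILBOX_PAYLOAD];
    } mailbox_t;

    /* Producer: write the payload, then publish it by incrementing seq.
     * The release ordering makes the payload visible before the new seq. */
    static void mailbox_post(mailbox_t *mb, const void *data, size_t len)
    {
        if (len > MAILBOX_PAYLOAD) {
            len = MAILBOX_PAYLOAD;
        }
        memcpy(mb->payload, data, len);
        atomic_fetch_add_explicit(&mb->seq, 1u, memory_order_release);
    }

    /* Consumer: returns 1 and copies the payload if seq changed since
     * *last_seen, otherwise returns 0 and leaves out untouched. */
    static int mailbox_poll(mailbox_t *mb, unsigned *last_seen,
                            void *out, size_t len)
    {
        unsigned now = atomic_load_explicit(&mb->seq, memory_order_acquire);
        if (now == *last_seen) {
            return 0;
        }
        if (len > MAILBOX_PAYLOAD) {
            len = MAILBOX_PAYLOAD;
        }
        memcpy(out, mb->payload, len);
        *last_seen = now;
        return 1;
    }
    ```

    This single-slot design has no acknowledgement, so a producer that posts again before the consumer has polled overwrites the previous message. That trade-off keeps the sketch short; a ring buffer or a consumer-side acknowledgement counter would be needed where messages must not be lost.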

  • I have a temporary fix. If I set the optimisation level to anything other than 0, then the optimisation seems to bypass the issue.

    However, it's clear that there's a problem with TI's debugger, particularly that the standard code is explicitly different and problematic when run the first time after a power cycle.

  • Hi Tron,

    Thanks for providing the above details. From your analysis it seems that the compiler optimization levels are corrupting the binary.

    Can you please try once with the latest compiler version?

    Please refer to ARM-CGT-CLANG

    I am routing your query to compiler team for comments.

    Regards,

    Tushar

  • Hi Tron,

    If I power cycle the LP, then the corruption happens.

    Does that mean the example works fine without a power cycle?

    Is the issue consistent, or does it occur only after a power cycle?

    Regards,

    Tushar

  • Hi Tushar. I tried with TI Clang 4.0.1.LTS and the same thing happens: after a power cycle the memory is corrupted, only if I debug a second time without cycling the power does it work fine.

    That is, unless I set the compiler optimisation to > 0, then it will work fine after power cycling.

  • Hi Tron,

    Thanks for providing the above information. Are you seeing a different error message (i.e. other than the data verification error)?

    As I remember, you were previously able to see the data verification error irrespective of a board power cycle. Is that true in this case as well, or are we debugging a different issue?

    Regards,

    Tushar