AM2434: Problems after converting to MCELF

Part Number: AM2434

Hi,

I have been working to migrate to MCELF format as part of migrating to SDK V12. We have four discrete programs, each running on its own core - this has been stable from SDK V8 through V11.

However, I am now having problems with the target crashing (CPU Exception) when loading from flash. I do not see this at all when running from the debugger (nor from booting the same binary in RPRC format with the V11 SBL).

I started by debugging the target using the same method as debugging SBL (wait loop) then stepping through. This all works fine.

When stepping through the code, I saw memory alignment exceptions, which is why the system doesn't boot. However, these turned out to be a complete red herring that cost me a lot of time:

I eventually noticed that the disassembly (in a few areas) was quite different between booting from flash and loading from the debug probe. The disassembled instructions in the area around the crash were random - it was clear that the program memory was corrupt, which would obviously cause the crash.

To prove the memory corruption, I downloaded the raw program memory of core 0 (0x70080000, length 0x70000 - as per the linker.cmd) twice: once after loading with the debug probe and once after booting from flash. Comparing the two dumps confirmed differences in the areas that would crash.
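For anyone wanting to repeat this comparison, here is a minimal Python sketch that reports the address ranges where two raw dumps differ. It is not the tool used above: the function name `diff_ranges` and the synthetic data are made up for illustration, and only the base address 0x70080000 comes from the linker.cmd mentioned in the post.

```python
def diff_ranges(a: bytes, b: bytes, base: int = 0x70080000):
    """Return (start, end) address pairs, end exclusive, where a and b differ.

    `base` is the target address of byte 0 of both dumps (assumed here to be
    the core 0 program memory origin quoted in the post).
    """
    ranges = []
    start = None
    n = min(len(a), len(b))  # compare only the common length
    for i in range(n):
        if a[i] != b[i]:
            if start is None:
                start = i  # a new differing run begins
        elif start is not None:
            ranges.append((base + start, base + i))  # run ended at i
            start = None
    if start is not None:
        ranges.append((base + start, base + n))  # run reaches end of dump
    return ranges


# Synthetic stand-ins for the two dumps (real dumps would be read from files).
debug_dump = bytes(16)
flash_dump = bytearray(debug_dump)
flash_dump[4:8] = b"\xde\xad\xbe\xef"  # simulate a corrupted region

for lo, hi in diff_ranges(debug_dump, bytes(flash_dump)):
    print(f"differs: 0x{lo:08X} - 0x{hi - 1:08X} ({hi - lo} bytes)")
```

With real dumps, the two `bytes` objects would come from `open(path, "rb").read()` on the files you exported. The reported ranges can then be compared against the memory regions used by other components.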

I have double-double-double checked all of the linker.cmd files - all program and data memory are properly partitioned. Besides, I never had this problem with the pre-MCELF SDKs (V8 - V11).

So, I am at a bit of a loss of what to do next - except to revert to SDK V11 for now.

Any suggestions?

Thanks, Steve

  • Hi Steve,

    I would like to know which command/process you used to convert your application binaries to MCELF format, could you please share the same?

    Best Regards,

    Meet.

  • Hi Meet,

    I use an mcelf image gen script taken from an example project in SDK V12. Build output snippet:

    python C:/ti/mcu_plus_sdk_am243x_12_00_00_26/tools/boot/multicore-elf/genimage_am64x.py --core-img=4:../DAQ_BECKHOFF_R5F0-0/Release_CLB/DAQ_BECKHOFF_R5F0-0.out --core-img=5:../ADC_DSP_R5F0-1_nortos/Release_CLB/ADC_DSP_R5F0-1_nortos.out --core-img=6:../DAQ_HAL_R5F1-0/Release_CLB/DAQ_HAL_R5F1-0.out --core-img=7:../DAQ_DIAGS_R5F1-1/Release_CLB/DAQ_DIAGS_R5F1-1.out --output=Release_CLB\\DAQ_SYSTEM.mcelf --xip=0x60000000:0x68000000

    python C:/ti/mcu_plus_sdk_am243x_12_00_00_26/source/security/security_common/tools/boot/signing/appimage_x509_cert_gen.py --bin Release_CLB\\DAQ_SYSTEM.mcelf --authtype 1 --key C:/ti/mcu_plus_sdk_am243x_12_00_00_26/source/security/security_common/tools/boot/signing/app_degenerateKey.pem --output Release_CLB\\DAQ_SYSTEM.mcelf.hs_fs

    Steve

    PS - your web UI tools are broken - I would have presented this as code.

  • Progress: I debugged the SBL whilst watching the program location for my app on R5F0-0 (0x70080000-0x700EFFFF), where the corruption occurs (at 0x70082EA0).

    Stepping through the SBL, I spotted the memory was changing, even after the call to Bootloader_parseAndLoadMultiCoreELF.

    Investigating further, I saw that the SBL has set MSRAM_1 to 0x70070000 with size 0x20000 (in the V11 SBL the size was 0x10000), so it now extends to 0x7008FFFF.

    Clearly, this is wrong, as 0x70080000 is the nominal starting location for R5F0-0. This puts the stack of the SBL smack in the middle of the image I have just loaded.

    I am now investigating how I can get the SBL to fit into allowed memory.

    Steve
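The clash described in the post above can be reproduced with a few lines of address arithmetic. This is only an illustrative Python sketch: the helper `overlap` is hypothetical, and the region values are the ones quoted in this thread (SBL MSRAM_1 at 0x70070000 with size 0x20000, R5F0-0 application at 0x70080000 with length 0x70000).

```python
def overlap(a_start, a_len, b_start, b_len):
    """Return the (start, end) of the shared range, end exclusive, or None."""
    lo = max(a_start, b_start)
    hi = min(a_start + a_len, b_start + b_len)
    return (lo, hi) if lo < hi else None


SBL_MSRAM_1 = (0x70070000, 0x20000)  # origin, size as seen in the V12 SBL
APP_R5F0_0 = (0x70080000, 0x70000)   # origin, length from the app linker.cmd

clash = overlap(*SBL_MSRAM_1, *APP_R5F0_0)
if clash:
    lo, hi = clash
    print(f"SBL MSRAM_1 overlaps the app image: 0x{lo:08X} - 0x{hi - 1:08X}")
    # The corrupted address reported in the post lies inside the shared range.
    print("0x70082EA0 inside overlap:", lo <= 0x70082EA0 < hi)
```

The same check can be run over every pair of SBL and application regions before flashing. A non-empty result means the SBL can overwrite part of an image it has already loaded.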

  • The solution, in the end, was quite simple. I modified the SBL memory allocation thus:

    MSRAM_0: 0x70000100 size 0x5FF00

    MSRAM_1: 0x70060000 size 0x20000

    Now my image boots and runs properly.

    However, I feel that this is a serious bug in mcu_plus_sdk_am243x_12_00_00_26. It has cost me many, many hours to find the true root cause - and I suspect others will be caught out too.

    So, please update the SDK and notify users of the issue ASAP.

    Thank you, Steve
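As a quick sanity check of the revised SBL layout above, the end address of each region is its origin plus its size. The Python sketch below uses only the numbers quoted in this thread; the names `sbl_regions` and `APP_START` are made up for illustration.

```python
APP_START = 0x70080000  # nominal start of the R5F0-0 application image

# Revised SBL regions from the post: name -> (origin, size)
sbl_regions = {
    "MSRAM_0": (0x70000100, 0x5FF00),
    "MSRAM_1": (0x70060000, 0x20000),
}

# End address (exclusive) of each region is origin + size.
ends = {name: origin + size for name, (origin, size) in sbl_regions.items()}

for name, (origin, size) in sbl_regions.items():
    print(f"{name}: 0x{origin:08X} - 0x{ends[name] - 1:08X}")

# MSRAM_0 should stop where MSRAM_1 begins, and no SBL region should
# reach into the application image.
contiguous = ends["MSRAM_0"] == sbl_regions["MSRAM_1"][0]
below_app = all(end <= APP_START for end in ends.values())
print("contiguous:", contiguous, "| below app start:", below_app)
```

With these values MSRAM_0 ends exactly where MSRAM_1 starts, and MSRAM_1 ends exactly at 0x70080000, so the SBL no longer reaches into the application image.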

  • Hi Steve,

    Thanks for putting in all this effort to test this and for pointing it out. This does indeed seem to be a mistake. I will check internally on getting it fixed in the next release.

    Best Regards,

    Meet.