Hello,
we experienced a weird issue with firmware that uses different load and run addresses for sections which may share the same compiled objects. We use STS 3.1.0 and AM243X MCU PLUS SDK 9.
It is really hard to reproduce and took us some days to track down, but I will try to summarize the findings and explain the problem.
We use different load- and run-addresses for some sections in our code. I already posted a small summary:
https://e2e.ti.com/support/microcontrollers/arm-based-microcontrollers-group/arm-based-microcontrollers/f/arm-based-microcontrollers-forum/1283306/mcu-plus-sdk-am243x-debugging-firmware-with-different-load--and-run-addresses/4899136#4899136
So it works with UNION statements in the linker script and with the generated copy tables, which load the needed code at runtime to its run address in the internal SRAM. Everything is done according to the compiler manual.
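The load/run mechanism can be sketched on a host machine, without the TI toolchain. This is only a simulation under stated assumptions: `CopyRecord`, `copy_in_sim`, and the byte arrays are hypothetical stand-ins for a linker-generated copy table and the runtime copy routine, not the actual TI API, and the "images" are placeholder bytes rather than code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for one copy-table record: where an image is
// stored (load address) and where it has to execute (run address).
struct CopyRecord {
    const unsigned char* load_addr;
    unsigned char*       run_addr;
    std::size_t          size;
};

// One shared run region, as a UNION in the linker script would reserve.
static unsigned char g_run_region[4];

// Two images kept at different load addresses (placeholder bytes, not code).
static const unsigned char g_load_stack_a[4] = {'A', 'A', 'A', 'A'};
static const unsigned char g_load_stack_b[4] = {'B', 'B', 'B', 'B'};

static const CopyRecord kStackA = {g_load_stack_a, g_run_region, sizeof g_load_stack_a};
static const CopyRecord kStackB = {g_load_stack_b, g_run_region, sizeof g_load_stack_b};

// Simulated copy-in: move one image from its load address to the run address.
void copy_in_sim(const CopyRecord& rec) {
    std::memcpy(rec.run_addr, rec.load_addr, rec.size);
}

// Loading one stack overwrites the other; only one is resident at a time.
void demo_overlay_switch() {
    copy_in_sim(kStackA);
    assert(g_run_region[0] == 'A');
    copy_in_sim(kStackB);
    assert(g_run_region[0] == 'B');
}
```

The point of the sketch is that both images target the same run address, so anything that exists in only one image disappears as soon as the other one is copied in.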
But we noticed that we get prefetch/data aborts when running one of the two available variants. In our case we load different network protocol stacks based on the configuration. One of them runs just fine. But as soon as the other one is loaded and a function is later called at runtime (when connecting to a PLC), the application crashes.
We saw that the code started jumping into nowhere, to addresses where the symbols shown in CCS were not actually defined.
It turned out that the compiler/linker tries to optimize C++ template instantiations that are exactly the same but created multiple times. That is a nice idea, but it does not work with different sections that share the same run address.
In our case there was a small templated object that was not created twice (once for each of the two sections that share the run address) but only once, for only one of the two sections. This led to the behaviour that one section works fine when loaded while the other one crashes, because the crashing section tries to jump to an address that is only defined in the other section.
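Why only one copy exists can be illustrated in plain C++. In the sketch below, `clamp_to_limit` and the two `helper_used_by_stack_*` functions are hypothetical names; the assumption is that code in both overlay sections uses the same template with the same argument. The language treats that as a single function, so both callers end up with the same address. Which section that single copy is placed in is up to the toolchain and cannot be shown on a host.

```cpp
#include <cassert>

// Hypothetical small template, standing in for the small templated object.
template <typename T>
T clamp_to_limit(T value, T limit) {
    return value < limit ? value : limit;
}

using HelperFn = unsigned int (*)(unsigned int, unsigned int);

// Imagine these two functions live in two different overlay sections that
// share one run address. Both refer to the same instantiation.
HelperFn helper_used_by_stack_a() { return &clamp_to_limit<unsigned int>; }
HelperFn helper_used_by_stack_b() { return &clamp_to_limit<unsigned int>; }

// Same template, same argument: one function, one address.
bool stacks_share_one_helper() {
    return helper_used_by_stack_a() == helper_used_by_stack_b();
}

void check_sharing() {
    assert(stacks_share_one_helper());
}
```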
Not sure if it makes the understanding easier but I tried to summarize it in a picture:
My guess is that the linker throws out one of the two compiled copies and instead creates a jump to the other one (which is not present at run time, since its .text is not loaded).
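That guess can be modelled as a small host-side simulation. Everything here is hypothetical: the run region is an array of function pointers, `kImageA` and `kImageB` stand for the two overlay images, and the assumption is that the single shared helper was placed only in image A. On the real target the outcome is a prefetch/data abort; in this model the call simply lands in unrelated code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

using Fn = std::string (*)();

std::string shared_helper() { return "helper"; }            // kept only in image A
std::string b_unrelated()   { return "unrelated B code"; }  // occupies the same slot in B

constexpr std::size_t kEntrySlot  = 0;
constexpr std::size_t kHelperSlot = 1;

// Shared run region: both images are linked to run from these slots.
std::array<Fn, 2> g_run{};

// Both entry points were linked against "the helper at slot 1".
std::string a_entry() { return g_run[kHelperSlot](); }
std::string b_entry() { return g_run[kHelperSlot](); }

const std::array<Fn, 2> kImageA = {{a_entry, shared_helper}};
const std::array<Fn, 2> kImageB = {{b_entry, b_unrelated}};

// Simulated overlay load: the whole run region is replaced.
void load(const std::array<Fn, 2>& image) { g_run = image; }

std::string run_entry() { return g_run[kEntrySlot](); }

void demo_dangling_call() {
    load(kImageA);
    assert(run_entry() == "helper");            // helper is resident
    load(kImageB);
    assert(run_entry() == "unrelated B code");  // helper is gone, wrong code runs
}
```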
We could work around this issue by using an enum instead of a raw unsigned int. With that, it seems the compiler distinguishes the two and no longer "sees" two identical objects, so it compiles/links the code twice, exactly as we want.
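A minimal sketch of why the enum helps, assuming the unsigned int was used as a template type argument (our real code is not shown here; `Counter`, `StackAValue`, and `StackBValue` are hypothetical names). Two distinct enum types give two distinct instantiations, each with its own code and its own static data, even though both enums have unsigned int as their underlying type.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical template standing in for the small templated object.
template <typename T>
struct Counter {
    static int calls;
    static T touch(T value) { ++calls; return value; }
};
template <typename T>
int Counter<T>::calls = 0;

// One distinct type per protocol stack, both backed by unsigned int.
enum StackAValue : unsigned int {};
enum StackBValue : unsigned int {};

static_assert(std::is_same<std::underlying_type<StackAValue>::type, unsigned int>::value,
              "same representation as the raw unsigned int");
static_assert(!std::is_same<Counter<StackAValue>, Counter<StackBValue>>::value,
              "but two separate instantiations");

void check_separate_instantiations() {
    Counter<StackAValue>::touch(static_cast<StackAValue>(7u));
    assert(Counter<StackAValue>::calls == 1);
    assert(Counter<StackBValue>::calls == 0);  // B's copy was not touched
}
```

Whether the toolchain then places each instantiation in the matching overlay section still has to be checked in the map file; the sketch only shows that the instantiations are no longer identical from the language's point of view.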
Anyway, I am surprised this could even happen. I'm curious whether this could also happen for other identical objects or whether this is the only exception, because we have already had weird aborts whose origin we couldn't determine. Shouldn't the linker drop this optimization when it sees two sections in one union, and not remove symbols from one of those sections just because they are defined in the other loadable section of the same union?
It is not easy for us to create a minimal reproducible example right now, but I hope I could explain the issue we face.
Best regards
Felix
I have a rough understanding of what happened. But details matter, so I need a test case.
Right now, I cannot think of a better solution than ...
We could work around this issue by using an enum instead of a raw unsigned int
I'll discuss this with my team and get back to you.
Thanks and regards,
-George
Hey George,
OK, I'll try to put together a working code snippet that hopefully shows what I mean in a bit more detail.
Best regards
Felix
To give an answer in which I am confident, I need a test case which allows me to reproduce the problem. Lacking that, I can only make this guess. The part of the compiler which optimizes away the function instantiated by a template probably has no idea this function is part of a union, and the other function it calls instead is in a different union. If that is correct, then the enum-based workaround will probably continue to be needed.
Thanks and regards,
-George