MCU-PLUS-SDK-AM243X: Boot from Flash not working with different load and run-addresses

Felix Heil

Hello,

our setup is a bit complex.

In short first:

We use the LTS 2.1.2-compiler and SDK 08.04.00.15 with some small modifications. E.g. we still use the older flash-driver since the new one does not work with the flash we use. We have some code-parts with different load- and run-addresses. The code is copied in via copy_in, and the table-operator is used in the linker-script. This works fine when loaded via CCS but does not work when booting from flash.

In long:

We use code-parts of two relocatable output-modules and a static-library in a UNION in the linker-script with two separate load-addresses and one common run-address:

This is what the MCU1_0_PSRAM_CODE-address looks like when loaded via CCS:

this looks fine!

The PSRAM is connected via the GPMC.

But as soon as we let this one boot from flash (and it's ensured the psram/gpmc is already correctly initialized and set up) it looks like this:

And thus we get an abort when the code will be executed.

Pls note this is the load-address and not the run-address. Since the load-address already has garbage in it this garbage is copied at the run-address -> abort happens.
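To illustrate why garbage at the load-address ends up at the run-address, here is a minimal host-side sketch of what a copy-table walk does. It assumes a record of load address, run address and size; the names `copy_record_t`, `copy_table_t` and `copy_in_sketch` are hypothetical, pointers stand in for the 32-bit addresses the linker emits, and this is not the TI runtime's `copy_in` itself:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record: where the bytes were loaded, where they must run. */
typedef struct {
    const uint8_t *load_addr; /* filled by the loader (CCS or bootloader) */
    uint8_t       *run_addr;  /* where the code is executed from */
    size_t         size;      /* number of bytes to copy */
} copy_record_t;

/* Hypothetical table: a list of records generated at link time. */
typedef struct {
    size_t               num_recs;
    const copy_record_t *recs;
} copy_table_t;

/* Copies every record from its load address to its run address.
 * It cannot tell whether the load region holds valid code: whatever
 * is there, including garbage, ends up at the run address. */
static void copy_in_sketch(const copy_table_t *tbl)
{
    for (size_t i = 0; i < tbl->num_recs; i++) {
        memcpy(tbl->recs[i].run_addr, tbl->recs[i].load_addr, tbl->recs[i].size);
    }
}
```

The copy step only moves bytes, so whether the result works depends entirely on whoever filled the load region beforehand.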

In addition, the copy-in looks like this:

the map-file looks like this:

the mentioned code is located at the bottom with the 500785d0 load origin.

It is also important to mention that this does not happen once the image was loaded via CCS and a warm reset is done! We can do as many warm resets as we want (via the Sitara itself, not just a CPU Reset). This only happens when power is disconnected and the device is powered up again. But that is the usual use case, so we need to solve this problem.

How does this happen? Is additional information needed for the rprc-image-generation? Do we need some special handling inside the bootloader for this?

I looked up the documentation and it does only mention a run-address for the sections: software-dl.ti.com/.../TOOLS_BOOT.html

Also I checked the generated rprc/appimage-file and found the part where the rprc-header is, but it only has the run-address and the size inside, no load-address anywhere:

The source-code inside the sdk also just mentions a run-address:

If this feature is missing, we need it to be implemented!

Best regards

Felix

over 2 years ago

0 Aakash Kedia over 2 years ago

TI__Mastermind 26145 points

Hi Felix Heil,

I have not seen a scenario where the loading happens from one volatile memory to another volatile memory. But even if this use case is valid, it is a design limitation of RPRC that it does not handle this. I will capture this as a design issue internally.

Thanks for suggesting this. We might come up with a valid fix for the same.

Best Regards,
Aakash

0 Felix Heil over 2 years ago in reply to Aakash Kedia

Expert 1175 points

Hey Aakash,

yes this might seem a bit exotic. But the use-case is like it is described in the compiler-manual: https://software-dl.ti.com/codegen/docs/tiarmclang/compiler_tools_user_guide/compiler_manual/linker_description/05_linker_command_files/placing-a-section-at-different-load-and-run-addresses-stdz0756565.html and: https://software-dl.ti.com/codegen/docs/tiarmclang/compiler_tools_user_guide/compiler_manual/program_loading_and_running/run-time-relocation-stdz0694629.html#stdz0694629

"At times you may want to load code into one area of memory and move it to another area before running it. For example, you may have performance-critical code in an external-memory-based system. The code must be loaded into external memory, but it would run faster in internal memory. Because internal memory is limited, you might swap in different speed-critical functions at different times."

So we thought that the SDK supports this.

We have thought about a lot of options but that was the only suitable one.

Since we need to provide webserver-functionality, industrial ethernet stacks, a lot of other protocols, a display-driver, multiple SPI-connections and the corresponding applications, we need "a lot" of RAM: at least 4-5 MB, and the internal SRAM is not enough. We cannot use DDR-RAM because of the costs and the design-limitations. It would also be a bit crazy to add at least 256 MB just to extend the internal SRAM by the 4-5 MB we need.

So we agreed on using a pSRAM via GPMC. We did some tests, but in the end, even when cached, it is not fast enough for the industrial ethernet protocols and stacks. The pSRAM sometimes stalls the CPU on access. So we put all non-time-critical code into pSRAM and all time-critical code into internal SRAM and the TCMs.

Now the internal SRAM is only enough for one industrial ethernet protocol. But since we need to support an application where the device should be able to speak multiple different protocols, depending on its configuration, both should be in internal SRAM. Since there is not enough space for both of them, we wanted to first load them into pSRAM and then, based on the configuration, copy either the one or the other into the internal SRAM (using the UNION-statement in the linker-script).

We also thought of keeping them in flash and loading them from there, but this would likely not work with the linker-scripts and copy-tables. We implemented a fw-update-architecture with Slot A/B (and a fallback to prevent any bricking), so the position of the stacks inside the flash would change all the time, even depending on the size of the updated image.

Another idea was to load a stack on another core, which would be technically doable without any problems. But the stack-vendors provide no dedicated IPC-interface, so we would need to take care of that ourselves, and so on. So we thought this would be the fastest solution to achieve what we need.
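The "either the one or the other" selection can be sketched on a host as two load images sharing one run region. This is only an illustration: `protocol_t`, `overlay_t`, `run_region` and `overlay_select` are hypothetical names, and on the target the addresses would come from the linker-generated copy tables rather than from C arrays:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RUN_REGION_SIZE 64u /* hypothetical size of the shared run region */

typedef enum { PROTOCOL_A = 0, PROTOCOL_B = 1, PROTOCOL_COUNT } protocol_t;

/* One overlay: where its bytes are stored and how many there are. */
typedef struct {
    const uint8_t *load_addr;
    size_t         size;
} overlay_t;

/* Stands in for the common run-address of the UNION. */
static uint8_t run_region[RUN_REGION_SIZE];

/* Copies the selected overlay into the shared run region.
 * Returns 0 on success, -1 if the selection is invalid or does not fit. */
static int overlay_select(const overlay_t overlays[PROTOCOL_COUNT], protocol_t which)
{
    if ((int)which < 0 || (int)which >= (int)PROTOCOL_COUNT) {
        return -1;
    }
    if (overlays[which].size > RUN_REGION_SIZE) {
        return -1;
    }
    memcpy(run_region, overlays[which].load_addr, overlays[which].size);
    return 0;
}
```

Only one protocol occupies the run region at a time; selecting the other one overwrites it.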

I hope that makes the use-case a bit clearer.

Best regards

Felix

0 Aakash Kedia over 2 years ago in reply to Felix Heil

TI__Mastermind 26145 points

Hi Felix Heil,

I understand and acknowledge the use case. Currently the SDK does not support this use case. We can take this input for improvement for our future releases. Unfortunately, it might not come in immediate releases as a fix. There is nothing much that can be done on this front.

Apologies for inconvenience.

Best Regards,
Aakash

0 Felix Heil over 2 years ago in reply to Aakash Kedia

Expert 1175 points

Hey Aakash,

ok yes I already talked with Ming about this topic.

Another possibility came to my mind... Regardless of our Slot A/B-architecture, we also pack parts into different "sub-images". The fw is capable of getting the address of those sub-images and reading the data from flash.

I thought now that maybe we can go this way: put both stacks into separate "sub-images". In the fw, depending on the configuration we then get access to one of those stacks, so, that means getting the address in flash.

But here there may come in the tricky part. I think of two options:

Also we need to use the different load- and run-address-technique in a special manner.

The stacks are compiled as relocatable output-modules. That means they are only partially linked. I had a session with George on this topic, which led to a solution for another problem. There we also noticed that we need to link them with the -nostdlib option. This means the relocatable output-modules need the libc-stuff and some other functions to be resolved in the final linking.

But since this is only possible when the linker is directly aware of the modules, I am not sure it will work with the copy-table option. I saw there are other ways to load things addressed by the linker. So... maybe we can "pseudo"-link the relocatable output-modules without having them actually loaded at a load-address, and without including them in the output-file at all? So not directly using the copy-tables, but somehow telling the linker what to link where.

The stacks would still be in the flash and would then be loaded by us into the psram at the addresses the linker expects as load-addresses. We can still get this information from the pre-generated copy-tables. This means the load must happen before copy_in is called.

But I am not sure whether it is possible to just take the compiled and partially linked relocatable output-module and pack it directly into the flash, or whether the linker does some magic in the final linking and rearranges the content of this module, since data-sections and function-sections are enabled too. In that case I think we would get problems, since the addresses would not match anymore.

Hmm. Or can you think of another workaround?

Best regards,

Felix

0 Aakash Kedia over 2 years ago in reply to Felix Heil

TI__Mastermind 26145 points

Hi Felix,

My simple solution would be to move to an ELF parser in the SBL itself and remove the RPRC parsing. The loading the SBL does is then only to the load address present in the image. To reduce the size of the image you can also remove the symbols. I have experimental changes for this that were never productized, because supporting multi-core images with a single ELF was a problem we were unable to solve.

This will be an easy solution for you.
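As a rough illustration of the ELF approach (not the experimental SBL code mentioned above), the sketch below walks the program headers of a 32-bit ELF image and collects the PT_LOAD segments. It assumes a little-endian image on a little-endian host, and it assumes the linker records the load address in `p_paddr` and the run address in `p_vaddr`, which is the common convention but should be verified against the actual *.out file. The type and function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PT_LOAD_SKETCH 1u /* program header type PT_LOAD per the ELF spec */

/* ELF32 file header and program header, field order per the ELF spec. */
typedef struct {
    uint8_t  e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version, e_entry, e_phoff, e_shoff, e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx;
} elf32_ehdr_t;

typedef struct {
    uint32_t p_type, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_flags, p_align;
} elf32_phdr_t;

typedef struct {
    uint32_t load_addr;   /* p_paddr: where the loader writes the bytes */
    uint32_t run_addr;    /* p_vaddr: where the code expects to execute */
    uint32_t file_offset; /* offset of the bytes inside the image */
    uint32_t size;        /* p_filesz */
} load_segment_t;

/* Collects up to max_out non-empty PT_LOAD segments.
 * Returns the number found, or -1 if the image is malformed or too large. */
static int elf32_collect_load_segments(const uint8_t *img, size_t len,
                                       load_segment_t *out, size_t max_out)
{
    elf32_ehdr_t eh;
    size_t n = 0;

    if (len < sizeof eh) return -1;
    memcpy(&eh, img, sizeof eh);
    /* Magic bytes 0x7F 'E' 'L' 'F', then class byte 1 (ELFCLASS32). */
    if (memcmp(eh.e_ident, "\x7f" "ELF", 4) != 0 || eh.e_ident[4] != 1) return -1;
    if (eh.e_phentsize != sizeof(elf32_phdr_t)) return -1;

    for (uint16_t i = 0; i < eh.e_phnum; i++) {
        elf32_phdr_t ph;
        size_t off = (size_t)eh.e_phoff + (size_t)i * sizeof ph;

        if (off > len || len - off < sizeof ph) return -1;
        memcpy(&ph, img + off, sizeof ph);
        if (ph.p_type != PT_LOAD_SKETCH || ph.p_filesz == 0) continue;
        /* The segment's bytes must lie inside the image. */
        if (ph.p_offset > len || len - ph.p_offset < ph.p_filesz) return -1;
        if (n == max_out) return -1;
        out[n++] = (load_segment_t){ ph.p_paddr, ph.p_vaddr, ph.p_offset, ph.p_filesz };
    }
    return (int)n;
}
```

A loader built on this would copy `size` bytes from `img + file_offset` to `load_addr` and leave the later move to `run_addr` to the application's copy tables.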

Best Regards,
Aakash

0 Felix Heil over 2 years ago in reply to Aakash Kedia

Expert 1175 points

Hey Aakash, thanks for this idea.
Is it possible to get some support for this? We are not used to parsing elf-files. Is there any documentation that could help us on this topic?

Best regards,

Felix

0 Aakash Kedia over 2 years ago in reply to Felix Heil

TI__Mastermind 26145 points

Hi Felix Heil,

We have an experimental implementation of this and it worked for us. However, as it is not official, we might not be able to support you on it in depth.

Can you manage on your own from here? Instead of the *.appimage, just flash the *.out file to the device.

/cfs-file/__key/communityserver-discussions-components-files/908/bootloader.c

/cfs-file/__key/communityserver-discussions-components-files/908/bootloader.h

/cfs-file/__key/communityserver-discussions-components-files/908/bootloader_5F00_priv.h

Best Regards,
Aakash Kedia

0 Felix Heil over 2 years ago in reply to Aakash Kedia

Expert 1175 points

Hey Aakash,

thanks, I will have a look! Since we use our Bootloader for multiple products (so also with RPRC) and need to be backwards compatible I may modify the bootloader to have both possibilities. I guess I can just quickly check if "RPRC MEND" is written to the image or not and then decide if parsing an elf or the usual rprc-parsing. Does this make sense?
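One way such a check could look, as a minimal sketch: rather than searching for the RPRC markers, it tests for the ELF magic bytes (0x7F 'E' 'L' 'F', fixed by the ELF specification) and treats everything else as the existing format. `image_kind_t` and `image_detect_kind` are hypothetical names, and the legacy branch is assumed to validate its own header afterwards:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { IMAGE_TOO_SHORT, IMAGE_ELF, IMAGE_LEGACY } image_kind_t;

/* Decides which parser should handle the image, based on its first bytes. */
static image_kind_t image_detect_kind(const uint8_t *img, size_t len)
{
    static const uint8_t elf_magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (img == NULL || len < sizeof elf_magic) {
        return IMAGE_TOO_SHORT;
    }
    if (memcmp(img, elf_magic, sizeof elf_magic) == 0) {
        return IMAGE_ELF;
    }
    /* Anything else is handed to the existing RPRC/appimage parser,
     * which is assumed to do its own magic-number validation. */
    return IMAGE_LEGACY;
}
```

Checking the start of the image avoids scanning the whole file for a marker and keeps the existing path unchanged for older products.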

If this is at least working so far, is it possible to get a timeline by TI when the feature of different load- and run-addresses with the SDK will be implemented?

Best regards

Felix

+1 Aakash Kedia over 2 years ago in reply to Felix Heil

TI__Mastermind 26145 points

Hi Felix Heil,

Felix Heil said:
I guess I can just quickly check if "RPRC MEND" is written to the image or not and then decide if parsing an elf or the usual rprc-parsing. Does this make sense?

Maybe you can look into the RPRC source code and add an option to save the load address in one of the reserved fields of RPRC.
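A minimal sketch of that idea, under stated assumptions: the section header layout below is hypothetical (not copied from the SDK), one reserved word is repurposed as the load address, and existing images are assumed to write 0 into that word so that 0 can mean "load at the run address":

```c
#include <stdint.h>

/* Hypothetical section header: one reserved word is repurposed to
 * carry the load address. Not the SDK's definition. */
typedef struct {
    uint32_t run_addr;  /* address the section is linked to run at */
    uint32_t load_addr; /* repurposed reserved word; 0 = same as run_addr */
    uint32_t size;      /* section size in bytes */
    uint32_t rsvd[2];   /* remaining reserved words */
} section_header_sketch_t;

/* Returns the address the bootloader should copy the section to. */
static uint32_t section_dest_addr(const section_header_sketch_t *h)
{
    return (h->load_addr != 0u) ? h->load_addr : h->run_addr;
}
```

With this convention, older images keep loading to their run address, while images from a modified image generator can request a separate load address. Both the generator and the bootloader would have to be changed together.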

Felix Heil said:
If this is at least working so far, is it possible to get a timeline by TI when the feature of different load- and run-addresses with the SDK will be implemented?

We are not going to update or move forward with RPRC as this is not a standard option. We might switch to ELF parser mostly.

Best Regards,
Aakash

0 Felix Heil over 2 years ago in reply to Aakash Kedia

Expert 1175 points

Hey Aakash,

I guess this sounds good? So do you have a timeline when this will be implemented in the sdk?

0 Aakash Kedia over 2 years ago in reply to Felix Heil

TI__Mastermind 26145 points

Hi Felix,

It is planned for 09.01 (Dec'23) release.

Best Regards,
Aakash

0 Felix Heil over 2 years ago in reply to Aakash Kedia

Expert 1175 points

Hey Aakash,

this feature is mandatory for our next product. So we will need it sooner than December. We agreed with Ming that we need to escalate this topic.

Best regards

Felix

0 Aakash Kedia over 2 years ago in reply to Felix Heil

TI__Mastermind 26145 points

Hi Felix,

The follow-up of this thread is happening on https://e2e.ti.com/support/microcontrollers/arm-based-microcontrollers-group/arm-based-microcontrollers/f/arm-based-microcontrollers-forum/1237937/mcu-plus-sdk-am243x-extracting-sections-with-tiarm-tools-for-manually-placing-them-in-ram

So, I am closing this thread.

Best Regards,
Aakash
