Executing a function at its load and run address

Eugene K

I need to relocate some code to RAM for execution, but I also need to execute the same code at its load address. I believe this is possible with the ARM ISA and PC-relative addressing. I tried using the CODE_SECTION pragma together with separate load and run addresses in the linker.

Linker script

    .foo {} load > FLASH run > RAM
                        palign (4)
                        LOAD_START(FooLoadStart)
                        RUN_START(FooRun)
                        RUN_SIZE(FooSize)

In C code, the load address is called as

    extern void FooLoadStart(void);

    FooLoadStart();

and the run-address call is a direct call to Foo().

Original Foo code:

    #pragma CODE_SECTION (Foo, ".foo")

    void Foo(void) {}

The issue is that the trampoline generated for the load-address call (with version 4.6.4) was incorrect: it jumps to 0. The run-address trampoline is correct. What is missing?

Thanks,

Eugene

over 14 years ago

0 Jim Noxon over 14 years ago

TI__Genius 14940 points

You might take a look at the UNION operations in the linker, but I think they do the opposite of what you want.

If you know the code is really relocatable, you may be better off implementing your own trampoline as

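    #include <stdbool.h>

    // the actual foo function definition
    void foo_target( void ) {}

    // placeholder names for the two placements of the function,
    // assumed to be defined elsewhere as two separate functions
    extern void foo_target_rom_address( void );
    extern void foo_target_ram_address( void );

    // a pointer to a function, initialized to its load (rom) address
    void (*foo)(void) = foo_target_rom_address;

    // a function to select the run position
    // could also do the copy to ram if needed
    void set_foo_target( bool run_in_ram )
    { foo = run_in_ram ? foo_target_ram_address : foo_target_rom_address; }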

This way, you can selectively choose where to run the function from by the global pointer foo.

To call the function merely call foo().
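The pointer-based selection can be tried on a host before moving to the target. The following is a minimal sketch, not the target implementation: foo_in_rom, foo_in_ram, and call_foo are hypothetical names, and the two stand-in functions return different values only so that the selection can be observed.

```c
#include <stdbool.h>

/* Hypothetical stand-ins: on a real target these would be two separately
 * linked copies of the same routine, one placed in ROM and one in RAM.
 * Here they return distinct values so the selection is observable. */
static int foo_in_rom(void) { return 1; }
static int foo_in_ram(void) { return 2; }

/* Global pointer through which all callers reach the routine;
 * it starts out at the ROM copy. */
static int (*foo)(void) = foo_in_rom;

/* Select which copy later calls through foo will execute. */
void set_foo_target(bool run_in_ram)
{
    foo = run_in_ram ? foo_in_ram : foo_in_rom;
}

/* Convenience wrapper so callers need not know about the pointer. */
int call_foo(void)
{
    return foo();
}
```

Because every caller goes through the pointer, the tools only ever see ordinary functions with run addresses, and no trampoline to a load address is needed.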

Jim Noxon

0 George Mock over 14 years ago

TI__Guru**** 251120 points

The mechanisms in the linker for supporting different load and run addresses for the same underlying entity (i.e. LOAD_START, UNION, etc.) are not intended to support execution at the load address. They are intended only to support copying from the load address to the run address, and then executing at the run address. This is true regardless of whether the underlying HW is capable of executing at the load address. Therefore, I am unsurprised that Eugene has run into problems. If anything, I'm surprised at how close he came to getting it to work.
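The supported copy-then-execute pattern might look roughly like the sketch below. It is a host-runnable illustration under stated assumptions: foo_load_image, foo_run_region, and copy_to_run_address are hypothetical names, and plain arrays stand in for the flash image and the RAM region that the linker symbols would describe on a target.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the linker-defined symbols. On a target the
 * load image would live in flash and the run region in RAM; plain arrays
 * are used here so the sketch runs on a host. */
static const unsigned char foo_load_image[] = { 0xDE, 0xAD, 0xBE, 0xEF };
static unsigned char foo_run_region[16];

/* Copy a section from its load address to its run address.
 * Returns the number of bytes copied. */
size_t copy_to_run_address(void *run_start, size_t run_capacity,
                           const void *load_start, size_t size)
{
    assert(size <= run_capacity);   /* the run region must be large enough */
    memcpy(run_start, load_start, size);
    return size;
}
```

On the target, the arguments would come from the symbols behind LOAD_START, RUN_START, and RUN_SIZE in the linker script; how those symbols are declared in C depends on the toolchain. The copy has to finish before the first call to the function at its run address.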

The solution suggested by Jim, where the tools see two different functions that are invoked through a function pointer at runtime, will work with no problems. In that view of things, everything has a run address, and nothing has a different load address.

Thanks and regards,

-George

0 Eugene K over 14 years ago in reply to George Mock

Expert 1685 points

Jim, George,

Thank you for the answers and the suggestion on how to get the functionality I need. Allow me, however, to look at it from a different angle. Two questions:

1. How can I ensure that the compiler generates relocatable assembly (executable at both load and run addresses) from C source?

2. Why did the linker not create a working trampoline for execution out of ROM, as it did for execution out of RAM? Why does it matter to the linker where the jump goes? The information that FooLoadStart() is a function exists in the source, and its prototype is the same as that of Foo().

Eugene

0 George Mock over 14 years ago in reply to Eugene K

TI__Guru**** 251120 points

Eugene K said:
1. How can I ensure that the compiler generates relocatable assembly (executable at both load and run addresses) from C source?

This is not possible. The tools do not support executing code at the load address.

Eugene K said:
2. Why did the linker not create a working trampoline for execution out of ROM, as it did for execution out of RAM? Why does it matter to the linker where the jump goes? The information that FooLoadStart() is a function exists in the source, and its prototype is the same as that of Foo().

Because the ROM address is a load address, and not a run address.

Thanks and regards,

-George

0 Eugene K over 14 years ago in reply to George Mock

Expert 1685 points

Georgem said:
This is not possible. The tools do not support executing code at the load address.

The question is not about run/load addresses but about generating assembly that uses only PC-relative addressing. Can this be ensured through compiler options or some other means?

Georgem said:
Because the ROM address is a load address, and not a run address.

Why does it matter for the trampoline? Shouldn't it be just a jump to the target address with the proper ISA bit, offset, etc.? Please help me understand why a load-address trampoline cannot work or cannot be generated.

Regards,

Eugene

0 George Mock over 14 years ago in reply to Eugene K

TI__Guru**** 251120 points

Eugene,

Eugene K said:
The question is not about run/load addresses but about generating assembly that uses only PC-relative addressing. Can this be ensured through compiler options or some other means?

No.

Eugene K said:
Why does it matter for the trampoline? Shouldn't it be just a jump to the target address with the proper ISA bit, offset, etc.? Please help me understand why a load-address trampoline cannot work or cannot be generated.

Load addresses are designed and tested to support exactly one purpose: copying from the load address to the run address before the entity (code or data) is needed for system execution. A trampoline is designed and tested to branch to a run address. Using a trampoline for any other purpose is not supported. I don't know the exact details of why a trampoline cannot branch to a load address, but I fail to see how that matters.

Thanks and regards,

-George

0 Jim Noxon over 14 years ago in reply to Eugene K

TI__Genius 14940 points

Eugene K said:
The question is not about run/load addresses but about generating assembly that uses only PC-relative addressing. Can this be ensured through compiler options or some other means?

And the answer is... It Depends.

There are some prerequisites for this to happen.

1. The target platform must support PC-relative addressing
2. The compiler must support PC-relative addressing
3. The optimizer must support PC-relative addressing
4. The code you write must fit within the constraints of 1 and 2

The ARM certainly supports PC-relative addressing, and quite well too. In ARM mode, B and BL encode a signed 24-bit word offset, which reaches roughly ±32 MB from the current PC; in Thumb mode, the 16-bit unconditional branch reaches about ±2 KB and BL about ±4 MB. The CCS compiler can actually create PC-relative code to reach farther, but that's another topic entirely. Other platforms are not so accommodating. Some do not support PC-relative addressing at all (at least not directly via any intrinsic addressing mode). Some use different memory spaces for code and data, so you will need to make sure you can execute your code from RAM; the ARM has a single memory space, so this is not a problem.
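How far a PC-relative branch can reach follows from the width of its offset field. The sketch below, with hypothetical helper names, works this out for a signed field counted in fixed-size units; a 24-bit field with 4-byte units corresponds to the ARM-mode B/BL encoding. It ignores the pipeline offset the processor adds to the PC, so treat it as a rough check only.

```c
#include <stdbool.h>

/* Smallest (most negative) byte offset encodable in a signed field of
 * field_bits bits whose unit is unit_bytes bytes. */
long long min_offset(int field_bits, int unit_bytes)
{
    return -(1LL << (field_bits - 1)) * unit_bytes;
}

/* Largest byte offset encodable in the same field. */
long long max_offset(int field_bits, int unit_bytes)
{
    return ((1LL << (field_bits - 1)) - 1) * unit_bytes;
}

/* True if the byte offset is a multiple of the unit and lies within
 * the encodable range. */
bool offset_fits(long long offset, int field_bits, int unit_bytes)
{
    return offset % unit_bytes == 0
        && offset >= min_offset(field_bits, unit_bytes)
        && offset <= max_offset(field_bits, unit_bytes);
}
```

For a 24-bit field with 4-byte units, min_offset gives -33554432 and max_offset gives 33554428, i.e. about ±32 MB.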

Not all compilers take advantage of PC-relative addressing, or they do so only to meet some performance metric, such as compiling for speed or code size. For CCS and the TMS470 family, do not use the -ml (emm-ell) option, as that will force absolute addressing, as the -mt and -md options are required for its use. Also make sure the --trampolines=on option is enabled; you may want to use the --minimize_trampolines and --trampoline_min_spacing options too. Doing this will not guarantee relocatable code; it only increases the probability that your code will be relocatable.

Even with the above comments, the way you write your code will influence whether or not relocatable code is generated. Although there are no hard and fast rules, in general, smaller functions and smaller blocks controlled by branching constructs (if, switch, while, for, etc.) will help achieve relocatable code as the branching distance is a key factor in achieving this.

The optimizer can play some tricks on you too. On some platforms, PC-relative addressing is actually slower than absolute addressing (not so on an ARM), and in these cases, especially if you are optimizing for speed, your code may no longer be relocatable. Another thing to watch for is auto-inlining. When optimization is turned up high and the optimizer can see that a function is called only once, it may inline the entire function, which can be several hundred lines in some cases. This can break relocatability as well.

Another issue with optimization is the generation of subroutines from common code. In this case, you need to make sure the subroutine calls are made in a PC-relative manner. Most compilers provide no way of doing this even if you want to. Since the ARM can use the BL instruction, subroutine calls can also be PC-relative, but you need to make sure you copy all of the code associated with a function. Also, on the ARM, large constants are generally placed in tables located with the function that uses them, so don't forget to copy this information too.

Finally, check the output code to make sure it truly is relocatable. This is a tedious process, especially if the code is large, but a crucial one nonetheless. For my part, I turn on the option to keep the generated .asm file so I have something to look at. Once I'm happy with the output, or have tweaked it if necessary, I add the .asm file to the project and exclude the original .c source. This way I can be sure that a simple change to the optimizer or some other project setting doesn't break my code. Keep the original .c source so you have something to refer to if you need to do additional debugging. Also place a comment at the top of it stating what all the compiler/linker/optimizer/etc. settings should be to generate the appropriate output.

Although it's not a one button guarantee of success, it is possible to achieve your goal. You just have to accept your place in the tool chain to do it. :-)

Jim Noxon

0 Eugene K over 14 years ago in reply to George Mock

Expert 1685 points

George,

It matters in this way: the linker did generate a trampoline for the load address but put NULL as the target of the branch. This resulted in non-executable code. Along the way, the linker issued the following warnings:

warning #10201-D: conservatively using trampoline for call destination
   "FooLoadStart"; generation and use of this trampoline can
   potentially be avoided if the --symbol_map option is used to re-direct
   symbol references instead of using symbol assignments
warning #10335-D: Machine mode for call destination "FooLoadStart"
   is unknown.Trampoline is generated assuming both caller and callee are in
   the same mode.

None of them has any relevance to the linker's inability to 'resolve' the load-address trampoline. According to all the previous explanations, an attempt to use a load address as an entry point should have resulted in a hard error, because the tools do not support this and the generated trampoline will be broken. Can you explain the above warnings in the context of this post, and why this is not a hard error?

Thanks,

Eugene

0 George Mock over 14 years ago in reply to Eugene K

TI__Guru**** 251120 points

I filed SDSCM00039838 in the SDOWP system to have the linker changed to clearly issue an error when a trampoline tries to branch to a load address.

Thanks and regards,

-George
