MCU-PLUS-SDK-AM243X: Linker: placing sections at different load and run addresses

Part Number: MCU-PLUS-SDK-AM243X

Hello,

I stumbled upon a use case that is documented in many different parts of the LTS1.3.0 compiler manual, which sometimes makes it hard to gather all the needed information at once.

I wanted to try the topic which is explained here: https://software-dl.ti.com/codegen/docs/tiarmclang/rel1_3_0_LTS/compiler_manual/linker_description/05_linker_command_files/placing-a-section-at-different-load-and-run-addresses-stdz0756565.html#stdz0756565

In short: placing sections at a load address that differs from the run address. Our use case is to keep several code parts in external RAM and, depending on what the program does, load one of them into the same run location (never both at once, only one or the other).

I followed the example also described here: https://software-dl.ti.com/codegen/docs/tiarmclang/rel1_3_0_LTS/compiler_manual/linker_description/08_using_linker_generated_copy_tables/generating-copy-tables-with-the-table-operator-stdz0750716.html#stdz0750717

and here: https://software-dl.ti.com/codegen/docs/tiarmclang/rel1_3_0_LTS/compiler_manual/linker_description/05_linker_command_files/using-group-and-union-statements-stdz0753269.html

My solution was the following.

In the linker script:

.ovly: > MCU1_0_R5F_MEM_TEXT
UNION: run > MCU1_0_R5F_MEM_TEXT
{
    .testLog:
    {
        -l libtestLog.a
    } load > MCU1_0_EXTRAM_CODE, palign(8), table(TESTCODELOG)
    .testSignal:
    {
        -l libtestSignal.a
    } load > MCU1_0_EXTRAM_CODE, palign(8), table(TESTCODESIG)
}

and inside the source code:

#include <cpy_tbl.h>
extern COPY_TABLE TESTCODELOG;
extern COPY_TABLE TESTCODESIG;
/* ... */
if (useLog)
{
    copy_in(&TESTCODELOG);
    new LogTest();
}
else
{
    copy_in(&TESTCODESIG);
    new SignalTest();
}

But it does not work.

I can see that the code is loaded when copy_in is called, but something is wrong.

With the memory browser open, I see the following (I changed the names a bit) before the copy_in (in this case the else branch):

0x701790A4 .L.str
0x701790A4 6362696C 62612B2B 00203A69
0x701790B0 .L__const._ZN7x6x20x21xEv.regs
0x701790B0 28001000 00000080
0x701790B8 .L.str
0x701790B8 5464654C 006B7361
0x701790C0 .L.str
0x701790C0 E2800001 E5C10014 EA000033 E59D0010 E5900010 E30B1210 E3471017 ED9F0B31 E3032CDF E3402034 EB000124 E59D1010
0x701790F0 E5D10014 E2800001 E5C10014 EA000026 E59D0010 E5900010 E30B121C E3471017 E30A2E4B E3472017 EBFEFE74 E59D1010
0x70179120 E5D10014 E2800001 E5C10014 EA00001A E59D0010 E5900010 E30B1228 E3471017 EBFF1F5A E59D1010 E5D10014 E2800001
0x70179150 E5C10014 EA000010 E59D0010 E5900010 E30B1234 E3471017 E30A2E58 E3472017 E3003539 EB0000B9 E59D1010 E5D10014
0x70179180 E2800001 E5C10014 EA000003 E59D1010 E3A00000 E5C10014
0x70179198 .L.str.13
0x70179198 EAFFFFFF EAFFFF91 E28DD018 E8BD8800 2A9D627C 404525DF 41BB999A 00000000 00000000 00000000 E92D41F0 E24DD0C0
0x701791C8 E1A0C001 E1A0E000 E59D00DC E59D10D8 E58DE0BC E58DC0B8 E1CD2BB6 E58D30B0 E5CD10AF E5CD00AE E59D00BC E58D0028
0x701791F8 E28D00AC E58D003C EBFFC99A E28D0088 E58D0024 E3A01021 EBFFD114 E59D1024 E59D0028 E3811001 E5D10014 E2800001
0x70179228 E5C10014 EA00001A E59D0010 E5900010 E30B1328 E3471017 EBFF1F32 E59D1010 E5D10014 E2800001 E5C10014 EA000010
0x70179258 E59D0010 E5900010 E30B1334 E3471017 E30A2F58 E3472017 E3003539 EB0000B9 E59D1010
0x7017927C .L__FUNCTION__._ZN7x8x3x26xIJEE12xEv
0x7017927C E5D10014 E2800001 E5C10014 EA000003 E59D1010 E3A00000 E5C10014 EAFFFFFF EAFFFF91 E28DD018 E8BD8800 2A9D627C
0x701792AC 404525DF 41BB999A 00000000 00000000 00000000 00000000 00000000 00000000

and after the copy_in:

0x701790A4 .L.str
0x701790A4 6362696C 62612B2B 00203A69
0x701790B0 .L__const._ZN7x6x20x21xEv.regs
0x701790B0 28001000 00000080
0x701790B8 .L.str
0x701790B8 5464654C 006B7361
0x701790C0 .L.str
0x701790C0 7017AA50 7017A960 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0x701790F0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0x70179120 E5810000 E30E1603 E3451003 E30E264C E3452003 E3A00000 E3A030F2 EB003CC7 EAFFFFFF E3040E42 E3400049 E58D0078
0x70179150 EB00895E E3A01003 E28D2078 E3A03004 EB005366 E3500000 1A00000F EAFFFFFF E30E066F E3450003 E1A0100D E5810004
0x70179180 E30E0669 E3450003 E5810000 E30E1603 E3451003 E30E264C
0x70179198 .L.str.13
0x70179198 E3452003 E3A00000 E3A030F9 EB003CAD EAFFFFFF E30E16A5 E3451003 E28D0060 E58D0010 E3A02018 E58D2014 EB0004BC
0x701791C8 EB008940 E59D2010 E59D3014 E3A01004 EB005348 E3500000 1A00000F EAFFFFFF E30E066F E3450003 E1A0100D E5810004
0x701791F8 E30E0669 E3450003 E5810000 E30E1603 E3451003 E30E264C E3452003 E3A00000 E3A03C01 EB003C8F EAFFFFFF E3040655
0x70179228 E3400046 E58D005C E3040142 E3440C4C E58D0058 EB008923 E3A01005 E28D2058 E3A03008 EB00532B E3500000 1A00000F
0x70179258 EAFFFFFF E30E066F E3450003 E1A0100D E5810004 E30E0669 E3450003 E5810000 E30E1603
0x7017927C .L__FUNCTION__._ZN7x8x3x26xIJEE12xEv
0x7017927C E3451003 E30E264C E3452003 E3A00000 E3003107 EB003C72 EAFFFFFF E30E16C5 E3451003 E28D0041 E58D0008 E3A02017
0x701792AC E58D200C EB000481 EB008905 E59D2008 E59D300C E3A01006 EB00530D E3500000

The copy tables look as follows:

0x7017B2E0 TESTCODELOG, __TI_table_TESTCODELOG
0x7017B2E0 0001000C 50000000 701790C0 00002220
0x7017B2F0 TESTCODESIG, __TI_table_TESTCODESIG
0x7017B2F0 0001000C 50002220 701790C0 000006F0

The copied data is exactly the data which is located in EXTRAM.
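To make the copy-table dump easier to read, here is a minimal host-side sketch that decodes the four 32-bit words of one table. It assumes a little-endian layout in which the first word packs a 16-bit record size and a 16-bit record count, followed by load address, run address and size. The field names follow the `recs[0].run_addr` / `recs[0].size` usage seen in this thread, but `DecodedCopyTable` and `decode_copy_table` are hypothetical helper names, not part of cpy_tbl.h.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical host-side view of a copy table holding a single record.
// The 16/16-bit split of the first word is an assumption based on the dump.
struct DecodedCopyTable {
    std::uint16_t rec_size;   // size of one record in bytes
    std::uint16_t num_recs;   // number of records that follow
    std::uint32_t load_addr;  // where the bytes are stored
    std::uint32_t run_addr;   // where the code expects to execute
    std::uint32_t size;       // number of bytes to copy
};

// Decode the four words shown by the memory browser (little-endian target).
inline DecodedCopyTable decode_copy_table(const std::array<std::uint32_t, 4>& w) {
    DecodedCopyTable t{};
    t.rec_size  = static_cast<std::uint16_t>(w[0] & 0xFFFFu);
    t.num_recs  = static_cast<std::uint16_t>(w[0] >> 16);
    assert(t.num_recs == 1 && "this sketch only handles single-record tables");
    t.load_addr = w[1];
    t.run_addr  = w[2];
    t.size      = w[3];
    return t;
}
```

Applied to the two tables above, this reading gives the same run address 0x701790C0 for both, with TESTCODELOG loaded from 0x50000000 (0x2220 bytes) and TESTCODESIG loaded from 0x50002220 (0x6F0 bytes), that is, directly behind it in external RAM.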

What happens now is that the line "new SignalTest()" causes a data abort on the device. The problem is that I cannot really debug it because, as you can see, the symbol view shows the first-seen symbols at those addresses and not the ones that are actually loaded.

It does not crash if the libraries are linked normally into the sections, so it is not a problem with the classes/objects.

It doesn't matter whether I load LogTest or SignalTest.

I also wondered whether it might be caused by the sections, that is, which sections the compiler generates. Since I just link the libraries, I assumed that all sections of each library are included. That would be fine, but I was not sure about the initialization of some const objects that are initialized before main() is called. So I tried moving the .init_array section somewhere else, but this did not help.

The documentation lacks an explanation of which sections are generated. There are many chapters, some of them redundant, but none of them explains all the sections that can be generated, so I can only guess.
I found information here:

1. https://software-dl.ti.com/codegen/docs/tiarmclang/rel1_3_0_LTS/compiler_manual/runtime_environment/memory-model.html

2. https://software-dl.ti.com/codegen/docs/tiarmclang/rel1_3_0_LTS/migration_guide/updating_linker_command_files.html?highlight=section

3. https://software-dl.ti.com/codegen/docs/tiarmclang/rel1_3_0_LTS/compiler_manual/intro_to_object_modules/introduction-to-sections-stdz0691509.html?highlight=section

But none of them really explains how to use the sections. So I tried putting in just the sections text, rodata and data. But if the init_array section is missing, the loaded program does not even reach main and aborts inside __TI_auto_init; I think that may happen because we put the global init_array into TCMA.

So this wasn't a solution either.

The problem also happens if we use just one lib with different load and run addresses, without the UNION statement, so it is not related to overlapping:

.testSignal:
{
    -l libtestSignal.a
} load > MCU1_0_EXTRAM_CODE, palign(8), run > MCU1_0_R5F_MEM_TEXT, table(TESTCODESIG)

Debugging showed that the constructor call does not work correctly. The allocation beforehand with new works; we use our own heap implementation.

After some steps I saw in the disassembly that the program just steps through the copied code and, interestingly, it keeps stepping and even executes the 00000000 words inside this copied part:

0x7017B5C0 SignalTest()
0x7017B5C0 E92D4800 E24DD018 E58D0014 E58D1010 E58D200C E5CD300B E59D0014 E58D0004 E59D1010 EBFF7F95 E59D0004 E2800008
0x7017B5F0 E59D1010 E59D200C E5DD300B EBFF49F3 E59D0004 E30B1710 E3471017 E2812008 E5802000 E2811044 E5801008 E3A01000
0x7017B620 E5801010 E5801014 E5801018 E28DD018 E8BD8800 00000000 00000000 00000000

After some more steps the abort happens.

So I have no more ideas about what I am doing wrong here. Any suggestions? Is there any special handling needed for C++ code?

Best regards

Felix

  • Please understand that this code ...

    .testLog:
    {
        -l libtestLog.a
    }

    ... says to create an output section named .testLog. The input sections include all the .text sections from libtestLog.a, which contain the code for the functions. But they also include all the other input sections from libtestLog.a, such as .rodata. You may not intend that. If you want only the code for the functions to be in this custom overlay, then write ...

    .testLog:
    {
        -l libtestLog.a(.text)
    }

    Another point to keep in mind: everything in the program assumes that the contents of the output sections .testLog and .testSignal are always at the run address. The only exception is the copy_in function. You have to make sure that those output sections really are present at the run address before any other part of the program attempts to read or write any of them.

    Please let me know if these suggestions resolve the problem.

    Thanks and regards,

    -George
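The requirement that an overlay must be present at its run address before anything touches it can be sketched as a small residency tracker. This is a host-side model only: `Overlay` and `OverlayManager` are hypothetical names, and `std::memcpy` stands in for what `copy_in()` would do on the target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for one overlay; names are illustrative only.
struct Overlay {
    const std::uint8_t* load;  // load image (external RAM in the thread)
    std::uint8_t*       run;   // shared run region of the UNION
    std::size_t         size;  // bytes to copy
};

// Tracks which overlay currently occupies the shared run region.
class OverlayManager {
public:
    // Copies the overlay in unless it is already resident.
    // Returns true if a copy was performed.
    bool ensure_resident(const Overlay& o) {
        if (resident_ == &o) {
            return false;
        }
        std::memcpy(o.run, o.load, o.size);  // models copy_in() on the host
        resident_ = &o;
        return true;
    }

    bool is_resident(const Overlay& o) const { return resident_ == &o; }

private:
    const Overlay* resident_ = nullptr;  // nothing is resident at start-up
};
```

Calling `ensure_resident()` immediately before every entry into overlay code keeps the "copy first, then use" rule in one place, and it makes explicit that loading one member of the UNION evicts the other.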

  • Hey George, sorry for the late reply, I will test this soon, hopefully tomorrow.

  • Hey George,

    I found time again and tested copying in just the text section. It does not work either; I still get a data abort.

    I tried it like this now:

    .testSignal:
    {
        -l libtestSignal.a(.text)
    } load > MCU1_0_EXTRAM_CODE, palign(8), run > MCU1_0_R5F_MEM_TEXT, table(TESTCODESIG)

    I see the code is copied correctly but it does not run as expected.

  • "I see the code is copied correctly"

    I further presume this copy completes before any call to the functions in this code.  This is as far as my expertise goes.  There must be some other system related issue causing the problem.

    One possibility is that the CPU is reading old instructions from some cache when it should read new instructions from memory instead.  

    That being the case, I am handing responsibility for this thread over to experts on AM243x systems.

    Thanks and regards,

    -George

  • Thanks George,

    yes, it's arranged as in my first post: directly after a copy, the function/constructor belonging to that code is called. So I guess it really could be a cache problem. I'm not that deep into the topic, but when I step through the disassembly, the instructions there look exactly like the ones that were copied. Is it possible that this view is not correct, because the cache may still hold other instructions that CCS does not show?
    This means I need to invalidate the cache, which of course will require accessing the registers of the AM243x.

    Thanks so far! I will have a look at the TRM, try to invalidate the cache, see what happens, and keep this thread updated.

  • Hi Felix,

    This thread came back to me again. I am on vacation. I will get back to you later today or tomorrow. Thanks!

    Ming

  • Hi Felix,

    I agree with George's assessment. I think it is caused by the cache. Can you put the memory section for the copy-to area in a non-cached area? Can you share the linker.cmd and the example.syscfg file with us, so that we can help identify where the problem might be?

    Thanks!

    Ming

  • Hey Ming,

    I tried it and yes it works. I get no more abort.

    Since we want the region where it is copied in to be cached, is it possible to invalidate the caches?

  • Hi Felix,

    It depends on how you load the code into memory.

    1. If you use the SBL to do the loading, then after the SBL has loaded the application image into memory and before it branches to the entry point of the application code, you can add a cache invalidate operation for the cached memory sections.

    2. If you use CCS to load the application image, then you may have to do the cache invalidate in CCS with the memory browser.

    Best regards,

    Ming 

  • "Since we want the region where it is copied in to be cached, is it possible to invalidate the caches?"

    Although it is for a different ARM Cortex-R5 based device, the thread "TMS570LC4357: Moving a function to RAM" has some code about cleaning the cache (it does a clean rather than an invalidate, to ensure that any modified data in the cache is written to memory).

  • Just a quick note from me to keep this updated: I still have had no time to investigate.

  • Hi Felix,

    Please let us know when you get a new result.

    Thanks!

    Ming

  • My suggestion would be to use dsb and isb instructions after your copy is complete.

  • Hey Ming, so I tried to invalidate the cache as follows:

    copy_in(&TESTCODESIG);
    CacheP_inv(reinterpret_cast<void*>(TESTCODESIG.recs[0].run_addr), TESTCODESIG.recs[0].size, CacheP_TYPE_ALLP);


    But this didn't help.
    I also tried to find the cache via the memory browser. According to the TRM it should be located at 0x074000000 and 0x074800000 for the first R5F core, right? But the memory browser shows me nothing but zeros.

    The run address is 0x7017B440, which is aligned to the 64-byte cache line size, if I understood the SDK documentation correctly. The size is 1472, which is also a multiple of 64 bytes.

    I didn't try the other solutions written down here but I may in the next step.

    Am I doing something wrong here?
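The alignment reasoning above can be checked with a small helper. This is a sketch with hypothetical names (`is_line_aligned`, `cover_whole_lines`); the line size is passed in as a parameter because the value that applies to a given core should be taken from the SDK or TRM rather than from this example.

```cpp
#include <cassert>
#include <cstdint>

// Address range expanded to whole cache lines.
struct LineRange {
    std::uint32_t start;  // rounded down to a line boundary
    std::uint32_t size;   // rounded up to whole lines
};

// True if value is a multiple of line (line must be a power of two).
constexpr bool is_line_aligned(std::uint32_t value, std::uint32_t line) {
    return (value & (line - 1u)) == 0u;
}

// Expand [addr, addr + size) so that it starts and ends on line boundaries.
inline LineRange cover_whole_lines(std::uint32_t addr, std::uint32_t size,
                                   std::uint32_t line) {
    assert(line != 0u && (line & (line - 1u)) == 0u);  // power of two only
    const std::uint32_t start = addr & ~(line - 1u);
    const std::uint32_t end   = (addr + size + line - 1u) & ~(line - 1u);
    return LineRange{start, end - start};
}
```

For the values quoted above, 0x7017B440 and 1472 are both multiples of 64, so the range already covers whole lines and alignment does not explain the failure.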

  • Hi Felix,

    As mentioned by Chester, the memory copy (from flash to memory) also needs to be flushed. You may want to use CacheP_wbInv() instead of CacheP_inv().

    Best regards,

    Ming

  • Thanks Ming and Chester. That worked!
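The order of operations that resolved the thread can be sketched as follows. A plausible reading of why the write-back matters is that the freshly copied code may still sit in the data cache, so invalidating alone can discard it instead of making it visible to instruction fetches. The helper below is a host-runnable model with injected callables: `RunRegion` and `load_then_run` are hypothetical names, and on the target the callables would wrap `copy_in()`, `CacheP_wbInv()` and the call into the overlay code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical description of the destination of one copy-table record.
struct RunRegion {
    void*       run_addr;
    std::size_t size;
};

// Runs the three steps in the order that resolved the thread:
//   1. copy the code to its run address,
//   2. write back and invalidate the caches over that range,
//   3. only then call into the freshly copied code.
// The callables are placeholders for the target-specific functions.
inline void load_then_run(const RunRegion& region,
                          const std::function<void()>& copy,
                          const std::function<void(void*, std::size_t)>& wb_inv,
                          const std::function<void()>& entry) {
    assert(region.run_addr != nullptr && region.size > 0);
    copy();
    wb_inv(region.run_addr, region.size);
    entry();
}
```

On the target, and assuming CacheP_wbInv() takes the same arguments as the CacheP_inv() call shown earlier, the cache step would operate on `TESTCODESIG.recs[0].run_addr` and `TESTCODESIG.recs[0].size` with `CacheP_TYPE_ALLP`, placed between `copy_in(&TESTCODESIG)` and `new SignalTest()`.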