MCU-PLUS-SDK-AM243X: LTS1.3.1-linker problems which produce undefined aborts

Part Number: MCU-PLUS-SDK-AM243X

Hello,

We have a somewhat complex project. We first create relocatable output modules, which are then linked in the final link step with a load address and a run address, both inside a UNION. The idea is to decide at runtime which .text and .data to load from external RAM into the internal SRAM of the Sitara. For the pre-main const-init I placed all .rodata in sections that have the same run and load address, because I am not sure what the compiler will do with the .rodata of the relocatable output modules.

The relocatable output modules have a lot of hidden symbols (when linking: -Wl,--hide=* and -Wl,--unhide= for the symbols to make available to the outside), since both modules define many of the same symbols. They also have a lot of references that are not resolved in the partial link, since they use FreeRTOS functions and drivers of the AM243X SDK. Those are linked in during the final link.

When partial-linking, I also needed to define separate sections for the output modules, since I couldn't find any other way to refer to the already linked-in libs:

SECTIONS
{
    .textFastLibs:
    {
        -l "lib1fastlib"(.text)
    }
    .rodataFastLibs:
    {
        -l "lib1fastlib"(.rodata)
    }
}

and the same thing for a lib2.

So I placed them in the linker command file like this (names changed due to sensitive information):

    UNION
    {
        .lib1Code: 
        {
            -l "lib1usinglib.a"(.text),
            -l "lib1.out"(.text),
        } load > MCU1_0_PSRAM_CODE, palign(8), table(LIB1_CODE)
        
        .lib2Code: 
        {
            -l "lib2usinglib.a"(.text),
            -l "lib2.out"(.text)
        } load > MCU1_0_PSRAM_CODE, palign(8), table(LIB2_CODE)
    } run > MCU1_0_R5F_MEM_TEXT

the same for the .data:

    UNION
    {
        .eipData: 
        {
            -l "lib1usinglib.a"(.data),
            -l "lib1.out"(.data),
            -l "lib1.out"(.bss)
        } load > MCU1_0_PSRAM_DATA, palign(8), table(LIB1_DATA)
        
        .pntData: 
        {
            -l "lib2usinglib.a"(.data),
            -l "lib2.out"(.data),
            -l "lib2.out"(.bss)
        } load > MCU1_0_PSRAM_DATA, palign(8), table(LIB2_DATA)
    } run > MCU1_0_SPACE

and the fast-libs shall reside in TCM:

    UNION
    {
        .fastTextLib1:
        {
            -l "lib1.out"(.textFastLibs)
        }  load > MCU1_0_PSRAM_CODE, palign(8), table(LIB1_FAST_CODE)
        .fastTextLib2: 
        {
            -l "lib2.out"(.textFastLibs)
        }  load > MCU1_0_PSRAM_CODE, palign(8), table(LIB2_FAST_CODE)
    } run > R5F_TCMA

The .rodata sections are also placed in SRAM independently, each at its own address with identical load and run addresses, so that the const-init does not get confused.

As I understand it, all relevant sections of the relocatable output modules should now be defined and placed correctly.
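For context, the run-time selection described above boils down to walking a copy table and copying each record from its load address to the shared run address. The following is only a host-side sketch under simplifying assumptions: the types `overlay_record_t` and `overlay_table_t`, the function `overlay_load`, and the demo buffers are hypothetical stand-ins, not the toolchain's real copy-table types. On the target, the linker-generated tables (LIB1_CODE, LIB2_CODE, ...) would be passed to the toolchain's copy routine instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one copy-table record. */
typedef struct {
    const void *load_addr; /* where the image is stored (e.g. PSRAM)      */
    void       *run_addr;  /* where it must reside to execute (e.g. SRAM) */
    size_t      size;      /* number of bytes to copy                     */
} overlay_record_t;

/* Hypothetical, simplified stand-in for a whole copy table. */
typedef struct {
    size_t                  num_recs;
    const overlay_record_t *recs;
} overlay_table_t;

/* Copy every record of one overlay from its load address to its run
 * address. Returns the total number of bytes copied. */
static size_t overlay_load(const overlay_table_t *table)
{
    size_t total = 0;

    assert(table != NULL);
    for (size_t i = 0; i < table->num_recs; ++i) {
        const overlay_record_t *rec = &table->recs[i];
        memcpy(rec->run_addr, rec->load_addr, rec->size);
        total += rec->size;
    }
    /* On the real target, cache maintenance is needed at this point,
     * before any copied-in function is called. Omitted on the host. */
    return total;
}

/* Host demo data: two images with different load locations that share
 * one run region, mirroring the UNION layout. */
static uint8_t run_region[8];
static const uint8_t lib1_image[4] = { 0x11, 0x11, 0x11, 0x11 };
static const uint8_t lib2_image[6] = { 0x22, 0x22, 0x22, 0x22, 0x22, 0x22 };

static const overlay_record_t lib1_recs[] = {
    { lib1_image, run_region, sizeof lib1_image }
};
static const overlay_record_t lib2_recs[] = {
    { lib2_image, run_region, sizeof lib2_image }
};

static const overlay_table_t lib1_table = { 1, lib1_recs };
static const overlay_table_t lib2_table = { 1, lib2_recs };
```

Calling `overlay_load(&lib1_table)` and later `overlay_load(&lib2_table)` overwrites the same `run_region`. That is the point of the UNION: only one of the two libraries is resident at the run address at any time, so nothing outside the overlay may end up placed in that address range.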

The map-file shows me this:

SEGMENT ALLOCATION MAP

run origin  load origin   length   init length attrs members
----------  ----------- ---------- ----------- ----- -------
00000000    00000000    00000048   00000048    r-x
  00000000    00000000    00000040   00000040    r-x .vectors
  00000040    00000040    00000008   00000008    r-- .ARM.exidx
00000100    500e0150    00001e40   00001e40    r-x
  00000100    500e0150    00001e40   00001e40    r-x .fastTextLib1
00000100    500dddc0    00002390   00002390    r-x
  00000100    500dddc0    00002390   00002390    r-x .fastTextLib2
00002490    00002490    000016a0   000016a0    r-x
  00002490    00002490    00000c40   00000c40    r-x .text.hwi
  000030d0    000030d0    00000490   00000490    r-x .text.cache
  00003560    00003560    000002e0   000002e0    r-x .text.mpu  
  00003840    00003840    00000110   00000110    r-x .text.atexit_register
  00003950    00003950    000000b8   000000b8    r-x .text.__cxa_finalize
  00003a08    00003a08    00000098   00000098    r-x .text._outs
  00003aa0    00003aa0    00000050   00000050    r-x .text._outc
  00003af0    00003af0    00000020   00000020    r-x .text.__cxa_atexit
  00003b10    00003b10    00000010   00000010    r-x .text.__aeabi_errno_addr
  00003b20    00003b20    00000008   00000008    r-x .text.__cxa_ia64_exit
  00003b28    00003b28    00000008   00000008    r-x .text._nop
00003b30    00003b30    00000800   00000000    rw-
  00003b30    00003b30    00000800   00000000    rw- .sysmem
41010000    41010000    00003300   00003300    r--
  41010000    41010000    00000100   00000100    r-- .irqstack
  41010100    41010100    00001000   00001000    r-- .fiqstack
  41011100    41011100    00002000   00002000    r-- .svcstack
  41013100    41013100    00000100   00000100    r-- .abortstack
  41013200    41013200    00000100   00000100    r-- .undefinedstack
50000000    50000000    0007a840   0007a840    r-x
  50000000    50000000    0006af90   0006af90    r-x .textStack
  5006af90    5006af90    0000f8b0   0000f8b0    r-- .rodataStack
50334840    50334840    0000bc20   0000bc20    rw-
  50334840    50334840    0000bc20   0000bc20    rw- .dataStack
70080000    70080000    00041878   00041878    rw-
  70080000    70080000    00010000   00010000    rw- .stack
  70090000    70090000    0002fa78   0002fa78    rw- .bss
  700bfa78    700bfa78    00001e00   00001e00    rw- .data
700c1880    5031e240    00016600   00016600    rw-
  700c1880    5031e240    00016600   00016600    rw- .lib1Data
700c1880    50300000    0001e240   0001e240    rw-
  700c1880    50300000    0001e240   0001e240    rw- .lib2Data
700f0000    700f0000    00000018   00000018    r--
  700f0000    700f0000    00000018   00000018    r-- .init_array
700f0020    700f0020    000669d8   000669d8    r-x
  700f0020    700f0020    000002c0   000002c0    r-x .text.boot
  700f02e0    700f02e0    00048810   00048810    r-x .text
  70138af0    70138af0    0000f8c0   0000f8c0    r-- .rodata
  701483b0    701483b0    00001170   00001170    r-- .rodataLib1
  70149520    70149520    00005980   00005980    r-- .rodataLib2
  7014eea0    7014eea0    00003600   00003600    r-- .rodataFastLib1
  701524a0    701524a0    00004558   00004558    r-- .rodataFastLib2
70156a00    500bb0d0    00022cf0   00022cf0    r-x
  70156a00    500bb0d0    00022cf0   00022cf0    r-x .lib1Code
70156a00    5007a840    00040890   00040890    r-x
  70156a00    5007a840    00040890   00040890    r-x .lib2Code
70197290    70197290    00000060   00000060    r--
  70197290    70197290    00000060   00000060    r-- .ovly

But now I can't get past __TI_auto_init_nobinit: execution ends up in the undefined-instruction handler. I describe two scenarios:

1. trampolines



So I checked the map-file again.

And there I found this in the FAR CALL TRAMPOLINES section of the map file (the first value is the callee address, the second the trampoline address, the third the call address):

The containing lib was placed explicitly in PSRAM (GPMC, so the 0x5.... addresses) inside a section:

But if you look at the SEGMENT ALLOCATION MAP in the map file, you see that this region is explicitly used as the run address of lib1Code and lib2Code.

So it is not even a matter of lib1 or lib2 not being loaded correctly: it happens well before that point, even before we reach main().
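The check described here, whether an address from the map file falls inside the shared run region, can be sketched as a small helper. This is only an illustration: `region_t`, `lib_code_run` and `addr_in_region` are hypothetical names, and the region values are taken from the SEGMENT ALLOCATION MAP above (run origin 0x70156a00, length 0x40890 for the larger member, .lib2Code).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One run region of the UNION, described by origin and length. */
typedef struct {
    uint32_t run_origin;
    uint32_t length;
} region_t;

/* Values from the map file: .lib1Code and .lib2Code both run at
 * 0x70156a00; .lib2Code, the larger one, is 0x40890 bytes long. */
static const region_t lib_code_run = { 0x70156a00u, 0x00040890u };

/* True if addr lies in [run_origin, run_origin + length). */
static bool addr_in_region(uint32_t addr, const region_t *r)
{
    assert(r != NULL);
    return (addr >= r->run_origin) && ((addr - r->run_origin) < r->length);
}
```

With these values the region ends at 0x70197290, which is where the map file places .ovly. Any callee or trampoline address from a library that is not part of the UNION and for which `addr_in_region` returns true would be overwritten as soon as one of the overlays is copied in.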

Why is a trampoline callee generated in the middle of the run-address section? I also checked the memory view:

The register-view of CCS doesn't really help.

Additionally I made a core trace to catch this better:

The columns are RowNo, PC, OpCode, Function, Line No, File, Directory, Cycles, Status ("undefined instruction" is the last status). Interestingly, the calls before this one don't make any sense, because some functions are already being called that should only be called later on, not before main.

How can I prevent the linker from doing such things? Or am I the one making a mistake here?

I guess it's important to note that the lib which uses this trampoline is in no way connected to the libs that run in this section later on. Also, it's not the only callee address located there, but probably the first one that gets called and produces this undefined abort.

2. static global

So I tried removing the part where at least CCS crashed (the corresponding object is no longer created). Now it crashes at another place, which is not inside the run section:

The map file changed to the following:

SEGMENT ALLOCATION MAP

run origin  load origin   length   init length attrs members
----------  ----------- ---------- ----------- ----- -------
00000000    00000000    00000048   00000048    r-x
  00000000    00000000    00000040   00000040    r-x .vectors
  00000040    00000040    00000008   00000008    r-- .ARM.exidx
00000100    500de950    00001e40   00001e40    r-x
  00000100    500de950    00001e40   00001e40    r-x .fastTextLib1
00000100    500dc5c0    00002390   00002390    r-x
  00000100    500dc5c0    00002390   00002390    r-x .fastTextLib2
00002490    00002490    000016a0   000016a0    r-x
  00002490    00002490    00000c40   00000c40    r-x .text.hwi
  000030d0    000030d0    00000490   00000490    r-x .text.cache
  00003560    00003560    000002e0   000002e0    r-x .text.mpu
  00003840    00003840    00000110   00000110    r-x .text.atexit_register
  00003950    00003950    000000b8   000000b8    r-x .text.__cxa_finalize
  00003a08    00003a08    00000098   00000098    r-x .text._outs
  00003aa0    00003aa0    00000050   00000050    r-x .text._outc
  00003af0    00003af0    00000020   00000020    r-x .text.__cxa_atexit
  00003b10    00003b10    00000010   00000010    r-x .text.__aeabi_errno_addr
  00003b20    00003b20    00000008   00000008    r-x .text.__cxa_ia64_exit
  00003b28    00003b28    00000008   00000008    r-x .text._nop
00003b30    00003b30    00000800   00000000    rw-
  00003b30    00003b30    00000800   00000000    rw- .sysmem
41010000    41010000    00003300   00003300    r--
  41010000    41010000    00000100   00000100    r-- .irqstack
  41010100    41010100    00001000   00001000    r-- .fiqstack
  41011100    41011100    00002000   00002000    r-- .svcstack
  41013100    41013100    00000100   00000100    r-- .abortstack
  41013200    41013200    00000100   00000100    r-- .undefinedstack
50000000    50000000    00079038   00079038    r-x
  50000000    50000000    00069790   00069790    r-x .textStack
  50069790    50069790    0000f8a8   0000f8a8    r-- .rodataStack
50334840    50334840    0000bc20   0000bc20    rw-
  50334840    50334840    0000bc20   0000bc20    rw- .dataStack
70080000    70080000    00041860   00041860    rw-
  70080000    70080000    00010000   00010000    rw- .stack
  70090000    70090000    0002fa78   0002fa78    rw- .bss
  700bfa78    700bfa78    00001de8   00001de8    rw- .data
700c1860    5031e240    00016600   00016600    rw-
  700c1860    5031e240    00016600   00016600    rw- .eipData
700c1860    50300000    0001e240   0001e240    rw-
  700c1860    50300000    0001e240   0001e240    rw- .pntData
700f0000    700f0000    00000018   00000018    r--
  700f0000    700f0000    00000018   00000018    r-- .init_array
700f0020    700f0020    000662a8   000662a8    r-x
  700f0020    700f0020    000002c0   000002c0    r-x .text.boot
  700f02e0    700f02e0    00048570   00048570    r-x .text
  70138850    70138850    0000f430   0000f430    r-- .rodata
  70147c80    70147c80    00001170   00001170    r-- .rodataLib1
  70148df0    70148df0    00005980   00005980    r-- .rodataLib2
  7014e770    7014e770    00003600   00003600    r-- .rodataFastLib1
  70151d70    70151d70    00004558   00004558    r-- .rodataFastLib2
701562d0    500b98d0    00022cf0   00022cf0    r-x
  701562d0    500b98d0    00022cf0   00022cf0    r-x .lib1Code
701562d0    50079040    00040890   00040890    r-x
  701562d0    50079040    00040890   00040890    r-x .lib2Code
70196b60    70196b60    00000060   00000060    r--
  70196b60    70196b60    00000060   00000060    r-- .ovly

and the address is now in the .data-section:

The core-trace is now really weird:

Could it be that the software takes a wrong path at some earlier point? But the question would be: why?

I wanted to debug the whole process in the disassembly, but as soon as I step into the first branches inside __TI_auto_init_nobinit and __cxx_global_var_init, which open the corresponding files in the view, CCS crashes every time. So it's impossible to debug this problem.

Compile-options we use:
        "-Wno-gnu-variable-sized-type-not-at-end"
        "-mcpu=cortex-r5"
        "-mfloat-abi=hard"
        "-mfpu=vfpv3-d16"
        "-Wno-error=ti-macros"
        "-Wno-unused-function"
        "-Wno-invalid-command-line-argument"
        "-fno-rtti"
        "-ffunction-sections"
        "-fdata-sections"
        "-mno-unaligned-access"

Additional link-options we use:
"-Wl,--reread_libs"
"-Wl,--ram_model"
"-Wl,--diag_suppress=10063"
"-Wl,-e_vectors"

Best regards

Felix

  • update:

    So I followed the links under "similar topics" and found out why my CCS is crashing all the time. I found this:
    https://software-dl.ti.com/codegen/esd/cgt_public_sw/ARM_LLVM/2.1.3.LTS/README.html#codegen-10229-tiarmclang-compiler-generated-debug-information-can-cause-code-composer-studio-ccs-to-crash

    Using the workaround lets me debug further, and now I found the problematic point (currently using the compiled ELF of scenario 2):

    The pop-call here:

    runs into:

    I took a screenshot of the registers when reaching the pop:

    It's interesting, because in every other configuration, with the same compiler options, linker options and so on, it works.
    Only the run and load addresses were changed here. And:
    the problematic pop is in neither a load nor a run address range. Next I will check whether it is related to the load/run setup by explicitly not using it. Then we can check further with the relocatable output module.

    But it should be mentioned that the relocatable output module only contains C code. The C++ code is written and used only by us.

    Edit:

    I removed the load/run setup, which means I directly link the .out file. I tried it for both lib1 and lib2. I also removed the copy_in part for the linker's copy tables and the use of the other library.

    It works then: each lib on its own, with the same sections mentioned above directly linked in.

    So I guess there is a problem related to the load and run addresses. Am I doing something wrong there? I earlier tried it with one small C++ object, which worked flawlessly, so I thought this would also work with bigger libs. But it seems the linker gets confused here. Or does the compiler generate jumps to places that do not exist or are wrong?

    Regards

    Felix

  • Hi Felix,

    I have assigned to our subject matter expert, they will be able to help you on this item.

    Thanks,
    G Kowshik

  • Hi Felix -- You describe a lot of things here.  What I glean from this is that you're able to work through the CCS issues, but you still have an issue with what appears to be improper trampoline generation, is that right?  The linker will generate trampolines between calls that are spaced too far away in the memory map, but it isn't clear from what you've written why you're seeing them being generated.  Can you describe what is incorrect about the trampoline in the run/load case?

    I'd also like it if you could reduce the project down to something more reproducible so that we can try to get to the bottom of the linker issues.

    Please follow the directions in the article "How to Submit a Compiler Test Case".

    It's OK to just include object files along with your linker command file, as long as we have what we need to reproduce the issue from the link step using your linker command file.

    Thanks

  • Hi Alan,

    OK, yes, it was a bit confusing with all that stuff.
    Let me summarize:

    I noticed that when using load and run addresses with two relocatable output modules in a UNION in the linker command file, __TI_auto_init_nobinit runs code which jumps via the assembly pop to wrong addresses. The code where the pop jumps wrong is C++ code. The relocatable output modules contain only C code, with hidden symbols (because both define the same symbols), unhidden functions for the API, and references that are only resolved in the final link.

    Linking either relocatable output module on its own, with identical run and load addresses, works fine. Only different run and load addresses combined with the UNION of the output modules lead to the error.

    The problematic code is not related to the output modules. It is only inside the application code and C++ code.

    It also does not seem to be bound to trampolines; that seems to be just one manifestation of the problem. Mainly, a pop jumps to a wrong address and then executes whatever is there, which can be runnable .text code or just some .data and so on. Arbitrary bytes are mostly not valid ARM instructions, and thus the undefined abort occurs.

    So I would now guess that the const-init, i.e. the init before main, is the problematic part.

    For reproduction:

    Since the relocatable output modules themselves are third-party NDA code and the project is quite complex, with a lot of abstraction layers, linker specialities, a CMake build system and so on, it's not that easy to provide a minimal example or a reduction. We do not use CCS as a build system. But since each output module on its own, with identical run and load addresses, runs fine, I think we can start there.

    I can cut some created objects and files which use const-initialization to reduce it, but more may not be possible.

    But I can send you the linker file, the compiled and linked relocatable output modules with their linker and map files, and the final map and ELF files. There are just too many object files to make them easily compile on your side. Or we can somehow give you access to the whole project.

    First I will start reducing the created objects, and with them the C++ const-init code that runs.

    Best regards

    Felix

  • Hey Alan.

    So I can reproduce a case where the error occurs and where it does not.

    I removed most of our created C++ objects, and I am able to run to main(). As soon as I activate the creation of one of the C++ objects, the problem occurs again. Note that I am running case 3 (see below), i.e. relocatable output modules with different run and load addresses in a UNION.

    So I stepped through the procedure in the disassembly. The activated C++ object has some const-init at the beginning which also uses some etl functionality, and this seems to trigger the problem. Sadly I cannot debug it completely, since CCS now crashes even with the workaround. But it happens somewhere inside the "lower_bounds" function in etl::flat_map. Nevertheless, I think the problem is not located there directly, because it works under different conditions.

    So to summarize:

    common for all cases:

    - two relocatable output modules (lib1 and lib2 in this case), completely in C. Both have hidden and unhidden symbols as well as open references which are resolved in the final link, in this case to the SDK drivers and the SDK FreeRTOS, not to any C++ code. It's pure C linkage.

    - The C++ lib sections (.text, .data, .rodata and .bss) are linked partly into the GPMC area (from 0x50000000 on) and partly directly into SRAM. The C++ libs are static libs.

    1. (working):

    - relocatable output module lib1 linked into SRAM via Linker-script for .text, .data, .rodata and .bss explicitly.

    2. (working):

    - same as 1. but with lib2

    3. (not working, so not even reaching main()):

    - linking as in first post in this thread: relocatable output modules lib1 and lib2 are put into a UNION for their .text, .data and .bss. This UNION has one run-address in SRAM, while lib1 and lib2 each have a different load-address in PSRAM for their .text, .data and .bss.

    - .rodata of lib1 and lib2 is linked in a separate SRAM memory-area each. no different run- and load-addresses or union involved.

    - this is the setup for the example from the beginning, with and without the C++ object that triggers the issue. With the C++ const-init it does not work; without it, it does.

    I am clueless as to what can cause the problem, since a completely unrelated part is affected.

    Best regards

    Felix

  • Thanks!  At this point, it'd probably be worth having a meeting to go through this visually so that we can debug with you.  I can set something up with you (and pull in George Mock, one of our experts) for next week. Is there a particular day / time that would work best for you?

  • Yes, I also thought of that. I'm not sure about the time difference; we are UTC+1. I would be available on Tuesday from 11am to 6pm, and the same on Wednesday.

  • I'll contact you privately.

    Thanks and regards,

    -George

  • In advance to our meeting later:

    I think I figured something out that can lead to the problem:

    The documentation for relocatable modules links to https://software-dl.ti.com/codegen/docs/tiarmclang/rel1_3_0_LTS/compiler_manual/linker_description/10_partial_incemental_linking/partial-incremental-linking-stdz0756731.html#stdz0756731. And the section on incremental linking states:

    "If the intermediate files have global symbols that have the same name as global symbols in other files and you want them to be treated as static (visible only within the intermediate file), you must link the files with the --make_static option"

    Since this was the case for me, it also meant that functions like malloc, memset and so on, i.e. functions provided by the compiler, were linked in but not visible to the outside. So I needed to make them global in a second step and weak via objcopy in a third step, and that seems to at least reduce the problem. I will have more information by our debugging session, and we can discuss it then.

    It's mainly about those:

    Maybe there is a way to "make global" not each symbol individually, but all symbols of the specified libs?

    Best regards

    Felix

  • Here is one change I am confident will help.  When partial linking, add the option -nostdlib.  By default, the compiler RTS libraries are part of a link.  This option disables that behavior.  It is documented here.  The final link needs no changes.  That is when the compiler RTS libraries are linked in.

    Thanks and regards,

    -George

  • Hey George,

    great! This worked!

    At least it now runs until I copy the code in via the copy tables. There still seems to be a bit of an issue there, but that's a different topic.

    Thank you!

    I would also suggest adding this information to the partial-linking and/or relocatable section of the LTS manual.

    Best regards

    Felix

  • Hey George,

    ok, I think I just need to ask further:

    At least it now runs until the copy tables are accessed and the code should be copied in. The cache is invalidated after copying in and before calling any of the copied-in functions. From there on it sometimes runs into a data abort, or loses the connection altogether and can't be halted anymore. I tried to figure it out by debugging, but that's a bit hard, since CCS has double definitions for the same memory area and thus the debug stepping jumps around a lot. I also tried to get through the disassembly, but without success. So I checked the map file to see whether the addresses are right and found this:

    I saw that even with the compiler option -nostdlib, libc code is linked into the relocatable modules. This is an extract of the map file of one of the relocatable modules:

    I noticed this because a .sysmem section just appeared in the final link, which wasn't there before, because we do not use the .sysmem section at all. We use our own malloc implementation (which also seems to work until the copy-in takes place), which I expect to override the compiler's malloc. It did so before and still does.

    The final-link map file shows this:

    Here, the two red-masked .out files are the separate output modules. This is a separate section of the relocatable output modules which contains the text sections of a linked-in library at their run address (textFastLibs from the first post in this thread). The .sysmem section was automatically placed right behind them.

    Also I saw stuff like this in the final map-file:

    This is one of the load-addresses of one relocatable output-module.

    the same symbols are present for the other relocatable output-module at its load-address:

    The symbols don't seem to appear at the run-addresses.

    I just need to know whether this is expected and normal, so I can better understand what could go wrong in my case and rule out that it is related to the compiler and the topic we tried to solve in our session. At least the .sysmem section should not exist for our application.

    Best regards,

    Felix

  • OK, follow-up. I guess I found the problem:
    I thought of switching to LTS2.1.2. I don't know why, but with that compiler and LTO on it gets much further: copying in works, and so do the allocations and constructions of the objects with the code loaded into the run addresses.

    But the sysmem topic is definitely present. As soon as the code of the relocatable object calls malloc, it does not jump to our implementation but to the one that is linked into the relocatable object. The map file cannot really tell that, but I assume so because the shown address of calloc is inside the run-address range, which is definitely wrong:

    map-file:

    And this matches the behavior of the code, because a nullptr is then returned and thus the library inside the relocatable output module asserts.

    To debug this better, I deactivated one of the two loaded modules (and removed it from the linker file) and only used one, so CCS does not get that confused. I also needed to set HW breakpoints.

    I could make those symbols global in the relocatable output module and weaken them via objcopy later, but I'm not sure whether this is the right solution.

    best regards

    Felix

  • I want to focus on the problem of any function from the compiler RTS library being part of any partial link.  Because you use -nostdlib, that should not occur.  I want to reproduce that behavior.  To do that, I need the exact text of the command used to invoke the partial link, and all of the object files, libraries, and command files used by that command.  If any command file refers to yet other files, I need those files as well.  Is this something you can provide?

    Thanks and regards,

    -George

  • Hey George, yes, I can provide this. I will send you a private message.

    In advance, I also checked all the commands created by the CMake system we use. Indeed, there is another static library linked into the relocatable output module, but it's also compiled with -nostdlib. I also ensured that no --ram_model or -e_vectors option is passed while linking, because CMake sometimes passes linker options all the way down, and that's of course not wanted.

    I will create a zip with all the needed files and send it to you.

    Best regards

    Felix

  • The rest of this thread will continue in private messages.  For now, I'll mark it closed.

    Thanks and regards,

    -George

  • To summarize this thread ... The customer made some minor errors when using -nostdlib.  Once those were fixed, everything worked well.

    Thanks and regards,

    -George