MCU-PLUS-SDK-AM243X: Debugging firmware with different load- and run-addresses

Felix Heil

Hello,

I have a firmware that uses different load and run addresses. We also use the STS 3.1.0 compiler and MCU PLUS SDK 9. CCS is version 12.4.

This one worked some months ago (with older SDK and LTS 2.1.3), but now that we are reactivating it, only one of the two code parts works and the other does not.

To keep it simple for now: we defined a UNION in the linker script as described in the compiler manual. We use the generated copy tables, call copy_in for the respective run addresses, and call CacheP_wbInv on exactly those addresses afterwards.

Debugging is not possible. It always loads the file which is related to the not active code-part and not the one that is really loaded. Thus the symbols shown also don't match.

How can I manually tell CCS to use the other file and symbols?

Those are the generated tables:

I also noticed that the memory view fails as soon as I try to look at an address beyond 0x701a5360. It reports: "MAIN_Cortex_R5_0_0: Trouble Reading Memory Block at 0x701a5360 on Page 0 of Length 0x1bd"

an example of the memory-view:

here it's still working:

as soon as I scroll further:

If I scroll back it works again, and if I scroll forward it fails again, and so on.

Does this have something to do with the cache writeback invalidation?

Additionally I noticed that the loaded code even at the load-address is not the real code that should be loaded. So of course the copied in code is also the wrong code. The generated tables point to valid addresses and at least for one part it works also correctly and the code is correct in the load-address and run-address later on. The other part somehow has just garbage at its load-address.

The linker script places the sections at their load addresses, as seen in the map file:

Note that the two sections have the same run address.

So I thought about dumping the sections directly out of the elf-file with the command:

tiarmobjcopy --dump-section .XCode=path/Xcode.bin (first one)

tiarmobjcopy --dump-section .YCode=path/Ycode.bin (second one)

I checked them with a hex viewer. The dumped "XCode" section (which is correct) differs from what CCS loaded at the load address (which is garbage).
The section as defined in the linker script should look like this:

so for the first "XCode"-section it looks like this at the load-address:

It's not what is in the dumped section-bin-file.

As we may see the other "YCode"-section:

and what's at the load-address:

If we swap the endianness, it is exactly the expected code. So this one works!

The code for them in the linker-script looks like this:
     UNION
     {
         .XCode:
         {
             -l "libX.a"(.text),
             -l "relocatableobjectX.out"(.text),
         } load > MCU1_0_PSRAM_CODE, palign(8), table(X_CODE)

         .YCode:
         {
             -l "libY.a"(.text),
             -l "relocatableobjectY.out"(.text),
         } load > MCU1_0_PSRAM_CODE, palign(8), table(Y_CODE)
     } run = MCU1_0_R5F_MEM_TEXT

In the source (with dummy names, since the original code is under NDA):

extern COPY_TABLE X_CODE;
extern COPY_TABLE Y_CODE;
    
    
if (/* a check at runtime */)
{
    copy_in(&X_CODE);
    CacheP_wbInv(reinterpret_cast<void*>(X_CODE.recs[0].run_addr), X_CODE.recs[0].size,
                 CacheP_TYPE_ALLP);
    // run libX-code here
}
else 
{
    copy_in(&Y_CODE);
    CacheP_wbInv(reinterpret_cast<void*>(Y_CODE.recs[0].run_addr), Y_CODE.recs[0].size,
                 CacheP_TYPE_ALLP);

    // execute libY-code here
}
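
The snippet above writes back only recs[0]. If a copy table ever holds more than one record, the remaining run regions would be skipped. The following is a minimal host-side sketch of walking every record. It assumes the table exposes a record count and record size (named num_recs and rec_size here) and a load_addr field next to the run_addr and size fields used above. The Mock types and forEachRunRegion are hypothetical names for illustration, not SDK or compiler API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Host-side stand-ins for the copy-table types (assumed layout, see above).
struct MockCopyRecord {
    std::uint32_t load_addr;  // assumed field name
    std::uint32_t run_addr;   // used in the snippet above
    std::uint32_t size;       // used in the snippet above
};

struct MockCopyTable {
    std::uint16_t rec_size;      // assumed: size of one record in bytes
    std::uint16_t num_recs;      // assumed: number of records in the table
    const MockCopyRecord* recs;  // a pointer keeps the mock simple
};

// Calls op(run_addr, size) once per non-empty record and returns how many
// records were visited. The table itself is never modified.
template <typename CacheOp>
std::size_t forEachRunRegion(const MockCopyTable& table, CacheOp op) {
    // Guard against a layout mismatch between the mock and the real table.
    assert(table.rec_size == sizeof(MockCopyRecord));
    std::size_t visited = 0;
    for (std::uint16_t i = 0; i < table.num_recs; ++i) {
        const MockCopyRecord& rec = table.recs[i];
        if (rec.size == 0) {
            continue;  // nothing to write back for an empty record
        }
        op(rec.run_addr, rec.size);
        ++visited;
    }
    return visited;
}
```

On the target, op would wrap the CacheP_wbInv call from the snippet above, so that every record's run region is written back and invalidated after copy_in.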

Since the section is correct in the ELF file itself, as we verified by dumping it, what happens when I load the ELF file? What can go wrong so that the wrong code ends up even at the load address?

My Debug-configuration for Program/memory load is the following:

I also ran a program verification, and it reports that everything is OK, which I doubt heavily.

Best regards

Felix

over 1 year ago

0 Erik Friedel over 1 year ago

TI__Expert 5765 points

Hello Felix,

Are you using the same AM243x device in terms of GP versus HS-FS?

To determine whether the device is secure, check the device revision field ("r") in the device name. If the device revision is "B" or a later letter, the device is a secure device.

The proper expert for this topic is out of office until Wednesday, so please wait until then for additional support on your question.

Regards,

Erik

0 Felix Heil over 1 year ago in reply to Erik Friedel

Expert 1130 points

Hey Erik,

In this case it's a GP device.

Best regards

Felix

0 Robert Czech over 1 year ago

Prodigy 150 points

Hi Erik,
additional debugging (enabling the log of the CCS Debug Server) showed:

The sections placed in the UNION are correctly read from the ELF file:

0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS D: Program Header contents @ index 4
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS I: p_type: 0x00000001
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS I: p_offset: 0x0009e000
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS I: p_vaddr: 0x7019e740
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS I: p_paddr: 0x50098ce0
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS I: p_filesz: 0x00039540
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS I: p_memsz: 0x00039540
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS I: p_flags: 0x00000005
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS I: p_align: 0x00000010
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS I: PT_LOAD: 
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS D: OFS LOAD Section added:  name: .text
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS D:   size in bytes: 234816, or 0x39540
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS D:   load location: 0x50098ce0
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS D:   run location: 0x7019e740
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS D:   memory page: 0
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS D:   offset in file: 647168 or 0x9e000
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS D: Program Header contents @ index 5
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS I: p_type: 0x00000001
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS I: p_offset: 0x000d7540
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS I: p_vaddr: 0x7019e740
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS I: p_paddr: 0x500d2220
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS I: p_filesz: 0x0001b400
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS I: p_memsz: 0x0001b400
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS I: p_flags: 0x00000005
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS I: p_align: 0x00000010
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS I: PT_LOAD: 
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS D: OFS LOAD Section added:  name: .text
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS D:   size in bytes: 111616, or 0x1b400
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS D:   load location: 0x500d2220
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS D:   run location: 0x7019e740
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS D:   memory page: 0
0x00004480 39142 4 MAIN_Cortex_R5_0_0 OFS D:   offset in file: 881984 or 0xd7540

But an error is shown (three times, because our linker file contains three UNIONs):

0x00004480 39367 3  LLDB I: WARNING: Found overlapping ELF segments. Will attempt to divide segments into non-overlapping parts.
0x00004480 39367 3  LLDB I: Error: Found overlapping ELF segments with matching vaddr. Cannot procede. Discarding all segment changes.
0x00004480 39367 3  LLDB I: Ignoring overlapping PT_LOAD segment. Corrupt object file?
0x00004480 39367 3  LLDB I: Ignoring overlapping PT_LOAD segment. Corrupt object file?
0x00004480 39367 3  LLDB I: Ignoring overlapping PT_LOAD segment. Corrupt object file?
0x00004480 39367 3  LLDB I: Ignoring overlapping section. Corrupt object file?
0x00004480 39367 3  LLDB I: Ignoring overlapping section. Corrupt object file?
0x00004480 39367 3  LLDB I: Ignoring overlapping section. Corrupt object file?

It seems the debug server is checking the run section (virtual address, p_vaddr) instead of the load section (physical address, p_paddr).

Is there a way to change this behaviour by a setting? Is the DebugServer part of CCS installer or can the debug server be downgraded (because it worked some time ago)?
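
The p_vaddr versus p_paddr observation can be checked by hand. The sketch below is a minimal C++ check, not the debug server's actual logic: it takes the two PT_LOAD headers from the log above and tests for overlap once by run address (p_vaddr) and once by load address (p_paddr). The struct and function names are made up for illustration.

```cpp
#include <cstdint>

// One PT_LOAD program header, reduced to the fields that matter here.
struct LoadSegment {
    std::uint32_t vaddr;  // run address (p_vaddr)
    std::uint32_t paddr;  // load address (p_paddr)
    std::uint32_t memsz;  // size in memory (p_memsz)
};

// Overlap test for the half-open ranges [a, a + lenA) and [b, b + lenB).
// 64-bit arithmetic avoids wrap-around near the top of the address space.
constexpr bool rangesOverlap(std::uint64_t a, std::uint64_t lenA,
                             std::uint64_t b, std::uint64_t lenB) {
    return a < b + lenB && b < a + lenA;
}

constexpr bool overlapByRunAddress(const LoadSegment& x, const LoadSegment& y) {
    return rangesOverlap(x.vaddr, x.memsz, y.vaddr, y.memsz);
}

constexpr bool overlapByLoadAddress(const LoadSegment& x, const LoadSegment& y) {
    return rangesOverlap(x.paddr, x.memsz, y.paddr, y.memsz);
}

// Values copied from program headers 4 and 5 in the log above.
constexpr LoadSegment kSegment4{0x7019e740u, 0x50098ce0u, 0x00039540u};
constexpr LoadSegment kSegment5{0x7019e740u, 0x500d2220u, 0x0001b400u};

static_assert(overlapByRunAddress(kSegment4, kSegment5),
              "both segments share the run address 0x7019e740");
static_assert(!overlapByLoadAddress(kSegment4, kSegment5),
              "the load ranges are adjacent, not overlapping");
```

With the logged values the two segments collide at run address 0x7019e740, while the load ranges sit back to back: 0x50098ce0 + 0x39540 = 0x500d2220. A check keyed on p_vaddr therefore reports a conflict that a check keyed on p_paddr would not.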

Regards,

Robert

0 Ki over 1 year ago

TI__Guru**** 450031 points

Hello Felix,

Felix Heil said:
Debugging is not possible. It always loads the file which is related to the not active code-part and not the one that is really loaded. Thus the symbols shown also don't match.

The issue is that there are multiple symbols with the same run address and hence the debugger doesn't know which one to use. In this case it is using the "wrong" symbols.

Felix Heil said:
How can I manually tell CCS to use the other file and symbols?

There are both GEL and DSS APIs to hide/show symbols for various overlay sections. This works well for TI non-clang compiler output. You hide the symbols for the non-relevant sections and make sure only the relevant symbols are shown to the debugger. This allows the debugger to pull up the correct source file and other related debug information.

The issue is with clang-compiled output using the default clang symbol manager in CCS, which does not support these APIs. Hence it is currently not possible to effectively debug overlay sections for clang output. There is a semi-workaround where you force the debugger to use the legacy symbol manager (which still supports those APIs) with clang output, but the results are mixed at best. There are still some issues with overlay debug regardless, and it introduces other issues, since the new (default) clang symbol manager is optimized for clang output while the legacy symbol manager is not.

I filed a ticket to add support for debugging overlays with the new clang symbol manager. Tracking ID: https://sir.ext.ti.com/jira/browse/EXT_EP-11506

0 Ki over 1 year ago

TI__Guru**** 450031 points

Felix Heil said:
Additionally I noticed that the loaded code even at the load-address is not the real code that should be loaded. So of course the copied in code is also the wrong code. The generated tables point to valid addresses and at least for one part it works also correctly and the code is correct in the load-address and run-address later on. The other part somehow has just garbage at its load-address.

This is unusual. I don't see this behavior in my environment. When I check the load addresses for my overlay sections in the Memory Browser view, they look correct.

Felix Heil said:
We also use the STS 3.1.0 compiler and MCU PLUS SDK 9

Felix Heil said:
This one worked some months ago (with older SDK and LTS 2.1.3)

Do you still see the same behavior when going back to the older compiler version?

0 Felix Heil over 1 year ago in reply to Ki

Expert 1130 points

Hey Ki,

thanks for the reply! Ok it is understandable that CCS does not know which symbols to show. And thanks for that internal ticket!

For the other problem: we tried changing back, and somehow the problem is the same. Please also check my colleague's additional post, because he investigated a bit further and it seems that the debug server produces this problem.

Best regards

Felix

0 Ki over 1 year ago in reply to Felix Heil

TI__Guru**** 450031 points

Felix Heil said:
Please also check my colleague's additional post, because he investigated a bit further and it seems that the debug server produces this problem.

Ah, I see Robert's thread now. I don't know how I failed to notice it before. I am investigating...

0 Ki over 1 year ago in reply to Felix Heil

TI__Guru**** 450031 points

Felix,

Ki said:
Tracking ID: https://sir.ext.ti.com/jira/browse/EXT_EP-11506

We are currently investigating this ticket. We would like to better understand how and why you are using overlays. Are you running out of memory space in a desired region to run your program from? We want to make sure we clearly understand your use case so that we implement this request properly.

Thank you

0 Felix Heil over 1 year ago in reply to Ki

Expert 1130 points

Hey Ki,

this is partly the case. On the AM243x we have 2 MB of internal RAM. We also use 8 MB of external pSRAM, but this is slower than the internal RAM. We support different industrial Ethernet protocols with real-time constraints, but only one should run at a time, so we load the stack for the configured protocol into internal RAM. We cannot load all the stacks at once. So the idea was to load both stacks into the pSRAM and, depending on the protocol, copy the code into internal RAM.

Our current implementation separates the stacks as bin files and relocatable output modules into parts of our firmware image, which are stored in flash and then loaded from there to their respective run addresses. This is because the MCU PLUS SDK rprc parser does not support different load and run addresses, whereas tiarmclang does.

So we have two scenarios:
1. Production firmware (running the SBL from flash): loads the code from flash to the run address (via the run address generated in the copy table) -> this solution works in production!

2. Debugging (loading via CCS): uses load addresses in pSRAM, and the copy_in to internal RAM is activated via a define -> this solution does not work, but we need it since we cannot flash our PCBs all the time.

Best regards

Felix

0 Ki over 1 year ago in reply to Felix Heil

TI__Guru**** 450031 points

Thank you. This is very helpful.

Just to be clear - for #1 I assume when you say "this solution works", you are referring to the target application running as expected and not the CCS debugging of overlays (which I assume is always broken).

0 Felix Heil over 1 year ago in reply to Ki

Expert 1130 points

Hey Ki, sorry, I didn't read the ticket directly. If it's about the debug-symbol overlays, this of course does not work from flash either. I just meant the other problem when loading code into RAM.

0 Ki over 1 year ago in reply to Felix Heil

TI__Guru**** 450031 points

Yes, understood. Thank you for the clarification.

0 Ki over 1 year ago in reply to Ki

TI__Guru**** 450031 points

Felix - we have been investigating further and found the other issue with debugging overlays in CCS for clang-compiled output. The default clang symbol manager flags sections that share the same run-time address and considers this scenario invalid, so it discards all but one of those sections when loading the program. As a result, only one section has its code properly loaded, while the other has empty data at its load address. When data is then copied from the load address to the run address, it can copy empty data, and the application will not run correctly.

Basically, if you wish to load a clang-compiled program with overlays with CCS, you cannot use the default clang symbol manager. You can force CCS to use the legacy symbol manager. The steps are documented in my post below:

https://e2e.ti.com/support/tools/code-composer-studio-group/ccs/f/code-composer-studio-forum/1067594/failed-to-link-source-code-for-debug-session/3952658#3952658

Note that the legacy symbol manager is not optimized for clang output hence the debug experience in some areas will be subpar (call stack may not unwind correctly, etc). But the program will load correctly, including all overlay sections.
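
Given the failure mode described above, one possible safeguard is to verify the load image before copying it. The following is a minimal sketch, assuming an expected CRC-32 has been computed offline from the dumped section file (for example the Xcode.bin produced with tiarmobjcopy earlier in this thread). The function names crc32 and loadImageMatches are hypothetical and not part of the MCU+ SDK or CCS.

```cpp
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32 with the reflected polynomial 0xEDB88320 (the variant used
// by zlib and PNG). Slow but table-free, which is fine for a debug check.
constexpr std::uint32_t crc32(const std::uint8_t* data, std::size_t size) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < size; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 1u) ? ((crc >> 1) ^ 0xEDB88320u) : (crc >> 1);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// Hypothetical guard: true only if the region is non-empty and its CRC-32
// matches the value computed offline from the dumped section file.
constexpr bool loadImageMatches(const std::uint8_t* loadAddr, std::size_t size,
                                std::uint32_t expectedCrc) {
    return size != 0 && crc32(loadAddr, size) == expectedCrc;
}
```

On the target, the pointer and size would come from the copy-table record (the name of the load-address field depends on the compiler's cpy_tbl.h), and the copy to the run address would only be performed when the check passes.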

Thanks

0 Felix Heil over 1 year ago in reply to Ki

Expert 1130 points

Hey Ki,

thanks for the explanation!

So we may use the legacy symbol manager for such cases but is there probably a plan to introduce a fix for this or is it not possible at all?

Best regards

Felix

0 Ki over 1 year ago in reply to Felix Heil

TI__Guru**** 450031 points

Hi Felix,

I pressed the "TI This Resolved" button for your last post by mistake. Please "reject" the "answer" to reset it.

Felix Heil said:
but is there probably a plan to introduce a fix for this or is it not possible at all?

Yes we are looking at fixing this issue. The root issue is that our symbol manager for clang is based off LLDB. Interestingly (and disappointingly), LLDB does not seem to support the concept of code overlays. It considers environments with multiple sections sharing the same run-time address as an invalid scenario. Hence any fixes we make would be custom fixes and we are trying to determine how to best navigate this effort.

Thanks

0 Ming Wei over 1 year ago in reply to Ki

TI__Mastermind 49085 points

Hi Ki,

Any progress on fixing this issue?

Best regards,

Ming

0 Ki over 1 year ago in reply to Ming Wei

TI__Guru**** 450031 points

Ming Wei said:
Any progress on fixing this issue?

We currently do not have a concrete timeline.
