MCU-PLUS-SDK-AM243X: memory-alignment-problems and Optimization levels and their options by LTS 1.3.1 tiarmclang

Part Number: MCU-PLUS-SDK-AM243X

Hello,

we noticed some compilation/memory-alignment problems when using -O2 compared to -O1. With -O1 everything works fine. With -O2 we get undefined and prefetch aborts when accessing memory that was previously allocated on a TLSF heap. We also noticed that the problem can be worked around by forcing our structs to 8-byte alignment with __attribute__((aligned(8))). We use a mix of C and C++.
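To make the workaround concrete, here is a minimal C sketch of what the attribute changes. The struct names and the is_aligned() helper are hypothetical and not taken from the project. Note that the attribute only raises the alignment the compiler assumes for the type (and rounds sizeof up to a multiple of it); a heap allocator still has to hand out addresses that actually satisfy it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical payload: its alignment follows its widest member. */
struct sample_plain {
    uint32_t id;
    uint8_t  flags;
};

/* Same members, but the type is forced to 8-byte alignment. */
struct sample_aligned {
    uint32_t id;
    uint8_t  flags;
} __attribute__((aligned(8)));

/* Returns nonzero if ptr is a multiple of align (align must be a power of two). */
int is_aligned(const void *ptr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)ptr & (align - 1)) == 0;
}
```

Calling is_aligned(ptr, _Alignof(struct sample_aligned)) on every pointer returned by the heap is a cheap way to check whether the allocator meets the alignment the optimized code relies on.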

So we wanted to find the root cause, and therefore wanted to know which additional compiler flags -O2 sets compared to -O1. The compiler manual does not say which flags each level enables, only what the levels do: https://software-dl.ti.com/codegen/docs/tiarmclang/rel1_3_0_LTS/compiler_manual/using_compiler/compiler_options/optimization_options.html (the 1.3.1 compiler documentation links to 1.3.0 for this topic). Based on a Stack Overflow post, I tried to find out what tiarmclang uses for -O2 like this:

echo 'int;' | ./tiarmclang -xc -O2 - -o /dev/null -\#\#\#
echo 'int;' | ./tiarmclang -xc -O1 - -o /dev/null -\#\#\#

outputs are:

$ echo 'int;' | ./tiarmclang -xc -O2 - -o /dev/null -\#\#\#
TI Arm Clang Compiler 1.3.1.LTS
Target: arm-ti-none-eabi
Thread model: posix
InstalledDir: C:\ti\ti-cgt-armllvm_1.3.1.LTS\bin
 "C:\\ti\\ti-cgt-armllvm_1.3.1.LTS\\bin\\tiarmclang.exe" "-cc1" "-triple" "thumbv7em-ti-none-eabihf" "-emit-obj" "--mrelax-relocations" "-disable-free" "-disable-llvm-verifier" "-discard-value-names" "-main-file-name" "-" "-mrelocation-model" "static" "-mframe-pointer=none" "-fmath-errno" "-fno-rounding-math" "-mconstructor-aliases" "-nostdsysteminc" "-fno-zero-initialized-in-bss" "-fdef-uninit-in-bss" "-fcommon" "-ffunction-sections" "-fdata-sections" "-fno-delete-null-pointer-checks" "-fwchar-type=int" "-fshort-enums" "-target-cpu" "cortex-m4" "-target-abi" "aapcs" "-fvisibility" "hidden" "-mfloat-abi" "hard" "-fallow-half-arguments-and-returns" "-fno-split-dwarf-inlining" "-debugger-tuning=gdb" "-resource-dir" "C:\\ti\\ti-cgt-armllvm_1.3.1.LTS\\lib\\clang\\12.0.1" "-internal-isystem" "C:\\ti\\ti-cgt-armllvm_1.3.1.LTS\\lib\\clang\\12.0.1\\include" "-internal-isystem" "C:\\ti\\ti-cgt-armllvm_1.3.1.LTS\\include\\c" "-O2" "-fdebug-compilation-dir" "C:\\ti\\ti-cgt-armllvm_1.3.1.LTS\\bin" "-ferror-limit" "19" "-fno-signed-char" "-fgnuc-version=4.2.1" "-vectorize-loops" "-vectorize-slp" "-faddrsig" "-o" "C:\\msys64\\tmp\\--242b98.o" "-x" "c" "-"
 "C:\\ti\\ti-cgt-armllvm_1.3.1.LTS\\bin\\tiarmlnk" "-IC:\\ti\\ti-cgt-armllvm_1.3.1.LTS\\lib" "-o" "nul" "C:\\msys64\\tmp\\--242b98.o" "--start-group" "-llibc++.a" "-llibc++abi.a" "-llibc.a" "-llibsys.a" "-llibsysbm.a" "-llibclang_rt.builtins.a" "-llibclang_rt.profile.a" "--end-group" "--cg_opt_level=2"

and:

$ echo 'int;' | ./tiarmclang -xc -O1 - -o /dev/null -\#\#\#
TI Arm Clang Compiler 1.3.1.LTS
Target: arm-ti-none-eabi
Thread model: posix
InstalledDir: C:\ti\ti-cgt-armllvm_1.3.1.LTS\bin
 "C:\\ti\\ti-cgt-armllvm_1.3.1.LTS\\bin\\tiarmclang.exe" "-cc1" "-triple" "thumbv7em-ti-none-eabihf" "-emit-obj" "--mrelax-relocations" "-disable-free" "-disable-llvm-verifier" "-discard-value-names" "-main-file-name" "-" "-mrelocation-model" "static" "-mframe-pointer=none" "-fmath-errno" "-fno-rounding-math" "-mconstructor-aliases" "-nostdsysteminc" "-fno-zero-initialized-in-bss" "-fdef-uninit-in-bss" "-fcommon" "-ffunction-sections" "-fdata-sections" "-fno-delete-null-pointer-checks" "-fwchar-type=int" "-fshort-enums" "-target-cpu" "cortex-m4" "-target-abi" "aapcs" "-fvisibility" "hidden" "-mfloat-abi" "hard" "-fallow-half-arguments-and-returns" "-fno-split-dwarf-inlining" "-debugger-tuning=gdb" "-resource-dir" "C:\\ti\\ti-cgt-armllvm_1.3.1.LTS\\lib\\clang\\12.0.1" "-internal-isystem" "C:\\ti\\ti-cgt-armllvm_1.3.1.LTS\\lib\\clang\\12.0.1\\include" "-internal-isystem" "C:\\ti\\ti-cgt-armllvm_1.3.1.LTS\\include\\c" "-O1" "-fdebug-compilation-dir" "C:\\ti\\ti-cgt-armllvm_1.3.1.LTS\\bin" "-ferror-limit" "19" "-fno-signed-char" "-fgnuc-version=4.2.1" "-faddrsig" "-o" "C:\\msys64\\tmp\\--ed7ffc.o" "-x" "c" "-"
 "C:\\ti\\ti-cgt-armllvm_1.3.1.LTS\\bin\\tiarmlnk" "-IC:\\ti\\ti-cgt-armllvm_1.3.1.LTS\\lib" "-o" "nul" "C:\\msys64\\tmp\\--ed7ffc.o" "--start-group" "-llibc++.a" "-llibc++abi.a" "-llibc.a" "-llibsys.a" "-llibsysbm.a" "-llibclang_rt.builtins.a" "-llibclang_rt.profile.a" "--end-group" "--cg_opt_level=1"

Only two differences: "-vectorize-loops" "-vectorize-slp"

The problem: you can't set these flags yourself. They do not exist. And two flags are far too few to account for the optimizations that the manual describes for -O2.

The problem itself may be rooted in TLSF, since it does not seem to have been updated since 2008, although it is said to compile and work fine with -O2 under gcc.

Since I have often had weird undefined and prefetch aborts that vanished when I moved some libs within the linker script, I am not sure where exactly the problem is located, but it seems that at some optimization level the alignment of .data and heap-allocated objects is not handled correctly.
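If the heap itself were the source of misaligned objects, one way to rule it out would be a thin wrapper that over-allocates and rounds the address up. The following is only a sketch under assumptions: alloc_aligned() and free_aligned() are hypothetical names, and malloc()/free() stand in for the TLSF calls, whose exact API is not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate size bytes aligned to align (a power of two, at least
 * sizeof(void *)) on top of an allocator with weaker guarantees.
 * malloc() is a stand-in for the real heap call. Overflow of the
 * size computation is not checked in this sketch. */
void *alloc_aligned(size_t size, size_t align)
{
    assert(align >= sizeof(void *) && (align & (align - 1)) == 0);
    /* Room for worst-case padding plus one pointer to remember the raw block. */
    void *raw = malloc(size + align - 1 + sizeof(void *));
    if (raw == NULL) {
        return NULL;
    }
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);
    ((void **)aligned)[-1] = raw;  /* stash the raw pointer just below the user block */
    return (void *)aligned;
}

/* Release a block obtained from alloc_aligned(). */
void free_aligned(void *ptr)
{
    if (ptr != NULL) {
        free(((void **)ptr)[-1]);
    }
}
```

If the aborts persist even when every heap object comes from such a wrapper, heap alignment is unlikely to be the cause.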

Best regards

Felix

  • Update from our side.

    We found at least one of the problems:

    We used GPMC with a pSRAM, which had one issue: we did not set the compiler flag -mno-unaligned-access, which according to the documentation should be used when using peripheral memory.

    That solves one of the problems. But we also have another one that causes data aborts, and according to CCS it is located in __udivmoddi4. I'm not familiar with how clang links libclang_rt.builtins.a, where this function should be located. I cannot place it at a different location via the linker.

    This problem only occurs when we add a component that does 64-bit computation. So I'm not sure whether this might be an alignment problem too.
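    For context, this is the kind of source code that pulls a 64-bit division helper out of the builtins library. A 32-bit Arm core has no instruction for 64-bit division, so the compiler emits a call into the runtime library instead. As far as I know, on Arm EABI targets the call goes to __aeabi_uldivmod, which compiler-rt implements on top of __udivmoddi4; treat that detail as an assumption. The struct and function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: quotient and remainder of a 64-bit division. */
struct split64 {
    uint64_t quot;
    uint64_t rem;
};

/* Splits a 64-bit tick count into whole seconds and leftover ticks.
 * On a 32-bit target, both / and % on uint64_t become library calls. */
struct split64 split_ticks(uint64_t ticks, uint64_t ticks_per_second)
{
    assert(ticks_per_second != 0);
    struct split64 r;
    r.quot = ticks / ticks_per_second;
    r.rem  = ticks % ticks_per_second;
    return r;
}
```

    The helper runs on the stack of whichever task calls it, so an abort reported inside it does not necessarily mean the helper itself is at fault.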

  • Hi Felix -- glad you used "-mno-unaligned-access" -- unaligned loads and stores can still work with peripheral memory as long as the memory pointers are aligned, but as you are accessing struct members, this is the type of situation in which "-mno-unaligned-access" is useful.  In general, runtime library code is built assuming that using unaligned loads and stores is OK on ISAs where they exist, so runtime libraries shouldn't be placed in peripheral memory as far as possible.
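    As an illustration of the struct-member case, here is a minimal C sketch with a deliberately misaligned member. The record type and helper names are hypothetical. With -mno-unaligned-access the compiler is expected to read such a member with narrower accesses rather than a single unaligned word load; the memcpy() form expresses the same intent portably.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire-format record: packing puts value at offset 1,
 * so it is not 4-byte aligned. */
struct __attribute__((packed)) record {
    uint8_t  tag;
    uint32_t value;
};

/* Byte offset of the misaligned member inside the record. */
size_t value_offset(void)
{
    return offsetof(struct record, value);
}

/* Reads a 32-bit value from an address with no alignment guarantee. */
uint32_t load_u32(const void *src)
{
    uint32_t v;
    assert(src != NULL);
    memcpy(&v, src, sizeof v);
    return v;
}
```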

    As for __udivmoddi4, you mentioned that you can't place it via the linker at a different place; the linker does have the capability to place a library function at a different location in memory, using a format like this:

    SECTIONS
    {
        ...
        .my_text: { libclang_rt.builtins.a<udivmoddi4.S.obj>(.text) } > MY_MEM
        ...
    }

    This will place the contents of the udivmoddi4 function (in its constituent file) from that specific library in another region of memory.  You could also use a wildcard to place all builtin routines there (not just udivmoddi4). Is this something you tried?

  • Hi Alan,

    I guess the point you mention, that "runtime libraries shouldn't be placed in peripheral memory as far as possible", could be a problem on our side as well. Currently we place only some special libraries directly in internal SRAM and all other .text in our GPMC/pSRAM. That may well be a problem. I will start doing it the other way round and see if that solves some of those issues.

    I also tried to place the lib which contains the function like this in the linker-script:

    .textStack:   
    {  
      -l libclang_rt.builtins.a(.text),
    }

    but as soon as I try to place it there I get the following:
    "C:/XXX/inker_r5f_mcu1_0_freertos_debug.cmd", line 87: warning:
       cannot resolve archive
       C:\ti\ti-cgt-armllvm_1.3.1.LTS\lib\libclang_rt.builtins.a to a compatible
       library, as no input files have been encountered

    and a lot of undefined symbols (from this lib) are reported.

    I will now change the linker-script in general and see if this helps somehow.

    Thank you!

    Best regards

    Felix

  • Hi Felix -- you have to use the syntax like I showed within the SECTIONS directive of the linker command file (i.e. don't include "-l").

    As to your question, it occurred to me that ordinarily it's probably OK to place runtime library code sections in peripheral memory because the memory accesses within that code are typically to other memory locations or to the stack via the stack pointer.  It's probably worth trying to place the function elsewhere to see the effect, but if you are seeing a data abort in the routine, it would be good to know at what instruction this happens, and whether the instruction is a memory access, and what memory it is attempting to access.

  • Hey Alan,

    we finally found the problem.

    In fact it seems that -O1 produces less code and data than -O2: when we increased the stack size of the corresponding FreeRTOS task, the abort went away even with -O2.

    So it seems the root of the problem was not related to the compiler and linker at all :D

    Thanks for your help!
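    For anyone who ends up here with similar symptoms: a quick way to test for a task stack overflow is to measure how much of the stack was never touched. FreeRTOS offers uxTaskGetStackHighWaterMark() for this. The sketch below shows the underlying idea in plain C so that it runs anywhere; the function names, the fill value and the assumption of a descending stack are choices made for this example, not code from the project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5A5A5A5u  /* fill pattern assumed for this sketch */

/* Paint the whole stack buffer with the pattern before the task starts. */
void stack_paint(uint32_t *stack, size_t words)
{
    assert(stack != NULL);
    for (size_t i = 0; i < words; ++i) {
        stack[i] = STACK_FILL;
    }
}

/* Count the words that still hold the pattern, starting at the low end.
 * Assumes a descending stack, so index 0 is the last word to be used. */
size_t stack_unused_words(const uint32_t *stack, size_t words)
{
    assert(stack != NULL);
    size_t unused = 0;
    while (unused < words && stack[unused] == STACK_FILL) {
        ++unused;
    }
    return unused;
}
```

    A result at or near zero after the failing code path has run suggests the task needs a larger stack, whatever the optimization level.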