Other Parts Discussed in Thread: HALCOGEN
I have been using Clang/LLVM, specifically ARM's ATfE distribution, to compile HALCoGen-generated code. It mostly works great, but there are a few issues that have to be worked around. I'm documenting them here in case anybody else finds it useful, and in case the engineers at TI want to fix these issues in a future version. I believe all of the proposed changes are compatible with GCC, armcl, and armclang, and in most cases they remove a reliance on undefined behavior or deprecated instructions.
Pre-UAL assembly instructions in sys_core.s
I get these two assembler errors multiple times in sys_core.s:
fmxr error: invalid instruction
fmdrr error: invalid instruction
Root cause:
- LLVM's assembler only recognizes UAL syntax https://reviews.llvm.org/D39196
- fmxr and fmdrr are deprecated, pre-UAL mnemonics https://developer.arm.com/documentation/ddi0403/d/Appendices/Legacy-Instruction-Mnemonics/Pre-UAL-floating-point-instruction-mnemonics
Workaround:
Assemble with -x assembler-with-cpp -Dfmxr=vmsr -Dfmdrr=vmov
Proposed upstream fix:
Find/replace in sys_core.s template(s): fmxr to vmsr, fmdrr to vmov
(Note: this issue also affects FreeRTOS portasm.s)
sys_core.s uses FP instructions but doesn't specify an FPU type
Once the above Pre-UAL instructions are fixed, I get the next set of errors on the same lines as in the previous issue:
vmsr error: instruction requires: VFP2
vmov error: instruction requires: fp registers
Root cause:
When the LLVM assembler encounters a .cpu directive, it ignores any -mcpu= or -mfpu= flags specified on the command line.
Workaround:
Assemble with -x assembler-with-cpp -Dcpu=extern
This works by turning the .cpu directive into a .extern directive (.cpu cortex-r4 becomes .extern cortex-r4). That has no effect on the generated code because no symbol named cortex-r4 exists in the project. The assembler then falls back to the -mcpu= and -mfpu= flags from the command line.
Proposed upstream fix:
Add a .fpu directive such as the following to the sys_core.s template(s):
.fpu vfpv3-d16
(Note: this issue also affects FreeRTOS portasm.s)
Non-ASM statement in naked function _c_int00
Compiling sys_startup.c results in the following error:
error: non-ASM statement in naked function is not supported
102 | _coreInitRegisters_();
Root cause:
_c_int00 has the naked attribute but contains statements other than basic asm statements. GCC documents this usage as unsupported. Its manual (https://gcc.gnu.org/onlinedocs/gcc/ARM-Function-Attributes.html#index-naked-function-attribute_002c-ARM) says:
Only basic asm statements can safely be included in naked functions. While using extended asm or a mixture of basic asm and C code may appear to work, they cannot be depended upon to work reliably and are not supported.
However, GCC doesn't attempt to prevent this unsupported usage. Clang enforces the requirement that naked functions only contain basic asm statements.
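Fortunately, _c_int00 can be made into an ordinary (non-naked) noreturn C function, provided that _coreInitRegisters_ and _coreInitStackPointer_ are called before it is entered. Those two routines must run first in any case. The core registers must be initialized to avoid a CCM error, and compiled C code needs valid stack pointers for its prologue.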
Workaround option 1:
Modify sys_startup.c USER CODE blocks 5 and 7 as follows:
__attribute__ ((naked))
/* SourceId : STARTUP_SourceId_001 */
/* DesignId : STARTUP_DesignId_001 */
/* Requirements : HL_SR508 */
void _c_int00(void)
{
/* USER CODE BEGIN (5) */
__asm__(
"bl _coreInitRegisters_\n\t"
"bl _coreInitStackPointer_\n\t"
"b _c_int00_c"
);
}
__attribute__ ((noreturn))
void _c_int00_c(void)
{
#if 0
/* USER CODE END */
/* Initialize Core Registers to avoid CCM Error */
_coreInitRegisters_();
/* USER CODE BEGIN (6) */
/* USER CODE END */
/* Initialize Stack Pointers */
_coreInitStackPointer_();
/* USER CODE BEGIN (7) */
#endif // 0
/* USER CODE END */
Workaround option 2:
To work around this issue without modifying any HALCoGen-generated files:
- Compile with -Dnaked=noreturn -Xlinker --wrap=_c_int00 -Xlinker --wrap=_coreInitRegisters_ -Xlinker --wrap=_coreInitStackPointer_
- Add a C file to your project with the following code:
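/* With --wrap=_c_int00, the reset vector's reference to _c_int00 lands here.
 * The __naked__ spelling keeps -Dnaked=noreturn from rewriting this attribute
 * if this file is compiled with the same flags as sys_startup.c. */
__attribute__((__naked__)) void __wrap__c_int00(void)
{
    /* Basic asm only: init registers and stacks, then jump to the original
     * (now noreturn) _c_int00. */
    __asm__(
        "bl __real__coreInitRegisters_\n\t"
        "bl __real__coreInitStackPointer_\n\t"
        "b __real__c_int00"
    );
}

/* The original _c_int00 still calls these two routines. They already ran above,
 * so the wrapped versions do nothing. */
void __wrap__coreInitRegisters_(void) {}
void __wrap__coreInitStackPointer_(void) {}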
Proposed upstream fix:
- Rename _c_int00 to _c_int00_c
- Remove the calls to _coreInitRegisters_() and _coreInitStackPointer_() from _c_int00_c
- Add a new _c_int00 routine to either sys_startup.c (as a naked C function) or sys_core.s with the following body:
bl _coreInitRegisters_
bl _coreInitStackPointer_
b _c_int00_c
Calls to picolibc memcpy() in het.c cause data aborts
The program gets stuck in _dabort from a memcpy call in hetInit() when compiled against picolibc's memcpy.
Root cause:
I can't find any explicit documentation saying that hetRAM1 and hetRAM2 require word-aligned (32-bit) stores, but that does appear to be the case. hetInit() relies on memcpy() to perform word-aligned stores into hetRAM1 and hetRAM2. However, the C standard's specification of memcpy() makes no such guarantee. Recent versions of picolibc and newlib include memcpy() implementations that break this assumption, which leads to data aborts.
Workaround:
- For het.c and only for het.c, pass -Dmemcpy=device_memcpy to the compiler.
- Include this implementation of device_memcpy() in your project:
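#include <stddef.h>
#include <stdint.h>

/* Copies n bytes using 32-bit stores only. Assumes dest and src are
 * word-aligned and n is a multiple of 4 (any remainder is dropped).
 * volatile keeps the optimizer from turning this loop back into a memcpy() call. */
void *device_memcpy(void *dest, const void *src, size_t n)
{
    volatile uint32_t *d = dest;
    const uint32_t *s = src;
    n /= 4;
    while (n--) {
        *d++ = *s++;
    }
    return dest;
}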
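The device_memcpy() above silently drops a trailing partial word and assumes both pointers are word-aligned. If you would rather catch misuse than ignore it, a stricter variant can check those preconditions. This is a minimal sketch and is not part of HALCoGen or the original workaround. The name checked_device_memcpy() is hypothetical, and the sketch assumes the standard assert() is acceptable in your build. Note that assert() compiles to nothing when NDEBUG is defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stricter variant of device_memcpy(): copies with 32-bit
 * stores only, but asserts its preconditions instead of truncating. */
void *checked_device_memcpy(void *dest, const void *src, size_t n)
{
    /* Both pointers must be word-aligned and n a whole number of words. */
    assert(((uintptr_t)dest % sizeof(uint32_t)) == 0);
    assert(((uintptr_t)src % sizeof(uint32_t)) == 0);
    assert((n % sizeof(uint32_t)) == 0);

    /* volatile forces one 32-bit store per element and stops the compiler
     * from merging the stores or replacing the loop with a memcpy() call. */
    volatile uint32_t *d = dest;
    const uint32_t *s = src;
    for (size_t i = 0; i < n / sizeof(uint32_t); i++) {
        d[i] = s[i];
    }
    return dest;
}
```

You would use it the same way, passing -Dmemcpy=checked_device_memcpy for het.c only. In a debug build, a HET program image whose size is not a multiple of 4 would then fail an assert in hetInit() instead of being copied short.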
Tip: I apply this workaround using the following CMake snippet to ensure the -D flag is only passed to the compiler when compiling het.c:
Proposed upstream fix:
Replace calls to memcpy() in hetInit() with calls to the above device_memcpy() implementation.