TMS570LS1225: HALCoGen outputs non-EABI compliant code which can break GCC

Patrick Herborn

Part Number: TMS570LS1225
Other Parts Discussed in Thread: HALCOGEN, , TMS570LS3137

Hello All!

I have been trying (and have succeeded) to get HALCoGen to output code which can be compiled with an ARM GCC toolchain (without using CCS - not sure if this affects CCS/Linaro) and have stumbled upon an issue with one part of the HALCoGen output :

Part of the startup sequence in sys_startup.c :

/* SourceId : STARTUP_SourceId_001 */
/* DesignId : STARTUP_DesignId_001 */
/* Requirements : HL_SR508 */
void _c_int00(void)
{
/* USER CODE BEGIN (5) */
/* USER CODE END */

    /* Initialize Core Registers to avoid CCM Error */
    _coreInitRegisters_();

The issue here is that _coreInitRegisters_() is not EABI compliant (it cannot be) and trashes registers GCC does not expect to be clobbered. This emitted the following here :

00000f88 <_c_int00>:
     f88:       e3e04000        mvn     r4, #0
     f8c:       ebfffc5d        bl      108 <_coreInitRegisters_>
     f90:       ebfffc8a        bl      1c0 <_coreInitStackPointer_>
     f94:       ebfffca4        bl      22c <_coreEnableEventBusExport_>
     f98:       ebfffd0e        bl      3d8 <_errata_CORTEXR4_66_>
     f9c:       ebfffd07        bl      3c0 <_errata_CORTEXR4_57_>
     fa0:       e514301b        ldr     r3, [r4, #-27] ; 0xffffffe5
     fa4:       e3130902        tst     r3, #32768      ; 0x8000

Notice that R4 is set at the very beginning and GCC does not expect it to change [EABI says R4 is callee-save] before the LDR at 0xfa0 - but of course it does, and therefore this results in a data abort at that point. The fetch happens lower down due to this code code in sys_startup.c :

/*SAFETYMCUSW 139 S MR:13.7 <APPROVED> "Hardware status bit read check" */
else if ((SYS_EXCEPTION & WATCHDOG_RESET) !=0U)

[Ironically the 0xffffffe5 is actually "correct" as per disassembly since R4 returns to 0 rather than staying at -1, it should access 0xffffffe4 and it would, if R4 was still -1].

A simple workaround for this issue is to tell GCC that its registers are going to be trashed :

/* USER CODE BEGIN (6) */
__asm__ __volatile__ ("":::"r0","r1","r2","r3","r4","r5","r6","r7","r8","r9","r10","r11","r12","r14");
/* USER CODE END */

Notice what this does to the output :

00000f88 <_c_int00>:
     f88:       ebfffc5e        bl      108 <_coreInitRegisters_>
     f8c:       e3e04000        mvn     r4, #0
     f90:       ebfffc8a        bl      1c0 <_coreInitStackPointer_>
     f94:       ebfffca4        bl      22c <_coreEnableEventBusExport_>
     f98:       ebfffd0e        bl      3d8 <_errata_CORTEXR4_66_>
     f9c:       ebfffd07        bl      3c0 <_errata_CORTEXR4_57_>
     fa0:       e514301b        ldr     r3, [r4, #-27] ; 0xffffffe5
     fa4:       e3130902        tst     r3, #32768      ; 0x8000

The MVN has now been moved after the non-EABI compliant function call - the other function calls are sufficiently EABI compliant that it doesn't break - and the LDR succeeds.

There may be a more elegant solution to this, but so long as GCC is allowed to allocate things out-of-order, any call to a non-EABI compliant assembly function could be problematic.

This should likely be fixed in HALCoGen.

Just for reference, when building using CCS and the TI compiler (with HALCoGen emitting TI code) we see :

00003c44 <_c_int00>:
    3c44:       ebfffde3        bl      33d8 <_coreInitRegisters_>
    3c48:       ebfffe10        bl      3490 <_coreInitStackPointer_>
    3c4c:       ebfffe2a        bl      34fc <_coreEnableEventBusExport_>
    3c50:       ebfffe92        bl      36a0 <_errata_CORTEXR4_66_>
    3c54:       ebfffe8d        bl      3690 <_errata_CORTEXR4_57_>
    3c58:       e3e0401b        mvn     r4, #27
    3c5c:       e594c000        ldr     ip, [r4]
    3c60:       e31c0902        tst     ip, #32768      ; 0x8000

Whilst it is sat somewhat higher up in memory than the GCC output, we can see that it doesn't allocate R4 early like GCC does and thus avoids the issue.

Best regards,

Patrick Herborn.

over 3 years ago

0 QJ Wang over 3 years ago

TI__Guru**** 186176 points

Hello Patrick,

Please refer to the patch posted by Chester.Gillon on Dec 5 2020.

0 Chester Gillon over 3 years ago in reply to QJ Wang

Guru 92251 points

QJ Wang said:
Please refer to the patch posted by Chester.Gillon on Dec 5 2020.

The patch I created was to the GCC compiler to add big-endian run time libraries.

Whereas Patrick has found a problem which needs to be fixed in HALCoGen.

I assume Patrick has already patched the the GCC compiler, as otherwise wouldn't have been able to link an executable for the big-endian [BE32] format TMS570LS1225 device.

0 Patrick Herborn over 3 years ago in reply to Chester Gillon

Intellectual 520 points

Hiya QJ, Chester!

Chester is indeed correct to point out that the issue I was encountering was a HALCoGen issue and not specifically related to GCC per se. HALCoGen is emitting non-EABI compliant assembly and this could well break any EABI compliant compiler - it was just luck that the TI compiler didn't fall into this trap. As mentioned, it should be fixed in HALCoGen. It is incorrect to call assembly from C and clobber registers that are callee-save. As mentioned, the asm function has no choice but to clobber them (that's its whole purpose) - so it is perhaps more correct to either point the reset vector at this code and then for it to branch to _c_int00. The asm() workaround obviously does what it needs to inasmuch as it tells the compiler "I've just trashed your regs, so you cannot assume anything you have done is still valid" - though I can only confirm it works with GCC (other compilers will have other mechanisms). Also, as mentioned, there may be a more elegant way (some __attribute__ or #pragma to tell the compiler this is a "broken" / non-EABI function).

With regard to Chester's assumption regarding a GCC "Big Endian" build, I have come to the following conclusions (based, in great part, on info in Chester's extensive contributions in the linked thread - an excellent resource) :

o It is not necessary to build "Big Endian" versions of binutils - even a prior version I had not built like that is fine with bigarm files and honours -EB flag

o GCC itself does not need to be build in a "Big Endian" way - even a prior version I had not built like that is fine generating bigarm files and honours -mbig-endian flag

o The only parts of the GCC build that "need" to be big endian are the libs and you can easily build them that way with CFLAGS_FOR_TARGET='-g -O2 -mbig-endian' in GCC's ./configure

o It is not strictly necessary to have big endian libraries for bare metal work. Indeed my HALCoGen GCC linker script tries to exclude things from libc.a, libm.a, libgcc.a.

I have also observed some other weirdness :

o Linking with GCC as a front-end resulted in bad output (it was little endian in a bigarm file - go figure!)

o Linking with LD directly creates expected results with bigarm files containing bigarm contents

o Using LD, and perhaps due to the linker script's attempt to exclude lib*.a, I am able to include littlearm files in the link list so long as nothing gets pulled in from the littlearm libs (one might expect that LD could barf at that, but it seems clever enough to realise that endianness is moot when you're not actually pulling anything in from a lib with different endianness).

I'll try a GCC 10 build next to see how that behaves - testing so far has been GCC 9.2 and 9.3.

Best regards,

Patrick Herborn.

0 Sunil Oak over 3 years ago in reply to Patrick Herborn

TI__Mastermind 49120 points

The _coreInitRegisters_ function is required to initialize ALL CPU user registers once on each power-up, because ARM does not guarantee the default contents of the user registers after power-up. If left uninitialized there could be core compare errors generated based on the register contents being different between the two CPUs.

You can also change this function based on your requirements. For example, this function already switches through each CPU mode to initialize banked registers. In that case, you can integrate the stack pointer initialization for each CPU mode within this function itself instead of calling a separate function to just initialize the R13 (SP) registers for each mode (and save some time for init). For this you can create a function called _initCoreRegs_ and call it from the start-up.c on power-up like this:

/* USER CODE BEGIN (5) */
#if 0
/* USER CODE END */

    /* Initialize Core Registers to avoid CCM Error */
    _coreInitRegisters_();

/* USER CODE BEGIN (6) */
/* USER CODE END */

    /* Initialize Stack Pointers */
    _coreInitStackPointer_();

/* USER CODE BEGIN (7) */
#endif

    _initCoreRegs_();
/* USER CODE END */

0 Chester Gillon over 3 years ago

Guru 92251 points

Patrick Herborn said:
There may be a more elegant solution to this, but so long as GCC is allowed to allocate things out-of-order, any call to a non-EABI compliant assembly function could be problematic.

I just tried adding the returns_twice function attribute to the prototypes for _coreInitRegisters_ and _coreInitStackPointer_:

/** @fn void _coreInitRegisters_(void)
*   @brief Initialize Core register
*/
void _coreInitRegisters_(void) __attribute__((returns_twice));

/** @fn void _coreInitStackPointer_(void)
*   @brief Initialize Core stack pointer
*/
void _coreInitStackPointer_(void) __attribute__((returns_twice));

To test created a program for a TMS570LS1225 and with the sys_core.h created by HALCoGen 04.07.01, i.e. without the function attributes, and using -O3 problematic code was generated:

0000a8c8 <_c_int00>:
     * to identify the cause of the CPU reset.
     */

    /* check for power-on reset condition */
    /*SAFETYMCUSW 139 S MR:13.7 <APPROVED> "Hardware status bit read check" */
    if ((SYS_EXCEPTION & POWERON_RESET) != 0U)
    a8c8:       e3e04000        mvn     r4, #0
    _coreInitRegisters_();
    a8cc:       ebfff3d1        bl      7818 <_coreInitRegisters_>
    _coreInitStackPointer_();
    a8d0:       ebfff413        bl      7924 <_coreInitStackPointer_>
    _coreEnableEventBusExport_();
    a8d4:       ebfff433        bl      79a8 <_coreEnableEventBusExport_>
        _errata_CORTEXR4_66_();

After adding the returns_twice function attribute GCC move the write to r4 to after the call to _coreInitStackPointer_:

0000a8c8 <_c_int00>:
{
/* USER CODE BEGIN (5) */
/* USER CODE END */

    /* Initialize Core Registers to avoid CCM Error */
    _coreInitRegisters_();
    a8c8:       ebfff3d2        bl      7818 <_coreInitRegisters_>

/* USER CODE BEGIN (6) */
/* USER CODE END */

    /* Initialize Stack Pointers */
    _coreInitStackPointer_();
    a8cc:       ebfff414        bl      7924 <_coreInitStackPointer_>
     * to identify the cause of the CPU reset.
     */

    /* check for power-on reset condition */
    /*SAFETYMCUSW 139 S MR:13.7 <APPROVED> "Hardware status bit read check" */
    if ((SYS_EXCEPTION & POWERON_RESET) != 0U)
    a8d0:       e3e04000        mvn     r4, #0
    _coreEnableEventBusExport_();

The problem with adding the function attributes to the prototypes in sys_core.h is that they are not in USER CODE blocks and so the changes get lost when HALCoGen re-generates the code.

An alternative is to leave the function prototypes in sys_core.h unmodified, and add the prototypes with the function attributes in the USER CODE prior to their use in _c_int00():

void _c_int00(void)
{    
/* USER CODE BEGIN (5) */
    void _coreInitRegisters_(void) __attribute__((returns_twice));
    void _coreInitStackPointer_(void) __attribute__((returns_twice));
/* USER CODE END */

    /* Initialize Core Registers to avoid CCM Error */
    _coreInitRegisters_();

/* USER CODE BEGIN (6) */
/* USER CODE END */

    /* Initialize Stack Pointers */
    _coreInitStackPointer_();

The GCC 7.2.1 compiler takes the second function prototypes with the function attribute in preference to those without the function attribute in the sys_core.h file, and still moves the write to r4 after the call to _coreInitStackPointer_(). However, having a prototype for the same function seen by the compiler twice with different function attributes is potentially problematic and could be against coding guidelines.

0 Patrick Herborn over 3 years ago in reply to Sunil Oak

Intellectual 520 points

Hiya Sunil!

I think we are all in agreement on what _coreInitRegisters_ does, is supposed to do, and why it needs to do it.

Where there is some disagreement is in regard to how it should go about doing that. Taking your example, it appears to move, rather than mitigate the problem.

I've had some more time to think about this since first posting and, to put it bluntly, there is simply no AAPCS / EABI compliant way of doing what it must do. To quote from the AAPCS :

"A subroutine must preserve the contents of the registers r4-r8, r10, r11 and SP (and r9 in PCS variants that designate r9 as v6)"

It is therefore, by definition, impossible to write an AAPCS / EABI function which does what _coreInitRegisters_ must do - it must set those registers and so it therefore cannot comply with the AAPCS / EABI.

It should be noted that _coreInitStackPointer_ also breaks AAPCS / EABI, but this does not manifest itself as a problem here because _c_int00 is __attribute__ ((naked)) and __attribute__((noreturn)) - so it's not like the compiler is making use of stmfd r13! (which would be really bad before the stack pointers are initialised!) - but again any AAPCS/EBIA compliant compiler has the right to expect the SP to remain unchanged across a function call (and clearly the purpose of _coreInitStackPointer_ is to change those). It is perhaps, then, wise to have the reset vector jump to _coreInitRegisters_, which then either flows into or jumps to _coreInitStackPointer_, which then finally flows into or jumps to _c_int00, once all the non-AAPCS/EABI compliant bring-up code has run and all further functions can be AAPCS/EABI compliant.

It is perhaps still possible, by means of some compiler directive other than the asm() workaround I proposed above, to tell the compiler that a particular function is "broken" / not AAPCS/EABI compliant, and perhaps that is a more elegant solution than the one I proposed. For GCC9.x at least, it allocates a value to r4 before calling those functions, and it expects r4 to have the same value upon return - which then causes a data abort (likely misaligned access) since r4 has actually changed.

[Caveat : I suppose it might, for the purposes of ensuring that both cores have the same reg contents, be possible to start by bringing up the SPs, doing an stmfd r13! followed by an ldmfd r13!. Presumably only one core actually stores to RAM and so upon fetch, both cores would end up with the same reg values - those that were present in the one write-capable core - although of course the fact that the SPs are being clobbered still breaks the AAPCS/EABI, it just won't break the executable in the process]

Best regards,

Patrick Herborn.

0 Patrick Herborn over 3 years ago in reply to Chester Gillon

Intellectual 520 points

Hiya Chester!

Interesting idea using __attribute__((returns_twice)), though as you rightly say the use of multiple function prototypes is less than ideal (but then so is my asm() solution). TI could possibly roll with that solution for HALCoGen 04.07.02... though I am still minded that whilst mitigating the issue, it is (presumably) not AAPCS/EABI compliant - whereas sending the reset vector through _coreInitRegisters_ and _coreInitStackPointer_ would be (ie get the nasty stuff out of the way before switching from asm over to C).

Best regards,

Pat.

0 Chester Gillon over 3 years ago in reply to Patrick Herborn

Guru 92251 points

Patrick Herborn said:
Interesting idea using __attribute__((returns_twice)), though as you rightly say the use of multiple function prototypes is less than ideal (but then so is my asm() solution).

Looking at a HALCoGen installation, there appear to be source files which are templates for the the HALCoGen generated source files, with markup to either conditionally exclude sections of the files or set parameter values.

Where c:/ti/Hercules/HALCoGen/v04.07.01/drivers/TMS570LS3137ZWT/SYSTEM570v000/sys_core.h appears to the template for the sys_core.h include file used for a TMS570LS1225. If you were prepared to modify the HALCoGen installation, presumably the "template" files could be modified. E.g. to add __attribute__((returns_twice)) to the prototypes for _coreInitRegisters_ and _coreInitStackPointer_.

0 Patrick Herborn over 3 years ago in reply to Chester Gillon

Intellectual 520 points

Hiya Chester!

There do indeed seem to be a number of templates in the HALCoGen installation and it seems that the actual one deployed is detailed in the device-specific configuration - so there are multiple versions of these files. It was interesting that mine had some errata calls in the emitted code that is missing from the TMS570LS3137 version - perhaps I had a previous version and had not properly purged those allowing the previous one to persist (as may happen if there are no actual changes needed). I'll try purging and re-emitting to see if it then drags in the appropriate one [the device-specific config does reference the 3137 version].

As for modifying the HALCoGen source, I guess there are three ways to mod it - 1) apply your patch to the function declaration, 2) apply my asm() patch or 3) change the reset vector so it jumps to the ASM before the ASM jumps to the C code. Re-reading the GCC docs suggests that 1) and 2) have a similar effect on GCC inasmuch as they both tell GCC "don't assume things you normally assume".

The returns_twice attribute appears to exist to implement setjmp() and vfork() (or other functions which behave in an "unexpected" way). It is likely a more elegant method of telling GCC that "this function does weird stuff" than my asm() patch [though the latter can go in the USER CODE section without the double-declaration needed for a USER CODE implementation of returns_twice]. Neither approach directly addresses the fact that _coreInitRegisters_() cannot be EABI compliant - but by the same token, setjmp() cannot be either ([Edit: unless setjmp() saves all those regs and longjmp() restores them] since it is impossible to predict what happens to the other registers between then and a longjmp() - whereas it is perhaps possible for vfork() to return with registers intact). The GCC docs are a little confusing where they say regs are dead on entry to a returns_twice function rather than on return as would seem clearer (to me at least).

I guess it ultimately boils down to whether there are other legitimate things programmers may wish to interject between the CPU hitting the reset vector and the code that flattens the regs so as to ensure both cores have the same state. I cannot think of anything off the top of my head which would need to be done ahead of that [but others may, of course] so my gut feeling is that setting the reset vector to _coreInitRegisters_ and having that also init the SPs would likely be fine. Those tasks can never be EABI compliant, whereas pretty much everything thereafter could be [bar any setjmp() / longjmp()].

Best regards,

Patrick Herborn.

Arm-based microcontrollers

Arm-based microcontrollers forum

TMS570LS1225: HALCoGen outputs non-EABI compliant code which can break GCC