Race-condition when enabling second CPU core in SysBios

I suspect there is a race condition between setting the stack pointer and setting the instruction pointer when enabling the second CPU core in SysBIOS 6.37.02.27. The code from packages/ti/sysbios/family/arm/ducati/Core_smp.c:

Void Core_startCore1()
{
    if (Core_getId() != 0) {
        return;
    }

    /* put C stack in the middle of core 1's Hwi stack to avoid collision */
    *(Char **)(0x0008) = &Core_module->core1HwiStack[Core_core1HwiStackSize/2];
    *(UInt32 *)(0x000C) = (UInt32)Core_core1Startup;
}

The stack pointer is written to 0x8 and the instruction pointer to 0xC.

However, nothing prevents the compiler from placing the store that sets the stack pointer AFTER the store that sets the instruction pointer. This is what happened in my case (TI compiler v5.1.11), as seen in the disassembly view.

The code for CPU core 1 loops until the instruction pointer is set to something other than zero and then (a few instructions later) also loads the stack pointer. Code from Power_resumeCpu.sv7M:

        .sect   ".ducatiPowerBoot"
        ; reset vectors
vecbase:
        .long   0               ; sp = not used ; address 0x0
        .long   _ti_sysbios_family_arm_ducati_smp_Power_reset ; address 0x4
core1sp:
        .long   0               ; Core 1 sp  ; address 0x8
core1vec:
        .long   0               ; Core 1 resetVec ; address 0xC
        
        
[...]


core1:
        ; Core 1 waits for "a while" to let core 0 init the system
        ldr     r0, core1vec
        cmp     r0, #0
        beq     core1           ; loop until core 0 unleashes us

        mov     r2, #core1vec-vecbase
        mov     r1, #0
        str     r1, [r2]        ; clean up for next reset

        ldr     sp, core1sp
        bx      r0              ; jump to core1's c_int00
        

So CPU core 1 could end up running with an uninitialized stack pointer, because CPU core 0 executes the store to the stack pointer location only after it has already released CPU core 1.
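To make the interleaving concrete, here is a small host-side model in C. It is only a sketch and not SysBIOS code: `Mailbox`, `Core1State`, `core1_poll` and `run_interleaving` are hypothetical names, the two struct fields stand in for the words at 0x8 and 0xC, and the model assumes the worst case in which core 1 runs its whole release sequence between core 0's two stores (real timing is ignored).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of the two mailbox words. */
typedef struct {
    uint32_t core1sp;   /* models the word at address 0x8 */
    uint32_t core1vec;  /* models the word at address 0xC */
} Mailbox;

/* What core 1 holds once it leaves its wait loop. */
typedef struct {
    int      released;  /* 1 once core1vec was seen non-zero */
    uint32_t sp;        /* value loaded by "ldr sp, core1sp" */
    uint32_t pc;        /* value used by "bx r0" */
} Core1State;

/* One pass of core 1's wait loop: read core1vec and, if it is
 * non-zero, clear it, load the stack pointer and "jump". */
static Core1State core1_poll(Mailbox *m)
{
    Core1State s = { 0, 0, 0 };
    uint32_t vec = m->core1vec;

    if (vec != 0) {
        m->core1vec = 0;        /* clean up for next reset */
        s.released = 1;
        s.sp = m->core1sp;
        s.pc = vec;
    }
    return s;
}

/* Replays core 0's two stores in the chosen order and lets core 1
 * poll once between them and once after both. */
static Core1State run_interleaving(int vec_first, uint32_t sp, uint32_t entry)
{
    Mailbox m = { 0, 0 };
    Core1State s;

    assert(entry != 0);         /* zero means "not released" */

    if (vec_first) { m.core1vec = entry; } else { m.core1sp = sp; }

    s = core1_poll(&m);
    if (s.released) {
        return s;               /* core 0's second store comes too late */
    }

    if (vec_first) { m.core1sp = sp; } else { m.core1vec = entry; }
    return core1_poll(&m);
}
```

Calling `run_interleaving(0, sp, entry)` models the intended order, and core 1 leaves the loop with the stack pointer core 0 wrote. Calling `run_interleaving(1, sp, entry)` models the reordered stores, and core 1 leaves the loop with `sp` still zero.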

Possible solutions are to fix the code in Core_startCore1():

1. Use the qualifier "volatile" for both stores. This forbids compiler reordering. However, it does not prevent the CPU from reordering the stores at runtime (in the case of ARM v7), unless the CPU never reorders at all (for instance, the Cortex M4 does strict in-order processing). So this alone won't help on a Cortex A15.

2. Use a memory barrier instruction for ordering (e.g. "DMB" for ARM v7):

    *(Char **)(0x0008) = &Core_module->core1HwiStack[Core_core1HwiStackSize/2];
    __asm("\tDMB\r\n");
    *(UInt32 *)(0x000C) = (UInt32)Core_core1Startup;

This is probably the safest solution, but not particularly nice.

3. AFAIR, ARM v7 allows strict ordering of memory accesses to be enforced via the memory attributes in the page tables. This is usually done for I/O memory. Together with the "volatile" keyword, this prevents reordering both by the compiler and at runtime.
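For comparison, the ordering that option 2 aims for can be sketched portably with C11 atomics. This is an assumption-laden illustration and not a patch for Core_smp.c: it assumes a toolchain with `<stdatomic.h>` (which the TI compiler mentioned above may not provide), and `Core1Mailbox`, `core0_release` and `core1_try_start` are hypothetical names. The release store plays the role of the barrier before the store to 0xC; the acquire load is the matching ordering on the side that reads the two words.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two words at 0x8 and 0xC. */
typedef struct {
    uintptr_t         sp;   /* plain data, published by the store to vec */
    _Atomic uintptr_t vec;  /* 0 means "core 1 not released yet" */
} Core1Mailbox;

/* Core 0 side: write the stack pointer, then publish the entry point.
 * The release store forbids both the compiler and the CPU from moving
 * the sp write after the vec write. */
static void core0_release(Core1Mailbox *m, uintptr_t sp, uintptr_t entry)
{
    assert(entry != 0);     /* zero is reserved for "not released" */
    m->sp = sp;
    atomic_store_explicit(&m->vec, entry, memory_order_release);
}

/* Core 1 side: one pass of the wait loop. Returns true once released
 * and hands back the stack pointer and entry point to use. */
static bool core1_try_start(Core1Mailbox *m, uintptr_t *sp, uintptr_t *entry)
{
    /* The acquire load keeps the later read of sp from being
     * performed before vec has been observed as non-zero. */
    uintptr_t v = atomic_load_explicit(&m->vec, memory_order_acquire);

    if (v == 0) {
        return false;       /* keep looping */
    }

    /* clean up for next reset */
    atomic_store_explicit(&m->vec, 0, memory_order_relaxed);

    *sp = m->sp;
    *entry = v;
    return true;
}
```

On the real target the two words live at fixed addresses and core 1's side is written in assembly, so this only shows which accesses need to be ordered against each other, not how the fix would look in the SysBIOS sources.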

Comments?

  • Matthias,

    Thank you very much for posting all these details and the suggested solutions!

    I will discuss this with the author of this code later today…

    But I’m wondering now if this is a real problem today.  Are you seeing an actual issue when running, or is this based on analyzing the reordering in the disassembly view? 

    I ask because after core 0 writes to 0xC, there is only one intervening instruction before the write to 0x8.  Meanwhile on core 1, after the “ldr r0, core1vec” where the new value is loaded, there will be the compare, check, and resetting of the value for next reset, resulting in 5 intervening instructions before the “ldr sp, core1sp”.  If both CPUs are executing in parallel it seems that the proper SP value would still be read on core 1, even though it was written second by core 0.

    I think it would probably be good to insert a barrier instruction to be extra safe, especially if this code changes in the future and there is opportunity for additional reordering and insertion of other instructions on core 0.  But I’m wondering if it is a real problem today.  Am I missing something?

    Regards,
    Scott

  • Hi Scott,

    I'm currently having problems with the startup of SysBios on my platform and I'm totally unsure what it is. There is probably something else broken, not necessarily what I reported here - maybe in my code, maybe in my SysBIOS configuration. I just stumbled across this issue so I thought I'd better report it.


    As you already mentioned, the chances may not be very high that the issue occurs. Maybe it always works. I will try the "DMB"-fix and will let you know.

    What I currently see is that after startup I hit the breakpoint in the main() function. There I create one task with Task_create() and then call BIOS_start(). However, the breakpoint in the very first line of my task is never hit and the CPU gets an ARM hard fault somewhere in between. That's why I started debugging what is actually going on after BIOS_start().

    Another important addition to solution 2: in order to work correctly on a Cortex A15, it is also necessary to add another "DMB" before loading the stack pointer on core 1. Memory barriers should always be paired on the write side and on the read side. Only then do you have the guarantee of a happens-before relationship between the write and the read memory instructions. I.e.:

    core1:
            ; Core 1 waits for "a while" to let core 0 init the system
            ldr     r0, core1vec
            cmp     r0, #0
            beq     core1           ; loop until core 0 unleashes us
    
            mov     r2, #core1vec-vecbase
            mov     r1, #0
            str     r1, [r2]        ; clean up for next reset
    
            dmb                     ; synchronize: pairs with core 0's DMB between its two stores
            ldr     sp, core1sp
            bx      r0              ; jump to core1's c_int00

    I hope this helps. Best regards,

    Matthias

  • Hi Matthias,

    OK, thanks for the additional points regarding A15.  This particular boot code is used for M3 and M4 cores only.

    If you haven’t seen it, there are some hints for debugging exceptions here: http://processors.wiki.ti.com/index.php/SYS/BIOS_FAQs#4_Exception_Dump_Decoding_Using_the_CCS_Register_View

    Best regards,
    Scott