Concerns regarding SMP-mode vs. non-SMP mode in SysBios (and in general)

------------------------------------------------------------------------

 

1.

Choosing the right synchronization mechanisms for

a) Task/Task mutual exclusion in SMP-mode,

b) Task/ISR mutual exclusion in SMP-mode and

c) Task/Task/ISR mutual exclusion in SMP-mode

d) Furthermore, it shall be possible to signal events from ISR

 

There are several implications:

- Task/Task mutual exclusion could be implemented using Semaphores, but then no

  access from ISR possible because you cannot call Semaphore_pend() in ISR context.

- Task/ISR mutual exclusion could be implemented using GateHwi (i.e. disabling ISRs).

  However, this does not protect against concurrent access by another task in SMP

  mode. Please note that it _would_ protect in non-SMP mode! This is one of the

  reasons why we prefer non-SMP mode.

- Task/Task/ISR is really tricky. It is possible in other operating systems,

  e.g. Linux Kernel using Spinlocks. But I don't know any SysBios

  synchronization mechanism that allows us to implement this.

 

Summary of (1): In non-SMP mode it is trivial to implement the needed primitives.

                In SMP-mode it is unclear how and whether this is possible at all.

 

 

2.

Ensuring correctness regarding parallel execution in presence of ARMv7's relaxed

memory model, where the following reordering is allowed for two memory accesses

to two different addresses A and B:

- a load from A is reordered after a load from B (and the load from A precedes the load from B in program order)

- a load from A is reordered after a store to B  (and the load from A precedes the store to B in program order)

- a store to A is reordered after a load from B  (and the store to A precedes the load from B in program order)

- a store to A is reordered after a store to B   (and the store to A precedes the store to B in program order)

 

See

 

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/Documentation/memory-barriers.txt?id=refs/tags/v4.0.5

 

or

 

http://preshing.com/20120710/memory-barriers-are-like-source-control-operations/

 

 

 

The SysBios manual does not state which synchronization primitives provide memory

barriers or if they provide memory barriers at all. Usually, one can assume that

mutual exclusion primitives working on memory (like Semaphores and Mutexes)

provide them, but not primitives that just work on disabling and enabling

interrupts (like GateHwi).

 

Of course, we must ensure correctness, but the documentation does not say anything

about this topic. How are we then supposed to know what is guaranteed and what not?

 

 

Example code:

 

int global_var_result = 0;

void task1(...)
{
    /* produces result */
    [...]

    /* store result */
    global_var_result = ...;

    /* Set event for task2, see below */
    Event_post(...);

    [...]
}

void task2(...)
{
    Event_Wait(...);

    /* read result */
    .. = global_var_result;
}

 

 

If Event_post() does not include a write memory barrier, the store to global_var_result

could be reordered by the hardware after the write that is used within

Event_post() to signal the event to task2. Obviously, this could lead to task2

exiting from Event_Wait() and reading an old value from global_var_result.

Please note that this is only possible in SMP-mode, not in non-SMP mode!

 

Analogously, the same reordering is possible on the read side: Assuming that these

two tasks are executing on CPU core1 and core2 respectively, and assuming that

core1 did _not_ reorder the stores of task1. Then it is still possible that

CPU core2 could speculatively read global_var_result before executing the read

to check the event condition in Event_Wait(). This will lead to old data being

read by core2, not the produced result from task1.

 

 

The Linux kernel provides the smp_rmb() and smp_wmb() macros to enforce memory

barriers where needed. On ARMv7, they will translate into a DMB instruction:

 

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0008a/CJAGIEIE.html

 

Further reading:

 

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka14041.html

 

 

Thus, assuming that the SysBios event mechanism does not include memory

barriers, in order to enforce correctness, the example would have to be changed

to:

 

 

void task1(...)
{
    /* produces result */
    [...]

    /* store result */
    global_var_result = ...;

    smp_wmb(); // <- new! And needed in order to enforce ordering

    /* Set event for task2, see below */
    Event_post(...);

    [...]
}

void task2(...)
{
    Event_Wait(...);

    smp_rmb(); // <- new! And needed in order to enforce ordering

    /* read result */
    .. = global_var_result;
}

 

However, there is no such thing as smp_rmb() and smp_wmb() in SysBios, so it is

unclear what the status is there and if it is needed or not.
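
For illustration only, here is a minimal sketch of how such barriers could be expressed portably with C11 fences from <stdatomic.h>. The macro definitions, the result_ready flag (standing in for the event) and the function names are my own assumptions, not SysBios APIs. It also assumes the toolchain supports C11 atomics. On ARMv7, compilers typically implement these fences with a DMB instruction.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical equivalents of the Linux macros, built on C11 fences.
 * A release fence is at least as strong as a write barrier and an
 * acquire fence is at least as strong as a read barrier.
 */
#define smp_wmb() atomic_thread_fence(memory_order_release)
#define smp_rmb() atomic_thread_fence(memory_order_acquire)

static int global_var_result;       /* payload, a plain variable        */
static atomic_bool result_ready;    /* assumption: stands in for the event */

/* Producer side (task1): store the result, then signal. */
void publish_result(int value)
{
    global_var_result = value;
    smp_wmb();                      /* payload store ordered before flag store */
    atomic_store_explicit(&result_ready, true, memory_order_relaxed);
}

/* Consumer side (task2): returns true and fills *out once signalled. */
bool try_read_result(int *out)
{
    if (!atomic_load_explicit(&result_ready, memory_order_relaxed))
        return false;
    smp_rmb();                      /* flag load ordered before payload load */
    *out = global_var_result;
    return true;
}
```

If the kernel's event calls already contain suitable barriers, the explicit fences are redundant but harmless.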

 

Note that all of this is NOT necessary on non-SMP systems! If using only one core,

that core always sees _ITS_ memory accesses as if executed in program order.

 

  • Hi Pablo,

    Which target are you running SMP/BIOS on? Is it a Cortex-M or a Cortex-A processor?

    Pablo Granados said:

    1.

    Choosing the right synchronization mechanisms for

    a) Task/Task mutual exclusion in SMP-mode,

    b) Task/ISR mutual exclusion in SMP-mode and

    c) Task/Task/ISR mutual exclusion in SMP-mode

    d) Furthermore, it shall be possible to signal events from ISR

     

    There are several implications:

    - Task/Task mutual exclusion could be implemented using Semaphores, but then no

      access from ISR possible because you cannot call Semaphore_pend() in ISR context.

    - Task/ISR mutual exclusion could be implemented using GateHwi (i.e. disabling ISRs).

      However, this does not protect against concurrent access by another task in SMP

      mode. Please note that it _would_ protect in non-SMP mode! This is one of the

      reasons why we prefer non-SMP mode.

    - Task/Task/ISR is really tricky. It is possible in other operating systems,

      e.g. Linux Kernel using Spinlocks. But I don't know any SysBios

      synchronization mechanism that allows us to implement this.

    Task/Task mutual exclusion can be achieved through Semaphores like you already noted. This is true in both SMP and non-SMP mode.

    Task/ISR as well as Task/Task/ISR mutual exclusion can be achieved through the use of Hwi_disable()/Hwi_restore(). Here's a brief description of how Hwi_disable()/restore() works in SMP mode and why it can be used to achieve mutual exclusion:

    Hwi_disable() disables interrupts on the current core and internally acquires an inter-core spinlock. This same inter-core spinlock is also acquired by every Hwi before calling the user's Hwi function. Therefore, if a task shares a data structure with a Hwi (ISR) it can call Hwi_disable() to guarantee mutual exclusion. The disabling of interrupts will prevent another ISR on the same core from pre-empting the task and accessing the data structure while the inter-core spinlock will prevent a Hwi being serviced on some other core from accessing the data structure.

    Hwi_restore() restores the interrupt state on the local core and, depending on the key, releases the inter-core spinlock.
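
    A minimal sketch of the usage pattern described above. So that it compiles on a host without SYS/BIOS, Hwi_disable() and Hwi_restore() are replaced by simple stand-ins that only track a simulated interrupt-enable flag. On a target you would drop the stand-ins and include <ti/sysbios/hal/Hwi.h> instead. shared_count, task_add() and hwi_func() are hypothetical names.

```c
#include <stdint.h>

/*
 * Host-side stand-ins (assumptions) so the sketch compiles without
 * SYS/BIOS. They only model the key handling, not the inter-core lock.
 */
typedef unsigned int UInt;
static UInt hwi_enabled = 1;        /* 1 = interrupts enabled (simulated) */

static UInt Hwi_disable(void)
{
    UInt key = hwi_enabled;         /* remember the previous state */
    hwi_enabled = 0;
    return key;
}

static void Hwi_restore(UInt key)
{
    hwi_enabled = key;              /* re-enable only if enabled before */
}

/* Hypothetical data shared between a task and a Hwi function. */
static uint32_t shared_count;

/* Called from task context. */
uint32_t task_add(uint32_t n)
{
    UInt key = Hwi_disable();       /* enter critical section */
    shared_count += n;
    uint32_t snapshot = shared_count;
    Hwi_restore(key);               /* leave critical section */
    return snapshot;
}

/*
 * Called from Hwi (ISR) context. Per the description above, in SMP mode
 * the kernel acquires the inter-core lock before calling the Hwi
 * function, so no extra locking is shown here.
 */
void hwi_func(void)
{
    shared_count += 1;
}
```

    Because the key records the previous state, nested Hwi_disable()/Hwi_restore() pairs leave interrupts disabled until the outermost restore.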

    Pablo Granados said:

    2.

    Ensuring correctness regarding parallel execution in presence of ARMv7's relaxed

    memory model, ...

    The SysBios manual does not state which synchronization primitives provide memory

    barriers or if they provide memory barriers at all. Usually, one can assume that

    mutual exclusion primitives working on memory (like Semaphores and Mutexes)

    provide them, but not primitives that just work on disabling and enabling

    interrupts (like GateHwi).

     

    Of course, we must ensure correctness, but the documentation does not say anything

    about this topic. How are we then supposed to know what is guaranteed and what not?

     Example code:

     ...

     

    If Event_post() does not include a write memory barrier, the store to global_var_result

    could be reordered by the hardware after the write that is used within

    Event_post() to signal the event to task2. Obviously, this could lead to task2

    exiting from Event_Wait() and reading an old value from global_var_result.

    Please note that this is only possible in SMP-mode, not in non-SMP mode!

     ...

    Thus, assuming that the SysBios event mechanism does not include memory

    barriers, in order to enforce correctness, the example would have to be changed

    to:

     ...

     

    However, there is no such thing as smp_rmb() and smp_wmb() in SysBios, so it is

    unclear what the status is there and if it is needed or not.

     

    Note that all of this is NOT necessary on non-SMP systems! If using only one core,

    that core always sees _ITS_ memory accesses as if executed in program order.

    I agree, we should document the need to use barriers in SMP applications. We can add this to the SMP/BIOS wiki page (http://processors.wiki.ti.com/index.php/SMP/BIOS).

    I also wanted to highlight that loads and stores always complete in program order on Cortex-M devices (see http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0321a/BIHGJICF.html) so the ordering problem you mentioned above should not occur. On Cortex-A devices of course barriers are important to guarantee any ordering requirements.

     

    Best,

    Ashish

  • Hi Pablo,

    Here's a link to the bug (SDOCM00113597) we have for adding a barrier module:
    https://cqweb.ext.ti.com/cqweb/main?command=GenerateMainFrame&service=CQ&schema=SDO-Web&contextid=SDOWP&entityID=SDOCM00113597&entityDefName=IncidentReport&username=readonly&password=readonly

    It may take a while for the link to go active.

    Best,
    Ashish

  • Pablo Granados said:

    The SysBios manual does not state which synchronization primitives provide memory barriers or if they provide memory barriers at all. Usually, one can assume that mutual exclusion primitives working on memory (like Semaphores and Mutexes) provide them, but not primitives that just work on disabling and enabling interrupts (like GateHwi).

    Of course, we must ensure correctness, but the documentation does not say anything about this topic. How are we then supposed to know what is guaranteed and what not?

    I think this is a very good point. It's great that a barrier module is being developed and guidance on barrier use in SMP applications will be welcomed, but there's a broader issue here. The SYS/BIOS documentation makes no formal guarantees about the runtime ordering of operations.

    For example, the API documentation could guarantee that Semaphore_post has release semantics and Semaphore_pend has acquire semantics (on all TI-RTOS platforms). With that guarantee programmers would know that they don't need to add extra barriers to ensure correct ordering in scenarios where a semaphore controls access to a shared resource. On some platforms that would be done with explicit barrier instructions, while others have a memory model that automatically gives the required ordering.

    Similar guarantees could be provided for the other synchronization modules, and possibly also for operations such as task/SWI/HWI creation.

    A good example of how to document this is the Java platform API specification:

    Memory Consistency Properties

    Chapter 17 of The Java™ Language Specification defines the happens-before relation on memory operations such as reads and writes of shared variables. The results of a write by one thread are guaranteed to be visible to a read by another thread only if the write operation happens-before the read operation. The synchronized and volatile constructs, as well as the Thread.start() and Thread.join() methods, can form happens-before relationships. In particular:

    • Each action in a thread happens-before every action in that thread that comes later in the program's order.
    • An unlock (synchronized block or method exit) of a monitor happens-before every subsequent lock (synchronized block or method entry) of that same monitor. And because the happens-before relation is transitive, all actions of a thread prior to unlocking happen-before all actions subsequent to any thread locking that monitor.
    • A write to a volatile field happens-before every subsequent read of that same field. Writes and reads of volatile fields have similar memory consistency effects as entering and exiting monitors, but do not entail mutual exclusion locking.
    • A call to start on a thread happens-before any action in the started thread.
    • All actions in a thread happen-before any other thread successfully returns from a join on that thread.

    The methods of all classes in java.util.concurrent and its subpackages extend these guarantees to higher-level synchronization. In particular:

    • Actions in a thread prior to placing an object into any concurrent collection happen-before actions subsequent to the access or removal of that element from the collection in another thread.
    • Actions in a thread prior to the submission of a Runnable to an Executor happen-before its execution begins. Similarly for Callables submitted to an ExecutorService.
    • Actions taken by the asynchronous computation represented by a Future happen-before actions subsequent to the retrieval of the result via Future.get() in another thread.
    • Actions prior to "releasing" synchronizer methods such as Lock.unlock, Semaphore.release, and CountDownLatch.countDown happen-before actions subsequent to a successful "acquiring" method such as Lock.lock, Semaphore.acquire, Condition.await, and CountDownLatch.await on the same synchronizer object in another thread.
    • For each pair of threads that successfully exchange objects via an Exchanger, actions prior to the exchange() in each thread happen-before those subsequent to the corresponding exchange() in another thread.
    • Actions prior to calling CyclicBarrier.await and Phaser.awaitAdvance (as well as its variants) happen-before actions performed by the barrier action, and actions performed by the barrier action happen-before actions subsequent to a successful return from the corresponding await in other threads.
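
    To make the release/acquire idea concrete, here is a small sketch of a counting semaphore whose post has release semantics and whose (non-blocking) pend has acquire semantics, written with C11 atomics. This is not the SYS/BIOS implementation. demo_sem and its functions are hypothetical names, and a toolchain with <stdatomic.h> is assumed.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal counting semaphore, for illustration only. */
typedef struct {
    atomic_uint count;
} demo_sem;

/*
 * post: release semantics. Writes made before the post are visible to
 * whoever later succeeds in demo_sem_trypend() on the same semaphore.
 */
void demo_sem_post(demo_sem *s)
{
    atomic_fetch_add_explicit(&s->count, 1u, memory_order_release);
}

/* trypend: acquire semantics on success; never blocks. */
bool demo_sem_trypend(demo_sem *s)
{
    unsigned c = atomic_load_explicit(&s->count, memory_order_relaxed);
    while (c != 0u) {
        if (atomic_compare_exchange_weak_explicit(&s->count, &c, c - 1u,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return true;
        /* the failed exchange reloaded c; retry with the new value */
    }
    return false;
}
```

    With semantics like these documented, code that writes shared data before the post and reads it after a successful pend needs no additional barriers.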

  • The initial question was also posted here:
    e2e.ti.com/.../1673900
    Sorry for the duplicate.

    Robert Cowsill said:
    For example, the API documentation could guarantee that Semaphore_post has release semantics and Semaphore_pend has acquire semantics (on all TI-RTOS platforms). With that guarantee programmers would know that they don't need to add extra barriers to ensure correct ordering in scenarios where a semaphore controls access to a shared resource. On some platforms that would be done with explicit barrier instructions, while others have a memory model that automatically gives the required ordering.

    Similar guarantees could be provided for the other synchronization modules, and possibly also for operations such as task/SWI/HWI creation.

    This would be my recommendation.

    Moreover, this matches the intuitive usage of synchronization primitives from a software point of view. Some software people don't even know about memory models and reordering but still they are capable of writing correctly synchronized multi-threaded programs by using the offered synchronization primitives to form e.g. critical sections (because those include all necessary barriers!).

    If they had to add barriers in addition to using the synchronization primitives, I can guarantee you that they would get the barriers wrong (which types to use, and where to place them). So, please include the necessary barriers in the sync primitives and add this to the SysBios function documentation. Thank you.

  • Hi Robert and Matthias,

    You make a good point. We probably need to improve our documentation for synchronization primitives. I believe we have the necessary barrier instructions already in place for targets that require them, but this may not be clearly documented.

    We presently support SMP mode of operation on only 2 targets, namely Cortex-A15 and Cortex-M3/M4. On the Cortex-M3/M4, the hardware does not perform any re-ordering so barriers are not really required. On the Cortex-A15, however, barriers are required to guarantee correct operation and we do have them where required. All SYS/BIOS synchronization primitives internally disable interrupts (call Hwi_disable/enable) to guarantee mutual exclusion. When running in SMP mode, in addition to disabling interrupts the primitives also acquire an inter-core lock. The inter-core lock implementation for Cortex-A15 executes the necessary barrier instructions. So, even though the Semaphore_pend or GateMutex_enter code does not have a barrier instruction, the Hwi_disable() call they all make does have one.

    Best,
    Ashish
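
    As a rough illustration of why a lock that "executes the necessary barrier instructions" covers the primitives built on top of it, here is a generic spinlock sketch using C11 atomics. It is not the actual SYS/BIOS inter-core lock, which is target specific. demo_spinlock and its functions are hypothetical names. The acquire ordering on lock and the release ordering on unlock are what keep accesses inside the critical section from leaking out of it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical inter-core lock, for illustration only. */
typedef struct {
    atomic_flag locked;
} demo_spinlock;

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Returns true if the lock was taken. Acquire ordering on success. */
bool demo_spin_trylock(demo_spinlock *l)
{
    /* test_and_set returns the previous value: false means we got it */
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

void demo_spin_lock(demo_spinlock *l)
{
    while (!demo_spin_trylock(l)) {
        /* busy-wait until the current owner releases the lock */
    }
}

/* Release ordering: earlier accesses complete before the lock is freed. */
void demo_spin_unlock(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```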

  • Ashish Kapania said:

    You make a good point. We probably need to improve our documentation for synchronization primitives. I believe we have the necessary barrier instructions already in place for targets that require them, but this may not be clearly documented.

    Yes, I think it's particularly important to improve the documentation because of the fact that M3/M4 need no barriers in a single core configuration. That means there are no actual barrier instructions to see in the source code. It would be much easier to understand the synchronization primitives if the documentation said "this call behaves as if there were a memory barrier at the end (whether an actual barrier is required or not)"

  • Robert, as you mention, there's a broader issue and problems might already arise on unicore processors whenever the compiler reorders memory accesses. I asked a similar question here, but the answer wasn't exactly what I had hoped for (in particular, part of the answer was: "Yes, this means that the only solution in TI C to creating a critical section is to use volatile for all objects accessed in the critical section.").
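
    For compiler reordering specifically, C11 offers atomic_signal_fence(), which constrains the compiler without emitting a hardware barrier. The sketch below assumes a toolchain with <stdatomic.h>; whether the TI compiler in question supports it is something I have not verified. compiler_barrier(), shared_a, shared_b and update_in_order() are hypothetical names.

```c
#include <stdatomic.h>

static int shared_a;
static int shared_b;

/*
 * Compiler-only barrier: asks the compiler not to move memory accesses
 * across this point. It does not order accesses in hardware.
 */
#define compiler_barrier() atomic_signal_fence(memory_order_seq_cst)

/* Stores to shared_a and shared_b are kept in this order by the compiler. */
int update_in_order(int a, int b)
{
    shared_a = a;
    compiler_barrier();
    shared_b = b;
    return shared_a + shared_b;
}
```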