
_mfence() vs. critical section [Hwi_disable/Hwi_enable], multiple writers, MSMC, C6670

Other Parts Discussed in Thread: SYSBIOS

Hi All,


Board: C6670

Situation: all 4 cores want to write [more or less simultaneously] into a data structure X [X is in MSMC SRAM].

Theory: the classic multiple-writers-to-a-single-endpoint problem.

Options in practice:

[A]=========================================================
Previous SYS/BIOS version (older than 6.33.06.50):
Whichever core wants to access the structure X enters a critical section, acquiring a key [via Hwi_disable], and once done with the write restores the key [via Hwi_restore, thereby exiting the critical section].


pseudo code [this code will be run by all the cores] *****************************
/* Enter the critical section: disable interrupts, saving the key */
UInt key = Hwi_disable();

/* Invalidate so this core reads a fresh copy of X from MSMC SRAM */
cache_invalid(&X, sizeof(X));

X.a = 1;

/* Write the modified X back out to MSMC SRAM */
cache_writeback(&X, sizeof(X));

/* Exit the critical section: restore the saved interrupt state */
Hwi_restore(key);


[B]=========================================================
newer sys-bios version (version newer than 6.33.06.50)

blindly access X by any core simultaneously, mfence is within CACHE functions and pass CACHE_FENCE_WAIT as argument.

An example is
/* Writeback the contents of the cache. */
CACHE_wbL1d (ptr, size, CACHE_FENCE_WAIT);

No need for a critical section, keys, or Hwi_disable/Hwi_enable.

This CACHE_FENCE_WAIT is the newer option:

typedef enum {
    /** No blocking, the call exits after programming of the
     *  control registers */
    CACHE_NOWAIT = 0,

    /** Blocking call, the call exits after the relevant cache
     *  status registers indicate completion. For block coherence
     *  this waits on the word count register to become 0. */
    CACHE_WAIT = 1,

    /** Blocking call, for block coherence this uses the MFENCE to
     *  wait for completion */
    CACHE_FENCE_WAIT = 2
} CACHE_Wait;

pseudo code [this code will be run by all the cores] *****************************
/* Invalidate so this core reads a fresh copy of X; the mfence is internal to the call */
CACHE_invL1d (&X, sizeof(X), CACHE_FENCE_WAIT);

X.a = 1;

/* Write the modified X back; again the mfence is internal to the call */
CACHE_wbL1d (&X, sizeof(X), CACHE_FENCE_WAIT);
[C]=========================================================
How to do it in a version older than 6.33.06.50 with _mfence() [no critical section, no Hwi_disable/Hwi_enable]:

pseudo code [this code will be run by all the cores] *****************************
/* Invalidate, waiting on the word count register, then fence manually */
CACHE_invL1d (&X, sizeof(X), CACHE_WAIT);
_mfence();

X.a = 1;

/* Write back, waiting on the word count register, then fence manually */
CACHE_wbL1d (&X, sizeof(X), CACHE_WAIT);
_mfence();

1. Please note that in my current version (older than 6.33.06.50) I will be using CACHE_WAIT, and I want to eliminate/avoid Hwi_disable/Hwi_enable and use _mfence() instead. Is the above code correct?
2. If my understanding is correct, I don't need to use Hwi_disable/Hwi_enable? Please confirm.
3. Is the rule of thumb that _mfence() needs to be called after every cache operation? Please correct me if I am wrong.

Thanks
RC Reddy

  • Hi,

    I don't use SYS/BIOS, but maybe that is not important, since as far as I know Hwi_enable()/Hwi_disable() simply enable/disable interrupts at the CPU level. Also, I'm using only PDK 1.0.0.12, which already has the wait option.

    For my PDK version and BIOS 6.32.02.99:

    • PDK (routines with uppercase CACHE_*): both CACHE_WAIT and CACHE_FENCE_WAIT don't require an additional _mfence(), but they do require a critical section, since CSL doesn't protect the cache update from interrupts, so it is open to the silicon bug.
    • BIOS (routines named Cache_*): these implement the wait option (wait on the word count and then _mfence() as well) and also protect against interrupts, but under some circumstances they re-enable/re-disable interrupts (when waiting for a previous operation to complete).
    • I don't know the behaviour of the cache_* (all lowercase) routines.

    Q1, Q2: Since the critical section is required to protect against a silicon bug (and not to synchronize the cores), it is always required, but with SYS/BIOS it is embedded in the Cache_* routines and you don't have to code it yourself. In any case it is required only to protect the cache update, not the writing of the data itself (I suppose you have another method to protect X from concurrent access by the 4 cores).

    Using CSL:

    unsigned int saved = _disable_interrupts(); //intrinsic

    CACHE_invL2(&X, sizeof X, CACHE_FENCE_WAIT);  //better to use L2; it also invalidates L1 even if L2 cache is not enabled

    _restore_interrupts(saved);

    ..... work on X

    saved = _disable_interrupts();

    CACHE_wbL2(&X, sizeof X, CACHE_FENCE_WAIT);

    _restore_interrupts(saved);

    Q3: In general, yes: you always have to wait, and _mfence() is the preferred method. In CSL versions where CACHE_FENCE_WAIT is not supported you can use CACHE_WAIT. In versions where CACHE_WAIT is also not supported, you have to add the _mfence() yourself (prior to re-enabling interrupts).
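
    A minimal sketch of that fallback, assuming a CSL that has CACHE_WAIT but not CACHE_FENCE_WAIT (same X as above):

    unsigned int saved = _disable_interrupts();   //critical section for the silicon bug

    CACHE_wbL2(&X, sizeof X, CACHE_WAIT);         //blocks on the word count register
    _mfence();                                    //then fence manually

    _restore_interrupts(saved);                   //re-enable interrupts only after the fence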

  • If you want mutual exclusion between cores -- that is, Core 1 (or Core 2, etc.) is guaranteed to be the only core trying to write to the structure at any moment -- then you need some mechanism besides Hwi_disable()/Hwi_enable() or the cache maintenance/mfence operations.  The Semaphore2 peripheral is designed to support that kind of approach, although the Multicore Navigator can also serialize many kinds of accesses.

    Hwi_disable() and Hwi_enable() protect against interruption by another task or interrupt on the same DSP core.  They do not protect against other DSP cores accessing the same resource as the core that disables its HWIs.

    _mfence() only needs to be called if you want a low-overhead way to wait for a cache maintenance operation to complete.  Sometimes a cache operation can proceed in the background, and the DSP core can continue to execute code without executing the mfence operation.  There are also older (pre-C66x) methods for the DSP core to poll whether a cache operation is complete or whether it can request a new cache operation; these polling methods require more cycles and more code space than _mfence().  Like Hwi_disable() and Hwi_enable(), cache maintenance operations are purely local to a DSP core, and do not block other DSP cores from accessing the same memory.
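
    Putting those pieces together, a hedged sketch of the layering (using the CSL semaphore calls that appear later in this thread, plus the cache-update critical section from the previous reply; MY_SEM is a hypothetical hardware semaphore number reserved for X):

    #define MY_SEM  3                                /* hypothetical hardware semaphore number for X */

    UInt key;
    while (CSL_semAcquireDirect (MY_SEM) == 0);      /* inter-core lock: spin until granted */

    key = Hwi_disable();                             /* intra-core: protect the cache update */
    CACHE_invL2 (&X, sizeof X, CACHE_FENCE_WAIT);    /* read a fresh copy of X */
    Hwi_restore (key);

    X.a = 1;                                         /* modify X while holding the semaphore */

    key = Hwi_disable();
    CACHE_wbL2 (&X, sizeof X, CACHE_FENCE_WAIT);     /* make the update visible in MSMC SRAM */
    Hwi_restore (key);

    CSL_semReleaseSemaphore (MY_SEM);                /* inter-core unlock */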

  • Hi All,

    Thanks for all the replies. Can I summarize all the above points like this?

    For cache coherence:

    CACHE_invL2(&X, sizeof X, CACHE_FENCE_WAIT);  //better to use L2; it also invalidates L1 even if L2 cache is not enabled

    CACHE_wbL2(&X, sizeof X, CACHE_FENCE_WAIT);

    For within-core mutual exclusion:

    Hwi_disable()

    Hwi_enable()

    For among-cores mutual exclusion (lock):

    CSL_semAcquireDirect

    CSL_semReleaseSemaphore

    Case 1:

    So in my case, if I am using the latest SYS/BIOS installation [Cache_*], [assuming the case of all cores accessing structure X [X is in MSMC SRAM]], then I don't need to worry about within-core mutual exclusion;

    I need to take care of only the among-cores mutual exclusion.

    Case 2:

    If I am using the PDK-based CACHE_* [assuming I am using CACHE_FENCE_WAIT], then I have to take care of both among-cores mutual exclusion and within-core mutual exclusion.

    Let me know if I am correct in the above summary.

    A doubt/question: I can create a semaphore, call it MSMC_MEM_SEM, with a value of 3 [i.e., hardware semaphore number 3; please note this is for synchronizing the MSMC SRAM based X data structure among the cores],

    and I can use it like this:

     /* spin until the hardware semaphore is granted */
     while (CSL_semAcquireDirect (MSMC_MEM_SEM) == 0);

     /* ... access X ... */

     CSL_semReleaseSemaphore (MSMC_MEM_SEM);

    Thanks

    RC Reddy

  • For the sake of completeness: the structure X should be aligned to a cache line (128 bytes, to be sure to be compatible with L2) and no other data should share a cache line with it; otherwise the cache invalidation/flush could accidentally corrupt some other data.
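
    A minimal declaration sketch of that advice (the section name .msmc_data, the struct contents, and the padding are assumptions; the section must be mapped to MSMC SRAM in the linker command file):

    /* Keep X alone on its own 128-byte cache line */
    #pragma DATA_SECTION (X, ".msmc_data")   /* hypothetical section placed in MSMC SRAM */
    #pragma DATA_ALIGN   (X, 128)            /* align to the largest (L2) cache line size */
    struct my_struct
    {
        int  a;
        char pad[128 - sizeof(int)];         /* pad so no other data shares the line */
    } X;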

    Sometimes I prefer to alias a little multicore control structure in MSMC onto a non-cacheable memory address (using MPAX and MAR), just to be sure to have no problems.

    I planned to do some performance checks but I never did them. Essentially, in situations where you:

    1. read the shared area,
    2. do a little modification or check some values to take a decision,
    3. write back in case of modification,

    I suppose that the cache gives no significant performance advantage; maybe under some circumstances it is worse. The best solution is application dependent.
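
    For the MAR part of the aliasing trick above, a sketch (assuming the KeyStone CSL CACHE module; each MAR register controls cacheability of one 16 MB region, so the index is the address divided by 16 MB; aliasAddr is the hypothetical MPAX-remapped address of the control structure):

    /* disable caching for the 16 MB region containing the alias */
    Uint32 mar = (Uint32)aliasAddr >> 24;    /* MAR index = address / 16 MB */
    CACHE_disableCaching (mar);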