Cache CSL and Cache sysbios questions in the C6678

Hi,

We measured the execution time of the cache invalidate and cache writeback operations using both the SYS/BIOS and the CSL cache functions:

CACHE_invL1d(Vector, VectorLength, CACHE_WAIT)

Cache_inv(Vector, VectorLength, Cache_Type_L1D, CACHE_WAIT)

CACHE_wbL1d(Vector, VectorLength, CACHE_WAIT)

Cache_wb(Vector, VectorLength, Cache_Type_L1D, CACHE_WAIT)

1.- With the SYS/BIOS functions Cache_inv and Cache_wb, the execution time is very similar whether we pass CACHE_WAIT or CACHE_NOWAIT. Does the routine wait until the end of the transfer in both cases (wait and no wait)?

2.- Comparing the CSL routines CACHE_invL1d and CACHE_wbL1d with the SYS/BIOS routines Cache_inv and Cache_wb, all called with the WAIT parameter on a single cache line (64 bytes), we measured:

Cache_inv    = 0.44 usec
CACHE_invL1d = 0.06 usec
Cache_wb     = 0.43 usec
CACHE_wbL1d  = 0.3 usec

The CSL routines take significantly less time than the SYS/BIOS routines. Are these execution times correct?

3.- We run our application under SYS/BIOS. Is it possible to use the CSL routines mentioned above instead of the SYS/BIOS cache routines?

4.- With the CSL CACHE_invL1d, the execution time with the CACHE_WAIT_FENCE parameter is shorter than with the CACHE_WAIT parameter. Does using CACHE_WAIT also cover the case of CACHE_WAIT_FENCE?

Thanks in advance.

Shmuel
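
For anyone reproducing these measurements, here is a minimal sketch of the bookkeeping around such a benchmark. It assumes the timestamps come from a free-running 32-bit cycle counter (on the C66x this would typically be the TSCL register) and that the core runs at 1 GHz. The thread does not state the clock frequency, so that value is an assumption, and the helper names are hypothetical.

```c
#include <stdint.h>

/*
 * Elapsed cycles between two reads of a free-running 32-bit counter.
 * Unsigned subtraction stays correct across a single counter wraparound.
 * 'overhead' is the cost of the two counter reads themselves, measured
 * separately by timing an empty region.
 */
static uint32_t elapsed_cycles(uint32_t start, uint32_t end, uint32_t overhead)
{
    uint32_t delta = end - start;
    return (delta > overhead) ? (delta - overhead) : 0u;
}

/* Convert a cycle count to microseconds for a given CPU clock in Hz. */
static double cycles_to_usec(uint32_t cycles, double cpu_hz)
{
    return (double)cycles * 1.0e6 / cpu_hz;
}
```

Under the assumed 1 GHz clock, the 0.44 usec reported for Cache_inv corresponds to roughly 440 cycles and the 0.06 usec for CACHE_invL1d to roughly 60 cycles. Averaging over many iterations and subtracting the counter-read overhead makes numbers this small more trustworthy.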

 

 

  • Hi Shmuel,

    I can't really comment on whether your benchmarks are an "apples to apples" comparison.

    But here are a few things to consider that could account for some of the differences in the timing measurements:

    1. How were you using the BIOS Cache functions? Were you passing in "Cache_Type_L1D" as the "type" argument? There are others, such as "Cache_Type_L1".

    2. In your .cfg file, did you use the C66 specific cache delegate or the generic cache module?

    Generic cache module:

    var Cache = xdc.useModule('ti.sysbios.hal.Cache');

    C66 specific cache module:

    var ti_sysbios_family_c66_Cache = xdc.useModule('ti.sysbios.family.c66.Cache');

    The C66 specific cache delegate is more tailored for the C66 and could perhaps provide improved execution times.

    3. The BIOS Cache functions are safe to call re-entrantly under BIOS, and the extra logic this requires adds to the overall execution time of each cache function. I don't know whether the CSL cache functions are thread-safe under BIOS. I will have to get back to you on the CSL APIs or forward this thread to the team working on them.

  • The source code for both modules should be available in both distributions if you want to compare them. The source for the BIOS Cache module can be found in ti/sysbios/family/c66/Cache.c.

    The BIOS implementation has a bit of extra overhead since it aligns the base and length to the cache line size. We also have some code to work around a problem with the cache in early C66 devices. See the silicon errata sprz331 for details.

            /*
             *  Silicon errata sprz331a Advisory 14.
             *  Due to c66 silicon bug [see SDOCM00076053] spin with
             *  interrupts disabled here if atomicBlockSize != 0.
             *  Insert 16 NOPS after the wait.
             */
            if (Cache_atomicBlockSize) {
                Cache_wait();
                asm("      nop 4");
                asm("      nop 4");
                asm("      nop 4");
                asm("      nop 4");
            }

    You can disable this workaround by adding the following to your .cfg file:

    var Cache = xdc.useModule('ti.sysbios.family.c66.Cache');
    Cache.atomicBlockSize = 0;
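
As a rough illustration of the alignment overhead mentioned above, the sketch below rounds a block's base address down and its end address up to cache line boundaries. It is not the actual Cache.c code. The helper name, the struct, and the 64-byte line size (taken from the figure quoted in the question) are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

#define L1D_LINE_SIZE 64u /* assumed line size, per the question above */

/* Line-aligned view of a block: start address and length in bytes. */
typedef struct {
    uintptr_t base;
    size_t    len;
} AlignedBlock;

/*
 * Hypothetical helper: expand [addr, addr + byteCnt) so that it starts
 * and ends on a cache line boundary. lineSize must be a power of two.
 */
static AlignedBlock align_to_cache_line(uintptr_t addr, size_t byteCnt,
                                        size_t lineSize)
{
    AlignedBlock out;
    uintptr_t mask = (uintptr_t)lineSize - 1u;
    uintptr_t end  = addr + byteCnt;

    out.base = addr & ~mask;          /* round the base down */
    end      = (end + mask) & ~mask;  /* round the end up    */
    out.len  = (size_t)(end - out.base);
    return out;
}
```

One consequence worth checking in a benchmark: a 64-byte buffer that does not start on a line boundary spans two cache lines, so the operation covers 128 bytes rather than 64.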

    The BIOS Cache module also has some code to invalidate the prefetch buffer. I don't have immediate access to the CSL CACHE code, so I'm not sure whether it has similar logic for the prefetch buffer.

    You should #include <ti/sysbios/family/c66/Cache.h> (not <ti/sysbios/hal/Cache.h>) to save a function call. The hal/Cache module calls the family/c66/Cache APIs through a function wrapper.

    -Karl-