IPC HeapBufMP_free() performance

Ralf Goebel

Hi,

I found a performance problem in my application which is caused by HeapBufMP_free().

I'm using a HeapBufMP instance for image buffer management on 8 cores. The problem is that HeapBufMP_free() invalidates the cache for the released data buffer inside a GateMP_enter()/GateMP_leave() block. This can take a relatively long time, depending on the size of the buffer. In my application, HeapBufMP_free() gets called simultaneously on all cores, so every core except the first caller spins in GateMP_enter(), waiting for the previous core's cache operation to complete. In other words, only one core at a time can invalidate its data buffer's cache.

In my opinion it should be OK to invalidate the cache of the data block before entering the gate.
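
To illustrate the reordering outside of IPC, here is a minimal host-side sketch. It is not the HeapBufMP code: it assumes a pthread mutex as a stand-in for the GateMP gate and a memset() as a stand-in for Cache_inv(), and DemoHeap, demo_free() and run_demo() are hypothetical names made up for this example. It only counts how many bytes get processed while the gate is held, for each ordering.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define NUM_WORKERS 4      /* stand-in for the cores calling free() */
#define BLOCK_SIZE  4096   /* stand-in for the heap's block size */

/* Hypothetical model of the heap: the mutex plays the role of the gate. */
typedef struct {
    pthread_mutex_t gate;
    int    inv_before_gate;    /* 1 = proposed ordering, 0 = original */
    int    num_free_blocks;    /* shared state, protected by gate */
    size_t bytes_inv_in_gate;  /* bytes processed while the gate was held */
} DemoHeap;

typedef struct {
    DemoHeap     *heap;
    unsigned char block[BLOCK_SIZE];  /* block owned by this worker */
} Worker;

static void demo_free(DemoHeap *heap, void *block, size_t size)
{
    if (heap->inv_before_gate) {
        /* Proposed ordering: per-block work touches no shared state,
         * so every worker can do it in parallel (memset stands in for
         * the cache invalidate). */
        memset(block, 0, size);
    }

    pthread_mutex_lock(&heap->gate);

    if (!heap->inv_before_gate) {
        /* Original ordering: the slow per-block work is serialized. */
        memset(block, 0, size);
        heap->bytes_inv_in_gate += size;
    }

    /* Only this bookkeeping really needs the gate. */
    heap->num_free_blocks++;

    pthread_mutex_unlock(&heap->gate);
}

static void *worker_main(void *arg)
{
    Worker *w = (Worker *)arg;
    demo_free(w->heap, w->block, sizeof(w->block));
    return NULL;
}

/* Frees one block per worker; returns the number of bytes processed
 * inside the gate and stores the final free-block count in *num_free. */
size_t run_demo(int inv_before_gate, int *num_free)
{
    static Worker workers[NUM_WORKERS];
    pthread_t threads[NUM_WORKERS];
    DemoHeap heap;
    int i, rc;

    pthread_mutex_init(&heap.gate, NULL);
    heap.inv_before_gate = inv_before_gate;
    heap.num_free_blocks = 0;
    heap.bytes_inv_in_gate = 0;

    for (i = 0; i < NUM_WORKERS; i++) {
        workers[i].heap = &heap;
        rc = pthread_create(&threads[i], NULL, worker_main, &workers[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (i = 0; i < NUM_WORKERS; i++) {
        pthread_join(threads[i], NULL);
    }

    pthread_mutex_destroy(&heap.gate);
    *num_free = heap.num_free_blocks;
    return heap.bytes_inv_in_gate;
}
```

Calling run_demo(0, &n) reports NUM_WORKERS * BLOCK_SIZE bytes processed under the gate, while run_demo(1, &n) reports 0, with the same free-block count in both cases. The sketch assumes that the freeing worker still owns its block exclusively until the block is handed back to the free list; that is the condition under which moving the work out of the gate is safe here.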

I changed the code in HeapBufMP.c as follows and it seems to work:

Void ti_sdo_ipc_heaps_HeapBufMP_free(ti_sdo_ipc_heaps_HeapBufMP_Object *obj, 
        Ptr block, SizeT size)
{
    IArg key;

    Assert_isTrue(((UInt32)block >= (UInt32)obj->buf) &&
        ((UInt32)block < ((UInt32)obj->buf + obj->blockSize * obj->numBlocks)),
        ti_sdo_ipc_heaps_HeapBufMP_A_invBlockFreed);
    
    /* Assert that 'block' is block-aligned */
    Assert_isTrue((UInt32)block % obj->align == 0,
            ti_sdo_ipc_heaps_HeapBufMP_A_badAlignment);

#if 1 // new version
    /*
     *  Invalidate entire block to make sure stale cache data isn't
     *  evicted later
     */
    if (obj->cacheEnabled) {
        Cache_inv(block, obj->attrs->blockSize, Cache_Type_ALL, TRUE);
    }
#endif

    /* Enter the gate */
    key = GateMP_enter((GateMP_Handle)obj->gate);

    ListMP_putTail((ListMP_Handle)obj->freeList, block);

    if (ti_sdo_ipc_heaps_HeapBufMP_trackAllocs) {
        /* Make sure the attrs are not in cache */
        if (obj->cacheEnabled) {
            Cache_inv(obj->attrs, sizeof(ti_sdo_ipc_heaps_HeapBufMP_Attrs), 
                    Cache_Type_ALL, TRUE);
        }

        obj->attrs->numFreeBlocks++;

        /* Make sure the attrs are written out to memory */
        if (obj->cacheEnabled) {
            Cache_wbInv(obj->attrs, sizeof(ti_sdo_ipc_heaps_HeapBufMP_Attrs), 
                    Cache_Type_ALL, TRUE);
        }
    }
    
#if 0 // original version
    /*
     *  Invalidate entire block to make sure stale cache data isn't
     *  evicted later
     */
    if (obj->cacheEnabled) {
        Cache_inv(block, obj->attrs->blockSize, Cache_Type_ALL, TRUE);
    }
#endif

    /* Leave the gate */
    GateMP_leave((GateMP_Handle)obj->gate, key);
}

HeapMemMP and maybe other modules seem to have the same behaviour.

Ralf

over 12 years ago

ScottG over 12 years ago

Ralf,

I checked into this a bit, and we don’t see any obvious reason why the cache invalidates need to be limited to one core at a time. I just filed an enhancement request (SDOCM00096671) to get this looked at in detail, and if there are indeed no issues, to make this change in a future release.

Thank you for reporting and suggesting this!

Scott
