Cache coherence problem in Shared Memory (C6678)

I have a problem with the shared memory of my C6678.

I'm sending data from another DSP via SRIO to the shared memory of my DSP, and I want to check that the data is correct.

This is the simple test function I'm using:

CRandGen::SetSeed(Tcount+1); // Pseudo-random generator
DstAddress = (UINT32*)(aSHARED_MEMORY_BASE + PktSize);
for (int j = 0; j < PktSize/4; j++)
{
    UINT32 Read  = DstAddress[j];
    UINT32 Check = CRandGen::Rand();
    if (Read != Check)
        Ecount++;
}

I've put a breakpoint on the line DstAddress = (UINT32*)(aSHARED_MEMORY_BASE + PktSize);

In the first run DstAddress = 0x0c000000 and I get no problems.

In the second run DstAddress = 0x0c004000, and when I reach the breakpoint the data in the shared memory is correct. But if I then single-step to the line UINT32 Read = DstAddress[j]; the value read is wrong, and in the watch window I see the data replaced with wrong values.

But if I deselect the L1D Cache combobox the data returns to the correct values. So the data in the cache is wrong, and the core uses these wrong values.

What could be causing this coherence problem?

  • Hi,

    This is normal: the C6678 doesn't maintain coherence between a core's caches and other masters (including the other cores), so you have to manage it yourself.

    You always have to write back (flush) data before another master reads it, and invalidate data that another master has written before you read it. This applies to the on-chip masters too.

    Note that the shared memory regions must be aligned to the cache line size (L1, or L2 if it is enabled).

  • Hi, thanks for your answer.

    I've added a write to the register rL1DWBINV before the check in shared memory, but the problem remains.

    This is what I get when I reach the line DstAddress = (UINT32*)(aSHARED_MEMORY_BASE); (screenshot not preserved)

    After executing UINT32 Read = DstAddress[j]; I get (screenshot not preserved)

    And after deselecting the combobox (screenshot not preserved)

  • Hi,

    Don't write back and invalidate: for reading, simply invalidate, and add an _mfence() after the invalidate command.

  • OK, thanks.

    Is there any documentation or datasheet where I can read about _mfence()?

    Is it a standard function for multicore environments?

  • Gabriele_Giannini said:

    OK, thanks.

    Is there any documentation or datasheet where I can read about _mfence()?

    Is it a standard function for multicore environments?

    It is a C6000 compiler intrinsic that maps to the assembler instruction MFENCE. See the compiler manual and the CPU instruction set guide (SPRUGH7).

    It is not specific to multicore. It "stalls the instruction fetch pipeline until the memory system busy flag goes low".

    There is a silicon bug that requires "the memory system be idle during the block coherence operations" (see SPRZ334E).

    The workaround suggests disabling interrupts during the coherence operation, waiting for the operation to complete (MFENCE), and then executing 16 NOPs.

  • Thanks. So, correct me if I'm wrong, before the part of the code where I check the content of the shared memory I should add:

    //To initiate a global invalidation operation, write a 1 to the I bit of the L1DINV register.
    rL1DINV |= 1;

    //Poll this bit to detect the completion of the operation (Must return to 0)
    while (rL1DINV & 1) ;

    //stall until the completion of all the CPU-triggered memory transactions
    _mfence();

    (Actually I'm not using interrupts, so I don't need to disable them.)

    Right?

  • The fact is that the code is now not working.

    I'm inside a function, and the variable PktSize becomes corrupted after the cache invalidate operation, so the for loop doesn't work correctly.

  • Doing a global L1D invalidate operation is a bad idea, because every cached line, including modified data that has not been written back yet, is discarded at once.
    Instead, you can use a global L1D writeback-invalidate, or a block L1D invalidate only on the data buffer.

    There is also a single-MFENCE issue (Silicon Advisory 27). Therefore, you should use two _mfence() instructions.

    Ralf

  • Gabriele_Giannini said:

    The fact is that the code is now not working.

    I'm inside a function, and the variable PktSize becomes corrupted after the cache invalidate operation, so the for loop doesn't work correctly.

    Sorry, I forgot to say not to use global invalidation, since it affects all your cached data and not only the shared area.

    You have to use the block cache operation, that is:

    L1DIBAR = aSHARED_MEMORY_BASE + PktSize

    L1DIWC = PktSize/sizeof(uint32_t)

    Always keep in mind that all cache operations work on whole cache lines, so the memory region reserved for the communication should be aligned to the cache line size and have a size that is a multiple of it (128 bytes for L2). Otherwise the linker could place some local variable in a cache line that you are going to invalidate.
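The alignment point above comes down to a little address arithmetic, sketched here. The names `cache_span_t`, `cache_span` and `buffer_is_line_exact` are hypothetical helpers, not part of any TI library, and the line size is a parameter (128 bytes for L2 as stated above; check the CorePack guide for the L1D line size).

```cpp
#include <stdint.h>

typedef struct {
    uint32_t base;   /* first byte of the first cache line touched        */
    uint32_t words;  /* span length in 32-bit words, whole lines only     */
} cache_span_t;

/* Widen [addr, addr + bytes) to whole cache lines.
   line must be a power of two, e.g. 128. */
static cache_span_t cache_span(uint32_t addr, uint32_t bytes, uint32_t line)
{
    uint32_t mask = line - 1u;
    cache_span_t s;
    s.base  = addr & ~mask;                                      /* round start down */
    s.words = (((addr + bytes + mask) & ~mask) - s.base) / 4u;   /* round end up     */
    return s;
}

/* Nonzero when the buffer starts and ends exactly on line boundaries, so
   invalidating it cannot discard unrelated data that shares a line. */
static int buffer_is_line_exact(uint32_t addr, uint32_t bytes, uint32_t line)
{
    uint32_t mask = line - 1u;
    return ((addr & mask) == 0u) && ((bytes & mask) == 0u);
}
```

For example, with an illustrative 0x4000-byte packet at 0x0C004000 and 128-byte lines, `cache_span` gives base 0x0C004000 and 0x1000 words, and `buffer_is_line_exact` is true. A 100-byte buffer at 0x0C004010 is widened to a full 128-byte line (32 words), which is exactly the case where a neighbouring variable could be hit by the invalidate.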

  • Thanks, i will try it.
    One of my colleagues has asked whether it's possible to completely disable caching of the shared memory; it would also be useful for further developments in the firmware.

    Is it possible?

  • Gabriele_Giannini said:

    Thanks, i will try it.
    One of my colleagues has asked whether it's possible to completely disable caching of the shared memory; it would also be useful for further developments in the firmware.

    Is it possible?

    It is not possible to disable caching of MSMC memory at its default physical address.

    To get a non-cacheable view of it, you have to remap it to another logical address by means of the MPAX registers, and mark that area as non-cacheable by means of the MAR registers. See the C66x CorePack User Guide (SPRUGW0A).

    The external device should use the physical address (that is, the SRIO direct transfer has to use address 0x0Cxxxxxx), while the CPU can use the logical address.
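As a rough illustration of the MAR side only (the MPAX programming is not shown), here is how one might work out which MAR register covers a given logical address. The assumptions are mine and should be verified against SPRUGW0: one 32-bit MAR per 16 MB of logical address space, laid out consecutively from 0x01848000. The alias address 0xA0000000 used in the example is purely hypothetical.

```cpp
#include <stdint.h>

#define MAR_BASE_ADDR    0x01848000u  /* assumed address of MAR0; verify in SPRUGW0 */
#define MAR_WINDOW_LOG2  24u          /* assumed 16 MB of address space per MAR     */

/* Index of the MAR register that covers a logical address. */
static uint32_t mar_index(uint32_t logical_addr)
{
    return logical_addr >> MAR_WINDOW_LOG2;
}

/* Address of that MAR register, assuming consecutive 32-bit registers. */
static uint32_t mar_reg_addr(uint32_t logical_addr)
{
    return MAR_BASE_ADDR + 4u * mar_index(logical_addr);
}
```

Under these assumptions an alias at 0xA0000000 would be governed by MAR160, while the default MSMC address 0x0C000000 falls under MAR12, whose caching cannot be turned off according to the reply above.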

  • Thanks, we'll think about whether it is worth using this method and remapping the shared memory.

    However, I've added this to my code:

    rL1DIBAR= aSHARED_MEMORY_BASE + PktSize;
    rL1DIWC = PktSize/sizeof(UINT32);
    _mfence();

    but when I execute UINT32 Read = DstAddress[j]; the first 16 words read from shared memory still have wrong values.

  • Try invalidating the prefetch buffer as well: XPFCMD = 1, where XPFCMD is:

      (*(volatile unsigned int*)0x08000300)

    (See the C66x CorePack User Guide.)
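Putting the pieces of this thread together, the read-side preparation might look like the sketch below. Several things here are assumptions: `coherence_regs_t` and `prepare_shared_read` are hypothetical names, the registers are passed in as pointers so the routine can also be run off-target, and the thread does not say where the prefetch invalidate belongs relative to the fences, so its position is a guess. On the target the pointers would refer to L1DIBAR, L1DIWC and XPFCMD (0x08000300 per the reply above).

```cpp
#include <stdint.h>

#ifdef _TMS320C6X            /* TI C6000 compiler */
#include <c6x.h>
#define MEM_FENCE() _mfence()
#else
#define MEM_FENCE() ((void)0) /* off-target build: no-op stand-in */
#endif

/* Hypothetical grouping of the three registers named in the thread. */
typedef struct {
    volatile uint32_t *l1dibar;  /* L1D block invalidate base address */
    volatile uint32_t *l1diwc;   /* L1D block invalidate word count   */
    volatile uint32_t *xpfcmd;   /* prefetch buffer command register  */
} coherence_regs_t;

/* Call before the CPU reads a buffer that another master (e.g. SRIO) wrote.
   addr and bytes should be cache-line aligned. */
static void prepare_shared_read(const coherence_regs_t *r,
                                uint32_t addr, uint32_t bytes)
{
    *r->l1dibar = addr;          /* start of the buffer                       */
    *r->l1diwc  = bytes / 4u;    /* length in 32-bit words; starts the block  */
    *r->xpfcmd  = 1u;            /* invalidate the prefetch buffer            */
    MEM_FENCE();                 /* two fences, per the MFENCE advisory       */
    MEM_FENCE();
}
```

The silicon advisory's extra steps (interrupts disabled, 16 NOPs) are left out here to keep the sketch short; they would wrap this call.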

  • Thanks! Now it's working correctly