MSMC cache



Hi,

I'm probably having a cache coherency issue on the C6678 (sometimes I read wrong values).

I want eight cores to access a shared variable in MSMC. Could you confirm that the following procedure is correct?

I set the cache as follows in the platform settings:
L2 cache= 32K
L1D cache = 32K
L1P cache = 32K

Variable definition as follows (pseudo code):

#pragma DATA_ALIGN(x, 128)
#pragma DATA_SECTION(x, ".MSharedSram")
var x;

I pad the variable so that sizeof(x) = 128.
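The padding idea can be checked on a host PC with a minimal sketch like the one below, assuming a C11 compiler. The type name `shared_state_t` and its fields are hypothetical, and `alignas` only stands in for the alignment part; on the C6678 the placement in MSMC would still come from the TI `DATA_SECTION` pragma and the linker command file. The 128-byte figure matches the C66x L2 line size (L1D lines are 64 bytes), so a padded and aligned object does not share a cache line with any other variable.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define SHARED_LINE_BYTES 128u  /* C66x L2 cache line size; L1D lines are 64 bytes */

/* Hypothetical layout: a counter plus payload, padded to one full line. */
typedef struct {
    uint32_t counter;
    uint32_t payload[7];
    uint8_t  pad[SHARED_LINE_BYTES - 8u * sizeof(uint32_t)];
} shared_state_t;

/* Fail the build if the padding arithmetic drifts. */
_Static_assert(sizeof(shared_state_t) == SHARED_LINE_BYTES,
               "shared_state_t must fill exactly one line");

/* volatile: the compiler must perform every access instead of reusing a register copy. */
static alignas(SHARED_LINE_BYTES) volatile shared_state_t x;

/* Returns nonzero if the object starts on a line boundary. */
static int shared_is_line_aligned(void)
{
    return ((uintptr_t)&x % SHARED_LINE_BYTES) == 0u;
}
```

Note that `volatile` only constrains the compiler; it does not make the hardware caches coherent, so the invalidate and writeback calls are still needed when L1D is enabled.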

To modify the variable (pseudo code):

#define MY_SEM (17)

{
while ((CSL_semAcquireDirect (MY_SEM)) == 0);

CACHE_invL1d ((void*)&x, sizeof(x), CACHE_WAIT);
CACHE_invL2 ((void*)&x, sizeof(x), CACHE_WAIT);

x= x + 1; //write to x

CACHE_wbL1d ((void*)&x, sizeof(x), CACHE_WAIT);
CACHE_wbL2 ((void*)&x, sizeof(x), CACHE_WAIT);

CSL_semReleaseSemaphore (MY_SEM);
}

Is it OK?

Thanks

Fabio

  • Hi Fabio,

    I've notified the design team. Their feedback will be posted here.

    Best Regards,
    Yordan
  • The code looks OK. I am not sure that the cache is the problem, so let's test it.

    Disable the cache (that is, set the cache size to 0) for both L1D and L2 and run the code again. See if the problem repeats itself.

    If it does, then it is not the cache and you continue to debug.

    If the problem disappears without cache, we will see what to do next.

    Two comments:

    MSMC memory is considered L2, so it is not cached by L2, only by L1D (unless you play games, and I do not think you do).

    Make sure that the optimization is OFF and that x is defined as volatile.

    Ran

  • Hi

    Sorry, I will get back to you as soon as possible with any update on this

    Fabio

  • Hi

    I confirm that x is defined as volatile.
    Optimization is set to 3 (interprocedure optimizations).

    Extra info:
    - my shared variable is a struct of size 128 bytes containing a counter and other stuff
    - core0 updates this variable every 2 ms (counter++) and wakes up the other cores using Notify_sendEvent; the other cores read the shared variable, acquire some data from HyperLink and then start processing the acquired data
    - TOTAL data acquisition from HyperLink + TOTAL processing take about 1.8-1.9 ms, so obviously I must finish processing before the shared variable is updated again
    - when I say that I sometimes read wrong values, I mean that I read an OLD value (for example I expect counter=1000, I read counter=999)

    I hope I made myself clear

    I did the following test:

    L2 cache = 0 (as suggested)
    L1D cache = 0 (as suggested)
    L1P cache = 32K
    optimization changed from 3 (interprocedure optimizations) to OFF (as suggested)
    CLEAR/REBUILD ALL

    --> The problem disappeared (it ran for 2 hours), but this test is not very reliable because setting optimization to OFF changes
    all the processing timing: acquisition + processing now take about 25 ms (instead of 1.9 ms),
    so I had to change the update period from 2 ms to 25 ms.

    Why did you suggest turning optimization OFF? Is it a problem to have optimization = 3?

    So I tried the following:
    L2 cache= 0
    L1D cache = 0
    L1P cache = 32K
    optimization back to 3
    updating time 2 ms
    CLEAR/REBUILD ALL

    --> The problem is present again.

    If I were too slow in processing the data, I would expect something like:
    "expected counter=1000, read counter=1001" (meaning that I missed an update).

    Instead, as already said, I have "expected counter=1000, read counter=999" (meaning that I read OLD data).
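One way an "expected 1000, read 999" symptom can arise is a reader that still holds a private cached copy. The sketch below is a toy model in plain C that runs on a host; it is not C66x code, and every name in it (`toy_cache_t`, `toy_read`, `toy_invalidate`, `toy_increment_and_writeback`) is hypothetical. It only illustrates the mechanism and does not prove that this is what happens in the application above.

```c
#include <stdint.h>

/* Toy model, not C66x code: one word of shared memory and a per-core cached copy. */
typedef struct {
    uint32_t cached;   /* this core's private copy of the counter */
    int      valid;    /* nonzero while the private copy may be used */
} toy_cache_t;

static uint32_t shared_counter;   /* stands in for the variable in MSMC */

/* Read through the cache: refill from shared memory only on a miss. */
static uint32_t toy_read(toy_cache_t *c)
{
    if (!c->valid) {
        c->cached = shared_counter;
        c->valid = 1;
    }
    return c->cached;
}

/* Invalidate: drop the private copy so the next read refetches. */
static void toy_invalidate(toy_cache_t *c)
{
    c->valid = 0;
}

/* Writer side: refetch, increment the private copy, then write it back. */
static void toy_increment_and_writeback(toy_cache_t *c)
{
    toy_invalidate(c);
    c->cached = toy_read(c) + 1u;
    shared_counter = c->cached;
}
```

In this model, a reader that cached 999 keeps returning 999 after the writer has stored 1000, and only sees 1000 once `toy_invalidate` has been called on its own copy. A reader that is merely too slow would instead see a newer value than expected.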

    Do you think it is a cache problem, or do I have a bug in my code?

    Fabio
  • Fabio

    The reason why I asked you to turn off optimization is similar to the reason why I asked you to disable the cache: to eliminate possible causes of the problem.

    That is, I wondered if somehow the compiler eliminates accesses to x because it is not volatile somewhere. The optimizer gets rid of what it perceives as "dead code", namely code that does not do anything. You know what I mean.

    I do not think it is a cache problem. (Why lose just one value? If the cache is not updated you will stay with the original value, right?) I do think that this is a timing issue: when you change the timing, the problem disappears.

    So go back to the drawing board and analyze again all the data movements and the delays that are associated with them. How do you read the data? EDMA or CPU? Do you account for the overhead of the peripherals?
    My rule of thumb: make sure that no bandwidth is more than 50% of the theoretical maximum.

    I hope it helps

    Ran
  • Hi

    I read data from HyperLink using EDMA.

    OK, I will investigate further...

    Thank you very much

    Fabio

  • Hi

    Sorry, but something strange is happening...

    I did the following:

    CACHE DISABLED, Optimization=3

    To avoid timing issues, I set the update period to 500 ms,
    then I commented out the acquisition and processing code, so now my application does nothing except:
    - core0 updates the variable every 500 ms
    - the other cores read the variable and check that the counter is correctly incremented ----> NO PROBLEM

    Then I commented out cache_invalidate and cache_writeback, because I don't need them anymore since the cache is disabled, right?

    And what happens is that the read FAILS: Core1 to Core7 read OLD values again.

    Any idea?

    Fabio
  • additional info:

    MSMC is also used to store code: all 8 cores run the same code stored in MSMC. Is that a problem?

    Fabio
  • This issue is getting more and more interesting.

    I can think of some possible issues here:

    1. You think you disabled the cache, but you really have not. So when any core starts, print the value of the cache configuration register and verify that the L1D cache is indeed disabled.

    2. Before core 0 sends the first interrupt (Notify) to all other cores, all other cores are idle, right?

    3. The fact that it happens sometimes and not every time bothers me. I suggest one more experiment. Move the data to DDR instead of MSMC memory, and set the MAR register for this area to zero (no cache, no prefetch). Then repeat the experiment with and without the cache invalidate and writeback. Let's see what you see.
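For the DDR experiment, finding the right MAR register is simple arithmetic. The host-side sketch below assumes the C66x CorePac layout as I understand it: MAR0 at 0x01848000, one 32-bit register per 16 MB window, with bit 0 (PC) controlling cacheability and bit 3 (PFX) controlling prefetch. The helper names are hypothetical, and the addresses should be checked against the CorePac user guide before anything is written on the target.

```c
#include <stdint.h>

#define MAR_BASE      0x01848000u  /* assumed address of MAR0 in the CorePac register map */
#define MAR_SPAN_LOG2 24u          /* each MAR covers a 16 MB window (2^24 bytes) */

/* Index of the MAR register that controls the window containing addr. */
static uint32_t mar_index(uint32_t addr)
{
    return addr >> MAR_SPAN_LOG2;
}

/* Memory-mapped address of that MAR register (registers are 4 bytes apart). */
static uint32_t mar_reg_addr(uint32_t addr)
{
    return MAR_BASE + 4u * mar_index(addr);
}
```

For example, a buffer at 0x80000000 (assumed here to be the start of DDR3) falls in MAR128, whose register sits at 0x01848200 under these assumptions.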

    Ran

  • Hi
    You are right. I disabled the cache using the platform settings, but after boot I find:

    CACHE_L1PCFG = 4 //32kB
    CACHE_L1DCFG = 4 //32kB
    CACHE_L2CFG = 1 //32kB
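The register values above can be turned into sizes with a small lookup, sketched here as plain C that runs on a host. The function names are hypothetical, and the tables are my reading of the mode field (bits 2:0) of L1PCFG/L1DCFG and L2CFG; the entries for 4 and 1 agree with the 32 kB comments above. The handling of L2 modes above 5 is an assumption made for this sketch, not something confirmed in this thread.

```c
#include <stdint.h>

/* L1PCFG / L1DCFG mode field (bits 2:0) to cache size in KB.
   Modes 5..7 are treated as the 32 KB maximum. */
static unsigned l1_mode_to_kb(uint32_t cfg)
{
    static const unsigned kb[8] = { 0, 4, 8, 16, 32, 32, 32, 32 };
    return kb[cfg & 0x7u];
}

/* L2CFG mode field (bits 2:0) to cache size in KB.
   Assumption: modes above 5 are clamped to the 512 KB local L2 of the C6678. */
static unsigned l2_mode_to_kb(uint32_t cfg)
{
    static const unsigned kb[8] = { 0, 32, 64, 128, 256, 512, 512, 512 };
    return kb[cfg & 0x7u];
}
```

A result of 0 from both functions is what a build with the caches really disabled should report on every core.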

    I'm looking into the code and I can't find what writes those registers...

    I will investigate and get back to you.
    Thank you
  • Sorry, forget my previous post, I used the wrong SW project.

    Fabio

  • Hi

    Found the problem.
    After changing the cache in the platform settings, a project clear all and rebuild all is highly recommended!!

    Fabio