MSMC cache

Fabio Cipriani

Intellectual 910 points

I'm probably having an issue with cache coherency on the C6678 (sometimes I read wrong values).

I want eight cores to access a shared variable in MSMC. Could you confirm that the following procedure is correct?

I set the cache as follows in the platform settings:
L2 cache= 32K
L1D cache = 32K
L1P cache = 32K

Variable definition as follows (pseudo code):

#pragma DATA_ALIGN(x, 128)
#pragma DATA_SECTION(x, ".MSharedSram")
var x;

I pad the variable so that sizeof(x) = 128.
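The padding described above can be checked at compile time. This is only a minimal sketch: the struct name and its single payload field are hypothetical (the post only says the variable is padded to 128 bytes), and the TI placement pragmas are left out so it builds on any C11 compiler.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

#define SHARED_LINE_BYTES 128u  /* same value as the DATA_ALIGN above */

/* Hypothetical layout: one payload word, then explicit padding. */
typedef struct {
    uint32_t value;
    uint8_t  pad[SHARED_LINE_BYTES - sizeof(uint32_t)];
} shared_line_t;

/* Fails the build if someone adds a field without shrinking the padding. */
static_assert(sizeof(shared_line_t) == SHARED_LINE_BYTES,
              "shared_line_t must be exactly 128 bytes");
```

With the size pinned like this, an invalidate or write-back of sizeof(x) bytes covers the variable and nothing else that happens to sit next to it.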

To modify the variable (pseudo code):

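#define MY_SEM (17)

{
    /* Spin until the hardware semaphore is acquired */
    while ((CSL_semAcquireDirect (MY_SEM)) == 0);

    /* Discard any cached copy so the read below fetches the current value */
    CACHE_invL1d ((void*)&x, sizeof(x), CACHE_WAIT);
    CACHE_invL2 ((void*)&x, sizeof(x), CACHE_WAIT);

    x = x + 1; // write to x

    /* Write the updated value back out to memory */
    CACHE_wbL1d ((void*)&x, sizeof(x), CACHE_WAIT);
    CACHE_wbL2 ((void*)&x, sizeof(x), CACHE_WAIT);

    CSL_semReleaseSemaphore (MY_SEM);
}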
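To illustrate why the invalidate and write-back calls are paired this way, here is a host-side toy model. It is not C6678 code: every name in it is hypothetical, the semaphore is omitted, and the "cache" is just one private copy per core (the real L1D allocation policy is more involved). It only shows that a reader which skips the invalidate keeps seeing its old copy.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_CORES 8

/* Toy model: one shared word plus one private cached copy per core. */
static uint32_t shared_mem;          /* stands in for x in MSMC       */
static uint32_t cached[NUM_CORES];   /* per-core private copy         */
static bool     valid[NUM_CORES];    /* is the private copy usable?   */

/* Drop the private copy so the next read refetches (models an invalidate). */
static void model_invalidate(int core) { valid[core] = false; }

/* Push the private copy to shared memory (models a write-back). */
static void model_writeback(int core) {
    if (valid[core]) shared_mem = cached[core];
}

/* Read through the private copy, fetching from shared memory on a miss. */
static uint32_t model_read(int core) {
    if (!valid[core]) { cached[core] = shared_mem; valid[core] = true; }
    return cached[core];
}

/* Write into the private copy only; shared memory is not touched yet. */
static void model_write(int core, uint32_t v) {
    cached[core] = v;
    valid[core]  = true;
}

/* Writer side of the sequence in the post: invalidate, modify, write back. */
static void model_increment(int core) {
    model_invalidate(core);
    model_write(core, model_read(core) + 1u);
    model_writeback(core);
}
```

In this model, if core 1 has already read the value and core 0 then calls model_increment(0), core 1 keeps returning the old value from model_read(1) until it calls model_invalidate(1).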

Is it OK?

Thanks

Fabio

over 8 years ago

0 Yordan Kovachev over 8 years ago

TI Guru 161600 points

Hi Fabio,

I've notified the design team. Their feedback will be posted here.

Best Regards,
Yordan

0 ran35366 over 8 years ago in reply to Yordan Kovachev

TI Genius 12805 points

The code looks OK. I am not sure that the cache is the problem, so let's test it.

Disable the cache (that is, set the cache size to 0) for both L1D and L2 and run the code again. See if the problem repeats itself.

If it does, then it is not the cache, and you continue to debug.

If the problem disappears without the cache, we will see what to do next.

Two comments:

MSMC memory is considered L2, so it is not cached by L2, only by L1D (unless you play games, which I do not think you do).

Make sure that optimization is OFF and that x is defined as volatile.

Ran

0 Fabio Cipriani over 8 years ago in reply to ran35366

Intellectual 910 points

Sorry, I will get back to you as soon as possible with an update on this.

Fabio

0 Fabio Cipriani over 8 years ago in reply to Fabio Cipriani

Intellectual 910 points

Hi

I confirm that x is defined as volatile.
Optimization is set to 3 (interprocedure optimizations).

Extra info:
- My shared variable is a 128-byte struct containing a counter and other fields.
- Core0 updates this variable every 2 ms (counter++) and wakes up the other cores using Notify_sendEvent. The other cores read the shared variable, acquire some data from HyperLink, and then start processing the acquired data.
- TOTAL data acquisition from HyperLink + TOTAL processing function take about 1.8-1.9 ms. Obviously I must finish processing before the shared variable is updated again.
- When I say that I sometimes read wrong values, I mean that I read an OLD value (for example, I expect counter=1000 but read counter=999).

I hope I made myself clear.

I ran the following test:

L2 cache = 0 (as suggested)
L1D cache = 0 (as suggested)
L1P cache = 32K
optimization changed from 3 (interprocedure optimizations) to OFF (as suggested)
CLEAR/REBUILD ALL

--> The problem disappeared (it ran for 2 hours), but this test is not very reliable because setting optimization to OFF changes all the processing timing: acquisition + processing now take about 25 ms (instead of 1.9 ms), so I had to change the update period from 2 ms to 25 ms.

Why did you suggest turning optimization OFF? Is it a problem to have optimization = 3?

So I tried the following:
L2 cache = 0
L1D cache = 0
L1P cache = 32K
optimization back to 3
update period 2 ms
CLEAR/REBUILD ALL

--> The problem is present again.

If I were too slow in processing the data, I would expect something like:
"expected counter=1000, read counter=1001" (meaning that I missed an update).

Instead, as already said, I get "expected counter=1000, read counter=999" (meaning that I read OLD data).
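The distinction drawn above (an old value versus a missed update) can be made explicit in the check each reader core performs. A minimal sketch, with a hypothetical enum and function name; it assumes the counter is a 32-bit unsigned value and that the two values are never more than 2^31 apart:

```c
#include <stdint.h>

typedef enum {
    READ_OK,             /* observed == expected                     */
    READ_STALE,          /* observed is behind: an OLD value was read */
    READ_MISSED_UPDATE   /* observed is ahead: the reader was too slow */
} read_status_t;

/* Compare the counter a reader expected with the one it actually read. */
static read_status_t classify_read(uint32_t expected, uint32_t observed) {
    /* Unsigned subtraction then signed view keeps this correct across wrap. */
    int32_t delta = (int32_t)(observed - expected);
    if (delta == 0) {
        return READ_OK;
    }
    return (delta < 0) ? READ_STALE : READ_MISSED_UPDATE;
}
```

For the case described here, classify_read(1000, 999) yields READ_STALE, while a reader that was merely too slow would produce READ_MISSED_UPDATE.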

Do you think it is a cache problem, or do I have a bug in my code?

Fabio

0 ran35366 over 8 years ago in reply to Fabio Cipriani

TI Genius 12805 points

Fabio

The reason I asked you to turn off optimization is similar to the reason I asked you to disable the cache: to eliminate possible causes of the problem.

That is, I wondered whether the compiler somehow eliminates accesses to x because it is not volatile somewhere. The optimizer gets rid of what it perceives as "dead code", namely code that does not do anything. You know what I mean.

I do not think it is a cache problem. (Why lose only one value? If the cache is not updated you will stay with the original value, right?) I do think that this is a timing issue: when you change the timing, the problem disappears.

So go back to the drawing board and analyze again all the data movements and the delays associated with them. How do you read the data, with EDMA or with the CPU? Do you account for the overhead of the peripherals?
My rule of thumb: make sure that no bandwidth is more than 50% of the theoretical value.
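As a rough illustration of that kind of headroom check, here is a small sketch. The rule above is stated for bandwidth; applying the same arithmetic to the time budget quoted earlier in the thread (about 1.9 ms of work in a 2 ms period) is an assumption made for this example, and the function names are hypothetical.

```c
/* Percentage of the period that is consumed, rounded down. */
static unsigned utilization_percent(unsigned busy_us, unsigned period_us) {
    return (busy_us * 100u) / period_us;
}

/* Returns 1 if the load stays at or below limit_percent, 0 otherwise. */
static int within_headroom(unsigned busy_us, unsigned period_us,
                           unsigned limit_percent) {
    return utilization_percent(busy_us, period_us) <= limit_percent;
}
```

With the figures from the thread, utilization_percent(1900, 2000) comes out at 95, well above a 50% limit, so there is very little slack for peripheral overhead or jitter.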

I hope it helps

Ran

0 Fabio Cipriani over 8 years ago in reply to ran35366

Intellectual 910 points

I read data from HyperLink using EDMA.

OK, I will investigate further.

Thank you very much

Fabio

0 Fabio Cipriani over 8 years ago in reply to Fabio Cipriani

Intellectual 910 points

Hi

Sorry, but something strange is happening.

I did the following:

CACHE DISABLED, Optimization=3

To avoid timing issues, I set the update period to 500 ms.
Then I commented out the acquisition and processing code, so now my application does nothing except:
- core0 updates the variable every 500 ms
- the other cores read the variable and check that the counter is correctly incremented ----> NO PROBLEM

Then I commented out cache_invalidate and cache_writeback, because I don't need them anymore since the cache is disabled, right?

And what happens is that the read FAILS: Core1 through Core7 read OLD values again.

Any idea?

Fabio

0 Fabio Cipriani over 8 years ago in reply to Fabio Cipriani

Intellectual 910 points

Additional info:

MSMC is also used to store code. All 8 cores run the same code stored in MSMC. Is that a problem?

Fabio

0 ran35366 over 8 years ago in reply to Fabio Cipriani

TI Genius 12805 points

This issue is getting more and more interesting.

I can think of some possible issues here:

1. You think you disabled the cache, but you really have not. So when any core starts, print the value of the cache configuration register and verify that the L1D cache is indeed disabled.

2. Before core 0 sends the first interrupt (Notify) to all the other cores, all the other cores are idle, right?

3. The fact that it happens sometimes and not every time bothers me. I suggest one more experiment. Move the data to DDR instead of MSMC memory, and set the MAR register for this area to zero (no cache, no prefetch). Then repeat the experiment with and without the cache invalidate and write-back. Let's see what you see.
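For the DDR experiment in point 3, the sketch below works out which MAR register covers a given address and what value leaves caching and prefetch off. It is host-side arithmetic only and does not touch hardware. The constants (MAR0 at 0x01848000, one MAR per 16 MB, PC in bit 0, PFX in bit 3) are as I recall them from the C66x CorePac documentation, so please verify them against the user guide; the function names are hypothetical.

```c
#include <stdint.h>

#define MAR_BASE_ADDR   0x01848000u  /* MAR0 (assumed from CorePac docs) */
#define MAR_REGION_BITS 24u          /* each MAR covers 16 MB            */
#define MAR_PC_BIT      (1u << 0)    /* permit caching                   */
#define MAR_PFX_BIT     (1u << 3)    /* permit prefetch                  */

/* Index of the MAR register that controls the region containing addr. */
static uint32_t mar_index(uint32_t addr) {
    return addr >> MAR_REGION_BITS;
}

/* Memory-mapped address of that MAR register. */
static uint32_t mar_reg_addr(uint32_t addr) {
    return MAR_BASE_ADDR + 4u * mar_index(addr);
}

/* Value to write so the region is neither cacheable nor prefetchable. */
static uint32_t mar_value_uncached(uint32_t current) {
    return current & ~(MAR_PC_BIT | MAR_PFX_BIT);
}
```

For a buffer at the start of DDR3 (0x80000000 on the C6678), this gives MAR index 128.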

Ran

0 Fabio Cipriani over 8 years ago in reply to ran35366

Intellectual 910 points

Hi
You are right. I disabled the cache using the platform settings, but after boot I find:

CACHE_L1PCFG = 4 //32kB
CACHE_L1DCFG = 4 //32kB
CACHE_L2CFG = 1 //32kB
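The register values above can be turned into sizes with a small decoder, which is handy when printing them at startup. This is a sketch with hypothetical function names; the mode encodings are as I understand them from the C66x CorePac documentation (low 3 bits of each register), and L2 modes 6 and 7 are deliberately left undecoded because their meaning depends on the device.

```c
#include <stdint.h>

/* L1P/L1D cache size in KB from the mode field of L1PCFG/L1DCFG. */
static unsigned l1_mode_to_kb(uint32_t cfg) {
    /* Modes 5 to 7 select the maximum size, which is 32 KB for L1. */
    static const unsigned kb[8] = {0, 4, 8, 16, 32, 32, 32, 32};
    return kb[cfg & 0x7u];
}

/* L2 cache size in KB from the L2MODE field of L2CFG, or -1 if not decoded. */
static int l2_mode_to_kb(uint32_t cfg) {
    static const int kb[6] = {0, 32, 64, 128, 256, 512};
    uint32_t mode = cfg & 0x7u;
    return (mode < 6u) ? kb[mode] : -1;
}
```

With the values listed above, l1_mode_to_kb(4) gives 32 and l2_mode_to_kb(1) gives 32, matching the 32kB comments; a truly disabled cache would read back as mode 0.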

I'm looking into the code and I can't find what writes those registers.

I will investigate and get back to you.
Thank you

0 Fabio Cipriani over 8 years ago in reply to Fabio Cipriani

Intellectual 910 points

Sorry, forget my previous post. I used the wrong SW project.

Fabio

0 Fabio Cipriani over 8 years ago in reply to Fabio Cipriani

Intellectual 910 points

Hi

Found the problem.
After changing the cache in the platform settings, a clean and rebuild all of the project is highly recommended!

Fabio
