AM5728: How to check L1 cache Coherency?

Hisao Uchikoshi

Part Number: AM5728

Hi Experts,

We want to confirm L1 cache coherency. When we write a variable that resides in main memory, it sometimes does not hold the expected value afterwards. We want to determine whether the variable is still in L1 and has not yet been written to main memory. The variable is shared by both CPU cores, and each core accesses it independently.

This may be unclear, so if you have any questions, please let me know.

Regards,

Uchikoshi

over 4 years ago

0 Richard Woodruff over 4 years ago

TI__Mastermind 20275 points

Hello Uchikoshi-san,

Which L1 cache are you asking about? Multiple cores on the AM5728 have some concept of an L1 cache.

I'll assume you are talking about the L1 caches of the dual Cortex-A15. The A15 L1s are coherent (snoop tracked) with respect to accesses originated by the other A15. Both of these cores are in the same 'inner' shared domain. Some programming procedures can still be needed to ensure the required visibility semantics. An authoritative source of examples on how to maintain coherency between cached and non-cached observers is ARM's Barrier Litmus Tests and Cookbook:

http://infocenter.arm.com/help/topic/com.arm.doc.genc007826/Barrier_Litmus_Tests_and_Cookbook_A08.pdf

Coherency is not automatically maintained between the different cores across the chip. It is software's responsibility to ensure proper cache maintenance between initiators (a core or a DMA engine). Sometimes this is done with cache maintenance instructions; other times it is done by marking the memory as non-cached from all views and ensuring that any on-path buffers have been flushed.
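As a rough illustration of the cache maintenance approach: maintenance by address works on whole cache lines, so a buffer shared with a DMA engine or another core has to be walked line by line. The sketch below only shows that range arithmetic. It assumes 64-byte lines (the Cortex-A15 L1 line size), and `for_each_cache_line`, `line_op_fn`, and `count_line` are hypothetical names made up for this example, not TI or ARM APIs. On the target, the per-line hook would issue a privileged ARMv7 operation such as DCCMVAC (clean) or DCIMVAC (invalidate), followed by a DSB once the loop is done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte lines, as in the Cortex-A15 L1 data cache. */
#define CACHE_LINE_SIZE 64u

/* Hypothetical hook: performs one per-line maintenance operation
 * (clean, invalidate, or clean+invalidate) on the line at line_addr. */
typedef void (*line_op_fn)(uintptr_t line_addr, void *ctx);

/* Applies op to every cache line that overlaps [addr, addr + len).
 * Returns the number of lines visited. */
static size_t for_each_cache_line(uintptr_t addr, size_t len,
                                  line_op_fn op, void *ctx)
{
    assert(op != NULL);
    if (len == 0)
        return 0;

    uintptr_t mask  = (uintptr_t)CACHE_LINE_SIZE - 1u;
    uintptr_t first = addr & ~mask;               /* round the start down */
    uintptr_t last  = (addr + len - 1u) & ~mask;  /* line holding the last byte */
    size_t count = 0;

    for (uintptr_t line = first; ; line += CACHE_LINE_SIZE) {
        op(line, ctx);
        count++;
        if (line == last)
            break;
    }
    return count;
}

/* Stand-in operation for host-side testing: only counts the calls. */
static void count_line(uintptr_t line_addr, void *ctx)
{
    (void)line_addr;
    (*(size_t *)ctx)++;
}
```

A buffer that does not start or end on a line boundary shares its first or last line with neighbouring data, which is why shared buffers are usually aligned and padded to the line size.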

On the A15 side, the JTAG debugger can sometimes help you work out whether something is in cache or not. While in debug mode, the allocations are frozen. This allows dumping from multiple perspectives to help build a snapshot of the value of a location. Typically you enable/disable the on-path MMU or use a non-cached master like the DAP to read the raw contents.

Regards,

Richard W.

0 Hisao Uchikoshi over 4 years ago in reply to Richard Woodruff

TI__Genius 14300 points

Hi Richard,

Thank you for your reply.

My first question was not accurate. Sometimes, after Core0 writes the data and Core1 then reads it independently, Core1 does not see the value that was written. I understand that the CA15 L1s are coherent and that no software method is required. To be sure, we want to read the L1 data via JTAG. Could you please explain how we can read it?

Regards

Uchikoshi

0 Richard Woodruff over 4 years ago in reply to Hisao Uchikoshi

TI__Mastermind 20275 points

Hello Uchikoshi-san,

The ARM processor uses a weak memory ordering scheme. This can sometimes result in non-intuitive ordering as seen by each CPU observer. The barrier and litmus document from ARM that I sent goes through these cases: if a programmer needs to depend on the value of a shared variable at a particular point in time, it is the programmer's responsibility to synchronize the accesses to that variable on each CPU. The ARM architecture provides special instructions (isb, dmb, dsb, strex, ldrex) as primitives to achieve this. Usually, the operating system or C library uses these instructions to build higher-level software structures like semaphores, monitors, etc. The software engineer then uses those well-known higher-level support functions as the basis for proper multi-processor value sharing. These low-level ARM instructions are lightweight compared to full software cache maintenance, but they cannot be ignored. Other, more coarse-grained types of sharing are handled by the operating system itself. This allows things like process migration from CPU to CPU to work without the programmer having to take special action.
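The case described in this thread (Core0 writes, Core1 reads) is the message-passing pattern from the litmus document. A minimal sketch in C11 is shown below, assuming a toolchain with `<stdatomic.h>`; the names `shared_data`, `data_ready`, `publish`, and `try_consume` are made up for illustration. On ARMv7, compilers typically implement the release store and acquire load with dmb barriers, so no hand-written assembly is needed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

static int shared_data;        /* payload: an ordinary variable */
static atomic_int data_ready;  /* flag guarding the payload, starts at 0 */

/* Writer side (for example, the thread running on Core0). */
static void publish(int value)
{
    shared_data = value;
    /* Release store: the payload write above is ordered before the flag
     * write, so a reader that sees the flag also sees the payload. */
    atomic_store_explicit(&data_ready, 1, memory_order_release);
}

/* Reader side (for example, the thread running on Core1).
 * Returns 1 and fills *out if the payload has been published, else 0. */
static int try_consume(int *out)
{
    assert(out != NULL);
    /* Acquire load: pairs with the release store in publish(). */
    if (atomic_load_explicit(&data_ready, memory_order_acquire) == 0)
        return 0;
    *out = shared_data;
    return 1;
}
```

Without the release/acquire pair (for example, with two plain `int` variables), the reader may observe the flag as set and still read a stale payload, even though the L1 caches are hardware coherent.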

You will need to inspect the code around the variable you are having an issue with, see whether it fits one of the patterns in the document, and check whether its implementation complies with the rules. To share data properly, it may make sense to reuse a well-known library instead of trying to hand code the synchronization.
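As one example of relying on a standard facility rather than hand-written ldrex/strex sequences, the sketch below keeps a shared counter with C11 atomics. It assumes a C11 toolchain; `event_count`, `record_events`, and `events_seen` are illustrative names only. On ARMv7, compilers typically expand the read-modify-write into an ldrex/strex retry loop.

```c
#include <assert.h>
#include <stdatomic.h>

/* Check at compile time that atomic ints are lock-free on this target,
 * i.e. implemented with hardware primitives rather than a hidden lock. */
static_assert(ATOMIC_INT_LOCK_FREE == 2, "atomic ints must be lock-free");

static atomic_uint event_count;  /* shared by all cores, starts at 0 */

/* Adds n to the shared counter as one indivisible read-modify-write.
 * Returns the counter value after this update. */
static unsigned record_events(unsigned n)
{
    return atomic_fetch_add_explicit(&event_count, n,
                                     memory_order_relaxed) + n;
}

/* Returns the current value of the shared counter. */
static unsigned events_seen(void)
{
    return atomic_load_explicit(&event_count, memory_order_relaxed);
}
```

Relaxed ordering is enough here because only the counter itself is shared. If the counter is also used to signal that other data is ready, release/acquire ordering would be needed instead.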

One important point is that the weak memory semantics are only offered for memory with the proper MMU attributes. If each CPU has defined the memory region with the wrong attributes, there will be no hardware assist for coherency at all. The general rule for the A15 is that the memory region must be marked as 'copy-back + shared' for the hardware coherency to track it. Sometimes a system programmer might try to use a 'normal-non-cached' type to simplify the coding, but there is still buffering at each CPU on the path, and without proper buffer draining, the effects of a memory write may take a while to be seen by the peer processor.
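To make the attribute rule more concrete, here is a sketch of how a 'copy-back + shared' mapping could be encoded. It assumes the ARMv7-A short-descriptor format with 1 MiB sections, TEX remap disabled (SCTLR.TRE = 0), domain 0, and full read/write access; systems that use LPAE long descriptors or TEX remap encode the same attributes differently. The macro and function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-A short-descriptor first-level section entry fields (1 MiB). */
#define SECTION_TYPE    0x2u                    /* bits[1:0] = 0b10: section */
#define SECTION_B       (1u << 2)               /* bufferable */
#define SECTION_C       (1u << 3)               /* cacheable */
#define SECTION_AP_RW   (3u << 10)              /* AP[1:0] = 0b11, AP[2] = 0: full access */
#define SECTION_TEX(x)  ((uint32_t)(x) << 12)   /* TEX[2:0] */
#define SECTION_S       (1u << 16)              /* shareable */

/* Normal memory, write-back write-allocate, shareable:
 * TEX = 0b001, C = 1, B = 1, S = 1. */
static uint32_t section_wb_shared(uint32_t phys_base)
{
    assert((phys_base & 0x000FFFFFu) == 0);     /* must be 1 MiB aligned */
    return phys_base | SECTION_S | SECTION_TEX(1) | SECTION_AP_RW |
           SECTION_C | SECTION_B | SECTION_TYPE;
}

/* Normal memory, non-cacheable, shareable: TEX = 0b001, C = 0, B = 0. */
static uint32_t section_noncached(uint32_t phys_base)
{
    assert((phys_base & 0x000FFFFFu) == 0);     /* must be 1 MiB aligned */
    return phys_base | SECTION_S | SECTION_TEX(1) | SECTION_AP_RW |
           SECTION_TYPE;
}
```

With these assumptions, `section_wb_shared(0x80000000u)` evaluates to `0x80011C0E`. Dumping the live translation table entries for the shared region on each core and decoding them this way is one way to check that both cores really map it as write-back and shareable.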

As far as using the debugger goes, the AM57xx does not provide direct access to the L1 cache. The methods to do this are blocked by security settings. However, by opening a debugger window on each processor, it is possible to look at memory from each CPU's perspective. Further, on a given CPU the MMU can be toggled on and off to allow seeing both the main memory view and the cache view. Generally, while using the debug-halt mode available to JTAG, new cache allocations will be frozen. Using this method, cache and MMU setting issues can sometimes be worked out. The details of how to execute this are beyond what can be written in an e2e post. On some of the newer AM6xx and AM7xx processors, the debugger does have direct access to the cache tags and values, so finding these issues is simplified.

Regards,

Richard W.
