TDA4VH-Q1: Cache Coherency

Ash A.

Part Number: TDA4VH-Q1
Other Parts Discussed in Thread: TDA4VL, TDA4VM, TDA4VH

Hi there,

I'm not sure I fully understand or agree with the resolution posted on the linked question (i.e. "TDA4VH-Q1: OpenVX tivxMemBufferMap/tivxMemBufferUnmap cache consistency problem?").

Sure, the A72, as a symmetric multiprocessor, is cache coherent within itself: if one core of the A72 CPU writes to memory, its internal cache coherency protocol ensures that the other cores in the same CPU update their copies of that data in their core-local caches. But how is an A72 core supposed to know of changes made to a memory location by another processor in the system, say a C7x, unless (a) that memory is uncached, or (b) that A72 core invalidates the cache lines corresponding to that address range?

Consider this scenario: An A72 core has read from memory location 0xABCD and consequently has the corresponding cache line in its local cache. Now, while that particular A72 core will be informed of changes by the other cores on the same A72 CPU, how is it supposed to know that a C7x DSP wrote to location 0xABCD as part of a compute offload to that DSP core, hence making that A72 core's cached copy invalid?
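To make the scenario concrete, here is a minimal toy model in Python. It is only a sketch of the concern, assuming a cache that no hardware snoops; the `Memory` and `PrivateCache` classes are hypothetical illustrations and not part of any TI SDK or of how the real hardware behaves.

```python
class Memory:
    """Shared DDR, modelled as a dict from address to value."""
    def __init__(self):
        self.data = {}


class PrivateCache:
    """A cache that nobody snoops (assumption: no hardware coherency)."""
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def read(self, addr):
        if addr not in self.lines:
            # Miss: fill the line from shared memory.
            self.lines[addr] = self.memory.data.get(addr, 0)
        # Hit: shared memory is not consulted at all.
        return self.lines[addr]

    def invalidate(self, addr):
        # Drop the cached copy so the next read goes back to memory.
        self.lines.pop(addr, None)


ddr = Memory()
ddr.data[0xABCD] = 1
a72 = PrivateCache(ddr)

first = a72.read(0xABCD)   # the A72 core now holds the line
ddr.data[0xABCD] = 2       # another processor (say, a DSP) updates memory
stale = a72.read(0xABCD)   # still served from the cached copy
a72.invalidate(0xABCD)
fresh = a72.read(0xABCD)   # refetched from memory after the invalidate

print(first, stale, fresh)  # prints "1 1 2"
```

In this model the second read returns the old value until the line is invalidated, which is exactly the situation I am asking about.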

Don't we need a cache invalidation regardless? In a co-processing environment where an A72 is offloading some compute to another processor, such as a DSP core, shouldn't buffer map on the A72 always include a cache invalidation?

over 2 years ago

0 Fabiana Jaimes over 2 years ago

TI__Mastermind 19670 points

Hi Ash,

Due to a holiday in India, half of our team is currently out of office. Please expect a 1~2 day delay in responses.

Apologies for the delay and thank you for your patience.

- Fabiana Jaimes

0 Ash A. over 2 years ago

Prodigy 180 points


For the sake of completeness, I would also like to add that the same logic applies for cache flushes / write-backs. While each core in an A72 is cache coherent, how is a C7x core supposed to know of the latest memory writes in an A72 core without a flush?
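The write-back side of the same concern can be sketched with another toy Python model. Again, this is an illustration under the assumption of a write-back cache with no hardware coherency; `WriteBackCache` and its methods are hypothetical names, not a TI API.

```python
class WriteBackCache:
    """Write-back cache: writes stay in the cache until explicitly written back."""
    def __init__(self, memory):
        self.memory = memory      # shared memory, a plain dict
        self.lines = {}
        self.dirty = set()

    def write(self, addr, value):
        # The write lands in the cache only; memory is left untouched.
        self.lines[addr] = value
        self.dirty.add(addr)

    def write_back(self, addr):
        # Push a dirty line out to shared memory.
        if addr in self.dirty:
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)


ddr = {0x1000: 0}
a72 = WriteBackCache(ddr)

a72.write(0x1000, 42)
seen_before = ddr[0x1000]   # what another processor reading DDR would see
a72.write_back(0x1000)
seen_after = ddr[0x1000]    # visible only after the write-back

print(seen_before, seen_after)  # prints "0 42"
```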

0 Ash A. over 2 years ago in reply to Fabiana Jaimes

Prodigy 180 points

Thanks Fabiana. This is urgent. We have a release pending and I would appreciate it if I could get an answer soon. Many thanks!

0 Nikhil Dasan over 2 years ago in reply to Ash A.

TI__Guru* 86776 points

Sure, please expect a delay in response due to holidays in India.

0 Nikhil Dasan over 2 years ago in reply to Nikhil Dasan

TI__Guru* 86776 points

Hi,

Sorry for the delay in response.

For the case between A72 and C7x, both sides need to configure the memory region as shareable. The hardware then ensures that the A72 gets the correct data written by the C7x, so the cache invalidate (INV) and write-back (WB) on the A72 side can be skipped.

Regards,

Nikhil

0 Ash A. over 2 years ago in reply to Nikhil Dasan

Prodigy 180 points

Hi Nikhil,

Thank you for your answer. I am asking to satisfy my own curiosity at this point and I would appreciate any answer that would help me expand my understanding, but how is that done at the hardware level? Is there some sort of cache coherency protocol between these two processors (A72 and C7x) that allows them to message each other or are they communicating through uncached memory?

Thank you!

+1 Richard Woodruff over 2 years ago in reply to Ash A.

TI__Mastermind 24155 points

Hello,

The answer to your questions varies depending on which member of the TDA4 family you are looking at. It also depends on which part of the address space is being shared. Resources directly connected to the MSMC3 (clusters, local SRAM, L3 cache, DDR) get some level of hardware coherency support.

For TDA4VM and TDA4VL, both the A72 cluster and the C7x corepack(s) are directly connected to the MSMC3 interconnect. Each cluster supports hardware coherency and issues ~MESI messages, which are communicated through the MSMC3 interconnect. The MSMC3 tracks and optimizes message traffic using its snoop cache. The A72 and C7x MMUs share a common format which includes a shareability attribute. Memory pages which hold that attribute are tracked by the coherency logic and thus need no software maintenance on the producer or consumer side.

For TDA4VH, each A72 cluster is directly connected to the MSMC3, so both A72 clusters are fully coherent with each other. However, each C7x is no longer directly connected to the MSMC3. This means that some software cache operations are needed when exchanging data between an A72 instance and a C7x instance. What is required depends on who is the producer and who is the consumer. For the C7x to read data the A72 writes, it first has to invalidate its local cache. For the A72 to see data written by the C7x, the C7x first has to flush its cache into the MSMC3. To support the scaled-up use cases (relative to TDA4VL/M), TDA4VH adds extra DDR channels along with more local memory in each C7x instance. This extra per-C7x memory was added in a way which leaves the A72-to-C7x flows with coherency at the level of 'io-coherency' instead of full coherency.
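The producer/consumer rules for TDA4VH can be sketched with a small toy model in Python. This is only an illustration of the ordering described above, not SDK code: the `C7xCache` class, the `shared` dict standing in for MSMC3-coherent memory, and the addresses are all hypothetical, and A72 accesses are simply modelled as going straight to `shared` to represent the hardware coherency on that side.

```python
class C7xCache:
    """Private write-back cache that the coherency fabric does not snoop (assumption)."""
    def __init__(self, shared):
        self.shared = shared      # stand-in for MSMC3-coherent memory
        self.lines = {}
        self.dirty = set()

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.shared.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)

    def invalidate(self, addr):
        # Discard the local copy; the next read refetches from shared memory.
        self.lines.pop(addr, None)
        self.dirty.discard(addr)

    def write_back(self, addr):
        # Flush a dirty line out to shared memory.
        if addr in self.dirty:
            self.shared[addr] = self.lines[addr]
            self.dirty.discard(addr)


shared = {0x2000: 0, 0x3000: 0}
c7x = C7xCache(shared)

# A72 produces, C7x consumes: the C7x invalidates before reading.
c7x.read(0x2000)                # C7x already holds an old copy
shared[0x2000] = 7              # A72 write, modelled as coherent with shared memory
c7x_stale = c7x.read(0x2000)    # old copy without the invalidate
c7x.invalidate(0x2000)
c7x_fresh = c7x.read(0x2000)    # new data after the invalidate

# C7x produces, A72 consumes: the C7x writes back before the A72 reads.
c7x.write(0x3000, 9)
a72_before = shared[0x3000]     # A72 view before the C7x flush
c7x.write_back(0x3000)
a72_after = shared[0x3000]      # A72 view after the C7x flush

print(c7x_stale, c7x_fresh, a72_before, a72_after)  # prints "0 7 0 9"
```

Note that in this model all maintenance happens on the C7x side, which matches the rules as stated: the A72 itself performs no invalidate or write-back.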

I'll attach two diagrams from the designers which show what is necessary for each TDA4 SOC. The DRU is a prefetch engine. The SOC is the rest of the chip below the nav-north-bridge. Traffic from the SOC will flow into the MSMC3 as cached (C) or non-cached (NC). Data which flows

/resized-image/__size/640x480/__key/communityserver-discussions-components-files/791/tda4v_5F00_m_5F00_l_5F00_2023_2D00_10_2D00_27_5F00_12h15_5F00_13.png

/resized-image/__size/640x480/__key/communityserver-discussions-components-files/791/tda4vh_5F00_2023_2D00_10_2D00_27_5F00_12h15_5F00_29.png

The software in TI's SDK is designed to follow the rules described above. Most end users at the A72 and C7x cluster level are probably building applications atop framework API abstractions, so they likely do not need to worry about the lower-layer plumbing. Users in the rest of the SOC (R5 and others) probably need more awareness if they are trying to communicate back to CPU-cluster-level applications.

Regards,

Richard W.

0 Ash A. over 2 years ago in reply to Richard Woodruff

Prodigy 180 points

Hi Richard. I'm grateful for you having taken the time to write such a detailed answer. This was so satisfying to read and learn. Thanks!
