TDA4VH-Q1: How to disable cacheability in MSM or DDR ?

Part Number: TDA4VH-Q1
Other Parts Discussed in Thread: TDA4VH, TDA4VM

Hello

My platform is the J784S4 EVM with the TDA4VH processor.

An FPGA is writing data by PCIe into the SoC memory (central MSM or DDR4).
The C7x core is polling this memory to detect when the FPGA has finished writing. (we cannot use interrupts for now)

--> Because of the way the cache works, as soon as the C7x core reads from MSM or DDR memory, the data is placed in the cache.
So to detect that the FPGA has written, the C7x must invalidate the cache line before every re-read of memory --> it's inefficient.
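
For reference, the invalidate-then-read polling described above can be sketched roughly as follows. This is only a host-runnable illustration: `poll_flag`, `invalidate_fn` and `sim_invalidate` are hypothetical names, not TI SDK APIs, and the real invalidate call depends on the C7x cache driver in use. Passing `NULL` for the hook models the case where the buffer is mapped non-cacheable and a plain volatile read is enough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook for the platform's cache-invalidate routine (not a TI API). */
typedef void (*invalidate_fn)(volatile void *addr, size_t bytes);

/*
 * Poll *flag until it equals `expected` or `max_polls` reads have been made.
 * If `inval` is non-NULL (cached mapping), the line is invalidated before
 * every read; pass NULL when the buffer is mapped non-cacheable.
 * Returns the zero-based index of the successful read, or -1 on timeout.
 */
static int poll_flag(volatile uint32_t *flag, uint32_t expected,
                     invalidate_fn inval, size_t line_bytes,
                     unsigned max_polls)
{
    assert(flag != NULL);
    for (unsigned i = 0; i < max_polls; ++i) {
        if (inval != NULL) {
            inval(flag, line_bytes);   /* drop the stale cached copy */
        }
        if (*flag == expected) {       /* volatile read of the flag word */
            return (int)i;
        }
    }
    return -1;
}

/* Host-only stand-in: pretends the FPGA write becomes visible
 * on the third invalidate call. */
static unsigned sim_inval_calls;

static void sim_invalidate(volatile void *addr, size_t bytes)
{
    (void)bytes;
    if (++sim_inval_calls == 3u) {
        *(volatile uint32_t *)addr = 0xA5A5A5A5u;
    }
}
```

On the target, `sim_invalidate` would be replaced by a thin wrapper around the actual invalidate call, and the cost per iteration is then dominated by that call rather than by the read itself.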

My question is: how can I configure the MSM or DDR to be uncacheable? I haven't found an easy way to do it.

any help is appreciated

Best regards
Clement

  • Hi Clement,

    I think I understand the immediate goal and issue. Basically, we want C7x to keep on polling DDR directly, but since cache is enabled, C7x pulls data from cache instead of DDR unless cache is invalidated to force C7x to pull the latest data from DDR. Furthermore, we want to make it so that we do not need cache invalidation for performance improvements.

    However, I assume the C7x polls for data because there is a plan for it to process that data after the FPGA has finished writing. Disabling the cache will most likely hurt performance in the processing phase, which may defeat the purpose of removing the cache invalidation. Or do we suspect that removing the invalidation from the polling phase would gain more performance than is lost by making all the data uncacheable?

    Regards,

    Takuma

  • Hi Takuma

    You've perfectly understood the use case.

    Yes, I don't know whether making all data uncacheable is worse than doing cache invalidation in a loop. --> My goal is to compare the two solutions.
    For the record I only invalidate the last line (128 bytes) of cache and it takes around 300ns ... 

    So do you have a solution to make the MSM and DDR uncacheable (at least a part of them)?

    Thanks
    best regards
    Clement

  • Any news on making the MSM and DDR uncacheable?

  • Hi Clement,

    I had notified a colleague of mine to see if they would be able to comment, but it seems they have left on leave - my apologies. I will try to get a different colleague of mine to comment.

    In the meantime, I am not too knowledgeable on the cacheability topic, but I can pass on what I have heard from others in the past. To enable cache coherency between A72 and PCIe, we allegedly need to set atype = 0 to enable the snoop filter on MSMC, and set up the outer/inner shareability memory attributes. That scenario is different from our current situation, but similar. So I assume doing the opposite, that is, setting atype = 3 and not setting the outer/inner shareability attributes, would disable caching.

    Regards,

    Takuma

  • Hi Clement,

    I got a response from Richard who is very knowledgeable in cache, and to quote:

    "

    TDA4VM/L have full coherency between A72 and C7x as long as the MMU for both marks DDR or MSMC-SRAM with the inner shareability attribute. Both use the ARM large descriptor format for the MMU, so it's straightforward to enable or disable (well, as straightforward as the ARM docs make it).

    A DSP cluster has a CMMU which is used to attach cache and other attributes. The table format is the same as what AARCH64 provides. By marking an area as non-cached it won’t allocate in MSMC.

    "

    He also linked a related E2E: https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1283968/tda4vh-q1-cache-coherency.

    Based on this, it seems like using the CMMU, we can mark a section in memory as non-cached, and this should disable caching.

    Regards,

    Takuma

  • Hi
    Thank you for the answers and link

    However, what I lack is the detailed way to do it.
    I had already looked at the TRM and other docs on the CMMU. I've seen, for example, the UTLB_ATTR.MEMTYPE attributes, where we can set the Memory Type bit field to "normal non-cacheable memory".

    But in code the only examples and driver code I see in the SDK call

    Mmu_map(virtual addr, physical addr, size, attributes, secure mode)

    but the "attributes" field doesn't include the memory type, it only has : secure/non secure, access permission, privExecute, UserExecute, Shareability, MAIR index and global fields

    in the MMU related C files, I see nothing to set the memory type to non-cacheable. 

    --> How do I do it?

  • Hi Clement,

    I have notified Richard, who is an expert in this area in terms of cache, memory, and everything ARM to see if I can get their comments on this thread.

    In the meantime, I am aware of a memory map guide for J7x devices that should show how to set cacheability: https://software-dl.ti.com/jacinto7/esd/processor-sdk-rtos-jacinto7/latest/exports/docs/psdk_rtos/docs/user_guide/developer_notes_memory_map.html.

    Regards,

    Takuma

  • Hello Clement,

    I think Takuma has given the relevant pointer about how to config SW so the memory map sets the attribute to non-cached.

    I would suspect you can also generate a PCIe MSI interrupt from your device after it posts its write to DDR.  I suspect you could use this as an async trigger to do the invalidation on the C7 side before a read.  This would remove the polling.  If the operation is frequent, this also might end up better.

    From another E2E, I recall you have a TRACE32 debugger.  The debugger knows the MMU format of the C7x (and ARM), so you can verify at run time that the code changes are correct.  If you give the command "mmu.info 0x<bufferaddress>" and then hit 'scan', it will decode the attributes of the translation.  Alternatively, use 'mmu.list.pagetable' and look up where your buffer is in the range.  The C7x and the A72 both share the same MMU table format.  ... As a hack, you can probably even use TRACE32 to find the entry you want: get the actual table entry address with "mmu.dump.pagetable", then edit that address in memory.  After any TLB flush, the new entry will be used.  This is a way to binary-edit your existing image so the buffer is non-cached, then see if the visibility issue goes away.  A run-time hack is a useful quick test, but real run-time code needs to be stricter to ensure proper ordering for the expected visibility.

    Regards,
    Richard W.
  • Hello Richard, Takuma

    Thanks for the pointers. I had overlooked this page of the documentation; it can indeed help me by showing how the memory map is implemented and how to customize it to my needs.

    I agree that FPGA interrupt is the way to go but unfortunately the FPGA design can't be modified for now.

    Yes, once my TRACE32 debugger is up and running, it's one of the first use cases I'll look at, so the commands and hack are very helpful.

    I'll keep the forum updated on my progress on this.

    Regards
    Clement

  • Hi again,

    I looked a bit at the files mentioned in the doc; they are mostly files that I had looked over before.

    But maybe you can help me understand something

    In the main.c file in vision_apps/platform/j784s4/rtos/c7x_1, in the MMU initialisation function, the only difference between a CACHED area and a NON CACHED area is the MAIR id.

    Basically, in pseudocode, the function goes like this:

    /* non-cacheable zones */
    set MAIR to MAIR0
    
    call Mmu_init functions with addr src, dst and size
    
    /* cacheable zones */
    set MAIR to MAIR7
    
    call Mmu_init functions with addr src, dst and size

    1) Does that mean that somehow MAIR0 controls what is non-cacheable and MAIR7 is for what is cacheable ?

    2) Does the MAIR choice matter? What was the reasoning behind this choice ? why not MAIR1, MAIR2, ...? I see nothing about this

    I haven't found any documentation addressing that.
    Regards
    Clement 

  • Hello Clement,

    MAIR entries are used in conjunction with MMU page tables.  The composition of these + some other registers define how a run time translation is formed.

    The end tables are where you add or change your ranges.

    The format for the C7 and the A72 is the same for MMU and MAIR.  Scanning the ARM docs may help in understanding.  I doubt you will ever have a reason to modify the MAIR, as the base code is setting it up.

    https://developer.arm.com/documentation/ddi0406/c/System-Level-Architecture/System-Control-Registers-in-a-VMSA-implementation/VMSA-System-control-registers-descriptions--in-register-order/MAIR0-and-MAIR1--Memory-Attribute-Indirection-Registers-0-and-1--VMSA
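
    To illustrate how the page tables and MAIR combine, here is a minimal host-side sketch, assuming the Arm long-descriptor layout in which bits [4:2] of a block/page descriptor (AttrIndx) select one of eight 8-bit attribute slots. MAIR is modelled as a single 64-bit value, as in AArch64's MAIR_EL1; how the C7x CMMU exposes the register is not shown here. The helper names are hypothetical, and the slot contents used in the usage note (0x44 in slot 4, 0xFF in slot 7) are illustrative: only the 0x44 value for MAIR4 is taken from this thread.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* AttrIndx[2:0] sits in bits [4:2] of a long-descriptor block/page entry. */
    #define DESC_ATTRINDX_SHIFT 2u
    #define DESC_ATTRINDX_MASK  0x7u

    /* Extract the MAIR slot number (0..7) that a descriptor refers to. */
    static unsigned desc_attr_index(uint64_t desc)
    {
        return (unsigned)((desc >> DESC_ATTRINDX_SHIFT) & DESC_ATTRINDX_MASK);
    }

    /* Return the 8-bit attribute stored in slot `idx` of a 64-bit MAIR value. */
    static uint8_t mair_slot(uint64_t mair, unsigned idx)
    {
        assert(idx < 8u);
        return (uint8_t)(mair >> (8u * idx));
    }

    /* Return a copy of `desc` whose AttrIndx field points at slot `idx`;
     * all other descriptor bits are left unchanged. */
    static uint64_t desc_with_attr_index(uint64_t desc, unsigned idx)
    {
        assert(idx < 8u);
        desc &= ~((uint64_t)DESC_ATTRINDX_MASK << DESC_ATTRINDX_SHIFT);
        return desc | ((uint64_t)idx << DESC_ATTRINDX_SHIFT);
    }
    ```

    With a MAIR value of `((uint64_t)0x44 << 32) | ((uint64_t)0xFF << 56)`, a descriptor built with `desc_with_attr_index(d, 4)` resolves to attribute byte 0x44, and the same descriptor re-pointed at slot 7 resolves to 0xFF. The index itself carries no meaning; what matters is the attribute byte the base code has programmed into that slot.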

    Regards,
    Richard W.
  • Hi Richard,

    It was a good resource.
    I finally managed to set the MSM as non-cached by changing its MAIR attribute index to 4, which is inner and outer non-cacheable.

    With the link you provided plus the mmu.h file, where const int MAIR4 is defined as 0x44, one can indeed understand its configuration.
    It's a pity it wasn't commented in the source code though
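
    For anyone reading later, here is a small sketch of how an attribute byte such as 0x44 can be decoded, assuming the Arm MAIR encoding from the linked documentation as I understand it: the upper nibble gives the outer policy, the lower nibble the inner policy, and an upper nibble of 0 means Device memory. The type and function names are hypothetical and not part of the SDK.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Cacheability classes of one 4-bit field in a MAIR attribute byte. */
    typedef enum {
        MEM_DEVICE,         /* upper nibble 0b0000: Device / Strongly-ordered */
        MEM_RESERVED,       /* lower nibble 0b0000 under a Normal upper nibble */
        MEM_NON_CACHEABLE,  /* 0b0100 */
        MEM_WT_TRANSIENT,   /* 0b00RW, RW != 00 */
        MEM_WB_TRANSIENT,   /* 0b01RW, RW != 00 */
        MEM_WT,             /* 0b10RW */
        MEM_WB              /* 0b11RW */
    } mem_policy;

    typedef struct {
        mem_policy outer;   /* from attr[7:4] */
        mem_policy inner;   /* from attr[3:0] */
    } mair_decoded;

    /* Decode one nibble of a Normal-memory attribute. */
    static mem_policy decode_normal_nibble(unsigned nib)
    {
        assert(nib <= 0xFu);
        if (nib == 0x0u) return MEM_RESERVED;
        if (nib == 0x4u) return MEM_NON_CACHEABLE;
        switch (nib >> 2) {
        case 0u:  return MEM_WT_TRANSIENT;
        case 1u:  return MEM_WB_TRANSIENT;
        case 2u:  return MEM_WT;
        default:  return MEM_WB;
        }
    }

    /* Decode a full 8-bit MAIR attribute into outer and inner policies. */
    static mair_decoded decode_mair_attr(uint8_t attr)
    {
        mair_decoded d;
        unsigned hi = (unsigned)attr >> 4;
        unsigned lo = (unsigned)attr & 0xFu;

        if (hi == 0x0u) {               /* Device memory: no cache policy */
            d.outer = d.inner = MEM_DEVICE;
            return d;
        }
        d.outer = decode_normal_nibble(hi);
        d.inner = decode_normal_nibble(lo);
        return d;
    }
    ```

    Calling `decode_mair_attr(0x44)` gives `MEM_NON_CACHEABLE` for both the outer and inner fields, which matches the "inner & outer non-cacheable" reading of MAIR4.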

    Thanks again

    Best regards
    Clement

  • Hello Clement,

    Glad to know you are progressing.  Thanks for the feedback, we can send that back to the C7x code owners.  I have a recollection that the TRM may have a note about this, but there can be a big distance between the TRM and the different coding implementations.

    Hopefully you are now in a good position to benchmark different sharing strategies for your application. 

    Regards,
    Richard W.