
AM6442: Can I define my ring in a non-cachable region if I am only using one core directly in a HS-FS device?

Other Parts Discussed in Thread: AM6442

I am using DMASS on an HS-FS AM6442 device in the aerospace industry. We need to simplify the code to reduce cost as much as possible, and I have a question about the ring definition in memory.

I see that in example codes and the TRM, it is mentioned that the Packet Descriptor, the data buffers, and the rings have to be aligned with the cache line for optimal transfer. 

I see in the example that the packet descriptors and data buffers are defined as non-cachable but the rings are cachable. Can you please explain to me why they need to be cachable?

I am using the HS-FS device AM6442, so the DMSC (Cortex-M3 core) plays a role on my board; however, the only core I use directly is R5f_0_0.

Since I am directly using only one core (not considering the DMSC) along with the DMASS, can I define it as non-cachable as well? In that case,

  • I will not need to use memory fences when queuing/dequeuing (CSL_archMemoryFence)
  • I will not need to use the functions CacheP_wb and CacheP_inv to write-back or invalidate the cache.

Thanks, 

Boshra

  • Hello Boshra,

I am looking at your queries; you can expect a reply in one or two days.

    Regards,

    S.Anil.

  • Some of the replies in this thread were accidentally deleted. I apologize for that; I am adding our conversation below.

    Hello Boshra,

    Sorry, I understand your frustration. I was busy with other customer escalations and then fell sick, which delayed my reply.

    Please see the below points.

    When we do the cache_wb call for the source buffer, the processor writes its cached copy of the buffer back to the MAIN memory area. The DMA uses only MAIN memory addresses to transfer data from one location to another; it does not know about the cache.

    Now, when the DMA completion event occurs, the processor will try to read the destination buffer from the cache and not from MAIN memory, since we kept these MSRAM and DDR locations cacheable, and the processor does not know that the destination memory has been updated with new data by the DMA. When we do cache_invalidation, the processor fetches the data from MAIN memory instead of the cache, so the user is able to see the new data in the destination buffer after cache_invalidation.

    If you mark the MSRAM locations as non-cached in the MPU region settings, then no cache_wb or cache_invalidation is required; you can see the data in the destination memory right after DMA completion.

    So, we can keep the source and destination buffers in non-cached memory so that you can skip cache_wb and cache_invalidation for them; but the packet descriptors can stay in cached memory, since their write-back is done only once, during initialization.

    In your application, are you facing any timing issues while performing cache_invalidation for every DMA completion event?

    cache_wb is used only once, during initialization. Do these two functions create any timing issues for you?

    Boshra said:

    I see in the example that the packet descriptors and data buffers are defined as non-cachable but the rings are cachable. Can you please explain to me why they need to be cachable?

    In the examples, we use MSRAM for the DMA operations, and all of these memory locations are cacheable. Can you please point out where these locations are marked as non-cached, in case I am missing something?

    Please provide answers to the above points.

    Regards,

    S.Anil.


    Boshra replied:

    Hi,

    This is the example that I used as my reference, attached to this ticket:

    e2e.ti.com/.../4597533

    Both the source and destination buffers are located in a non-cachable memory section, and this section is in MSRAM, based on the MPU configuration:

    In addition, the descriptor, ring memory, and data buffers are all aligned to the cache line in the example, as in the following code:

    static uint8_t gRxFqRingMem[UDMA_ALIGN_SIZE(UDMA_TEST_RING_MEM_SIZE)] __attribute__((aligned(UDMA_CACHELINE_ALIGNMENT)));
    static uint8_t gUdmaRxHpdMem[UDMA_ALIGN_SIZE(UDMA_TEST_DESC_SIZE)] __attribute__((aligned(UDMA_CACHELINE_ALIGNMENT)));
    uint32_t gUartDestBufping[128] __attribute__((aligned(128), section(".bss.nocache")));
    uint32_t gUartDestBufpong[128] __attribute__((aligned(128), section(".bss.nocache")));
    You can see that the section ".bss.nocache" in MSRAM is defined as non-cachable (at address 0x70060000, based on its linker file).
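    For reference, the placement looks roughly like this in the linker command file (a sketch; the ORIGIN is from the example's linker file, the LENGTH and region name are my assumptions, and the MPU must separately mark this region non-cachable):

```
MEMORY
{
    /* MSRAM range reserved for non-cached data */
    NON_CACHE_MEM : ORIGIN = 0x70060000, LENGTH = 0x8000
}
SECTIONS
{
    /* uninitialized data, so NOLOAD */
    .bss.nocache (NOLOAD) : {} > NON_CACHE_MEM
}
```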

    Swargam Anil said:

    Can you please point out where this is in the TRM as well? I could not find this info in the TRM; as you know, it is a big chapter.

    In the TRM version H, the cache-line alignment is mentioned in section 11.1.3.2, page 5954, as follows:
    "Even though descriptors and buffers may be allocated on any 16-byte alignment, careful consideration of the
    alignment effects should be made based on the storage location and any cache related affects that may exist.
    If data structures are placed in off-chip SDRAM the burst size and alignment restrictions of the memory devices
    must be considered in order to avoid performance issues related to continually fetching mis-aligned blocks. In
    this case, the memory efficiency can be reduced to 50% because 2 memory bank lines are read for every line
    sized data fetch. Similarly, placing more than one descriptor or buffer object within a single cache line can cause
    the adjacent object to become corrupted during cache line writeback operations."

    Also in the "AM64x/AM243x Functional Safety Manual", for the BCDMA section it is written that: 

    "To support this software test, it is highly recommended to configure the target memory space as a strongly ordered,
    non-bufferable memory region for the CPU using the memory protection unit (MPU) or memory management
    unit (MMU) for the CPU. This ensures that the register write has completed before the read back is initiated."

    It is mentioned for the target memory space, which to me means the source/destination buffers. And based on the Cortex-R5F documentation, a strongly-ordered memory region is non-cachable, so my conclusion was that the source and destination buffers have to be defined as non-cachable. Is my understanding correct?

    reference for Cortex R5f: developer.arm.com/.../CBBIDHID 

    After you shared your example, I checked another one, "udma_memcpy_interrupt_am64x-evm_r5fss0-0_nortos_ti-arm-clang", and there the memory is also defined as cachable. So I'm confused now. Is there a mismatch, or did I misunderstand something? Can you explain why these don't match?

    Swargam Anil said:

    but the packet descriptors can stay in cached memory, since their write-back is done only once, during initialization. Can you confirm this on your test bench?

    Sorry, it's not clear to me why the packet descriptor should be kept in cache. Do you mean only as an optimization, since we only write it once?

    How about the Rings?

    Thanks, 

    Boshra

     

    Hello Boshra,

    Actually, in all our MCU+SDK examples the MSRAM memory is cached, so in the UDMA examples all buffers are placed in cacheable memory. That is the reason we do cache_wb for the source buffer before the UDMA starts, and cache_invalidation after the DMA completes.

    When we initialize the TRPD, we do cache_wb for the ring memory only once, during initialization, so what is the problem with keeping the ring memory in the cache?

    After every DMA completion we need to do cache_invalidation so that the R5F can see the content. I agree that doing cache_invalidation on every DMA completion takes time.

    Am I right that you want to eliminate cache_wb and cache_invalidation from your code?

    What is the use case ?

    Why do you want to drive the UART with BCDMA?

    Please share your inputs so that I can try to assist you better.

    Regards,

    S.Anil.


    Boshra replied:
    Swargam Anil said:
    Am I right that you want to eliminate cache_wb and cache_invalidation from your code?
    Regarding this: did you read this paragraph of my previous answer? I started by looking into the safety manual before any implementation, and as I mentioned there, based on the TI safety manual my interpretation was that it is highly recommended that the source and destination buffers be strongly ordered, meaning they cannot be cached:
    Boshra said:
    Also in the "AM64x/AM243x Functional Safety Manual", for the BCDMA section it is written that:
    "To support this software test, it is highly recommended to configure the target memory space as a strongly ordered,
    non-bufferable memory region for the CPU using the memory protection unit (MPU) or memory management
    unit (MMU) for the CPU. This ensures that the register write has completed before the read back is initiated."
    It is mentioned for the target memory space, which to me means the source/destination buffers. And based on the Cortex-R5F documentation, a strongly-ordered memory region is non-cachable, so my conclusion was that the source and destination buffers have to be defined as non-cachable.
    Can you confirm if my understanding here is correct or not?
    Swargam Anil said:
    Why do you want to go to UART with BCDMA ?
    I use BCDMA because I want to trigger the start of the transfer and have it repeat indefinitely, and these features are provided only by BCDMA, not by PKTDMA.
    Thanks,
    Boshra.

  • "To support this software test, it is highly recommended to configure the target memory space as a strongly ordered,
    non-bufferable memory region for the CPU using the memory protection unit (MPU) or memory management
    unit (MMU) for the CPU. This ensures that the register write has completed before the read back is initiated."
    It is mentioned for the target memory space, which to me means the source/destination buffers. And based on the Cortex-R5F documentation, a strongly-ordered memory region is non-cachable, so my conclusion was that the source and destination buffers have to be defined as non-cachable.

    Hello Boshra,

    Thanks for sharing the details.

    We mostly check the TRM rather than the safety manual.

    I am discussing the above point with other experts as well.

    Regards,

    S.Anil.

  • Thanks. Looking forward to hearing back from you.

    Thanks,

    Boshra

  • Hello. 

    Any updates?

    Thanks, 

    Boshra

  • Hello Boshra,

    I have posted your queries to an expert; he was on leave until tomorrow, so hopefully you will get a reply by the day after tomorrow.

    I have checked internally with the India team about your query, but we did not reach a conclusion, so we are taking help from the Dallas team.

    Regards,

    S.Anil.

  • That's appreciated.

    Boshra

  • Hi, 
    Any updates? 

    It has been more than two weeks since your latest update, instead of the 1-2 days that was originally anticipated.

    Thanks, 

    Boshra

  • There are several branches in this thread, so let's tackle them one by one.

    I see in the example that the packet descriptors and data buffers are defined as non-cachable but the rings are cachable. Can you please explain to me why they need to be cachable?

    Making something cacheable is a performance improvement, never a requirement: you make something cacheable to improve performance, but logically everything will work either way. The downside of cacheable memory is that you become responsible for cache coherency and for getting it logically correct. So nothing is required to be cacheable. See for example the Arm Cortex-R Series Programmer's Guide, https://documentation-service.arm.com/static/60ffb7c39ebe3a7dbd3a78b7?token= section 9.3.2 "Memory Types", for a good overview.

    Since I am directly using only one core (not considering the DMSC) along with the DMASS, can I define it as non-cachable as well? In that case,

    • I will not need to use memory fences when queuing/dequeuing (CSL_archMemoryFence)
    • I will not need to use the functions CacheP_wb and CacheP_inv to write-back or invalidate the cache.

    No. You still need to manage cache coherency/consistency between the R5 core and the external initiator (the DMA). There are write buffers, reordering, and various other mechanisms that need to be synchronized to guarantee ordering. The brute-force way to do this is to make everything strongly ordered, which usually gives unacceptably low performance: strongly ordered means the R5 runs one instruction at a time, stalling while writes are in flight, and so on.

    Also in the "AM64x/AM243x Functional Safety Manual", for the BCDMA section it is written that:
    "To support this software test, it is highly recommended to configure the target memory space as a strongly ordered,
    non-bufferable memory region for the CPU using the memory protection unit (MPU) or memory management
    unit (MMU) for the CPU. This ensures that the register write has completed before the read back is initiated."
    It is mentioned for target space memory, to me this means source/destination buffers. And based on R5f TRM, strongly-ordered memory region it shows that it is non-cachable, So my conclusion was that source and destination buffers have to be defined non-cachable. ?
    Can you confirm if my understanding here is correct or not?

    You skipped over the key statement at the start of the paragraph:

    In order to ensure proper configuration of memory-mapped control registers in the module, it is highly recommended that software implement a test to confirm proper operation of all control register writes.

    followed by your copied text. So the statement applies to configuration register writes; it has nothing to do with descriptors or buffers.

      Pekka