Linux/AM4376: Write-through cache memory mapping

Part Number: AM4376

Tool/software: Linux

I am using Linux and mapping memory for UIO. I want to map the memory so that a write-through cache policy is used, which I think the processor supports. Currently I use:

vma->vm_page_prot = __pgprot_modify(vma->vm_page_prot, L_PTE_MT_MASK,
                                    L_PTE_MT_DEV_SHARED | L_PTE_SHARED);

result = remap_pfn_range(vma,
                         vma->vm_start,
                         idev->info->mem[mi].addr >> PAGE_SHIFT,
                         vma->vm_end - vma->vm_start,
                         vma->vm_page_prot);

This results in no cache being used. If I use "L_PTE_MT_DEV_SHARED|L_PTE_SHARED|L_PTE_MT_WRITETHROUGH" instead, memory access becomes slower for both reads and writes. I would expect write access to stay the same, but reads to be much faster when I read the same memory several times.
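
One possible reason the OR-combination above does not behave like "device memory plus write-through" is that the L_PTE_MT_* values are entries of a 4-bit enumerated field, not independent flags. The user-space sketch below replays the arithmetic of __pgprot_modify() with the field values as I recall them from arch/arm/include/asm/pgtable-2level.h. The numeric values are an assumption and should be checked against the kernel tree in use; the MT_* names, prot_modify() and mt_index() are hypothetical stand-ins, not kernel identifiers.

```c
#include <stdint.h>

/* ASSUMPTION: field values recalled from pgtable-2level.h; verify locally. */
#define MT_SHIFT        2
#define MT_WRITETHROUGH (UINT32_C(0x02) << MT_SHIFT)
#define MT_WRITEBACK    (UINT32_C(0x03) << MT_SHIFT)
#define MT_DEV_SHARED   (UINT32_C(0x04) << MT_SHIFT)
#define MT_MINICACHE    (UINT32_C(0x06) << MT_SHIFT)
#define MT_DEV_CACHED   (UINT32_C(0x0b) << MT_SHIFT)
#define MT_MASK         (UINT32_C(0x0f) << MT_SHIFT)

/* Same arithmetic as the kernel macro: clear the masked field, OR in bits. */
static uint32_t prot_modify(uint32_t prot, uint32_t mask, uint32_t bits)
{
    return (prot & ~mask) | bits;
}

/* Hypothetical helper: extract the 4-bit memory-type index from a prot value. */
static unsigned mt_index(uint32_t prot)
{
    return (unsigned)((prot & MT_MASK) >> MT_SHIFT);
}
```

Under these assumed values, mt_index(prot_modify(0, MT_MASK, MT_DEV_SHARED | MT_WRITETHROUGH)) yields 0x06, which is the MINICACHE index rather than a combination of the two types. If that holds for the kernel in use, only one L_PTE_MT_* value should be passed at a time.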

  • What Linux version is this?
  • Hi, Annette,

    I need to check a few things with our internal ARM and Linux experts, related to A9 capabilities among other things. It may take a while before I post anything here, but I'll keep you updated.

    Rex
  • Hi, Annette,

    I read the ARM A9 r2p0 TRM, and section 5.3 states:

    You can configure the MMU to perform hardware translation table walks in cacheable
    regions by setting the IRGN bits in the Translation Table Base Registers. If the encoding
    of the IRGN bits is write-back, then an L1 data cache look-up is performed and data is
    read from the data cache. If the encoding of the IRGN bits is write-through or
    non-cacheable then an access to external memory is performed.

    My interpretation is that with write-through, the data won't be cached. Is that also how you understand it?

    Rex
  • Hi Rex,

    From this description, it would seem that a write-through cache is not possible, as it is identical to non-cached. I was referring to the "PL310 Cache Controller Technical Reference Manual", where a table in chapter 2.2 describes the different modes, and

    "Cacheable write-through, allocate on read"

    is exactly what I am looking for and what we used with other CPUs. I think the PL310 is responsible for the L2 cache, so the combined information would mean that the L2 cache supports write-through, while the L1 cache ignores it.

    Using the memory with an L2 "write-through" cache should be faster than no cache at all, but I need to be able to configure it from Linux.

  • We found a flag,

    L_PTE_MT_DEV_CACHED

    that can be used for the mapping and seems to produce the behavior we want. Could you confirm which cache strategy it applies?

  • Hi, Annette,

    I am glad to hear your issue is resolved. That flag and the others are defined as in the table below.

    * MT type       Pre-ARMv6      ARMv6+ type / cacheable status
    * UNCACHED      Uncached       Strongly ordered
    * BUFFERABLE    Bufferable     Normal memory / non-cacheable
    * WRITETHROUGH  Writethrough   Normal memory / write through
    * WRITEBACK     Writeback      Normal memory / write back, read alloc
    * MINICACHE     Minicache      N/A
    * WRITEALLOC    Writeback      Normal memory / write back, write alloc
    * DEV_SHARED    Uncached       Device memory (shared)
    * DEV_NONSHARED Uncached       Device memory (non-shared)
    * DEV_WC        Bufferable     Normal memory / non-cacheable
    * DEV_CACHED    Writeback      Normal memory / write back, read alloc
    * VECTORS       Variable       Normal memory / variable

    Please click "Resolved" if that answers your question. 

    Rex

  • The flag we used is L_PTE_MT_DEV_CACHED. The table above suggests this would result in a write-back cache, which is NOT what we want and, judging by our timing measurements, also not what we get. We think that L_PTE_MT_DEV_CACHED results in a write-through cache, so either this documentation or the TI implementation is wrong. We would like to clarify the issue so that we don't use a configuration which produces the correct result only by accident.
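
A timing measurement of the kind mentioned here could look roughly like the sketch below: it reads (or writes) the same buffer several times and reports the elapsed time, so that single-pass and multi-pass results can be compared for each mapping type. The function names and parameters are hypothetical, and the buffer is assumed to be the region obtained from mmap() on the UIO device. clock() is used only to keep the sketch portable; clock_gettime() with CLOCK_MONOTONIC is a common alternative on Linux.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Read every word of buf `passes` times. The checksum is stored in *sink so
 * the loads cannot be optimized away. Returns elapsed CPU time in seconds. */
static double time_reads(const volatile uint32_t *buf, size_t words,
                         unsigned passes, uint32_t *sink)
{
    uint32_t sum = 0;
    clock_t t0 = clock();
    for (unsigned p = 0; p < passes; p++)
        for (size_t i = 0; i < words; i++)
            sum += buf[i];
    clock_t t1 = clock();
    *sink = sum;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

/* Write every word of buf `passes` times. Returns elapsed CPU time in seconds. */
static double time_writes(volatile uint32_t *buf, size_t words, unsigned passes)
{
    clock_t t0 = clock();
    for (unsigned p = 0; p < passes; p++)
        for (size_t i = 0; i < words; i++)
            buf[i] = (uint32_t)(i + p);
    clock_t t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

With a read-allocating cache, the time per pass would be expected to drop after the first pass as long as the buffer fits in the cache; with an uncached mapping every pass should cost about the same. A write-through and a write-back mapping may look alike in the write test until the write buffer fills, so the buffer size and pass count matter for the comparison.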
  • Hi, Annette,

    I had a lengthy discussion with our ARM expert, and he suggests the following:

    1. To understand the exact page table entry configuration in use, it would be useful to get a dump of the page table entry itself, specifically the TEX, C, and B bits, rather than the snippet that modifies a few bits of it. According to the ARM Cortex-A Series Programmer’s Guide for ARMv7 (infocenter.arm.com/.../index.jsp ), TEX should be 000 and C should be 1 for both write-through and write-back. B = 0 means “write-through, no allocate on write” and B = 1 means “write-back, no allocate on write”.
    2. When page table entries are modified manually, TLB maintenance is required for the changes to take effect. Section B4.2.2 of the ARMv7 architecture manual and section 10.5 (TLB coherency) of the ARM Cortex-A Series Programmer’s Guide give some background. Linux's flush_tlb_range() is an example.
    3. It would be good to understand the goal of using write-through. Is it to avoid software-managed cache writeback? It would also be important to understand how the performance effect of write-through is being measured. Write-through has no performance effect compared to write-back until the write buffer is exhausted. Read allocation is independent of write policy. Chapter 9 (Caches) of the ARM Cortex-A Series Programmer’s Guide for ARMv7 (infocenter.arm.com/.../index.jsp ) gives an overview of the behavior of caches, write buffers, etc. that could be useful.
    4. The AM43x ARM Cortex-A9 is a standard A9 r2p10 with an L2 cache that is a standard ARM PL310 r3p2. All MMU and cache features are supported in mainline Linux, with code shared across all ARMv7 devices. All ARM documentation for the versions listed applies.
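
The check suggested in item 1 can be sketched in user-space C once a raw hardware page table entry has been dumped. The code below extracts TEX, C, B and S from an ARMv7 short-descriptor small-page entry and applies the TEX = 000 interpretation given above. The struct and function names are hypothetical. It assumes the value is the hardware entry, not the separate Linux software PTE, and that TEX remap is disabled. If TEX remap (SCTLR.TRE) is enabled, which I believe mainline Linux does on ARMv7, the bits TEX[0], C and B instead index into the PRRR/NMRR registers and this direct interpretation does not apply.

```c
#include <stdint.h>

/* Memory attribute bits of an ARMv7 short-descriptor small-page entry. */
struct pte_attrs {
    unsigned tex; /* bits [8:6] */
    unsigned c;   /* bit 3 */
    unsigned b;   /* bit 2 */
    unsigned s;   /* bit 10, shareable */
};

static struct pte_attrs decode_small_page(uint32_t pte)
{
    struct pte_attrs a;
    a.b   = (pte >> 2) & 0x1u;
    a.c   = (pte >> 3) & 0x1u;
    a.tex = (pte >> 6) & 0x7u;
    a.s   = (pte >> 10) & 0x1u;
    return a;
}

/* Interpretation for TEX = 000 with TEX remap disabled (ASSUMPTION). */
static const char *describe(struct pte_attrs a)
{
    if (a.tex != 0)
        return "TEX != 000: consult the architecture manual";
    if (a.c && a.b)
        return "write-back, no allocate on write";
    if (a.c)
        return "write-through, no allocate on write";
    return "not cacheable (strongly ordered or device)";
}
```

For example, a hypothetical entry whose low bits are 0xA (small page, C = 1, B = 0) would be reported as write-through, and 0xE (C = 1, B = 1) as write-back.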

    As pointed out in the last item, this part of the code is not TI’s implementation but upstream ARM Linux kernel code. For more details on the configuration and its behavior, please consult ARM.

    Rex