Linux/AM4376: Write-through cache memory mapping

Annette Kratz

Part Number: AM4376

Tool/software: Linux

I am using Linux and mapping memory for UIO. I want to map the memory so that a write-through cache is used. I think the processor supports it. I currently use:

vma->vm_page_prot=__pgprot_modify(vma->vm_page_prot, L_PTE_MT_MASK, L_PTE_MT_DEV_SHARED|L_PTE_SHARED);

result= remap_pfn_range(vma,
          vma->vm_start,
          idev->info->mem[mi].addr >> PAGE_SHIFT,
          vma->vm_end - vma->vm_start,
          vma->vm_page_prot);

This results in no cache being used. If I use "L_PTE_MT_DEV_SHARED|L_PTE_SHARED|L_PTE_MT_WRITETHROUGH", memory access becomes slower for both reads and writes. I would expect write access to stay the same and reads to be much faster when I read the same memory several times.
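One thing worth checking here: the L_PTE_MT_* names appear to be values of a single 4-bit memory-type field, not independent flags, so OR-ing two of them selects a third encoding. The sketch below is a user-space illustration only. The numeric encodings are an assumption, written from memory of arch/arm/include/asm/pgtable-2level.h, and should be verified against the 4.9 tree in use; the MT_* names and the helpers prot_modify() and mt_field() are hypothetical stand-ins, not kernel API.

```c
#include <stdint.h>

/* ASSUMPTION: encodings as recalled from
 * arch/arm/include/asm/pgtable-2level.h; verify against your kernel tree. */
#define MT_SHIFT        2
#define MT_WRITETHROUGH (0x02u << MT_SHIFT)
#define MT_DEV_SHARED   (0x04u << MT_SHIFT)
#define MT_MINICACHE    (0x06u << MT_SHIFT)
#define MT_DEV_CACHED   (0x0bu << MT_SHIFT)
#define MT_MASK         (0x0fu << MT_SHIFT)

/* Hypothetical stand-in for __pgprot_modify(): clear the masked field,
 * then OR in the new bits. */
static uint32_t prot_modify(uint32_t prot, uint32_t mask, uint32_t bits)
{
    return (prot & ~mask) | bits;
}

/* Extract the 4-bit memory-type field from a protection value. */
static uint32_t mt_field(uint32_t prot)
{
    return (prot & MT_MASK) >> MT_SHIFT;
}
```

If these values are right, mt_field(prot_modify(0, MT_MASK, MT_DEV_SHARED | MT_WRITETHROUGH)) yields 0x06, the MINICACHE encoding, rather than "device shared plus write-through". That could be one reason the combined value behaves unexpectedly.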

over 7 years ago

0 Biser Gatchev-XID over 7 years ago

TI__Guru**** 393215 points

What Linux version is this?

0 Annette Kratz over 7 years ago in reply to Biser Gatchev-XID

Prodigy 210 points

Linux 4.9.68

0 Rex Chang over 7 years ago in reply to Annette Kratz

TI__Guru 50170 points

Hi, Annette,

I have a few things I need to check with our internal ARM and Linux experts, related to the A9's capabilities and so on. It may take a while before I post anything here, but I'll keep you updated.

Rex

0 Rex Chang over 7 years ago in reply to Rex Chang

TI__Guru 50170 points

Hi, Annette,

I read the ARM A9 r2p0 TRM, and section 5.3 states:

You can configure the MMU to perform hardware translation table walks in cacheable
regions by setting the IRGN bits in the Translation Table Base Registers. If the encoding
of the IRGN bits is write-back, then an L1 data cache look-up is performed and data is
read from the data cache. If the encoding of the IRGN bits is write-through or
non-cacheable then an access to external memory is performed.

The way I interpret this is that if it is write-through, it won't be cached. Is that also how you understand it?

Rex

0 Annette Kratz over 7 years ago in reply to Rex Chang

Prodigy 210 points

Hi Rex,

From this description, it would seem that write-through caching is not possible, as it is identical to non-cached. I was referring to the "PL310 Cache Controller Technical Reference Manual"; in chapter 2.2 a table describes the different modes, and

"Cacheable write-through, allocate on read"

is exactly what I am looking for and what we used with other CPUs. I think the PL310 is responsible for the L2 cache, so the combined information would mean that the L2 cache supports write-through, while the L1 cache ignores it.

Using the memory with an L2 "write through" cache should be faster than no cache at all, but I need to be able to configure it from Linux.

0 Annette Kratz over 7 years ago in reply to Annette Kratz

Prodigy 210 points

We found a flag

L_PTE_MT_DEV_CACHED

that can be used for the mapping and seems to create the behavior we want. Could you confirm which cache strategy it applies?

0 Rex Chang over 7 years ago in reply to Annette Kratz

TI__Guru 50170 points

Hi, Annette,

I am glad to hear your issue is resolved. That flag and the others are defined as in the table below.

* MT type        Pre-ARMv6     ARMv6+ type / cacheable status
* UNCACHED       Uncached      Strongly ordered
* BUFFERABLE     Bufferable    Normal memory / non-cacheable
* WRITETHROUGH   Writethrough  Normal memory / write through
* WRITEBACK      Writeback     Normal memory / write back, read alloc
* MINICACHE      Minicache     N/A
* WRITEALLOC     Writeback     Normal memory / write back, write alloc
* DEV_SHARED     Uncached      Device memory (shared)
* DEV_NONSHARED  Uncached      Device memory (non-shared)
* DEV_WC         Bufferable    Normal memory / non-cacheable
* DEV_CACHED     Writeback     Normal memory / write back, read alloc
* VECTORS        Variable      Normal memory / variable

Please click "Resolved" if that answers your question.

Rex

0 Annette Kratz over 7 years ago in reply to Rex Chang

Prodigy 210 points

The flag we used is L_PTE_MT_DEV_CACHED. The table above suggests this would result in a write-back cache, which is NOT what we want and, judging by our timing measurements, is also not what we get. We think that L_PTE_MT_DEV_CACHED results in a write-through cache, so either this documentation or the TI implementation is wrong. We would like to clarify the issue so that we don't use a configuration that produces the correct result only by accident.
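For anyone trying to reproduce the timing comparison, a read test along the following lines could be used. This is only a sketch under assumptions: the thread does not show the actual measurement code, and time_repeated_reads() and struct read_timing are hypothetical names. In a real test, buf would point into the mmap'ed UIO region, and the buffer should be small enough to fit in the cache so that repeated passes can hit.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() under strict -std */
#endif
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Result of one timed run (hypothetical helper type). */
struct read_timing {
    uint64_t checksum;  /* sum of all words read, so the reads have a visible result */
    double   seconds;   /* wall-clock time for all passes */
};

/* Read `words` 32-bit words from `buf`, `passes` times over, and time it.
 * The volatile qualifier forces every load to actually be issued. */
static struct read_timing time_repeated_reads(const volatile uint32_t *buf,
                                              size_t words, unsigned passes)
{
    struct timespec t0, t1;
    struct read_timing r;
    uint64_t sum = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned p = 0; p < passes; p++)
        for (size_t i = 0; i < words; i++)
            sum += buf[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    r.checksum = sum;
    r.seconds = (double)(t1.tv_sec - t0.tv_sec)
              + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return r;
}
```

Comparing the time for one pass against the average over many passes gives a rough indication of read allocation: with a cached mapping the later passes should be noticeably cheaper, with an uncached mapping they should cost about the same. It does not by itself distinguish write-through from write-back.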

0 Rex Chang over 7 years ago in reply to Annette Kratz

TI__Guru 50170 points

Hi, Annette,

I had a lengthy discussion with our ARM expert, and he suggests the following:

1. To understand the exact page table entry configuration in use, it would be useful to get a dump of the page table entry itself, specifically the TEX, C, and B bits, rather than the snippet that modifies a few bits of it. From the ARM Cortex-A Series Programmer's Guide for ARMv7 (infocenter.arm.com/.../index.jsp ) it looks like TEX should be 000 and C should be 1 for both write-through and write-back. B = 0 means "write-through, no allocate on write" and B = 1 means "write-back, no allocate on write".
2. When page table entries are modified manually, TLB maintenance is required for the changes to take effect. Section B4.2.2 of the ARMv7 architecture manual and section 10.5 (TLB coherency) of the ARM Cortex-A Series Programmer's Guide give some background. Linux's flush_tlb_range() is an example.
3. It would be good to understand the goal of using write-through. Is it to avoid software-managed cache writeback? It is also important to understand how the performance effect of write-through is being measured. Write-through will not have a performance effect compared to write-back until the write buffer is exhausted. Read allocation is independent of the write policy. Chapter 9 (Caches) of the ARM Cortex-A Series Programmer's Guide for ARMv7 (infocenter.arm.com/.../index.jsp ) has an overview of the behavior of caches, write buffers, etc. that could be useful.
4. The AM43x ARM Cortex-A9 is a standard A9 r2p10 with an L2 cache that is a standard ARM PL310 r3p2. All MMU and cache features are supported in mainline Linux, with code shared across all ARMv7 cores. All ARM documentation for the versions listed applies.
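Point 1 above can be sketched as a small decoder for a dumped entry. This is an illustration under stated assumptions, not kernel code: it assumes an ARMv7 short-descriptor second-level "small page" entry (B in bit 2, C in bit 3, TEX in bits 8:6) and that TEX remap is disabled. As far as I know, Linux on ARMv7 normally enables TEX remap, in which case TEX[0], C, and B index into the PRRR/NMRR registers and those registers must be dumped as well. The names pte_attrs, decode_small_page(), and describe() are hypothetical.

```c
#include <stdint.h>

/* Memory attribute bits of a hardware small-page descriptor. */
struct pte_attrs {
    unsigned tex;  /* TEX[2:0], descriptor bits 8:6 */
    unsigned c;    /* C, descriptor bit 3 */
    unsigned b;    /* B, descriptor bit 2 */
};

/* Extract TEX, C and B from a raw 32-bit small-page descriptor. */
static struct pte_attrs decode_small_page(uint32_t desc)
{
    struct pte_attrs a;

    a.b   = (desc >> 2) & 0x1u;
    a.c   = (desc >> 3) & 0x1u;
    a.tex = (desc >> 6) & 0x7u;
    return a;
}

/* Interpret only the two cases named in point 1 (ASSUMES TEX remap off). */
static const char *describe(struct pte_attrs a)
{
    if (a.tex == 0 && a.c == 1)
        return a.b ? "write-back, no allocate on write"
                   : "write-through, no allocate on write";
    return "other (consult the ARM ARM memory attribute tables)";
}
```

For example, describe(decode_small_page(0x0000000A)) reports the write-through case, because only C (bit 3) and the small-page type bit (bit 1) are set. Note that the value must come from the hardware page table entry, not from the Linux-side L_PTE_* copy.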

As pointed out in the last bullet, this part of the code is not TI's implementation but upstream ARM Linux kernel code. For more details on the configuration and its behavior, please consult ARM.

Rex
