ti.sysbios.family.arm.a15.Cache Cache_inv and Cache_wb functions in SYS/BIOS 6.45.1.29 don't work when the virtual address extends to the end of address space

Other Parts Discussed in Thread: SYSBIOS, 66AK2H14

With SYS/BIOS 6.45.1.29, I was attempting to call the ti.sysbios.family.arm.a15.Cache Cache_inv and Cache_wb functions for a Cortex-A15 in a 66AK2H14, to operate on the upper 2Gbytes of the virtual address space using the following parameters:

        Cache_inv (0x80000000, 0x80000000, Cache_Type_ALL, TRUE);
        Cache_wb (0x80000000, 0x80000000, Cache_Type_ALL, TRUE);

As part of the test program, the Cortex-A15 PMU counters were configured to measure the cache operation events around calls to the Cache_inv and Cache_wb functions:

- 0x46 : Level 1 data cache Write-Back - Victim 
- 0x47 : Level 1 data cache Write-Back - Cleaning and coherency
- 0x48 : Level 1 data cache invalidate
- 0x56 : Level 2 data cache Write-Back - Victim
- 0x57 : Level 2 data cache Write-Back - Cleaning and coherency
- 0x58 : Level 2 data cache invalidate
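
For reference, the sketch below shows one way such PMU event counters might be programmed on a Cortex-A15. This is not code from the attached project: it is a hypothetical helper which assumes bare-metal access at PL1 and GCC-style inline assembly, and uses the ARMv7-A CP15 PMU registers (PMCR, PMSELR, PMXEVTYPER, PMXEVCNTR, PMCNTENSET).

#include <stdint.h>

/* Hypothetical PMU helpers (not from the attached project): select a counter,
 * assign it one of the event numbers listed above, enable it, and read it back. */
static inline void pmu_setup_counter(uint32_t counter, uint32_t event)
{
    uint32_t pmcr;

    __asm volatile ("mrc p15, 0, %0, c9, c12, 0" : "=r" (pmcr));            /* read PMCR */
    __asm volatile ("mcr p15, 0, %0, c9, c12, 0" : : "r" (pmcr | 1u));      /* PMCR.E: enable PMU */
    __asm volatile ("mcr p15, 0, %0, c9, c12, 5" : : "r" (counter));        /* PMSELR: select counter */
    __asm volatile ("mcr p15, 0, %0, c9, c13, 1" : : "r" (event));          /* PMXEVTYPER: event to count */
    __asm volatile ("mcr p15, 0, %0, c9, c12, 1" : : "r" (1u << counter));  /* PMCNTENSET: enable counter */
}

static inline uint32_t pmu_read_counter(uint32_t counter)
{
    uint32_t value;

    __asm volatile ("mcr p15, 0, %0, c9, c12, 5" : : "r" (counter));        /* PMSELR: select counter */
    __asm volatile ("mrc p15, 0, %0, c9, c13, 2" : "=r" (value));           /* PMXEVCNTR: read count */
    return value;
}

/* Example usage: measure L1 data cache Write-Back - Cleaning and coherency (0x47) around Cache_wb():
 *     pmu_setup_counter(0, 0x47);
 *     before = pmu_read_counter(0);
 *     Cache_wb((Ptr) 0x80000000, 0x80000000, Cache_Type_ALL, TRUE);
 *     delta = pmu_read_counter(0) - before;
 */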

However, when the Cache_wb and Cache_inv arguments were set to blockPtr = 0x80000000 and byteCnt = 0x80000000, the PMU counters showed that only one cache line was operated on. Single-stepping into the Cache_wb and Cache_inv functions confirmed that, with these arguments, only a single cache line was being operated on. The Cache_wb and Cache_inv functions end up calling the ti_sysbios_family_arm_a15_Cache_wb__E and ti_sysbios_family_arm_a15_Cache_invL1d__I assembler routines, which have the following loop structure:

@
@ ======== Cache_wb ========
@ Writes back the range of MVA in data cache. First, wait on any previous cache
@ operation.
@
@     Cache_wb(Ptr blockPtr, SizeT byteCnt, Type type, Bool wait)
@
@       r0 - contains blockPtr
@       r1 - contains byteCnt
@       r2 - contains bit mask of cache type (unused)
@       r3 - contains wait
@
        .text
        SMP_ASM(.func ti_sysbios_family_arm_a15_smp_Cache_wb__E)
        UP_ASM(.func ti_sysbios_family_arm_a15_Cache_wb__E)

SMP_ASM(ti_sysbios_family_arm_a15_smp_Cache_wb__E:)
UP_ASM(ti_sysbios_family_arm_a15_Cache_wb__E:)
        add     r1, r0, r1              @ calculate last address
        bic     r0, r0, #Cache_sizeL1dCacheLine - 1
                                        @ align address to cache line
1:
        mcr     p15, #0, r0, c7, c10, #1 @ write back a cache line
        add     r0, r0, #Cache_sizeL1dCacheLine
                                        @ increment address by cache line size
        cmp     r0, r1                  @ compare to last address
        blo     1b                      @ loop if count > 0
        tst     r3, #0x1                @ check if wait param is TRUE
        beq     2f
        dsb                             @ drain write buffer
2:
        bx      lr
        .endfunc

@
@ ======== Cache_invL1d ========
@ Invalidates a range of MVA (modified virtual addresses) in the L1 data cache
@     Cache_invL1d(Ptr blockPtr, SizeT byteCnt, Bool wait)
@
@       r0 - contains blockPtr
@       r1 - contains byteCnt
@       r2 - contains wait
@
        .text
        SMP_ASM(.func ti_sysbios_family_arm_a15_smp_Cache_invL1d__I)
        UP_ASM(.func ti_sysbios_family_arm_a15_Cache_invL1d__I)

SMP_ASM(ti_sysbios_family_arm_a15_smp_Cache_invL1d__I:)
UP_ASM(ti_sysbios_family_arm_a15_Cache_invL1d__I:)
        add     r1, r0, r1              @ calculate last address
        bic     r0, r0, #Cache_sizeL1dCacheLine - 1
                                        @ align blockPtr to cache line
1:
        mcr     p15, #0, r0, c7, c6, #1 @ invalidate single entry in DCache
        add     r0, r0, #Cache_sizeL1dCacheLine
                                        @ increment address by cache line size
        cmp     r0, r1                  @ compare to last address
        blo     1b                      @ loop if > 0
        tst     r2, #0x1                @ check if wait param is TRUE
        beq     2f
        dsb                             @ drain write buffer
2:
        bx      lr                      @ return
        .endfunc

The problem is that the loops in the assembler routines calculate the end address as zero when blockPtr = 0x80000000 and byteCnt = 0x80000000: the 32-bit addition 0x80000000 + 0x80000000 wraps around to zero, so the unsigned blo comparison against the end address is false after the first iteration and only one cache line is operated on.
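
As an illustration only (not SYS/BIOS source), a loop that counts the remaining cache lines up front, rather than comparing against a pre-computed end address, avoids the wrap-around. The helper name and structure below are hypothetical, and the sketch ignores further corner cases such as byteCnt values within one cache line of 0xFFFFFFFF:

#include <stdint.h>

#define CACHE_LINE_SIZE 64u  /* Cache_sizeL1dCacheLine on the Cortex-A15 */

/* Sketch of a wrap-safe write-back loop: the number of lines is computed up
 * front, so the address can only wrap after the final line has been cleaned. */
static void wb_range_sketch(uint32_t blockPtr, uint32_t byteCnt)
{
    uint32_t addr  = blockPtr & ~(CACHE_LINE_SIZE - 1u);   /* align down, as the asm does */
    uint32_t lines = ((blockPtr - addr) + byteCnt + CACHE_LINE_SIZE - 1u) / CACHE_LINE_SIZE;

    while (lines-- != 0u)
    {
        /* MCR p15, 0, addr, c7, c10, 1 (clean data cache line by MVA) would go here */
        addr += CACHE_LINE_SIZE;   /* wraps to 0 only after the last line at 0xFFFFFFC0 */
    }
}

/* With blockPtr = 0x80000000 and byteCnt = 0x80000000 this gives lines = 0x02000000,
 * whereas the existing end-address test (0x80000000 + 0x80000000 == 0) exits after one line. */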

I initially attempted to work around the problem by setting the byteCnt argument to one byte less. However, a Data Abort due to a Translation Fault then occurred: with a last address of 0xFFFFFFFF the loop address wraps from 0xFFFFFFC0 to 0x00000000, which is still below the last address, so the assembler routines attempted to perform a cache operation on address zero:

        Cache_inv (0x80000000, 0x7FFFFFFF, Cache_Type_ALL, TRUE);
        Cache_wb (0x80000000, 0x7FFFFFFF, Cache_Type_ALL, TRUE);

The work-around which I ended up using was to make two calls to the Cache_inv and Cache_wb functions: the first performs the operation on all but the final cache line, and the second operates only on the last cache line:

        Cache_inv (0x80000000, 0x7FFFFFC0, Cache_Type_ALL, TRUE);
        Cache_inv (0xFFFFFFC0, 0x40, Cache_Type_ALL, TRUE);
        Cache_wb (0x80000000, 0x7FFFFFC0, Cache_Type_ALL, TRUE);
        Cache_wb (0xFFFFFFC0, 0x40, Cache_Type_ALL, TRUE);
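
A more general form of this work-around could be wrapped in a helper. The sketch below is hypothetical (the function name Cache_inv_to_end_of_memory is not part of SYS/BIOS) and assumes the ti.sysbios.family.arm.a15.Cache header; it simply splits off the final cache line whenever blockPtr + byteCnt wraps to zero:

#include <ti/sysbios/family/arm/a15/Cache.h>

/* Hypothetical wrapper around Cache_inv() which splits off the final cache line
 * whenever blockPtr + byteCnt wraps to zero at the top of the 32-bit address space. */
static Void Cache_inv_to_end_of_memory (Ptr blockPtr, SizeT byteCnt, Bits16 type, Bool wait)
{
    if (((UInt32) blockPtr + byteCnt) == 0)
    {
        Cache_inv (blockPtr, byteCnt - Cache_sizeL1dCacheLine, type, wait);
        Cache_inv ((Ptr) (0u - (UInt32) Cache_sizeL1dCacheLine),  /* = 0xFFFFFFC0 for a 64-byte line */
                   Cache_sizeL1dCacheLine, type, wait);
    }
    else
    {
        Cache_inv (blockPtr, byteCnt, type, wait);
    }
}

/* Example usage: Cache_inv_to_end_of_memory ((Ptr) 0x80000000, 0x80000000, Cache_Type_ALL, TRUE); */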

Is it a bug that Cache_inv and Cache_wb only operate on one cache line when called with blockPtr = 0x80000000 and byteCnt = 0x80000000?

  • Hi Chester,

    Can you please attach your map file?

    Steve
  • Also, which hardware platform are you on?
  • Steven Connell said:
    Also, which hardware platform are you on?

    I am using the EVMK2H Rev 4 (with the DDR3A SO-DIMM upgraded to 8GBytes).

    Steven Connell said:
    Can you please attach your map file?

    I have attached the complete project, which contains the map file (1464.66AK2H14_sys_bios_cache_ops_CortexA.zip), along with the custom platform (4606.platforms.zip); the custom platforms directory is placed in the same CCS workspace as the project.

    As far as the memory map goes, there are two regions:

    Memory Configuration
    
    Name             Origin             Length             Attributes
    MSM              0x0c000000         0x00600000         xrw
    DDR3             0x80000000         0x80000000         xrw

    The MSM region is used to allocate the SYS/BIOS program Code Memory, Data Memory and Stack Memory. The DDR3 region, which covers the upper 2Gbytes of the virtual address space, is used by the application for a memory test, in which the entire 2Gbytes was intended to be passed in calls to Cache_inv() and Cache_wb().

    In the attached project, the test_memory_region() function currently has a work-around applied which makes two calls to Cache_inv() and Cache_wb() when region = 0x80000000 and region_size_bytes = 0x80000000:

    static Void test_memory_region (Void *const region, const SizeT region_size_bytes, const post_write_cache_ops cache_op)
    {
    <snip>
        /*@todo Avoid bug where only one cache line is affected by the cache operation */
        const Bool workaround_final_cache_op_problem = ((UInt32) region + region_size_bytes) == 0;
        Void *const final_cache_line = (Void *) ((UInt32) region + region_size_bytes - Cache_sizeL1dCacheLine);
    <snip>
    
        case POST_WRITE_INV:
            if (workaround_final_cache_op_problem)
            {
                Cache_inv (region, region_size_bytes - Cache_sizeL1dCacheLine, Cache_Type_ALL, TRUE);
                Cache_inv (final_cache_line, Cache_sizeL1dCacheLine, Cache_Type_ALL, TRUE);
            }
            else
            {
                Cache_inv (region, region_size_bytes, Cache_Type_ALL, TRUE);
            }
            break;
    
        case POST_WRITE_WB:
            if (workaround_final_cache_op_problem)
            {
                Cache_wb (region, region_size_bytes - Cache_sizeL1dCacheLine, Cache_Type_ALL, TRUE);
                Cache_wb (final_cache_line, Cache_sizeL1dCacheLine, Cache_Type_ALL, TRUE);
            }
            else
            {
                Cache_wb (region, region_size_bytes, Cache_Type_ALL, TRUE);
            }
            break;
    

    When the above is run, it reports that Cache_inv and Cache_wb perform data cache Write-Back - Cleaning and coherency on 510 of the 512 L1D cache lines and on 65534 of the 65536 L2 cache lines.

    Starting test of memory region 0x80000000 ... 0xffffffff
    Tested memory region 0x80000000 ... 0xffffffff using cache op Cache_inv, with 0 readback failures
    PMU counter changes during Cache op:
      Level 1 data cache Write-Back - Victim = 0
      Level 1 data cache Write-Back - Cleaning and coherency = 510
      Level 1 data cache invalidate = 510
      Level 2 data cache Write-Back - Victim = 0
      Level 2 data cache Write-Back - Cleaning and coherency = 65534
      Level 2 data cache invalidate = 0
    <snip>
    Tested memory region 0x80000000 ... 0xffffffff using cache op Cache_wb, with 0 readback failures
    PMU counter changes during Cache op:
      Level 1 data cache Write-Back - Victim = 1
      Level 1 data cache Write-Back - Cleaning and coherency = 510
      Level 1 data cache invalidate = 0
      Level 2 data cache Write-Back - Victim = 0
      Level 2 data cache Write-Back - Cleaning and coherency = 65534
      Level 2 data cache invalidate = 0

    If the work-around is removed, by setting workaround_final_cache_op_problem to FALSE, then when the program is run it reports that Cache_inv and Cache_wb only operate on one cache line:

    Tested memory region 0x80000000 ... 0xffffffff using cache op Cache_inv, with 0 readback failures
    PMU counter changes during Cache op:
      Level 1 data cache Write-Back - Victim = 1
      Level 1 data cache Write-Back - Cleaning and coherency = 1
      Level 1 data cache invalidate = 1
      Level 2 data cache Write-Back - Victim = 0
      Level 2 data cache Write-Back - Cleaning and coherency = 1
      Level 2 data cache invalidate = 0
    <snip>
    Tested memory region 0x80000000 ... 0xffffffff using cache op Cache_wb, with 0 readback failures
    PMU counter changes during Cache op:
      Level 1 data cache Write-Back - Victim = 0
      Level 1 data cache Write-Back - Cleaning and coherency = 1
      Level 1 data cache invalidate = 0
      Level 2 data cache Write-Back - Victim = 0
      Level 2 data cache Write-Back - Cleaning and coherency = 1
      Level 2 data cache invalidate = 0

    With the program in this state you should be able to step into the Cache_inv() and Cache_wb() functions, called with blockPtr = 0x80000000 and byteCnt = 0x80000000, and see that a cache maintenance operation is only performed on one virtual address.

  • Hi Chester,

    It looks like you asked a related question that Ashish answered:

    e2e.ti.com/.../544363

    I'm just wondering if your question was answered by that thread? Does the system reset suggestion resolve this problem?

    If not, just let me know and we can continue to work this out.

    Otherwise, please mark this thread as "answered."

    Thanks,

    Steve
  • Steven Connell said:
    I'm just wondering if your question was answered by that thread? Does the system reset suggestion resolve this problem?

    Sorry if my previous posts were confusing, but the system reset suggestion doesn't resolve the intended query for this thread.

    The purpose of this thread was just to report that, when I created a test program which made either of the following calls, I expected the requested cache maintenance operations to be performed on the whole of the virtual address space from 0x80000000 ... 0xffffffff, with the operations performed on virtual addresses incrementing by the cache line size of 64 bytes:

    Cache_inv (0x80000000, 0x80000000, Cache_Type_ALL, TRUE);
    Cache_wb (0x80000000, 0x80000000, Cache_Type_ALL, TRUE);

    However, as of the ti.sysbios.family.arm.a15.Cache module in SYS/BIOS 6.45.1.29, the actual result was that a cache maintenance operation was only performed on the initial virtual address of 0x80000000, due to the specific combination of the arguments blockPtr = 0x80000000 and byteCnt = 0x80000000.

    This is not currently causing me a problem; I was just wondering whether the difference between my expected behaviour and the actual behaviour is considered a bug or not.

  • Hi Chester,

    Ok, I spoke to my colleague about this and indeed it does look like a problem. Basically it's an edge case that wasn't caught.

    He is looking into it and determining what should be done about it, so I will have to get back to you once I know more.

    As you said, you're not held up by this, and in any case you have a workaround, so I think you should be OK.

    Steve
  • Hi Chester,

    I filed a bug against SYS/BIOS to fix the logic to determine the cache lines that need to be maintained.

    Looking at the start address and size you are using, it looks like you want to write-back invalidate the entire cache. I would suggest using Cache_wbInvAll() API for this purpose. It does not take any arguments and is much more efficient for performing a maintenance operation on the entire cache.

    Best,
    Ashish
  • Ashish Kapania said:
    I filed a bug against SYS/BIOS to fix the logic to determine the cache lines that need to be maintained.

    Thanks for that.

    Ashish Kapania said:
    Looking at the start address and size you are using, it looks like you want to write-back invalidate the entire cache. I would suggest using Cache_wbInvAll() API for this purpose

    The program was a test using:

    - Cache_invL1dAll : Which performs an invalidate by set/way

    - Cache_inv : Which performs an invalidate by Modified Virtual Address

    - Cache_wbInvAll : Which performs a clean and invalidate by set/way

    - Cache_wb : Which performs a write back by Modified Virtual Address

    This was to aid my understanding of the question "Why does the Cache_inv() API flush the cache on Cortex-A15?"