TMS320F28377D: Data Cache Behavior When Reading from Flash Memory

Part Number: TMS320F28377D

I have a question regarding data cache behavior when copying data from flash memory to RAM.

**Environment:**
- CPU: TMS320F28377D
- Board: Custom PCB
- Clock frequency: 200MHz (5ns/cycle)
- Code section: .TI.ramfunc (executed from RAM)
- Flash wait states: 3 cycles
- Data cache: Enabled
- Prefetch: Enabled

**Test Performed:**
I measured the processing time by toggling GPIO13 while copying 128 bits of data from flash memory (Sector H: 0x0a1700) to RAM (LS5: 0x00ac00) using two different methods:

1. PREAD instruction: Reading 16-bit data 8 times
2. MOVL instruction: Reading 32-bit data 4 times

**Measurement Results:**
- PREAD: 170ns
- MOVL: 70ns
- Results were identical when reversing the execution order

(Attached image: PREAD_8times.PNG)

**Test Code:**
        MOVW    DP,#_GpioDataRegs

; copy flash to RAM ( word * 8 )
        MOVL    XAR7,#0x0a1700              ; Source      (Flash Sector H)
        MOVL    XAR4,#0x00ac00              ; Destination (LS5 RAM)
        
        OR      @_GpioDataRegs+2,#0x2000    ; GPIO13=High
        
        RPT     #7
||      PREAD   *XAR4++,*XAR7
        
        OR      @_GpioDataRegs+4,#0x2000    ; GPIO13=Low
        
        NOP
        
; copy flash to RAM ( double word * 4 )
        MOVL    XAR7,#0x0a1700              ; Source      (Flash Sector H)
        MOVL    XAR4,#0x00ac00              ; Destination (LS5 RAM)
        
        OR      @_GpioDataRegs+2,#0x2000    ; GPIO13=High
        
L1:     MOVL    ACC, *XAR7++
        MOVL    *XAR4++, ACC
L2:     MOVL    ACC, *XAR7++
        MOVL    *XAR4++, ACC
L3:     MOVL    ACC, *XAR7++
        MOVL    *XAR4++, ACC
L4:     MOVL    ACC, *XAR7++
        MOVL    *XAR4++, ACC
        
        OR      @_GpioDataRegs+4,#0x2000    ; GPIO13=Low

**My Hypothesis:**
For the MOVL sequence: the first flash read incurs the 3 wait cycles, for a total of 4 cycles, while the remaining seven instructions (three reads that hit the cache and four RAM writes) take 1 cycle each. Total: 5ns × (4+7) = 55ns
For the PREAD sequence: the cache is not used, so every read incurs the wait cycles and takes 4 cycles. Total: 5ns × 4 × 8 = 160ns
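
As a rough check of the arithmetic in this hypothesis, here is a minimal sketch in C. The helper names `uncached_reads_ns` and `cached_sequence_ns` are hypothetical and exist only for illustration; the constants come from the environment listed above. It models only the hypothesis and says nothing about what the hardware actually does.

```c
#include <stdio.h>

/* 200MHz CPU clock, as stated in the environment section. */
#define NS_PER_CYCLE      5
/* Flash wait states, as stated in the environment section. */
#define FLASH_WAIT_STATES 3

/* Hypothetical helper: every read pays 1 cycle plus the flash wait
 * states, i.e. the "no cache" case assumed for PREAD. */
static int uncached_reads_ns(int reads)
{
    return NS_PER_CYCLE * (1 + FLASH_WAIT_STATES) * reads;
}

/* Hypothetical helper: the first instruction is a slow flash read,
 * every following instruction is assumed to take a single cycle
 * (cache hits and RAM writes), i.e. the case assumed for MOVL. */
static int cached_sequence_ns(int instructions)
{
    return NS_PER_CYCLE * ((1 + FLASH_WAIT_STATES) + (instructions - 1));
}

/* Print both modelled times next to the measured ones. */
static void print_model(void)
{
    printf("PREAD x8 : model %d ns, measured 170 ns\n", uncached_reads_ns(8));
    printf("MOVL  x8 : model %d ns, measured  70 ns\n", cached_sequence_ns(8));
}
```

The model gives 160 ns and 55 ns. The measured values are 10 ns and 15 ns higher; the GPIO toggle instructions are a plausible source of that offset, but that is an assumption and was not measured separately.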

**Questions:**
1. Is this hypothesis correct?
2. Why doesn't the PREAD instruction utilize the cache, even though both data cache and prefetch are enabled?
3. Is there any possibility that I'm misconfiguring something or using these instructions incorrectly?

Any insights would be greatly appreciated.

Thank you in advance for your help.

  • Hi,

    I apologize as the expert is currently out of office. Please expect a response sometime later next week. 

    Kind regards,
    AJ Favela 

  • Your hypothesis is correct, and the behavior follows from how the prefetch is implemented. Only instruction fetches use the prefetch mechanism, so although PREAD reads over the program bus, it bypasses the prefetch buffer because it is not an instruction fetch.

    As for the data cache, it is only valid for data bus accesses, and since the PREAD forces the read from the program bus, it won't be used here either.

    Is there a particular reason you need to use PREAD in your application? Since the memory map on the C28x device is unified, flash data can be read freely over either the program bus or the data bus. As you have noted, PREAD fetches only 16 bits at a time, so it is less flexible than data space accesses, which support both 16-bit and 32-bit reads.

    Best,

    Matthew

  • Dear Matthew,

    Thank you for your response.
    I now understand that when reading data from flash using PREAD, the program bus is used instead of the data bus, which is why the data cache doesn't operate.

    I was using the memcpy() function in my C source code to copy data from flash to RAM, but it took longer than I expected.
    When I investigated the cause, the assembly listing showed that PREAD was being used. That is why I compared it against MOVL, looking for a faster alternative, and posted my original question.

    Based on your explanation, I have follow-up questions:

    1. Does this mean that the data cache is not utilized for the following instructions that use the program bus?
    - PREAD loc16, *XAR7
    - PWRITE *XAR7, loc16
    - MAC P, loc16, *XAR7/++
    - DMAC ACC:P, loc32, *XAR7/++
    - IMACL P, loc32, *XAR7/++
    - QMACL P, loc32, *XAR7/++

    2. Are there any other cases where the data cache is not utilized?

    Thank you in advance for your help.

    Best regards,

  • Toru,

    Thanks for the background. I wasn't aware that memcpy() used PREAD, but I can see the reasoning: reading over the program bus while writing over the data bus avoids constant bus stalls when copying from one memory to another. As you said, the cost is some inefficiency because the cache is not used.
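
    One possible alternative to memcpy() for this case is a plain C loop over 32-bit words, sketched below. The name `copy_words32` is hypothetical and is not a TI library function. Whether the compiler emits data bus reads (such as MOVL) for this loop instead of PREAD is an assumption that has to be checked in the assembly listing for the compiler version and optimization level in use.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a TI API): copy `count` 32-bit words with
 * ordinary pointer dereferences, i.e. data space accesses in C.
 * Assumption: the compiler does not turn this loop back into a
 * memcpy() call or PREAD sequence; verify this in the listing.
 */
static void copy_words32(uint32_t *dst, const uint32_t *src, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        dst[i] = src[i];   /* one 32-bit read, one 32-bit write */
    }
}
```

    For the 128-bit block in the original test, this would be called as `copy_words32(dst, src, 4)`, with `src` pointing at the flash data and `dst` at the RAM buffer. Both pointers need to be aligned for 32-bit access.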

    The list you have made matches what I know to be affected: both the explicit PREAD/PWRITE instructions and the MAC instructions that use the program and data buses together to increase efficiency.

    Best,

    Matthew