Reopening of Unexpected Performance Monitoring Unit Data Cache Miss counter discussion.
Hi Franck,
I did a lot of tests today (using my own code rather than memset()) and saw similar results:
1. byte write, byte read: 1 MByte
2. 32-bit word write and read
// Write numBytes incrementing byte values starting at EMIF address addr.
// Per the ARM calling convention, addr arrives in r0 and numBytes in r1.
void str_bytes(uint32_t addr, uint32_t numBytes) {
    asm("\tadd r1, r0, r1");      // r1 = end address (addr + numBytes)
    asm("\tmov r2, #0");          // r2 = byte value to store
    asm("copy_loop:");
    asm("\tstrb r2, [r0], #1");   // store one byte, post-increment the address
    asm("\tadd r2, r2, #1");      // next byte value
    asm("\tcmp r0, r1");
    asm("\tblt copy_loop");       // loop until the end address is reached
}
// Read numBytes bytes starting at EMIF address addr (addr in r0, numBytes in r1).
void ldr_bytes(uint32_t addr, uint32_t numBytes) {
    asm("\tadd r1, r0, r1");      // r1 = end address (addr + numBytes)
    asm("copy_loop1:");
    asm("\tldrb r3, [r0], #1");   // load one byte, post-increment the address
    asm("\tcmp r0, r1");
    asm("\tblt copy_loop1");      // loop until the end address is reached
}
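For reference, the two routines are called as ordinary C functions. The EMIF base address and transfer size below are only placeholders for illustration, not the actual values used in the test:

// Hypothetical EMIF window base and transfer size -- placeholders only.
#define EMIF_TEST_BASE   0x60000000u
#define TEST_SIZE_BYTES  (1024u * 1024u)   // 1 MByte, as in test case 1

void run_byte_test(void) {
    str_bytes(EMIF_TEST_BASE, TEST_SIZE_BYTES);   // byte writes to EMIF
    ldr_bytes(EMIF_TEST_BASE, TEST_SIZE_BYTES);   // byte reads from EMIF
}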
The Data Cache Miss count for both read and write is much lower than the expected value. I don't know how the PMU counts this event.
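For context, here is a minimal sketch of how the data cache miss event can be counted through the standard ARMv7 CP15 PMU interface (using GCC-style inline asm, event 0x03 = L1 data cache refill, and event counter 0, in a privileged mode). This is only an assumption about the general setup, not necessarily the exact configuration used in this test:

#include <stdint.h>

// Configure event counter 0 to count L1 data cache refills (event 0x03)
// and start counting. Assumes privileged (non-user) execution.
static inline void pmu_start_dcache_miss_count(void) {
    uint32_t val;

    // PMCR: set E (enable all counters) and P (reset event counters)
    asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r"(val));
    val |= (1u << 0) | (1u << 1);
    asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r"(val));

    // PMSELR: select event counter 0
    val = 0;
    asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r"(val));

    // PMXEVTYPER: event 0x03 = L1 data cache refill (miss)
    val = 0x03;
    asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r"(val));

    // PMCNTENSET: enable event counter 0
    val = (1u << 0);
    asm volatile("mcr p15, 0, %0, c9, c12, 1" : : "r"(val));
}

// Read back the current value of event counter 0.
static inline uint32_t pmu_read_dcache_miss_count(void) {
    uint32_t val = 0;

    // PMSELR: select event counter 0, then read PMXEVCNTR
    asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r"(val));
    asm volatile("mrc p15, 0, %0, c9, c13, 2" : "=r"(val));
    return val;
}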
Hello Franck,
This is what we got from ARM regarding the cache miss counts:
The PMU events are counting a high number of write instructions, a similar (slightly smaller) number of cache line evictions, but only very few cache linefills.
I suspect the reason for these numbers is simply that the processor core only needs to generate this small number of linefills.
When the processor core is writing to a full cacheline, the first write instruction will trigger a single external linefill. The subsequent write instructions will fill up the store buffer, and it is quite possible that they will cover the full cache line space before the linefill access returns from external memory. This means that the linefill data is no longer required and can be discarded when it is returned from the memory system. Meanwhile, the instructions will start writing to the next cacheline location and will again trigger a linefill. If the external memory access time is sufficiently long, then eventually the Cortex-R5 core will have the maximum number of outstanding linefills possible and will not be able to issue any more until one of the outstanding linefills completes. When this happens, the write instructions can continue to fill up the store buffer before issuing a new linefill. If the write instructions can fill a full cacheline, the data can be merged into the cacheline location without ever triggering an external linefill, so the cache location is updated without the need for a linefill access.
So this is what I suspect is happening in this test: the Cortex-R5 does not need to issue a significant number of linefill accesses, because the store instructions are filling the cacheline locations without needing a linefill.
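As a rough illustration (assuming the 1 MByte linear byte-write test and the Cortex-R5's 32-byte cache line), a naive expectation would be on the order of 1 MByte / 32 bytes = 32768 write linefills, i.e. 32768 data cache miss events. With the store buffer merging full lines as described above, only a small fraction of those linefills actually has to be issued, which would explain the low Data Cache Miss counts observed.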
Hi QJ,
Thanks for the clarification, which makes complete sense, as my test is filling the cache lines in a completely linear way.
From a performance point of view, such an optimization indeed behaves like a cache hit, as the line is marked dirty without paying the penalty of the external memory access.
Thanks again for your support.
Best Regards,
Franck.