Hello TI,
We are working with PCIe on the AM243x, using the latest SDK (09.00.00).
Based on the PCIe MSI interrupt example:
- The EP initializes one large buffer (256 kB).
- The RC reads this buffer 16 B at a time and copies it into its own memory.
We observe inconsistent results when measuring the latency.
In the first case:
#define READ_BUF_SIZE 4
memcpy(rc_sram_buff, transBufAddr + sizeof(uint32_t) * k * READ_BUF_SIZE, READ_BUF_SIZE * sizeof(uint32_t));
The measured latency is 8 us.
In the second case:
uint32_t l = 4;
memcpy(rc_sram_buff, transBufAddr + sizeof(uint32_t) * k * READ_BUF_SIZE, l * sizeof(uint32_t));
The measured latency is 20 us.
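To make the comparison concrete, here is a minimal, self-contained sketch of the two variants. It is not our actual test code: a plain RAM array (`trans_buf`) stands in for the EP buffer seen through PCIE0_DAT0, and `CHUNK_COUNT`, `copy_chunk_const` and `copy_chunk_var` are hypothetical names introduced only for illustration. The only intended difference between the two functions is whether the size passed to memcpy is a compile-time constant or a run-time value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define READ_BUF_SIZE 4   /* 32-bit words per chunk, as in our test (16 B) */
#define CHUNK_COUNT   16  /* hypothetical: small stand-in for the 256 kB buffer */

/* Stand-in for the EP buffer mapped through PCIE0_DAT0 (plain RAM here). */
static uint8_t  trans_buf[CHUNK_COUNT * READ_BUF_SIZE * sizeof(uint32_t)];
/* Destination buffer in RC memory. */
static uint32_t rc_sram_buff[READ_BUF_SIZE];

/* Case 1: the copy size is a compile-time constant. */
static void copy_chunk_const(uint32_t k)
{
    assert(k < CHUNK_COUNT);
    memcpy(rc_sram_buff,
           trans_buf + sizeof(uint32_t) * k * READ_BUF_SIZE,
           READ_BUF_SIZE * sizeof(uint32_t));
}

/* Case 2: the copy size comes from a run-time variable. */
static void copy_chunk_var(uint32_t k, uint32_t l)
{
    assert(k < CHUNK_COUNT && l <= READ_BUF_SIZE);
    memcpy(rc_sram_buff,
           trans_buf + sizeof(uint32_t) * k * READ_BUF_SIZE,
           l * sizeof(uint32_t));
}
```

Both functions copy the same 16 bytes when `l` is 4, so any difference we see should come from the code generated for the copy, not from the data moved. Whether the compiler expands the constant-size call inline depends on the toolchain and optimization level, which we have not confirmed.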
We stepped through the memcpy assembly in the debugger and saw that the generated code differs between the two versions. We would like to understand:
- When a memcpy of 16 bytes is done from the PCIE0_DAT0 region, does the PCIe controller issue four TLP read operations of 4 B each, or a single TLP read operation of 16 B?