Hello TI,
We are working on the AM243x with the latest SDK (09.00.00), using PCIe.
Based on the PCIe MSI interrupt example, we observe inconsistent results when measuring the latency:
In the first case:
#define READ_BUF_SIZE 4
memcpy(rc_sram_buff, transBufAddr + sizeof(uint32_t) * k * READ_BUF_SIZE, READ_BUF_SIZE * sizeof(uint32_t));
the measured latency is 8 us.
In the second case:
uint32_t l = 4;
memcpy(rc_sram_buff, transBufAddr + sizeof(uint32_t) * k * READ_BUF_SIZE, l * sizeof(uint32_t));
the measured latency is 20 us.
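We stepped through the memcpy assembly in the debugger and saw some differences between the two versions of the code. We would like to understand:
When doing a 16-byte memcpy from the PCIE0_DAT0 region, does the PCIe controller perform four TLP read operations of 4 bytes each, or one TLP read operation of 16 bytes?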
Hello Tianyi,
We stepped through the memcpy assembly in the debugger and saw some differences between the two versions of the code.
If you have optimizations turned on, then that is expected: the compiler can optimize much better when the copy size is a hard-coded compile-time constant than when it comes from a local variable that can change at run time.
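As a rough illustration of that difference, here is a minimal sketch of the two cases reduced to standalone functions. The function names are hypothetical and not part of the SDK example, and the run-time size is passed as a parameter to stand in for the local variable `l`. What the compiler actually generates depends on the toolchain and optimization level, so the comments describe typical behavior only.

```c
#include <stdint.h>
#include <string.h>

#define READ_BUF_SIZE 4

/* Hypothetical helper: the size is a compile-time constant (16 bytes).
 * With optimization enabled, compilers commonly expand this memcpy inline
 * as a few fixed-width loads and stores instead of calling the library. */
void copy_const_len(uint32_t *dst, const uint32_t *src)
{
    memcpy(dst, src, READ_BUF_SIZE * sizeof(uint32_t));
}

/* Hypothetical helper: the size is only known at run time.
 * The compiler generally has to call the generic memcpy routine, which
 * chooses its own access widths and loop structure based on the length. */
void copy_runtime_len(uint32_t *dst, const uint32_t *src, uint32_t words)
{
    memcpy(dst, src, words * sizeof(uint32_t));
}
```

Both functions copy the same 16 bytes when `words` is 4; only the generated instructions differ. Comparing the assembly listing of the two functions at your build's optimization level should show whether this is what you are seeing in the debugger.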
When doing a 16-byte memcpy from the PCIE0_DAT0 region, does the PCIe controller perform four TLP read operations of 4 bytes each, or one TLP read operation of 16 bytes?
I don't have that level of detail about how the PCIe controller operates, and I don't really see how one versus the other would impact the latency when the only change presented is going from a hard-coded size to a local variable.
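If the goal is to take the memcpy implementation out of the measurement, one option is to read the buffer with explicit 32-bit accesses so that both cases issue the same sequence of CPU loads. The sketch below assumes a hypothetical helper, `read_words32`, that is not part of the SDK. Compilers generally emit one load per volatile access, but how the PCIe controller turns those loads into TLPs is not something this code can show.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy `words` 32-bit values from `src` to `dst`.
 * `src` is volatile-qualified so each word is read with its own
 * 32-bit load and the reads are not merged or dropped by the compiler. */
void read_words32(uint32_t *dst, const volatile uint32_t *src, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        dst[i] = src[i];
    }
}
```

Timing this helper with a constant count and with a variable count would show whether the 8 us versus 20 us gap follows the copy routine or the bus accesses.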
Best Regards,
Ralph Jacobi