AM2434: PCIe

Part Number: AM2434

Hello TI,

We are working on the AM243x with the latest SDK (09.00.00), using PCIe.

Our test is based on the PCIe MSI interrupt example:

  • The EP initializes one large buffer (256 kB).
  • The RC reads that buffer 16 bytes at a time and copies each chunk into its own memory.

We observe inconsistent results when measuring the latency.

In the first case:

#define READ_BUF_SIZE 4

memcpy(rc_sram_buff, transBufAddr + sizeof(uint32_t) * k * READ_BUF_SIZE, READ_BUF_SIZE * sizeof(uint32_t));

the measured latency is 8 us.

In the second case:

uint32_t l = 4;

memcpy(rc_sram_buff, transBufAddr + sizeof(uint32_t) * k * READ_BUF_SIZE, l * sizeof(uint32_t));

the measured latency is 20 us.

We stepped through the memcpy disassembly in the debugger and saw that the generated assembly differs between the two versions. We would like to understand:

  • When doing a 16-byte memcpy from the PCIE0_DAT0 region, does the PCIe controller issue four TLP reads of 4 bytes each, or one TLP read of 16 bytes?

    Hello Tianyi,

    We stepped through the memcpy disassembly in the debugger and saw that the generated assembly differs between the two versions.

    If you have optimizations turned on, then that is expected: the compiler can optimize much better when the copy size is a hard-coded compile-time constant (the #define) than when it comes from a local variable whose value can change at run time.
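
As a rough illustration of that point, here is a minimal sketch of the two situations. The function names copy_fixed and copy_runtime are hypothetical and not part of the SDK; what the compiler actually emits depends on the toolchain and optimization level, so the comments describe typical behavior only.

```c
#include <stdint.h>
#include <string.h>

#define READ_BUF_SIZE 4 /* 32-bit words per copy, as in the first case */

/* Size is a compile-time constant. With optimizations enabled, many
 * compilers expand this memcpy inline as a few load/store instructions
 * instead of calling the library routine. */
void copy_fixed(uint32_t *dst, const uint32_t *src)
{
    memcpy(dst, src, READ_BUF_SIZE * sizeof(uint32_t));
}

/* Size is only known at run time. The compiler typically emits a call
 * to the generic library memcpy, which has to handle any length and
 * alignment and therefore runs more instructions per copy. */
void copy_runtime(uint32_t *dst, const uint32_t *src, uint32_t words)
{
    memcpy(dst, src, words * sizeof(uint32_t));
}
```

Both functions copy the same 16 bytes when copy_runtime is called with words = 4; only the generated code differs. Comparing the disassembly of these two functions at the optimization level used in the project should show whether this accounts for the difference seen in the debugger.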

    When doing a 16-byte memcpy from the PCIE0_DAT0 region, does the PCIe controller issue four TLP reads of 4 bytes each, or one TLP read of 16 bytes?

    I don't have that level of detail about how the PCIe controller operates. I also don't see how one TLP pattern versus the other would explain the latency difference, given that the only change presented is going from a hard-coded size to a local variable.
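
One way to keep the CPU-side access width identical in both builds, so that the memcpy implementation drops out of the comparison, is to copy with explicit 32-bit reads. This is only a sketch: copy_words32 is a hypothetical helper, it assumes the source address is 4-byte aligned, and it says nothing about how the controller maps those reads onto TLPs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies n_words 32-bit words using one explicit
 * 32-bit load per word. The volatile qualifier keeps the compiler from
 * dropping the reads; in practice compilers also do not merge or split
 * aligned volatile accesses. src is assumed to be 4-byte aligned. */
void copy_words32(uint32_t *dst, const volatile uint32_t *src, size_t n_words)
{
    for (size_t i = 0; i < n_words; i++) {
        dst[i] = src[i];
    }
}
```

Timing this loop with a constant count and with a run-time count would show whether the 8 us versus 20 us gap follows the code path or the bus accesses.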

    Best Regards,

    Ralph Jacobi