AM2434: PCIe

Tianyi Liu

Part Number: AM2434

Hello TI,

We are working with PCIe on the AM243x using the latest SDK (09.00.00).

Based on the PCIe MSI interrupt example:

The EP initializes one large buffer (256 kB).
The RC reads that buffer 16 B at a time and copies each chunk into its own memory.

We observe inconsistent latency measurements:

In the first case:

#define READ_BUF_SIZE 4

memcpy(rc_sram_buff, transBufAddr + sizeof(uint32_t) * k * READ_BUF_SIZE, READ_BUF_SIZE * sizeof(uint32_t));

the measured latency is 8 µs.

In the second case:

uint32_t l = 4;

memcpy(rc_sram_buff, transBufAddr + sizeof(uint32_t) * k * READ_BUF_SIZE, l * sizeof(uint32_t));

the measured latency is 20 µs.

We stepped through memcpy in the debugger and saw that the generated assembly differs between the two versions. We would like to understand:

When doing a 16-byte memcpy from the PCIE0_DAT0 region, does the PCIe controller issue four 4-byte TLP reads or a single 16-byte TLP read?

over 2 years ago

Ralph Jacobi over 2 years ago

Hello Tianyi,

Tianyi Liu said:
We checkout the ASM memcpy during debug step by step and we saw there are some difference in ASM between both of the codes.

If you have optimizations turned on, that is expected. READ_BUF_SIZE is a hard-coded constant, so the compiler knows the copy size at compile time and can optimize the memcpy for it, whereas a local variable is a value that can change at run time, so the compiler may not be able to optimize it the same way.

Tianyi Liu said:
When doing a memcpy from PCIE0_DAT0 region of 16Bytes, does the PCIE controller do 4 TLP read operations of 4B or does it do one TLP read operation of 16B ?

I don't have that level of detail about how the PCIe controller operates. In any case, I don't see how one versus the other would explain the latency difference when the only change presented is going from a hard-coded constant to a local variable.

Best Regards,

Ralph Jacobi