TDA4VM: PCIe performance TDA4VM to TDA4VM

Part Number: TDA4VM

Hello experts,

We have benchmarked PCIe communication between two TDA4VM evaluation boards.

We are running Gen3 with a x2 lane width. We use the ARM cores to do the PCIe bring-up and then the C7x cores to run the benchmark.
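
For reference when reading the results, the theoretical ceiling of the link can be estimated from the generation and lane count. The sketch below is only an illustration: the function names are made up for this example, and the figure it produces is the raw link rate after line encoding, before any TLP/DLLP protocol overhead, so real throughput will be lower.

```c
#include <assert.h>
#include <stdio.h>

/* Raw line rate per lane in GT/s for PCIe Gen1..Gen5.
 * Returns 0.0 for a generation this sketch does not cover. */
static double pcie_lane_rate_gtps(int gen)
{
    switch (gen) {
    case 1: return 2.5;
    case 2: return 5.0;
    case 3: return 8.0;
    case 4: return 16.0;
    case 5: return 32.0;
    default: return 0.0;
    }
}

/* Line-encoding efficiency: 8b/10b for Gen1/Gen2, 128b/130b from Gen3 on. */
static double pcie_encoding_efficiency(int gen)
{
    return (gen <= 2) ? (8.0 / 10.0) : (128.0 / 130.0);
}

/* Hypothetical helper: link bandwidth in MB/s (1 MB = 10^6 bytes),
 * per direction, before packet-level protocol overhead. */
double pcie_link_mbytes_per_sec(int gen, int lanes)
{
    double bits_per_sec = pcie_lane_rate_gtps(gen) * 1e9
                        * pcie_encoding_efficiency(gen) * (double)lanes;
    return bits_per_sec / 8.0 / 1e6;
}

/* Convenience printer for a given configuration. */
void pcie_print_link_bandwidth(int gen, int lanes)
{
    printf("Gen%d x%d: about %.0f MB/s per direction\n",
           gen, lanes, pcie_link_mbytes_per_sec(gen, lanes));
}
```

For the Gen3 x2 setup described here, `pcie_link_mbytes_per_sec(3, 2)` gives roughly 1969 MB/s per direction.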

The test consists of a simple 'for' loop in which we write data to the endpoint.
We get this result, but we have trouble interpreting it.

--> Do you have any idea why the peak is at 128 bytes?
--> More generally, is the graph shape what TI experts would expect?
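
For readers who want to reproduce this kind of measurement, here is a minimal sketch of a timed write loop. It is not the code used for the benchmark in this post: the function names, the 64-bit store width, and the use of `clock()` are assumptions made for illustration. On the target, `dst` would point into the PCIe outbound window set up by the A72 and a hardware cycle counter would replace `clock()`; here an ordinary buffer stands in so the code runs anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Write `len` bytes to `dst` with 64-bit stores, repeated `iters` times.
 * `len` must be a multiple of 8. Returns the elapsed CPU time in seconds.
 * `volatile` keeps the compiler from optimizing the stores away. */
double timed_write_loop(volatile uint64_t *dst, size_t len,
                        unsigned iters, uint64_t pattern)
{
    assert(len % sizeof(uint64_t) == 0);
    size_t words = len / sizeof(uint64_t);

    clock_t start = clock();
    for (unsigned it = 0; it < iters; ++it) {
        for (size_t i = 0; i < words; ++i) {
            dst[i] = pattern + i;   /* one store per 8 bytes */
        }
    }
    clock_t end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Convert a measurement into MB/s (1 MB = 10^6 bytes).
 * Returns 0.0 if the elapsed time is too small to be meaningful. */
double throughput_mbytes_per_sec(size_t len, unsigned iters, double seconds)
{
    if (seconds <= 0.0) {
        return 0.0;
    }
    return (double)len * (double)iters / seconds / 1e6;
}
```

Sweeping `len` over the transfer sizes of interest and plotting `throughput_mbytes_per_sec` for each one would give a curve of the kind discussed here.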

Best,
Clement

  • Clement, 

    Sorry for the delayed response. You will not achieve full PCIe bandwidth with CPU accesses, which are limited by the C7x instruction width and cache fetch capability. In your post, you mentioned

        >>"the test is composed of a simple 'for' loop where we write data to the endpoint",

    So I assume the x-axis in your plot is the length of the inner loop? If so, this means you achieve the best cache alignment when it is fetching from the PCIe interface. Note that the cache line size of the C7x is 128 bytes.
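
    As a rough illustration of the alignment point, and assuming the 128-byte line size mentioned above, the sketch below counts how many cache lines a transfer touches. The helper names are made up for this example, and whether the PCIe window is cached at all depends on the MMU attributes, so treat this only as a way to reason about transfer sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cache line size in bytes, as stated in this reply (assumed here). */
#define CACHE_LINE_BYTES 128u

/* Number of cache lines covered by the byte range [addr, addr + len). */
size_t cache_lines_spanned(uintptr_t addr, size_t len)
{
    if (len == 0) {
        return 0;
    }
    uintptr_t first = addr / CACHE_LINE_BYTES;
    uintptr_t last  = (addr + len - 1) / CACHE_LINE_BYTES;
    return (size_t)(last - first + 1);
}

/* Non-zero when both the start address and the length are whole lines. */
int is_line_aligned(uintptr_t addr, size_t len)
{
    return (addr % CACHE_LINE_BYTES == 0) && (len % CACHE_LINE_BYTES == 0);
}
```

    With these helpers, a 128-byte transfer starting on a line boundary covers exactly one line, while the same transfer starting 64 bytes later straddles two.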

    To achieve the maximum bandwidth, you will need to use the DMA driver. There are some numbers published in the Linux SDK:

       https://software-dl.ti.com/jacinto7/esd/processor-sdk-linux-jacinto7/08_04_00_11/exports/docs/devices/J7/linux/Release_Specific_Performance_Guide.html#pcie-driver

    Please take a look.

    Jian

  • Thank you Jian

    The performance data are for an SSD with large data sizes, which is not our use case, but thank you.

    Yes, we plan to use DMA, but it is more complex to use. We have also done some tests from the A72 cores with and without DMA.

    I see the link with the cache line size now. But the TRM is unclear about the default cache configuration.
    --> Are the L1D and L2 caches write-through or write-back by default?

    If they are configured as write-back, how do I make sure data is actually written to the end device (i.e. that the data doesn't just stay in the cache)?

    Thank you 
    Clement

  • Clement, 

    Could you let me know how you created your C7x project? You will need to check your MMU translation table and attributes for the PCIe space. It likely defaults to non-cacheable memory. I checked the PDK, but did not see any C7x startup code.

    Jian

  • It's a very basic C7x project, bare-metal, with no MMU initialization code. Basically it's a "hello world" with PCIe accesses in a 'for' loop.

    The code is run after the A72 ARM cores boot up the evaluation board and initialise the PCIe interfaces.

    Clement

  •   Do you mean that we should be using this kind of function call in the init part of the code?

    (void)OsalMmuMap(0x40000000U, 0x40000000U, 0x20000000U, &attrs, isSecure);

    I found this in another example. But where can I find resources on how to set up the MMU from the C7x cores?

    Best,
    Clement