J784S4XEVM: PCIe EP/RC transfer speed performance slow in SDK 11

Part Number: J784S4XEVM

Hello,

I tried to run the PCIe EP/RC benchmark between two J784S4 evaluation boards, as described in the SDK 11 Linux documentation: 3.2.2.11. PCIe End Point — Processor SDK Linux for J784s4 Documentation

Here are my results:

PCIe link: Gen3 x4:

LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L1 <8us

LnkSta: Speed 8GT/s, Width x4 (downgraded)
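
The link status above was read on the RC side with lspci; the bus address here is only a placeholder for the root port:

# Check the negotiated PCIe link speed and width (replace 0000:00:00.0 with your root port's address)
lspci -vv -s 0000:00:00.0 | grep -iE 'LnkCap|LnkSta'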

Read test: 71 MB/s

Write test: 175 MB/s

Copy test: 69 MB/s

[  187.177484] pci_epf_test pci_epf_test.0: WRITE => Size: 102400 B, DMA: YES, Time: 0.000585615 s, Rate: 174858 KB/s

[  189.925869] pci_epf_test pci_epf_test.0: READ => Size: 102400 B, DMA: YES, Time: 0.001444180 s, Rate: 70905 KB/s

[  192.293888] pci_epf_test pci_epf_test.0: COPY => Size: 102400 B, DMA: YES, Time: 0.001486355 s, Rate: 68893 KB/s
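
For reference, I ran the tests with the kernel's pcitest utility, following the SDK procedure; the invocations were along these lines (quoting from memory, the exact options are in the guide):

# 100 KB write/read/copy tests through pci_epf_test, with DMA enabled
pcitest -d -w -s 102400
pcitest -d -r -s 102400
pcitest -d -c -s 102400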

A similar issue is also mentioned in this forum thread: TDA4VH-Q1: PCIe EP/RC transfer speed performance slow - Processors forum - Processors - TI E2E support forums

These results fall far short of expectations: a Gen3 x4 link has a theoretical bandwidth of roughly 3.9 GB/s, yet I am measuring tens of MB/s. Has a solution been proposed?

Regards,

Bruce

  • Hi Bruce, 

    The PCIe EP/RC test is not meant for performance testing; it is an example for verifying PCIe functionality. Note also that each transfer in your log is only 100 KB, so setup overhead dominates and the reported rates understate what the link can sustain. For performance testing, we would recommend fio.

    Regardless, the speeds do seem a bit slow even for the PCIe EP/RC test. What does the setup look like in terms of hardware? Are the two J784S4 EVMs connected through the J17 or the J14 connector?

    Regards,

    Takuma 

  • Hi Takuma,

    My configuration uses two J784S4 EVMs, both connected through the J14 connector.

    Do you have a reference fio procedure that I could run on my configuration?

    Regards,

    Bruce

  • Hi Bruce, 

    We test performance against an SSD using fio. The fio configuration is in the Performance Guide section of the SDK docs: https://software-dl.ti.com/jacinto7/esd/processor-sdk-linux-j784s4/11_00_00_08/exports/docs/devices/J7_Family/linux/Release_Specific_Performance_Guide.html#pcie-driver

    In terms of performance numbers, below is what I saw on J784S4 with two SSDs connected, one on the J14 x4-lane slot and one on the J17 x2-lane slot. The commands were run simultaneously to show parallel transactions.
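
    The exact command is given in the Performance Guide linked above; reconstructed from the output below, it was roughly the following (the job name, file size, and 60 s runtime are inferred from the output, not copied from the guide):

    # 4 MiB sequential reads via libaio at queue depth 4, time-based for ~60 s
    fio --name=/run/media/nvme0n1/test-pcie-1 --rw=read --bs=4M \
        --ioengine=libaio --iodepth=4 --direct=1 \
        --size=10G --time_based --runtime=60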

    /run/media/nvme0n1/test-pcie-1: (g=0): rw=read, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=4
    fio-3.17-dirty
    Starting 1 process
    
    /run/media/nvme0n1/test-pcie-1: (groupid=0, jobs=1): err= 0: pid=1383: Mon Mar 13 16:09:44 2023
      read: IOPS=605, BW=2421MiB/s (2538MB/s)(142GiB/60006msec)
        slat (usec): min=40, max=1994, avg=58.59, stdev=28.59
        clat (usec): min=6122, max=10257, avg=6548.78, stdev=120.19
         lat (usec): min=6181, max=12247, avg=6607.70, stdev=124.32
        clat percentiles (usec):
         |  1.00th=[ 6259],  5.00th=[ 6390], 10.00th=[ 6390], 20.00th=[ 6456],
         | 30.00th=[ 6456], 40.00th=[ 6521], 50.00th=[ 6521], 60.00th=[ 6587],
         | 70.00th=[ 6587], 80.00th=[ 6652], 90.00th=[ 6718], 95.00th=[ 6718],
         | 99.00th=[ 6783], 99.50th=[ 6849], 99.90th=[ 6915], 99.95th=[ 6980],
         | 99.99th=[ 9634]
       bw (  MiB/s): min= 2344, max= 2432, per=99.99%, avg=2420.41, stdev=10.49, samples=120
       iops        : min=  586, max=  608, avg=605.08, stdev= 2.62, samples=120
      lat (msec)   : 10=99.99%, 20=0.01%
      cpu          : usr=0.22%, sys=3.61%, ctx=36317, majf=0, minf=542
      IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued rwts: total=36312,0,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=4
    
    Run status group 0 (all jobs):
       READ: bw=2421MiB/s (2538MB/s), 2421MiB/s-2421MiB/s (2538MB/s-2538MB/s), io=142GiB (152GB), run=60006-60006msec
    
    Disk stats (read/write):
      nvme0n1: ios=144984/0, merge=0/0, ticks=873731/0, in_queue=873731, util=99.87%
    

    /run/media/nvme1n1/test-pcie-1: (g=0): rw=read, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=4
    fio-3.17-dirty
    Starting 1 process
    
    /run/media/nvme1n1/test-pcie-1: (groupid=0, jobs=1): err= 0: pid=1384: Mon Mar 13 16:09:44 2023
      read: IOPS=382, BW=1528MiB/s (1603MB/s)(89.6GiB/60010msec)
        slat (usec): min=67, max=2207, avg=86.61, stdev=33.12
        clat (usec): min=7181, max=44285, avg=10379.10, stdev=1179.60
         lat (usec): min=7267, max=46494, avg=10466.05, stdev=1186.68
        clat percentiles (usec):
         |  1.00th=[ 9896],  5.00th=[10028], 10.00th=[10028], 20.00th=[10028],
         | 30.00th=[10028], 40.00th=[10028], 50.00th=[10159], 60.00th=[10159],
         | 70.00th=[10159], 80.00th=[10159], 90.00th=[10159], 95.00th=[13435],
         | 99.00th=[15664], 99.50th=[16057], 99.90th=[16581], 99.95th=[17171],
         | 99.99th=[40109]
       bw (  MiB/s): min= 1168, max= 1576, per=99.99%, avg=1528.25, stdev=95.43, samples=120
       iops        : min=  292, max=  394, avg=382.06, stdev=23.86, samples=120
      lat (msec)   : 10=3.48%, 20=96.50%, 50=0.02%
      cpu          : usr=0.17%, sys=3.28%, ctx=22933, majf=0, minf=543
      IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued rwts: total=22929,0,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency   : target=0, window=0, percentile=100.00%, depth=4
    
    Run status group 0 (all jobs):
       READ: bw=1528MiB/s (1603MB/s), 1528MiB/s-1528MiB/s (1603MB/s-1603MB/s), io=89.6GiB (96.2GB), run=60010-60010msec
    
    Disk stats (read/write):
      nvme1n1: ios=366040/0, merge=0/0, ticks=3352351/0, in_queue=3352351, util=99.87%
    

    Regards,

    Takuma

  • Hi Takuma,

    Thank you for your results. 

    What interests me more is the performance when transferring from one J784S4 to another over a Gen3 x4 link. I get really poor performance with both SDK 11 and SDK 9. Do you get the same performance as I do with that configuration?

    Regards,

    Bruce

  • Hi Bruce,

    I will need about two days to obtain an extra J784S4. I will then test on the EVM boards to see whether I can replicate the numbers you observed.

    Regards,

    Takuma