
Linux/AM4376: ICSS-EMAC performance

Part Number: AM4376

Tool/software: Linux

Hi,

Background

  • Processor SDK Linux 5.0
  • 2xAM4376 connected over PRU ICSS-EMAC interfaces with external PHYs in Dual EMAC mode
  • Variable CPU loading with application code

The current developer's guide has an FAQ discussing driver throughput, which mentions that ~94 Mbps has been seen for both UDP and TCP connections. A previous debug guide for the ICSS-EMAC LLD mentions that CPU loading would be the only cause of decreased throughput on the interface, as the driver is tested during release to cope with maximum line rate.

Problem

While using the PRU-ICSS EMAC driver for sending and receiving data between AM4376 devices, throughput drops when the A9 is under increased load from other application code. The LLD debug guide also mentions two potential workarounds: interrupt pacing and storm prevention. Each of these currently has a declared limitation of not being supported in Dual EMAC mode, per the latest developer's guide.

When the A9 is unloaded, the ~94 Mbps rate can be seen, but as the CPU is loaded with more and more processing, the rate drops significantly, down to ~25 Mbps with both TCP and UDP bidirectional connections. RX overflow errors appear once the rate begins to drop, indicating that the RX queues are not being serviced often enough.

Questions

  • The FAQ addressing throughput sets up a unidirectional test. Is the ICSS EMAC tested for a bidirectional connection at this same ~94 Mbps rate?
  • Has any test data been taken with varying CPU loads for the ICSS EMAC performance?
  • Is interrupt pacing planned to be supported in future SDK releases for the ICSS Dual EMAC mode?
  • Is it possible to increase the RX queue size in prueth.c to give the CPU more time to service the RX queue (as mentioned in the LLD debug guide)?

Thank you for any help or insight you may provide!

Best Regards,

Mark-

  • Hi,

    I am still checking into the questions you have listed. The latest guide shows that RX interrupt pacing is in the most recent SDK, but based on the documentation the feature appears to be available only in HSR/PRP mode. I may not be following what you are describing, though.

    downloads.ti.com/.../Industrial_Protocols_HSR_PRP.html

    Can you provide background on the data throughput requirements of the use case? For example, is this application intended as an endpoint or a gateway? Is the application dependent on network data throughput?

    To answer one of your questions concerning testing: TI has not tested ICSS-EMAC performance with varying CPU loads, since this is an application-specific question.

    Based on what I have discussed so far with other team members, changing the queue sizes does not appear to be supported.

    Best Regards,
    Schuyler
  • Hi,

    Our application use case is more of a gateway. 

    In our custom design we have implemented two independent Ethernet ports. One port uses the embedded 3-port switch (RMII) on the AM4376, and the other port goes through the PRU (MII). Data is intended to come in through one of the Ethernet ports and, after processing (packet inspection), be passed out through the other port, and vice versa.

    Since we are running both ports at 100 Mbps, we expected both ports to support the maximum speed simultaneously without any packet loss; however, we are experiencing packet loss with the following test scenario.

    Test setup:

    Ethernet port 1 (RMII): iperf UDP @ 30 Mbps, bidirectional, to another AM4376

    Ethernet port 2 (PRU): iperf UDP @ 30 Mbps, bidirectional, to another AM4376

    HW configuration: CPU core speed 800 MHz, single 16-bit DDR3L @ 400 MHz
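
    For reference, the traffic on each port was generated with iperf2 roughly as follows (the IP addresses are placeholders for the remote units):

        # On each remote AM4376: run a UDP server
        iperf -s -u

        # On the device under test, one client per port; -d runs the
        # bidirectional (dual) test, -b caps the UDP offered load
        iperf -c 192.168.1.10 -u -b 30M -d -t 60   # port 1 (RMII)
        iperf -c 192.168.2.10 -u -b 30M -d -t 60   # port 2 (PRU)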

    Is there any recommendation to get higher UDP throughput on both ports with no packet loss?

    Regards,

    Khalid

  • Hi,
    The PRU Ethernet driver is currently designed to handle packets in a low-latency manner at low throughput. This PRU port is geared more towards low-bit-rate industrial protocols. The CPSW driver is better suited for throughput.

    Since only two Ethernet ports are listed, is there a reason why dual MAC mode on the CPSW was not used in this design?

    There might be some UDP streaming improvements, though the ones I recommend typically apply to the CPSW.

    The first post described the packet loss; is that at the driver level or the network stack level? At the stack level, there is buffer sizing that can be done.
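
    For example, one common stack-level starting point is raising the kernel's socket buffer limits via sysctl; the values below are illustrative only, not tuned for this platform:

        # Raise max/default socket receive and send buffer sizes
        sysctl -w net.core.rmem_max=1048576
        sysctl -w net.core.rmem_default=1048576
        sysctl -w net.core.wmem_max=1048576
        sysctl -w net.core.wmem_default=1048576

    An application can also request larger buffers per socket with setsockopt() using SO_RCVBUF/SO_SNDBUF, subject to those limits.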

    Could you please attach the results of ethtool -S for each port?
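
    For example (substitute the actual interface names on your board):

        ethtool -S eth0
        ethtool -S eth1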

    The application you mentioned implies that packets are coming up to user space into a daemon; is this correct?

    Are you using systemd for user space? Is this a TI file system that you are working with?

    Best Regards,
    Schuyler