AM5748: GMAC Rx pacing algorithm

Part Number: AM5748

Hello All,

I did some measurements with the RX pacing on the GMAC eth0 interface.

As input (rx data) we have two 2.4kHz streams and one 1.2kHz stream.

That means we receive between 5 and 7 packets within a 1 ms time frame.

Rx pacing is set to 500us:

root@cpm:~# ethtool -c eth0

Coalesce parameters for eth0:

Adaptive RX: off  TX: off

stats-block-usecs: 0

sample-interval: 0

pkt-rate-low: 0

pkt-rate-high: 0

 

rx-usecs: 500

rx-frames: 0

rx-usecs-irq: 0

rx-frames-irq: 0

 

tx-usecs: 0

tx-frames: 0

tx-usecs-irq: 0

tx-frames-irq: 0

 

rx-usecs-low: 0

rx-frame-low: 0

tx-usecs-low: 0

tx-frame-low: 0

 

rx-usecs-high: 0

rx-frame-high: 0

tx-usecs-high: 0

tx-frame-high: 0
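
For reference, the value was set through the standard ethtool coalescing interface, e.g.:

root@cpm:~# ethtool -C eth0 rx-usecs 500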

  

I did a measurement with LTTng. On the Trace Compass Resource diagram screenshot I added the times between the eth0 interrupts (IRQ/86, numbers are in us).

Could you explain why we see these three different gaps between the IRQ/86 interrupts (200 us, 400 us and 1000 us)?

It seems that the Rx pacing is adjusted dynamically at run time, even though I have set the pacing to 500 us.

Could you explain the algorithm that is used in more detail?

Best regards,

Andreas

  • Hello Andreas,

    I'm sorry, our expert in this area is out this week and will not be able to reply until early next week. I apologize for the delay.

  • Hi, early next week is over and we have not yet received a response or explanation on the topic we reported.

    Stephan

  • Hi,

    We apologize for the delay; this question requires us to involve other teams. Which TI SDK version are you using? Also, are you using the RT or non-RT version of the kernel?

    Best Regards,

    Schuyler 

  • Hello,

    The Linux image we are using is the TI SDK 5.3.

    We are using the RT Patch.

    Best Regards

    Stephan

  • Hi,

    The reason for the different gaps you are seeing is that the CPSW driver uses NAPI scheduling (polling) to schedule RX processing. There are two different methods to schedule RX packet processing: interrupt-based and NAPI scheduling.

    When the NAPI scheduler is used, the interrupts do not directly trigger RX processing. The NAPI scheduler does some interrupt pacing on its own, as it is designed to increase throughput and reduce overhead at the cost of packet processing latency. Using interrupt pacing together with NAPI is therefore somewhat of a conflict, as both approaches try to achieve the same result of reducing interrupt processing overhead.

    Whether NAPI scheduling or interrupt processing is used depends on what the application is trying to do. If the application requires high network throughput, then NAPI scheduling should be used; this is what TI uses as a default and tests. If the application requires more deterministic processing and the network bandwidth requirement is low, then interrupt processing could be used. Interrupt pacing would work better in interrupt processing mode. TI does not currently test this.
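
    To illustrate the difference, here is a minimal sketch of the generic NAPI pattern used by Linux network drivers (illustrative only, not the actual CPSW driver source; the struct and helper names are placeholders). The hard IRQ handler only masks further RX interrupts and schedules the poll function; the actual packet processing then runs later in softirq context, which is why the RX processing latency is not tied directly to the interrupt:

    #include <linux/interrupt.h>
    #include <linux/netdevice.h>

    struct my_priv {                                        /* placeholder private data */
            struct napi_struct napi;
            /* device registers, descriptor rings, ... */
    };

    static void mask_rx_interrupts(struct my_priv *priv);             /* placeholder */
    static void unmask_rx_interrupts(struct my_priv *priv);           /* placeholder */
    static int process_rx_packets(struct my_priv *priv, int budget);  /* placeholder */

    /* Hard IRQ: do almost nothing, just hand off to NAPI. */
    static irqreturn_t rx_irq_handler(int irq, void *dev_id)
    {
            struct my_priv *priv = dev_id;

            mask_rx_interrupts(priv);
            napi_schedule(&priv->napi);
            return IRQ_HANDLED;
    }

    /* NAPI poll: runs in softirq context, limited by the budget. */
    static int rx_poll(struct napi_struct *napi, int budget)
    {
            struct my_priv *priv = container_of(napi, struct my_priv, napi);
            int work_done = process_rx_packets(priv, budget);

            /* Re-enable RX interrupts only once the queue is drained. */
            if (work_done < budget && napi_complete_done(napi, work_done))
                    unmask_rx_interrupts(priv);

            return work_done;
    }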

    Since you were looking at the processing time, could you describe your application or perhaps its network processing requirements? It looks like you are trying to get some determinism in packet processing. Are you leveraging anything in RT to assist with scheduling?

    Best Regards,

    Schuyler

  • Hi,

    We want to reduce the GMAC interrupt load caused by the packets received.

    On the other hand, we have to take care not to introduce too high a latency for the received packets.

     

    What is important is that the maximum latency introduced by this mechanism must be predictable.

    That’s why we have chosen coalescing (set to 500 us).

    With this we expected a predictable maximum latency of 500 us for the received packets.

    After looking at it, we have the following questions:

    • Can we disable NAPI in the GMAC driver via a configuration option?
    • Can we change from NAPI scheduling to interrupt processing mode?
    • What would be your recommended setting for our use case?

    BR

    Stephan Gerspach

  • Hi,

    I am out of the office until the 4th. 

    I will look into the questions you are asking in more detail when I return. It is possible to switch to interrupt processing instead of NAPI polling. However, TI currently only tests NAPI on the CPSW ports, so at the moment we do not have any recommendations we can share.

    Do you have a network bandwidth target you need to achieve? Will you be using the PREEMPT RT kernel?

    Best Regards,

    Schuyler

  • Hi Schuyler,

    We have the GMAC connected to an Ethernet network, where we receive sampled data.

    We have several data providers on the Ethernet (GMAC), each of them sending frames at a rate of 2.4 kHz (one frame every 208 us).

    The data providers are in sync, so we get the frames more or less at the same point in time.

    With the coalescing we wanted to ensure that not every individual packet creates an interrupt; we wanted to batch them to reduce the interrupt load on the CPU caused by the received frames.

    The data we receive on the GMAC is processed, repackaged and sent out again on the PRUs. For the whole chain we have a limited time budget of < 2 ms. Therefore we have to take care of the latency introduced by the coalescing and balance the reduced interrupt load against it.

    What is important is that the maximum latency introduced by this mechanism must be predictable.

    That’s why we have chosen coalescing (set to 500 us).

    Here we would expect each received frame to be delayed by at most 500 us, but we see some outliers of around 1 ms.

    Those outliers we must avoid.

    Yes we are using the PREEMPT RT kernel.

  • Hi,

    Thank you for the description of the network traffic.

    I was incorrect in the earlier post: the Linux CPSW driver cannot be switched from NAPI to RX-interrupt-only operation through a kernel configuration option. There are some tuning options that may be usable to move towards your goal, but that will have to be determined.

    There are two approaches TI can suggest. The first is attempting to remove the impact of the NAPI scheduler:

    Experiment with these sysctl options for setting NAPI budgets:

    net.core.dev_weight = 64 - max number of packets that kernel can handle on a NAPI interrupt

    net.core.dev_weight_rx_bias = 1 - influences the proportion of the netdev_budget spent on RPS based packet processing during RX softirq cycles

    net.core.dev_weight_tx_bias = 1 - scales the maximum number of packets processed during a TX softirq cycle

    net.core.netdev_budget = 300 - Max number of packets taken from all interfaces in one polling cycle (NAPI poll).

    net.core.netdev_budget_usecs = 2000 - Maximum number of microseconds in one NAPI polling cycle.

    The values above are the defaults. After looking at these parameters, netdev_budget might be increased based on the packet processing you described, since the NAPI period may be ending before all the packets are processed. These are probably not the only values that need to be adjusted to meet your requirements.

    Also, for now, do not enable RX interrupt pacing while experimenting.
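
    As an illustration only (the values shown here are arbitrary starting points, not a TI recommendation), these parameters can be inspected and changed at run time with sysctl:

    root@cpm:~# sysctl net.core.netdev_budget
    net.core.netdev_budget = 300
    root@cpm:~# sysctl -w net.core.netdev_budget=600
    root@cpm:~# sysctl -w net.core.netdev_budget_usecs=4000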

    The other approach is to enable the sysctl parameters busy_poll and busy_read. These are used to lower the latency of waiting for events or packets on an interface. The units are microseconds and the recommendation is to start with 50; I have not seen recommendations for setting the value to 500 us, which is your current goal. The kernel documentation mentions that the sockets in use should set the SO_BUSY_POLL option.
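
    As a minimal sketch of this second approach, using the suggested 50 us starting point:

    root@cpm:~# sysctl -w net.core.busy_poll=50
    root@cpm:~# sysctl -w net.core.busy_read=50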

    You may need to experiment with both approaches. Were there packet outliers before trying interrupt pacing? And were there processing load concerns before pacing was tried?

    Best Regards,

    Schuyler

  • The ISR Pacing block is an adaptive interrupt pacing block. That is, it adapts the interrupt rate to the desired rate by adjusting delays based on the current interrupt load. Based on thresholds it may saturate, double, increment, decrement or halve the current interrupt delay steps so as to adapt the interrupt delay to meet the desired rate. The thresholds are generated from the programmed desired rate. The delay steps are the 4 us steps defined by the prescaler. This method allows the selected ISR to correct the interrupt load very quickly and target the desired ISR rates while keeping the latency to the first events extremely low.

    The adaptation updates the rate every 250 4 us ticks based on the current rate. In the event that the current interrupt rate is greater than two times the desired rate prior to the next update, the full saturation delay will be added and the 250 4 us ticks will be restarted. So if you programmed the rate to be 5K but an attempt of 20K occurred, you would see 10 interrupts followed by a 1024 us delay due to the saturation case, and the rate then quickly converges to 5 interrupts per millisecond.
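
    To make the rule above easier to follow, here is a rough sketch in C of the adaptation step (illustrative pseudocode only, not TI hardware or driver source; the names, the 256-step saturation value and the exact threshold handling are assumptions, and in the real block the saturation check can also fire before the update window ends):

    #define TICK_US          4     /* prescaler step: 4 us                          */
    #define UPDATE_TICKS     250   /* rate is re-evaluated every 250 ticks = 1 ms   */
    #define MAX_DELAY_STEPS  256   /* assumed saturation delay: 256 * 4 us = 1024 us */

    static unsigned int delay_steps;  /* current added interrupt delay, in 4 us steps */

    /* Called once per update window with the interrupt count observed in it. */
    static void pacing_update(unsigned int irqs_seen, unsigned int irqs_desired)
    {
            if (irqs_seen > 2 * irqs_desired) {
                    /* Saturation case: apply the full delay and restart the window. */
                    delay_steps = MAX_DELAY_STEPS;
            } else if (irqs_seen > irqs_desired) {
                    /* Too fast: lengthen the delay (double or increment, per thresholds). */
                    delay_steps = delay_steps ? 2 * delay_steps : 1;
                    if (delay_steps > MAX_DELAY_STEPS)
                            delay_steps = MAX_DELAY_STEPS;
            } else if (irqs_seen < irqs_desired && delay_steps > 0) {
                    /* Too slow: shorten the delay (halve or decrement, per thresholds). */
                    delay_steps /= 2;
            }
            /* Otherwise the rate is on target and the current delay is kept. */
    }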

     

  • PS: The lower the programmed rate, the coarser the correction. The higher the programmed rate, the finer the correction.

  • Hi,

    I am closing the thread for the moment. If you have additional questions, you can open the thread again with a reply.

    Best Regards,

    Schuyler