AM2431: Ethernet performance

Larry Chen

Part Number: AM2431
Other Parts Discussed in Thread: SYSCONFIG

Hi all TI experts,

https://software-dl.ti.com/mcu-plus-sdk/esd/AM243X/latest/exports/docs/api_guide_am243x/enetlld_performance.html.

According to the link above, when testing with iperf using the AM243-LP, with a data length of 1470 bytes, the best case RX speed can reach up to 110 Mbps. Does this mean that although the AM243 supports gigabit Ethernet speeds, the MCU itself cannot process gigabit data within one second?

Best regards,

Larry

over 1 year ago

0 Ashwani Goel over 1 year ago

TI__Mastermind 27360 points

Hi Larry,

Thanks for your query.

Looks like you are working on latest AM243x MCU+SDK.

Which port ICSSG or CPSW are you working on?

The examples in SDK do not have project settings to get optimal performance.

Here are some points you can work on to optimize performance for your use case:

Add more packet buffers
Stack placement
Packet buffer placement
Overall memory placement

Could you let me know your use case or the end goal you are trying to achieve?

Regards

Ashwani

0 Larry Chen over 1 year ago in reply to Ashwani Goel

Prodigy 220 points

Hi Ashwani,

Thanks for your reply,

Which port ICSSG or CPSW are you working on?

I am using CPSW port 1.

In my application, a server sends raw Ethernet packets to the AM2431. Each packet is 1461 bytes, with a packet interval of about 5 microseconds. The maximum amount is 8640 packets per cycle, followed by a 20ms wait before sending again. All packets are broadcast.

I have tried adding more packet buffers, but there is a limitation in syscfg where the packet pool size can only go up to 192, which is far from my goal.
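To put the burst described above in perspective, here is a rough back-of-the-envelope sketch. The function names are made up for illustration, and it assumes a steady 5 microsecond packet spacing for the whole burst, which is a simplification of the "about 5 microseconds" stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Total bytes in one burst of `packets` frames of `bytes_each` bytes. */
uint64_t burst_bytes(uint32_t packets, uint32_t bytes_each)
{
    return (uint64_t)packets * bytes_each;
}

/* Duration of the burst in microseconds at a fixed packet spacing. */
uint64_t burst_duration_us(uint32_t packets, uint32_t interval_us)
{
    assert(interval_us > 0);
    return (uint64_t)packets * interval_us;
}

/* Share of the burst (in parts per thousand, rounded down) that a pool
 * of `pool_packets` buffers could hold if nothing were freed meanwhile. */
uint32_t pool_share_permille(uint32_t pool_packets, uint32_t burst_packets)
{
    assert(burst_packets > 0);
    return (uint32_t)(((uint64_t)pool_packets * 1000u) / burst_packets);
}
```

With 8640 packets of 1461 bytes, one burst is 12,623,040 bytes spread over roughly 43.2 ms, and a 192-packet pool covers only about 2.2% of it. Under these assumptions the pool cannot absorb the burst on its own; buffers have to be freed at close to the arrival rate.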

Could you explain the rest of your options?

Stack placement
Packet buffer placement
Overall memory placement

Thank you.

Best regards,

Larry

0 Ashwani Goel over 1 year ago in reply to Larry Chen

TI__Mastermind 27360 points

Larry Chen said:
All packets are broadcast

Are you building the application/example in Debug or Release mode?

Are you using the port in Switch mode or MAC mode?

Do the packets coming to the board have VLAN and priority enabled?

If yes, which packets carry which priority?

If your traffic is bursty, can you try shaping the traffic reaching the HOST by using the IOCTL below?

Regards

Ashwani

0 Larry Chen over 1 year ago in reply to Ashwani Goel

Prodigy 220 points

Hi Ashwani,

Are you building the application/example in Debug or Release mode?

I tested in both Debug and Release modes, but I didn't see any improvement.

Are you using the port in Switch mode or MAC mode?

I am using MAC mode.

Do the packets coming to the board have VLAN and priority enabled?

No, I didn't enable VLAN and priority. My application needs to receive all packets and parse their content to decide whether to bypass or process them.
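A minimal sketch of that kind of parse-then-decide step, assuming untagged Ethernet II frames. All names here are hypothetical and not part of the SDK, and the thread does not say which EtherType the application uses, so the caller passes it in (0x88B5, the IEEE local experimental EtherType, would be one placeholder).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDR_LEN   6u
#define ETH_HEADER_LEN 14u  /* dst MAC + src MAC + EtherType, no VLAN tag */

/* True when the destination MAC is ff:ff:ff:ff:ff:ff. */
bool eth_is_broadcast(const uint8_t *frame, size_t len)
{
    static const uint8_t bcast[ETH_ADDR_LEN] =
        { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
    assert(frame != NULL);
    return len >= ETH_HEADER_LEN && memcmp(frame, bcast, ETH_ADDR_LEN) == 0;
}

/* EtherType in host byte order, or 0 if the frame is too short. */
uint16_t eth_ethertype(const uint8_t *frame, size_t len)
{
    assert(frame != NULL);
    if (len < ETH_HEADER_LEN) {
        return 0;
    }
    return (uint16_t)((frame[12] << 8) | frame[13]);
}

/* Hypothetical filter: process broadcast frames that carry the
 * application's EtherType, bypass everything else. */
bool should_process(const uint8_t *frame, size_t len, uint16_t app_ethertype)
{
    return eth_is_broadcast(frame, len) &&
           eth_ethertype(frame, len) == app_ethertype;
}
```

Keeping this check to a few header bytes matters here, because it runs once for every received broadcast frame before any payload is touched.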

If your traffic is bursty, can you try shaping the traffic reaching the HOST by using the IOCTL below?

I am wondering, since my application transfers raw Ethernet packets, will using this function cause my packets to drop or stay in some buffer?

Best regards,

Larry

0 Ashwani Goel over 1 year ago in reply to Larry Chen

TI__Mastermind 27360 points

Thanks Larry,

Larry Chen said:
Each packet is 1461 bytes, with a packet interval of about 5 microseconds. The maximum amount is 8640 packets per cycle, followed by a 20ms wait before sending again. All packets are broadcast.

So, you are working on CPSW ports, with UDP Broadcast frames + DUAL MAC mode.

Are you getting same results with unicast frames (directed to HOST) ?

Larry Chen said:
Could you explain the rest of your options?

Stack placement

Packet buffer placement

Overall memory placement

As you are using the AM243x-LP, I am assuming that you have everything in MSRAM only.

Meanwhile, I will check internally whether we can suggest something to get better performance than the numbers in AM243x MCU+ SDK: Ethernet Performance (ti.com).

Regards

Ashwani

0 Larry Chen over 1 year ago in reply to Ashwani Goel

Prodigy 220 points

Hi Ashwani,

Sorry for the late reply,

So, you are working on CPSW ports, with UDP Broadcast frames + DUAL MAC mode.

Actually, I'm not using UDP; my application works with raw Ethernet packets.

Are you getting same results with unicast frames (directed to HOST) ?

In unicast mode, everything works perfectly because the load isn’t as heavy as in broadcast mode. It only receives the data it needs, but I still need broadcast mode to work.

Best regards,

Larry

0 Ashwani Goel over 1 year ago in reply to Larry Chen

TI__Mastermind 27360 points

Larry Chen said:
In unicast mode, everything works perfectly

Thanks for the update, Larry.

You should get the same performance for unicast switching as well.

But broadcast frames need to be switched to the other port as well as consumed by the host, so overall performance is reduced.

Regards

Ashwani

0 Larry Chen over 1 year ago in reply to Ashwani Goel

Prodigy 220 points

Hi Ashwani,

But I'm using an external Ethernet switch, so my AM2431 doesn't need to handle switching frames between ports. It only needs to receive frames from one port and, if needed, send frames to that port. Does this still consume a lot of processing time?

Best regards.

Larry

0 Ashwani Goel over 1 year ago in reply to Larry Chen

TI__Mastermind 27360 points

Larry Chen said:
external Ethernet switch

Okay, So your setup is:

AM243x<=> External-Switch <=> PC

PC is sending broadcast frames.

So, my assumption is that all frames are going to the AM243x port as well,

which forwards them to the other port of the AM243x as well as to the Host port.

Regards

Ashwani

0 Larry Chen over 1 year ago in reply to Ashwani Goel

Prodigy 220 points

Hi Ashwani,

Let me explain the details. My AM2431 uses the RGMII-1 interface connected to my external Ethernet switch IC, so all frames communicate through this port.

According to your description, even if I don’t try to send frames to another port, it will automatically send to port 2. Will this still consume MCU processing time? And even if port 2 isn’t connected, will the MCU still spend time trying to receive frames from it?

Best regards,

Larry

0 Ashwani Goel over 1 year ago in reply to Ashwani Goel

TI__Mastermind 27360 points

Ashwani Goel said:
AM243x<=> External-Switch <=> PC

PC is sending broadcast frames.

Can you confirm your setup?

What is connected on Port-1, Port-2?

Can you please share a block diagram for better understanding?

Regards

Ashwani

0 Larry Chen over 1 year ago in reply to Ashwani Goel

Prodigy 220 points

Hi Ashwani,

What is connected on Port-1, Port-2?

I am using the AM2431 connected to an external Ethernet switch IC via the RGMII interface. According to the syscfg below, I am using RGMII 1 for CPSW, so I believe my Ethernet switch is connected to Port 1, while Port 2 is not being used.

Can you please share a block diagram for better understanding?

Best regards,

Larry

0 Ashwani Goel over 1 year ago in reply to Larry Chen

TI__Mastermind 27360 points

Hi Larry,

Let me review it and get back to you by next week.

Regards

Ashwani

0 Larry Chen over 1 year ago in reply to Ashwani Goel

Prodigy 220 points

Hi Ashwani,

Is there any new progress on this?

Best regards,

Larry

0 Ashwani Goel over 1 year ago in reply to Larry Chen

TI__Mastermind 27360 points

Hi Larry,

Thanks for your patience.

I discussed this internally, but it is not clear whether you really need BC frames on a single MAC port per Product.

Is the PC sending specific frames that need to be consumed by a specific AM243x?

The Product-1 CPSW port is receiving a frame and sending the same frame back, so it will reach the Product-2 CPSW port as well. Is that intentional?

If not, can you use MC frames and add a CPSW ALE entry so that a specific packet is consumed by a specific Product (HOST)?

Local (HOST) Rx performance will be affected if you use BC packets, as they increase continuous parallel local Rx/Tx processing.

Larry Chen said:
even if I don’t try to send frames to another port, it will automatically send to port 2

If you disable the other port in SysConfig, there is no forwarding to that port.

Larry Chen said:
although the AM243 supports gigabit Ethernet speeds

Basically, CPSW IP supports 1G for switching to another port, not HOST Rx.

Larry Chen said:
the best case RX speed can reach up to 110 Mbps.

We are working to improve our local Rx/Tx performance benchmarking number.

Regards

Ashwani

0 Larry Chen over 1 year ago in reply to Ashwani Goel

Prodigy 220 points

Hi Ashwani,

Is the PC sending specific frames that need to be consumed by a specific AM243x?

The PC, which acts as the controller, sends different frames for various purposes. I can send MAC frames, and it works very well, but one of the conditions requires sending a broadcast frame for better efficiency, so I am trying to implement that.

The Product-1 CPSW port is receiving a frame and sending the same frame back, so it will reach the Product-2 CPSW port as well. Is that intentional?

When the Ethernet IC in Product-1 receives a frame, it automatically transfers it to the AM2431 host port and to Product-2. It doesn't require AM2431 involvement for this.

Basically, CPSW IP supports 1G for switching to another port, not HOST Rx.

I am curious about this: does it mean CPSW can only handle up to 1G when transferring frames to port 2, but not for the host port? The hardware interface is RGMII, and I thought this was a 1G speed interface. From my experiments so far, my assumption is that the AM2431 host port can receive and transfer frames at 1G speed, but it cannot handle 1Gbit of data in 1 second. Can you confirm this?

Best regards,

Larry

0 Ashwani Goel over 1 year ago in reply to Larry Chen

TI__Mastermind 27360 points

Larry Chen said:
When the Ethernet IC in Product-1 receives a frame, it automatically transfers it to the AM2431 host port and to Product-2

So, for Product-1: you want the BC frame to be consumed by the AM243x, and it does not need to be sent back to the Ethernet switch of Product-1. Correct?

Larry Chen said:
It doesn't require AM2431 involvement for this.

What is the expectation from the AM243x on Product-2? Is it similar to Product-1?

Larry Chen said:
From my experiments so far, my assumption is that the AM2431 host port can receive and transfer frames at 1G speed, but it cannot handle 1Gbit of data in 1 second. Can you confirm this?

I need to check on this internally and get back to you.

Regards

Ashwani

0 Larry Chen 11 months ago in reply to Ashwani Goel

Prodigy 220 points

Hi Ashwani,

So, for Product-1: you want the BC frame to be consumed by the AM243x, and it does not need to be sent back to the Ethernet switch of Product-1. Correct?

Correct.

What is the expectation from the AM243x on Product-2? Is it similar to Product-1?

Product-2 is similar to Product-1. The BC frame will carry information that allows them to identify the captured content belonging to them. In our use case, a full system will have 10 to hundreds of products in a daisy chain.

Best regards,

Larry

0 Larry Chen 11 months ago in reply to Larry Chen

Prodigy 220 points

Hi Ashwani,

Is there any new progress on this?

Best regards,

Larry

0 Ashwani Goel 11 months ago in reply to Larry Chen

TI__Mastermind 27360 points

Hi Larry Chen ,

I was on vacation last week. So could not check on this.

Will get back to you by next week.

Regards

Ashwani

0 Ashwani Goel 11 months ago in reply to Ashwani Goel

TI__Mastermind 27360 points

Hi Larry Chen,

In summary, you have multiple AM243x-evm based custom boards in daisy chain.

PC<=> custom-board-1<=> custom-board-2 <=> custom-board-3

Custom board has AM243x-evm + Switch

Now, you are sending UC +BC frames from PC.

With UC frames you are getting expected performance on AM243x-evm.

You are facing a performance issue when BC frames are also included in the traffic?

Can you help me with the test case (how are you measuring throughput for UC and BC frames)?

Where are you checking throughput: on the PC, the switch, or the R5F host?

Regards

Ashwani

0 Larry Chen 11 months ago in reply to Ashwani Goel

Prodigy 220 points

Hi Ashwani,

Can you help me with the test case (how are you measuring throughput for UC and BC frames)?

Sorry for the delayed response; I've been busy with other projects recently.
Our product primarily focuses on displaying images, so when a packet is received, it is parsed and the image content is displayed. Based on the display status of the image, I can easily determine if there are any transmission issues.

In addition, I conducted another experiment: upon receiving a packet, I set a GPIO pin high, and after processing, I set it low. Each image packet occupies 1518 bytes of the Ethernet buffer I configured. Observing the GPIO pin with a logic analyzer (LA), I could see that, at regular intervals, the number of times the pin went high corresponded to the Ethernet buffer size divided by 1518.

This indicates that each time the task checks the DMA buffer, it only processes as many packets as the DMA buffer can hold. However, when my PC sends out broadcast (BC) frames, the total number is significantly larger than the maximum the DMA buffer can hold. Therefore, I concluded that the AM243 might not be able to meet our requirements for processing a large volume of data efficiently.

Where are you checking throughput: on the PC, the switch, or the R5F host?

In the PC, I used Wireshark to monitor the packets being sent and received.

Best regards,

Larry

0 Ashwani Goel 11 months ago in reply to Larry Chen

TI__Mastermind 27360 points

Thanks Larry,

As per your use case, what is the overall traffic rate expected to be handled by AM243x?

It seems to be layer 2 traffic handling only, and the traffic rate is also low, so this should be possible with AM243x.

Are you in contact with some TI field team (FAE) ?

The default packet handling method is interrupt based.

You can switch to "batch processing" with the "polling method" instead of interrupts.

You can also try increasing the buffer count in example.syscfg.
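As a generic illustration of what batch processing with polling means, here is a small sketch that is not tied to the Enet LLD API. Every type and function name is hypothetical, the ring stands in for whatever queue the driver fills, and the code is single-threaded (no memory barriers or locking), so treat it as a picture of the idea and not as driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 8u  /* power of two so the index mask works */

typedef struct {
    const uint8_t *data;
    size_t len;
} frame_t;

typedef struct {
    frame_t slots[RING_SLOTS];
    uint32_t head;  /* next slot the producer writes */
    uint32_t tail;  /* next slot the consumer reads */
} rx_ring_t;

typedef void (*frame_handler_t)(const frame_t *frame, void *ctx);

/* Producer side (stand-in for the driver): 0 on success, -1 if full. */
int ring_push(rx_ring_t *r, frame_t f)
{
    assert(r != NULL);
    if (r->head - r->tail == RING_SLOTS) {
        return -1;
    }
    r->slots[r->head & (RING_SLOTS - 1u)] = f;
    r->head++;
    return 0;
}

/* Consumer side: handle up to max_batch ready frames in one call and
 * return how many were handled. Meant to be called from a polling task. */
size_t ring_drain_batch(rx_ring_t *r, size_t max_batch,
                        frame_handler_t fn, void *ctx)
{
    size_t n = 0;
    assert(r != NULL && fn != NULL);
    while (n < max_batch && r->tail != r->head) {
        fn(&r->slots[r->tail & (RING_SLOTS - 1u)], ctx);
        r->tail++;
        n++;
    }
    return n;
}

/* Example handler: accumulate the number of bytes seen into *ctx. */
void count_bytes(const frame_t *frame, void *ctx)
{
    *(size_t *)ctx += frame->len;
}
```

The point of the batch limit is that one wake-up handles many frames, so the per-frame cost of entering and leaving an interrupt is avoided while the task still yields after a bounded amount of work.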

Regards

Ashwani

0 Larry Chen 11 months ago in reply to Ashwani Goel

Prodigy 220 points

Hi Ashwani,

As per your use case, what is the overall traffic rate expected to be handled by AM243x?

Our product requires 13,824 bytes per image, with one frame sent approximately every 30 ms. This results in 33 frames per second. For a larger system estimation with a total of 200 modules, the calculation is as follows:

13,824 * 8 * 33 * 200 = 729,907,200 bits/sec

In BC mode, each module would need to handle 729 Mbps of data. This is the performance level I aim to achieve.
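The same arithmetic as a small sketch, with hypothetical function names. It counts image payload only, so Ethernet headers, preamble and inter-frame gap would come on top, and the 110 Mbps used in the comparison is the best-case RX figure quoted at the start of this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Aggregate payload bit rate when every module receives the image
 * data of all modules. */
uint64_t broadcast_bits_per_sec(uint32_t bytes_per_image,
                                uint32_t images_per_sec,
                                uint32_t modules)
{
    return (uint64_t)bytes_per_image * 8u * images_per_sec * modules;
}

/* How many times the required rate exceeds a measured rate, rounded up. */
uint64_t shortfall_factor_ceil(uint64_t required_bps, uint64_t measured_bps)
{
    assert(measured_bps > 0);
    return (required_bps + measured_bps - 1u) / measured_bps;
}
```

For 13,824 bytes, 33 images per second and 200 modules this gives 729,907,200 bits/sec, which is between six and seven times the 110 Mbps figure.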

Are you in contact with some TI field team (FAE) ?

I will try to contact an FAE on my side.

You can switch to "batch processing" with the "polling method" instead of interrupts.

Can you please explain how to use "batch processing" with "polling method"?

I am no longer using interrupts to handle packets. Instead, I periodically check the Ethernet buffer using a timer, with an interval of approximately 1 ms.
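A quick sketch of how many packets can pile up between two timer checks, using hypothetical helper names and assuming the steady 5 microsecond packet spacing and the 192-packet pool limit mentioned earlier in the thread:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Packets that arrive during one polling period at a fixed spacing. */
uint32_t packets_per_poll(uint32_t poll_period_us, uint32_t packet_interval_us)
{
    assert(packet_interval_us > 0);
    return poll_period_us / packet_interval_us;
}

/* True if the pool can hold everything that arrives between two polls. */
bool pool_covers_poll(uint32_t pool_packets, uint32_t poll_period_us,
                      uint32_t packet_interval_us)
{
    return packets_per_poll(poll_period_us, packet_interval_us) <= pool_packets;
}
```

Under those assumptions a 1 ms period lets about 200 packets arrive between checks, slightly more than 192 buffers can hold, while a 500 microsecond period would leave room.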

You can also try increasing the buffer count in example.syscfg.

I tried increasing the Large Pool Packet Count to its maximum value, but it was still insufficient to handle the load.

Best regards,

Larry

0 Ashwani Goel 11 months ago in reply to Larry Chen

TI__Mastermind 27360 points

Larry Chen said:
each module would need to handle 729 Mbps of data. This is the performance level I aim to achieve.

And currently you are getting 110 Mbps.

What is the current CPU (R5F core) load?

I will discuss this internally and get back to you.

Larry Chen said:
I will try to contact an FAE on my side.

Looking forward to syncing with you.

Regards

Ashwani

0 Larry Chen 10 months ago in reply to Ashwani Goel

Prodigy 220 points

Hi Ashwani,

What is the current CPU (R5F core) load?

Because we are using the AM2431, we can only use one core.

Can you please explain how to use "batch processing" with "polling method"?

Best regards,

Larry

0 Ashwani Goel 10 months ago in reply to Larry Chen

TI__Mastermind 27360 points

Larry Chen said:
how to use "batch processing" with "polling method"?

For that, you can refer to the LwIP networking examples in the "C:\ti\mcu_plus_sdk_am64x_10_00_00_20\examples\networking\lwip" location.

Regards

Ashwani
