RTOS: C6678 small UDP packet reception and transmission latency

Other Parts Discussed in Thread: SYSBIOS

Tool/software: TI-RTOS

Hi all,

I am trying to run a UDP traffic load benchmark on the C6678, and the result is quite disappointing. My benchmark shows that the DSP can receive and send around 12k to 15k packets per second. The latency is about 700us for the DSP to receive a packet and send it back out.

My setup is as follows.

The host PC and the DSP are connected to a common switch.

I have written a UDP echo client on the host PC that sends UDP packets of 400 bytes. Due to the nature of the application we are developing, the packets are small and we need to send a lot of them. On the DSP, I am running the ndk helloworld example in pdkc6678, which is essentially an echo server. On the host PC, I send a packet and wait for it to come back before sending the next one. The host PC uses the same socket for send and receive, and so does the DSP.
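A minimal sketch of such a stop-and-wait echo client for a POSIX host is shown below. It is not the client used for the numbers in this post: the function names (`connect_udp`, `echo_round_trip_us`, `mean_round_trip_us`) are hypothetical, and the DSP's IP address and echo port must be supplied by the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Create a UDP socket connected to ip:port so that send() and recv()
 * use the same socket. Returns the descriptor, or -1 on failure. */
int connect_udp(const char *ip, unsigned short port)
{
    struct sockaddr_in peer;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(&peer, 0, sizeof peer);
    peer.sin_family = AF_INET;
    peer.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &peer.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *)&peer, sizeof peer) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Send one payload_len-byte datagram, block until the echo arrives, and
 * return the round-trip time in microseconds. Returns -1.0 on a socket
 * error, an oversized payload, or a reply of the wrong length. */
double echo_round_trip_us(int fd, size_t payload_len)
{
    unsigned char tx[1500], rx[1500];
    struct timespec t0, t1;

    if (payload_len > sizeof tx)
        return -1.0;
    memset(tx, 0xA5, payload_len);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (send(fd, tx, payload_len, 0) != (ssize_t)payload_len)
        return -1.0;
    if (recv(fd, rx, sizeof rx, 0) != (ssize_t)payload_len)
        return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_nsec - t0.tv_nsec) / 1e3;
}

/* Run count stop-and-wait exchanges and return the mean round trip in
 * microseconds, or -1.0 if any exchange fails or count is not positive. */
double mean_round_trip_us(int fd, size_t payload_len, int count)
{
    double total = 0.0;

    for (int i = 0; i < count; i++) {
        double rtt = echo_round_trip_us(fd, payload_len);
        if (rtt < 0.0)
            return -1.0;
        total += rtt;
    }
    return count > 0 ? total / count : -1.0;
}
```

A caller would open the socket once with `connect_udp`, then call `mean_round_trip_us(fd, 400, n)` for some number of exchanges `n`. Because `recv` blocks, a single lost datagram stalls the loop; setting a receive timeout with `setsockopt` and `SO_RCVTIMEO` is one way to guard against that.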

I am using mcsdk 2.1.2.6, ndk 2.21.2.43, MCSDK 6678 1.1.2.6 and CCS6.1.3

The board vendor's support package only supports the package versions I listed above.

Benchmark result:

My result shows that the DSP can receive and send around 12k to 20k packets per second. For my application, the DSP will receive at least 15k packets and send 15k packets per second at maximum load. As you can see, the benchmark shows that the DSP is almost maxed out at maximum load, which leaves no CPU time for anything else.

The reason I believe the DSP is the bottleneck is this:

In Wireshark, I can see that it takes 700us for a packet to go to the DSP and come back. I think this delay is huge. In Wireshark, I can also see that the latency of PING packets is very small, so I don't think the Ethernet switch is causing the problem. It only takes my PC 70us to receive a packet and send another one out. In theory, the DSP should not be slower than my PC.

I am wondering whether there is any way I can optimize the NDK and SYSBIOS settings to reduce the latency. Thanks.
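Because the client waits for each echo before sending the next packet, the round-trip time by itself bounds the packet rate, independent of how busy the DSP is. A small sketch of that arithmetic follows; the helper names are hypothetical and the only inputs are the 700us round trip and the 400-byte payload quoted above.

```c
/* Upper bound on packets per second for a stop-and-wait client:
 * only one packet is in flight, so one exchange completes per round trip. */
double stop_and_wait_rate_pps(double rtt_us)
{
    return 1e6 / rtt_us;
}

/* Payload throughput in megabits per second for a given packet rate. */
double payload_mbps(double packets_per_second, double payload_bytes)
{
    return packets_per_second * payload_bytes * 8.0 / 1e6;
}
```

With a 700us round trip, `stop_and_wait_rate_pps(700.0)` gives roughly 1.4k packets per second, and `payload_mbps` of that rate with 400-byte payloads is roughly 4.6 Mbps. A test that keeps several packets in flight would be needed to measure the DSP's actual maximum rate rather than its latency.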

  • There was an error in my benchmark timing calculation. The latency seen in Wireshark is still 700us, but the maximum rate I can achieve now is about 1.4k packets per second inbound and outbound.
  • The correct benchmark result is 1200 messages per second with an average round-trip time of 840us. Again, I think that is a huge latency, at least for my application.
  • Hello Wei Chen91,

    What is the maximum throughput you get when the CPU is fully loaded? Also, for the UDP echo, are you using the NDK daemon or your own socket?

    Please check whether the post below helps.

  • Hi, I am not sure about the CPU load, but I don't think CPU capacity is the problem. TI's benchmark of an older processor showed that driving the Ethernet to maximum load only took 20% of the CPU cycles. My throughput is about 1400 (messages per second) * 500 (bytes per message) = 700,000 bytes per second, or about 5.6 Mbps.
    I am using the network daemon echo example. I checked out that post, but it didn't have much information on what optimization he did to improve the throughput.

    My concern now is that SYSBIOS adds a lot of overhead to the socket operations when receiving and transmitting packets. Using the lightweight network stack provided by the board vendor (CommAgility), I was able to achieve 70Mbps and possibly higher with an echo server and the same packet size of 500 bytes. In that case, however, there is no SYSBIOS running on the processor, and we are sure that we are going to use SYSBIOS to develop our application because we want to take advantage of features like IPC.
    I will try to post the SYSBIOS configuration file and the echo server for the DSP when I have access to the workstation in my lab. As of right now, the NDK buffers are all in the L2 cache.

    Again, I think it's a SYSBIOS configuration issue. I was able to rule out the Ethernet switch as the source of the delay by connecting the board directly to the host PC.

    I will run some more tests on Monday.
  • Hello Wei Chen91,

    Have you modified the config file to increase the NDK buffer sizes and the number of PBM packets? The default buffer size for UDP is 8K. You can increase it up to 65K.

    http://processors.wiki.ti.com/index.php/NDK_Static_Internal_Memory_Manager

    http://processors.wiki.ti.com/index.php/NDK_Dynamic_Memory_Manager#UDP_sockets

    Also, please share your SYSBIOS configuration file.