NDK performance

I have a performance problem with the NDK 1.93 stack running a TCP application task on DSP BIOS 5.32.02.

I have some results from the spraaq5 tests, using a Windows PC to talk to my target C6455/EMAC board, which embeds a 1Gbit PHY. The PHY is correctly configured through EMAC/MDIO and is clearly in 1Gb mode. To summarise the test results for a frame size of 8192 bytes (the other sizes scale accordingly): when spraaq5 runs at full rate (i.e. no sleep), a rate of about 20MB/sec is achieved. That is less than I would expect from a 1Gb link but quite satisfactory. When spraaq5 runs with any frame-rate delay (I tried the given 10ms as well as 5ms and 1ms), the rate falls to about 0.5MB/sec regardless of the delay, which is unacceptable for my application. The spraaq5 test with delays adequately simulates my multi-threaded application, in which the TCP stream is not continuous.

Some more information: it is some time since I worked with TCP, but I recall Nagle holdoffs and have tested the NDK configuration, NC_SystemOpen options and socket options (TCP_NOPUSH, SO_SNDBUF...). I have also tested the .tcf configuration for cache allocation and the DSP BIOS task priorities, to no avail. The runtime EMAC stats registers look good. EMAC interrupts arrive and are enqueued as STKEVENT_ETHERNET without problems. And we use CAT6 cabling appropriate for 1Gb Ethernet.
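As an illustration of the kind of socket tuning described above, here is a minimal sketch using the standard BSD sockets API on a POSIX host. It is not NDK code: the NDK socket layer uses similar option names but its own headers and types, so the includes below are assumptions about the build environment. The helper names `tune_socket` and `nodelay_enabled` are hypothetical. TCP_NODELAY is the standard option that disables Nagle; TCP_NOPUSH is left out because it is not available on every platform.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: disable Nagle and request send/receive buffer sizes.
 * Returns 0 on success, -1 if any setsockopt() call fails. */
int tune_socket(int fd, int sndbuf_bytes, int rcvbuf_bytes)
{
    int on = 1;

    assert(fd >= 0);

    /* TCP_NODELAY turns off Nagle's algorithm, so small writes are not held back. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on) != 0)
        return -1;

    /* Request buffer sizes; the stack may round or cap the values it applies. */
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf_bytes, sizeof sndbuf_bytes) != 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf_bytes, sizeof rcvbuf_bytes) != 0)
        return -1;

    return 0;
}

/* Hypothetical helper: returns 1 if TCP_NODELAY is set, 0 if not, -1 on error. */
int nodelay_enabled(int fd)
{
    int val = 0;
    socklen_t len = sizeof val;

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) != 0)
        return -1;
    return val != 0;
}
```

A caller would create the TCP socket first, call `tune_socket(fd, 65536, 65536)` before connecting or listening, and could use `nodelay_enabled(fd)` to confirm the setting took effect. The 65536 figure is only a placeholder, not a recommended value.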

Question: Could this be related to NDK or TCP, or to some other C6455 configuration like PRI_ALLOC? Or...? Many thanks in advance for any hints :-)

  • Achieving 20MB/sec is equivalent to 20KB/ms, so an 8KB frame takes 0.4ms to send and a 1ms delay costs the time of 2.5 such frames (=20KB). With a 1ms delay after every frame, 8KB data should therefore be sent at 1/(2.5 + 1)*20MB/sec, or about 5.7MB/sec. http://msdn.microsoft.com/en-us/library/ms686298(VS.85).aspx tells us sleep is prone to error in the actual delay [so calling sleep(1) may result in sleep(0..2)], so we can account for a further 50% error here, giving about 2.85MB/sec. The same link also reminds us that tasks made ready to run will wait on the ready queue, so perhaps we can account for the remaining drop of roughly 80% (from about 2.85MB/sec to the observed 0.5MB/sec) by this and by IP stack and NIC overheads.

    In a nutshell, I would conclude that spraaq5 is wildly inaccurate when using a Windows PC as the tester. I would expect a DSP-to-DSP setup (tester to testee) to obtain about 5.7MB/sec with a 1ms frame-rate delay, if it achieves about 20MB/sec without delays. I'd be interested to hear of any results.

    Some more information: I fixed my application's performance by uncovering a bug in a linked library. In my tests, where high data throughput is a feature, I found Nagle (IPPROTO_TCP TCP_NODELAY) had little effect; SO_SNDBUF/SO_RCVBUF had a measurable effect; but by far the most critical variable is good use of cache if external RAM is needed for NDK_MMBuffer and NDK_PacketMem.
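The back-of-envelope estimate at the start of this reply can be reproduced with a small calculation. This is only a sketch of that idealised model: it assumes the link sustains the full rate while a frame is being sent and then sits idle for exactly the requested delay, and it ignores sleep jitter, scheduling and stack overheads. The function name `effective_rate_mb_s` is hypothetical, and it follows the reply's convention that 20MB/sec equals 20KB/ms.

```c
#include <assert.h>

/* Effective throughput in MB/sec when every frame of frame_kb kilobytes is
 * followed by an idle delay of delay_ms milliseconds.
 * Convention from the estimate: N MB/sec is treated as N KB/ms. */
double effective_rate_mb_s(double full_rate_mb_s, double frame_kb, double delay_ms)
{
    double send_ms;

    assert(full_rate_mb_s > 0.0 && frame_kb > 0.0 && delay_ms >= 0.0);

    /* Time to push one frame at the full rate: KB / (KB/ms) = ms. */
    send_ms = frame_kb / full_rate_mb_s;

    /* One frame per (send time + delay); KB/ms is numerically MB/sec. */
    return frame_kb / (send_ms + delay_ms);
}
```

For example, `effective_rate_mb_s(20.0, 8.0, 1.0)` gives about 5.71, matching the 5.7MB/sec figure, and a delay of 0 returns the full 20MB/sec.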