NDK performance on TMDSEVMC6678LE



I adjusted the NDK example client project on a TMDSEVMC6678LE, removing all the example network interfaces and adding my own TCP socket interface.  To benchmark NDK performance, the TCP socket interface opens a socket and waits for a request; when the request arrives, it sends 102400 32-bit float values back over the socket.  This is repeated 100 times, so the interface transfers 40.96 MB in total.  A simple Windows application times how long it takes from the first request until all the data has been returned.  While I appreciate that this will include some variation and inaccuracy from Windows operating system delays, these are insignificant compared to the scale of the performance issues I am seeing.  The PC is connected directly to the EVM over GbE.

On the TMDSEVMC6678LE the transfers take in the region of 5.5 seconds, which is roughly 7.5 MB/s or 60 Mb/s.  That seems far too low and is entirely out of keeping with the findings in http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/p/228388/806828.aspx#806828.  When I view the connection in Wireshark I can see the C6678 sending a few 1460-byte packets consecutively, a few microseconds apart, then an ACK from the PC a few microseconds after that, followed by a delay of 1 ms before the C6678 starts sending packets again.  Why such a delay?
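As a rough cross-check of these figures, the sketch below recomputes the throughput from the payload size and elapsed time, and estimates how many full-size segments would have to go out per burst if each burst-plus-stall cycle lasts about 1 ms. The constants come from the description above; the function names are illustrative, decimal units (1 MB = 10^6 bytes) are assumed, and the 1 ms cycle length is an approximation taken from the Wireshark observation.

```python
FLOATS_PER_REPLY = 102400   # 32-bit floats per reply (from the post)
BYTES_PER_FLOAT = 4
REPLIES = 100               # replies per timed run
SEGMENT_BYTES = 1460        # TCP payload per packet as seen in Wireshark


def total_bytes():
    """Payload bytes moved in one timed run."""
    return FLOATS_PER_REPLY * BYTES_PER_FLOAT * REPLIES


def throughput(elapsed_s):
    """Return (MB/s, Mb/s) for one run, using decimal units."""
    bytes_per_s = total_bytes() / elapsed_s
    return bytes_per_s / 1e6, bytes_per_s * 8 / 1e6


def segments_per_cycle(elapsed_s, cycle_s=1e-3):
    """Average full-size segments per burst/stall cycle.

    Assumes every cycle lasts cycle_s seconds (about 1 ms here).
    """
    return total_bytes() / elapsed_s * cycle_s / SEGMENT_BYTES


if __name__ == "__main__":
    mb_s, mbit_s = throughput(5.5)
    print(f"{total_bytes() / 1e6:.2f} MB in 5.5 s: {mb_s:.2f} MB/s, {mbit_s:.1f} Mb/s")
    print(f"about {segments_per_cycle(5.5):.1f} segments per 1 ms cycle")
```

Under those assumptions the measured rate works out to roughly five 1460-byte segments per millisecond, which matches "a few packets, an ACK, then a 1 ms gap". That would suggest the 1 ms stall, rather than the link or the PC, is what caps the throughput.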

When a similar test was performed on a C6657 EVM using a simple NDK project built from scratch using XGCONF (see http://e2e.ti.com/support/embedded/tirtos/f/355/t/325841.aspx), the transfers took in the region of 2 seconds, which is roughly 20 MB/s or 160 Mb/s.  Perhaps not earth shattering, but in line with http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/p/228388/806828.aspx#806828 and acceptable, and about 2.5 times faster than the C6678!

Any ideas what is going on?  Why is the client project for the C6678 so much slower than the HUA project described in the link above?  Why is the client project for the C6678 so much slower than a simple NDK project built from scratch using XGCONF?  I am trying to use the client project as the basis of my own network interface but it would seem to have performance issues.

Any suggestions of things to try to improve the performance gratefully accepted.

  • Hi Will,


    Can you share which product versions you are using?

    Best,

    Ashish

  • Hi Ashish,

    Thanks for the response.

    Sorry, I didn't think to put them in this post too, as per the linked post:

    Details of our development environment:

    TMDSEVM6678LE rev 3b

    CCS 5.4.0.00091

    MCSDK 2.1.2.6

    MCSDK PDK TMS320C6678 1.1.2.6

    NDK 2.22.3.20

    SYS/BIOS 6.35.1.29

    Regards,

     

    Will

  • Hi Will,

    there are some problems with the NIMU driver included in the MCSDK, which I already pointed out here:
    http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/p/271367/960470.aspx#960470

    I don't understand why TI doesn't fix this.

    Ralf

  • Hi Ralf,

    Thanks for the post.  I very much appreciate the tips and the link to your previous work in this area.  I am clearly being naïve in expecting reasonable NDK performance out of the box on the C6678.  I was also naïve in expecting it to be simple to include the NDK in a simple SYS/BIOS project.  It beats me why this stuff is so poor on an EVM which you might think TI hope people will use to evaluate their products ... ah well.

    I will take a look at your suggestions next week and see if I can replicate your speed improvements.

    Thanks again,

     

    Will

  • Ralf and Will,

    Please provide more information on how you tested NDK performance on your end.
    How does your code differ from the original NDK examples (Client, Hello World)? What modifications did you make?
    I would like to reproduce your test scenario, so please share the test tool, setup, etc. We can then fix this if the issue is confirmed.

  • Hi Pubesh,

    I stripped down the NDK example client project so that it starts a single sockets-based server.  The server responds to a simple request packet of a few tens of bytes by returning a sequence of 100 packets, each containing 400 kB of "data" (at this time just zeros).
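    For anyone wanting to exercise a test client without the EVM, the server behaviour described above can be approximated on a PC. This is only a loopback stand-in written in Python, not the NDK code running on the C6678. The 32-byte request length, the function names and the one-request-then-100-replies framing are assumptions based on the description.

```python
import socket

REPLY_BYTES = 102400 * 4   # one reply: 102400 32-bit floats, all zero
REPLIES = 100              # replies sent per request


def make_listener(host="127.0.0.1", port=0):
    """Create a listening TCP socket; port=0 lets the OS pick a free port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(1)
    return listener


def serve_once(listener, request_len=32):
    """Accept one connection, read one request, send all replies.

    request_len is a hypothetical size; a single recv() is assumed to be
    enough for a request this small. Returns the number of bytes sent.
    """
    conn, _ = listener.accept()
    with conn:
        request = conn.recv(request_len)
        if not request:
            return 0  # peer closed without sending a request
        payload = bytes(REPLY_BYTES)  # zero-filled "data"
        sent = 0
        for _ in range(REPLIES):
            conn.sendall(payload)
            sent += len(payload)
        return sent
```

    Calling `make_listener()` and then `serve_once(listener)` in a background thread gives a local endpoint that returns the same 40.96 MB per request, which helps separate client-side limits from DSP-side ones.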

    I developed a simple MS Windows console application that sends the simple request packet and consumes the data, timing how long the sequence takes from the send until the last data packet arrives.  I directly connected my PC's GbE port to the C6678 EVM.
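    A minimal sketch of such a timing client is shown below, in Python rather than as the original console application. The function name, the 32-byte zero request and the 64 KiB receive buffer are assumptions for illustration; the expected byte count follows from 100 replies of 102400 32-bit floats.

```python
import socket
import time

EXPECTED_BYTES = 102400 * 4 * 100   # 100 replies of 102400 32-bit floats


def time_transfer(host, port, request=b"\x00" * 32, expected=EXPECTED_BYTES):
    """Send one request, drain `expected` bytes, return (elapsed_s, bytes_received).

    The request contents are a placeholder; the real request format is not
    given in the thread.
    """
    with socket.create_connection((host, port)) as sock:
        buf = bytearray(65536)          # reused receive buffer
        start = time.perf_counter()     # clock starts just before the send
        sock.sendall(request)
        received = 0
        while received < expected:
            n = sock.recv_into(buf)
            if n == 0:                  # server closed the connection early
                break
            received += n
        elapsed = time.perf_counter() - start
    return elapsed, received
```

    With the EVM's address and port substituted in, `elapsed, received = time_transfer(host, port)` followed by `received * 8 / elapsed / 1e6` gives the rate in Mb/s. Receiving into a preallocated buffer keeps per-call overhead on the PC side low, though the client will still become the bottleneck at some rate.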

    While I concede that this is a very simplistic measurement of performance (and useless above about 500 Mb/s, where my MS Windows console application becomes the limit), it demonstrated that the out-of-the-box performance of the NDK example client project is very poor.  I did not set out to test performance; I set out to develop my application, but found the performance so obviously poor that I tweaked my test application to give me an indication of how poor.  I don't suggest this is the way to test performance ...

    I then followed Ralf's posts as above and dramatically improved performance, from as little as 60 Mb/s to >500 Mb/s.  Thanks Ralf!