
NDK TCP buffer size and speed

Other Parts Discussed in Thread: SYSBIOS

Hi all,

I have CCS5 on a Linux host, EVM6678, XDS560v2STM, ndk_2_20_04_26 and mcsdk_2_00_05_17.

For my first steps in networking, I have modified the NDK's 'hello world'
example to use TCP instead of UDP. The purpose of the program is to receive image data
from a host PC and store it in a byte array for later processing.
(I can post the code here if necessary.)

The TCP transport works fine, but there are two strange problems:

(1) I have set the TCP receive buffer limit to 8 KB:

    rc = 8192;
    CfgAddEntry( hCfg, CFGTAG_IP, CFGITEM_IP_SOCKTCPRXLIMIT, CFG_ADDMODE_UNIQUE,
                 sizeof(uint), (UINT8 *)&rc, 0 );

but if I try to send chunks larger than 1 KB from the host, the whole TCP communication
becomes very slow and irregular; it sometimes stops completely for a while and then
continues, or hangs forever. A packet size of 1 KB is the maximum that works.
(There are no other user tasks running on the DSP so far, just the copy into the
image array.)
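
For completeness, these are the related configuration items I am aware of from the NDK user guide SPRU524 (a sketch; so far I have only set the RX limit, and the exact item names may differ between NDK versions):

    /* Sketch: raise all three TCP buffer settings before NC_NetStart().
       SOCKTCPRXLIMIT governs non-copy receive sockets; SOCKTCPRXBUF and
       SOCKTCPTXBUF size the copy-mode receive and transmit buffers. */
    rc = 65536;
    CfgAddEntry( hCfg, CFGTAG_IP, CFGITEM_IP_SOCKTCPRXLIMIT, CFG_ADDMODE_UNIQUE,
                 sizeof(uint), (UINT8 *)&rc, 0 );
    CfgAddEntry( hCfg, CFGTAG_IP, CFGITEM_IP_SOCKTCPRXBUF, CFG_ADDMODE_UNIQUE,
                 sizeof(uint), (UINT8 *)&rc, 0 );
    CfgAddEntry( hCfg, CFGTAG_IP, CFGITEM_IP_SOCKTCPTXBUF, CFG_ADDMODE_UNIQUE,
                 sizeof(uint), (UINT8 *)&rc, 0 );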

(2) Before reloading the program I have to do a system reset, otherwise
the DHCP communication fails (maybe because the PHY is already configured).
However, if I do so, the TCP communication becomes *very* slow afterwards.
Transferring a 2 MB image takes about 30 s over a GBit network, no matter
whether I use the Debug or Release configuration. To get back to normal speed,
I have to power-cycle the EVM and the debugger.

If you have any idea about this behavior or need more information,
please let me know.

Thanks in advance,
Marcus

  • Hello Marcus,

    Is there any help on this topic?  I am trying to send out an image over UDP and am having problems.  I see your post is about TCP, but perhaps if TI suggested a setting it might be helpful for me too.

    Thanks,

    Brandy

  • Hi Brandy,

    Sorry, I have gotten no reply to this posting so far, so the problem still exists. However, there have been one or two new versions of the NDK in the meantime, so I should try to run the program again.

    What is your problem? I also started using UDP, but it was too slow and unreliable for my purposes, so I tried to switch to TCP.

    Regards

    Marcus

  • I am trying to send 1000 packets (at 1424 bytes each) in 500 ms.  It doesn't seem like it should be so hard.  We only use UDP in our system, for speed; because we have a private network, we are not too worried about UDP packets getting lost.

    I am sure I have to change a setting somewhere, I just have to find it :)  There doesn't seem to be a UDP transmit buffer setting as far as I can tell.  Not to mention I am not even sure the configuration entries I set are being taken.  For instance, see this post:

    http://e2e.ti.com/support/embedded/bios/f/355/p/190592/685652.aspx#685652
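
    The closest things I could find are the UDP receive limit configuration item and the per-socket buffer option (a sketch, assuming the CFGITEM_IP_SOCKUDPRXLIMIT item and the SO_SNDBUF socket option from SPRU524 - so I am not sure these are taken either):

        /* Sketch: raise the UDP receive limit before NC_NetStart() ... */
        uint rc = 65536;
        CfgAddEntry( hCfg, CFGTAG_IP, CFGITEM_IP_SOCKUDPRXLIMIT, CFG_ADDMODE_UNIQUE,
                     sizeof(uint), (UINT8 *)&rc, 0 );

        /* ... and enlarge the send buffer on the transmitting socket. */
        int size = 65536;
        setsockopt( s, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size) );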


    Anyhow, if I find anything amazing - I'll let you know :)

    Brandy

  • Marcus – did you try using a newer version of the NDK and MCSDK? The NIMU interface has had some updates which might resolve your issue.

    Brandy – the NDK expert has responded on the other post. Hope that resolves it.

  • Hello Varada,

    Yes, Steve and Jack are working to determine why I am getting the ICMPs when I have asked for them not to be sent.  However, I am still not able to send the full number of UDP packets that I need.

    If you have any advice on this, I would greatly appreciate it.  I suspect the ICMPs are part of the problem, but I am not sure they are all of it.

    Thanks,

    Brandy

  • Hello,

    we see the same behavior in our project as described in the original post above. I tried different TCP buffer sizes and packet sizes (and also different task priorities for the network stack and the receive task). The best transmission rate I can get is 30 Mbit/s. The connection is very unstable; often the TI stack breaks down after a few seconds, and only a full system reset can resurrect the IP stack.

    The HUA application that is factory-flashed (C6678 EVM) also does not work reliably here. In very rare cases it works once (after a reset of the developer machine and a power-cycle of the target); mostly it does not work at all.

    The cabling, however, works reliably at 400 Mbit/s when I connect the (same) Ethernet cable to a PC instead of my target DSP EVMs. When I start the HUA demo in the debugger instead of from flash, I see the same (bad) behavior as when it is started from flash. I have several EVMs here; the behavior is the same on all boards.

    (I wonder if I should try removing the USB JTAG and measuring again. I also switched to a static IP instead of DHCP - I can try to revert that.)

    Question: Is anyone out there able to send TCP data to a C6678 EVM (running SYS/BIOS) at more than 30 Mbit/s, over a reliable connection that lives for more than 5 minutes? What about UDP?

    Thank you,
    Roelof Berg

    www.berg-solutions.de (for customer mevis.fraunhofer.de)

  • We have taken note, and I am trying to loop in the experts from the benchmarking team to chime in.

  • The expert asks:

    "How does the user generate the image UDP packets and TCP packets? Is the image UDP packet size greater than the MTU size (1500 bytes)?"

  • One more suggestion, about your system's cache settings. Can you please check this as well:

    "What are his L1 and L2 cache settings?  These could have an effect on performance."

  • Hello,

    thank you for the fast response. More information:

    - Recent versions of everything, C6678 EVM with onboard JTAG

    - Static IP

    - Application: original HelloWorld application with some TCP code (from the TI 'TCP/UDP v4 Echo Server' example) copied into it

    - No daemon mode, single socket, NC (non-copy) socket, blocking    (I wonder if copy-mode (non-NC) sockets would be faster)

    - Only tested TCP

    - TCP segment sizes below the MTU as well as huge transfers (I pass 100 MB to the Windows OS and Windows sends it)

    - Raising the thread priority of the network stack had no effect, or bad results

    - Raising the TCP NC receive buffer size helped to speed things up from < 1 Mbit/s to about 30 Mbit/s. CFGITEM_IP_SOCKTCPRXLIMIT=65535 helped; more seemed to have no effect.

    - Task priorities: network stack as in the HelloWorld example; receiver task:

        hSrv = TaskCreate( task_tcp_srv, "TCPSrv", OS_TASKPRINORM, 4096, LocalPort, Arg2, Arg3 );

    - Packet generator: Windows Winsock code (tried blocking and nonblocking versions). E.g. I pass 100 MB to the Windows OS, then (in blocking mode) Windows sends this buffer to the target at 30 Mbit/s and returns.
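
    The receive loop itself is essentially the following (a simplified sketch; I am assuming the non-copy recvnc()/recvncfree() calls as documented in SPRU524, and imageArray/offset are placeholders for our image buffer):

        /* Non-copy receive loop: the stack lends us its buffer, we copy the
           payload into the image array and hand the buffer back. */
        HANDLE  hBuf;
        char   *pBuf;
        int     n;

        for (;;)
        {
            n = recvnc( s, (void **)&pBuf, 0, &hBuf );  /* blocks */
            if ( n <= 0 )
                break;                      /* connection closed or error */
            memcpy( &imageArray[offset], pBuf, n );     /* placeholder sink */
            offset += n;
            recvncfree( hBuf );             /* return the buffer to the stack */
        }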

    Result: speed limited to 30 Mbit/s. (The connection is also not reliable; it dies intermittently and resurrects unpredictably.)

    Original .cfg file from the HelloWorld (UDP echo) example (I hope it includes all the L1 and L2 cache settings you need):

    /*
     *   @file  helloWorld.cfg
     *
     *   @brief
     *      Memory map and program initializations for the HPDSP utility.
     */

    /**************************************************************************
     * Specify all needed RTSC modules and configure them.                    *
     **************************************************************************/
    var Memory  =   xdc.useModule('xdc.runtime.Memory');
    var BIOS    =   xdc.useModule('ti.sysbios.BIOS');
    var Task    =   xdc.useModule('ti.sysbios.knl.Task');
    var HeapBuf =   xdc.useModule('ti.sysbios.heaps.HeapBuf');
    var Log     =   xdc.useModule('xdc.runtime.Log');

    /*
    ** Allow storing of task names. By default, if you name a task with a friendly
    ** display name it will not be saved, to conserve RAM. This must be set to true
    ** to allow it. We use friendly names on the Task List display.
    */
    //Defaults.common$.namedInstance = true;
    Task.common$.namedInstance = true;

    var Clock   =   xdc.useModule('ti.sysbios.knl.Clock');

    /*
    ** Interface with IPC. Depending on the version of BIOS you are using, the
    ** module name may have changed.
    */
    /* Use this for pre BIOS 6.30 */
    /* var Sem  =   xdc.useModule('ti.sysbios.ipc.Semaphore'); */
    /* Use this for BIOS 6.30 plus to get the IPC module */
    var Sem = xdc.useModule('ti.sysbios.knl.Semaphore');
    var Hwi = xdc.useModule('ti.sysbios.hal.Hwi');
    var Ecm = xdc.useModule('ti.sysbios.family.c64p.EventCombiner');

    /*
    ** Configure this to turn on the CPU Load Module for BIOS.
    */
    /*
    var Load       =   xdc.useModule('ti.sysbios.utils.Load');
    Load.common$.diags_USER4 = Diags.ALWAYS_ON;
    */
    var Diags       = xdc.useModule('xdc.runtime.Diags');

    /* Load the CSL package */
    var Csl     = xdc.useModule('ti.csl.Settings');
    /* Load the CPPI package */
    var Cppi    = xdc.loadPackage('ti.drv.cppi');
    /* Load the QMSS package */
    var Qmss    = xdc.loadPackage('ti.drv.qmss');
    /* Load the PA package */
    var Pa      = xdc.useModule('ti.drv.pa.Settings');
    /* Load the Platform/NDK Transport packages */
    var PlatformLib  = xdc.loadPackage('ti.platform.evmc6678l');
    var NdkTransport = xdc.loadPackage('ti.transport.ndk');

    /*
    ** Set up the exception log so you can read it with ROV in CCS.
    */
    var LoggerBuf = xdc.useModule('xdc.runtime.LoggerBuf');
    var Exc = xdc.useModule('ti.sysbios.family.c64p.Exception');
    Exc.common$.logger = LoggerBuf.create();
    Exc.enablePrint = true; /* prints exception details to the CCS console */

    /*
    ** Give the Load module its own LoggerBuf to make sure the events are
    ** not overwritten.
    */
    /* var loggerBufParams = new LoggerBuf.Params();
    loggerBufParams.exitFlush = true;
    loggerBufParams.numEntries = 64;
    Load.common$.logger = LoggerBuf.create(loggerBufParams);
    */

    /*
    ** Use this to configure NDK 2.2 and above using RTSC. In previous versions
    ** of the NDK, RTSC configuration was not supported and you should comment
    ** this out.
    */
    var Global       = xdc.useModule('ti.ndk.config.Global');

    /*
    ** This allows the heartbeat (poll function) to be created but does not
    ** generate the stack threads.
    **
    ** Look in the cdoc (help files) to see which CfgAddEntry items can be
    ** configured. We tell it NOT to create any stack threads (services) as we
    ** configure those ourselves in our main task thread hpdspuaStart.
    */
    Global.enableCodeGeneration = false;

    /* Define a variable to set the MAR mode for MSMCSRAM as all cacheable */
    var Cache       =   xdc.useModule('ti.sysbios.family.c66.Cache');
    //Cache.MAR224_255 = 0x0000000f;

    var Startup     =   xdc.useModule('xdc.runtime.Startup');
    var System      =   xdc.useModule('xdc.runtime.System');

    /*
    ** Create a heap.
    */
    var HeapMem = xdc.useModule('ti.sysbios.heaps.HeapMem');
    var heapMemParams = new HeapMem.Params();
    heapMemParams.size = 0x300000;
    heapMemParams.sectionName = "systemHeap";
    Program.global.heap0 = HeapMem.create(heapMemParams);
    /* This is the default memory heap. */
    Memory.defaultHeapInstance  =   Program.global.heap0;

    Program.sectMap["sharedL2"]     = "DDR3";
    Program.sectMap["systemHeap"]   = "DDR3";
    Program.sectMap[".sysmem"]      = "DDR3";
    Program.sectMap[".args"]        = "DDR3";
    Program.sectMap[".cio"]         = "DDR3";
    Program.sectMap[".far"]         = "DDR3";
    Program.sectMap[".rodata"]      = "DDR3";
    Program.sectMap[".neardata"]    = "DDR3";
    Program.sectMap[".cppi"]        = "DDR3";
    Program.sectMap[".init_array"]  = "DDR3";
    Program.sectMap[".qmss"]        = "DDR3";
    Program.sectMap[".cinit"]       = "DDR3";
    Program.sectMap[".bss"]         = "DDR3";
    Program.sectMap[".const"]       = "DDR3";
    Program.sectMap[".text"]        = "DDR3";
    Program.sectMap[".code"]        = "DDR3";
    Program.sectMap[".switch"]      = "DDR3";
    Program.sectMap[".data"]        = "DDR3";
    Program.sectMap[".fardata"]     = "DDR3";
    Program.sectMap[".vecs"]        = "DDR3";
    Program.sectMap["platform_lib"] = "DDR3";
    Program.sectMap[".far:taskStackSection"] = "L2SRAM";
    Program.sectMap[".stack"]        = "L2SRAM";
    Program.sectMap[".nimu_eth_ll2"] = "L2SRAM";
    Program.sectMap[".resmgr_memregion"] = {loadSegment: "L2SRAM", loadAlign:128}; /* QMSS descriptors region */
    Program.sectMap[".resmgr_handles"]   = {loadSegment: "L2SRAM", loadAlign:16};  /* CPPI/QMSS/PA handles */
    Program.sectMap[".resmgr_pa"]        = {loadSegment: "L2SRAM", loadAlign:8};   /* PA memory */
    Program.sectMap[".far:IMAGEDATA"]     = {loadSegment: "L2SRAM", loadAlign: 8};
    Program.sectMap[".far:NDK_OBJMEM"]    = {loadSegment: "L2SRAM", loadAlign: 8};
    Program.sectMap[".far:NDK_PACKETMEM"] = {loadSegment: "L2SRAM", loadAlign: 128};

    /* Required if using System_printf to output on the console */
    SysStd          =   xdc.useModule('xdc.runtime.SysStd');
    System.SupportProxy     =   SysStd;

    /**************************************************************************
     * Define hooks and static tasks that will always be running.             *
     **************************************************************************/
    /*
    ** Register an EVM init handler with BIOS. This will initialize the
    ** hardware; BIOS calls it before it starts.
    **
    ** If you are debugging with CCS, this function will execute as CCS loads
    ** the program, provided your target configuration file (.ccxml) has the
    ** option set to execute all code before main. That is the default.
    */
    Startup.lastFxns.$add('&EVM_init');

    /*
    ** Create the stack thread task for our application.
    */
    var tskNdkStackTest  =   Task.create("&StackTest");
    tskNdkStackTest.stackSize  = 0x1400;
    tskNdkStackTest.priority   = 0x5;

    /*
    ** Create a periodic task to handle all NDK polling functions.
    ** If you are using RTSC configuration with NDK 2.2 and above, this is
    ** done by default and you do not need to do this.
    */
    /* var prdNdkClkParams      =   new Clock.Params();
    prdNdkClkParams.period      =   0x64;
    prdNdkClkParams.startFlag   =   true;
    Program.global.clockInst1   =   Clock.create("&llTimerTick", 5, prdNdkClkParams);
    */

    /*
    ** If you are using RTSC configuration with NDK 2.2 and above, this is
    ** done by default; else register hooks so that the stack can track all
    ** Task creation:
    **   Task.common$.namedInstance  =   true;
    **   Task.addHookSet ({ registerFxn: '&NDK_hookInit', createFxn: '&NDK_hookCreate', });
    */

    /* Enable BIOS task scheduler */
    BIOS.taskEnabled =   true;

    /*
    ** Enable event groups here; registering of ISRs for specific GEM INTC
    ** events is done using the EventCombiner_dispatchPlug() and
    ** Hwi_eventMap() APIs.
    */
    Ecm.eventGroupHwiNum[0] = 7;
    Ecm.eventGroupHwiNum[1] = 8;
    Ecm.eventGroupHwiNum[2] = 9;
    Ecm.eventGroupHwiNum[3] = 10;

    Question:

    Would the performance be affected when I load a release build over CCS, leave the EVM's onboard USB JTAG connected, and run the application from CCS?

  • Hello,

    I solved the issues in the meantime.

    Topic 1) Speed

    There was an issue in my client application used for the speed measurement. Now I see 250 Mbit/s, which is OK for our purposes (RJ45 jack connection). Solved :)

    Topic 2) Stability

    In the original NDK hello world app there are some duplicate lines:

        ...
        rc = NC_SystemOpen( NC_PRIORITY_LOW, NC_OPMODE_INTERRUPT );
        ...
        printf(VerStr);
        ...
        hCfg = CfgNew();
        ...

    When I do NOT remove these duplicate lines, the stability is sufficient. If the duplicate lines are removed, the TCP link is very unstable. So I will leave the lines in. Solved :)

    Thank you,
    Roelof Berg


  • Solved maybe, but I am wondering why.  If the lines truly are duplicated, why does removing them affect stability?  Is it something about the order?

  • During further tests I still saw stability issues, but less often. There is a chance that the duplicated lines have no effect; I only tried about four times with and without them. If the random stability issues occurred at the right moments, it would have looked as if removing the lines had an effect ...

    I transmit 1 MB over TCP about every 15 minutes. About 1 in 10 transmissions is very slow, while the others are OK. I can close the client (PC) TCP socket and open another one (on the PC), and the next transmission is fast again.

  • You know, I was just wondering....  I set my NDK up with high priority - have you tried this?  It was recommended in SPRU523g near page 51.
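
    In my case that just meant opening the stack with the high-priority scheduler, i.e. (a sketch; the hello world example makes the same call with NC_PRIORITY_LOW):

        /* Run the NDK scheduler at high priority, per SPRU523. */
        rc = NC_SystemOpen( NC_PRIORITY_HIGH, NC_OPMODE_INTERRUPT );
        if ( rc )
            printf( "NC_SystemOpen failed (%d)\n", rc );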

  • Hello,

    adjusting the priority was not successful (if the network priority is too high, the image data will not be processed, due to something like thread starvation, and the effective capacity to process packets is reduced). I assume my particular situation is a receiver stall: the receiver consumes data more slowly than the sender sends it. A proper TCP implementation should be able to handle a receiver stall, though. So while I try to get rid of the stall as one solution option, I have also debugged the TCP communication as a second option for our issue.

    So: in Wireshark I can see strange behavior when the receiver task is too slow. Could this be a bug in the TCP implementation (NDK 2.12.1.28; v2.22.3.20 shows the same behavior, but I haven't checked it in Wireshark yet)? In the attached network trace, TCP packet P59 carries 26280 bytes within the advertised window, and in the following window updates I can see the DSP's buffer being reduced from 65 K to 35 K, so this data has arrived. But in the ACK of packet P80, only 167901 - 166441 = 1460 bytes (see P58) are ACKed. The preceding window-update messages do not fit with only 1460 processed bytes; they indicate that all of the 26280 bytes have been processed. From then on everything goes wrong, eventually leading to [TCP ACKed unseen segment] in packet P125.

    Could it be that the SYS/BIOS TCP stack has a bug that generates a wrong ACK number in packet P80? Shouldn't it be something like Ack=192721 (if the preceding window-update messages are evidence of processed packets)?

    If yes, could it be that this bug only occurs on a receiver stall and is therefore so rare that it wasn't spotted and fixed before?

    Thank you,
    Roelof

  • Hi,

    It seems that the problems in this thread could be related to cache or prefetch buffer:
    http://e2e.ti.com/support/embedded/bios/f/355/p/253237/886691.aspx

    http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/p/214649/762847.aspx

    Maybe you can take a look at the ips.* and tcps.* statistics.
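
    For example, a dump like this sketch (assuming the global IPSTATS/TCPSTATS structures ips and tcps from the NDK stack headers; the field names may differ between NDK versions, so check yours):

        #include <stdio.h>

        extern IPSTATS  ips;     /* declared by the NDK stack */
        extern TCPSTATS tcps;

        void dump_stats( void )
        {
            printf( "ip total rcvd  : %u\n", ips.Total );
            printf( "tcp rcvd total : %u\n", tcps.RcvTotal );
            printf( "tcp rexmit pkts: %u\n", tcps.SndRexmitPack );
        }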

    There is also a problem with the NDK dropping outgoing packets when the send queue is full:
    http://e2e.ti.com/support/embedded/bios/f/355/t/253488.aspx

    I don't understand why TI doesn't fix these issues.

    Ralf

  • Hello Ralf,

    thank you, I will look at the statistics. Meanwhile, I was able to get rid of the receiver stall by using the multicore architecture of our DSPs (one core reads the data into a buffer while the other cores preprocess the data at the same time ...). This way we seem to receive all data without TCP resends, which solves our issue (as long as our long-term tests do not reveal more problems) ...
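
    The overlap idea, reduced to one receiver task and one worker, is roughly this sketch (SYS/BIOS semaphores; our real code spreads preprocess(), a placeholder here, across the other cores):

        #include <ti/ndk/inc/netmain.h>       /* SOCKET, recv() */
        #include <ti/sysbios/BIOS.h>
        #include <ti/sysbios/knl/Semaphore.h>

        #define CHUNK 65536
        static char buf[2][CHUNK];
        static Semaphore_Handle semFull;      /* created at startup, count 0 */
        static Semaphore_Handle semEmpty;     /* created at startup, count 2 */

        extern void preprocess( char *data ); /* placeholder worker */

        void task_receive( SOCKET s )         /* fills ping/pong buffers */
        {
            int i = 0;
            for (;;) {
                Semaphore_pend( semEmpty, BIOS_WAIT_FOREVER );
                if ( recv( s, buf[i], CHUNK, 0 ) <= 0 )
                    break;
                Semaphore_post( semFull );
                i ^= 1;
            }
        }

        void task_process( void )             /* consumes the other buffer */
        {
            int i = 0;
            for (;;) {
                Semaphore_pend( semFull, BIOS_WAIT_FOREVER );
                preprocess( buf[i] );
                Semaphore_post( semEmpty );
                i ^= 1;
            }
        }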

    Best regards,
    Roelof

  • I encountered a similar problem using UDP on a C6678 EVM board. The board choked after 30 minutes of data transmission and became extremely slow or stopped completely. Once your board hangs, can you check the value of the RXGOODFRAME register (located at 0x02090B00) and see if its value still increases? That register keeps track of the number of frames received by the network switch.
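
    Something like this quick check (the address is the one I gave above; please verify it against the data manual for your silicon):

        /* Poll the GbE switch statistics counter to see whether frames
           still arrive after the hang (0x02090B00 = RXGOODFRAME). */
        volatile unsigned int *rxGoodFrame = (volatile unsigned int *)0x02090B00;
        unsigned int before = *rxGoodFrame;
        /* ... wait a moment while the host keeps sending ... */
        printf( "RXGOODFRAME delta = %u\n", *rxGoodFrame - before );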

    here is my post:

    http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/p/278508/971884.aspx#971884

    I have filed a bug report to TI and hopefully it can be solved soon.

  • We have a similar problem, here is the post:

    http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/p/243353/999492.aspx

    Note that we don't use MSMC, so I suppose the prefetch buffer bug is not relevant here (is it?)

  • Hello,

    very interesting - thank you for connecting the discussions.

    We have decided to switch to PCIe instead of Ethernet in the meantime. I will drop a note here in a few weeks on whether PCIe is more of a 'runs out of the box' experience than Ethernet on the C6678. As PCIe is faster anyway and also has lower latency, I assume we will be quite happy with this decision.

    Best regards,
    Roelof Berg


  • My apologies... the prefetch buffer bug IS relevant here. Actually, in the client example the data is received in DDR3 and then copied to PBM buffers located in L2SRAM. We have patched the NIMU driver as Ralf suggested, and now it works.

    Regards,

    Dmitry

  • Above I promised to report whether PCIe works more reliably than TCP. Answer: it does!  We used a DSPC-8681 PCIe adapter and it worked well (on 64-bit hosts at least) without the need for any patches. And when DMA is used (and the buffers are filled in place in the kernel buffers, without wasted memcpy operations) it is extremely fast. Round trips are also very fast. We're very happy about the decision to leave Ethernet (also because we still weren't able to get TCP running - but who cares about TCP in the internet-of-things age, right? ;)