NDK NetScheduler intermittently terminates after calling socket recv()

Other Parts Discussed in Thread: SYSBIOS, TMS320C6455

I am using NDK 2.24.00.11 with the C6455 Ethernet driver.

I created a SOCK_STREAM server socket with IPPROTO_TCP on the C6455 DSP, and a client socket on a separate host microprocessor.

The host socket sends data to the DSP, and intermittently the DSP's recv() call returns 0.
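
For reference, this is roughly the shape of the blocking receive task on the DSP side. It is only a sketch written with standard BSD-style socket calls (the NDK socket layer is used the same way); rx_task, SERVER_PORT and BUF_SIZE are placeholder names, not my actual code, and error checking is omitted for brevity.

/* Sketch of a blocking TCP server receive loop. The key point is recv()'s
 * return value: >0 is data, 0 normally means the peer closed the connection,
 * <0 is an error. In my failure case recv() returned 0 even though the host
 * had not closed its socket. */
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

#define SERVER_PORT 5000   /* placeholder */
#define BUF_SIZE    1024   /* placeholder */

static void rx_task(void)
{
    struct sockaddr_in sin;
    char buf[BUF_SIZE];
    int lfd, cfd, n;

    lfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

    memset(&sin, 0, sizeof(sin));
    sin.sin_family      = AF_INET;
    sin.sin_addr.s_addr = INADDR_ANY;
    sin.sin_port        = htons(SERVER_PORT);
    bind(lfd, (struct sockaddr *)&sin, sizeof(sin));
    listen(lfd, 1);

    cfd = accept(lfd, 0, 0);
    for (;;) {
        n = recv(cfd, buf, sizeof(buf), 0);
        if (n > 0) {
            /* process n bytes of data from the host */
        } else if (n == 0) {
            /* peer closed the connection -- or, in this case, the socket
             * was shut down internally by the stack */
            break;
        } else {
            break;   /* error */
        }
    }
    close(cfd);
    close(lfd);
}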

I found out that somehow the following error path in the NDK's pbm.c PBM_free() function was executed during the previous recv() call:

if(!pPkt || (pPkt->Type != PBM_MAGIC_POOL && pPkt->Type != PBM_MAGIC_ALLOC)) {
  DbgPrintf(DBG_ERROR, "PBM_free: Invalid Packet");
  return;
}

And sometimes the following error path in the NDK's pbm.c PBMQ_enq() function was executed during the previous recv() call:

if( pPkt->Type != PBM_MAGIC_POOL && 
    pPkt->Type != PBM_MAGIC_ALLOC )
{
  DbgPrintf(DBG_ERROR, "PBM_enq: Invalid Packet");
  return;
}

Since the debug level is DBG_ERROR, DbgPrintf() calls NC_NetStop(), which stops the NetScheduler.
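
To illustrate why that kills the stack, here is a conceptual sketch of the mechanism as I understand it (this is not the actual NDK source; MyDbgPrintf, StopNetScheduler and AbortLevel are made-up stand-ins): any message at or above the abort level shuts the scheduler down.

/* Conceptual sketch only -- not the real NDK code. It mimics the behavior
 * described above: when a message's level reaches the abort level, the
 * debug print routine stops the network scheduler (the NDK does this by
 * calling NC_NetStop()). */
#include <stdio.h>
#include <stdarg.h>

#define DBG_INFO  1
#define DBG_WARN  2
#define DBG_ERROR 3

static int AbortLevel = DBG_ERROR;     /* illustrative default */

static void StopNetScheduler(void)     /* stand-in for NC_NetStop() */
{
    printf("NetScheduler stopped\n");
}

void MyDbgPrintf(int level, const char *fmt, ...)
{
    va_list args;

    va_start(args, fmt);
    vprintf(fmt, args);
    printf("\n");
    va_end(args);

    if (level >= AbortLevel)
        StopNetScheduler();
}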

What could cause this? Do you have any suggestions on what to check?

  • Hi,
    I have a couple of questions to help understand what's going on:
    - What's your network topology?
    - Can you attach a wireshark capture that shows the problem?
    - Is your server socket blocking or non-blocking?
    - Did something go wrong on the client? From the docs, recv() "Returns 0 on connection oriented sockets where the connection has been closed by the peer".

    Thanks,
    Moses
  • Hi Moses,
    thanks for the reply.

    - It's a one-to-one connection between the DSP and the host. The DSP has two sockets (one for rx, one for tx), and the host has two sockets (one for rx, one for tx). The server socket is the rx side; the client socket is the tx side.

    - I need to modify my board to sniff the ethernet port.

    - server socket is blocking

    - I didn't notice anything wrong with the client. recv() returns 0 because it takes the following path in SockRecv():

        if( ps->StateFlags & SS_CANTRCVMORE )
        {
            error = 0;
            goto rx_dontblock;
        }

    Then I found that the SS_CANTRCVMORE flag in ps->StateFlags was set by SockShutdown():
        SockShutdown() was called by SockClose()
        SockClose() was called by SockCleanPcb()
        SockCleanPcb() was called by SPIpNet() with Op == CFGOP_REMOVE

    Tracing further back, I found that the DSP had exited the NetScheduler (see my original posting). In other words, the stack itself shut the socket down when its network configuration entry was removed during the shutdown, not because the peer closed the connection.
  • When you say the DSP exits the NetScheduler, do you mean NC_NetStart() returned to your application? Can you tell if NC_NetStop was called?
  • Hi Brad,

    Yes.
    I could tell NC_NetStop was being called from DbgPrintf, since the level was DBG_ERROR.

    Jimmy
  • It seems then that we need to track down what called DbgPrintf. What was printed by DbgPrintf?

  • Sometimes I got "PBM_enq: Invalid Packet", sometimes "PBM_free: Invalid Packet".

    I mentioned this in my first posting.

    Thanks
  • Jimmy,
    Why do you have to modify your board for Wireshark? Can you just put a switch between the DSP and the client and connect your PC to that same switch to sniff it? You'll need a “smart switch” with port mirroring to do this by the way.
    Can you reproduce the error consistently? You can put breakpoints on those 2 failure cases. When it hits the breakpoints, you can look at the B3 register to get the return address of the calling function. It'll help to know what function call led to it ending up in PBM_free.

    Moses
  • I enabled port mirroring on the switch, but I couldn't see the traffic on the port I dedicated to monitoring. I have to look into this more.

    With the debugger connected and the code loaded through the debugger, I couldn't reproduce the issue.
    Previously I was able to track the code execution by writing a few patterns to the EMIF bus and monitoring them with a logic analyzer.
  • I found out the problem was due to an incorrect L1 cache setting in Platform.xdc in my Linux build environment.
    This fix also resolves another HPI issue that I observed (e2e.ti.com/.../1421367).


    We are not using CCS for our Linux build environment. Below are the steps I took to figure out the issue.
    -----------
    Since I couldn't reproduce the problem when I built the project on Windows, I compared the map and myproj_dsp_p64Pe.c files generated by the Linux and Windows builds.

    I noticed the following difference (the Windows value is in parentheses):

    __FAR__ const CT__ti_sysbios_family_c64p_Cache_initSize ti_sysbios_family_c64p_Cache_initSize__C = {
        ti_sysbios_family_c64p_Cache_L1Size_32K,   (ti_sysbios_family_c64p_Cache_L1Size_0K)
        ti_sysbios_family_c64p_Cache_L1Size_32K,   (ti_sysbios_family_c64p_Cache_L1Size_0K)
        ti_sysbios_family_c64p_Cache_L2Size_0K,    (ti_sysbios_family_c64p_Cache_L2Size_0K)
    };

    In my project I wanted to use L1 as RAM, so I set the L1 cache size to 0K.

    My Platform.xdc file:

    metaonly module Platform inherits xdc.platform.IPlatform {
        config ti.platforms.generic.Platform.Instance CPU =
            ti.platforms.generic.Platform.create("CPU", {
                clockRate: 831.6,
                catalogName: "ti.catalog.c6000",
                deviceName: "TMS320C6455",
                externalMemoryMap:
                [
                ],
                l1DMode: "0k",
                l1PMode: "0k",
                l2Mode: "0k",
            });

    instance:
        override config string codeMemory = "IRAM";
        override config string dataMemory = "IRAM";
        override config string stackMemory = "IRAM";

        --------------- (I added these new lines to fix the problem): ---------------
        config String l2Mode = "0k";
        config String l1PMode = "0k";
        config String l1DMode = "0k";
    }
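
    For anyone hitting something similar, a runtime sanity check like the one below would have caught this earlier. It is only a sketch: it assumes the SYS/BIOS ti.sysbios.family.c64p Cache module API (Cache_getSize()/Cache_Size), and checkCacheConfig() is just a hypothetical helper name.

    /* Sketch: print the cache sizes SYS/BIOS actually configured.
     * Assumes the ti.sysbios.family.c64p Cache module. With the corrected
     * platform the printed values should correspond to the 0K enum settings
     * (L1D/L1P used as RAM). */
    #include <xdc/runtime/System.h>
    #include <ti/sysbios/family/c64p/Cache.h>

    void checkCacheConfig(void)
    {
        Cache_Size size;

        Cache_getSize(&size);
        System_printf("L1P=%d L1D=%d L2=%d\n",
                      (int)size.l1pSize, (int)size.l1dSize, (int)size.l2Size);
    }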