How to debug NDK problems

I'm having problems with NDK and can't figure out how to track down the cause.

The goal is to have a PC use Ethernet TCP/IP to exchange data with core 0 on an EVM6678L board. This data will be sent from the PC to core 0 and then to other cores using IPC. The processed data will be sent back to core 0 and then to the PC over the same socket.

I have hacked the hua_evmc6678l demo program to contain the network code for my server (a basic socket-based server) and it works OK looping the data back to the PC. It works in both static IP and DHCP modes.

So I added the networking code to the program that exchanges the data with other cores. I copied the setup code for QMSS, CPPI, and PA. I copied with some modifications (no web server, for example) the NDK configuration code. The .cfg file was a challenge because there was no way to do a direct comparison of my .cfg with that of the demo program to see what I needed to add, but I think I got it all.

The final program sort of starts to work and then fails (the NDK code stops sending ACK packets) when running with a static IP. Sometimes the failure comes after a lot of TCP retransmits, but other times it fails after the first set of TCP packets with no retransmit. After this happens, the board does not respond to a ping. If I set it up so that the 6678 only sends packets to the PC, then it works fine. If I set it up so the 6678 only receives packets from the PC then it fails. It even fails if I severely restrict the rate that the PC is sending data (1024 bytes every 100 ms).
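For reference, the rate-limited receive test described above (1024 bytes every 100 ms) could be driven from the PC with something like the sketch below. It is not the code used in this thread: `send_all` and `send_paced` are hypothetical helper names, and POSIX sockets on the PC are assumed.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() under strict -std modes */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <time.h>

/* Hypothetical helper: send exactly len bytes, looping over short writes.
   Returns 0 on success, -1 on a socket error. */
int send_all(int fd, const unsigned char *buf, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(fd, buf + sent, len - sent, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal, try again */
            return -1;
        }
        sent += (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: send `total` bytes in pieces of at most `chunk` bytes,
   sleeping gap_ms milliseconds between pieces.
   Returns the number of pieces sent, or -1 on error. */
long send_paced(int fd, const unsigned char *data, size_t total,
                size_t chunk, long gap_ms)
{
    struct timespec gap = { gap_ms / 1000, (gap_ms % 1000) * 1000000L };
    size_t off = 0;
    long pieces = 0;

    assert(chunk > 0);
    while (off < total) {
        size_t n = (total - off < chunk) ? (total - off) : chunk;
        if (send_all(fd, data + off, n) != 0)
            return -1;
        off += n;
        pieces++;
        if (off < total)
            nanosleep(&gap, NULL);  /* pace the stream */
    }
    return pieces;
}
```

With an already connected socket `sock`, the test in this post would correspond to `send_paced(sock, image, image_len, 1024, 100)`. Note that TCP is a byte stream, so the pacing only limits the average rate; the stack may still coalesce or split the pieces on the wire.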

With DHCP it appears to get an IP address but ping from the PC can't see it using either the host name or the IP address, and the PC program fails to connect. I added DHCP to the code so that I could test using our house ethernet. The static IP configuration uses a USB to Ethernet adapter (I can't use static IP addresses on our house ethernet) and I wanted to get the adapter out of the loop in case it was the problem.

The ROV logs don't show anything obvious to me. All of the tasks are waiting on semaphores. There's plenty of heap left, and no stack has overflowed. No error message is written to the console.

So, where do I start looking for the cause of these problems?

Thanks,
Fred

  • Fred,

    If enabled, do you see the console printout about which IP and MAC address the DSP is configured for?  Something like:

    [C64XP_0] emac_init: core 0, port 0, total number of channels/MAC addresses: 1/1

    [C64XP_0] MAC addresses configured for channel 0:

    [C64XP_0] 3C-2D-B7-7C-FD-B0

    [C64XP_0] emac_open core 0 port 0 successfully

    [C64XP_0] Registration of the EMAC Successful, waiting for link up ..

    [C64XP_0] Network Added: If-1:192.168.1.1

    [C64XP_0] Port 0 Link Status: 1000Mb/s Full Duplex on PHY 24

    This might only apply to an EVM project based off the MCSDK; I'm not sure if the standalone NDK libs would print this message also.  If the last line is not printed, then the NDK is not fully configured.  Sometimes I've seen that happen when a BIOS task hogs the CPU and doesn't let the NDK finish.  A lot of times this is a task priority issue or just an error in a loop so a task never pends.

    Also, how does your application create tasks which receive/send TCP?  Are they created dynamically or statically?  I noticed a similar problem when a receive/send task was created statically and ran before the NDK was configured (i.e. before runNetwork ran).  Moving the task creation to a dynamic call in the NC_NetStart NetworkOpen function fixed that.  I'd check that no NDK functions like recv() or send() are called before runNetwork has finished.  (Note that in this example though, the DSP never responded to pings...so maybe you're seeing something else.) 

    Nick

  • Hi Nick,

    In the hua_evmc6678l "hack" I'm dynamically starting the server task just before the BIOS_start() call, but the  first thing that the server task does is TaskSleep(2500). (I wasn't kidding about a hack :-).

    In my program, I have a static semaphore semNetowrkUp. NetworkOpen() posts it and the task with my socket() code pends  on it. I added a TaskSleep(1000) after the pend for these tests to make sure that all network setup is done before the server does anything.

    I have found out that the problem with using DHCP is that our server seems to return addresses that are already in use! So, the problem when using DHCP isn't necessarily a problem with the NDK. I'll have to restrict my testing to the local network for now.

    My server was using System_printf() to print messages. I noticed that a lot of the messages printed by the demo program log were missing from my log. I changed my code to use platform_write() just like the demo program and the messages reappeared. It seems that there is a conflict between System_printf() and platform_write() that TI needs to address.

    Here's the log from the demo program. The last three entries are from my server.

    [C66xx_0] CPPI successfully initialized
    [C66xx_0] PA successfully initialized
    [C66xx_0] HUA version 2.00.00.04
    [C66xx_0] Hostname: tidemo-CE3630.sarnoff.internal
    [C66xx_0] MAC Address: 90-D7-EB-84-A3-92
    [C66xx_0] EVM in StaticIP mode at 192.168.2.100
    [C66xx_0] Set IP address of PC to 192.168.2.101
    [C66xx_0] PASS successfully initialized
    [C66xx_0] Ethernet subsystem successfully initialized
    [C66xx_0] Ethernet eventId : 48 and vectId (Interrupt) : 7
    [C66xx_0] Registration of the EMAC Successful, waiting for link up ..
    [C66xx_0] Network Added: If-1:192.168.2.100
    [C66xx_0] Service Status: Telnet   : Enabled  :          : 000
    [C66xx_0] Service Status: THTTP    : Enabled  :          : 000
    [C66xx_0] ImageServer compiled Mar 23 2012 14:11:58
    [C66xx_0] Start ImageServer on port 3627
    [C66xx_0] Accepted connection from 101.2.168.192:2914 on socket 815D1EC4

    Here's the entries from my program. I have removed the CPPI and PA startup messages  but left in the error messages in case there is a failure. I also don't have the http or telnet server. The last three messages are from the server code, not the network initialization code. The "Connected" message is printed after the accept() call returns.

    [C66xx_0] 00000.000 fdOpenSession: OOM
    [C66xx_0] START TaskEnet port 3627
    [C66xx_0] MAC: 90-d7-eb-84-a3-92
    [C66xx_0] Hostname: tidemo-CE3630.sarnoff.internal
    [C66xx_0] IP: 192.168.2.100, Mask: 255.255.254.0
    [C66xx_0] Host: 192.168.2.101, Gateway: 192.168.2.101
    [C66xx_0] PASS successfully initialized
    [C66xx_0] Ethernet subsystem successfully initialized
    [C66xx_0] Ethernet eventId : 48 and vectId (Interrupt) : 7
    [C66xx_0] Registration of the EMAC Successful, waiting for link up ..
    [C66xx_0] ConfigBoot: Network Added: If-1:192.168.2.100
    [C66xx_0] START ImageServer port 3627
    [C66xx_0] Connected to 101.2.168.192:3131 socket 802c9cc4
    [C66xx_0] TaskInput: Open socket 802c9cc4

    I don't know why fdOpenSession() shows up in my log only once and never in the demo program log. I do call it in every task that uses socket calls.
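A side note on the two logs: the peer address in the "Accepted connection" and "Connected to" lines appears as 101.2.168.192, while the PC is at 192.168.2.101. One plausible explanation (an assumption, since the printing code is not shown) is that the address is held in network byte order and is being split into octets with shifts as if it were a host-order integer on a little-endian core. A minimal sketch of the difference, using hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format an IPv4 address held in network byte order (the layout used in
   sockaddr_in.sin_addr.s_addr). Reading the bytes in memory order makes the
   result independent of host endianness. Returns snprintf()'s result. */
int format_ipv4(uint32_t addr_net, char *out, size_t out_len)
{
    unsigned char b[4];
    memcpy(b, &addr_net, sizeof b);   /* b[0] is the first octet on the wire */
    return snprintf(out, out_len, "%u.%u.%u.%u",
                    (unsigned)b[0], (unsigned)b[1],
                    (unsigned)b[2], (unsigned)b[3]);
}

/* The suspected variant: shifts the raw value without converting it to host
   order first. On a little-endian CPU this prints the octets reversed. */
int format_ipv4_no_swap(uint32_t addr_net, char *out, size_t out_len)
{
    return snprintf(out, out_len, "%u.%u.%u.%u",
                    (unsigned)((addr_net >> 24) & 0xffu),
                    (unsigned)((addr_net >> 16) & 0xffu),
                    (unsigned)((addr_net >> 8) & 0xffu),
                    (unsigned)(addr_net & 0xffu));
}
```

If this is what is happening, it only affects the log text, not the connection itself.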

    So now we get to behavior. The first few times that I ran the hacked demo program it would send a few packets and then stop responding to ACK packets from the PC. I have a Wireshark log if you would like to see it. I ran it a few times and the behavior was pretty much the same. Ping still worked but the TCP stack seemed to be hung. This seems to be different from what I saw before, but it did work one time. The hacked demo ran through the whole test data set (80 720x480 images sent uncompressed), but the Wireshark log showed a lot of retransmitted ACK packets from the PC.

    After several attempts of running my program, it never completed a single image. Sometimes it would get several dozen packets before hanging. Most often it would quit after the first few packets.

    I'm using IPC 1.24.0.16, PDK 1.0.0.17, NDK 2.20.4.26, SYS/BIOS 6.32.5.54, XDC 3.22.4.46, and CCSv5.1.1

    Thanks and have a good weekend,
    Fred

  • It sounds like you are creating a BIOS task dynamically before the call to BIOS_start().  I'm not sure this will work correctly since BIOS isn't running until BIOS_start() is called and cannot manage the Heap for dynamic tasks.  I'm curious, does the task show up in ROV if you pause your program after initialization?  I'd also recommend removing the Task_sleep() calls. In a lot of cases, having to use Task_sleep() is a sign that something isn't quite right in the flow of a program.  Instead of the semaphore, you can create the task in NetworkOpen().  You can also open a file descriptor session in NetworkOpen().  See below for an example:

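    static HANDLE hTask;

    static void NetworkOpen()
    {
        /* NDK TaskCreate(): entry function, name, priority, stack size,
           then three task arguments (Arg1, Arg2, Arg3), all 0 here. */
        hTask = TaskCreate(myTask, "&myTask", OS_TASKPRINORM, OS_TASKSTKNORM, 0, 0, 0);

        /* Open a file descriptor session on behalf of the new task. */
        fdOpenSession(hTask);
    }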

    I'm not familiar with the hua_evm demo or printouts for TCP.  I've mainly used UDP with the NDK.  Therefore those printouts could be correct.

    Nick

  • Nick,

    I'm not trying to improve the hacked hua_evmc6678l program. I'm trying to make my program work.

    In my program all of the tasks are started statically with entries in the SYS/BIOS configuration. A static semaphore prevents the task that makes the socket() calls from running before the network is up. I added the Task_sleep() after the semaphore pend just to see if there was any change in behavior, but there was not so I removed it. This arrangement should have the same effect as your example code.

    BTW, your code has a potential race condition. If "myTask" has to be started with a higher priority than the NetworkOpen() task, then the fdOpenSession() call could happen too late. The safest way to call  fdOpenSession() is in the task that needs to have a file descriptor table.
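The ordering suggested here, where the task opens its own file descriptor session before touching any socket, can be sketched as follows. The NDK calls TaskSelf(), fdOpenSession() and fdCloseSession() are real, but everything in the "stand-ins" section is a host-side stub so that the sketch compiles without the NDK; on the target that section would be replaced by the NDK headers. `serverTask` and `do_socket_work` are hypothetical names, and fdOpenSession() is assumed to return nonzero on success.

```c
#include <assert.h>
#include <stddef.h>

/* ---- Host-side stand-ins (assumptions, NOT the NDK implementation) ---- */
typedef void *HANDLE;
int stub_session_open;                 /* 1 while a session is open */
int stub_task_object;                  /* dummy object to take an address of */

HANDLE TaskSelf(void) { return &stub_task_object; }
int fdOpenSession(HANDLE hTask) { (void)hTask; stub_session_open = 1; return 1; }
void fdCloseSession(HANDLE hTask) { (void)hTask; stub_session_open = 0; }
/* ----------------------------------------------------------------------- */

int work_ran_with_session;             /* recorded by the placeholder below */

/* Hypothetical placeholder for the socket(), bind(), accept(), recv() code.
   It only records whether a session was open when it ran. */
void do_socket_work(void)
{
    work_ran_with_session = stub_session_open;
}

/* Task entry point: the session is opened by the task itself, so it cannot
   be opened "too late" by some other, lower-priority task. */
void serverTask(void)
{
    if (!fdOpenSession(TaskSelf()))
        return;                        /* no file descriptor table, give up */

    do_socket_work();

    fdCloseSession(TaskSelf());        /* release the table before exiting */
}
```

The design point is only the ordering: whatever priority the task runs at, its first socket call always happens after its own fdOpenSession().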

    Starting tasks before BIOS_start() is OK according to the teacher in the SYS/BIOS course I took last month. The task is put in the ready to run state, but does not run until BIOS_start() starts things up. That works in the hacked hua_evmc6678l program because the server code in that task does start to run. The Task_sleep() in that code is, as I said, a hack that I put in before I knew how NetworkOpen() could be used. It's not worth cleaning up that code, though, since my program has the additional IPC features.

    While thinking about this over the weekend, I realized that the IPC example code does not set up any interrupts in the BIOS configuration, but the NDK demo code does. I think that the IPC code uses an interrupt for the Notification manager, but maybe I'm reading too much into the documentation. Could there be some invisible interrupt conflict that is keeping the TCP stack from responding to ACK packets? This is just a wild guess, but that's what I'm reduced to at this point.

    Fred

  • A couple of tips I've found by a combination of slog and a wide variety of forum posts.  I'm fairly new at SYS/BIOS and the NDK so am happy to be corrected.

    (1) If you're loading an NDK/MCSDK sample based app on an EVM6678L through the CCS 5.1 debugger, you can't usually just "reload" code.  You have to do a system reset and then reload, otherwise the NDK doesn't come back up properly.  The NDK sometimes crashes and burns (fails to register EMAC) and sometimes I think it appears to work but ping or connect requests from the PC are ignored. 

    Sometimes System Reset isn't enough and you have to do a "Global Default Setup" from the GEL script (my post).  I've made this an OnReset action in my GEL file so I don't have to do this every time. 

    It's a bit of a pain but I gather this is a known problem.  I'd love someone to tell me what I need to software reset in my code to get around this.

    (2) It seems to depend on the DHCP server software whether the DHCP negotiation works or not. I gather this is probably because the DHCP client was changed around the end of the MCSDK beta period, and the packets in the post-beta NDK are a length which is not standards compliant.  (http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/99/t/6182.aspx )  I was trying to use the free dhcpserver.de version for testing, and confusingly enough it worked with the older demo apps supplied on the EVM flash but not on the MCSDK 2.00.0x applications!  Your mileage may vary.  I've just changed to static IP and that works fine.

    (3) Building a debug version of the NDK that you can step into is actually fairly straightforward.  I found that quite helpful for certain things.

    (4) To handle the problem of waiting for the stack to start up, I have a statically created semaphore which I post in the NetworkIpAddr callback ie. after the stack is up AND the address to bind your sockets to exists.  Then the statically created server thread just pends on that, after which it can create an fd environment and bind to sockets.  It depends on your use case:  the NDK daemon system is nice for TCP apps with infrequent connections (such as a web interface) and automatically gives you a callback when you need it, but didn't suit our high-frequency UDP based application.
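The startup gating in point (4) can be illustrated with a host-side analog. This is not NDK or SYS/BIOS code: it uses POSIX threads and semaphores purely as an assumption so that it runs on a PC, and all function names are hypothetical. On the target, the semaphore would be the statically created SYS/BIOS semaphore (Semaphore_pend() / Semaphore_post()) and the post would happen in the NetworkIpAddr callback.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

sem_t net_up_sem;          /* stands in for the static "network up" semaphore */
int address_ready;         /* set by the callback before it posts */
int server_saw_address;    /* what the server observed when it woke up */

/* Analog of the IP-address callback: the stack is up AND an address exists. */
void on_ip_address_added(void)
{
    address_ready = 1;
    sem_post(&net_up_sem);
}

/* Analog of the statically created server task: it does nothing network
   related until the callback has posted the semaphore. */
void *server_thread(void *arg)
{
    (void)arg;
    while (sem_wait(&net_up_sem) == -1 && errno == EINTR)
        ;                                  /* retry if interrupted */
    server_saw_address = address_ready;    /* safe point for socket setup */
    return NULL;
}

/* Run the sequence once. Returns 1 if the server only proceeded after the
   address was ready, 0 if not, -1 on a setup error. */
int run_gated_startup(void)
{
    pthread_t t;

    address_ready = 0;
    server_saw_address = 0;
    if (sem_init(&net_up_sem, 0, 0) != 0)
        return -1;
    if (pthread_create(&t, NULL, server_thread, NULL) != 0)
        return -1;

    on_ip_address_added();                 /* the stack reports its address */

    pthread_join(t, NULL);
    sem_destroy(&net_up_sem);
    return server_saw_address;
}
```

Because the semaphore starts at zero, the order in which the thread and the callback run does not matter: the server either blocks until the post arrives or finds the count already at one.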

    Hope that's a help.

    Gordon

  • I definitely agree with Gordon's points above.  I've taken to power cycling the DSP and reconnecting with CCS (rather than resetting/restarting from CCS only) to guarantee that the NDK is getting a fair shot at starting up.  Also, rebuilding the NDK with -g is helpful in stepping into function calls and seeing where potential hangups are.

  • I tried power cycling (or using the hard reset button on the EVM) for a while, but it's painfully tedious.  I found that re-running the Global Default Setup script when things got stuck worked for me and was a lot faster.  If you get the GEL script right, you can do "System Reset", "Build", "Load program" (which then runs Global Default Setup) and you're up with 3 steps.  Sadly the "auto reload after build" has never worked quite right for me with the NDK.   That's a shame, because when it does work it's really handy.

  • Hi Gordon and Nick,

    Thanks for the hints. My program uses IPC as well as NDK, so I need to load a couple of cores to run my test. I defined a Debug Configuration to load the cores. When I need to reload my programs, I terminate the emulator connection, power cycle (the power brick is on a switched power strip within easy reach), and then run the Debug Configuration. It's a pain but it works every time.

    I've done some more experimenting with the hacked version of hua_evmc6678l. If I disable the http server by commenting out the CfgAddEntry() that starts it, then my server gets stuck just like it does in the program I wrote from scratch! That's pretty discouraging. It seems to me that there is a fundamental design flaw in NDK.

    So, I'll have to apply some magical thinking to get my program to work reliably. Maybe I'll define an HTTP server that does nothing to make TCP work correctly. Ugh.

    Fred

  • Hi Fred,

    I don't have any easy answers, and while I can suggest some things to try I suspect you've thought of them too.

    One suggestion is to wonder if you're getting memory corruption somehow.  This might not be in your application code, but perhaps something in the NDK or IPC configuration?  That your code works for a while and then dies is suspicious.

    The "fdOpenSession: OOM" is also suspicious.  Whenever I've got that, it's because I've tried an fdOpenSession when the NDK stack has not actually started, and nothing then works properly. A debug (-g) build of the NDK can be helpful for this kind of thing, because you can then drop breakpoints on that message inside the NDK.

    Can you get your code to work single-core at all? 

    Which platform package are you using, and how are you setting up the IPC shared region(s)?  The advantage of starting from the image_processing demo and its custom platform is that IPC and the NDK are both configured including memory maps and all. 

    Regarding .cfg files, we started with the 'helloWorld' NDK demo but hit various problems.  I then worked up a synthesis of the hua and image_processing demo configuration and compiler options, and that seems to work.  Although it isn't totally straightforward, you can get quite a long way with winmerge on your cfg files.

    Gordon

  • Well, it seems to be working now. The problem wasn't in my code or memory corruption but mostly in the SYS/BIOS configuration. It's not very fast because it gets lots of TCP retry and retransmit errors. That knocks the speed down by a lot, but at least the data gets there. That's OK for debugging but I'll need to fix that eventually.

    Here are the major changes that I made:

    1. First I duplicated the section map of the demo program. Since that puts a lot of data into MSMCSRAM (Multicore shared memory) I had to move my IPC message heap to DDR3 and dedicate the multicore memory to a single core. That didn't work but I didn't restore my original layout since doing this was a lot of work and forced me to change all of the platforms for all of the cores.

    2. Then I started removing packages from the SYS/BIOS configuration on the chance that there was some kind of conflict. The Log package seems to be the culprit. Maybe it was a combination of it and something that I removed before it, but I don't have time to try all combinations.

    The fdOpenSession message turned out to be the call I put into the NDK configuration task before the configuration code. It wasn't necessary but I didn't realize that at the time I added it.

    While doing all of this and looking on the E2E forums for clues, I noticed that TI doesn't support the NDK package anymore. No wonder! If I had known this before I started then I probably would have found a different way to get data into the board.

    Fred

  • Fred Brehm said:
    While doing all of this and looking on the E2E forums for clues, I noticed that TI doesn't support the NDK package anymore. No wonder! If I had known this before I started then I probably would have found a different way to get data into the board.

    Fred,

    I'm just curious what led you to believe that the NDK is no longer supported? Did someone tell you this on the community support forums? Or perhaps this conclusion was due to the high number of non-TI people helping you on your forum post?

    The NDK is a product that is currently active and support questions are answered on a regular basis. 

    Thanks,

    Steve

  • On this page

    http://processors.wiki.ti.com/index.php/Before_asking_for_NDK_support

    It says

    Note: as of September 2009, TI provides the NDK on an 'as is' basis with no support.

    Is there a new announcement where TI is providing support again?

    Fred

  • Fred,

    Glad you got something working!

    Do you want to post your memory map and which platform you're using?  I'm not an expert but someone at TI might be able to give you some pointers.  For example, I now remember that you're supposed to have L2 cache enabled for the NDK, and when we read that it was not immediately obvious to us how to check the cache config.  If you're using a TI platform config it probably is on anyway. Also the balance between what you put in DDR3 and MSMC RAM might help performance. 

    It would be a shame not to be able to use the Log package - I think it's a most useful component of SYS/BIOS.  I'm puzzled as to why removing it from the config should have an effect on the NDK, which doesn't use the Log module at all.

    Gordon

  • I wound up using a custom configuration that is similar to

    C:\Program Files\Texas Instruments\mcsdk_2_00_05_17\demos\hua\custom\hpdspua\evmc6678l

    but only defines some of DDR3 as RWX instead of all of it. The undefined parts of DDR3 are reserved for the other 7 cores, and a very large chunk is defined as MP Heap. The memory section definitions are the same as in

    C:\Program Files\Texas Instruments\mcsdk_2_00_05_17\demos\hua\evmc6678l\evm.cfg

    I don't know how much of that complex memory layout is really necessary, but I don't have any more time to play with it. It mostly works now.

    My original map was much simpler. I wanted to use Multicore Shared Memory for IPC messages so I put everything in DDR3 and had all of L2 and L1 set up as cache (the whole program is too big to fit in L2 alone). That may still work but, as I said, I don't have time to experiment.

    The Log package is a big loss for debugging. Fortunately, the program for that core is done so there's no more debugging. Now I can concentrate on the programs for the other cores. The only things they need from the BIOS are the IPC packages.

    Fred