TI Experts,
I am running into an intermittent problem after integrating a modified version of the EMAC multicore example project in the PDK 1.1.2.6.
I am working with Advantech's DSPC-8681E card. I started with the PA_multicoreExample_exampleProject included in the PDK and modified it so that it works with the 8681 card and sends packets out on the network instead of doing an internal loopback. I increased the TX and RX buffer sizes to allow for a max packet size of 1514 bytes and increased the number of packets sent by the test code to 1000. I also added a delay between calls to SendPacket() because my capture interface was having trouble keeping up with the packet rate.
I used TI's Desktop Linux SDK and the dsp_utils app included to do the following:
1. Reset all 4 DSPs using the dspresetall.sh script
2. Initialize all 4 DSPs using the init_dsps.sh script
3. Load the PA_multicoreExample_exampleProject .out file to cores 1-7
4. Repeat step 3 for core 0 (core 0 should be loaded last since it starts the other cores)
5. Wait for packets on capture interface
I can repeat this test many times and it seems to be mostly reliable except for the occasional missed packets on the capture interface, but this doesn't seem to be an issue with the C66x side.
I have two projects into which I am trying to integrate the network I/O code from the PA_multicoreExample_exampleProject. After integrating it with either project, I run the same test as above, substituting that project's .out file for the PA_multicoreExample_exampleProject .out file, but I run into some problems.
Project A:
The test fails intermittently. I get a "No Tx free descriptor" error message from the SendPacket() function around the 289th packet, and I don't see any packets at all from the C66x on the capture interface. It almost always occurs on the 289th packet, but I have seen a few cases where it occurred on a nearby packet instead (e.g. the 287th). This can happen on the first test after a server reboot or after several iterations of the test have run successfully. If I replace step 5 above with a sleep command and run the test in an infinite loop, the problem occurs in fewer iterations of the test. I have stripped the project down to the bare minimum of executed code and made that code as close as possible to the modified version of PA_multicoreExample_exampleProject that I had working, but the problem still occurs.
When the problem occurs, the code continues to run and I can continue to receive packets. The next time I try to reset the DSP, I get the following error:
setPscState: dsp_id 0: Current transition in progress pid 2 mid 7 state: 0
setPscState: dsp_id 0: MD stat for pid 2 mid 7 expected state: 0 state: 10 timeout
setPDState: dsp_id 0: Previous transition in progress pid 2 state: 0
setPDState: dsp_id 0: Current transition in progress pid 2 state: 0
After looking in the source where this message is produced and understanding the numbers, it looks like there's a problem with resetting the PA. One thing I don't understand though is how the state is 10 as the only non-reserved values according to the PSC UG are 0 and 3.
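To make the numbers in the second log line easier to compare against the PSC UG, here is a minimal C sketch that pulls the fields out of that line. The format string is inferred only from the line quoted above, and the names parse_md_stat_line and md_stat_line_t are hypothetical, not part of dsp_utils. Since the log does not say whether the state is printed in decimal or hex, the sketch keeps both interpretations of the token (10 or 0x10).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical container for the fields of one "MD stat" log line. */
typedef struct {
    int  dsp_id;
    int  pid;        /* power domain id as printed in the log */
    int  mid;        /* module id as printed in the log */
    int  expected;   /* expected module state */
    long state_dec;  /* reported state token read as decimal */
    long state_hex;  /* same token read as hexadecimal */
} md_stat_line_t;

/* Returns 1 if the line matches the assumed format, 0 otherwise. */
int parse_md_stat_line(const char *line, md_stat_line_t *out)
{
    char tok[16];

    assert(line != NULL && out != NULL);

    /* Format assumed from the quoted log line, not from dsp_utils source. */
    int n = sscanf(line,
                   "setPscState: dsp_id %d: MD stat for pid %d mid %d "
                   "expected state: %d state: %15s",
                   &out->dsp_id, &out->pid, &out->mid, &out->expected, tok);
    if (n != 5) {
        return 0;
    }

    /* The radix is unknown, so keep both readings of the state token. */
    out->state_dec = strtol(tok, NULL, 10);
    out->state_hex = strtol(tok, NULL, 16);
    return 1;
}
```

Feeding it the "MD stat" line above gives pid 2, mid 7, expected state 0, and a reported state of either 10 or 16 depending on the radix. The other three log lines do not match the format and are rejected.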
Project B:
This project is a combination of the modified PA_multicoreExample_exampleProject and a modified version of the H.264 encoder example code. I have spent less time debugging this project, but the problem appears to be similar to what happens in Project A. In this project the TI Desktop Linux SDK is not used, but the test procedure is similar to what I have outlined above. In this case the "No Tx free descriptor" message from the SendPacket() function generally occurs on the 129th packet, but sometimes occurs slightly later (around packet 150). When the problem occurs, the behavior seems to be similar to Project A: the code keeps running, but no packets are seen on the network. In this case, the C66x chip can still be reset over PCIe with no errors from the function accessing the PSC, but the "No Tx free descriptor" message is then received from the SendPacket() function on the 129th packet for every subsequent run of the test.
In both projects, a server reboot is required to get out of the bad state. I have tried to reproduce the problem with just the modified PA_multicoreExample_exampleProject, but haven't been able to do so.
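For reference, a near-constant failure index is what one would expect if TX descriptors are taken from the free queue but never returned to it. Below is a minimal C sketch of that idea. It is only a counter model, not PDK code: tx_pool_t and the functions are hypothetical, and the pool sizes used in the usage note are assumptions, since the actual descriptor counts are not stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a TX free-descriptor queue: just counters. */
typedef struct {
    int free_count;  /* descriptors available to the sender */
    int in_flight;   /* descriptors handed to hardware, not yet returned */
} tx_pool_t;

void tx_pool_init(tx_pool_t *pool, int num_desc)
{
    assert(pool != NULL && num_desc >= 0);
    pool->free_count = num_desc;
    pool->in_flight = 0;
}

/* Models one send; false corresponds to "No Tx free descriptor". */
bool tx_pool_send(tx_pool_t *pool)
{
    if (pool->free_count == 0) {
        return false;
    }
    pool->free_count--;
    pool->in_flight++;
    return true;
}

/* Models the hardware returning all transmitted descriptors. */
void tx_pool_recycle(tx_pool_t *pool)
{
    pool->free_count += pool->in_flight;
    pool->in_flight = 0;
}

/* Returns the 1-based index of the first failed send, or 0 if none fail. */
int first_failed_send(int num_desc, int num_packets, bool hw_recycles)
{
    tx_pool_t pool;
    tx_pool_init(&pool, num_desc);

    for (int i = 1; i <= num_packets; i++) {
        if (!tx_pool_send(&pool)) {
            return i;
        }
        if (hw_recycles) {
            tx_pool_recycle(&pool);  /* descriptor comes back after TX */
        }
    }
    return 0;
}
```

With an assumed pool of 128 descriptors and no recycling, first_failed_send(128, 1000, false) reports packet 129, and with recycling enabled all 1000 sends succeed. In this model a stalled TX path surfaces as a failure at pool size plus one, and if a few descriptors were returned before the stall the failure would come slightly later.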
Has anyone encountered an issue similar to this before?
What are some specific areas to look at while trying to debug this problem?
What areas in the PA_multicoreExample_exampleProject code or in the PA itself might be sensitive to other code?
Regards,
Chris
Signalogic