Recv from USB CDC often lags

I'm working on a cross-platform serial library (in C) to communicate with a USB CDC program on the MSP430. When I run my program on Linux, it works fantastically well. However, on Mac OS X or Windows 7/XP, every once in a while a packet of data takes a significant time to be received by my program. Since the behavior is pretty specific, I'll explain the gory details of what I am doing and when the lag occurs.

For flow control reasons and to keep buffer sizes small on the microcontroller, I designed a simple handshaking scheme to transfer data.
Ideally, my PC program and the microcontroller (MCU) would communicate in a packet format as follows:

PC <-- MCU - MCU sends the PC a 4-byte header
PC <-- MCU - MCU sends a 128-byte payload
PC --> MCU - PC sends a 4-byte acknowledgement that it is ready for the next packet
...repeat...
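
A minimal sketch of the PC side of this exchange, assuming a hypothetical `transport_t` whose callbacks move exactly the requested number of bytes or fail. The names `transport_t`, `recv_packet`, and `mem_link_t`, as well as the ACK bytes, are placeholders for illustration and are not taken from the attached source. The in-memory link only stands in for the serial port so the sequence can be exercised without hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEADER_LEN  4    /* header size stated in the post */
#define PAYLOAD_LEN 128  /* payload size stated in the post */
#define ACK_LEN     4    /* acknowledgement size stated in the post */

/* Hypothetical transport: each callback moves exactly len bytes and
 * returns 0 on success or -1 on error/timeout. */
typedef struct {
    int (*recv_all)(void *ctx, uint8_t *buf, size_t len);
    int (*send_all)(void *ctx, const uint8_t *buf, size_t len);
    void *ctx;
} transport_t;

/* One transaction: header in, payload in, ACK out. Returns 0 on success. */
int recv_packet(const transport_t *t,
                uint8_t header[HEADER_LEN],
                uint8_t payload[PAYLOAD_LEN])
{
    /* Placeholder ACK contents; the real bytes are not given in the post. */
    static const uint8_t ack[ACK_LEN] = { 'A', 'C', 'K', 0 };

    assert(t != NULL);
    if (t->recv_all(t->ctx, header, HEADER_LEN) != 0)
        return -1;
    if (t->recv_all(t->ctx, payload, PAYLOAD_LEN) != 0)
        return -1;
    return t->send_all(t->ctx, ack, ACK_LEN);
}

/* In-memory stand-in for the serial link, for illustration only. */
typedef struct {
    const uint8_t *rx;      /* bytes the "MCU" will send */
    size_t rx_len, rx_pos;
    uint8_t tx[64];         /* bytes the PC wrote back */
    size_t tx_len;
} mem_link_t;

int mem_recv_all(void *ctx, uint8_t *buf, size_t len)
{
    mem_link_t *m = ctx;
    if (m->rx_len - m->rx_pos < len)
        return -1;          /* not enough data: behaves like a timeout */
    memcpy(buf, m->rx + m->rx_pos, len);
    m->rx_pos += len;
    return 0;
}

int mem_send_all(void *ctx, const uint8_t *buf, size_t len)
{
    mem_link_t *m = ctx;
    if (sizeof m->tx - m->tx_len < len)
        return -1;          /* no room left in the capture buffer */
    memcpy(m->tx + m->tx_len, buf, len);
    m->tx_len += len;
    return 0;
}
```

Because the MCU only sends the next header after it sees the ACK, at most one header plus one payload (132 bytes) should ever be in flight toward the PC.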

The above works fine when my program runs on Linux (which is enough confirmation for me that it is not a microcontroller issue).

When running the program on OSX or Windows, it is able to recv about a dozen packets correctly and then it does this:

PC <-- MCU - MCU sends the PC a 4-byte header
PC x-- MCU - MCU transmits the 128-byte payload, but the PC doesn't recv it immediately
.......... - MCU is now waiting for an ACK while the PC program is waiting for the payload
...The MCU times out after several seconds of waiting (as it should). Then the PC finally gets the payload.

So, why does this happen?
Each packet transaction normally happens very quickly. Does the PC get overwhelmed with all the activity? Buffers shouldn't be filling up, since the handshaking prevents that. Increasing the timeout period on each end doesn't really help either (and a 10-second timeout is way too much anyway).

I have attached the source code for the abstraction layer that handles sending and getting data from the serial device. One thing to note is that I originally wrote it using only POSIX calls to make it easy to port to the other OSes. I rewrote portions for Windows to use the native API, and that reduced the frequency of these "stalls", but it still messes up sometimes.

2703.serial_io.zip

Hopefully I explained what I am observing well enough. Let me know if I need to clarify anything.
-Alex

  • First, I don't see any DCB settings in your code - are you configuring your serial port in the device manager?

    I'd include something like:

    // Configure the serial port (hSerial: handle to the already-open port)
    DCB dcb = {0};                 // control structure
    dcb.DCBlength = sizeof(dcb);

    // Get the current control settings
    if (!GetCommState(hSerial, &dcb)) { /* handle error */ }

    // Change settings
    dcb.BaudRate = 115200;
    dcb.fBinary  = TRUE;           // has to be TRUE (Windows doesn't support non-binary modes)
    dcb.ByteSize = 8;
    dcb.Parity   = NOPARITY;
    dcb.StopBits = ONESTOPBIT;

    // Apply the changed settings
    if (!SetCommState(hSerial, &dcb)) { /* handle error */ }

    But I guess that's not causing problems here - if these settings were wrong, you'd get no working communication at all.

    What irritates me is that after the MCU times out your PC receives the payload - is it the payload you were waiting for or is it the next payload sent by the MCU and is it correct?

    The only thing that seems a bit dodgy is your while(size) check, which requires size to reach exactly zero. What happens if for some strange reason it goes negative (or becomes a very large positive value)? If that happened, though, I don't think a simple MCU reset would fix things. Or is your PC also timing out and going back to waiting for the handshake?
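
One way to keep such a loop from overshooting is to track the remaining count as a `size_t` and never request more than what is still missing. The following is a sketch only, assuming a blocking POSIX file descriptor; `read_exact` is a hypothetical helper, not a function from the attached source.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly len bytes from fd.
 * Returns 0 on success, -1 on error or when read() reports 0 bytes
 * (end of file, or a termios VTIME timeout on a serial port). */
int read_exact(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    size_t remaining = len;

    assert(buf != NULL || len == 0);
    while (remaining > 0) {
        /* Ask only for what is still missing, so n <= remaining. */
        ssize_t n = read(fd, p, remaining);
        if (n < 0) {
            if (errno == EINTR)
                continue;         /* interrupted by a signal: retry */
            return -1;            /* real error (EAGAIN if fd is non-blocking) */
        }
        if (n == 0)
            return -1;            /* EOF or timeout: give up */
        p += (size_t)n;
        remaining -= (size_t)n;   /* cannot underflow because n <= remaining */
    }
    return 0;
}
```

Since each `read()` is capped at `remaining`, the counter can only step down to zero, and the loop condition `remaining > 0` ends the loop there.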

  • Yeah, I doubt it's the DCB settings. I didn't get around to setting those in the Windows code, but I do in the POSIX code and I still get the same symptoms on OS X.

    -- "What irritates me is that after the MCU times out your PC receives the payload - is it the payload you were waiting for or is it the next payload sent by the MCU and is it correct?"

    I'm getting the correct payload. I believe what is happening is that the MCU sends the payload successfully and the PC receives it. It seems like that particular packet is getting "stuck" in the kernel/OS for a few seconds before my program gets hold of it.

    Once my program gets the payload, it accepts it correctly and replies with an ACK. Since the MCU isn't listening anymore, the PC also times out when waiting for the next header.

    Aside from the MCU receiving the rogue ACK response, it doesn't seem to be in a corrupted state needing a reset.

    EDIT: Also just to clarify (just in case)... The code I included in my original post is for the PC. Not the MSP430.

  • Hmm, okay - maybe you could try a synchronous read instead of an asynchronous one? I don't know if they are handled differently by the kernel, but it could be that synchronous reads on communication ports get the data out of the buffers (or wherever the data goes first) faster. You're not taking any advantage of asynchronous read operations anyway: you loop until all data is received and then exit the loop, with no other processing going on in there.

  • Not sure what you mean by synchronous reads in this context. Could you elaborate?

    EDIT: Never mind. Should've googled first. So effectively, my functions are wrapping async calls to make them look synchronous. I'll give it a try with the Windows version, since I can define timeout behavior directly. I'll have to look into the POSIX documentation, but I don't remember seeing anything about blocking read()/write() calls.

    BTW, Thanks for taking time to help out. Much appreciated.

  • Ok so after some more experimentation, it turns out it is indeed a problem on the MSP430 end.

    It's looking like the cdcRecv operation that is called directly after the cdcSend for the payload causes the payload to get "stuck". Calling cdcSend again results in both the payload and the new data being sent over to the PC. No clue why this doesn't show up when talking to a Linux machine but does on the others...

    My theory is that a longer cdcSend(128 bytes) cannot be [reliably] immediately followed by a cdcRecv(). I'm going to do a few more tests to verify this before I consider it a possible bug in TI's API.

  • I'm afraid I can't be of any help on that end (I've only done a few small Windows serial projects, so I'm no expert there either). I've never done anything with the CDC, so maybe someone else can advise or test something.
