CC3135MOD: Long random sending delays in RAW socket mode with CC3135

Part Number: CC3135MOD
Other Parts Discussed in Thread: CC3135

Hello,

I see long random delays when sending IP packets over a raw socket with the sl_Send() command when I send UDP or TCP packets every 2…10 ms. The delay often appears within 3…15 minutes and can last up to 3.6 seconds (!). There is no clear pattern to when the delay starts or how long it lasts. I also see much shorter delays lasting 10…250 ms. They appear more often, especially when the packet rate increases.

The socket was opened using the sl_Socket(SL_AF_PACKET, SL_SOCK_RAW, 0) command, and the NWP was placed in Network Bypass Mode (p.108, SWRU455M – FEBRUARY 2017 – REVISED OCTOBER 2020).

The power policy is SL_WLAN_NORMAL_POLICY. It can also be SL_WLAN_ALWAYS_ON_POLICY with the same result. The signal strength is from good to excellent: -55dBm…-33dBm. The transmitting channel is in the 2.4GHz frequency band. WLAN mode does not matter.

The socket can be in blocking or non-blocking mode. During the delay, the FlowContCB.TxPoolCnt value drops to FLOW_CONT_MIN and no more packets can be sent. After the delay, everything gets back to normal. No WLAN disconnection/connection events are reported during the delay.

Is there a way you can fix this issue on the NWP side? Long delays are not acceptable. Alternatively, please let me know if I have hit an undocumented NWP constraint and if there is a way to mitigate it.

Thanks,

Olek Bogush
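
One way to quantify the blocking described above is to timestamp every send call on the host and keep running statistics. The following is a minimal sketch, not part of the SimpleLink driver: the `send_stats` type and the `send_stats_record()` helper are hypothetical names, and the timestamps are assumed to come from whatever microsecond timer the host provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Running statistics over all measured send calls (hypothetical helper type). */
typedef struct {
    uint64_t max_block_us;   /* longest time a single send call blocked */
    uint32_t stalls;         /* calls that blocked longer than the threshold */
    uint32_t calls;          /* total number of calls recorded */
} send_stats;

/* Record one send call that started at t_start_us and returned at t_end_us. */
void send_stats_record(send_stats *st, uint64_t t_start_us, uint64_t t_end_us,
                       uint64_t stall_threshold_us)
{
    assert(st != NULL);
    assert(t_end_us >= t_start_us);

    uint64_t blocked_us = t_end_us - t_start_us;

    st->calls++;
    if (blocked_us > st->max_block_us) {
        st->max_block_us = blocked_us;
    }
    if (blocked_us > stall_threshold_us) {
        st->stalls++;
    }
}
```

The two timestamps would be taken immediately before and after each sl_Send() call. A threshold of, say, 50 ms (an arbitrary example value) would separate ordinary jitter from the multi-second stalls reported here.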

  • Hi,

    The fact that the flow control reaches the minimum means that the TX pool count is dropping and hence the packets in the NWP are deferred. ALWAYS_ON would help only in RX mode, not in TX mode, so it is clear why you don't see any improvement.

    First, I would concentrate on UDP and not TCP or a mixture since TCP requires an ACK from the peer device and until it happens, the packet would remain in the TX queue and may make this phenomenon even worse. Can you make sure you test only with UDP? Do you see different behavior when testing with UDP vs. TCP?

    In addition, the fact that the link quality is good does not necessarily mean that the air is not congested. Can you double-check that you are working on a clean channel? The reason is that a busy channel would make the device defer its transmissions, effectively making the issue worse as well.

    Lastly, what if you space transmissions even further apart in time? Any improvement? What packet sizes are you using?

    Shlomi

  • Hi Shlomi,

    As I already mentioned, I am using a RAW socket. The NWP does not know whether it is UDP, TCP or any other higher-level protocol. At least, it is supposed to be that way. However, I agree that using UDP removes unnecessary complexity in observing this issue. No, I do not see different behavior using UDP vs TCP.

    Regarding a clean channel: no, I am not working on a clean channel. If I move to a clean channel (a hard thing to do in the 2.4 GHz band), the issue definitely improves. On the other hand, I did not deliberately create a congested environment. I have only one other AP running on the same channel and nothing else around. The signal strength of the other AP is around -56 dBm, close to or below the signal strength of my NWP. Increasing the signal strength of the NWP to -33 dBm (external antenna) does not improve the situation. I moved the device to a completely different place with different APs, with more congestion, and I had a similar result. In summary, my environment is not ideal, but it is also not unique. It is close to what I would expect at the customer’s site.

    I did not check lower transmission rates – I need to send packets every 10ms or faster in my application. The packet size is around 50…200 bytes and is flexible. The absolute maximum is 1514 bytes.

    Thanks,

    Olek

  • Hi,

    There is no difference whether you are using the NWP or not; the TX pool is the same. The fact that you actually reach the minimum pool count means that you are out of packets, so it takes some time to free the pool and get back on track.

    I was asking about the congested air to get the idea of whether it is the air that prevents the device from sending the packets.

    The packet size is less important, as an entry from the TX pool is consumed regardless of the size. A frame every 2 mSec as a minimum means 500 frames per second. The datasheet denotes a maximal theoretical throughput of 15 Mbps, so using 1500-byte frames means 1250 frames per second in a clean environment. 500 frames per second is still less than that, but the environment may be busy, so it may be on the edge.

    What SPI rate are you using?

    Do you have an air sniffer?

    Regards,

    Shlomi
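
The frame-rate estimate in the reply above can be reproduced with two small helpers. This is only a sketch of the arithmetic; the function names are made up for illustration, and the upper bound ignores all per-frame overhead on the air, so it is only meaningful for the large frames used in the estimate.

```c
#include <assert.h>
#include <stdint.h>

/* Frames per second offered by a host that sends one frame every interval_us. */
uint32_t offered_fps(uint32_t interval_us)
{
    assert(interval_us > 0);
    return 1000000u / interval_us;
}

/* Upper bound on frames per second if the full throughput carried payload. */
uint32_t max_fps(uint32_t throughput_bps, uint32_t frame_bytes)
{
    assert(frame_bytes > 0);
    return throughput_bps / (frame_bytes * 8u);
}
```

With the numbers from the thread, `offered_fps(2000)` gives 500 frames per second and `max_fps(15000000, 1500)` gives 1250 frames per second, while the 10 ms case, `offered_fps(10000)`, is only 100 frames per second.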

  • Hi Shlomi,

    The SPI frequency is 20MHz – the maximum allowed by the datasheet. No, I do not have an air sniffer. What is the point of using an air sniffer if the NWP does not transmit a frame? I see exactly the same behaviour with different Wi-Fi receivers.

    The problem is the delay of 1…3 s and more, not the 2…10 ms delay due to a congested Wi-Fi channel.

    Thanks,

    Olek

  • Hi,

    I was referring to the 2..10mSec as how fast you push frames to the device. If you push too fast, the flow control mechanism stops the host from sending more and hence you get deferred.

    When you say "the same behavior with different Wi-Fi receivers", do you mean the other peer device that gets the transmissions from the CC3135? If so, I just want to use an air sniffer to check whether the CC3135 is not transmitting or the peer device is not pulling the data.

    If the issue is that the CC3135 is not transmitting, I need to think of a way to debug it, since it is probably in the firmware.

    Regards,

    Shlomi

  • Hi Shlomi,

    The problem is not that some messages can be deferred from sending and even eventually lost due to a relatively high rate of incoming messages for the current air condition. The problem is that NWP stops transmitting for several seconds creating a major disruption in data communication. This is the real issue.

    Yes, I do mean other peer devices that communicate with the CC3135. They can be AP, STA, P2P, various hardware – it does not matter. The devices just stop receiving messages from the CC3135 for several seconds at random times. Then everything gets back to normal until the next random stop in transmission.

    Yes, from what I can see, it is the CC3135 firmware that needs to be debugged. I suggest you reproduce the problem on your side and present it to the TI developer team to fix the bug.

    Thanks,

    Olek

  • Hi,

    First step would be to reproduce of course.

    I will look into it and let you know.

    Regards,

    Shlomi

  • Hi,

    I managed to build a similar setup to what you have.

    I am using a raw socket as well with lwIP network stack running on the host.

    This is a platform running on a PC to simulate the host platform, but it should be similar to what you have.

    It has been running for a long time with UDP transmissions @18Kbps on average and I could not see any drops in transmissions.

    I could see in the NWP logs that the TX pool packets do reach the minimum, as you see, but no drops.

    Can you clarify what servicepack version you are using?

    Also, would be good if you can fetch the NWP logs and send it. The process is described in chapter 20.1 under https://www.ti.com/lit/ug/swru455m/swru455m.pdf?ts=1638204986283&ref_url=https%253A%252F%252Fwww.google.com%252F

    Just remember the stream needs to be recorded in binary mode on the dedicated pin.

    Shlomi

  • Hi Shlomi,

    Raw socket with LwIP is ok. Using PC can be an issue since PC does not have an external SPI interface. You will need a peripheral card with SPI interface to connect to NWP. The card will have its own buffering invisible to PC.

    UDP transmission is fine. I do not really care about your bit rate - you need to send your packets, say 50-byte packets (just to be specific), every 10ms or less to reproduce the problem.

    The problem is in random delays in transmission, sometimes more than 3s. Drops (we are talking about packet drops, not connection drops) are the result of long delays, not a separate issue. It is possible that you see no drops due to a large transmission buffer that can accommodate all messages during the transmission delay. Do you see long random delays in transmission?

    It is good that you see that TX pool packet counter reaches its minimum value. That is the point when you should see the delay since the next packet coming in 10ms will cause the sl_Send() to lock and wait until at least one packet is transmitted and the TX pool packet counter goes above its minimum value. You just need to measure this delay.

    I am using the latest NWP service pack: NWP 4.13.0.2 MAC 3.7.0.1 PHY 3.1.0.26.

    I do not have resources to fetch NWP logs.

    Thanks,
    Olek
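
To put the buffering argument above into numbers, the backlog that builds up while the transmitter is stalled can be estimated from the stall length and the send interval. This is a rough sketch with hypothetical helper names; it assumes the application keeps producing packets at a fixed interval for the whole stall.

```c
#include <assert.h>
#include <stdint.h>

/* Packets the application produces while the transmitter is stalled. */
uint32_t packets_during_stall(uint32_t stall_ms, uint32_t interval_ms)
{
    assert(interval_ms > 0);
    return stall_ms / interval_ms;
}

/* Bytes of buffering needed to hold that backlog without dropping anything. */
uint32_t backlog_bytes(uint32_t stall_ms, uint32_t interval_ms,
                       uint32_t packet_bytes)
{
    return packets_during_stall(stall_ms, interval_ms) * packet_bytes;
}
```

For a 3 s stall with 50-byte packets every 10 ms, this gives 300 queued packets, or 15000 bytes of payload. A test setup with that much buffering between the application and the NWP would show no drops even though the stall occurred, so the delay itself has to be measured.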

  • Olek,

    I am connecting via SPI from the PC and sending even more frequently than what you state, as I reach high TP and the TX pool drops to the minimum. I still cannot see the issue you are talking about. The flow control to the host is clear and expected, but there is no bottleneck to the air, as the internal pools always hold at least one packet. Also, UDP is the way to test and not TCP, since TCP depends on the TCP ACK and, if the other side does not pull it to the application layer, such behavior may happen as the NWP needs to keep the packets for retransmissions.

    Without the option to fetch NWP logs and see under the hood, there is nothing much I can offer. Why is it hard to get an NWP log?

    Shlomi

  • Hi Shlomi,

    Could you please tell me at what message rate the FlowContCB.TxPoolCnt value drops to FLOW_CONT_MIN in your experiment? I see it at around 2ms per message. That is when I see delays in transmission due to the bottleneck; there is no more space in the NWP TxD buffer, but new messages continue to come. At this point, I see bursts of messages transmitted by the NWP: ten or so are transmitted one after another (clearing the internal NWP buffer), and then comes a 10...20ms delay when nothing is transmitted. Remember, you need to have at least one Wi-Fi access point on the same channel, with the same or slightly lower power than the NWP, to create a real-life working environment.

    After the bottleneck experiment described above, when you get your maximum transmission rate, reduce your rate to 10ms per message (message size ~50byte). You should see a smooth transmission of messages with 2-3ms jitter. This is fine and it is how it is supposed to be. However, you will also see 10...100ms delays from time to time, and you will see random long time delays sometimes for more than 3s in 5...20minutes. This is the problem you are trying to reproduce.

    Please do not bring up the TCP argument again - I agreed to use UDP to recreate the issue.

    To my understanding, we are debugging your NWP, not my device. The strategy is to reproduce my issue on your setup, which is extremely easy to reproduce, and then you can collect whatever log is necessary for you to debug your NWP.

    Thanks,
    Olek

  • Olek,

    For me <2mSec would trigger it. What I did was cyclically transmit a sequence of frames that would cause the TX pool to trigger the flow control (<2mSec delay between packets) and then increase the delay so it never reaches the minimum flow control (I tried various delays from 2mSec to 20mSec). I also use 50-byte frames.

    I have been running for hours with an air sniffer and an NWP log and everything looks OK, so it is hard to tell why you see it. I guess you are more bothered by the large delays in seconds than by the 10..100mSec delays, right?

    I will try to think of a way to stress it.

    Shlomi
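
The cyclic test pattern described above (a fast burst that triggers flow control, followed by a slower phase) could be expressed as a simple gap schedule. This is only an illustrative sketch: the `stress_cycle` type, its field names, and any burst or cycle lengths are assumptions, since the thread only gives the gap ranges (<2 mSec in the burst, 2 to 20 mSec afterwards).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Parameters of one repeating test cycle (hypothetical, for illustration). */
typedef struct {
    uint32_t burst_len;       /* packets in the fast phase of each cycle */
    uint32_t cycle_len;       /* total packets per cycle */
    uint32_t burst_gap_us;    /* gap between packets in the fast phase */
    uint32_t relaxed_gap_us;  /* gap between packets in the slow phase */
} stress_cycle;

/* Gap to wait before sending packet number seq (0-based). */
uint32_t stress_gap_us(const stress_cycle *c, uint32_t seq)
{
    assert(c != NULL);
    assert(c->cycle_len > 0 && c->burst_len <= c->cycle_len);

    return (seq % c->cycle_len) < c->burst_len ? c->burst_gap_us
                                               : c->relaxed_gap_us;
}
```

A sender loop would call `stress_gap_us()` with an incrementing sequence number and sleep for the returned time before each send.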

  • Hi Shlomi,

    Thanks a lot for looking into this issue. Yes, my major concern is the long delays. Short delays can be handled by the internal NWP TxD buffer (it is large enough), and short delays will always be there due to the unpredictable nature of the radio channel. But there is no way the NWP will handle long delays lasting for seconds.

    What I noticed is that the long delays appear when there is radio interference, when one or two APs are also using the same channel as the NWP. I have attached two graphs. The first one is the delay between received messages on a clean channel (the messages are sent at a 10ms interval). The maximum delay here is ~31ms in the 14th minute, which is absolutely fine. The second figure shows the same delay when there is another AP working on the same channel. You can see a ~1.3s delay in the 2nd minute.

    I hope you will see a similar behaviour with your setup.

    Thanks,

    Olek

    https://1drv.ms/i/s!AnPbPYflhT1RvSKAEJPJwRwsy6VI?e=Dff6gm

    https://1drv.ms/i/s!AnPbPYflhT1RvSEPmP473jzV71Wa?e=mm56pa
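
The receive-side measurement behind the two graphs (the delay between consecutive received messages) can be summarized with a short scan over the receive timestamps. The following is a sketch under assumptions: the `gap_report` type and `find_gaps()` function are hypothetical, and the timestamps are assumed to be in milliseconds and non-decreasing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Summary of inter-arrival gaps in one capture (hypothetical helper type). */
typedef struct {
    uint32_t max_gap_ms;   /* largest gap between consecutive packets */
    size_t   max_gap_at;   /* index of the packet that ended that gap */
    size_t   long_gaps;    /* number of gaps longer than the threshold */
} gap_report;

/* Scan receive timestamps for the largest gap and count the long ones. */
gap_report find_gaps(const uint32_t *rx_ms, size_t n, uint32_t threshold_ms)
{
    gap_report r = {0, 0, 0};

    assert(rx_ms != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        assert(rx_ms[i] >= rx_ms[i - 1]);
        uint32_t gap = rx_ms[i] - rx_ms[i - 1];
        if (gap > r.max_gap_ms) {
            r.max_gap_ms = gap;
            r.max_gap_at = i;
        }
        if (gap > threshold_ms) {
            r.long_gaps++;
        }
    }
    return r;
}
```

With messages sent every 10 ms, a threshold of around 100 ms (an example value) would ignore normal jitter and flag only the stalls of the kind seen in the second graph.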

  • Hi Olek,

    On my side, I capture with an air sniffer and probe the delays. I cannot see long delays like the ones you see, even when I added another AP on the same channel and loaded the air with a parallel stream of data.

    So the question on your side is where the delays are coming from. It could be the device deferring for some reason, but there is no evidence for that. It could also be that the air is busy/congested, but I do not know how busy it is since you don't have an air sniffer. Then there are two sides to the link, from station to AP and from AP to the other peer device. The AP repeats the packet (to-DS and from-DS), so it would also be interesting to know whether one of these sides is suffering for some reason. Finally, it could also be the peer device itself.

    Without at least an air sniffer, it would be hard to tell so I don't know what else we can do as I am not able to reproduce.

    Regards,

    Shlomi

  • Hi Shlomi,

    There are no extra AP<->Station connections in my experiment. I used two NWPs, one in AP mode and another one in Station mode for collecting data that I published here. This setup has an accurate timestamp in firmware on the Station side. Then I replaced the NWP in Station mode with PC and used my PC software to capture the data sent by the NWP in AP mode. Timestamp in PC is less accurate, but still acceptable and I still see the issue. Also, I tried different combinations with AP and station modes, a router in between, etc.

    I am not very familiar with Wi-Fi protocol. Is it possible that the AP transmits a packet, you see it with your sniffer, but the Station does not process it? Is there a flow control mechanism on the Station side that can request retransmission and retransmission would not happen on the AP side? I am speculating here trying to find a reasonable explanation why you do not see the delays with your sniffer. Remember, I do not really care about Wi-Fi messages on the air. I need IP messages to be received on the Station side without long delays.

    Thanks,

    Olek

  • Hi Olek,

    You got me confused with the last message. Is the sending SimpleLink device in AP role? Is your final product supposed to work in AP role? My experiments were in station role, since this was my understanding. I am not sure this changes much, but I want to align both setups.

    To answer your questions: on Wi-Fi, the sending side retransmits if it does not get an acknowledgement from the other side (on the Wi-Fi layer). How many times to retransmit and how the rate fallback works are not defined in the spec, and every vendor may implement them differently. Once the Wi-Fi packet is acknowledged, it is passed up to the network stack for processing and waits for the host to pull it. There may be a delay of up to one beacon interval if the station goes into power save with the AP.

    Regards,

    Shlomi

  • Hi Shlomi,

    The problem appears in any mode: Station, AP, P2P. I published data for NWP in AP mode talking to another NWP in Station mode.

    As I expected, sniffer is not applicable here due to the Wi-Fi message acknowledgement mechanism. You need to see the problem on IP level, not on Wi-Fi level.

    If you insist that you cannot see the issue, please confirm that you have communication between the NWP (any mode) and another Wi-Fi node at the IP level, at a transmission rate of 10ms per message, without long random delays in the presence of other devices on the same channel. By IP level, I mean you use sl_Send() to send messages and a socket receive function on the peer Wi-Fi device to receive them.

    Thanks,

    Olek

  • Hi Olek,

    I don't see the problem in any layer, Wi-Fi or IP. 

    I do not understand your claim that the sniffer is not applicable due to the Wi-Fi message acknowledgement mechanism. What do you mean? A sniffer may shed some light on what you see and explain why these drops occur.

    Have you tried working with an external AP (not Simplelink)?

    Shlomi

  • Hi Shlomi,

    If you do not see the issue on IP level, we have unfortunately reached a dead end.

    The sniffer is useful for debugging Wi-Fi layer. If you do not see the issue on IP level, then Wi-Fi part is ok.

    Yes, sure. Read my previous posts. I tried everything. I experimented with the NWP in Station mode, connecting it to a PC through a Wi-Fi router, with the same result. As I said, if you have communication between the NWP (let's be specific, in AP mode) and another Wi-Fi node (in Station mode) at the IP level, with a transmission rate of 10ms per message, without long random delays in the presence of other devices on the same channel, then you cannot reproduce the issue.

    Thanks,
    Olek

  • Hi Olek,

    I do not see it on either Wi-Fi or IP.

    Please note that issues on Wi-Fi layer may lead to issues on IP layer but I agree that if Wi-Fi layer behaves OK, you can still experience issues on the IP layer. I wanted an air sniffer just to make sure that we do not experience any issues on Wi-Fi layer.

    Yes, seems I cannot reproduce it unfortunately.

    Have you followed exactly all the steps and also disabled the internal filters and the network applications?

    Shlomi

  • Hi Shlomi,

    I followed exactly all steps from the User's guide (swru455m.pdf). Internal filters and network applications are disabled.

    The only difference from the User's Guide is that I first check which network applications are active using
    sl_NetAppGet (SL_NETAPP_STATUS, SL_NETAPP_STATUS_ACTIVE_APP, &pOptionLen, (_u8 *)&AppBitMap);
    And then disable them using
    if(AppBitMap) sl_NetAppStop(AppBitMap);

    I do not think it makes any difference.

    Thanks,
    Olek

  • Thanks, you are right. It makes no difference.

    I just wanted to make sure the internal network applications are not busy.