WL1835MOD: UDP broadcast messages drop for up to 60 seconds

Part Number: WL1835MOD
Other Parts Discussed in Thread: WL1835
Our engineering team is developing a new product. We noticed that a periodic UDP broadcast sporadically stops being transmitted over WiFi for up to a minute. We're using the WiLink 8 in our design.
Here is a short overview of how our product works, focusing on the parts related to the issue:
Our board design is based on the Beaglebone Black Wireless. We're running Linux on an Octavo OSD3558. The TI WL1835 is connected to the Octavo chip. A wireless access point is running on the device through hostapd.
Another way to connect to the device is over USB. We emulate Ethernet over USB using the Remote Network Driver Interface Specification (RNDIS).
Both the USB and WiFi network interfaces are connected to a bridge interface. A DHCP server is running on this bridge interface to provide clients with an IP address on the network.
On the Linux system there is an application that provides a TCP server for users to connect to and interact with the device. The same application also broadcasts, once per second, a UDP message containing some device information.
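A minimal Python sketch of a once-per-second broadcast sender like the one described above, assuming a plain IPv4 UDP socket. The application's real code is not shown in this thread; the function names are hypothetical and port 1031 is borrowed from the iperf tests later in the thread purely for illustration. The detail that matters is SO_BROADCAST, without which Linux rejects sends to 255.255.255.255.

```python
import socket
import time

# Hypothetical destination: the limited broadcast address used in the thread,
# with port 1031 borrowed from the iperf tests (not the application's real port).
BROADCAST_ADDR = ("255.255.255.255", 1031)


def make_broadcast_socket() -> socket.socket:
    """Create an IPv4 UDP socket that is allowed to send to broadcast addresses."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Linux refuses sendto() on a broadcast address (EACCES) unless SO_BROADCAST is set.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return sock


def broadcast_periodically(payload: bytes, dest=BROADCAST_ADDR,
                           interval: float = 1.0, count=None) -> int:
    """Send payload to dest every `interval` seconds.

    Runs forever when count is None; otherwise stops after `count` sends
    and returns the number of datagrams sent.
    """
    sent = 0
    with make_broadcast_socket() as sock:
        while count is None or sent < count:
            sock.sendto(payload, dest)
            sent += 1
            if count is None or sent < count:
                time.sleep(interval)
    return sent
```

Calling broadcast_periodically(b'{"status":0}\n') would send the status line every second until interrupted; the count argument exists mainly so the loop can be exercised in a test.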
Now to our issue: if the user (a PC) is connected to the device through WiFi, the UDP message is sporadically not received on the PC. Sometimes all UDP broadcast messages get dropped for up to 60 seconds.
I confirmed that the message is created and sent by our application, and I also confirmed with Wireshark that the message is sent out by the network interface. I am fairly certain that the issue is related to the WiLink 8 firmware; more on this below.
The issue does not occur if the device is connected to a PC through USB. Tests have confirmed this.
The issue occurs more often if there are frequent disconnects/reconnects to the device's WiFi access point. I wrote a Python script that reconnects every 30 seconds and listens for the UDP message. With it, I am able to reproduce the issue fairly consistently.
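The reconnect half of such a script could look roughly like the sketch below. It assumes NetworkManager on the PC and simply wraps the same nmcli con down/up commands used in the bash script later in the thread; the function names are hypothetical, and the run/sleep parameters only exist so the loop can be tested without touching the network. Note that check=True makes the down step raise if the connection is not active.

```python
import subprocess
import time


def reconnect(connection: str, run=subprocess.run, sleep=time.sleep,
              settle: float = 1.0) -> None:
    """Take a NetworkManager connection down and bring it back up."""
    for action in ("down", "up"):
        # Equivalent to running `nmcli con down <name>` then `nmcli con up <name>`.
        run(["nmcli", "con", action, connection], check=True)
        sleep(settle)


def reconnect_loop(connection: str, period: float = 30.0, cycles=None, **kwargs) -> int:
    """Reconnect every `period` seconds; stop after `cycles` reconnects if given."""
    done = 0
    sleep = kwargs.get("sleep", time.sleep)
    while cycles is None or done < cycles:
        sleep(period)
        reconnect(connection, **kwargs)
        done += 1
    return done
```

In practice reconnect_loop("AML_BEAG04") would run alongside a UDP listener that records when status messages arrive.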
Here is why I think the issue is related to the WiLink 8 firmware: the behaviour is different depending on the firmware version.
I initially started testing with Rev 8.9.0.0.69. Whenever the UDP messages started to get dropped, I tried to send characters over SSH to see if it's a general connection issue or if the issue is solely with the UDP broadcast. As soon as I did this, the wilink driver's software watchdog would trigger and restart the device. After hardware recovery, I'd receive all the previously missing UDP messages at once. Here is the error log:
[ 2508.883177] wlcore: ERROR SW watchdog interrupt received! starting recovery.
[ 2508.890336] ------------[ cut here ]------------
[ 2508.895295] WARNING: CPU: 0 PID: 138 at drivers/net/wireless/ti/wlcore/main.c:796 wl12xx_queue_recovery_work+0x80/0x84 [wlcore]
[ 2508.906893] Modules linked in: ccm arc4 wlcore_sdio wl18xx wlcore mac80211 cfg80211 nls_ascii nls_cp437
[ 2508.916802] CPU: 0 PID: 138 Comm: irq/46-wl18xx Tainted: G        W       4.9.56-AML #1
[ 2508.924856] Hardware name: Generic AM33XX (Flattened Device Tree)
[ 2508.931033] [<c01111a4>] (unwind_backtrace) from [<c010d7dc>] (show_stack+0x20/0x24)
[ 2508.938839] [<c010d7dc>] (show_stack) from [<c04bc9b8>] (dump_stack+0x24/0x28)
[ 2508.946121] [<c04bc9b8>] (dump_stack) from [<c0135614>] (__warn+0xec/0x110)
[ 2508.953139] [<c0135614>] (__warn) from [<c0135750>] (warn_slowpath_null+0x30/0x38)
[ 2508.960900] [<c0135750>] (warn_slowpath_null) from [<bf18630c>] (wl12xx_queue_recovery_work+0x80/0x84 [wlcore])
[ 2508.971304] [<bf18630c>] (wl12xx_queue_recovery_work [wlcore]) from [<bf186860>] (wlcore_irq+0x1f4/0x238 [wlcore])
[ 2508.981824] [<bf186860>] (wlcore_irq [wlcore]) from [<c0180d04>] (irq_thread_fn+0x2c/0x64)
[ 2508.990165] [<c0180d04>] (irq_thread_fn) from [<c0180fbc>] (irq_thread+0x14c/0x20c)
[ 2508.997886] [<c0180fbc>] (irq_thread) from [<c01574e0>] (kthread+0x120/0x128)
[ 2509.005077] [<c01574e0>] (kthread) from [<c0109218>] (ret_from_fork+0x14/0x3c)
[ 2509.012347] ---[ end trace 49f3e48f119ca9fd ]---
[ 2509.018638] wlcore: Hardware recovery in progress. FW ver: Rev 8.9.0.0.69
[ 2509.026791] wlcore: pc: 0x109650, hint_sts: 0x00000000 count: 3
[ 2509.033557] wlcore: down
[ 2509.037593] ieee80211 phy0: Hardware restart was requested
[ 2509.469982] wlcore: PHY firmware version: Rev 8.2.0.0.236
[ 2509.518193] wlcore: firmware booted (Rev 8.9.0.0.69)
I noticed that there's a new firmware version available online that's supposed to fix frequent SW watchdog interrupts. I updated to Rev 8.9.0.0.79.
The new firmware version did not fix the issue with the UDP broadcast, but the behaviour changed. The periodic UDP messages still drop out, but sending data over SSH no longer triggers the SW watchdog. Also, the missed UDP messages are now simply lost instead of all being delivered at once after recovery. SSH continues to work when there's no UDP broadcast being sent.
I performed another test where I sent the same UDP message to the PC's IP address instead of broadcasting it. This worked fine and the stream never dropped.
Is this a known issue to TI? Are there any known fixes?
Please let me know if more details are required.
  • Hi, 

    Thanks for sending the information! You mentioned updating the FW. Did you also update the driver? Please let me know. 

    If I understand correctly, you are using the WL8 device as an AP that sends broadcast UDP messages to the connected clients once every second? Also, please let us know how many clients are connected. If there is more than one, do all of the clients lose the packets? 

    Regards, 

    Sudharshan K N

  • Hi Sudharshan,

    No, I did not update the driver. I am trying to find out which driver version I have, but as far as I can tell there's no way to determine it unless I build it with the TI-provided build scripts.

    I'm using the in-tree driver from the 4.9.56 Linux kernel.

    I'm not sure how I'm supposed to update my driver version, as the build scripts don't seem to support kernel version 4.9.

    Your summary of my issue is correct. There's only one client connected to the device. I haven't done any tests with multiple connected clients, but I can if necessary.

    Thank you,

    Andi

  • Hi, 

    Let me check whether a driver update is needed for 4.9.56. Can you please let us know how the traffic is generated, so that we can reproduce the behavior on our setups? 

    Regards, 

    Sudharshan K N

  • Hi Sudharshan,

    Our application sends a UDP broadcast to 255.255.255.255 through a socket once a second.

    The message payload looks similar to this:
    {"instrument":{"wifi":{"ip":"172.18.127.2","ssid":"AML_BEAG04"},"location":{"lon":0.00000,"lat":0.00000,"sat":6,"hdop":2.23,"tlf":0},"time":"12:17:39","date":"2020-03-09","bat":{"volt":8.117695,"charging":0},"status":0,"water":0},"sensor":[{"param":[{"name":"Pres","unit":"dbar","value":0.008084}]}]}\n
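A small sketch of how a payload like this could be encoded and decoded, assuming it is compact JSON terminated by a newline as shown. The function names are hypothetical. The size check is an added safeguard: 1472 bytes is 1500 minus the 20-byte IPv4 header (no options) and the 8-byte UDP header, so a message this size fits in one datagram without IP fragmentation on a typical 1500-byte MTU.

```python
import json

# Largest UDP payload that fits in a typical 1500-byte MTU without IP
# fragmentation: 1500 - 20 (IPv4 header, no options) - 8 (UDP header).
MAX_UDP_PAYLOAD = 1500 - 20 - 8


def encode_status(status: dict) -> bytes:
    """Serialize a status dict as compact JSON terminated by a newline."""
    # separators=(",", ":") drops the spaces json.dumps adds by default,
    # matching the compact layout of the payload above.
    data = (json.dumps(status, separators=(",", ":")) + "\n").encode("utf-8")
    if len(data) > MAX_UDP_PAYLOAD:
        raise ValueError(f"status message is {len(data)} bytes and would be fragmented")
    return data


def decode_status(datagram: bytes) -> dict:
    """Parse one received datagram back into a dict."""
    return json.loads(datagram.decode("utf-8").rstrip("\n"))
```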

    Do you need some example code?

    Andi

  • Hi, 

    Let me check whether I can generate the same traffic with iPerf, which is easier for us to use. I will get back to you with my test results. 

    Regards, 

    Sudharshan K N

  • I also wrote a Python script to reproduce the issue. It reconnects to the WiFi AP every 30 seconds, listens for the UDP message, and notifies the user if the UDP message stops being broadcast.

    I can provide the script to you if you'd like to use it. I'm running it on Ubuntu using nmcli for the WiFi reconnects, so it might not work properly on a Windows machine.
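The listening and notification part of such a script could be sketched as follows. This is not Andreas's script; it assumes the status message arrives about once per second on a single UDP port, and the function names and the 1-second tolerance are hypothetical choices. find_gaps also reports silence at the start or end of the observation window, which matters when the stream stops entirely.

```python
import socket
import time


def find_gaps(arrivals, start, end, expected_interval=1.0, tolerance=1.0):
    """Return (from, to) pairs where no message arrived for longer than
    expected_interval + tolerance seconds, including silence at the
    beginning or end of the [start, end] observation window."""
    limit = expected_interval + tolerance
    points = [start, *sorted(arrivals), end]
    return [(a, b) for a, b in zip(points, points[1:]) if b - a > limit]


def listen(port, duration, recv_timeout=0.5):
    """Record monotonic arrival times of UDP datagrams on `port` for `duration` s.

    Returns (start, end, arrivals) so the result can be passed to find_gaps.
    """
    arrivals = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))  # all local addresses, so broadcasts are delivered too
        sock.settimeout(recv_timeout)
        start = time.monotonic()
        while time.monotonic() - start < duration:
            try:
                sock.recvfrom(2048)
            except socket.timeout:
                continue
            arrivals.append(time.monotonic())
    return start, time.monotonic(), arrivals
```

For example, start, end, arrivals = listen(1031, 30) followed by find_gaps(arrivals, start, end) would list every dropout longer than two seconds during one 30-second connection.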

  • Hi, 

    If the client is already connected, is there an option to use multicast messages instead of broadcast? A bridge address can be set up for both devices and messages can be routed to the bridge address. Let me know your thoughts.

    Regards, 

    Sudharshan K N

  • Hi Sudharshan,

    I changed the broadcast to a multicast today and did some testing.

    The exact same issue exists when I use multicast instead of broadcast with the WiLink8. The messages can drop for up to 60 seconds.
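For reference, a switch from broadcast to multicast on the socket level might look like the sketch below. It assumes the group 239.255.1.3 and port 1031 from the iperf tests in this thread; the helper names are hypothetical. Setting IP_MULTICAST_IF is one alternative to the route add -host approach used in the thread for choosing the outgoing interface, and the receiver joins the group with a struct ip_mreq.

```python
import socket
import struct

GROUP = "239.255.1.3"  # group used in the iperf tests in this thread
PORT = 1031


def make_multicast_sender(ttl: int = 1, iface_addr=None) -> socket.socket:
    """UDP socket for sending to a multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # TTL 1 keeps the packets on the local link.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    if iface_addr:
        # Select the outgoing interface by its IPv4 address.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                        socket.inet_aton(iface_addr))
    return sock


def membership_request(group: str, iface_addr: str = "0.0.0.0") -> bytes:
    """Build a struct ip_mreq (group address, local interface address)."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_addr))


def make_multicast_receiver(group=GROUP, port=PORT, iface_addr="0.0.0.0") -> socket.socket:
    """UDP socket bound to `port` that has joined `group`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, iface_addr))
    return sock
```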

    Have you been able to reproduce the issue on your end?

    Andreas

  • Hi Andreas, 

    I tested AP-STA mode using multicast, but only for a few minutes. I am using iPerf for the tests. Below are the details of the testing:

    route add -host 239.255.1.3 <AP interface> 

    route add -host 239.255.1.3 wlan1 <-- for my example on AP side

    route add -host 239.255.1.3 wlan0 <-- on station side 

    iperf -s -B 239.255.1.3 -u -f m -i 5 & --> on AP side

    iperf -c 239.255.1.3 -u -b 10M -f m -i 5 -t 30 -S 0x10  & --> on station side

    Here iperf reports statistics every 5 seconds (-i 5). I can increase this to 60 s and check. The generated traffic is UDP. 

    Can you please let me know how long it takes for the issue to reproduce? I can run a longer test and check whether I see the issue. 

    Regards, 

    Sudharshan K N

  • Hi Sudharshan,

    The issue occurs mainly if the client frequently disconnects from and reconnects to the AP.

    I can try reproducing the issue on my end using the iperf commands in your last post.

    Right now I use a Python script to automate the WiFi reconnects. I could use a bash script instead. Do you have nmcli available on your machines that you do the testing on?

    Andreas

  • I was able to reproduce the same issue with iperf. The issue can be reproduced within 10 minutes of running the scripts.

    This is one example log where the packets were lost:

    ------------------------------------------------------------
    Server listening on UDP port 1031
    Binding to local address 239.255.1.3
    Joining multicast group 239.255.1.3
    Receiving 1470 byte datagrams
    UDP buffer size: 208 KByte (default)
    ------------------------------------------------------------
    [ 3] local 239.255.1.3 port 1031 connected with 172.18.127.2 port 44665
    [ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
    [ 3] 0.0- 1.0 sec 1.44 KBytes 11.8 Kbits/sec 0.000 ms 215/ 216 (1e+02%)
    [ 3] 1.0- 2.0 sec 1.44 KBytes 11.8 Kbits/sec 0.107 ms 0/ 1 (0%)
    [ 3] 2.0- 3.0 sec 1.44 KBytes 11.8 Kbits/sec 0.117 ms 0/ 1 (0%)
    [ 3] 3.0- 4.0 sec 1.44 KBytes 11.8 Kbits/sec 0.134 ms 0/ 1 (0%)
    [ 3] 4.0- 5.0 sec 1.44 KBytes 11.8 Kbits/sec 0.188 ms 0/ 1 (0%)
    [ 3] 5.0- 6.0 sec 1.44 KBytes 11.8 Kbits/sec 0.178 ms 0/ 1 (0%)
    [ 3] 6.0- 7.0 sec 1.44 KBytes 11.8 Kbits/sec 0.248 ms 0/ 1 (0%)
    [ 3] 7.0- 8.0 sec 1.44 KBytes 11.8 Kbits/sec 0.239 ms 0/ 1 (0%)
    [ 3] 8.0- 9.0 sec 0.00 KBytes 0.00 Kbits/sec 0.000 ms 0/ 0 (0%)
    [ 3] 9.0-10.0 sec 0.00 KBytes 0.00 Kbits/sec 0.000 ms 0/ 0 (0%)
    [ 3] 10.0-11.0 sec 0.00 KBytes 0.00 Kbits/sec 0.000 ms 0/ 0 (0%)
    [ 3] 11.0-12.0 sec 0.00 KBytes 0.00 Kbits/sec 0.000 ms 0/ 0 (0%)
    [ 3] 12.0-13.0 sec 0.00 KBytes 0.00 Kbits/sec 0.000 ms 0/ 0 (0%)
    [ 3] 13.0-14.0 sec 0.00 KBytes 0.00 Kbits/sec 0.000 ms 0/ 0 (0%)
    [ 3] 14.0-15.0 sec 1.44 KBytes 11.8 Kbits/sec 0.257 ms 6/ 7 (86%)
    [ 3] 15.0-16.0 sec 0.00 KBytes 0.00 Kbits/sec 0.000 ms 0/ 0 (0%)
    [ 3] 16.0-17.0 sec 0.00 KBytes 0.00 Kbits/sec 0.000 ms 0/ 0 (0%)
    [ 3] 17.0-18.0 sec 0.00 KBytes 0.00 Kbits/sec 0.000 ms 0/ 0 (0%)
    [ 3] 18.0-19.0 sec 0.00 KBytes 0.00 Kbits/sec 0.000 ms 0/ 0 (0%)
    [ 3] 19.0-20.0 sec 0.00 KBytes 0.00 Kbits/sec 0.000 ms 0/ 0 (0%)
    [ 3] 20.0-21.0 sec 0.00 KBytes 0.00 Kbits/sec 0.000 ms 0/ 0 (0%)
    [ 3] 21.0-22.0 sec 0.00 KBytes 0.00 Kbits/sec 0.000 ms 0/ 0 (0%)
    [ 3] 22.0-23.0 sec 0.00 KBytes 0.00 Kbits/sec 0.000 ms 0/ 0 (0%)
    [ 3] 23.0-24.0 sec 0.00 KBytes 0.00 Kbits/sec 0.000 ms 0/ 0 (0%)
    [ 3] 24.0-25.0 sec 0.00 KBytes 0.00 Kbits/sec 0.000 ms 0/ 0 (0%)
    [ 3] 25.0-26.0 sec 0.00 KBytes 0.00 Kbits/sec 0.000 ms 0/ 0 (0%)
    [ 3] 26.0-27.0 sec 0.00 KBytes 0.00 Kbits/sec 0.000 ms 0/ 0 (0%)
    [ 3] 27.0-28.0 sec 0.00 KBytes 0.00 Kbits/sec 0.000 ms 0/ 0 (0%)
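A quick way to pull the outages out of a long iperf server log like the one above is to flag every report interval with zero datagrams and merge consecutive ones. The sketch below assumes the iperf2 UDP server line format shown in this log; the regular expression and function names are hypothetical.

```python
import re

# Matches iperf2 UDP server interval lines such as
# "[ 3] 8.0- 9.0 sec 0.00 KBytes 0.00 Kbits/sec 0.000 ms 0/ 0 (0%)"
# and captures: interval start, interval end, lost, total.
INTERVAL_RE = re.compile(
    r"\[\s*\d+\]\s+([\d.]+)-\s*([\d.]+)\s+sec.*?(\d+)/\s*(\d+)\s+\("
)


def silent_intervals(lines):
    """Return (start, end) for every report interval with zero datagrams."""
    silent = []
    for line in lines:
        m = INTERVAL_RE.search(line)
        if m and int(m.group(4)) == 0:
            silent.append((float(m.group(1)), float(m.group(2))))
    return silent


def merge(intervals):
    """Merge back-to-back silent intervals into longer outages."""
    merged = []
    for start, end in intervals:
        if merged and merged[-1][1] == start:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged
```

Applied to the log above, this would report one outage from 8.0 s to 14.0 s and another starting at 15.0 s that lasts until the end of the capture.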

    To reflect my application better I made some minor adjustments to the iperf commands.

    This is the bash script that I'm running on my PC. It reconnects to the WiFi AP every 30 seconds and starts the iperf server that will receive the UDP multicast packets.

    #!/bin/bash
    
    WIFI_AP="AML_BEAG04"
    IPERF_CMD="iperf -s -B 239.255.1.3 -u -f k -i 1 -p 1031"
    WLAN_IF="wlp2s0"
    
    
    while true
    do
            nmcli con up "${WIFI_AP}"
            sleep 1
            # Route the multicast group via the WiFi interface (requires root)
            route add -host 239.255.1.3 "$WLAN_IF"
            $IPERF_CMD &
            IPERF_PID=$!
            sleep 30
            # Kill only the iperf server started above
            kill -SIGKILL "$IPERF_PID"
            sleep 1
            nmcli con down "${WIFI_AP}"
            sleep 1
    done
    

    These are the commands that I'm running on my device with the WiLink 8:

    route add -host 239.255.1.3 wlan0
    iperf -c 239.255.1.3 -u -b 1K -i 1 -t 600 -p 1031 -f k

    I start the bash script first and wait until I'm connected to WiFi and iperf is running. Then I start the iperf client.

    The iperf client runs continuously, while the iperf server is killed every 30 seconds before the WiFi reconnect.
    Because the client runs continuously, the server reports lost packets after every iperf restart.

    I was able to reproduce the issue with these commands/scripts within 5 minutes.

    Andreas

  • Hi, 

    I am using 2 WiLink8 setups for testing. One is used in AP mode and the other in station mode. The traffic is generated using iPerf. 

    Regards, 

    Sudharshan K N

  • Hi Sudharshan,

    Did you see my latest post? I included detailed instructions on how to reproduce the issue.

    Have you been able to reproduce the problem yet using my explanation?

    Thank you,

    Andreas

  • Hi, 

    Currently you are using FW Rev 8.9.0.0.69. Can you please try the latest release, R8.7_SP3, and check whether the problem still exists? 

    Also, is there a reason for the disconnects? 

    Regards, 

    Sudharshan K N

  • Hi,

    I am using FW Rev 8.9.0.0.79, which is the latest version.

    I am unable to update the driver to R8.7_SP3 because the TI build scripts don't support Linux kernel 4.9.56, as I mentioned earlier. As far as I understand, TI recommends the in-tree driver for this kernel version.

    As I mentioned from the beginning, the issue mainly occurs and is easier to reproduce on frequent WiFi station reconnects. This is the reason for the disconnects.

    Are you still working on reproducing the issue?

    Regards,

    Andreas

  • Hi, 

    Sorry I missed the previous reply. I am looking into it and will get back to you. 

    Regards, 

    Sudharshan K N

  • Hi, 

    I am able to reproduce the issue, and ifconfig wlan1 down/up alone is not enough to bring the AP interface back up. The commands required to bring the AP back up are contained in the ap_start.sh and ap_stop.sh scripts. These scripts are located in the /usr/share/wl18xx/ folder within the AM335x SDK installation. I have attached the contents of these files below. 

    After ifconfig wlan1 down, run the ap_start.sh commands to enable the AP again. Then add the route and start the iPerf server again. You should be able to see the broadcast messages. 

    ap_start.sh 

    root@am335x-evm:/usr/share/wl18xx# cat ap_start.sh
    #!/bin/sh

    ########## variables ##########

    WLAN=wlan1
    WLAN2=wlan2
    HOSTAPD_PROC=/var/run/hostapd
    HOSTAPD_CONF=/usr/share/wl18xx/hostapd.conf
    #HOSTAPD_BIN_DIR=/usr/local/bin
    HOSTAPD_BIN_DIR=/usr/sbin
    IP_ADDR=192.168.43.1
    IP_ADDR2=192.168.53.1
    DHCP_CONF=udhcpd.conf
    DHCP_CONF2=udhcpd2.conf
    DHCP_CONF_PROC=u[d]hcpd.conf
    DHCP_CONF_PROC2=u[d]hcpd2.conf

    ########## body ##########

    ### check for configuration file
    if [ ! -f $HOSTAPD_CONF ]; then
    if [ ! -f /etc/hostapd.conf ]
    then
    echo "error - no default hostapd.conf file"
    exit 1
    fi
    cp /etc/hostapd.conf $HOSTAPD_CONF
    chmod 777 $HOSTAPD_CONF
    fi

    ### configure ip forewarding
    echo 1 > /proc/sys/net/ipv4/ip_forward

    ### add WLAN interface, if not present
    if [ ! -d /sys/class/net/$WLAN ]
    then
    echo "adding $WLAN interface"
    iw phy `ls /sys/class/ieee80211/` interface add $WLAN type managed
    fi

    ### start a hostapd interface, if not present
    if [ ! -r $HOSTAPD_PROC ]
    then
    $HOSTAPD_BIN_DIR/hostapd $HOSTAPD_CONF &
    sleep 1
    fi

    ### configure ip
    ifconfig $WLAN $IP_ADDR netmask 255.255.255.0 up
    if [ -d /sys/class/net/$WLAN2 ]
    then
    ifconfig $WLAN2 $IP_ADDR2 netmask 255.255.255.0 up
    fi

    ### start udhcpd server, if not started
    output=`ps | grep /usr/share/wl18xx\$DHCP_CONF_PROC`
    set -- $output
    echo $output
    if [ -z "$output" ]; then
    udhcpd $DHCP_CONF
    fi

    if [ -d /sys/class/net/$WLAN2 ]
    then
    output=`ps | grep /usr/share/wl18xx\$DHCP_CONF_PROC2`
    set -- $output
    echo $output
    if [ -z "$output" ]; then
    udhcpd $DHCP_CONF2
    fi
    fi

    ### configure nat
    iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

    root@am335x-evm:/usr/share/wl18xx#

  • Hi,

    So your suggestion is that I restart the WiFi AP using the provided script whenever the UDP dropout occurs?

    How can the device with the WiLink8 determine if the WiLink8 is still sending out multicast messages? Does the driver have support for such a feature?

    It is possible that an end user is connected to our device who does not have SSH access and can't power cycle the system. What is the recommended solution for this case?

    Is there a fix for this issue planned in the firmware or driver and is there an ETA for a firmware fix?

    Thank you.

  • Hi, 

    The original logs posted in this thread indicate that there was a FW recovery:

    [ 2509.018638] wlcore: Hardware recovery in progress. FW ver: Rev 8.9.0.0.69
    [ 2509.026791] wlcore: pc: 0x109650, hint_sts: 0x00000000 count: 3
    [ 2509.033557] wlcore: down
    [ 2509.037593] ieee80211 phy0: Hardware restart was requested
    [ 2509.469982] wlcore: PHY firmware version: Rev 8.2.0.0.236
    [ 2509.518193] wlcore: firmware booted (Rev 8.9.0.0.69)

    As mentioned before, please update the drivers to the latest R8.7_SP3 version (FW v8.9.0.0.79) and let us know whether the issue still exists. I am not able to reproduce the UDP packet loss at our end without forcing a shutdown of the interface and then restarting it. 
    Regards, 
    Sudharshan K N
  • Hi,

    Oh, there must have been a misunderstanding then. I was under the impression that you were able to reproduce the issue, as stated in your last post.

    The original logs only apply to FW rev 8.9.0.0.69 and not to 8.9.0.0.79.

    Can you advise please on how I can compile the latest driver version with my version of the Linux kernel? Is there an update to the TI build tools?

  • Hi, 

    If you are using the AM335x, you can download the latest SDK from TI (http://software-dl.ti.com/processor-sdk-linux/esd/AM335X/latest/index_FDS.html) and use its in-tree drivers, which are updated to the latest R8.7_SP3. 

    Regards, 

    Sudharshan K N

  • Hi,

    I'm unable to use the SDK. We're building our custom Linux image using buildroot.

    Is there an alternative?

  • Hi, 

    You can try the steps described for the build scripts here: https://processors.wiki.ti.com/index.php/WL18xx_System_Build_Scripts

    Also, please let us know the kernel version used in your builds. 

    Regards, 

    Sudharshan K N

  • Hi,

    I've tried using the build scripts, but had issues.

    I did a quick search in the forum and came across some posts from TI employees that stated they don't support my kernel version.

    I am using kernel 4.9.56.

  • Hi, 

    Sorry for the delayed response. 

    For kernel 4.9 there is no need to use the backports. Can you please try building the individual components separately?

    1. ./sudo_build_wl18xx.sh openssl
    2. ./sudo_build_wl18xx.sh libnl
    3. ./sudo_build_wl18xx.sh iw
    4. ./sudo_build_wl18xx.sh wpa_supplicant
    5. ./sudo_build_wl18xx.sh hostapd
    6. ./sudo_build_wl18xx.sh firmware --> install the FW
    7. ./sudo_build_wl18xx.sh crda

    Regards, 

    Sudharshan K N

  • Hi Sudharshan,

    We replaced the UDP broadcast with a TCP subscription service to work around the issue.
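The thread does not describe how the TCP subscription service is built, so the following is only a hypothetical Python sketch of the idea: clients connect to a TCP port and each published status line is sent to every connected client, with clients whose socket fails being dropped. The class name, default port, and method names are all assumptions. One plausible reason this sidesteps the problem is that unicast 802.11 frames are acknowledged and retried by the MAC, while broadcast and multicast frames are not, which is also consistent with the earlier observation that unicast UDP to the PC never dropped.

```python
import socket
import threading


class StatusPublisher:
    """Hypothetical TCP 'subscription' service: every connected client
    receives each newline-terminated status message that is published."""

    def __init__(self, host: str = "0.0.0.0", port: int = 0):
        # port=0 lets the OS pick a free port; a real device would use a fixed one.
        self._server = socket.create_server((host, port))
        self.port = self._server.getsockname()[1]
        self._clients = []
        self._cond = threading.Condition()
        threading.Thread(target=self._accept_loop, daemon=True).start()

    def _accept_loop(self) -> None:
        while True:
            try:
                conn, _addr = self._server.accept()
            except OSError:
                return  # listening socket is no longer usable
            with self._cond:
                self._clients.append(conn)
                self._cond.notify_all()

    def wait_for_subscribers(self, n: int, timeout=None) -> bool:
        """Block until at least n clients are subscribed or the timeout expires."""
        with self._cond:
            return self._cond.wait_for(lambda: len(self._clients) >= n, timeout)

    def publish(self, message: bytes) -> int:
        """Send message to every subscriber, dropping clients whose socket failed."""
        delivered = 0
        with self._cond:
            alive = []
            for conn in self._clients:
                try:
                    conn.sendall(message)
                except OSError:
                    conn.close()
                    continue
                alive.append(conn)
                delivered += 1
            self._clients = alive
        return delivered

    def close(self) -> None:
        with self._cond:
            for conn in self._clients:
                conn.close()
            self._clients = []
        self._server.close()
```

Because sendall blocks while the lock is held, one slow subscriber would delay everyone; a production service would more likely use per-client queues or non-blocking I/O.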

    Thank you for your help on this topic.

    Andreas