Our devices set up a wireless mesh using a WL1835 connected over SDIO to an i.MX6 processor.
Our platform is based on Linux 4.9. The drivers/net/wireless/ti directory seems to be in sync with the version in the linux stable 4.9 tag.
Firmware versions used for all logs attached are:
[ 171.787880] wlcore: PHY firmware version: Rev 8.2.0.0.240
[ 171.851427] wlcore: firmware booted (Rev 8.9.0.0.76)
However, the issue also seems to be reproducible with the latest firmware version taken from:
git.ti.com/.../
A text dump of the firmware configuration file (wlconf -i /lib/firmware/ti-connectivity/wl18xx-conf.bin -g > ./wlconf.txt) is attached as wlconf.txt. It was configured using the configure_devices.sh script provided by TI. The device has 2 antennas mounted.
We set up the mesh using wpa_supplicant, the TI R8.8 version as built from the upstream_29_rebase branch from git.ti.com/.../
The wpa_supplicant configuration file is attached as wpa_supplicant_mesh.conf. The network is a mesh on channel 6 without SAE enabled.
We configure one node as a mesh gateway, running a DHCP server and NAT rules to an Ethernet interface. The rest of the nodes run a DHCP client and have no Ethernet interface. DHCP is handled by a systemd-networkd config.
The mesh gateway is set to enable root mode and gate announcements.
> iw dev st_wlan0 set mesh_param mesh_hwmp_rootmode 4
> iw dev st_wlan0 set mesh_param mesh_gate_announcements 1
We also disable power save on all devices and enable RTS for all packets (RTS threshold 0):
> iw phy `ls /sys/class/ieee80211/` set rts 0
> iw dev st_wlan0 set power_save off
This works fine until we bring up a lot of nodes that are relatively close together, i.e. close enough for the 10 peer link limit to be reached by one or more nodes.
In that situation the WL1835 firmware sometimes seems to get stuck, where stuck is defined as the value of cat /sys/kernel/debug/ieee80211/phy/wlcore/tx_queue_len continuously going up, reaching several hundred queued messages within a minute or so.
The device does not seem to have entered ELP mode, i.e. /sys/kernel/debug/ieee80211/phy/wlcore/sleep_auth always indicates 0x0.
The situation can be recovered from manually by triggering the wl1835 recovery using:
> echo 0x1 > /sys/kernel/debug/ieee80211/phy/wlcore/sleep_auth
The recovery does not kick in automatically though.
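Since the recovery has to be triggered by hand, a small watchdog could automate the workaround until the root cause is found. The following is only a sketch under assumptions: the two debugfs paths are the ones quoted in this post, the recovery write assumes the manual step is a write of 0x1 to that file, tx_queue_len is assumed to contain a single integer, and the function names, sample counts and threshold are made up for illustration.

```shell
#!/bin/sh
# Hypothetical watchdog sketch. Paths come from the post; function names,
# sampling parameters and the threshold are illustrative assumptions.

QUEUE_FILE="${QUEUE_FILE:-/sys/kernel/debug/ieee80211/phy/wlcore/tx_queue_len}"
RECOVERY_FILE="${RECOVERY_FILE:-/sys/kernel/debug/ieee80211/phy/wlcore/sleep_auth}"

# Succeeds when the samples passed as arguments rise strictly from one to
# the next and the last sample is at least $THRESHOLD (default 100).
queue_is_stuck() {
    threshold="${THRESHOLD:-100}"
    [ "$#" -ge 2 ] || return 1
    prev="$1"
    shift
    for cur in "$@"; do
        [ "$cur" -gt "$prev" ] || return 1
        prev="$cur"
    done
    [ "$prev" -ge "$threshold" ]
}

# Read the queue length $1 times, $2 seconds apart, and print the samples
# on one line separated by spaces.
sample_queue() {
    count="$1"
    interval="$2"
    samples=""
    i=0
    while [ "$i" -lt "$count" ]; do
        samples="$samples $(cat "$QUEUE_FILE")"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    # Unquoted on purpose: collapses the leading space.
    echo $samples
}

# Poll forever; trigger the manual recovery step when the queue looks stuck.
watchdog() {
    while :; do
        # Word splitting of the sample list is intentional here.
        if queue_is_stuck $(sample_queue 5 2); then
            echo "tx queue looks stuck, triggering recovery" >&2
            echo 0x1 > "$RECOVERY_FILE"
        fi
        sleep 10
    done
}

# Only start polling when explicitly asked, so the file can be sourced.
if [ "${1:-}" = "--run" ]; then
    watchdog
fi
```

Requiring a strictly rising series rather than a single high reading avoids firing on a short burst of traffic that drains again by itself.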
The situation becomes very reproducible by just rebooting the gateway or restarting its supplicant.
In one of those reproductions I increased the dynamic debug level of the wlcore driver. The kernel log during that reproduction is attached as wlcore_part.xt. The debug_level change was applied just before bringing up the supplicant, and the following settings were used:
> echo -n 'module wlcore +p' > /sys/kernel/debug/dynamic_debug/control
> echo -n 'module wl18xx +p' > /sys/kernel/debug/dynamic_debug/control
> echo -n 'module mac80211 +p' > /sys/kernel/debug/dynamic_debug/control
> echo -n 'module cfg80211 +p' > /sys/kernel/debug/dynamic_debug/control
> echo 0x1840 > /sys/module/wlcore/parameters/debug_level
> echo 8 > /proc/sys/kernel/printk
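For anyone reading the log, it may help to see which bits the debug_level value 0x1840 actually sets. The helper below is a minimal sketch (the name set_bits is made up) that only does the arithmetic; which debug category each bit stands for should be checked against the DEBUG_* definitions in the wlcore driver source of the kernel in use, which I have not reproduced here.

```shell
#!/bin/sh
# Print the positions of the bits set in a mask, lowest bit first.
# Accepts decimal or 0x-prefixed hexadecimal input.
set_bits() {
    mask=$(($1))
    bit=0
    out=""
    while [ "$mask" -gt 0 ]; do
        if [ $((mask & 1)) -eq 1 ]; then
            out="$out $bit"
        fi
        mask=$((mask >> 1))
        bit=$((bit + 1))
    done
    # Unquoted on purpose: collapses the leading space.
    echo $out
}

set_bits 0x1840   # prints: 6 11 12
```

So 0x1840 is 0x1000 + 0x800 + 0x40, i.e. three debug categories enabled at once.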
In one of those reproductions I enabled a monitor interface above the radio and did a packet capture using tcpdump (started just before bringing up the supplicant); the capture is attached as wireless.cap.
> iw phy phy interface add mon0 type monitor
> ifconfig mon0 up
> tcpdump -i mon0 -n -w /data/wireless.cap
data.tar.gz