TIDC-WL1837MODCOM8I: Mesh + Station mode causes wlcore resets or poor ping responses

Part Number: TIDC-WL1837MODCOM8I
Other Parts Discussed in Thread: WL1837MOD, AM4372

Hello,

I am evaluating the feasibility of using a WL1837-based solution for our requirements.

The typical use case is to extend the range of an AP using Mesh + Station mode (as in section 5.3 of the WiLink8 WLAN Software document SWAA166).

Configurations Tested

AM437x GP EVM with WL1837MOD

BeagleBone Black Wireless

BeagleBone Green Wireless

Processor SDK 6.0.0.7 for AM437x

Three different images, namely:

Pre-built images from the SDK, with the kernel reconfigured using the top-level makefile

Arago base image using TI SDK / Yocto

Custom distribution using Yocto

Steps to reproduce

Power up the EVM

Connect to an AP using sta_start.sh with AP information in wpa_supplicant.conf

After verifying internet connectivity, create the mesh and join it using:

/usr/share/wl18xx/mesh_start.sh

/usr/share/wl18xx/mesh_join.sh

Once the mesh connection is successful, ensure the iptables rules are updated.

Start a ping test to an external IP with a minimum count of 50.

Observations

(Similar to the question at e2e.ti.com/.../691084)

In the office environment, I observed frequent wlcore resets.

--- www.ti.com ping statistics ---
50 packets transmitted, 24 packets received, 52% packet loss
round-trip min/avg/max = 284.033/377.893/924.620 ms
BBGW>[  323.689308] wlcore: Scan completed due to error.
[  323.694010] ------------[ cut here ]------------
[  323.702040] WARNING: CPU: 0 PID: 7 at drivers/net/wireless/ti/wlcore/main.c:808 wl12xx_queue_recovery_work+0x68/0x6c [wlcore]
[  323.715049] Modules linked in: xfrm_user xfrm4_tunnel ipcomp xfrm_ipcomp esp4 ah4 af_key xfrm_algo ctr aes_arm_bs crypto_simd cryptd ccm usb_f_acm u_serial usb_f_ecm g_multi usb_f_mass_storage usb_f_rndis u_ether libcomposite arc4 wl18xx wlcore mac80211 pru_rproc irq_pruss_intc sha256_generic sha256_arm pruss cfg80211 musb_dsps musb_hdrc udc_core phy_am335x phy_am335x_control phy_generic pm33xx wkup_m3_rproc wkup_m3_ipc remoteproc omap_aes_driver crypto_engine omap_crypto omap_sham pruss_soc_bus ti_emif_sram wlcore_sdio at24 rtc_omap musb_am335x omap_wdt sch_fq_codel uio_module_drv(O) uio ftdi_sio usbserial usbcore usb_common cryptodev(O)
[  323.776364] CPU: 0 PID: 7 Comm: kworker/u2:0 Tainted: G        W  O      4.19.38-g4dae378bbe #1
[  323.786800] Hardware name: Generic AM33XX (Flattened Device Tree)
[  323.794945] Workqueue: phy0 wl1271_scan_complete_work [wlcore]
[  323.802309] Backtrace: 
[  323.805039] [<c010cb64>] (dump_backtrace) from [<c010ced4>] (show_stack+0x18/0x1c)
[  323.813696]  r7:00000009 r6:00000000 r5:bf2e8180 r4:00000000
[  323.819977] [<c010cebc>] (show_stack) from [<c08a9954>] (dump_stack+0x24/0x28)
[  323.827338] [<c08a9930>] (dump_stack) from [<c012aa60>] (__warn+0xe0/0xf8)
[  323.835108] [<c012a980>] (__warn) from [<c012aac0>] (warn_slowpath_null+0x48/0x50)
[  323.843382]  r9:d8eaabd0 r8:d8c18d20 r7:d8c18d58 r6:bf2d269c r5:00000328 r4:bf2e8180
[  323.851940] [<c012aa78>] (warn_slowpath_null) from [<bf2d269c>] (wl12xx_queue_recovery_work+0x68/0x6c [wlcore])
[  323.862812]  r6:00000000 r5:c0d03048 r4:d8c18d20
[  323.867702] [<bf2d2634>] (wl12xx_queue_recovery_work [wlcore]) from [<bf2e550c>] (wl1271_scan_complete_work+0x104/0x18c [wlcore])
[  323.880867]  r5:c0d03048 r4:d8c19028
[  323.884738] [<bf2e5408>] (wl1271_scan_complete_work [wlcore]) from [<c0142480>] (process_one_work+0x210/0x430)
[  323.895965]  r9:00000001 r8:dc005000 r7:00000000 r6:db47f200 r5:dc03d200 r4:d8c19028
[  323.904414] [<c0142270>] (process_one_work) from [<c0142720>] (worker_thread+0x80/0x674)
[  323.913219]  r10:dc005000 r9:dc005014 r8:ffffe000 r7:c0d14a20 r6:dc03d214 r5:dc005000
[  323.921509]  r4:dc03d200
[  323.924073] [<c01426a0>] (worker_thread) from [<c0148798>] (kthread+0x158/0x160)
[  323.932117]  r10:dc061e70 r9:c01426a0 r8:dc03d200 r7:dc072000 r6:00000000 r5:dc02ea00
[  323.940315]  r4:dc02eb40
[  323.942871] [<c0148640>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
[  323.950939] Exception stack(0xdc073fb0 to 0xdc073ff8)
[  323.956022] 3fa0:                                     00000000 00000000 00000000 00000000
[  323.964928] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  323.973570] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  323.980561]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c0148640
[  323.988739]  r4:dc02ea00
[  323.992016] ---[ end trace 78f7729a1924f8bb ]---
[  323.996796] wlcore: Hardware recovery in progress. FW ver: Rev 8.9.0.99.79
[  324.005059] wlcore: pc: 0x0, hint_sts: 0x00000040 count: 2
[  324.011452] wlcore: down
[  324.014032] wlcore: down
[  324.016809] ieee80211 phy0: Hardware restart was requested
[  324.506202] wlcore: PHY firmware version: Rev 8.2.0.0.242
[  324.549866] wlcore: firmware booted (Rev 8.9.0.99.79)
[  324.608028] wlcore: Association completed.

BBGW>
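
To put a number on how often this happens, the recovery events can be counted from the kernel log. The following is only a rough sketch: `count_wlcore_recoveries` is a hypothetical helper name, and it assumes the driver prints the same "Hardware recovery in progress" message as in the log above.

```shell
#!/bin/sh
# Hypothetical helper: count wlcore recovery events in kernel log text read on stdin.
# It matches the "Hardware recovery in progress" message seen in the log above.
count_wlcore_recoveries() {
    grep -c 'wlcore: Hardware recovery in progress'
}

# Example usage on the target:
#   dmesg | count_wlcore_recoveries
```

Comparing this count before and after a test run gives the number of resets that occurred during the run.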

In the home environment, I did not observe wlcore resets, but ping responses were poor.

--- www.ti.com ping statistics ---
100 packets transmitted, 78 packets received, 22% packet loss
round-trip min/avg/max = 6.032/883.139/7027.528 ms
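
To compare runs between the two environments, the packet-loss figure can be extracted from the ping summary. This is a minimal sketch, assuming the summary line has the format shown above; `ping_loss_percent` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the packet-loss percentage from ping output read on stdin.
# Assumes a summary line such as:
#   "50 packets transmitted, 24 packets received, 52% packet loss"
ping_loss_percent() {
    sed -n 's/.* \([0-9][0-9]*\)% packet loss.*/\1/p'
}

# Example usage (host and count are placeholders):
#   ping -c 50 8.8.8.8 | ping_loss_percent
```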

My specific question is:

What changes are needed to get a stable Mesh + Station mode?

Looking forward to your help,

Thanks

  • Hi,

    I believe we set up a similar test setup at our end and did not see this issue. Can you confirm that you have applied all the kernel patches we suggested?

    Thanks

    Saurabh

  • Hello Saurabh,

    Thank you for your reply.

    Yes, I confirm that I have applied all of the suggested kernel patches. I would be happy to share the consolidated patches for review.

    There were some differences in the test setup at your end (such as pinging an internet address versus pinging mesh nodes, pinging for a longer duration, disabling Ethernet, etc.).

    Looking forward to your suggestions,

    Thanks

  • Hi,

    I don't think pinging a local node versus an internet address should make a difference. Did you try pinging a local node to see whether the issue is reproducible? Also, can you confirm that both the station and mesh roles are on the same channel?

    Thanks

    Saurabh

  • Hello,

    Please find more observations below.

    1. I am unable to use a common frequency / channel among the BBGW / BBBW and the AM437x GP EVM with our office network.
    The AM437x GP EVM connects on a higher channel number. When the BBBW tries to join that mesh, it reports the errors below:

    root@beaglebone:/usr/share/wl18xx# ./mesh_join.sh demomesh 5240
    netid=0
    =========================
    OK
    ...
    OK
    root@beaglebone:/usr/share/wl18xx# mesh0: CTRL-EVENT-REGDOM-CHANGE init=BEACON_HINT type=UNKNOWN
    Could not select hw_mode and channel. (-3)
    mesh0: interface state UNINITIALIZED->DISABLED
    AP-DISABLED 
    mesh0: Unable to setup interface.
    Failed to initialize hostapd interface for mesh
    mesh0: Failed to init mesh
    mesh0: Could not join mesh

    I understand that the BBBW may not support 5 GHz, so this error is expected.
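
    Since mesh_join.sh takes a frequency in MHz, it can help to translate it to a channel number and check which band it falls in before joining. The following is a small sketch using the standard 2.4 GHz and 5 GHz channel numbering; `freq_to_channel` is a hypothetical helper name and is not part of the wl18xx scripts.

    ```shell
    #!/bin/sh
    # Hypothetical helper: convert a Wi-Fi centre frequency in MHz to its channel number.
    # Covers 2.4 GHz channels 1-14 and 5 GHz channels 36-177.
    freq_to_channel() {
        f=$1
        if [ "$f" -eq 2484 ]; then
            echo 14
        elif [ "$f" -ge 2412 ] && [ "$f" -le 2472 ]; then
            echo $(( (f - 2407) / 5 ))
        elif [ "$f" -ge 5180 ] && [ "$f" -le 5885 ]; then
            echo $(( (f - 5000) / 5 ))
        else
            echo "unknown frequency: $f" >&2
            return 1
        fi
    }

    # Example usage:
    #   freq_to_channel 5240
    ```

    With this mapping, 5240 MHz is channel 48 in the 5 GHz band, which is the band the BBBW join attempt above failed on.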

    Question: As per the TI document, multi-channel operation seems to be supported for Mesh + Station mode at the cost of latency.

    Could it cause wlcore resets?

    I confirm that ping among the mesh nodes works fine.

    I was able to run iperf between three nodes continuously for 16 hours without any issues under a different access point.
    Some initial resets had to be ignored; after those, the link remained stable for the entire 16 hours.

    [  6] 56860.0-56861.0 sec  1.58 MBytes  13.3 Mbits/sec
    [  6] 56861.0-56862.0 sec  1.73 MBytes  14.5 Mbits/sec
    [  6] 56862.0-56863.0 sec  1.85 MBytes  15.5 Mbits/sec
    ^Z[1]+  Stopped                    iperf -s -i1 -w256K
    

    After stopping the iperf tests, a wlcore reset was observed when the ping command was run against an external address.
    The complete log is reproduced here.

    [  6] 56860.0-56861.0 sec  1.58 MBytes  13.3 Mbits/sec
    [  6] 56861.0-56862.0 sec  1.73 MBytes  14.5 Mbits/sec
    [  6] 56862.0-56863.0 sec  1.85 MBytes  15.5 Mbits/sec
    ^Z[1]+  Stopped                    iperf -s -i1 -w256K
    root@beaglebone:/usr/share/wl18xx# iw mesh0 station dump
    Station d0:b5:c2:c1:fa:d0 (on mesh0)
    inactive time:680 ms
    rx bytes:88963001264
    rx packets:57706390
    tx bytes:2606044313
    tx packets:26594855
    tx retries:0
    tx failed:439
    rx drop misc:168
    signal:  0 dBm
    signal avg:-17 dBm
    Toffset:55845013907 us
    tx bitrate:72.2 MBit/s MCS 7 short GI
    rx bitrate:65.0 MBit/s MCS 7
    expected throughput:114.257Mbps
    mesh llid:0
    mesh plid:0
    mesh plink:ESTAB
    mesh local PS mode:ACTIVE
    mesh peer PS mode:ACTIVE
    mesh non-peer PS mode:ACTIVE
    authorized:yes
    authenticated:yes
    associated:yes
    preamble:long
    WMM/WME:yes
    MFP:no
    TDLS peer:no
    DTIM period:2
    beacon interval:1000
    connected time:58550 seconds
    Station 74:e1:82:2c:3d:df (on mesh0)
    inactive time:920 ms
    rx bytes:85486880615
    rx packets:55445944
    tx bytes:2617831364
    tx packets:26715386
    tx retries:0
    tx failed:764
    rx drop misc:38
    signal:  0 dBm
    signal avg:-36 dBm
    Toffset:55299462998 us
    tx bitrate:72.2 MBit/s MCS 7 short GI
    rx bitrate:65.0 MBit/s MCS 7
    expected throughput:130.859Mbps
    mesh llid:0
    mesh plid:0
    mesh plink:ESTAB
    mesh local PS mode:ACTIVE
    mesh peer PS mode:ACTIVE
    mesh non-peer PS mode:ACTIVE
    authorized:yes
    authenticated:yes
    associated:yes
    preamble:long
    WMM/WME:yes
    MFP:no
    TDLS peer:no
    DTIM period:2
    beacon interval:1000
    connected time:57672 seconds
    root@beaglebone:/usr/share/wl18xx# ip addr
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host 
           valid_lft forever preferred_lft forever
    2: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
        link/sit 0.0.0.0 brd 0.0.0.0
    3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
        link/ether 38:d2:69:e0:6a:c3 brd ff:ff:ff:ff:ff:ff
        inet 10.1.165.48/24 brd 10.1.165.255 scope global dynamic wlan0
           valid_lft 2146sec preferred_lft 2146sec
        inet6 fe80::3ad2:69ff:fee0:6ac3/64 scope link 
           valid_lft forever preferred_lft forever
    5: mesh0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
        link/ether 3a:d2:69:e0:6a:c3 brd ff:ff:ff:ff:ff:ff
        inet 10.20.30.42/24 brd 10.20.30.255 scope global mesh0
           valid_lft forever preferred_lft forever
        inet6 fe80::38d2:69ff:fee0:6ac3/64 scope link 
           valid_lft forever preferred_lft forever
    root@beaglebone:/usr/share/wl18xx# ping 8.8.8.8 -c100
    PING 8.8.8.8 (8.8.8.8): 56 data bytes
    64 bytes from 8.8.8.8: seq=0 ttl=53 time=18.470 ms
    64 bytes from 8.8.8.8: seq=1 ttl=53 time=27.892 ms
    64 bytes from 8.8.8.8: seq=2 ttl=53 time=18.821 ms
    64 bytes from 8.8.8.8: seq=3 ttl=53 time=22.738 ms
    64 bytes from 8.8.8.8: seq=4 ttl=53 time=17.648 ms
    64 bytes from 8.8.8.8: seq=5 ttl=53 time=20.607 ms
    64 bytes from 8.8.8.8: seq=6 ttl=53 time=20.118 ms
    [63076.797087] wlcore: ERROR Tx stuck (in FW) for 5000 ms. Starting recovery
    [63076.803958] ------------[ cut here ]------------
    [63076.817722] WARNING: CPU: 0 PID: 1204 at drivers/net/wireless/ti/wlcore/main.c:808 wl12xx_queue_recovery_work+0x68/0x6c [wlcore]
    [63076.833363] Modules linked in: iptable_filter xt_conntrack ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_tables x_tables ctr aes_arm_bs crypto_simd cryptd ccm arc4 wl18xx wlcore mac80211 pru_rproc irq_pruss_intc sha256_generic sha256_arm cfg80211 pruss musb_dsps musb_hdrc udc_core phy_am335x usbcore phy_generic usb_common phy_am335x_control bluetooth ecdh_generic snd_soc_simple_card snd_soc_simple_card_utils pm33xx wkup_m3_rproc wkup_m3_ipc remoteproc omap_aes_driver crypto_engine omap_crypto omap_sham pruss_soc_bus ti_emif_sram wlcore_sdio at24 musb_am335x rtc_omap omap_wdt sch_fq_codel
    [63076.896994] CPU: 0 PID: 1204 Comm: kworker/u2:2 Tainted: G        W         4.19.38-gc17c376661 #1
    [63076.906011] Hardware name: Generic AM33XX (Flattened Device Tree)
    [63076.917795] Workqueue: phy0 wl12xx_tx_watchdog_work [wlcore]
    [63076.923513] Backtrace: 
    [63076.925992] [<c010cb80>] (dump_backtrace) from [<c010cef0>] (show_stack+0x18/0x1c)
    [63076.936853]  r7:00000009 r6:00000000 r5:bf30f180 r4:00000000
    [63076.942584] [<c010ced8>] (show_stack) from [<c08a31c8>] (dump_stack+0x24/0x28)
    [63076.952166] [<c08a31a4>] (dump_stack) from [<c012aa8c>] (__warn+0xe0/0xf8)
    [63076.960497] [<c012a9ac>] (__warn) from [<c012aaec>] (warn_slowpath_null+0x48/0x50)
    [63076.969513]  r9:00000001 r8:dc005000 r7:00000000 r6:bf2f96c8 r5:00000328 r4:bf30f180
    [63076.978748] [<c012aaa4>] (warn_slowpath_null) from [<bf2f96c8>] (wl12xx_queue_recovery_work+0x68/0x6c [wlcore])
    [63076.990297]  r6:d91f0d58 r5:d91f0d20 r4:d91f0d20
    [63076.995128] [<bf2f9660>] (wl12xx_queue_recovery_work [wlcore]) from [<bf2f97dc>] (wl12xx_tx_watchdog_work+0x110/0x114 [wlcore])
    [63077.009110]  r5:d91f0d20 r4:d91f19a0
    [63077.012873] [<bf2f96cc>] (wl12xx_tx_watchdog_work [wlcore]) from [<c014246c>] (process_one_work+0x210/0x430)
    [63077.025076]  r7:00000000 r6:db239c00 r5:d9487800 r4:d91f19a0
    [63077.032148] [<c014225c>] (process_one_work) from [<c014270c>] (worker_thread+0x80/0x674)
    [63077.041629]  r10:dc005000 r9:dc005014 r8:ffffe000 r7:c0d14960 r6:d9487814 r5:dc005000
    [63077.050753]  r4:d9487800
    [63077.053329] [<c014268c>] (worker_thread) from [<c014878c>] (kthread+0x158/0x160)
    [63077.063104]  r10:db3cde78 r9:c014268c r8:d9487800 r7:d94e8000 r6:00000000 r5:db194b40
    [63077.072329]  r4:db194b80
    [63077.074904] [<c0148634>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
    [63077.084268] Exception stack(0xd94e9fb0 to 0xd94e9ff8)
    [63077.090712] 9fa0:                                     00000000 00000000 00000000 00000000
    [63077.100211] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    [63077.109650] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
    [63077.116312]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c0148634
    [63077.126316]  r4:db194b40
    [63077.130096] ---[ end trace df79db6154b4fdea ]---
    [63077.135015] wlcore: Hardware recovery in progress. FW ver: Rev 8.9.0.99.79
    [63077.144626] wlcore: pc: 0x0, hint_sts: 0x00000020 count: 2
    [63077.152184] wlcore: down
    [63077.154831] wlcore: down
    [63077.159752] wlcore: down
    [63077.163283] ieee80211 phy0: Hardware restart was requested
    [63077.659392] wlcore: PHY firmware version: Rev 8.2.0.0.242
    [63077.877539] wlcore: firmware booted (Rev 8.9.0.99.79)
    [63077.941786] wlcore: Association completed.
    64 bytes from 8.8.8.8: seq=90 ttl=53 time=1099.912 ms
    64 bytes from 8.8.8.8: seq=91 ttl=53 time=99.566 ms
    mesh0: mesh plink with 74:e1:82:2c:3d:df established
    mesh0: ===> PLINKS NUMBER: 1
    mesh0: MESH-PEER-CONNECTED 74:e1:82:2c:3d:df
    64 bytes from 8.8.8.8: seq=92 ttl=53 time=20.114 ms
    mesh0: mesh plink with d0:b5:c2:c1:fa:d0 established
    mesh0: ===> PLINKS NUMBER: 2
    mesh0: MESH-PEER-CONNECTED d0:b5:c2:c1:fa:d0
    64 bytes from 8.8.8.8: seq=93 ttl=53 time=35.231 ms
    64 bytes from 8.8.8.8: seq=94 ttl=53 time=18.139 ms
    64 bytes from 8.8.8.8: seq=95 ttl=53 time=18.211 ms
    64 bytes from 8.8.8.8: seq=96 ttl=53 time=12.124 ms
    64 bytes from 8.8.8.8: seq=97 ttl=53 time=27.846 ms
    
    --- 8.8.8.8 ping statistics ---
    100 packets transmitted, 15 packets received, 85% packet loss
    round-trip min/avg/max = 12.124/98.495/1099.912 ms
    root@beaglebone:/usr/share/wl18xx# [63154.169156] wlcore: ERROR Tx stuck (in FW) for 5000 ms. Starting recovery
    [63154.176027] ------------[ cut here ]------------
    [63154.188752] WARNING: CPU: 0 PID: 1204 at drivers/net/wireless/ti/wlcore/main.c:808 wl12xx_queue_recovery_work+0x68/0x6c [wlcore]
    [63154.202303] Modules linked in: iptable_filter xt_conntrack ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_tables x_tables ctr aes_arm_bs crypto_simd cryptd ccm arc4 wl18xx wlcore mac80211 pru_rproc irq_pruss_intc sha256_generic sha256_arm cfg80211 pruss musb_dsps musb_hdrc udc_core phy_am335x usbcore phy_generic usb_common phy_am335x_control bluetooth ecdh_generic snd_soc_simple_card snd_soc_simple_card_utils pm33xx wkup_m3_rproc wkup_m3_ipc remoteproc omap_aes_driver crypto_engine omap_crypto omap_sham pruss_soc_bus ti_emif_sram wlcore_sdio at24 musb_am335x rtc_omap omap_wdt sch_fq_codel
    [63154.267303] CPU: 0 PID: 1204 Comm: kworker/u2:2 Tainted: G        W         4.19.38-gc17c376661 #1
    [63154.278191] Hardware name: Generic AM33XX (Flattened Device Tree)
    [63154.286039] Workqueue: phy0 wl12xx_tx_watchdog_work [wlcore]
    [63154.293233] Backtrace: 
    [63154.295740] [<c010cb80>] (dump_backtrace) from [<c010cef0>] (show_stack+0x18/0x1c)
    [63154.305579]  r7:00000009 r6:00000000 r5:bf30f180 r4:00000000
    [63154.312666] [<c010ced8>] (show_stack) from [<c08a31c8>] (dump_stack+0x24/0x28)
    [63154.321236] [<c08a31a4>] (dump_stack) from [<c012aa8c>] (__warn+0xe0/0xf8)
    [63154.328161] [<c012a9ac>] (__warn) from [<c012aaec>] (warn_slowpath_null+0x48/0x50)
    [63154.337929]  r9:00000002 r8:dc005000 r7:00000000 r6:bf2f96c8 r5:00000328 r4:bf30f180
    [63154.347225] [<c012aaa4>] (warn_slowpath_null) from [<bf2f96c8>] (wl12xx_queue_recovery_work+0x68/0x6c [wlcore])
    [63154.358774]  r6:d91f0d58 r5:d91f0d20 r4:d91f0d20
    [63154.365135] [<bf2f9660>] (wl12xx_queue_recovery_work [wlcore]) from [<bf2f97dc>] (wl12xx_tx_watchdog_work+0x110/0x114 [wlcore])
    [63154.378128]  r5:d91f0d20 r4:d91f19a0
    [63154.383226] [<bf2f96cc>] (wl12xx_tx_watchdog_work [wlcore]) from [<c014246c>] (process_one_work+0x210/0x430)
    [63154.394483]  r7:00000000 r6:db239c00 r5:d9487800 r4:d91f19a0
    [63154.401473] [<c014225c>] (process_one_work) from [<c014270c>] (worker_thread+0x80/0x674)
    [63154.411010]  r10:dc005000 r9:dc005014 r8:ffffe000 r7:c0d14960 r6:d9487814 r5:dc005000
    [63154.420181]  r4:d9487800
    [63154.422755] [<c014268c>] (worker_thread) from [<c014878c>] (kthread+0x158/0x160)
    [63154.432414]  r10:db3cde78 r9:c014268c r8:d9487800 r7:d94e8000 r6:00000000 r5:db194b40
    [63154.441682]  r4:db194b80
    [63154.444259] [<c0148634>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
    [63154.453605] Exception stack(0xd94e9fb0 to 0xd94e9ff8)
    [63154.458693] 9fa0:                                     00000000 00000000 00000000 00000000
    [63154.469168] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    [63154.477399] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
    [63154.486143]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c0148634
    [63154.495330]  r4:db194b40
    [63154.497886] ---[ end trace df79db6154b4fdeb ]---
    [63154.504844] wlcore: Hardware recovery in progress. FW ver: Rev 8.9.0.99.79
    [63154.513598] wlcore: pc: 0x0, hint_sts: 0x00000060 count: 3
    [63154.520792] wlcore: down
    [63154.523382] wlcore: down
    [63154.525934] wlcore: down
    [63154.532497] ieee80211 phy0: Hardware restart was requested
    [63155.021907] wlcore: PHY firmware version: Rev 8.2.0.0.242
    [63155.210564] wlcore: firmware booted (Rev 8.9.0.99.79)
    [63155.274480] wlcore: Association completed.
    mesh0: mesh plink with 74:e1:82:2c:3d:df established
    mesh0: ===> PLINKS NUMBER: 1
    mesh0: MESH-PEER-CONNECTED 74:e1:82:2c:3d:df
    mesh0: mesh plink with d0:b5:c2:c1:fa:d0 established
    mesh0: ===> PLINKS NUMBER: 2
    mesh0: MESH-PEER-CONNECTED d0:b5:c2:c1:fa:d0
    
    root@beaglebone:/usr/share/wl18xx# 
    

    Let me know if I can perform specific tests to isolate the cause.

    Looking forward to your reply,

    Thanks

  • Hi,

    - Can you please draw a schematic of the test bed showing which board is running which role (sta/mesh/ethernet) and which device is running ping? I would like to see a visual of the test bed.

    - What channel are the station and mesh nodes on?

    Thanks

    Saurabh

  • Hello,

    Please find attached a diagram of the setup.
    Let me know if I need to add more details.

    All these boards were running an equivalent of arago-tisdk-base-image (based on 6.0.0.7) on eMMC.

    Ping was attempted from the gateway node no. 1.

    Ethernet was not connected on the AM4372 GP EVM.

    Yesterday, the mesh was using channel 6 (frequency 2437 MHz).
    Mesh nodes were assigned static IPs for the mesh interface.

    I think the problem appears mostly during the initial stages (of forming the mesh).
    After a few resets, it runs for hours without any issues in the same environment.

    Looking forward to your reply,

    Thanks

  • Hi,

    What channel is the AP on?

    Thanks

    Saurabh

  • Hello,

    The AP is also on the same channel.

    In fact, I specify the frequency for joining the mesh based on the frequency reported after running the sta_start script.

    This frequency value was verified using a network monitoring app.
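
    One way to read that frequency on the target is from the output of `iw dev wlan0 link`. This is only a sketch: `link_freq_mhz` is a hypothetical helper, and it assumes the output contains a `freq:` line (some iw versions print a decimal part, which is dropped here).

    ```shell
    #!/bin/sh
    # Hypothetical helper: print the station frequency in MHz from
    # "iw dev <ifname> link" output read on stdin.
    # Only the integer part of the "freq:" value is kept.
    link_freq_mhz() {
        sed -n 's/^[[:space:]]*freq:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
    }

    # Example usage (wlan0 is the station interface in this setup):
    #   freq=$(iw dev wlan0 link | link_freq_mhz)
    #   ./mesh_join.sh demomesh "$freq"
    ```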

    Thanks

  • Hi,

    Are you able to exclude the BeagleBone Green from your test setup and re-test?

    Thanks

    Saurabh

  • Hello Saurabh,

    Earlier today, I did more testing with specific attention to the frequency and to different access points.

    Only one AM4372 GP EVM was powered on. The BBGW, BBBW and another AM4372 were not turned on.
    The frequency list was forced via wpa_supplicant. I tested with two different APs.

    Procedure

    1. Clean start of the AM4372 GP EVM
    2. Run sta_start.sh, wait for association, and note down the frequency
    3. Run mesh_start.sh
    4. Run mesh_join.sh demomesh <frequency noted> and wait for the mesh connected message
    5. Update iptables for NAT
    6. Ping an internet address
    7. One or more resets were observed within a short period of time
    8. After about 15 minutes, add more nodes (3 more nodes were added for a 4-node mesh network)
    9. All mesh communication and internet access were stable for more than 5 hours (two nodes were running iperf, another node was pinging an external IP via the gateway)
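
    Steps 3 to 6 can be sketched as a small script, to be run once the station has associated (step 2). Several things here are assumptions rather than the exact commands used: the `run` and `join_and_share` names are hypothetical, the MASQUERADE rule is one common way to set up NAT through the station interface (wlan0), and the ping count is arbitrary. Setting `DRY_RUN=1` only prints the commands.

    ```shell
    #!/bin/sh
    # Sketch of steps 3-6 on the gateway node. Set DRY_RUN=1 to print the
    # commands instead of executing them.
    WL_DIR=/usr/share/wl18xx   # script directory used in this thread

    run() {
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "+ $*"
        else
            "$@"
        fi
    }

    # $1: AP frequency in MHz, noted after sta_start.sh has associated (step 2)
    join_and_share() {
        freq=$1
        run "$WL_DIR/mesh_start.sh"
        run "$WL_DIR/mesh_join.sh" demomesh "$freq"
        # Assumed NAT rule: masquerade traffic leaving through the station interface.
        run iptables -t nat -A POSTROUTING -o wlan0 -j MASQUERADE
        run ping -c 50 8.8.8.8
    }

    # Example usage:
    #   DRY_RUN=1; join_and_share 2437
    ```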

    In short, the BBGW / BBBW were not turned on when the reset issue was observed.

    Hope this clarifies,

    Thanks

  • Hi,

    Thanks for the detailed instructions. Can you please make the following modification in wlcore/scan.c and see if it makes any difference? Essentially, it removes the recovery command issued when a scan fails. We just want to check whether recovery is really needed in your case.

    FYI: In certain cases recovery is beneficial. The entire recovery takes ~1 sec.

    wlcore/scan.c

    if (wl->scan.failed) {
        wl1271_info("Scan completed due to error.");
        /* Recovery disabled for this test: */
        /* wl12xx_queue_recovery_work(wl); */
    }

    Thanks

    Saurabh

  • Saurabh,

    Thank you for your suggestion. Unfortunately, this has not helped.

    The issue seems to be triggered by the environment regardless of the new patch.

    Yesterday, I noticed the resets during the first hour (after starting STA + Mesh).
    After that, the four-node mesh ran smoothly for more than 6 hours, and all mesh nodes were able to access the internet.

    Today, I could not even get it started. The number of consecutive resets exceeded 12 before I gave up.
    In the same environment, I removed the frequency filter in wpa_supplicant, and it connected on a 5 GHz frequency (57xx).

    Starting / joining the mesh at this frequency completed without any errors. No reset was observed.
    However, the two mesh nodes (both AM437x EVMs) could not detect each other, despite using the same mesh ID and frequency.

    I am not sure whether this provides any clue.

    Is there any other setting that I need to check that could cause this behaviour?

    The reset was observed without any foreground tasks. The target was idle; no ping or iperf was running. Still, the module was reset repeatedly.
    The exact steps that I follow are:

    IP forwarding is enabled via /etc/sysctl.conf

    Run sta_start.sh (wpa_supplicant has the AP credentials)
    Note down the frequency on which AP association was successful
    Run mesh_start.sh
    Run mesh_join.sh with demomesh <ap frequency>

    Wait for the mesh start message
    Set up iptables for NAT

    Run ping 8.8.8.8
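
    For the IP forwarding step above, a quick check that the setting is really present in the configuration file could look like the sketch below. `ip_forward_configured` is a hypothetical helper; it only looks for an uncommented `net.ipv4.ip_forward = 1` line and does not verify the running kernel value.

    ```shell
    #!/bin/sh
    # Hypothetical helper: succeed if the given sysctl-style file enables IPv4
    # forwarding with an uncommented "net.ipv4.ip_forward = 1" line.
    ip_forward_configured() {
        grep -Eq '^[[:space:]]*net\.ipv4\.ip_forward[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$1"
    }

    # Example usage:
    #   ip_forward_configured /etc/sysctl.conf && echo "forwarding enabled in config"
    #   cat /proc/sys/net/ipv4/ip_forward    # runtime value, 1 when enabled
    ```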

    Let me know if you suspect any particular step.

    Thanks

  • Hi,

    We will check whether we can reproduce the issue at our end and will get back to you.

    Thanks

    Saurabh

  • Hello Saurabh,

    I am observing positive results when the following line is commented out in mesh_start.sh.

    iw phy phy0 set rts 0

    If I have this line enabled, I observe wlcore resets at random.

    If this line is commented out, internet sharing works reliably.
    I could not observe any resets for two days on two different sets of hardware.
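
    For reference, the change amounts to prefixing that one line with `#`. A minimal sketch of doing this with sed is shown below; `comment_out_rts` is a hypothetical helper that filters the script text, and it assumes the line appears in mesh_start.sh exactly as quoted above.

    ```shell
    #!/bin/sh
    # Hypothetical helper: comment out the "iw phy phy0 set rts 0" line in
    # script text read on stdin; all other lines pass through unchanged.
    comment_out_rts() {
        sed 's/^\([[:space:]]*iw phy phy0 set rts 0\)/#\1/'
    }

    # Example usage (writes a modified copy rather than editing in place):
    #   comment_out_rts < /usr/share/wl18xx/mesh_start.sh > /tmp/mesh_start.sh
    ```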

    Kindly review and let us know whether this is a valid change needed to get Station + MP to work reliably.

    Thanks

  • Hi,

    Thanks for updating us. This is good info, and we are glad it is working OK for you with this change. We will review it and get back to you.

    Thanks

    Saurabh

  • Hi,

    It's OK to remove the RTS setting from the mesh start script. We don't believe it's needed.

    Thanks

    Saurabh