This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

Linux/WL1807MOD: Missing TX interrupt after broadcast of packet

Part Number: WL1807MOD
Other Parts Discussed in Thread: WL1271

Tool/software: Linux

I have Wi-Fi AP with WL1807 module. I found issue with missing TX interrupt
after broadcast of packet. Correct execution looks like this (from kernel log,
lines with "FIXME" are added by me for better understanding of program flow
and for getting more information):

7,593724,2466895290,-;wlcore: queue skb hlid 2 q 0 len 118 count 1
7,593725,2466895326,-;FIXME: wlcore_select_ac() - q=0, min_pkts=0
7,593726,2466895337,-;FIXME: wl1271_skb_dequeue() - START, ac=0, wlvif=df380c20, num_links=16
7,593727,2466895346,-;FIXME: wlcore_lnk_dequeue_high_prio() - hlid=0, ac=0
7,593728,2466895360,-;FIXME: wl18xx_lnk_high_prio() - hlid=0, suspend_bitmap=0xFFFFFFE0, fw_fast_lnk_map=0x0000, ap_fw_ps_map=0x0014
7,593729,2466895367,-;FIXME: wl18xx_lnk_high_prio() - pkts=0 < thold=4
7,593730,2466895378,-;FIXME: wlcore_lnk_dequeue() - skb=  (null), tx_queue_count[0]=999
7,593731,2466895388,-;FIXME: wlcore_vif_dequeue_high_prio() - START, ac=0, last_tx_hlid=2, start_hlid=3
7,593732,2466895395,-;FIXME: wlcore_lnk_dequeue_high_prio() - hlid=3, ac=0
7,593733,2466895405,-;FIXME: wl18xx_lnk_high_prio() - hlid=3, suspend_bitmap=0xFFFFFFE0, fw_fast_lnk_map=0x0000, ap_fw_ps_map=0x0014
7,593734,2466895411,-;FIXME: wl18xx_lnk_high_prio() - pkts=0 < thold=4
7,593735,2466895418,-;FIXME: wlcore_lnk_dequeue() - skb=  (null), tx_queue_count[0]=1
7,593736,2466895425,-;FIXME: wlcore_lnk_dequeue_high_prio() - hlid=4, ac=0
7,593737,2466895434,-;FIXME: wl18xx_lnk_high_prio() - hlid=4, suspend_bitmap=0xFFFFFFE0, fw_fast_lnk_map=0x0000, ap_fw_ps_map=0x0014
7,593738,2466895440,-;FIXME: wl18xx_lnk_high_prio() - pkts=0 < thold=4
7,593739,2466895448,-;FIXME: wlcore_lnk_dequeue() - skb=  (null), tx_queue_count[0]=1
7,593740,2466895455,-;FIXME: wlcore_lnk_dequeue_high_prio() - hlid=1, ac=0
7,593741,2466895464,-;FIXME: wl18xx_lnk_high_prio() - hlid=1, suspend_bitmap=0xFFFFFFE0, fw_fast_lnk_map=0x0000, ap_fw_ps_map=0x0014
7,593742,2466895471,-;FIXME: wl18xx_lnk_high_prio() - pkts=0 < thold=4
7,593743,2466895478,-;FIXME: wlcore_lnk_dequeue() - skb=  (null), tx_queue_count[0]=1
7,593744,2466895484,-;FIXME: wlcore_lnk_dequeue_high_prio() - hlid=2, ac=0
7,593745,2466895493,-;FIXME: wl18xx_lnk_high_prio() - hlid=2, suspend_bitmap=0xFFFFFFE0, fw_fast_lnk_map=0x0000, ap_fw_ps_map=0x0014
7,593746,2466895499,-;FIXME: wl18xx_lnk_high_prio() - pkts=0 < thold=4
7,593747,2466895507,-;FIXME: wlcore_lnk_dequeue() - skb=df588d80, tx_queue_count[0]=0
7,593748,2466895515,-;FIXME: wlcore_vif_dequeue_high_prio() - END, skb=df588d80, *hlid=2, last_tx_hlid=2
7,593749,2466895524,-;FIXME: wl1271_skb_dequeue() - END, skb=df588d80, *hlid=2, last_wlvif=df380c20
7,593750,2466895537,-;FIXME: wl1271_alloc_tx_id() - id=0, skb=df588d80, tx_frames_cnt=1, tx_frames_map=0x0001
7,593751,2466895555,-;wlcore: tx_allocate: size: 134, blocks: 2, id: 0
7,593752,2466895570,-;wlcore: tx_fill_hdr: hlid: 2 len: 134 life: 8000 mem: 2
7,593753,2466895581,-;FIXME: wl1271_tx_regulate_link() - hlid=2, tx_pkts=1, fw_ps=1, active_link_count=4, ap_count=1
7,593754,2466895592,-;FIXME: wlcore_tx_work_locked() - hlid=2, skb=df588d80, wlvif=df380c20, has_data=1, ret=136
7,593755,2466895600,-;FIXME: wlcore_tx_work_locked() - desc->hlid=2, wl->tx_packets_count=2372
7,593756,2466895608,-;FIXME: wlcore_select_ac() - q=-1, min_pkts=4294967295
7,593757,2466895616,-;FIXME: wl1271_skb_dequeue() - END, skb=  (null), *hlid=2, last_wlvif=df380c20
7,593758,2466895715,-;FIXME: wlcore_tx_work_locked() - END, buf_offset=256, sent_packets=1, active_hlids=0x0004, bus_ret=0
7,593759,2466918846,-;wlcore: elp work
7,593760,2466918864,-;wlcore: chip to elp
7,593761,2466966364,-;wlcore: IRQ work
7,593762,2466966385,-;wlcore: waking up chip from elp
7,593763,2466966468,-;wlcore: wakeup time: 0 ms
7,593764,2466966583,-;wlcore: intr: 0x24 (fw_rx_counter = 200, drv_rx_counter = 7, tx_results_counter = 0)
7,593765,2466966604,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=3, tx_pkts=0, active_link_count=4, ap_count=1
7,593766,2466966615,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=4, tx_pkts=0, active_link_count=4, ap_count=1
7,593767,2466966630,-;FIXME: wlcore_fw_status() - fw_localtime=2433907619, time_offset=1458360385304202
7,593768,2466966637,-;wlcore: WL1271_ACX_INTR_EVENT_A
7,593769,2466966643,-;wlcore: EVENT on mbox 0
7,593770,2466966751,-;wlcore: MBOX vector: 0x2000000
7,593771,2466967160,-;wlcore: WL1271_ACX_INTR_HW_AVAILABLE
7,593772,2466967229,-;wlcore: intr: 0x0 (fw_rx_counter = 200, drv_rx_counter = 7, tx_results_counter = 0)
7,593773,2466967241,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=3, tx_pkts=0, active_link_count=4, ap_count=1
7,593774,2466967250,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=4, tx_pkts=0, active_link_count=4, ap_count=1
7,593775,2466967262,-;FIXME: wlcore_fw_status() - fw_localtime=2433907619, time_offset=1458360385304821
7,593776,2466967280,-;FIXME: wlcore_irq_locked() - END, ret=0
7,593777,2466988914,-;wlcore: elp work
7,593778,2466988931,-;wlcore: chip to elp
7,593779,2467039309,-;wlcore: IRQ work
7,593780,2467039330,-;wlcore: waking up chip from elp
7,593781,2467039413,-;wlcore: wakeup time: 0 ms
7,593782,2467039531,-;wlcore: intr: 0x60 (fw_rx_counter = 200, drv_rx_counter = 7, tx_results_counter = 0)
7,593783,2467039547,-;FIXME: wlcore_fw_status() - freed=2, alloc=0
7,593784,2467039563,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=3, tx_pkts=0, active_link_count=4, ap_count=1
7,593785,2467039573,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=4, tx_pkts=0, active_link_count=4, ap_count=1
7,593786,2467039587,-;FIXME: wlcore_fw_status() - fw_localtime=2433980559, time_offset=1458360385302509
7,593787,2467039596,-;wlcore: last released desc = 28, current idx = 29
7,593788,2467039610,-;wlcore: tx status id 0 skb 0xdf588d80 success 1
7,593789,2467039636,-;FIXME: wl1271_free_tx_id() - id=0, skb=df588d80, tx_frames_cnt=0, tx_frames_map=0x0000
7,593790,2467039642,-;wlcore: WL1271_ACX_INTR_DATA
7,593791,2467039651,-;FIXME: wlcore_rx() - hlid=0, drv_rx_counter=8, fw_rx_counter=8
7,593792,2467039659,-;FIXME: wlcore_rx() - END, ret=0
7,593793,2467039665,-;wlcore: WL1271_ACX_INTR_HW_AVAILABLE
7,593794,2467039733,-;wlcore: intr: 0x0 (fw_rx_counter = 200, drv_rx_counter = 7, tx_results_counter = 0)
7,593795,2467039745,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=3, tx_pkts=0, active_link_count=4, ap_count=1
7,593796,2467039754,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=4, tx_pkts=0, active_link_count=4, ap_count=1
7,593797,2467039763,-;FIXME: wlcore_fw_status() - fw_localtime=2433980559, time_offset=1458360385302685
7,593798,2467039780,-;FIXME: wlcore_irq_locked() - END, ret=0
7,593799,2467066477,-;wlcore: IRQ work
7,593800,2467066646,-;wlcore: intr: 0x8 (fw_rx_counter = 200, drv_rx_counter = 7, tx_results_counter = 0)
7,593801,2467066668,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=3, tx_pkts=0, active_link_count=4, ap_count=1
7,593802,2467066678,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=4, tx_pkts=0, active_link_count=4, ap_count=1
7,593803,2467066694,-;FIXME: wlcore_fw_status() - fw_localtime=2434007472, time_offset=1458360385302067
7,593804,2467066702,-;wlcore: WL1271_ACX_INTR_EVENT_B
7,593805,2467066709,-;wlcore: EVENT on mbox 1
7,593806,2467066775,-;wlcore: MBOX vector: 0x2000000
7,593807,2467067356,-;wlcore: intr: 0x0 (fw_rx_counter = 200, drv_rx_counter = 7, tx_results_counter = 0)
7,593808,2467067374,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=3, tx_pkts=0, active_link_count=4, ap_count=1
7,593809,2467067383,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=4, tx_pkts=0, active_link_count=4, ap_count=1
7,593810,2467067394,-;FIXME: wlcore_fw_status() - fw_localtime=2434007472, time_offset=1458360385302754
7,593811,2467067411,-;FIXME: wlcore_irq_locked() - END, ret=0
7,593812,2467088922,-;wlcore: elp work
7,593813,2467088940,-;wlcore: chip to elp

If TX interrupt is missing, kernel log looks differently:

7,686469,2859456615,-;wlcore: queue skb hlid 2 q 0 len 118 count 1
7,686470,2859456653,-;wlcore: IRQ work
7,686471,2859456743,-;wlcore: intr: 0x40 (fw_rx_counter = 221, drv_rx_counter = 28, tx_results_counter = 0)
7,686472,2859456757,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=3, tx_pkts=0, active_link_count=4, ap_count=1
7,686473,2859456767,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=4, tx_pkts=0, active_link_count=4, ap_count=1
7,686474,2859456777,-;FIXME: wlcore_fw_status() - fw_localtime=2826377307, time_offset=1458360376125676
7,686475,2859456783,-;wlcore: WL1271_ACX_INTR_DATA
7,686476,2859456791,-;FIXME: wlcore_rx() - hlid=0, drv_rx_counter=28, fw_rx_counter=29
7,686477,2859456798,-;FIXME: wlcore_rx() - buf_size=256, rx_counter=29
7,686478,2859456874,-;FIXME: wlcore_rx() - END, ret=0
7,686479,2859456885,-;FIXME: wlcore_select_ac() - q=0, min_pkts=0
7,686480,2859456896,-;FIXME: wl1271_skb_dequeue() - START, ac=0, wlvif=df380c20, num_links=16
7,686481,2859456905,-;FIXME: wlcore_lnk_dequeue_high_prio() - hlid=0, ac=0
7,686482,2859456919,-;FIXME: wl18xx_lnk_high_prio() - hlid=0, suspend_bitmap=0xFFFFFFE0, fw_fast_lnk_map=0x0002, ap_fw_ps_map=0x0014
7,686483,2859456926,-;FIXME: wl18xx_lnk_high_prio() - pkts=0 < thold=4
7,686484,2859456936,-;FIXME: wlcore_lnk_dequeue() - skb=  (null), tx_queue_count[0]=999
7,686485,2859456947,-;FIXME: wlcore_vif_dequeue_high_prio() - START, ac=0, last_tx_hlid=3, start_hlid=4
7,686486,2859456953,-;FIXME: wlcore_lnk_dequeue_high_prio() - hlid=4, ac=0
7,686487,2859456963,-;FIXME: wl18xx_lnk_high_prio() - hlid=4, suspend_bitmap=0xFFFFFFE0, fw_fast_lnk_map=0x0002, ap_fw_ps_map=0x0014
7,686488,2859456969,-;FIXME: wl18xx_lnk_high_prio() - pkts=0 < thold=4
7,686489,2859456977,-;FIXME: wlcore_lnk_dequeue() - skb=  (null), tx_queue_count[0]=1
7,686490,2859456984,-;FIXME: wlcore_lnk_dequeue_high_prio() - hlid=1, ac=0
7,686491,2859456993,-;FIXME: wl18xx_lnk_high_prio() - hlid=1, suspend_bitmap=0xFFFFFFE0, fw_fast_lnk_map=0x0002, ap_fw_ps_map=0x0014
7,686492,2859457000,-;FIXME: wl18xx_lnk_high_prio() - pkts=0 < thold=10
7,686493,2859457007,-;FIXME: wlcore_lnk_dequeue() - skb=  (null), tx_queue_count[0]=1
7,686494,2859457013,-;FIXME: wlcore_lnk_dequeue_high_prio() - hlid=2, ac=0
7,686495,2859457022,-;FIXME: wl18xx_lnk_high_prio() - hlid=2, suspend_bitmap=0xFFFFFFE0, fw_fast_lnk_map=0x0002, ap_fw_ps_map=0x0014
7,686496,2859457029,-;FIXME: wl18xx_lnk_high_prio() - pkts=0 < thold=4
7,686497,2859457037,-;FIXME: wlcore_lnk_dequeue() - skb=df593b00, tx_queue_count[0]=0
7,686498,2859457044,-;FIXME: wlcore_vif_dequeue_high_prio() - END, skb=df593b00, *hlid=2, last_tx_hlid=2
7,686499,2859457053,-;FIXME: wl1271_skb_dequeue() - END, skb=df593b00, *hlid=2, last_wlvif=df380c20
7,686500,2859457066,-;FIXME: wl1271_alloc_tx_id() - id=0, skb=df593b00, tx_frames_cnt=1, tx_frames_map=0x0001
7,686501,2859457084,-;wlcore: tx_allocate: size: 134, blocks: 2, id: 0
7,686502,2859457099,-;wlcore: tx_fill_hdr: hlid: 2 len: 134 life: 8000 mem: 2
7,686503,2859457110,-;FIXME: wl1271_tx_regulate_link() - hlid=2, tx_pkts=1, fw_ps=1, active_link_count=4, ap_count=1
7,686504,2859457122,-;FIXME: wlcore_tx_work_locked() - hlid=2, skb=df593b00, wlvif=df380c20, has_data=1, ret=136
7,686505,2859457129,-;FIXME: wlcore_tx_work_locked() - desc->hlid=2, wl->tx_packets_count=2663
7,686506,2859457136,-;FIXME: wlcore_select_ac() - q=-1, min_pkts=4294967295
7,686507,2859457145,-;FIXME: wl1271_skb_dequeue() - END, skb=  (null), *hlid=2, last_wlvif=df380c20
7,686508,2859457213,-;FIXME: wlcore_tx_work_locked() - END, buf_offset=256, sent_packets=1, active_hlids=0x0004, bus_ret=0
7,686509,2859457283,-;wlcore: intr: 0x0 (fw_rx_counter = 221, drv_rx_counter = 28, tx_results_counter = 0)
7,686510,2859457294,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=3, tx_pkts=0, active_link_count=4, ap_count=1
7,686511,2859457304,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=4, tx_pkts=0, active_link_count=4, ap_count=1
7,686512,2859457314,-;FIXME: wlcore_fw_status() - fw_localtime=2826377307, time_offset=1458360376126201
7,686513,2859457323,-;FIXME: wlcore_irq_locked() - END, ret=0
7,686514,2859463626,-;wlcore: IRQ work
7,686515,2859463811,-;wlcore: intr: 0x40 (fw_rx_counter = 222, drv_rx_counter = 29, tx_results_counter = 0)
7,686516,2859463831,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=3, tx_pkts=0, active_link_count=4, ap_count=1
7,686517,2859463841,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=4, tx_pkts=0, active_link_count=4, ap_count=1
7,686518,2859463856,-;FIXME: wlcore_fw_status() - fw_localtime=2826384275, time_offset=1458360376125619
7,686519,2859463864,-;wlcore: WL1271_ACX_INTR_DATA
7,686520,2859463873,-;FIXME: wlcore_rx() - hlid=0, drv_rx_counter=29, fw_rx_counter=30
7,686521,2859463882,-;FIXME: wlcore_rx() - buf_size=256, rx_counter=30
7,686522,2859463979,-;FIXME: wlcore_rx() - END, ret=0
7,686523,2859464196,-;wlcore: queue skb hlid 3 q 0 len 70 count 1
7,686524,2859464231,-;wlcore: intr: 0x0 (fw_rx_counter = 222, drv_rx_counter = 29, tx_results_counter = 0)
7,686525,2859464243,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=3, tx_pkts=0, active_link_count=4, ap_count=1
7,686526,2859464251,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=4, tx_pkts=0, active_link_count=4, ap_count=1
7,686527,2859464261,-;FIXME: wlcore_fw_status() - fw_localtime=2826384275, time_offset=1458360376126017
7,686528,2859464277,-;FIXME: wlcore_irq_locked() - END, ret=0
7,686529,2859464341,-;wlcore: IRQ work
7,686530,2859464425,-;wlcore: intr: 0x40 (fw_rx_counter = 223, drv_rx_counter = 30, tx_results_counter = 0)
7,686531,2859464438,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=3, tx_pkts=0, active_link_count=4, ap_count=1
7,686532,2859464447,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=4, tx_pkts=0, active_link_count=4, ap_count=1
7,686533,2859464457,-;FIXME: wlcore_fw_status() - fw_localtime=2826385052, time_offset=1458360376125432
7,686534,2859464462,-;wlcore: WL1271_ACX_INTR_DATA
7,686535,2859464470,-;FIXME: wlcore_rx() - hlid=0, drv_rx_counter=30, fw_rx_counter=31
7,686536,2859464477,-;FIXME: wlcore_rx() - buf_size=256, rx_counter=31
7,686537,2859464554,-;FIXME: wlcore_rx() - END, ret=0
7,686538,2859464566,-;FIXME: wlcore_select_ac() - q=0, min_pkts=1
7,686539,2859464577,-;FIXME: wl1271_skb_dequeue() - START, ac=0, wlvif=df380c20, num_links=16
7,686540,2859464585,-;FIXME: wlcore_lnk_dequeue_high_prio() - hlid=0, ac=0
7,686541,2859464598,-;FIXME: wl18xx_lnk_high_prio() - hlid=0, suspend_bitmap=0xFFFFFFE0, fw_fast_lnk_map=0x0002, ap_fw_ps_map=0x0014
7,686542,2859464605,-;FIXME: wl18xx_lnk_high_prio() - pkts=0 < thold=4
7,686543,2859464615,-;FIXME: wlcore_lnk_dequeue() - skb=  (null), tx_queue_count[0]=999
7,686544,2859464625,-;FIXME: wlcore_vif_dequeue_high_prio() - START, ac=0, last_tx_hlid=2, start_hlid=3
7,686545,2859464632,-;FIXME: wlcore_lnk_dequeue_high_prio() - hlid=3, ac=0
7,686546,2859464641,-;FIXME: wl18xx_lnk_high_prio() - hlid=3, suspend_bitmap=0xFFFFFFE0, fw_fast_lnk_map=0x0002, ap_fw_ps_map=0x0014
7,686547,2859464648,-;FIXME: wl18xx_lnk_high_prio() - pkts=0 < thold=4
7,686548,2859464656,-;FIXME: wlcore_lnk_dequeue() - skb=df534f00, tx_queue_count[0]=0
7,686549,2859464664,-;FIXME: wlcore_vif_dequeue_high_prio() - END, skb=df534f00, *hlid=3, last_tx_hlid=3
7,686550,2859464672,-;FIXME: wl1271_skb_dequeue() - END, skb=df534f00, *hlid=3, last_wlvif=df380c20
7,686551,2859464685,-;FIXME: wl1271_alloc_tx_id() - id=1, skb=df534f00, tx_frames_cnt=2, tx_frames_map=0x0003
7,686552,2859464695,-;wlcore: tx_allocate: size: 86, blocks: 2, id: 1
7,686553,2859464708,-;wlcore: tx_fill_hdr: hlid: 3 len: 86 life: 8000 mem: 2
7,686554,2859464719,-;FIXME: wl1271_tx_regulate_link() - hlid=3, tx_pkts=1, fw_ps=0, active_link_count=4, ap_count=1
7,686555,2859464730,-;FIXME: wlcore_tx_work_locked() - hlid=3, skb=df534f00, wlvif=df380c20, has_data=1, ret=88
7,686556,2859464738,-;FIXME: wlcore_tx_work_locked() - desc->hlid=3, wl->tx_packets_count=2664
7,686557,2859464746,-;FIXME: wlcore_select_ac() - q=-1, min_pkts=4294967295
7,686558,2859464754,-;FIXME: wl1271_skb_dequeue() - END, skb=  (null), *hlid=3, last_wlvif=df380c20
7,686559,2859464823,-;FIXME: wlcore_tx_work_locked() - END, buf_offset=256, sent_packets=1, active_hlids=0x0008, bus_ret=0
7,686560,2859464896,-;wlcore: intr: 0x0 (fw_rx_counter = 223, drv_rx_counter = 30, tx_results_counter = 0)
7,686561,2859464908,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=3, tx_pkts=1, active_link_count=4, ap_count=1
7,686562,2859464917,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=4, tx_pkts=0, active_link_count=4, ap_count=1
7,686563,2859464927,-;FIXME: wlcore_fw_status() - fw_localtime=2826385052, time_offset=1458360376125890
7,686564,2859464937,-;FIXME: wlcore_irq_locked() - END, ret=0
7,686565,2859467602,-;wlcore: IRQ work
7,686566,2859467743,-;wlcore: intr: 0x40 (fw_rx_counter = 223, drv_rx_counter = 30, tx_results_counter = 0)
7,686567,2859467758,-;FIXME: wlcore_fw_status() - freed=2, alloc=2
7,686568,2859467781,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=3, tx_pkts=0, active_link_count=4, ap_count=1
7,686569,2859467791,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=4, tx_pkts=0, active_link_count=4, ap_count=1
7,686570,2859467805,-;FIXME: wlcore_fw_status() - fw_localtime=2826388226, time_offset=1458360376125523
7,686571,2859467814,-;wlcore: last released desc = 22, current idx = 23
7,686572,2859467827,-;wlcore: tx status id 1 skb 0xdf534f00 success 1
7,686573,2859467849,-;FIXME: wl1271_free_tx_id() - id=1, skb=df534f00, tx_frames_cnt=1, tx_frames_map=0x0001
7,686574,2859467863,-;FIXME: wl18xx_tx_expire() - id=0, now_time=-1468579013, start_time=-1468589678, life_time=8000
7,686575,2859467870,-;wlcore: WL1271_ACX_INTR_DATA
7,686576,2859467878,-;FIXME: wlcore_rx() - hlid=0, drv_rx_counter=31, fw_rx_counter=31
7,686577,2859467885,-;FIXME: wlcore_rx() - END, ret=0
7,686578,2859467953,-;wlcore: intr: 0x0 (fw_rx_counter = 223, drv_rx_counter = 30, tx_results_counter = 0)
7,686579,2859467965,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=3, tx_pkts=0, active_link_count=4, ap_count=1
7,686580,2859467974,-;FIXME: wl12xx_irq_ps_regulate_link() - hlid=4, tx_pkts=0, active_link_count=4, ap_count=1
7,686581,2859467984,-;FIXME: wlcore_fw_status() - fw_localtime=2826388226, time_offset=1458360376125701
7,686582,2859467994,-;FIXME: wlcore_irq_locked() - END, ret=0
7,686583,2859488905,-;wlcore: elp work
7,686584,2859488923,-;wlcore: chip to elp

There is missing line with "tx status id 0 skb ...". Event doesn't occur even
after long time of waiting (see attached whole kernel log). I think this means that
packet processing was broken in Wi-Fi module somehow.

One lost broadcast packet wouldn't be a problem, but there are unpleasant
consequences. Every packet allocates some resources in kernel Wi-Fi driver. In
this case numbers in tx_allocated_blocks, tx_allocated_pkts[] and
tx_queue_count slowly grow as other and other TX interrupts are lost. The
limit is reached after some time and result is "wlcore: ERROR Tx stuck (in FW)
for 5000 ms. Starting recovery". Second problem is that Wi-Fi AP is actually
not working for connected clients during this time - data tranmission drop
and interrupt take few minutes before recovery is reached. It's really bad.

Wi-Fi AP use linux kernel 3.12.10 with wl18xx driver backported from kernel
4.8.5. Wi-Fi module has firmware 8.9.0.1.70. I tested firmware 8.9.0.0.70 too,
but behaviour is same. There are 2 Wi-Fi clients. First is notebook with
Windows 7 (on hlid 3) which communicates over Wi-Fi seldom but regularly.
Second is some router with linux (on hlid 4) which doesn't communicate over
Wi-Fi, it is connected to AP only. More, this router is restarted every 5
minutes. It is here for simulating of connecting/disconnecting of other Wi-Fi
clients. The router is key component in this test. If it is not present and
notebook is connected to AP as the only client, everything is fine!

Whole kernel log, Wi-Fi module log and configuration file are attached.

My questions:

Q1: What should prevent a TX interrupt to occur?
Q2: What is maximum time when TX interrupt should occur?
Q3: Could some change in configuration file help with this problem?
Q4: Could some parameter change in kernel module help with this problem?
Q5: Why power save mode is activated on hlid 2? Cannot be problem here?
Q6: How to disable power save mode in Wi-Fi module completely?

1616.wifi.zip

  • Hi Lumir,

    Before we start digging deep into your logs, can you srart with your Q6 (disable ELP) using the following procedure:

    And see if the issue is still seen?

    I want to see if this missed interrupt is by any chance after wake up from ELP.

    Best Regards,

    Eyal

  • Hi Eyal,

    I tried to disable power save mode according to your reference:

    Command "iw wlan0 set power_save off" is not supported in AP mode. However, second setting was successful.

    Unfortunately the behaviour is similar. The missing TX interrupt is still here. In new attached kernel log file you can see it at 3014853793 time stamp and later.

    Some more information I forgot to mention: I have set debug_level = 11361, it means IRQ + EVENT + TX + PSM + MAC80211 + ACX, in kernel driver.

    wifi2.zip

  • One more question:

    Q7: What should happen, when TX life_time (value 8000) expires? What is unit of value, milisecond?

  • Hi Lumir,

    We have gone over your logs and it does look like some  mismatch in the host/fw released descriptors.

    This can happen usually in cases where the falling/raising/level/edge configuration is not right, or there are some other board issues.

    If you try switching between level/edge configuration does it make any change?

    Is there a standard scenario that you are using to recreate this issue? Is it easy to reproduce?

    We don't rule out that there might be a logical issue that cause this IRQ not to be raised.

    Can you explain your traffic scenario so we may try to reproduce the issue on our setup?

     Best Regards,

    Eyal

  • Hi Eyal,

    Interrupt is set to level high now. I will try to change to rising edge and then test again. Falling edge or level low doesn't make sense for me.

    My test scenario is described in my first post. It is not so easy to reproduce and one attempt takes about 20 - 60 minutes. I don't know exact sequence of steps. I just start everything, then I'm waiting and let it to fail. I can watch kernel log or "/sys/kernel/debug/ieee80211/phy0/wlcore/driver_state" continuously and it's all.

    My next peice of information: it is always problem with broadcast packets (on hlid 2) first. Event when broadcast packets don't work anymore, then unicast packets to/from communicating client (on hlid 3) are transferred successfully.

    Best Regards,

    Lumir

  • Hi Lumir,

    So, to make sure I understand.
    basically your product is an AP that as part of this test has two clients, a notebook that is connectedd and is sending some data (ping?) and another device.
    The other device is just connected to the AP (over wifi) and you are resetting it every couple of minutes.
    Did I miss something?

    Please try with switching to edge interrupt and let's see if it makes a change.

    In addition, do you have the ability to provide us with firmware logs (using gLogger)? does your hardware have the WLAN_UART_DBG pin routed out?
    This pin is used for WiFi Firmware logging so we can get indications from the firmware running inside the wilink chip as only driver logs may not be enough to fully understand the issue.

    Best Regards,
    Eyal
  • Hi Eyal,

    Yes, you understand it well. The notebook is just PC I'm working on, so there are e-mail client, instant messaging, web browsing, windows networking and so on running and communication over wifi at background.
    I don't have WLAN_UART_DBG pin available, but I can download Wi-Fi firmware log from module, as I did before. I will test new IRQ configuration and then provide new logs.

    Best Regards,
    Lumir
  • Hi Eyal,

    I have changed interrupt configuration from high level to rising edge. The results are similar. TX queue is filled up after some time because of missing interrupts for hlid 2.

    New logs are attached. The first packet where no interrupt occured was at kernel log time stamp 2798446554.

    Best Regards,

    Lumir

    wifi3.zip

  • Hi Lumir,

    I am trying to recreate the issue you are seeing using our setups but so far not able to.

    I have one board set up as an AP and another board set up as a station.

    The station is running a shell script that every five seconds it disconnect from the AP (using wpa_cli commands) and reconnects.

    Another phone is just constantly connected to the AP.

    There is obviously something something that we are doing different.

    Any idea what?

    Best Regards,

    Eyal

  • Hi Eyal,

    the station running a shell script should work in little diferrent way. I don't disconnect from AP correctly, but I forcibly reboot whole router without disconnection. Maybe try to just kill -9 wpa_supplicant without disconnection. And new connection attempt occurs after about 20 seconds, because of router reboot. I have 300 secconds pause between reboots, 5 seconds seems to be not much. Maybe try 120 seconds at least in order to reach power save mode. I think it's important.
    Phone connection looks good.

    Best Regards,
    Lumir
  • Hi Lumir,

    Ok, will change the script to kill the supplicant and restart it every 300 seconds as you do.
    One question regarding your AP configuration. What type of security do you configure your AP to?

    Best Regards,
    Eyal
  • Hi Eyal,

    I'm using WPA2-PSK authentication with AES encryption. My hostapd.conf is attached.

    Best Regards,

    Lumir

    1854.hostapd.conf.zip

  • Hi Lumir,

    So far I am not able to reproduce the issues you are seeing.

    Not with killing wpa_supplicant (killall wpa_supplicant) and not by manual reset of my station board.

    I am using your hostapd.conf (just a different ssid)

    Instead of using the router, can you take one of your boards that normally run as an AP and run it as a station, simulating connect/disconnect?

    I am using the following shell script on the station board:

    # cat connect-loop.sh
    for i in `seq 1 1000`;
    do
            echo $i
            killall wpa_supplicant
            wpa_supplicant  -e/usr/share/wl18xx/entropy.bin \
            -iwlan0 -Dnl80211 -c/usr/share/wl18xx/wpa_supplicant.conf &
            sleep 10
            wpa_cli -iwlan0 add_network
            wpa_cli -iwlan0 set_network 0 auth_alg OPEN
            wpa_cli -iwlan0 set_network 0 key_mgmt WPA-PSK
            wpa_cli -iwlan0 set_network 0 psk '"12345678"'
            wpa_cli -iwlan0 set_network 0 pairwise CCMP TKIP
            wpa_cli -iwlan0 set_network 0 group CCMP TKIP
            wpa_cli -iwlan0 set_network 0 mode 0
            wpa_cli -iwlan0 set_network 0 ssid '"SitaraAP-Eyal"'
            wpa_cli -iwlan0 select_network 0
            wpa_cli -iwlan0 enable_network 0
            wpa_cli -iwlan0 reassociate
            sleep 300
            wpa_cli -iwlan0 status
    done

    I want to try and rule out a specific issue with your station client.

    Best Regards,

    Eyal

  • Hi Eyal,

    your script is too considerate to AP. "killall wpa_supplicant" will secure correct disconnection. I had to do 2 small modifications to better simulate my conditions:

    for i in `seq 1 1000`;
    do
            echo $i
            killall -9 wpa_supplicant

            sleep 20

            wpa_supplicant  -e/usr/share/wl18xx/entropy.bin \
            -iwlan0 -Dnl80211 -c/usr/share/wl18xx/wpa_supplicant.conf &
            sleep 10
            wpa_cli -iwlan0 add_network
            wpa_cli -iwlan0 set_network 0 auth_alg OPEN
            wpa_cli -iwlan0 set_network 0 key_mgmt WPA-PSK
            wpa_cli -iwlan0 set_network 0 psk '"12345678"'
            wpa_cli -iwlan0 set_network 0 pairwise CCMP TKIP
            wpa_cli -iwlan0 set_network 0 group CCMP TKIP
            wpa_cli -iwlan0 set_network 0 mode 0
            wpa_cli -iwlan0 set_network 0 ssid '"SitaraAP-Eyal"'
            wpa_cli -iwlan0 select_network 0
            wpa_cli -iwlan0 enable_network 0
            wpa_cli -iwlan0 reassociate
            sleep 300
            wpa_cli -iwlan0 status
    done

    With "killall -9 ..." you will accomplish incorrect disconnection from AP like when I reboot my router. This means, that station suddenly disappears from AP and AP has to solve station removing by its own. Second, "sleep 20" simulates some time which is needed to finish reboot of my router. The router is totally inaccessible from AP during this time and then every packet for them should be buffered for some time, because it cannot be transmitted immediately.

    Third refinement (optional) is starting periodic ping from AP to nonexistent station. The purpose of this is generation of more broadcast packets (ARP requests) on hlid 2.

    With these modifications I can cause ERROR as described before. New logs are attached but they are similar to previous.

    Best Regards,

    Lumir

    wifi4.zip

  • Hi Lumir,

    Even with these changes, My setup is running since yesterday and no issue is seen yet.

    typical reconnection prints on the AP side after the station is killed looks as below:

    wlan1: STA 5c:31:3e:d4:a7:84 IEEE 802.11: authenticated
    wlan1: STA 5c:31:3e:d4:a7:84 IEEE 802.11: associated (aid 2)
    wlan1: STA 5c:31:3e:d4:a7:84 IEEE 802.11: associated (aid 2)
    wlan1: AP-STA-POSSIBLE-PSK-MISMATCH 5c:31:3e:d4:a7:84
    wlan1: AP-STA-POSSIBLE-PSK-MISMATCH 5c:31:3e:d4:a7:84
    wlan1: AP-STA-POSSIBLE-PSK-MISMATCH 5c:31:3e:d4:a7:84
    wlan1: AP-STA-DISCONNECTED 5c:31:3e:d4:a7:84
    wlan1: STA 5c:31:3e:d4:a7:84 IEEE 802.11: authenticated
    wlan1: STA 5c:31:3e:d4:a7:84 IEEE 802.11: associated (aid 2)
    wlan1: AP-STA-CONNECTED 5c:31:3e:d4:a7:84
    wlan1: STA 5c:31:3e:d4:a7:84 WPA: pairwise key handshake completed (RSN)
    wlan1: STA 5c:31:3e:d4:a7:84 IEEE 802.11: authenticated
    wlan1: STA 5c:31:3e:d4:a7:84 IEEE 802.11: associated (aid 2)
    wlan1: AP-STA-POSSIBLE-PSK-MISMATCH 5c:31:3e:d4:a7:84
    wlan1: AP-STA-POSSIBLE-PSK-MISMATCH 5c:31:3e:d4:a7:84
    wlan1: AP-STA-POSSIBLE-PSK-MISMATCH 5c:31:3e:d4:a7:84
    wlan1: AP-STA-DISCONNECTED 5c:31:3e:d4:a7:84
    wlan1: STA 5c:31:3e:d4:a7:84 IEEE 802.11: authenticated
    wlan1: STA 5c:31:3e:d4:a7:84 IEEE 802.11: associated (aid 2)
    wlan1: AP-STA-CONNECTED 5c:31:3e:d4:a7:84
    wlan1: STA 5c:31:3e:d4:a7:84 WPA: pairwise key handshake completed (RSN)
    wlan1: STA 5c:31:3e:d4:a7:84 IEEE 802.11: authenticated
    wlan1: STA 5c:31:3e:d4:a7:84 IEEE 802.11: associated (aid 2)
    wlan1: AP-STA-POSSIBLE-PSK-MISMATCH 5c:31:3e:d4:a7:84
    wlan1: AP-STA-POSSIBLE-PSK-MISMATCH 5c:31:3e:d4:a7:84
    wlan1: AP-STA-POSSIBLE-PSK-MISMATCH 5c:31:3e:d4:a7:84
    wlan1: AP-STA-DISCONNECTED 5c:31:3e:d4:a7:84
    wlan1: STA 5c:31:3e:d4:a7:84 IEEE 802.11: authenticated
    wlan1: STA 5c:31:3e:d4:a7:84 IEEE 802.11: associated (aid 2)
    wlan1: AP-STA-CONNECTED 5c:31:3e:d4:a7:84
    wlan1: STA 5c:31:3e:d4:a7:84 WPA: pairwise key handshake completed (RSN)

    Coupule of suggestions/questions:

    1. Can you try do disconnection and reconnection every 6 min instead of 3.The Aging in the FW is by default 5 min. and the AP thinks the STA is still connected to it, so we want to see if this can be related to your issue.

    2. Try using the attached firmware, which is not released yet, but have an internal fix that just might be related (FW 8.9.0.0.72):

    0576.wl18xx-fw-4.bin

    3.If both doen't help , you mentioned that you are using the drivers backported from kernel 4.8.5. Can you try building the modules directly from our official release (R8.7_SP1) using our build script? ()

    and see if the issue is still there? It also use backports for building.

    Best Regards,

    Eyal

  • Hi Eyal!

    1. Changing of disconnection/reconnection period from 5 to 6 minutes doesn't help anything. The issue is still here.

    2. The newer firmware looks little better than my current FW 8.9.0.1.70. I can see some progress in recovery of packets on hlid 2. The packet queue is sometimes suddenly filled with 10 packets, but these packets are usually released with status 0 after some time (about 30 seconds). This is definitely progress because that state was always start of dead end in previous FW. But if I wait long enough, my issue occurs anyway. I still don't know what type of these 10 packets is. I will try to find out.

    3. I could. It's not easy task. It will take few days.

    Best Regards,

    Lumir

  • Hi Lumir,

    Thank you for the update! Please do let me know once you find something regarding those 10 packets.

    One question, In your log file I didn't see the wpa supplicant prints after the reconnecton of the station which are related to reconnection after the station reboot, especially the "AP-STA-POSSIBLE-PSK-MISMATCH" when the AP detects that a re-key is needed. Are you getting those prints in your log?

    wlan1: STA 5c:31:3e:d4:a7:84 IEEE 802.11: authenticated
    wlan1: STA 5c:31:3e:d4:a7:84 IEEE 802.11: associated (aid 2)
    wlan1: AP-STA-POSSIBLE-PSK-MISMATCH 5c:31:3e:d4:a7:84
    wlan1: AP-STA-POSSIBLE-PSK-MISMATCH 5c:31:3e:d4:a7:84
    wlan1: AP-STA-POSSIBLE-PSK-MISMATCH 5c:31:3e:d4:a7:84
    wlan1: AP-STA-DISCONNECTED 5c:31:3e:d4:a7:84
    wlan1: STA 5c:31:3e:d4:a7:84 IEEE 802.11: authenticated
    wlan1: STA 5c:31:3e:d4:a7:84 IEEE 802.11: associated (aid 2)
    wlan1: AP-STA-CONNECTED 5c:31:3e:d4:a7:84
    wlan1: STA 5c:31:3e:d4:a7:84 WPA: pairwise key handshake completed (RSN)

    In addition, what version of wpa_suppicant are you using?

    BR,

    Eyal

  • Hi Eyal,

    no, I didn't notice these "AP-STA-POSSIBLE-PSK-MISMATCH" messages. But I'm not 100% sure about this, I will check it next time.

    I'm using hostapd 2.4 and wpa_supplicant 2.4 too.

    Best Regards,
    Lumir
  • Hi Lumir,

    The version you are using may be a bit old. Wonder if this is related...
    Can you please try using the one from our release which is version 2.5:

    git.ti.com/.../R8.7_SP1

    BR,
    Eyal
  • Hi Lumir,

    Have you tried switching to hostapd version 2.5? any updates?

    I am still not able to simulate your issue here and a basic question has come up.
    In your test setup, what is the number of devices that are connected to your AP?

    The maximum number of connected stations can't exceed 10 and I want to make sure that you are not running into any corner case here.
    Can you confirm you are seeing the issue in case of just two stations connected to the AP (one is resetting every couple of minutes?)

    BR,
    Eyal
  • Hi Eyal,

    I just tried the latest hostapd 2.6 and it doesn't help.

    I have 2 devices only connected to AP. One stable and one periodically connecting/disconnecting. I use your script actually.

    I can see 3 differences between your AP syslog and mine:

    2017-05-18 08:42:50 hostapd: wlan0: STA 00:22:88:02:0d:29 IEEE 802.11: authenticated
    2017-05-18 08:42:50 hostapd: wlan0: STA 00:22:88:02:0d:29 IEEE 802.11: associated (aid 2)
    2017-05-18 08:42:51 hostapd: wlan0: STA 00:22:88:02:0d:29 RADIUS: starting accounting session 388B5F6FAEBF1365
    2017-05-18 08:42:51 hostapd: wlan0: STA 00:22:88:02:0d:29 WPA: pairwise key handshake completed (RSN)
    2017-05-18 08:42:51 dhcpd-wlan: DHCPDISCOVER from 00:22:88:02:0d:29 via wlan0
    2017-05-18 08:42:52 dhcpd-wlan: DHCPOFFER on 192.168.7.102 to 00:22:88:02:0d:29 via wlan0
    2017-05-18 08:42:52 dhcpd-wlan: DHCPREQUEST for 192.168.7.102 (192.168.7.1) from 00:22:88:02:0d:29 via wlan0
    2017-05-18 08:42:52 dhcpd-wlan: DHCPACK on 192.168.7.102 to 00:22:88:02:0d:29 via wlan0
    2017-05-18 08:42:52 dhcpd-wlan: DHCPREQUEST for 192.168.7.102 (192.168.7.1) from 00:22:88:02:0d:29 via wlan0
    2017-05-18 08:42:52 dhcpd-wlan: DHCPACK on 192.168.7.102 to 00:22:88:02:0d:29 via wlan0
    2017-05-18 08:44:33 hostapd: wlan0: STA 00:22:88:02:0d:29 WPA: group key handshake completed (RSN)
    2017-05-18 08:44:33 hostapd: wlan0: STA 5c:51:4f:8c:02:d5 WPA: group key handshake completed (RSN)
    2017-05-18 08:48:13 hostapd: wlan0: STA 00:22:88:02:0d:29 IEEE 802.11: authenticated
    2017-05-18 08:48:13 hostapd: wlan0: STA 00:22:88:02:0d:29 IEEE 802.11: associated (aid 2)
    2017-05-18 08:48:13 hostapd: wlan0: STA 00:22:88:02:0d:29 RADIUS: starting accounting session 9186544D5005E3D5
    2017-05-18 08:48:13 hostapd: wlan0: STA 00:22:88:02:0d:29 WPA: pairwise key handshake completed (RSN)
    2017-05-18 08:48:14 dhcpd-wlan: DHCPDISCOVER from 00:22:88:02:0d:29 via wlan0
    2017-05-18 08:48:15 dhcpd-wlan: DHCPOFFER on 192.168.7.102 to 00:22:88:02:0d:29 via wlan0
    2017-05-18 08:48:15 dhcpd-wlan: DHCPREQUEST for 192.168.7.102 (192.168.7.1) from 00:22:88:02:0d:29 via wlan0
    2017-05-18 08:48:15 dhcpd-wlan: DHCPACK on 192.168.7.102 to 00:22:88:02:0d:29 via wlan0
    2017-05-18 08:48:15 dhcpd-wlan: DHCPREQUEST for 192.168.7.102 (192.168.7.1) from 00:22:88:02:0d:29 via wlan0
    2017-05-18 08:48:15 dhcpd-wlan: DHCPACK on 192.168.7.102 to 00:22:88:02:0d:29 via wlan0
    2017-05-18 08:51:43 hostapd: wlan0: STA 00:22:88:02:0d:29 IEEE 802.11: authenticated
    2017-05-18 08:51:43 hostapd: wlan0: STA 00:22:88:02:0d:29 IEEE 802.11: associated (aid 2)
    2017-05-18 08:51:44 hostapd: wlan0: STA 00:22:88:02:0d:29 RADIUS: starting accounting session 9186544D5005E3D5
    2017-05-18 08:51:44 hostapd: wlan0: STA 00:22:88:02:0d:29 WPA: pairwise key handshake completed (RSN)
    2017-05-18 08:51:45 dhcpd-wlan: DHCPDISCOVER from 00:22:88:02:0d:29 via wlan0
    2017-05-18 08:51:46 dhcpd-wlan: DHCPOFFER on 192.168.7.102 to 00:22:88:02:0d:29 via wlan0
    2017-05-18 08:51:46 dhcpd-wlan: DHCPREQUEST for 192.168.7.102 (192.168.7.1) from 00:22:88:02:0d:29 via wlan0
    2017-05-18 08:51:46 dhcpd-wlan: DHCPACK on 192.168.7.102 to 00:22:88:02:0d:29 via wlan0
    2017-05-18 08:51:46 dhcpd-wlan: DHCPREQUEST for 192.168.7.102 (192.168.7.1) from 00:22:88:02:0d:29 via wlan0
    2017-05-18 08:51:46 dhcpd-wlan: DHCPACK on 192.168.7.102 to 00:22:88:02:0d:29 via wlan0
    2017-05-18 08:54:21 hostapd: wlan0: STA 00:22:88:02:0d:29 IEEE 802.11: authenticated
    2017-05-18 08:54:21 hostapd: wlan0: STA 00:22:88:02:0d:29 IEEE 802.11: associated (aid 2)
    2017-05-18 08:54:21 hostapd: wlan0: STA 00:22:88:02:0d:29 RADIUS: starting accounting session A2C9D68EB210A161
    2017-05-18 08:54:21 hostapd: wlan0: STA 00:22:88:02:0d:29 WPA: pairwise key handshake completed (RSN)
    2017-05-18 08:54:22 dhcpd-wlan: DHCPREQUEST for 192.168.7.102 from 00:22:88:02:0d:29 via wlan0
    2017-05-18 08:54:22 dhcpd-wlan: DHCPACK on 192.168.7.102 to 00:22:88:02:0d:29 via wlan0
    2017-05-18 08:54:22 dhcpd-wlan: DHCPREQUEST for 192.168.7.102 from 00:22:88:02:0d:29 via wlan0
    2017-05-18 08:54:22 dhcpd-wlan: DHCPACK on 192.168.7.102 to 00:22:88:02:0d:29 via wlan0
    2017-05-18 08:54:33 hostapd: wlan0: STA 5c:51:4f:8c:02:d5 WPA: group key handshake completed (RSN)
    2017-05-18 08:54:33 hostapd: wlan0: STA 00:22:88:02:0d:29 WPA: group key handshake completed (RSN)

    1) No "AP-STA-POSSIBLE-PSK-MISMATCH" messages.

    2) No "AP-STA-DISCONNECTED" messages.

    3) Extra DHCP messages. Maybe you filtered them.

    I still think that problem is between kernel driver and Wi-Fi module. I can see, that all TX packets were successfully transfered to W-Fi module according to return values. But TX status was not received from Wi-Fi module for all sent packets. And always for hlid 2 only. IRQ works at the same time. Packets on hlid 3 still can be transferred succesfully when hlid 2 queue contains unfreed/unacknowledged packets. Sequence numbers in TX status, if received, are OK, no mising or extra one.

    I suspect packet life time service is malfunctioning somehow in Wi-Fi firmware when station is not accessible. How exactly Wi-Fi firmware services expired packets, i.e. these that cannot be transmitted for any reason? There is about 8000 ms timeout in kernel driver. This value is transmitted to Wi-Fi module with every TX packet. But module probably ignores this value. I have meassured real life time from kernel log time stamps and it is either about 30000 ms (tx status id 0 skb 0x.. success 0) or infinity (no tx status in next interrupts at all).

    I tried more APs so I can exclude HW problem, I hope.

    Best Regards,

    Lumir

  • Hi lumir,

    There is nothing unique regarding our wpa_supplicant logging.

    Our wpa_supplicant.conf file that is used on the connecting station is taken from here:

    Can you try these setting on your connecting/disconnecting station?

    I wonder if there is any difference that is casing the issue not to show up on our test setup?

    BR,

    Eyal

  • Hi Eyal,

    I had been using much simpler wpa_supplicant.conf:

    ctrl_interface=/var/run/wpa_supplicant

    country=CZ

    But I can raise my issue even with your one. I had to comment some option that old wpa_supplicant 2.4 doesn't support: device_name, manufacturer, model_name, model_number, serial_number, config_methods, p2p_go_ht40, p2p_ignore_shared_freq.

    Could you answer my question about packet life time handling in Wi-Fi firmware, please? I think it's important for better understanding what is happen inside.

    Best Regards,

    Lumir

  • Hello Lumir,

    Regarding your question, Each packet has a lifetime value attached to from the driver but there are manipulations done on this value inside the firmware based on many conditions so it is really just a template.

    One quick try, if possible. Can you try an older firmware from previous release (R8.6_SP1 - 8.9.0.1.55)?

    There is work done for mesh that we want to confirm that did not introduce this issue on some corner case:

    In addition, I am attaching a debug firmware here with increased expiry time (8.9.0.1.72):

    wl18xx-fw-4.bin-8.9.1.72.bin

    Can you try it as well and see if it makes a change?

    BR,

    Eyal

  • Hi Eyal,

    thank you for answer.

    My test results now:

    Firmware 8.9.0.1.55: it had been running over 4 hour and no "ERROR Tx stuck ..." occured. I have noticed primary cause which typically leeds to TX stuck few times during test, but firmware always successfully recovered from this critical state without restart.

    Firmware 8.9.0.1.72: my issue can be observed with this firmware. Logs are attached.

    I have noticed one diffference between these two firmwares. It is about wl->ap_fw_ps_map value in kernel driver. ap_fw_ps_map changes to value 0x14, i.e. hlids 2 (broadcasts) and 4 (my connecting/disconnecting station) are in power save mode, in FW 8.9.0.1.72 after some time and stays here forewer. But in FW 8.9.0.1.55 ap_fw_ps_map changes to value 0x14 after some time, but this value returns back to 0x00 just before recovery from critical state this. Then all buffered packets were freed with "tx status ... success 0" in next interrupt. Maybe this means something.

    Best Regards,

    Lumirwifi5.zip

  • Hi Lumir,

    Thank you very much for the update!

    we will look into the logs you provided for FW 8.9.0.1.72.

    So, if understand correctly, so far you are not seeing the original issue with Firmware 8.9.0.1.55, right?

    This can be very useful for our R&D for debugging the issue.

    Can you keep ruining your test setup with this (R8.6_SP1) firmware for a longer time and make sure it does not show up?

    BR,

    Eyal

  • Hi Eyal,

    yes, I didn't see my original issue with FW 8.9.0.1.55. And of course, I can and I will do longer test.

    Best Regards,
    Lumir
  • Hi Lumir,

    Can you please send me your contact info (email) to eyalr@ti.com
    I may need to send you a special firmware for trying to isolate the issue you are seeing.

    Will do that after you run FW 8.9.0.1.55 for a bit longer.

    Best Regards,
    Eyal