Linux/AM5728: PCIe M.2 Wifi Modules stability under Linux

Part Number: AM5728


Tool/software: Linux

Hi,

We are using a PhyCORE-AM57x module with an AM5728 SoC (silicon revision 2.0).

We're running a custom Buildroot-built Linux with a 4.9 kernel that includes patches from the TI kernel repo.

We designed our own custom baseboard, which has an M.2 port for PCIe connectivity so that we can use WiFi cards. The WiFi card we want to use is an Intel 8265NGW.

Currently we're having trouble getting the WiFi to work reliably. We can initialize the card and load the firmware (version 22.361476). Sometimes we can even connect to a WiFi network and transmit some data.

But after some time we get timeouts from the card: the driver is waiting for data, but the card does not send anything.

Typical error messages look like this:

[   97.776310] iwlwifi 0000:01:00.0: Queue 9 stuck for 2500 ms.
[   97.782240] iwlwifi 0000:01:00.0: Current SW read_ptr 56 write_ptr 57
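
For anyone comparing several such logs, a small script can pull out the stuck-queue events and the distance between the two ring pointers. This is only a sketch: the function name `find_stuck_queues` is made up for illustration, and the ring size of 256 entries is an assumption that only matters when the pointers wrap around.

```python
import re

# Sample lines copied from the log excerpt above.
LOG = """\
[   97.776310] iwlwifi 0000:01:00.0: Queue 9 stuck for 2500 ms.
[   97.782240] iwlwifi 0000:01:00.0: Current SW read_ptr 56 write_ptr 57
"""

STUCK_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] iwlwifi (?P<dev>\S+): "
    r"Queue (?P<queue>\d+) stuck for (?P<ms>\d+) ms"
)
PTR_RE = re.compile(r"read_ptr (?P<rd>\d+) write_ptr (?P<wr>\d+)")


def find_stuck_queues(text, ring_size=256):
    """Return one dict per 'Queue N stuck' message.

    ring_size is an assumed value, used only to handle pointer wrap-around.
    """
    events = []
    lines = text.splitlines()
    for i, line in enumerate(lines):
        m = STUCK_RE.search(line)
        if not m:
            continue
        event = {
            "time": float(m["ts"]),
            "device": m["dev"],
            "queue": int(m["queue"]),
            "stuck_ms": int(m["ms"]),
        }
        # The pointer values are printed on the line right after the message.
        if i + 1 < len(lines):
            p = PTR_RE.search(lines[i + 1])
            if p:
                rd, wr = int(p["rd"]), int(p["wr"])
                event["read_ptr"] = rd
                event["write_ptr"] = wr
                event["outstanding"] = (wr - rd) % ring_size
        events.append(event)
    return events


for ev in find_stuck_queues(LOG):
    print(ev)
```

In the excerpt above the write pointer is exactly one entry ahead of the read pointer, so a single entry was submitted and never completed.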

We also tried another WiFi card from Advantech that uses a Marvell chip, but it gets stuck at some point too, with similar symptoms.

Since we were not sure our board was working correctly, we took the Phytec SOM platform board, which has a conventional PCIe slot, and used a PCIe-to-M.2 converter card. The results are the same: the card stops talking to us sooner or later.

When we use a standard Intel Gigabit CT Desktop Adapter on this board (plugged directly into the PCIe slot), I can transfer gigabytes of data without any problems.

When I put the WiFi card with the converter card into the PCIe slot of an x86 PC, I can also connect to a WiFi network and transfer gigabytes of data without any problems.

So it currently boils down to different M.2 WiFi cards not working in combination with an AM5728 on a PhyCORE module on Linux 4.9. (We also tested kernel 4.8; the Intel card is not supported before kernel 4.6, so we can't test 4.4.)

We're currently more or less out of ideas about what could be causing this issue. I'm open to any hints.

Michael

  • Hi Michael,

    The software you are using is not an official TI release, and I could not find the problem. I suggest comparing the WiFi module's behaviour under your software with its behaviour under PROCESSOR-SDK-LINUX-AM57X, available at:
    www.ti.com/.../processor-sdk-am57x
    Verify whether the same issue appears with PROCESSOR-SDK-LINUX-AM57X.

    BR
    Tsvetolin Shulev
  • Since the Intel 8265 card is only supported from kernel 4.6 onwards, I had to get another card that is compatible with 4.4-based kernels.

    I now have the Intel 8260. I tried it on the Phytec SOM board with their release PD16.01, which is based on TI SDK 03.00, and got basically the same result:

    Linux am57xx-phycore-rdk 4.4.12 #1 SMP PREEMPT Tue Jan 24 12:12:02 CET 2017 armv7l GNU/Linux

    Examples:

    [  114.830449] iwlwifi 0000:01:00.0: Error sending PHY_CONTEXT_CMD: time out after 2000ms.
    [  114.838521] iwlwifi 0000:01:00.0: Current CMD queue read_ptr 47 write_ptr 48
    
    ------
    
    [  115.352352] iwlwifi 0000:01:00.0: Queue 9 stuck for 2500 ms.
    [  115.358039] iwlwifi 0000:01:00.0: Current SW read_ptr 47 write_ptr 48
    

    Next I will check whether this board can boot the vanilla TI SDK and, if it can, repeat the tests.

  • I have now tried it with the linux-4.4.32+gitAUTOINC+adde2ca9f8-gadde2ca9f8 kernel from processor-sdk-linux-03.02.00.

    It shows basically the same behaviour:

    wpa_supplicant -c /etc/wpa_supplicant.conf -i wlp1s0 -B
    [  172.674644] iwlwifi 0000:01:00.0: Error sending STATISTICS_CMD: time out after 2000ms.
    [  172.682606] iwlwifi 0000:01:00.0: Current CMD queue read_ptr 53 write_ptr 54
    [  172.693441] iwlwifi 0000:01:00.0: Loaded firmware version: 16.242414.0
    [  172.701355] iwlwifi 0000:01:00.0: 0x00000084 | NMI_INTERRUPT_UNKNOWN
    [  172.709670] iwlwifi 0000:01:00.0: 0x000002F0 | uPc
    [  172.714484] iwlwifi 0000:01:00.0: 0x00000000 | branchlink1
    [  172.722535] iwlwifi 0000:01:00.0: 0x00000BF0 | branchlink2
    [  172.729352] iwlwifi 0000:01:00.0: 0x00022B90 | interruptlink1
    [  172.735962] iwlwifi 0000:01:00.0: 0x00022B90 | interruptlink2
    [  172.741768] iwlwifi 0000:01:00.0: 0x00000000 | data1
    [  172.746870] iwlwifi 0000:01:00.0: 0x00000080 | data2
    [  172.751933] iwlwifi 0000:01:00.0: 0x07830000 | data3
    [  172.757007] iwlwifi 0000:01:00.0: 0xFAE3D7AA | beacon time
    [  172.762598] iwlwifi 0000:01:00.0: 0x0996DACD | tsf low
    [  172.767821] iwlwifi 0000:01:00.0: 0x00000000 | tsf hi
    [  172.773003] iwlwifi 0000:01:00.0: 0x00000000 | time gp1
    [  172.778305] iwlwifi 0000:01:00.0: 0x0996DAD2 | time gp2
    [  172.783614] iwlwifi 0000:01:00.0: 0x00000000 | time gp3
    [  172.788947] iwlwifi 0000:01:00.0: 0x00000010 | uCode version major
    [  172.795248] iwlwifi 0000:01:00.0: 0x0003B2EE | uCode version minor
    [  172.801495] iwlwifi 0000:01:00.0: 0x00000201 | hw version
    [  172.806981] iwlwifi 0000:01:00.0: 0x00489008 | board version
    [  172.812783] iwlwifi 0000:01:00.0: 0x0935009C | hcmd
    [  172.817740] iwlwifi 0000:01:00.0: 0x00022000 | isr0
    [  172.822732] iwlwifi 0000:01:00.0: 0x00800000 | isr1
    [  172.827690] iwlwifi 0000:01:00.0: 0x08001802 | isr2
    [  172.832685] iwlwifi 0000:01:00.0: 0x00400080 | isr3
    [  172.837678] iwlwifi 0000:01:00.0: 0x00000000 | isr4
    [  172.842658] iwlwifi 0000:01:00.0: 0x00000110 | isr_pref
    [  172.848001] iwlwifi 0000:01:00.0: 0x00000000 | wait_event
    [  172.853515] iwlwifi 0000:01:00.0: 0x00003EED | l2p_control
    [  172.859376] iwlwifi 0000:01:00.0: 0x00000020 | l2p_duration
    [  172.865143] iwlwifi 0000:01:00.0: 0x00000000 | l2p_mhvalid
    [  172.870655] iwlwifi 0000:01:00.0: 0x00000000 | l2p_addr_match
    [  172.876600] iwlwifi 0000:01:00.0: 0x0000008F | lmpm_pmg_sel
    [  172.882279] iwlwifi 0000:01:00.0: 0x17111905 | timestamp
    [  172.887719] iwlwifi 0000:01:00.0: 0x00345880 | flow_handler
    [  172.893460] iwlwifi 0000:01:00.0: Start IWL Error Log Dump:
    [  172.899140] iwlwifi 0000:01:00.0: Status: 0x00000000, count: 7
    [  172.905094] iwlwifi 0000:01:00.0: 0x00000070 | ADVANCED_SYSASSERT
    [  172.911288] iwlwifi 0000:01:00.0: 0x00000000 | umac branchlink1
    [  172.917304] iwlwifi 0000:01:00.0: 0xC00817F8 | umac branchlink2
    [  172.923348] iwlwifi 0000:01:00.0: 0xC008B796 | umac interruptlink1
    [  172.929761] iwlwifi 0000:01:00.0: 0xC0081510 | umac interruptlink2
    [  172.936046] iwlwifi 0000:01:00.0: 0x00000800 | umac data1
    [  172.941498] iwlwifi 0000:01:00.0: 0xC0081510 | umac data2
    [  172.946971] iwlwifi 0000:01:00.0: 0xDEADBEEF | umac data3
    [  172.952478] iwlwifi 0000:01:00.0: 0x00000010 | umac major
    [  172.957945] iwlwifi 0000:01:00.0: 0x0003B2EE | umac minor
    [  172.963409] iwlwifi 0000:01:00.0: 0xC0887F80 | frame pointer
    [  172.969138] iwlwifi 0000:01:00.0: 0xC0887F80 | stack pointer
    [  172.974888] iwlwifi 0000:01:00.0: 0x0935009C | last host cmd
    [  172.980570] iwlwifi 0000:01:00.0: 0x00000000 | isr status reg
    [  172.986396] ieee80211 phy0: Hardware restart was requested
    [  173.444663] iwlwifi 0000:01:00.0: Queue 9 stuck for 2500 ms.
    [  173.450413] iwlwifi 0000:01:00.0: Current SW read_ptr 53 write_ptr 54
    [  173.464156] iwl data: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    [  173.473100] iwlwifi 0000:01:00.0: FH TRBs(0) = 0x8dbe5d15
    [  173.478544] iwlwifi 0000:01:00.0: FH TRBs(1) = 0xcff7d887
    [  173.483977] iwlwifi 0000:01:00.0: FH TRBs(2) = 0xd75f9392
    [  173.484926] iwlwifi 0000:01:00.0: L1 Disabled - LTR Disabled
    [  173.485199] iwlwifi 0000:01:00.0: L1 Disabled - LTR Disabled
    [  173.500816] iwlwifi 0000:01:00.0: FH TRBs(3) = 0x533fd5c4
    [  173.506260] iwlwifi 0000:01:00.0: FH TRBs(4) = 0xf2322248
    [  173.511702] iwlwifi 0000:01:00.0: FH TRBs(5) = 0xd411da1c
    [  173.517155] iwlwifi 0000:01:00.0: FH TRBs(6) = 0xde1040b0
    [  173.522594] iwlwifi 0000:01:00.0: FH TRBs(7) = 0x00709035
    [  173.528091] iwlwifi 0000:01:00.0: Q 0 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.536458] iwlwifi 0000:01:00.0: Q 1 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.544822] iwlwifi 0000:01:00.0: Q 2 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.553182] iwlwifi 0000:01:00.0: Q 3 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.561549] iwlwifi 0000:01:00.0: Q 4 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.569913] iwlwifi 0000:01:00.0: Q 5 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.578277] iwlwifi 0000:01:00.0: Q 6 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.586643] iwlwifi 0000:01:00.0: Q 7 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.595016] iwlwifi 0000:01:00.0: Q 8 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.603378] iwlwifi 0000:01:00.0: Q 9 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.611745] iwlwifi 0000:01:00.0: Q 10 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.620200] iwlwifi 0000:01:00.0: Q 11 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.628653] iwlwifi 0000:01:00.0: Q 12 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.637104] iwlwifi 0000:01:00.0: Q 13 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.645558] iwlwifi 0000:01:00.0: Q 14 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.654003] iwlwifi 0000:01:00.0: Q 15 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.662455] iwlwifi 0000:01:00.0: Q 16 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.670908] iwlwifi 0000:01:00.0: Q 17 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.679359] iwlwifi 0000:01:00.0: Q 18 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.687812] iwlwifi 0000:01:00.0: Q 19 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.696264] iwlwifi 0000:01:00.0: Q 20 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.704714] iwlwifi 0000:01:00.0: Q 21 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.713162] iwlwifi 0000:01:00.0: Q 22 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.721613] iwlwifi 0000:01:00.0: Q 23 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.730065] iwlwifi 0000:01:00.0: Q 24 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.738516] iwlwifi 0000:01:00.0: Q 25 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.746967] iwlwifi 0000:01:00.0: Q 26 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.755420] iwlwifi 0000:01:00.0: Q 27 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.763867] iwlwifi 0000:01:00.0: Q 28 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.772319] iwlwifi 0000:01:00.0: Q 29 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    [  173.780772] iwlwifi 0000:01:00.0: Q 30 is inactive and mapped to fifo 0 ra_tid 0x0000 [0,0]
    root@am57xx-phycore-rdk:~# ifconfig
    [  178.494636] iwlwifi 0000:01:00.0: Failed to load firmware chunk!
    [  178.500676] iwlwifi 0000:01:00.0: Could not load the [0] uCode section
    [  178.510110] iwlwifi 0000:01:00.0: Failed to start INIT ucode: -110
    [  178.516414] iwlwifi 0000:01:00.0: Failed to run INIT ucode: -110
    [  178.522594] ------------[ cut here ]------------
    [  178.527411] WARNING: CPU: 0 PID: 4 at net/mac80211/util.c:1818 ieee80211_reconfig+0x3e4/0xa50 [mac80211]()
    [  178.537149] Hardware became unavailable during restart.
    [  178.542397] Modules linked in: rpmsg_rpc rpmsg_pru rpmsg_proto xfrm_user xfrm4_tunnel bluetooth ipcomp xfrm_ipcomp esp4 ah4 af_key xfrm_algo dwc3 udc_core virtio_rpmsg_bus arc4 iwlmvm mac80211 ahci_platform libahci_platform pru_rproc snd_soc_simple_card libahci pruss_intc libata dwc3_omap pruss omap_sham omap_wdt omap_aes_driver ti_vpe c_can_platform ti_sc c_can videobuf2_dma_contig iwlwifi ti_csc ti_vpdma v4l2_mem2mem videobuf2_memops videobuf2_v4l2 videobuf2_core extcon_palmas extcon snd_soc_tlv320aic3x can_dev edt_ft5x06 omap_des rtc_omap omap_rng rng_core rtc_palmas cfg80211 omap_remoteproc remoteproc virtio virtio_ring sch_fq_codel
    [  178.599759] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.4.32-gadde2ca9f8 #3
    [  178.606926] Hardware name: Generic DRA74X (Flattened Device Tree)
    [  178.613117] Workqueue: events_freezable ieee80211_restart_work [mac80211]
    [  178.619940] Backtrace:
    [  178.622417] [<c00130ac>] (dump_backtrace) from [<c00132a8>] (show_stack+0x18/0x1c)
    [  178.630018]  r7:bf2ce2f0 r6:60090013 r5:00000000 r4:c093afcc
    [  178.635744] [<c0013290>] (show_stack) from [<c02b7a84>] (dump_stack+0x90/0xa4)
    [  178.643006] [<c02b79f4>] (dump_stack) from [<c0033794>] (warn_slowpath_common+0x88/0xb8)
    [  178.651129]  r7:bf2ce2f0 r6:0000071a r5:00000009 r4:ee09de40
    [  178.656849] [<c003370c>] (warn_slowpath_common) from [<c00337fc>] (warn_slowpath_fmt+0x38/0x40)
    [  178.665583]  r8:eed2d400 r7:00000000 r6:ffffff92 r5:ee7e29f4 r4:bf2e47b8
    [  178.672433] [<c00337c8>] (warn_slowpath_fmt) from [<bf2ce2f0>] (ieee80211_reconfig+0x3e4/0xa50 [mac80211])
    [  178.682125]  r3:ee08cf00 r2:bf2e47b8
    [  178.685729]  r4:ee7e23e0
    [  178.688411] [<bf2cdf0c>] (ieee80211_reconfig [mac80211]) from [<bf2a3730>] (ieee80211_restart_work+0x5c/0x88 [mac80211])
    [  178.699325]  r10:eed29ac0 r9:00000000 r8:eed2d400 r7:00000000 r6:ee7e23e0 r5:ee7e29f4
    [  178.707233]  r4:ee7e29f4
    [  178.709854] [<bf2a36d4>] (ieee80211_restart_work [mac80211]) from [<c00484e8>] (process_one_work+0x1dc/0x3f8)
    [  178.719807]  r7:00000000 r6:eed29ac0 r5:ee05ae00 r4:ee7e2c50
    [  178.725530] [<c004830c>] (process_one_work) from [<c0049168>] (worker_thread+0x4c/0x524)
    [  178.733652]  r10:eed29ac0 r9:ee05ae00 r8:00000008 r7:ee09c000 r6:eed29ad4 r5:ee05ae18
    [  178.741560]  r4:eed29ac0
    [  178.744113] [<c004911c>] (worker_thread) from [<c004e49c>] (kthread+0xe4/0xfc)
    [  178.751364]  r10:00000000 r9:00000000 r8:00000000 r7:c004911c r6:ee05ae00 r5:ee043ec0
    [  178.759269]  r4:00000000
    [  178.761823] [<c004e3b8>] (kthread) from [<c000fb08>] (ret_from_fork+0x14/0x2c)
    [  178.769072]  r7:00000000 r6:00000000 r5:c004e3b8 r4:ee043ec0
    [  178.774919] ---[ end trace 86bf7c765a093b45 ]---
    [  178.787185] ------------[ cut here ]------------
    [  178.791902] WARNING: CPU: 0 PID: 4 at net/mac80211/driver-ops.h:12 drv_remove_interface+0x70/0x78 [mac80211]()
    [  178.801999] p2p-dev-wlp1s0:  Failed check-sdata-in-driver check, flags: 0x0
    [  178.809019] Modules linked in: rpmsg_rpc rpmsg_pru rpmsg_proto xfrm_user xfrm4_tunnel bluetooth ipcomp xfrm_ipcomp esp4 ah4 af_key xfrm_algo dwc3 udc_core virtio_rpmsg_bus arc4 iwlmvm mac80211 ahci_platform libahci_platform pru_rproc snd_soc_simple_card libahci pruss_intc libata dwc3_omap pruss omap_sham omap_wdt omap_aes_driver ti_vpe c_can_platform ti_sc c_can videobuf2_dma_contig iwlwifi ti_csc ti_vpdma v4l2_mem2mem videobuf2_memops videobuf2_v4l2 videobuf2_core extcon_palmas extcon snd_soc_tlv320aic3x can_dev edt_ft5x06 omap_des rtc_omap omap_rng rng_core rtc_palmas cfg80211 omap_remoteproc remoteproc virtio virtio_ring sch_fq_codel
    [  178.866346] CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G        W       4.4.32-gadde2ca9f8 #3
    [  178.874730] Hardware name: Generic DRA74X (Flattened Device Tree)
    [  178.880909] Workqueue: events_freezable ieee80211_restart_work [mac80211]
    [  178.887730] Backtrace:
    [  178.890201] [<c00130ac>] (dump_backtrace) from [<c00132a8>] (show_stack+0x18/0x1c)
    [  178.897799]  r7:bf2a60f8 r6:60090013 r5:00000000 r4:c093afcc
    [  178.903520] [<c0013290>] (show_stack) from [<c02b7a84>] (dump_stack+0x90/0xa4)
    [  178.910778] [<c02b79f4>] (dump_stack) from [<c0033794>] (warn_slowpath_common+0x88/0xb8)
    [  178.918898]  r7:bf2a60f8 r6:0000000c r5:00000009 r4:ee09dd48
    [  178.924615] [<c003370c>] (warn_slowpath_common) from [<c00337fc>] (warn_slowpath_fmt+0x38/0x40)
    [  178.933346]  r8:ee7e298c r7:ece35000 r6:ee7e23e0 r5:ee7e298c r4:bf2e354c
    [  178.940166] [<c00337c8>] (warn_slowpath_fmt) from [<bf2a60f8>] (drv_remove_interface+0x70/0x78 [mac80211])
    [  178.949857]  r3:ece3511c r2:bf2e354c
    [  178.953457]  r4:ece35000
    [  178.956117] [<bf2a6088>] (drv_remove_interface [mac80211]) from [<bf2b613c>] (ieee80211_do_stop+0x4b0/0x73c [mac80211])
    [  178.966942]  r4:00000000
    [  178.969605] [<bf2b5c8c>] (ieee80211_do_stop [mac80211]) from [<bf2b760c>] (ieee80211_sdata_stop+0x20/0x54 [mac80211])
    [  178.980255]  r10:eed29ac0 r9:00000000 r8:eed2d400 r7:00000000 r6:ee7e2000 r5:ee7e2000
    [  178.988157]  r4:ece35008
    [  178.990820] [<bf2b75ec>] (ieee80211_sdata_stop [mac80211]) from [<bf2baef0>] (ieee80211_stop_p2p_device+0x14/0x18 [mac80211])
    [  179.002168]  r5:ee7e2000 r4:ece35008
    [  179.005883] [<bf2baedc>] (ieee80211_stop_p2p_device [mac80211]) from [<bf0d2570>] (cfg80211_stop_p2p_device+0x50/0xf4 [cfg80211])
    [  179.017664] [<bf0d2520>] (cfg80211_stop_p2p_device [cfg80211]) from [<bf0d2684>] (cfg80211_shutdown_all_interfaces+0x70/0xa8 [cfg80211])
    [  179.029971]  r5:ee7e203c r4:ece35008
    [  179.033673] [<bf0d2614>] (cfg80211_shutdown_all_interfaces [cfg80211]) from [<bf2cbfac>] (ieee80211_handle_reconfig_failure+0xa8/0xdc [mac80211])
    [  179.046764]  r7:00000000 r6:ffffff92 r5:ee7e2b14 r4:ee7e23e0
    [  179.052591] [<bf2cbf04>] (ieee80211_handle_reconfig_failure [mac80211]) from [<bf2cdf94>] (ieee80211_reconfig+0x88/0xa50 [mac80211])
    [  179.064550]  r7:00000000 r6:ffffff92 r5:ee7e29f4 r4:ee7e23e0
    [  179.070373] [<bf2cdf0c>] (ieee80211_reconfig [mac80211]) from [<bf2a3730>] (ieee80211_restart_work+0x5c/0x88 [mac80211])
    [  179.081285]  r10:eed29ac0 r9:00000000 r8:eed2d400 r7:00000000 r6:ee7e23e0 r5:ee7e29f4
    [  179.089185]  r4:ee7e29f4
    [  179.091792] [<bf2a36d4>] (ieee80211_restart_work [mac80211]) from [<c00484e8>] (process_one_work+0x1dc/0x3f8)
    [  179.101744]  r7:00000000 r6:eed29ac0 r5:ee05ae00 r4:ee7e2c50
    [  179.107458] [<c004830c>] (process_one_work) from [<c0049168>] (worker_thread+0x4c/0x524)
    [  179.115579]  r10:eed29ac0 r9:ee05ae00 r8:00000008 r7:ee09c000 r6:eed29ad4 r5:ee05ae18
    [  179.123480]  r4:eed29ac0
    [  179.126032] [<c004911c>] (worker_thread) from [<c004e49c>] (kthread+0xe4/0xfc)
    [  179.133281]  r10:00000000 r9:00000000 r8:00000000 r7:c004911c r6:ee05ae00 r5:ee043ec0
    [  179.141181]  r4:00000000
    [  179.143732] [<c004e3b8>] (kthread) from [<c000fb08>] (ret_from_fork+0x14/0x2c)
    [  179.150982]  r7:00000000 r6:00000000 r5:c004e3b8 r4:ee043ec0
    [  179.157138] ---[ end trace 86bf7c765a093b46 ]---
    [  179.161949] ------------[ cut here ]------------
    [  179.166709] WARNING: CPU: 0 PID: 4 at net/mac80211/driver-ops.h:12 drv_remove_interface+0x70/0x78 [mac80211]()
    [  179.177017] wlp1s0:  Failed check-sdata-in-driver check, flags: 0x0
    wpa_supplicant -c /etc/wpa_supplicant.conf -i wlp1s0 -B
    [  179.183398] Modules linked in: rpmsg_rpc rpmsg_pru rpmsg_proto xfrm_user xfrm4_tunnel bluetooth ipcomp xfrm_ipcomp esp4 ah4 af_key xfrm_algo dwc3 udc_core virtio_rpmsg_bus arc4 iwlmvm mac80211 ahci_platform libahci_platform pru_rproc snd_soc_simple_card libahci pruss_intc libata dwc3_omap pruss omap_sham omap_wdt omap_aes_driver ti_vpe c_can_platform ti_sc c_can videobuf2_dma_contig iwlwifi ti_csc ti_vpdma v4l2_mem2mem videobuf2_memops videobuf2_v4l2 videobuf2_core extcon_palmas extcon snd_soc_tlv320aic3x can_dev edt_ft5x06 omap_des rtc_omap omap_rng rng_core rtc_palmas cfg80211 omap_remoteproc remoteproc virtio virtio_ring sch_fq_codel
    [  179.246194] CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G        W       4.4.32-gadde2ca9f8 #3
    [  179.254579] Hardware name: Generic DRA74X (Flattened Device Tree)
    [  179.260751] Workqueue: events_freezable ieee80211_restart_work [mac80211]
    [  179.267569] Backtrace:
    [  179.270035] [<c00130ac>] (dump_backtrace) from [<c00132a8>] (show_stack+0x18/0x1c)
    [  179.277630]  r7:bf2a60f8 r6:60090013 r5:00000000 r4:c093afcc
    [  179.283347] [<c0013290>] (show_stack) from [<c02b7a84>] (dump_stack+0x90/0xa4)
    [  179.290603] [<c02b79f4>] (dump_stack) from [<c0033794>] (warn_slowpath_common+0x88/0xb8)
    [  179.298722]  r7:bf2a60f8 r6:0000000c r5:00000009 r4:ee09dd10
    [  179.304431] [<c003370c>] (warn_slowpath_common) from [<c00337fc>] (warn_slowpath_fmt+0x38/0x40)
    [  179.313162]  r8:ee7e298c r7:ee0f14c0 r6:ee7e23e0 r5:ee7e298c r4:bf2e354c
    [  179.319965] [<c00337c8>] (warn_slowpath_fmt) from [<bf2a60f8>] (drv_remove_interface+0x70/0x78 [mac80211])
    [  179.329654]  r3:ee0f1000 r2:bf2e354c
    [  179.333251]  r4:ee0f14c0
    [  179.335885] [<bf2a6088>] (drv_remove_interface [mac80211]) from [<bf2b613c>] (ieee80211_do_stop+0x4b0/0x73c [mac80211])
    [  179.346709]  r4:00000000
    [  179.349346] [<bf2b5c8c>] (ieee80211_do_stop [mac80211]) from [<bf2b63e0>] (ieee80211_stop+0x18/0x20 [mac80211])
    [  179.359471]  r10:eed29ac0 r9:00000000 r8:eed2d400 r7:00000001 r6:ee09de10 r5:ee09de10
    [  179.367365]  r4:ee0f1000
    [  179.369958] [<bf2b63c8>] (ieee80211_stop [mac80211]) from [<c058014c>] (__dev_close_many+0x90/0xd8)
    [  179.379043] [<c05800bc>] (__dev_close_many) from [<c058020c>] (dev_close_many+0x78/0x104)
    [  179.387251]  r5:ee7e203c r4:ee0f14c8
    [  179.390854] [<c0580194>] (dev_close_many) from [<c0584300>] (dev_close+0x44/0x60)
    [  179.398363]  r9:00000000 r8:eed2d400 r7:00000000 r6:ee7e2000 r5:ee7e203c r4:ee0f14c8
    [  179.406226] [<c05842bc>] (dev_close) from [<bf0d2650>] (cfg80211_shutdown_all_interfaces+0x3c/0xa8 [cfg80211])
    [  179.416346] [<bf0d2614>] (cfg80211_shutdown_all_interfaces [cfg80211]) from [<bf2cbfac>] (ieee80211_handle_reconfig_failure+0xa8/0xdc [mac80211])
    [  179.429437]  r7:00000000 r6:ffffff92 r5:ee7e2b14 r4:ee7e23e0
    [  179.435231] [<bf2cbf04>] (ieee80211_handle_reconfig_failure [mac80211]) from [<bf2cdf94>] (ieee80211_reconfig+0x88/0xa50 [mac80211])
    [  179.447187]  r7:00000000 r6:ffffff92 r5:ee7e29f4 r4:ee7e23e0
    [  179.452981] [<bf2cdf0c>] (ieee80211_reconfig [mac80211]) from [<bf2a3730>] (ieee80211_restart_work+0x5c/0x88 [mac80211])
    [  179.463893]  r10:eed29ac0 r9:00000000 r8:eed2d400 r7:00000000 r6:ee7e23e0 r5:ee7e29f4
    [  179.471783]  r4:ee7e29f4
    [  179.474375] [<bf2a36d4>] (ieee80211_restart_work [mac80211]) from [<c00484e8>] (process_one_work+0x1dc/0x3f8)
    [  179.484326]  r7:00000000 r6:eed29ac0 r5:ee05ae00 r4:ee7e2c50
    [  179.490033] [<c004830c>] (process_one_work) from [<c0049168>] (worker_thread+0x4c/0x524)
    [  179.498152]  r10:eed29ac0 r9:ee05ae00 r8:00000008 r7:ee09c000 r6:eed29ad4 r5:ee05ae18
    [  179.506041]  r4:eed29ac0
    [  179.508590] [<c004911c>] (worker_thread) from [<c004e49c>] (kthread+0xe4/0xfc)
    [  179.515838]  r10:00000000 r9:00000000 r8:00000000 r7:c004911c r6:ee05ae00 r5:ee043ec0
    [  179.523727]  r4:00000000
    [  179.526275] [<c004e3b8>] (kthread) from [<c000fb08>] (ret_from_fork+0x14/0x2c)
    [  179.533522]  r7:00000000 r6:00000000 r5:c004e3b8 r4:ee043ec0
    [  179.539422] ---[ end trace 86bf7c765a093b47 ]---
    [  179.544067] ------------[ cut here ]------------
    [  179.548785] WARNING: CPU: 0 PID: 4 at net/mac80211/driver-ops.c:39 drv_stop+0x9c/0xa0 [mac80211]()
    [  179.557794] Modules linked in: rpmsg_rpc rpmsg_pru rpmsg_proto xfrm_user xfrm4_tunnel bluetooth ipcomp xfrm_ipcomp esp4 ah4 af_key xfrm_algo dwc3 udc_core virtio_rpmsg_bus arc4 iwlmvm mac80211 ahci_platform libahci_platform pru_rproc snd_soc_simple_card libahci pruss_intc libata dwc3_omap pruss omap_sham omap_wdt omap_aes_driver ti_vpe c_can_platform ti_sc c_can videobuf2_dma_contig iwlwifi ti_csc ti_vpdma v4l2_mem2mem videobuf2_memops videobuf2_v4l2 videobuf2_core extcon_palmas extcon snd_soc_tlv320aic3x can_dev edt_ft5x06 omap_des rtc_omap omap_rng rng_core rtc_palmas cfg80211 omap_remoteproc remoteproc virtio virtio_ring sch_fq_codel
    [  179.614944] CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G        W       4.4.32-gadde2ca9f8 #3
    [  179.623328] Hardware name: Generic DRA74X (Flattened Device Tree)
    [  179.629493] Workqueue: events_freezable ieee80211_restart_work [mac80211]
    [  179.636312] Backtrace:
    [  179.638777] [<c00130ac>] (dump_backtrace) from [<c00132a8>] (show_stack+0x18/0x1c)
    [  179.646375]  r7:bf2a5f94 r6:60090013 r5:00000000 r4:c093afcc
    [  179.652089] [<c0013290>] (show_stack) from [<c02b7a84>] (dump_stack+0x90/0xa4)
    [  179.659343] [<c02b79f4>] (dump_stack) from [<c0033794>] (warn_slowpath_common+0x88/0xb8)
    [  179.667464]  r7:bf2a5f94 r6:00000027 r5:00000009 r4:00000000
    [  179.673172] [<c003370c>] (warn_slowpath_common) from [<c0033868>] (warn_slowpath_null+0x24/0x2c)
    [  179.681988]  r8:ee7e298c r7:ee0f14c0 r6:ee7e23e0 r5:ee7e298c r4:ee7e23e0
    [  179.688791] [<c0033844>] (warn_slowpath_null) from [<bf2a5f94>] (drv_stop+0x9c/0xa0 [mac80211])
    [  179.697611] [<bf2a5ef8>] (drv_stop [mac80211]) from [<bf2cdf08>] (ieee80211_stop_device+0x40/0x44 [mac80211])
    [  179.707561]  r5:ee7e298c r4:ee7e23e0
    [  179.711251] [<bf2cdec8>] (ieee80211_stop_device [mac80211]) from [<bf2b60f0>] (ieee80211_do_stop+0x464/0x73c [mac80211])
    [  179.722160]  r5:ee7e298c r4:00000000
    [  179.725846] [<bf2b5c8c>] (ieee80211_do_stop [mac80211]) from [<bf2b63e0>] (ieee80211_stop+0x18/0x20 [mac80211])
    [  179.735971]  r10:eed29ac0 r9:00000000 r8:eed2d400 r7:00000001 r6:ee09de10 r5:ee09de10
    [  179.743866]  r4:ee0f1000
    [  179.746458] [<bf2b63c8>] (ieee80211_stop [mac80211]) from [<c058014c>] (__dev_close_many+0x90/0xd8)
    [  179.755540] [<c05800bc>] (__dev_close_many) from [<c058020c>] (dev_close_many+0x78/0x104)
    [  179.763748]  r5:ee7e203c r4:ee0f14c8
    [  179.767349] [<c0580194>] (dev_close_many) from [<c0584300>] (dev_close+0x44/0x60)
    [  179.774858]  r9:00000000 r8:eed2d400 r7:00000000 r6:ee7e2000 r5:ee7e203c r4:ee0f14c8
    [  179.782709] [<c05842bc>] (dev_close) from [<bf0d2650>] (cfg80211_shutdown_all_interfaces+0x3c/0xa8 [cfg80211])
    [  179.792829] [<bf0d2614>] (cfg80211_shutdown_all_interfaces [cfg80211]) from [<bf2cbfac>] (ieee80211_handle_reconfig_failure+0xa8/0xdc [mac80211])
    [  179.805921]  r7:00000000 r6:ffffff92 r5:ee7e2b14 r4:ee7e23e0
    [  179.811717] [<bf2cbf04>] (ieee80211_handle_reconfig_failure [mac80211]) from [<bf2cdf94>] (ieee80211_reconfig+0x88/0xa50 [mac80211])
    [  179.823674]  r7:00000000 r6:ffffff92 r5:ee7e29f4 r4:ee7e23e0
    [  179.829467] [<bf2cdf0c>] (ieee80211_reconfig [mac80211]) from [<bf2a3730>] (ieee80211_restart_work+0x5c/0x88 [mac80211])
    [  179.840377]  r10:eed29ac0 r9:00000000 r8:eed2d400 r7:00000000 r6:ee7e23e0 r5:ee7e29f4
    [  179.848269]  r4:ee7e29f4
    [  179.850860] [<bf2a36d4>] (ieee80211_restart_work [mac80211]) from [<c00484e8>] (process_one_work+0x1dc/0x3f8)
    [  179.860811]  r7:00000000 r6:eed29ac0 r5:ee05ae00 r4:ee7e2c50
    [  179.866519] [<c004830c>] (process_one_work) from [<c0049168>] (worker_thread+0x4c/0x524)
    [  179.874638]  r10:eed29ac0 r9:ee05ae00 r8:00000008 r7:ee09c000 r6:eed29ad4 r5:ee05ae18
    [  179.882531]  r4:eed29ac0
    [  179.885080] [<c004911c>] (worker_thread) from [<c004e49c>] (kthread+0xe4/0xfc)
    [  179.892326]  r10:00000000 r9:00000000 r8:00000000 r7:c004911c r6:ee05ae00 r5:ee043ec0
    [  179.900216]  r4:00000000
    [  179.902767] [<c004e3b8>] (kthread) from [<c000fb08>] (ret_from_fork+0x14/0x2c)
    [  179.910014]  r7:00000000 r6:00000000 r5:c004e3b8 r4:ee043ec0
    [  179.915918] ---[ end trace 86bf7c765a093b48 ]---

    Compared to the binary release of the SDK, the only modifications are basically the device tree for the Phytec board and the additional iwlwifi-mvm kernel module, which is not enabled by default.

    We have now ordered the AM57xx IDK to verify whether this also happens with TI-supplied boards. I expect it to arrive next week.

    Any other ideas about what I can look at?

  • We now have the AM572x IDK. The outcome is basically the same.

    The kernel used is `Linux am57xx-evm 4.4.32-rt41-ge26c84b0ac #1 SMP PREEMPT RT Tue Jan 31 10:10:54 CET 2017 armv7l GNU/Linux` built from "am57xx-evm-linux-rt-sdk-src-03.02.00.05.tar.xz". The only modification is enabling the iwlmvm kernel module.

    I get slightly more stable behaviour when I disable MSI by booting with 'pci=nomsi', but in the end the same crash happens.

    My gut feeling is that an interrupt is not being recognized by the PCIe Root Complex or the PCI driver.

    Any ideas how we can approach this issue further?
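
One way to probe the missed-interrupt suspicion above is to take two snapshots of /proc/interrupts while traffic is running and check whether the counter for the WiFi card's interrupt line still advances. The sketch below assumes the usual /proc/interrupts layout (a header row of CPU names, then one row per interrupt). The sample snapshots are invented for illustration and are not taken from the boards in this thread, and the helper names are hypothetical.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {label: (total_count, description)}."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        # The first ncpu fields are per-CPU counters; summary rows may have fewer.
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        table[label.strip()] = (sum(counts), description)
    return table


def stalled_irqs(before, after, name):
    """Labels whose description mentions `name` and whose count did not grow."""
    return [
        label
        for label, (count, desc) in after.items()
        if name in desc and label in before and count == before[label][0]
    ]


# Invented sample data (hypothetical IRQ numbers, counts and names).
SNAP_A = """\
           CPU0       CPU1
 24:       1200          0   GIC  41 Level   example-uart
180:        512          0   PCI-MSI  0 Edge   iwlwifi
"""
SNAP_B = """\
           CPU0       CPU1
 24:       1350          0   GIC  41 Level   example-uart
180:        512          0   PCI-MSI  0 Edge   iwlwifi
"""

print(stalled_irqs(parse_interrupts(SNAP_A), parse_interrupts(SNAP_B), "iwlwifi"))
```

On a target, the two snapshots would come from reading /proc/interrupts a few seconds apart. A counter that stops advancing while the driver reports timeouts would be consistent with a lost interrupt, but it does not prove one, since the card may simply have stopped raising interrupts.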

  • For reference, I posted this issue on the kernel bugzilla: bugzilla.kernel.org/show_bug.cgi

    Intel is not willing to provide any assistance on this issue (at least not without a support contract).
    I'm still not really sure, though, whether this is an Intel problem or more of a PCIe/interrupt issue on the Sitara.

    I would be really happy if anyone could report a working PCIe WiFi card on the Sitara, so I can compare what is different.
  • We have now repeated the test with a Marvell-based WiFi chipset (Marvell Technology Group Ltd. 88W8897 [AVASTAR] 802.11ac Wireless).

    The result is essentially the same. The output is attached at the end of this post.

    In both cases we have a device that uses at least 4 DMA queues over PCIe, and suddenly one queue stops transferring.

    In comparison, an Intel Gigabit Ethernet card works at full speed without any issues. As far as I know, this card uses only one DMA queue. So I suspect an issue with switching between DMA queues when using PCIe. Is there any example, proof or test that shows multiple DMA queues working over PCIe on an AM57xx processor without stalling?

    [  418.427955] mwifiex_pcie 0000:01:00.0: mwifiex_cmd_timeout_func: Timeout cmd id = 0x107, act = 0x0
    [  418.437429] mwifiex_pcie 0000:01:00.0: num_data_h2c_failure = 0
    [  418.443658] mwifiex_pcie 0000:01:00.0: num_cmd_h2c_failure = 0
    [  418.449789] mwifiex_pcie 0000:01:00.0: is_cmd_timedout = 1
    [  418.455546] mwifiex_pcie 0000:01:00.0: num_tx_timeout = 0
    [  418.461217] mwifiex_pcie 0000:01:00.0: last_cmd_index = 2
    [  418.466874] mwifiex_pcie 0000:01:00.0: last_cmd_id: 1e 00 ce 00 07 01 5e 00 0f 01
    [  418.474727] mwifiex_pcie 0000:01:00.0: last_cmd_act: 00 00 90 e4 00 00 01 00 01 00
    [  418.482678] mwifiex_pcie 0000:01:00.0: last_cmd_resp_index = 1
    [  418.488796] mwifiex_pcie 0000:01:00.0: last_cmd_resp_id: 1e 80 ce 80 5e 80 5e 80 0f 81
    [  418.497107] mwifiex_pcie 0000:01:00.0: last_event_index = 2
    [  418.502965] mwifiex_pcie 0000:01:00.0: last_event: 0a 00 0b 00 0a 00 0a 00 0b 00
    [  418.510730] mwifiex_pcie 0000:01:00.0: data_sent=1 cmd_sent=1
    [  418.516757] mwifiex_pcie 0000:01:00.0: ps_mode=1 ps_state=0
    [  418.530646] mwifiex_pcie 0000:01:00.0: ===mwifiex driverinfo dump start===
    [  418.538578] mwifiex_pcie 0000:01:00.0: info: MWIFIEX VERSION: mwifiex 1.0 (15.68.7.p77)
    [  418.547092] mwifiex_pcie 0000:01:00.0: PCIE register dump start
    [  418.553326] mwifiex_pcie 0000:01:00.0: pcie scratch register:
    [  418.559379] mwifiex_pcie 0000:01:00.0: reg:0xcf0, value=0xfedcba00
    [  418.559379] reg:0xcf8, value=0x4e002c
    [  418.559379] reg:0xcfc, value=0x2737300
    [  418.559379]
    [  418.575169] mwifiex_pcie 0000:01:00.0: PCIE register dump end
    [  418.581275] mwifiex_pcie 0000:01:00.0: ===mwifiex driverinfo dump end===
    [  418.588315] mwifiex_pcie 0000:01:00.0: == mwifiex firmware dump start ==
    [  418.599200] mwifiex_pcie 0000:01:00.0: Firmware dump Finished!
    [  418.605326] mwifiex_pcie 0000:01:00.0: == mwifiex firmware dump end ==
    [  418.613351] mwifiex_pcie 0000:01:00.0: == mwifiex dump information to /sys/class/devcoredump start
    [  418.622909] mwifiex_pcie 0000:01:00.0: == mwifiex dump information to /sys/class/devcoredump end
    
    
    # cat  /sys/class/devcoredump/devcd1/data
    
    ========Start dump driverinfo========
    driver_name = "mwifiex"
    driver_version = mwifiex 1.0 (15.68.7.p77)
    tx_pending = 27
    rx_pending = 0
    
    [interface  : "wlp1s0"]
    wmm_tx_pending[0] = 0
    wmm_tx_pending[1] = 27
    wmm_tx_pending[2] = 0
    wmm_tx_pending[3] = 0
    media_state="Connected"
    carrier on
    tx queue 0:started  tx queue 1:started  tx queue 2:started  tx queue 3:started
    wlp1s0: num_tx_timeout = 0
    
    === PCIE register dump===
    reg:0xcf0, value=0xfedcba00
    reg:0xcf8, value=0x4e002c
    reg:0xcfc, value=0x2737300
    
    
    === more debug information
    debug_mask=0x7
    int_counter=0x0
    wmm_ac_vo=0x0
    wmm_ac_vi=0x0
    wmm_ac_be=0x0
    wmm_ac_bk=0x43
    tx_buf_size=0xe00
    curr_tx_buf_size=0xe00
    ps_mode=0x1
    ps_state=0x0
    is_deep_sleep=0x1
    wakeup_dev_req=0x0
    wakeup_tries=0x0
    hs_configured=0x0
    hs_activated=0x0
    num_tx_timeout=0x0
    is_cmd_timedout=0x1
    timeout_cmd_id=0x107
    timeout_cmd_act=0x0
    last_cmd_id=0x1e 0xce 0x107 0x5e 0x10f
    last_cmd_act=0x0 0xe490 0x0 0x1 0x1
    last_cmd_index=0x2
    last_cmd_resp_id=0x801e 0x80ce 0x805e 0x805e 0x810f
    last_cmd_resp_index=0x1
    last_event=0xa 0xb 0xa 0xa 0xb
    last_event_index=0x2
    last_mp_wr_bitmap=0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
    last_mp_wr_ports=0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
    last_mp_wr_len=0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
    last_mp_curr_wr_port=0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
    last_sdio_mp_index=0x0
    num_cmd_h2c_fail=0x0
    num_cmd_sleep_cfm_fail=0x0
    num_tx_h2c_fail=0x0
    num_evt_deauth=0x0
    num_evt_disassoc=0x0
    num_evt_link_lost=0x0
    num_cmd_deauth=0x0
    num_cmd_assoc_ok=0x1
    num_cmd_assoc_fail=0x0
    cmd_sent=0x1
    data_sent=0x1
    cmd_resp_received=0x0
    event_received=0x0
    cmd_pending=0x0
    tx_pending=0x1b
    rx_pending=0x0
    Tx BA stream table:
    tid = 0, ra = e4:6f:13:2e:a5:50
    Rx reorder table:
    tid = 0, ta = e4:6f:13:2e:a5:50, start_win = 28, win_size = 64, buffer: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    
    ========End dump========
    

  • To round things off, we have now also tried a Qualcomm WNFQ-258ACN based module from Sparklan. So I can now verify that all major WiFi module Linux drivers (Intel / Marvell / Atheros) starve at some point with the AM5728 PCIe interface. At least this driver has the best reproducibility of the issue, as it basically stops at initialization:

    [    6.939807] ath10k_pci 0000:01:00.0: enabling device (0140 -> 0142)
    [    6.946421] ath10k_pci 0000:01:00.0: enabling bus mastering
    [    6.946774] ath10k_pci 0000:01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
    [    7.363572] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/pre-cal-pci-0000:01:00.0.bin failed with error -2
    [    7.374919] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/cal-pci-0000:01:00.0.bin failed with error -2
    [    7.496751] ath10k_pci 0000:01:00.0: qca6174 hw3.2 target 0x05030000 chip_id 0x00340aff sub 1dac:0258
    [    7.506468] ath10k_pci 0000:01:00.0: kconfig debug 1 debugfs 1 tracing 1 dfs 0 testmode 0
    [    7.516641] ath10k_pci 0000:01:00.0: firmware ver WLAN.RM.4.4-00022-QCARMSWPZ-2 api 5 features wowlan,ignore-otp crc32 4d458559
    [    7.621560] ath10k_pci 0000:01:00.0: failed to fetch board data for bus=pci,bmi-chip-id=0,bmi-board-id=0 from ath10k/QCA6174/hw3.0/board-2.bin
    [    7.648083] ath10k_pci 0000:01:00.0: board_file api 1 bmi_id 0:0 crc32 ed5f849a
    [    9.884723] ath10k_pci 0000:01:00.0: Target ready! transmit resources: 2 size:1792
    [    9.884737] ath10k_pci 0000:01:00.0: ath10k_htc_build_tx_ctrl_skb: skb ed423c00
    [    9.884806] ath10k_pci 0000:01:00.0: ath10k_htc_notify_tx_completion: ep 0 skb ed423c00
    [    9.884978] ath10k_pci 0000:01:00.0: HTC Service HTT Data connect response: status: 0x0, assigned ep: 0x1
    [    9.884988] ath10k_pci 0000:01:00.0: ath10k_htc_build_tx_ctrl_skb: skb ed6a13c0
    [    9.885053] ath10k_pci 0000:01:00.0: ath10k_htc_notify_tx_completion: ep 0 skb ed6a13c0
    [    9.885219] ath10k_pci 0000:01:00.0: HTC Service WMI connect response: status: 0x0, assigned ep: 0x2
    [    9.885231] ath10k_pci 0000:01:00.0: ath10k_htc_build_tx_ctrl_skb: skb ec88f000
    [    9.885237] ath10k_pci 0000:01:00.0: HTC is using TX credit flow control
    [    9.885287] ath10k_pci 0000:01:00.0: ath10k_htc_notify_tx_completion: ep 0 skb ec88f000
    [    9.885506] ath10k_pci 0000:01:00.0: htc rx completion ep 2 skb ec9ed540
    [    9.885540] ath10k_pci 0000:01:00.0: wmi tlv abi 0x01000000 ?= 0x01000000, 0x5f414351 ?= 0x5f414351, 0x00004c4d ?= 0x00004c4d, 0x00000000 ?= 0x00000000, 0x00000000 ?= 0x00000000
    [    9.885555] ath10k_pci 0000:01:00.0: wmi svc: 00000000: 0d 00 00 00 07 00 00 00 0f 00 00 00 03 00 00 00  ................
    [    9.885563] ath10k_pci 0000:01:00.0: wmi svc: 00000010: 0f 00 00 00 0f 00 00 00 0b 00 00 00 0f 00 00 00  ................
    [    9.885572] ath10k_pci 0000:01:00.0: wmi svc: 00000020: 0b 00 00 00 0b 00 00 00 00 00 00 00 0a 00 00 00  ................
    [    9.885580] ath10k_pci 0000:01:00.0: wmi svc: 00000030: 00 00 00 00 04 00 00 00 07 00 00 00 0e 00 00 00  ................
    [    9.885587] ath10k_pci 0000:01:00.0: wmi svc: 00000040: 0a 00 00 00 00 00 00 00 02 00 00 00 00 00 00 00  ................
    [    9.885594] ath10k_pci 0000:01:00.0: wmi svc: 00000050: 01 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00  ................
    [    9.885602] ath10k_pci 0000:01:00.0: wmi svc: 00000060: 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00  ................
    [    9.885609] ath10k_pci 0000:01:00.0: wmi svc: 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    [    9.885622] ath10k_pci 0000:01:00.0: wmi event service ready min_tx_power 0x0000003f max_tx_power 0x0000003f ht_cap 0x0000085b vht_cap 0x339011b2 sw_ver0 0x01000000 sw_ver1 0x00000138 fw_build 0x44080016 phy_capab 0x00000003 num_rf_chains 0x00000002 eeprom_rd 0x0000006c num_mem_reqs 0x00000000
    [    9.885678] ath10k_pci 0000:01:00.0: wmi tlv init
    [    9.885689] ath10k_pci 0000:01:00.0: htc ep 2 consumed 1 credits (total 1)
    [    9.885742] ath10k_pci 0000:01:00.0: ath10k_htc_notify_tx_completion: ep 2 skb ec86d900
    [    9.900869] ath10k_pci 0000:01:00.0: htc rx completion ep 2 skb ec9ed600
    [    9.900900] ath10k_pci 0000:01:00.0: wmi event ready sw_version 16777216 abi_version 312 mac_addr 00:0e:8e:75:fa:cc status 0
    [    9.901060] ath10k_pci 0000:01:00.0: wmi tlv vdev create
    [    9.901070] ath10k_pci 0000:01:00.0: htc ep 2 consumed 1 credits (total 0)
    [    9.901085] ath10k_pci 0000:01:00.0: wmi tlv vdev delete
    [    9.901134] ath10k_pci 0000:01:00.0: htc rx completion ep 2 skb ec9ed6c0
    [    9.901143] ath10k_pci 0000:01:00.0: wmi tlv diag event len 364
    [    9.901172] ath10k_pci 0000:01:00.0: ath10k_htc_notify_tx_completion: ep 2 skb ec865300
    [    9.903752] ath10k_pci 0000:01:00.0: htc ep 2 got 1 credits (total 1)
    [    9.903782] ath10k_pci 0000:01:00.0: htc rx completion ep 2 skb ec9ed780
    [    9.903792] ath10k_pci 0000:01:00.0: Unknown eventid: 90118
    [    9.909719] ath10k_pci 0000:01:00.0: htc ep 2 consumed 1 credits (total 0)
    [    9.909756] ath10k_pci 0000:01:00.0: wmi tlv echo value 0x0ba991e9
    [    9.909795] ath10k_pci 0000:01:00.0: htc ep 2 got 1 credits (total 1)
    [    9.909829] ath10k_pci 0000:01:00.0: ath10k_htc_notify_tx_completion: ep 2 skb ec8653c0
    [    9.909862] ath10k_pci 0000:01:00.0: htc ep 2 consumed 1 credits (total 0)
    [   12.987702] ath10k_pci 0000:01:00.0: failed to ping firmware: -110
    [   12.994190] ath10k_pci 0000:01:00.0: failed to reset rx filter: -110
    [   13.109003] ath10k_pci 0000:01:00.0: ath10k_htc_notify_tx_completion: ep 2 skb ec866780
    [   13.110539] ath10k_pci 0000:01:00.0: could not init core (-110)
    [   13.116857] ath10k_pci 0000:01:00.0: could not probe fw (-110)

    Regards,

    Michael

  • Michael,

    Thank you for your hard work providing all of these details. I'm sorry it has taken us a while to get back to you. I am in the process of procuring some of these cards so that we can replicate these results and debug the issue. We will get back to you next week with our findings and will try to drive this issue to closure ASAP.
  • Hi Ron,

    great to hear you're looking into the issue. Please have a look at this thread: e2e.ti.com/.../2129397

    I patched the interrupt handler and now have had a stable connection running over the weekend with constant 300MBit/s traffic (With Intel AC8265). But I would highly appreciate if TI can supply an official patch which fixes the race condition on interrupts with high frequencies.

    Sidenote: The Qualcomm card still doesn't initialize, so there might be an Atheros driver issue. I still have to check the Marvell card though.

    Regards,
    Michael
  • Hi, Mike,

    I want to update you on our status. TI is able to reproduce the issue using an 8260 WiFi card over an M.2 adapter on the AM572x GP EVM. Your workaround has been submitted for internal review. Once I hear back from the review, I'll post it here. We really appreciate your effort in digging into the issue.

    Rex
  • Hi Rex, 

    great to hear you could reproduce this. I saw that I didn't provide the patch we're currently using. This is how I patched drivers/pci/dwc/pci-dra7xx.c (from the ti-linux-4.9.y tree)

    static irqreturn_t dra7xx_pcie_msi_irq_handler(int irq, void *arg)
    {
        struct dra7xx_pcie *dra7xx = arg;
        struct dw_pcie *pci = dra7xx->pci;
        struct pcie_port *pp = &pci->pp;
        u32 reg;
    
        reg = dra7xx_pcie_readl(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI);
    
        switch (reg) {
        case MSI:
                dw_handle_msi_irq(pp);
                break;
        case INTA:
        case INTB:
        case INTC:
        case INTD:
                generic_handle_irq(irq_find_mapping(dra7xx->irq_domain,
                                                    ffs(reg)));
                break;
        }
    
        dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI, reg);
        dw_pcie_write(pci->dbi_base + PCIE_MSI_INTR0_STATUS, 4, 0xffffffff);
    
        return IRQ_HANDLED;
    }

    This does risk missing interrupts, but at least it avoids the deadlock situations that were happening before. I'm still not sure this is the best way to handle it. (As you always have to clear two interrupt registers, race conditions are unavoidable.)

    Regards,

    Michael

  • Hi, Michael,

    I patched the code with a while loop, based on your post in the other thread, and was able to get the wireless ping to run longer. I submitted both patches to the Jira system and will discuss them in the internal meeting to see if there is a better way to handle the situation. I'll let you know as soon as I find out.

    Rex

  • Hi Rex,

    Just some more information:

    I used iperf2 to get a constant datarate of ~300MBit/s. With the patch in the other thread I ran into the problem after ~8 hours of constant traffic. 

    With the patch above I had stable connections for multiple days.  

    In the end the reason is obvious: with the first patch there is still a window of a few microseconds in which a bit in PCIE_MSI_INTR0_STATUS can be set before PCIECTRL_TI_CONF_IRQSTATUS_MSI is cleared. And PCIECTRL_TI_CONF_IRQSTATUS_MSI won't be set again as long as a bit in PCIE_MSI_INTR0_STATUS is still set.

    The second patch makes sure that PCIE_MSI_INTR0_STATUS is cleared after PCIECTRL_TI_CONF_IRQSTATUS_MSI is cleared, so PCIECTRL_TI_CONF_IRQSTATUS_MSI will be set again when the next irq bit in PCIE_MSI_INTR0_STATUS is set.

    Still, having to clear two registers invites race conditions.
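
    The ordering argument can be illustrated with a toy C model of the two registers. This is only a sketch, not driver code: `msi_model`, `raise_msi()` and `scenario()` are hypothetical names, and the model assumes (as described in this post, not confirmed by the TRM) that the wrapper status bit is latched only when PCIE_MSI_INTR0_STATUS changes from zero to non-zero.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Toy model only: two status registers behaving as described in this post. */
    struct msi_model {
        uint32_t intr0_status;   /* stands in for PCIE_MSI_INTR0_STATUS */
        bool     wrapper_status; /* stands in for the MSI bit of
                                    PCIECTRL_TI_CONF_IRQSTATUS_MSI */
    };

    struct outcome {
        bool late_irq_dropped;   /* a vector was cleared without being serviced */
        bool next_irq_delivered; /* a later MSI still reaches the CPU */
    };

    /* Assumption from the post: the wrapper bit is latched only when
     * intr0_status changes from zero to non-zero. */
    static void raise_msi(struct msi_model *m, unsigned int vec)
    {
        bool was_idle = (m->intr0_status == 0);

        m->intr0_status |= UINT32_C(1) << vec;
        if (was_idle)
            m->wrapper_status = true;
    }

    /* Replays one handler invocation with an MSI arriving in the race window. */
    static struct outcome scenario(bool clear_intr0_last)
    {
        struct msi_model m = { 0, false };
        struct outcome out = { false, false };

        raise_msi(&m, 0);                /* first MSI, handler starts */
        m.intr0_status &= ~UINT32_C(1);  /* vector 0 serviced and acknowledged */
        raise_msi(&m, 1);                /* late MSI lands in the race window */
        m.wrapper_status = false;        /* handler clears the wrapper status */

        if (clear_intr0_last) {
            /* second patch: wipe PCIE_MSI_INTR0_STATUS after the wrapper */
            out.late_irq_dropped = (m.intr0_status != 0);
            m.intr0_status = 0;
        }

        raise_msi(&m, 2);                /* any later MSI */
        out.next_irq_delivered = m.wrapper_status;
        return out;
    }
    ```

    In this model, `scenario(false)` leaves vector 1 pending with the wrapper bit clear, so no later MSI is delivered (the stall), whereas `scenario(true)` drops vector 1 unserviced but lets the next MSI through. Both can be checked with `assert()`, for example `assert(scenario(true).next_irq_delivered)`.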

    Regards,

    Michael

  • Hi, Michael,

    Just want to update you with comments of TI's code review:

    "IIUC there is a small instance of time while the code returns from dw_handle_msi_irq but is yet to return from dra7xx_pcie_msi_irq_handler when the EP device raises the MSI interrupt. Ideally we would have liked the MSI controller to raise an interrupt. I have to think if any better fix is there but just clearing interrupts doesn't look correct"

    I'll update you once we have a better fix.

    Rex
  • Hi Rex,

    did you make any progress on this issue?

    Regards,
    Michael
  • Hi, Michael,

    The developer just got back to work last week from a paternity leave and ordered the parts needed for his site. We are working on it.

    Rex

  • Hi Rex,

    any update on this issue?

    Regards,
    Michael
  • Hi, Michael,
    The developer acquired the equipment at the end of April. I just pinged them for the status and to see which release the fix is targeted for. I'll post back when I get the target date.
    Rex
  • Hi, Michael,

    I just got a response back from the developer. The fix should be available in the next release in 2 weeks. I'll monitor the release and get back to you.

    Rex
  • Hi, Michael,

    The development team has set the target date to Sprint 14 (August 2nd). If you don't mind, I'll close this thread for now, but I will come back to update it with the latest status. Thanks!

    Rex
  • Hi Rex,

    thanks for the update. Where will I be able to see the results of "Sprint 14"? Will this appear in the Kernel Git (git.ti.com/.../ti-linux-kernel) at some point? Or is this part of some future SDK release?

    Regards,
    Michael
  • Michael,

    The sprint release is our internal bi-weekly goal. The fix should be available in a future release. I received the patch last week and meant to verify it in my environment, but haven't had time to do so yet. The developer verified it in his setup. I'll let you know once I have verified it, and can send you the patch then.

    Rex
  • Hi, Michael,

    Could you try the following change to see if the issue still happens?

    diff --git a/drivers/pci/dwc/pcie-designware-host.c b/drivers/pci/dwc/pcie-designware-host.c
    index aaa207a..1312c5b 100644
    --- a/drivers/pci/dwc/pcie-designware-host.c
    +++ b/drivers/pci/dwc/pcie-designware-host.c
    @@ -69,9 +69,9 @@ irqreturn_t dw_handle_msi_irq(struct pcie_port *pp)
    
                            while ((pos = find_next_bit(&val, 32, pos)) != 32) {
                                    irq = irq_find_mapping(pp->irq_domain,
                                                           i * 32 + pos);
    +                               generic_handle_irq(irq);
                                    dw_pcie_wr_own_conf(pp, PCIE_MSI_INTR0_STATUS +
                                                        i * 12, 4, 1 << pos);
    -                               generic_handle_irq(irq);
                                    pos++;
                            }

    The status register should be written back after the interrupt is handled, not before.

    In older releases the file is located in the controller folder rather than the dwc folder used in newer releases. Please post back and let us know how it goes.

    Thanks!

    Rex 

  • Hi Rex,

    thanks for the update. Actually, that is a fix I tried myself before. It mitigates the problem so that it happens less often, but it still happens. I just retried it and was able to produce a driver crash within 20 minutes (using iperf3 traffic at roughly 100MBit/s).

    Regards,
    Michael
  • Michael,

    Thanks for the feedback. It's good to know. The developer was able to reproduce the issue, but does not see it after the fix. We'll try using iperf to investigate further.

    Rex
  • Michael,

    The developer had some issues using the Intel 8260: the PCI link isn't stable, so he is using a Centrino N-1000 instead. He was able to reproduce the issue, but it did not happen for 1 hour after applying the patch. I went through a few product feature pages and don't see any specification of 4 DMAs. How can you tell whether it's a 4 DMA card or not? Since he was able to reproduce it with the N-1000, we wonder if it just takes longer to happen, so we are running the test for a longer period. I think the bottom line is that we should get the 8260 up at his site. I'll check if I should ship my card to him.

    Rex
  • I ran into a similar issue with a QCA9884 based WiFi card using the ath10k driver. It was missing irqs and locking up the firmware.

    If I set the pci interface to use legacy interrupts (by setting the irq_mode=1 argument when loading the ath10k_pci module) it seems to work fine.
  • Charles,

    Thanks for the input. That will be the workaround for now, until TI fixes the MSI issue.

    Michael,

    We found the procedure to handle MSI interrupts in the TRM is incorrect and is leading to these missed irqs. We are in the process of identifying the correct method with the hardware apps team and fixing it in Kernel.

    Rex

  • Michael,

    Please try the attached patch to see if it helps in your setup. Our test ran for 7 days without failure. This is not the final version until the TRM is modified. Let me know how it goes. Thanks!

    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/354/2538.0001_2D00_Long_2D00_Term_2D00_Test.patch

    Rex