g_ether causing a kernel panic in SDK 7.0 Linux version 3.12 on am335x board

I am trying to connect two am335x boards with a USB cable. One board uses a host USB port, the other a device USB port. The link is unreliable and I often get a kernel panic. Here is the output from one panic:

[   11.607687] Kernel BUG at c059ffe8 [verbose debug info unavailable]
[   11.614259] Internal error: Oops - BUG: 0 [#1] ARM
[   11.619283] Modules linked in: usb_f_ecm g_ether usb_f_rndis u_rndis libcomposite u_ether musb_dsps musb_hdrc musb_am335x
[   11.630827] CPU: 0 PID: 1426 Comm: ConfigSrvc Not tainted 3.12.10-Edwards.00.00.04 #2
[   11.639043] task: cb219b80 ti: cb08e000 task.ti: cb08e000
[   11.644719] PC is at skb_panic+0x5c/0x68
[   11.648838] LR is at irq_work_queue+0x5c/0xbc
[   11.653406] pc : [<c059ffe8>]    lr : [<c008efd4>]    psr: 200b0193
[   11.653406] sp : cb08fe18  ip : 00000000  fp : cb08fe4c
[   11.665447] r10: 00000100  r9 : 00000006  r8 : cb08e000
[   11.670925] r7 : c0734b60  r6 : cb20f700  r5 : 000007e4  r4 : cb20f742
[   11.677769] r3 : 00000001  r2 : c0827584  r1 : c082683c  r0 : 0000007b
[   11.684617] Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
[   11.692190] Control: 10c5387d  Table: 8b3e4019  DAC: 00000015
[   11.698215] Process ConfigSrvc (pid: 1426, stack limit = 0xcb08e240)
[   11.704879] Stack: (0xcb08fe18 to 0xcb090000)
[   11.709447] fe00:                                                       000007e4 cb20f700
[   11.718028] fe20: cb20f742 cb20ff26 cb20fd40 cb0ea800 000007e4 cb20ff26 cb20fd40 00000000
[   11.726608] fe40: cb08fe6c cb08fe50 c049b8d4 c059ff98 cb0eac40 cb32fa40 cd130c80 00000000
[   11.735188] fe60: cb08fe8c cb08fe70 c03d8680 c049b88c cb32fa40 600b0113 00000000 00000000
[   11.743769] fe80: cb08feac cb08fe90 c03e2c30 c03d8664 cb32fa54 cb08feb0 cd5a76e8 cd5a76ec
[   11.752349] fea0: cb08fed4 cb08feb0 c03e2d34 c03e2bd4 cb08feb0 cb08feb0 b6e2f1c6 00000000
[   11.760929] fec0: c08243bc c08605c0 cb08fef4 cb08fed8 c0049fa0 c03e2ccc 00000001 00000018
[   11.769509] fee0: c086061c c0860600 cb08ff3c cb08fef8 c004a35c c0049f44 cb08ff24 cb08ff08
[   11.778090] ff00: 00000018 00400140 ffff8f57 0000000a 00000021 600b0193 00000021 00000000
[   11.786670] ff20: 00000021 b7ca1788 00000000 00007344 cb08ff54 cb08ff40 c004a4e0 c004a2a0
[   11.795249] ff40: 00000000 cb08e000 cb08ff6c cb08ff58 c004a758 c004a49c 00000110 c082e734
[   11.803830] ff60: cb08ff8c cb08ff70 c00154e4 c004a6c8 00000080 fa200000 cb08ffb0 c085f510
[   11.812409] ff80: cb08ffac cb08ff90 c000879c c00154b4 b6e2f1c6 800b0030 ffffffff 73440000
[   11.820989] ffa0: 00000000 cb08ffb0 c05a3420 c0008740 00014768 0000d424 00003772 b6e2f0d0
[   11.829569] ffc0: d4240000 0000d424 00001a0f 73440000 b7ca1788 00000000 00007344 be9efee8
[   11.838150] ffe0: be9ef240 be9ef1e8 b6e2f943 b6e2f1c6 800b0030 ffffffff 5f636100 666a616d
[   11.846724] Backtrace:
[   11.849302] [<c059ff8c>] (skb_panic+0x0/0x68) from [<c049b8d4>] (skb_put+0x54/0x58)
[   11.857331]  r7:00000000 r6:cb20fd40 r5:cb20ff26 r4:000007e4
[   11.863299] [<c049b880>] (skb_put+0x0/0x58) from [<c03d8680>] (rx_complete+0x28/0x230)
[   11.871600]  r7:00000000 r6:cd130c80 r5:cb32fa40 r4:cb0eac40
[   11.877562] [<c03d8658>] (rx_complete+0x0/0x230) from [<c03e2c30>] (__usb_hcd_giveback_urb+0x68/0xf8)
[   11.887231]  r7:00000000 r6:00000000 r5:600b0113 r4:cb32fa40
[   11.893185] [<c03e2bc8>] (__usb_hcd_giveback_urb+0x0/0xf8) from [<c03e2d34>] (usb_giveback_urb_bh+0x74/0xa4)
[   11.903491]  r6:cd5a76ec r5:cd5a76e8 r4:cb08feb0 r3:cb32fa54
[   11.909456] [<c03e2cc0>] (usb_giveback_urb_bh+0x0/0xa4) from [<c0049fa0>] (tasklet_action+0x68/0xbc)
[   11.919035]  r6:c08605c0 r5:c08243bc r4:00000000
[   11.923891] [<c0049f38>] (tasklet_action+0x0/0xbc) from [<c004a35c>] (__do_softirq+0xc8/0x1ac)
[   11.932922]  r7:c0860600 r6:c086061c r5:00000018 r4:00000001
[   11.938877] [<c004a294>] (__do_softirq+0x0/0x1ac) from [<c004a4e0>] (do_softirq+0x50/0x5c)
[   11.947550] [<c004a490>] (do_softirq+0x0/0x5c) from [<c004a758>] (irq_exit+0x9c/0xf0)
[   11.955760]  r4:cb08e000 r3:00000000
[   11.959532] [<c004a6bc>] (irq_exit+0x0/0xf0) from [<c00154e4>] (handle_IRQ+0x3c/0x8c)
[   11.967742]  r4:c082e734 r3:00000110
[   11.971502] [<c00154a8>] (handle_IRQ+0x0/0x8c) from [<c000879c>] (omap3_intc_handle_irq+0x68/0x7c)
[   11.980897]  r6:c085f510 r5:cb08ffb0 r4:fa200000 r3:00000080
[   11.986858] [<c0008734>] (omap3_intc_handle_irq+0x0/0x7c) from [<c05a3420>] (__irq_usr+0x40/0x60)
[   11.996163] Exception stack(0xcb08ffb0 to 0xcb08fff8)
[   12.001461] ffa0:                                     00014768 0000d424 00003772 b6e2f0d0
[   12.010041] ffc0: d4240000 0000d424 00001a0f 73440000 b7ca1788 00000000 00007344 be9efee8
[   12.018620] ffe0: be9ef240 be9ef1e8 b6e2f943 b6e2f1c6 800b0030 ffffffff
[   12.025554]  r7:73440000 r6:ffffffff r5:800b0030 r4:b6e2f1c6
[   12.031507] Code: e58d4008 e58de00c e59f0008 ebfff58c (e7f001f2)
[   12.037905] ---[ end trace 5d52f4a9fabf60da ]---
[   12.042745] Kernel panic - not syncing: Fatal exception in interrupt

Is this a known issue and if so is there a fix for it?

Thank you.
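
For readers following the backtrace: the BUG is raised from skb_put(), which in the kernel refuses to extend a buffer's data area past the end of its allocation. The sketch below is a simplified user-space model of that check, not the kernel source. The names `toy_skb` and `toy_skb_put` and the 1514-byte buffer size are hypothetical, chosen only for illustration. The 0x7e4 (2020) in the example mirrors a value seen in the saved registers of the skb_put frame; it may be the requested length, although the log alone does not confirm that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct sk_buff: only the
 * fields needed to show the overflow check. Not the kernel's layout. */
struct toy_skb {
	size_t tail;	/* offset just past the data written so far */
	size_t end;	/* offset just past the allocated buffer */
	size_t len;	/* number of data bytes in the buffer */
};

/* Models the check skb_put() makes: growing the data area by `len`
 * must not move `tail` past `end`. The kernel panics on overflow;
 * this sketch returns false and leaves the buffer unchanged. */
bool toy_skb_put(struct toy_skb *skb, size_t len)
{
	if (len > skb->end - skb->tail)
		return false;	/* would overrun the buffer */
	skb->tail += len;
	skb->len += len;
	return true;
}

/* Usage: an assumed 1514-byte receive buffer cannot take 2020 bytes. */
void toy_skb_example(void)
{
	struct toy_skb skb = { .tail = 0, .end = 1514, .len = 0 };

	assert(!toy_skb_put(&skb, 0x7e4));	/* 2020 > 1514: rejected */
	assert(toy_skb_put(&skb, 1514));	/* exactly fills the buffer */
}
```

If a completed transfer reports more bytes than the receive buffer can hold, a check of this kind fails. That would be consistent with a DMA completion problem, but it is an inference from the trace rather than something the log proves.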

  • Hi Tim,

    Check whether the following patches related to AMSDK 07.00.00.00 are present in your source code:
    processors.wiki.ti.com/.../Sitara_Linux_MUSB_Issues

    BR
    Tsvetolin Shulev
  • Thank you for your reply.  I will look at the patches and see if any may fix our issue.  I will update my results once the testing is complete.

    Tim

  • Update - In my research of this issue I saw several comments about turning off the CPPI DMA support. I did this, recompiled the kernel and the modules and the problem went away. We would really like to make use of DMA so I put the support back and then applied this patch that was in the list you mentioned:

    marc.info/

    This is the only patch I saw that referred to CPPI. After recompiling the kernel and modules and testing, the problem had returned. Any suggestions on other patches that may resolve this issue but leave the DMA intact?
  • Patches #7.4, #7.6 and #8.1 in the Wiki Tsvetolin referred to are also related to CPPI.

    Anyway, can you please try the 4.1 kernel provided in the Processor SDK2.0 to see if the issue still exists?

  • Thanks Bin for the very fast reply. I will not be able to try version 4.1 of the kernel since our whole development project is based on 3.12. I will try patches 7.4 and 7.6 first and see what results I get from that.
    Thank you.
  • I highly recommend that you apply #8.1 too.
  • Patch 8.1 does not work with the version of the kernel we are using. We are using SDK 7.0, not 8. Does patch 7.2 do the same as 8.1?
    patching file drivers/usb/musb/musb_cppi41.c
    Hunk #1 FAILED at 39.
    Hunk #2 succeeded at 131 (offset 9 lines).
    Hunk #3 succeeded at 142 (offset 9 lines).
    Hunk #4 succeeded at 420 (offset 46 lines).
    1 out of 4 hunks FAILED -- saving rejects to file drivers/usb/musb/musb_cppi41.c.rej
  • Tim,

    If you integrate patches 7.2, 7.4, and 7.6 do you still see issues when using CPPI?

    Bin Liu said:
    Patches #7.4, #7.6 and #8.1 in the Wiki Tsvetolin referred to are also related to CPPI.

    Bin, did you mean to say "7.2" instead of "8.1"?  Both 7.2 and 8.1 have the same header:

    • MUSB CPPI DMA driver does not handle RX Zero-Length Packet (ZLP)

    It looks like Tim already applied this patch (i.e. the patch from 7.2) based on his earlier post.

    Brad

  • Tim Potter said:
    patching file drivers/usb/musb/musb_cppi41.c
    Hunk #1 FAILED at 39.
    Hunk #2 succeeded at 131 (offset 9 lines).
    Hunk #3 succeeded at 142 (offset 9 lines).
    Hunk #4 succeeded at 420 (offset 46 lines).

    Does this happen when applying #8.1? Have you applied patch #7.4 first? 8.1 should be applied on top of 7.4. I recommend that you apply all 7.x and 8.x patches in order.

    Brad, no, I do mean 8.1, not 7.2 since it appears Tim already applied 7.2.

    7.2 is for RX ZLP, but 8.1 is for TX ZLP; they are different.
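
As background on the ZLP distinction: a USB bulk transfer is normally terminated by a short packet, so a transfer whose length is an exact multiple of the endpoint's maximum packet size can only be terminated by a zero-length packet. The helper below is a minimal sketch of that arithmetic. The name `transfer_needs_zlp` is hypothetical and this is not code from the MUSB or CPPI driver. Whether a driver actually sends or expects a ZLP also depends on the protocol and on driver flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when a transfer of `len` bytes ends exactly on a packet
 * boundary, so no short packet marks its end and a zero-length packet
 * would be needed to terminate it. A zero max_packet is treated as
 * invalid input and yields false. */
bool transfer_needs_zlp(size_t len, size_t max_packet)
{
	if (max_packet == 0)
		return false;
	return len > 0 && len % max_packet == 0;
}

/* Usage, assuming a 512-byte maximum packet size for illustration. */
void transfer_needs_zlp_example(void)
{
	assert(transfer_needs_zlp(1024, 512));	/* two full packets */
	assert(!transfer_needs_zlp(1000, 512));	/* ends with a short packet */
}
```

The same condition applies in both directions, which is why the RX case (7.2) and the TX case (8.1) are handled by separate patches.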

  • I have applied 7.2, 7.4 and 7.6. I then tried to apply 8.1, and the hunk at line 39 fails because the context it is looking for is different in my file.


    Patch is looking for :

    --- a/drivers/usb/musb/musb_cppi41.c
    +++ b/drivers/usb/musb/musb_cppi41.c
    @@ -39,6 +39,7 @@ struct cppi41_dma_channel {
            u32 transferred;
            u32 packet_sz;
            struct list_head tx_check;
    +       int tx_zlp;
     };

     #define MUSB_DMA_NUM_CHANNELS 15

    What is really there:

            u32 transferred;
            u32 packet_sz;
            struct list_head tx_check;
            struct work_struct dma_completion;
    };

    #define MUSB_DMA_NUM_CHANNELS 15
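
The mismatch above can be modelled in a few lines. A hunk applies only where its context lines appear consecutively in the target file. The sketch below is a deliberately simplified model of that matching, using exact line comparison and ignoring the fuzz handling that the real patch tool has. `find_context` is a hypothetical helper and the line arrays are trimmed copies of the snippets quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the index in file[] at which all nctx context lines match
 * consecutively, or -1 if there is no such position. */
int find_context(const char *const file[], size_t nfile,
		 const char *const ctx[], size_t nctx)
{
	if (nctx == 0 || nctx > nfile)
		return -1;
	for (size_t i = 0; i + nctx <= nfile; i++) {
		size_t j = 0;

		while (j < nctx && strcmp(file[i + j], ctx[j]) == 0)
			j++;
		if (j == nctx)
			return (int)i;
	}
	return -1;
}

void find_context_example(void)
{
	/* Context lines of hunk #1 (the added '+' line is not context). */
	static const char *const ctx[] = {
		"u32 transferred;", "u32 packet_sz;",
		"struct list_head tx_check;", "};",
	};
	/* Tail of the struct as found in the SDK 7.0 tree. */
	static const char *const sdk7[] = {
		"u32 transferred;", "u32 packet_sz;",
		"struct list_head tx_check;",
		"struct work_struct dma_completion;", "};",
	};

	/* The extra dma_completion member breaks the run of context. */
	assert(find_context(sdk7, 5, ctx, 4) == -1);
}
```

In this model the hunk would match again once the `dma_completion` line is removed from the file.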

  • It appears that the issue patch 8.1 fixes also applies to the SDK7.0 kernel, but patch 8.1 cannot be applied cleanly on SDK7.0 because the SDK7.0 kernel lacks some CPPI driver changes that are in the SDK8.0 kernel.

    I will figure out the minimum change needed to fix the TX ZLP issue in SDK7.0 and update you. Meanwhile, please go ahead and test the kernel without patch 8.1.
  • I updated the wiki: moved patch 7.6 to 7.6.b, and added patch 7.6.a.
    After applying patches 7.6.a and 7.6.b, patch 8.1 now applies.
  • Thank you Bin.

    I tested with patches 7.2, 7.4 and 7.6 applied, and that seems to have resolved the issue.  This morning I will apply patches 7.6a and 8.1 as well, but as of now the problem appears fixed.

  • Morning Bin,

     

    I tried applying patch 7.6a and it fails:

    patching file drivers/usb/musb/musb_cppi41.c
    Hunk #2 succeeded at 112 (offset -4 lines).
    Hunk #3 FAILED at 172.
    Hunk #4 succeeded at 255 (offset -6 lines).
    Hunk #5 succeeded at 291 (offset -6 lines).
    Hunk #6 FAILED at 654.
    2 out of 6 hunks FAILED

    I started with a clean source tree and applied 7.2, 7.4 then tried 7.6a. 

  • Tim,

    It appears that the patch 7.6.a in linux-usb mailing list does not completely match with the sdk7 kernel source code. I will have to figure out what is missing.

    I cheated yesterday while finding patch 7.6.a: I did not take the patch directly from the webpage, but just did 'git revert 700f2faf' to revert the offending patch in my tree.

    Meanwhile, you can either

    - do 'git revert 700f2faf' if you cloned the kernel tree; it is equivalent to patch 7.6.a.

    or

    - manually apply patch 7.6.a, as you can see that

     * Hunk #3 removes the function cppi_trans_done_work()

     * Hunk #6 removes the lines:

    -               INIT_WORK(&cppi41_channel->dma_completion,
    -                         cppi_trans_done_work);
    
  • Updated patch #7.6.a in the wiki, so that patch #8.1 can apply on SDK7.0 kernel.

    The problem was that commit 700f2faf in SDK7.0 is formatted differently from the one in the mainline kernel, so the revert patch from the linux-usb mailing list does not apply to the SDK7.0 kernel.
  • Thank you Bin.

    I have downloaded the new patch, and the patches now all apply cleanly.  We applied them to the kernel and that has solved the crash we were seeing.

  • Great! Thanks for the update.