TDA4VH-Q1: Kernel panic in the socketcan driver (SDK 9.2)

Part Number: TDA4VH-Q1
Other Parts Discussed in Thread: TDA4VH

Hi,

We've just discovered that when our app gets killed, we get kernel panics, and the trace points to the socketcan driver.

How to reproduce:

  • Run the app
  • Kill the app. Either:
    • kill -9 <pid>
    • Close the SSH session
  • A kernel panic occurs

Here's the trace:

[   68.505897] Unable to handle kernel paging request at virtual address 0000000608ace2b0
[   68.513812] Mem abort info:
[   68.516602]   ESR = 0x0000000086000005
[   68.520347]   EC = 0x21: IABT (current EL), IL = 32 bits
[   68.525653]   SET = 0, FnV = 0
[   68.528703]   EA = 0, S1PTW = 0
[   68.531840]   FSC = 0x05: level 1 translation fault
[   68.536714] user pgtable: 64k pages, 48-bit VAs, pgdp=000000094246e000
[   68.543238] [0000000608ace2b0] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[   68.551939] Internal error: Oops: 0000000086000005 [#1] PREEMPT SMP
[   68.558191] Modules linked in: can_raw can veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_addrtype iptable_filter ip_tablei
[   68.558322]  videodev pci_j721e_host m_can pci_j721e at24 pcie_cadence_host rti_wdt mc pcie_cadence can_dev pwm_tiehrpwm optee_rng rng_core fuse drm drm_panel_orientation_quirks ipv6
[   68.661497] CPU: 5 PID: 40 Comm: ksoftirqd/5 Tainted: G           O       6.1.80-g2e423244f8 #1
[   68.670175] Hardware name: Texas Instruments J784S4 EVM (DT)
[   68.675816] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   68.682759] pc : 0x608ace2b0
[   68.685632] lr : skb_release_head_state+0x34/0xe4
[   68.690330] sp : ffff80000a5afb80
[   68.693630] x29: ffff80000a5afb80 x28: ffff0008c060b100 x27: ffff000f7d9cac78
[   68.700748] x26: 000000000000000a x25: 0000000000000001 x24: ffff80000a5afd50
[   68.707866] x23: ffff800009390008 x22: ffff8000097b3a20 x21: ffff80000984d440
[   68.714985] x20: ffff0008c4cccae0 x19: ffff0008d3050000 x18: ffffffffffffffff
[   68.722103] x17: 676e696d61657274 x16: 73205d6d61657274 x15: 0000000000000200
[   68.729221] x14: 000000000000001a x13: 0000000000000000 x12: 0000000000003fff
[   68.736338] x11: 0000000000000040 x10: ffff800009849ec0 x9 : ffff800008ad4f0c
[   68.743455] x8 : ffff80000a5afaa8 x7 : 0000000000000000 x6 : 0000000000000101
[   68.750573] x5 : ffff8000097b4000 x4 : ffff8000097b4258 x3 : 0000000000000000
[   68.757691] x2 : ffff800008ad5348 x1 : 0000000608ace2b0 x0 : ffff0008d3050000
[   68.764810] Call trace:
[   68.767243]  0x608ace2b0
[   68.769766]  kfree_skb_reason+0x48/0x14c
[   68.773677]  skb_queue_purge+0x28/0x44
[   68.777413]  can_sock_destruct+0x24/0x40 [can]
[   68.781854]  __sk_destruct+0x34/0x230
[   68.785503]  sk_destruct+0x5c/0x6c
[   68.788892]  __sk_free+0x80/0x130
[   68.792194]  sk_free+0x6c/0x90
[   68.795235]  can_rx_delete_receiver+0x90/0xb0 [can]
[   68.800102]  rcu_core+0x2c4/0xab4
[   68.803407]  rcu_core_si+0x18/0x24
[   68.806796]  __do_softirq+0x178/0x474
[   68.810446]  run_ksoftirqd+0x64/0x84
[   68.814013]  smpboot_thread_fn+0x1e8/0x21c
[   68.818099]  kthread+0x104/0x110
[   68.821316]  ret_from_fork+0x10/0x20
[   68.824886] Code: bad PC value
[   68.827930] ---[ end trace 0000000000000000 ]---
[   68.832532] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[   68.839388] SMP: stopping secondary CPUs
[   68.843305] Kernel Offset: disabled
[   68.846779] CPU features: 0x40000,20028084,0000420b
[   68.851641] Memory Limit: none
[   68.854685] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---

  • Hi Fred,

    I do not have a TDA4VH with me, so I used a TDA4VEN that I had on hand. So far, I have not been able to reproduce the issue using command-line tools such as "ip" (from iproute2) and "candump" and "cansend" (from can-utils). They should all be using SocketCAN underneath.

    For the experiment, I ran the following sequence:

        ifconfig -a
        ip link set main_mcan0 type can bitrate 500000
        ip link set up main_mcan0
        candump main_mcan0 &
        ip link set mcu_mcan0 type can bitrate 500000
        ip link set up mcu_mcan0
        candump mcu_mcan0 &
        cansend mcu_mcan0 123#DEADBEEF
        cansend mcu_mcan0 123#DEADBEEF
        cansend mcu_mcan0 123#DEADBEEF
        vi send_can.sh
        chmod +x send_can.sh
        ./send_can.sh
    
    

    I put cansend in a while-loop bash script (attached below) to see whether the issue only occurs when messages are sent continuously.

    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/send_5F00_can.sh

    I also used a separate SSH terminal to kill the candump processes and the cansend bash script. No kernel panic occurred when any of these processes was killed.

    Feel free to repeat this experiment to see whether it triggers the panic on your system. If it does not, the problem may be at the application level.

    Regards,

    Takuma

  • Hi Takuma,

    Thank you for the quick reply. I had a look at candump's code and saw that it installs a signal handler so that it can close its open sockets gracefully before exiting.

    /* Only record the signal; the main loop checks `running` and exits normally. */
    static void sigterm(int signo)
    {
        running = 0;
        signal_num = signo;
    }

    /* ... */
        signal(SIGTERM, sigterm);
        signal(SIGHUP, sigterm);
        signal(SIGINT, sigterm);
    

    Adding the same handlers to our app seems to stop the kernel panics. I will tag this as resolved.

  • Hi Fred,

    Glad to hear you were able to figure it out! And thank you for posting the resolution.

    Regards,

    Takuma