AM6442: How to debug a kernel panic error? (Asynchronous SError Interrupt)

Part Number: AM6442

Hi,

On a custom board with an AM6442, I am seeing kernel panics. They are always related to an "Asynchronous SError Interrupt".

Some examples of logs with the kernel panic:

log 1:

[ 1199.521390] SError Interrupt on CPU0, code 0x00000000bf000002 -- SError
[ 1199.521419] CPU: 0 PID: 12 Comm: ktimers/0 Not tainted 6.1.80-rt26 #1
[ 1199.521430] Hardware name: ---
[ 1199.521436] pstate: 000000c5 (nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 1199.521445] pc : 0xffff8000080ac250
[ 1199.521452] lr : 0xffff8000080ac450
[ 1199.521455] sp : ffff0000000d3c60
[ 1199.521458] x29: ffff0000000d3c60 x28: 0000000000000020 x27: ffff0000000d3d28
[ 1199.521474] x26: ffff00001bf7ca10 x25: 00000001000db9c1 x24: 0000000000000000
[ 1199.521484] x23: dead000000000122 x22: ffff00001bf7ca00 x21: ffff00001bf7ca50
[ 1199.521495] x20: 00000001400db9c0 x19: ffff800008b68000 x18: 0000000000000000
[ 1199.521506] x17: 0000000000000000 x16: 0000000000000000 x15: 000000000000016d
[ 1199.521515] x14: 00000000b123f581 x13: 0000000000000000 x12: 0000000000000000
[ 1199.521525] x11: ffff00001bf7ca98 x10: 0000000000000001 x9 : 00000000000000a7
[ 1199.521535] x8 : 0000100000000000 x7 : ffff0000000d3d30 x6 : ffff00001bf7ca50
[ 1199.521545] x5 : ffff0000000d3d30 x4 : 0000000000000002 x3 : 0000100000000000
[ 1199.521555] x2 : 00000000000000c0 x1 : 00000001000db9c1 x0 : ffff00001bf7ca00
[ 1199.521570] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 1199.521575] CPU: 0 PID: 12 Comm: ktimers/0 Not tainted 6.1.80-rt26 #1
[ 1199.521583] Hardware name: ---
[ 1199.521587] Call trace:
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

log 2:

[ 1371.051212] SError Interrupt on CPU0, code 0x00000000bf000002 -- SError
[ 1371.051242] CPU: 0 PID: 32 Comm: kcompactd0 Not tainted 6.1.80-rt26 #1
[ 1371.051253] Hardware name: ---
[ 1371.051259] pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 1371.051269] pc : 0xffff800008825d18
[ 1371.051275] lr : 0xffff8000080acef4
[ 1371.051278] sp : ffff0000002e7d00
[ 1371.051281] x29: ffff0000002e7d00 x28: 0000000000000000 x27: ffff800008c44c00
[ 1371.051296] x26: ffff00001bf7ca00 x25: 0000000000000000 x24: ffff800008b68000
[ 1371.051307] x23: 00000000ffffffff x22: 00000001001059bc x21: ffff00001bf7ca00
[ 1371.051317] x20: ffff00001bf7ca00 x19: ffff0000002e7da0 x18: 0000000000000000
[ 1371.051328] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000132
[ 1371.051338] x14: 0000000000000009 x13: 0000000000000000 x12: 0000000000000000
[ 1371.051348] x11: 0000000000000002 x10: 00000000000008f0 x9 : ffff0000002e7cf0
[ 1371.051358] x8 : 0100000000000000 x7 : 0000000000000001 x6 : ffff00001bf7ca50
[ 1371.051367] x5 : 0000000000000001 x4 : 000000001e000000 x3 : 00000001001059c0
[ 1371.051377] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff00001bf7ca00
[ 1371.051391] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 1371.051397] CPU: 0 PID: 32 Comm: kcompactd0 Not tainted 6.1.80-rt26 #1
[ 1371.051403] Hardware name: ---
[ 1371.051408] Call trace:
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

log 3:

[ 3962.368990] SError Interrupt on CPU1, code 0x00000000bf000002 -- SError
[ 3962.369024] CPU: 1 PID: 145 Comm: systemd-journal Not tainted 6.1.80-rt26 #1
[ 3962.369034] Hardware name: ---
[ 3962.369040] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 3962.369050] pc : 0xffff8000087e0f60
[ 3962.369055] lr : 0xffff8000083277d0
[ 3962.369057] sp : ffff000001a07960
[ 3962.369060] x29: ffff000001a07960 x28: 0001000000000000 x27: 0000000000000000
[ 3962.369076] x26: ffff000001a07d20 x25: ffff000001a07d10 x24: ffff000001a07d20
[ 3962.369087] x23: 00000000000007d8 x22: ffff000001a07dd8 x21: ffff00000194e000
[ 3962.369097] x20: 00000000000007d8 x19: 0000000000000000 x18: 0000000000000000
[ 3962.369107] x17: 0000000000000000 x16: 0000000000000000 x15: ffff00000194e000
[ 3962.369116] x14: 20656c62616c6961 x13: 7661206f6e203a5d x12: 322072656375646f
[ 3962.369127] x11: 72505b205d013838 x10: 355b017265767265 x9 : 732d3335612d636d
[ 3962.369137] x8 : 34303a323230320a x7 : 6174616420425355 x6 : 0000000022d819df
[ 3962.369147] x5 : 0000000022d81fe7 x4 : 0000000000000000 x3 : 00000000000007d8
[ 3962.369157] x2 : 0000000000000598 x1 : ffff00000194e210 x0 : 0000000022d8180f
[ 3962.369171] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 3962.369176] CPU: 1 PID: 145 Comm: systemd-journal Not tainted 6.1.80-rt26 #1
[ 3962.369183] Hardware name: ---
[ 3962.369187] Call trace:
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

How can I debug that and find which part of the kernel is causing an issue?

Thanks,

Stephane

PS: this system is running the TI kernel 09.02.00.009-rt.
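
As a starting point for reading these logs, the "code" value printed with the SError is the exception syndrome register (ESR) value. The sketch below splits it into the architectural fields (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]). The function name `decode_esr` and the small `EC_NAMES` table are illustrative assumptions and cover only the exception classes relevant here; the meaning of an implementation-defined ISS has to be looked up in the CPU core's technical reference manual.

```python
# Minimal ESR_ELx decoder, assuming the architectural field layout:
# EC = bits [31:26], IL = bit 25, ISS = bits [24:0].
# EC_NAMES is deliberately incomplete; it lists only a few abort classes.
EC_NAMES = {
    0x20: "Instruction Abort from a lower Exception level",
    0x21: "Instruction Abort taken without a change in Exception level",
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
    0x2F: "SError interrupt",
}


def decode_esr(esr):
    """Split a 32-bit ESR value into its main fields."""
    ec = (esr >> 26) & 0x3F
    il = (esr >> 25) & 0x1
    iss = esr & 0x1FFFFFF
    fields = {"ec": ec, "ec_name": EC_NAMES.get(ec, "other"), "il": il, "iss": iss}
    if ec == 0x2F:
        # For SError, ISS bit 24 (IDS) set means the remaining ISS bits
        # are implementation defined.
        fields["ids"] = (iss >> 24) & 0x1
    elif ec in (0x24, 0x25):
        # For Data Aborts, ISS bits [5:0] hold the data fault status code.
        fields["dfsc"] = iss & 0x3F
    return fields


if __name__ == "__main__":
    print(decode_esr(0xBF000002))
```

For the value in the logs above, `0xbf000002`, this gives EC `0x2f` (SError interrupt) with the IDS bit set, so the low ISS bits are specific to the core rather than defined by the architecture.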

  • Hi Stephane,

    First, please enable CONFIG_KALLSYMS in the kernel config. It adds symbol names to the kernel trace logs, which might give some hints about the crashes.
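
For logs that were captured without symbols, the raw `pc` and `lr` addresses can in principle be matched offline against the `System.map` file from the build. The sketch below assumes that `System.map` comes from exactly the same kernel build and that the kernel image was not relocated at boot (for example by KASLR); if either assumption does not hold, the result is meaningless. The sample map contents, symbol names, and helper names are hypothetical and only illustrate the lookup.

```python
import bisect

# Hypothetical System.map excerpt: "<hex address> <type> <symbol name>".
SAMPLE_MAP = """\
ffff800008000000 T _text
ffff800008010000 T example_func_a
ffff800008010100 t example_func_b
"""


def load_system_map(text):
    """Return a sorted list of (address, name) for text symbols (type T/t)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        addr, sym_type, name = parts[0], parts[1], parts[2]
        if sym_type.lower() == "t":
            symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Map an address to 'symbol+0xoffset' using the closest preceding symbol."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, address) - 1
    if i < 0:
        return None  # address lies below the first known symbol
    base, name = symbols[i]
    return f"{name}+{address - base:#x}"


if __name__ == "__main__":
    syms = load_system_map(SAMPLE_MAP)
    print(resolve(syms, 0xFFFF800008010040))
```

With a real build, `load_system_map` would be given the contents of the `System.map` file, and `resolve` would be called with a `pc` or `lr` value from the log.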

  • Hello,

    Thanks for the suggestion. I have added that option to the kernel build. Unfortunately, in my latest tests I still see a kernel crash (the console no longer responds), but without any message on the debug UART... I will repeat the tests a few more times this week, with and without CONFIG_KALLSYMS, to see whether that is what made the difference for the debug messages.

    Regards,

    Stephane

  • Hi Stephane,

    CONFIG_KALLSYMS won't solve the kernel issue; rather, it provides more information in any kernel crash log, which might give a hint for debugging.

    Looking forward to the next test results/logs.

  • Hi,

    I finally got new results this afternoon, with kernel panics showing up:

    Log 1 (new type of kernel panic: "Internal error: synchronous external abort"):

    [ 906.730994] Internal error: synchronous external abort: 0000000096000210 [#1] PREEMPT_RT SMP
    [ 906.731024] Modules linked in: rpmsg_ctrl rpmsg_char bulk_1_usb cdc_acm bulk_2_usb bulk_3_usb bulk_4_usb bulk_5_usb bulk_6_usb xhci_plat_hcd cdns3 cdns_usb_common at25 crct10dif_ce ti_k3_r5_remoteproc cdns3_ti virtio_rpmsg_bus rti_wdt rpmsg_ns sa2ul spi_omap2_mcspi at24 overlay fuse ipv6
    [ 906.731112] CPU: 0 PID: 1209 Comm: systemd-tmpfile Not tainted 6.1.80-rt26 #1
    [ 906.731122] Hardware name: ---
    [ 906.731128] pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    [ 906.731138] pc : copy_page+0x64/0xd0
    [ 906.731163] lr : copy_user_highpage+0x34/0x4c
    [ 906.731180] sp : ffff0000016d7c20
    [ 906.731184] x29: ffff0000016d7c20 x28: ffff0000015d3770 x27: 0000000000000000
    [ 906.731195] x26: ffff0000011c8000 x25: ffff0000015d3770 x24: fffffc00001313c0
    [ 906.731206] x23: 0000000000000a55 x22: ffff000003f5a800 x21: fffffc00001f6140
    [ 906.731216] x20: 0000ffff89862000 x19: fffffc00001f6140 x18: 000000000000ef96
    [ 906.731226] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
    [ 906.731235] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
    [ 906.731245] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
    [ 906.731254] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
    [ 906.731263] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 000000000022b7e0
    [ 906.731272] x2 : 0000000000000000 x1 : ffff000004c4f700 x0 : ffff000007d85780
    [ 906.731283] Call trace:
    [ 906.731288] copy_page+0x64/0xd0
    [ 906.731295] wp_page_copy+0x8c/0x660
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

    Log 2 (same "Asynchronous SError Interrupt" as last week):

    [ 1659.516725] SError Interrupt on CPU0, code 0x00000000bf000002 -- SError
    [ 1659.516752] CPU: 0 PID: 145 Comm: systemd-journal Not tainted 6.1.80-rt26 #1
    [ 1659.516762] Hardware name: ---
    [ 1659.516768] pstate: 80000000 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    [ 1659.516777] pc : 0000ffff9e5116c8
    [ 1659.516779] lr : 0000ffff9e5116a4
    [ 1659.516782] sp : 0000ffffe7df6db0
    [ 1659.516785] x29: 0000ffffe7df6df0 x28: 0000ffffe7df6fe0 x27: 0000ffffe7df6e60
    [ 1659.516801] x26: 0000ffff9d6a8620 x25: 0000ffffe7df72e0 x24: 0000000000000000
    [ 1659.516812] x23: 0000000000000000 x22: 62a1255053723ec2 x21: 0000000000000000
    [ 1659.516822] x20: 00000000000001c0 x19: 00000000136a26b0 x18: 0000000000000000
    [ 1659.516831] x17: 0000ffff9e21aa40 x16: 0000ffff9e60ee00 x15: ea0ba3ff2dd83ac4
    [ 1659.516842] x14: 00000000000cb778 x13: ea0ba3ff2dd83ac4 x12: 00000000000cb778
    [ 1659.516851] x11: 48cbbb06f8f33437 x10: 00000000001285b0 x9 : d4959e675cb764a0
    [ 1659.516861] x8 : 0000000000128540 x7 : 074271864aac32a7 x6 : 00000000001284d8
    [ 1659.516871] x5 : 0000ffff9d6a87e0 x4 : 0000ffffe7df6fe0 x3 : 0000ffff9d6a8760
    [ 1659.516881] x2 : fffffffffffffff0 x1 : 0000ffffe7df6fa0 x0 : 0000000062ea25ee
    [ 1659.516895] Kernel panic - not syncing: Asynchronous SError Interrupt
    [ 1659.516900] CPU: 0 PID: 145 Comm: systemd-journal Not tainted 6.1.80-rt26 #1
    [ 1659.516907] Hardware name: ---
    [ 1659.516911] Call trace:
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

    In one case, the call trace brings us to el0t_64_sync, and in the 2nd case, we see el0t_64_error. Does this help to start chasing the crash?

    Best regards,

    Stephane

  • Hi Stephane,

    Thanks for the new logs. The two logs do not appear to be related, which suggests the crash is "random". On custom boards, the usual cause of such behavior is either unstable DDR or unstable power supplies. To narrow it down, could you first stop your application and run the "memtester" test in Linux on your board to see whether it reports any DDR errors?
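
A minimal sketch of how such a memtester run could be scripted is shown below. The 256 MB size and 3 loops are placeholder assumptions, not values recommended in this thread; the size should be chosen to fit the free memory on the board, and memtester normally needs root privileges to lock the memory it tests. The exit-code interpretation follows the memtester man page (0 for success, otherwise a bitwise OR of 0x01, 0x02, and 0x04). The helper names are illustrative.

```python
import shutil
import subprocess


def build_memtester_cmd(megabytes, loops):
    """Build the argument list for: memtester <size>M <loops>."""
    return ["memtester", f"{megabytes}M", str(loops)]


def describe_exit_code(code):
    """Translate memtester's exit code into readable reasons."""
    if code == 0:
        return ["no errors reported"]
    reasons = []
    if code & 0x01:
        reasons.append("allocation, locking or invocation error")
    if code & 0x02:
        reasons.append("stuck address test failed")
    if code & 0x04:
        reasons.append("another test failed")
    if not reasons:
        reasons.append(f"unrecognized exit code {code}")
    return reasons


def run_memtester(megabytes=256, loops=3):
    """Run memtester if it is installed and return the decoded result."""
    cmd = build_memtester_cmd(megabytes, loops)
    if shutil.which(cmd[0]) is None:
        return ["memtester not found in PATH"]
    result = subprocess.run(cmd)
    return describe_exit_code(result.returncode)
```

Calling `run_memtester()` on the target with the application stopped prints memtester's own progress output and returns the decoded exit status, which makes it easy to repeat the run across several boots or temperature conditions.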