AM4378: Kernel panic observed while accessing I2C EEPROM.

Part Number: AM4378


I am getting a kernel panic, and the board hangs, while writing to an EEPROM over I2C on an AM437x-based custom board.

There is a shell script that tests the board by accessing each peripheral. We notice this crash and hang during long-duration testing.

The kernel panic was observed after testing CAN using the candump and cansend commands.
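For reference, the EEPROM and CAN portions of the test do roughly the following (a rough sketch only; the EEPROM sysfs path, CAN bitrate and test frame below are assumptions, not the exact script):

    # I2C EEPROM write/read-back via the at24 driver sysfs node
    # (the 0-0050 device path is an assumption; it depends on the board DT)
    EEPROM=/sys/bus/i2c/devices/0-0050/eeprom
    dd if=/dev/urandom of=/tmp/pattern.bin bs=256 count=1
    dd if=/tmp/pattern.bin of="$EEPROM" bs=256 count=1
    dd if="$EEPROM" of=/tmp/readback.bin bs=256 count=1
    cmp /tmp/pattern.bin /tmp/readback.bin

    # CAN exercise with can-utils (bitrate and frame are assumptions)
    ip link set can1 up type can bitrate 500000
    candump -n 1 can1 &
    cansend can1 123#DEADBEEF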

Is the kernel panic caused by the CAN module/CAN driver, or by some unrelated kernel bug?

There is a similar oops trace for which patches are available, in case this is a known kernel bug:

https://lists.ubuntu.com/archives/kernel-team/2017-March/083068.html

https://patchwork.ozlabs.org/project/ubuntu-kernel/patch/1490801481-5276-2-git-send-email-tim.gardner@canonical.com/

[2021-08-11 22:53:09.863] [11048.592019] net can1: c_can_hw_raminit_wait_syscon: time out
[2021-08-11 22:53:09.863] [11048.598088] c_can_platform 481d0000.can can1: setting BTR=2701 BRPE=0000
[2021-08-11 22:53:09.863] 2021-08-11T22:52:54.758589+00:00 am437x-custom kernel: [11048.598088] c_can_platform 481d0000.can can1: setting BTR=2701 BRPE=0000
[2021-08-11 22:54:41.123] [11139.890077] Unable to handle kernel NULL pointer dereference at virtual address 00000014
[2021-08-11 22:54:41.498] [11139.898634] pgd = c06f9a6b
[2021-08-11 22:54:41.498] [11139.909872] [00000014] *pgd=00000000
[2021-08-11 22:54:41.498] [11139.913942] Internal error: Oops: 5 [#1] PREEMPT ARM
[2021-08-11 22:54:41.498] [11139.913948] Modules linked in: can_raw apm cfg80211 bnep iptable_filter iptable_mangle iptable_nat nf_nat_ipv4 ip_tables ip6table_filter ip6table_mangle ip6table_nat nf_nat_ipv6 nf_nat nf_conntrack libcrc32c nf_defrag_ipv4 nf_defrag_ipv6 ip6_tables x_tables rfcomm bluetooth cdc_subset drbg ecdh_generic usb_f_ecm g_ether usb_f_rndis u_ether libcomposite cdc_ether usbnet xhci_plat_hcd xhci_hcd usbcore dwc3 udc_core usb_common ti_am335x_adc sha512_generic sha512_arm ctr sha256_generic cbc hmac ecb pm33xx omap_des md5 pvrsrvkm(O) des_generic omap_aes_driver crypto_engine omap_sham ti_emif_sram omap_crypto dwc3_omap pixcir_i2c_ts wkup_m3_rproc rtc_omap omap_wdt at24 wkup_m3_ipc ti_am335x_tscadc remoteproc phy_omap_usb2
[2021-08-11 22:54:41.498] [11139.914074] CPU: 0 PID: 23187 Comm: kworker/u2:2 Tainted: G O 4.19.160-rt69-gb3a2247aae #1
[2021-08-11 22:54:41.498] [11139.914077] Hardware name: Generic AM43 (Flattened Device Tree)
[2021-08-11 22:54:41.514] [11139.914100] Workqueue: events_unbound flush_to_ldisc
[2021-08-11 22:54:41.514] [11139.914124] PC is at n_tty_receive_char_special+0x2dc/0xb50
[2021-08-11 22:54:41.514] [11139.914132] LR is at n_tty_receive_buf_common+0x438/0xa54

[2021-08-11 22:54:41.530] [11139.914264] Backtrace:
[2021-08-11 22:54:41.530] [11139.914278] [<c042fd50>] (n_tty_receive_buf_common) from [<c04307c0>] (n_tty_receive_buf2+0x1c/0x24)
[2021-08-11 22:54:41.530] [11139.914287] r10:d52cf408 r9:c04334a4 r8:d5024dac r7:d52cc6c0 r6:00000014 r5:c04307a4
[2021-08-11 22:54:41.530] [11139.914289] r4:00000014
[2021-08-11 22:54:41.530] [11139.914302] [<c04307a4>] (n_tty_receive_buf2) from [<c0432b0c>] (tty_ldisc_receive_buf+0x28/0x64)
[2021-08-11 22:54:41.530] [11139.914313] [<c0432ae4>] (tty_ldisc_receive_buf) from [<c04334e4>] (tty_port_default_receive_buf+0x40/0x60)
[2021-08-11 22:54:41.530] [11139.914317] r5:00000000 r4:d5024dac
[2021-08-11 22:54:41.530] [11139.914324] [<c04334a4>] (tty_port_default_receive_buf) from [<c0432ff8>] (flush_to_ldisc+0x88/0xcc)
[2021-08-11 22:54:41.530] [11139.914329] r7:d52cf400 r6:d52cf414 r5:d52cf404 r4:d5024c00
[2021-08-11 22:54:41.530] [11139.914342] [<c0432f70>] (flush_to_ldisc) from [<c004dae4>] (process_one_work+0x208/0x430)
[2021-08-11 22:54:41.654] [11139.914349] r9:00000000 r8:dc405000 r7:00000000 r6:dc406000 r5:db9b4080 r4:d52cf404
[2021-08-11 22:54:41.654] [11139.914356] [<c004d8dc>] (process_one_work) from [<c004dd68>] (worker_thread+0x5c/0x674)
[2021-08-11 22:54:41.654] [11139.914363] r10:db9b4080 r9:00000088 r8:c0ba1380 r7:dc405014 r6:ffffe000 r5:db9b4094
[2021-08-11 22:54:41.654] [11139.914366] r4:dc405000
[2021-08-11 22:54:41.654] [11139.914381] [<c004dd0c>] (worker_thread) from [<c0053b44>] (kthread+0x188/0x1a0)
[2021-08-11 22:54:41.654] [11139.914388] r10:dc47be80 r9:00000000 r8:c004dd0c r7:db9b4080 r6:d6c56000 r5:d48a5ac0
[2021-08-11 22:54:41.654] [11139.914390] r4:d48a58c0
[2021-08-11 22:54:41.654] [11139.914402] [<c00539bc>] (kthread) from [<c00090e0>] (ret_from_fork+0x14/0x34)
[2021-08-11 22:54:41.654] [11139.914405] Exception stack(0xd6c57fb0 to 0xd6c57ff8)

  • Hi Siva,

    [2021-08-11 22:54:41.498] [11139.914074] CPU: 0 PID: 23187 Comm: kworker/u2:2 Tainted: G O 4.19.160-rt69-gb3a2247aae #1

    The kernel you are running is not a TI-released kernel. Is there a reason this kernel version is being used? Where was the kernel source cloned from?

    I would recommend switching back to the RT kernel in the TI Processor SDK v6.03, which is version 4.19.94-rt39. The reason I recommend this is that pulling from mainline and compiling for RT is likely to miss patches that TI applies and tests.

  • Hello Bin Liu,

    We switched to the ti-rt-39 kernel and started running our tests again. So far we have seen the hang three times.

    We used CCS to find the program counter values; the findings are attached (the addresses can also be resolved with addr2line, as sketched below).

    We need your assistance in finding out why this is happening.

       vmlinux-rt39-dump - Copy.zip
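    For reference, the oops addresses can also be resolved on the host against the matching unstripped vmlinux (a sketch; the cross-toolchain prefix is an assumption, use whatever toolchain built the kernel):

      # Resolve an oops PC/LR address to a source line
      arm-linux-gnueabihf-addr2line -e vmlinux -f -i c042fd50

      # Or with gdb, using the symbol+offset printed in the backtrace
      arm-linux-gnueabihf-gdb vmlinux
      (gdb) list *(n_tty_receive_buf_common+0x438)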

  • Data from one more hang is attached here

    ti-rt-39-hang-3-sep-2021.pdf


  • Hi Team,

    The test is in progress with the TI-recommended kernel, ti-rt-39.

    Adding a few more debugging details and log information.

    There were two test runs:

    On Sep 9, with CAN validation in place: a kernel hang was observed.

    On Sep 13, with our CAN validation and only GPIO toggle (sketched below): a kernel crash was observed.

    Kernel hang issue and kernel crash: the logs for these two observations are attached.

     Sep13-logs.zip
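    The GPIO toggle referred to above is essentially a sysfs export-and-toggle loop, roughly like this sketch (the GPIO number is an assumption; the actual script walks the GPIOs available on our board):

      # Toggle one GPIO in a loop via the legacy sysfs interface
      GPIO=60                                    # assumed pin number
      echo $GPIO > /sys/class/gpio/export
      echo out > /sys/class/gpio/gpio$GPIO/direction
      while true; do
          echo 1 > /sys/class/gpio/gpio$GPIO/value
          echo 0 > /sys/class/gpio/gpio$GPIO/value
      done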

  • Hi Annapurna,

    Thanks for the updates. The new crash logs from the TI SDK kernel seem to be different from the original one you provided in the first post. The original crash seems to happen in CAN-driver-related code, but the latest crash is in kernel workqueue processing. I am not sure whether both crashes share the same root cause.

    I also received the script to run the GPIO/ADC test via offline email. I have it running on my AM437x GP EVM with a slight modification (I added steps to enable the GPIOs and disabled a few that are not available in the SDK v6.3 prebuilt kernel); hopefully I will see the same crash by tomorrow morning. I will keep you posted on the test result.

  • Hi Bin Liu,

    My bad, it is a typo on my end. Both tests were run without the CAN module.

    It should have read "without CAN validation".

  • Hi Annapurna,

    The GPIO/ADC test on my AM437x GP EVM (with SDK v6.3 prebuilt Linux binaries) has been running for about 16 hours now without any issue.

    I will rebuild the kernel later today with your kernel config and try your board DTS to see if I can reproduce the kernel crash.
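    For reference, the rebuild will be roughly along these lines (a sketch only; the cross-compiler prefix, config file name and DTS name are placeholders, your actual files will be used):

      # Rebuild the SDK kernel with the customer config and board DTS
      cp customer.config .config                                             # config name assumed
      make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- olddefconfig
      make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- zImage modules
      make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- am437x-custom.dtb     # DTS name assumed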