AM625: Issue about tidss rcu_preempt self-detected stall on CPU

Allen Chiu

Part Number: AM625
Other Parts Discussed in Thread: TFP410

Hi TI experts,

We are using a custom AM625 board with a VGA output connected to an LCD monitor.

The kernel log shows "rcu_preempt self-detected stall on CPU" and the system hangs.

[   34.281580] rcu: INFO: rcu_preempt self-detected stall on CPU
[   34.281611] rcu: 	0-....: (2 GPs behind) idle=9104/1/0x4000000000000000 softirq=0/0 fqs=4819 rcuc=21107 jiffies(starved)
[   34.281624] 	(t=21000 jiffies g=6881 q=1364 ncpus=4)
[   34.281637] CPU: 0 PID: 137 Comm: irq/289-tidss Tainted: G           O       6.1.46-rt13-BSP_12.4--g17da321871 #1
[   34.281643] Hardware name: Texas Instruments AM625 SK (DT)
[   34.281648] pstate: a0000005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   34.281654] pc : _raw_spin_unlock_irq+0x18/0x60
[   34.281669] lr : irq_finalize_oneshot.part.0+0x64/0x100
[   34.281687] sp : ffff000000eb3d90
[   34.281689] x29: ffff000000eb3d90 x28: ffff800008089000 x27: ffff000001ccfb10
[   34.281700] x26: ffff000001ccfadc x25: ffff800008089ee0 x24: ffff000001daee00
[   34.281708] x23: ffff000001ccfa00 x22: ffff000001ccfa60 x21: ffff000001ccfadc
[   34.281715] x20: ffff000001daee00 x19: ffff000001ccfa00 x18: ffff8000091ee000
[   34.281723] x17: 0000000000000000 x16: 0000000000000000 x15: 000000000000003c
[   34.281730] x14: ffffffffffffffff x13: 0000000000000000 x12: 0000000000000000
[   34.281737] x11: ffff000001ccf680 x10: ffff8000091ee000 x9 : 0000000000000000
[   34.281744] x8 : ffff800008b594e8 x7 : 000000000000002b x6 : ffffffffffffffff
[   34.281751] x5 : ffff000001ccfa60 x4 : ffff000001ccfa60 x3 : 0000000000100000
[   34.281760] x2 : ffff800009220000 x1 : ffff0000015d6c00 x0 : 0000000100000001
[   34.281769] Call trace:
[   34.281772]  _raw_spin_unlock_irq+0x18/0x60
[   34.281777]  irq_forced_thread_fn+0x84/0xb0
[   34.281782]  irq_thread+0x12c/0x1d0
[   34.281787]  kthread+0x120/0x12c
[   34.281795]  ret_from_fork+0x10/0x20
[   97.284578] rcu: INFO: rcu_preempt self-detected stall on CPU
[   97.284599] rcu: 	0-....: (2 GPs behind) idle=9104/1/0x4000000000000000 softirq=0/0 fqs=19031 rcuc=84110 jiffies(starved)
[   97.284610] 	(t=84003 jiffies g=6881 q=1434 ncpus=4)
[   97.284618] CPU: 0 PID: 137 Comm: irq/289-tidss Tainted: G           O       6.1.46-rt13-BSP_12.4--g17da321871 #1
[   97.284625] Hardware name: Texas Instruments AM625 SK (DT)
[   97.284634] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   97.284639] pc : dispc_read_and_clear_irqstatus+0x58/0x1f0
[   97.284656] lr : tidss_irq_handler+0x1c/0x110
[   97.284662] sp : ffff000000eb3d50
[   97.284664] x29: ffff000000eb3d50 x28: ffff800008089000 x27: ffff000001ccfb10
[   97.284674] x26: ffff000001ccfadc x25: ffff000000070000 x24: ffff000001daee00
[   97.284681] x23: ffff000001ccfa00 x22: ffff0000015d6c00 x21: 0000000000000001
[   97.284688] x20: ffff000001ccfa00 x19: 0000000000000000 x18: ffff8000091ee000
[   97.284696] x17: 0000000000000000 x16: 0000000000000000 x15: 000000000000003c
[   97.284706] x14: ffffffffffffffff x13: 0000000000000000 x12: 0000000000000000
[   97.284713] x11: ffff000001ccf680 x10: ffff8000091ee000 x9 : ffff8000091eead8
[   97.284720] x8 : 0000000000000000 x7 : ffff000001ccf680 x6 : ffffffffffffffff
[   97.284727] x5 : ffff00007f668808 x4 : 0000000000000000 x3 : ffff80007674e000
[   97.284734] x2 : ffff80000857fec0 x1 : ffff800009413000 x0 : 0000000000000000
[   97.284743] Call trace:
[   97.284745]  dispc_read_and_clear_irqstatus+0x58/0x1f0
[   97.284751]  tidss_irq_handler+0x1c/0x110
[   97.284756]  irq_forced_thread_fn+0x38/0xb0
[   97.284763]  irq_thread+0x12c/0x1d0
[   97.284767]  kthread+0x120/0x12c
[   97.284776]  ret_from_fork+0x10/0x20
[  160.287579] rcu: INFO: rcu_preempt self-detected stall on CPU
[  160.287609] rcu: 	0-....: (2 GPs behind) idle=9104/1/0x4000000000000000 softirq=0/0 fqs=32851 rcuc=147113 jiffies(starved)
[  160.287620] 	(t=147006 jiffies g=6881 q=1554 ncpus=4)
[  160.287632] CPU: 0 PID: 137 Comm: irq/289-tidss Tainted: G           O       6.1.46-rt13-BSP_12.4--g17da321871 #1
[  160.287639] Hardware name: Texas Instruments AM625 SK (DT)
[  160.287643] pstate: a0000005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  160.287648] pc : _raw_spin_unlock_irq+0x10/0x60
[  160.287668] lr : irq_finalize_oneshot.part.0+0x64/0x100
[  160.287676] sp : ffff000000eb3d90
[  160.287678] x29: ffff000000eb3d90 x28: ffff800008089000 x27: ffff000001ccfb10
[  160.287688] x26: ffff000001ccfadc x25: ffff800008089ee0 x24: ffff000001daee00
[  160.287695] x23: ffff000001ccfa00 x22: ffff000001ccfa60 x21: ffff000001ccfadc
[  160.287703] x20: ffff000001daee00 x19: ffff000001ccfa00 x18: ffff8000091ee000
[  160.287710] x17: 0000000000000000 x16: 0000000000000000 x15: 000000000000003c
[  160.287717] x14: ffffffffffffffff x13: 0000000000000000 x12: 0000000000000000
[  160.287724] x11: ffff000001ccf680 x10: ffff8000091ee000 x9 : 0000000000000000
[  160.287731] x8 : ffff800008b594e8 x7 : 000000000000002b x6 : ffffffffffffffff
[  160.287742] x5 : ffff000001ccfa60 x4 : ffff000001ccfa60 x3 : 0000000000100000
[  160.287749] x2 : ffff800009220000 x1 : 0000000000000000 x0 : 00000000000000e0
[  160.287758] Call trace:
[  160.287761]  _raw_spin_unlock_irq+0x10/0x60
[  160.287766]  irq_forced_thread_fn+0x84/0xb0
[  160.287771]  irq_thread+0x12c/0x1d0
[  160.287776]  kthread+0x120/0x12c
[  160.287785]  ret_from_fork+0x10/0x20
[  223.290579] rcu: INFO: rcu_preempt self-detected stall on CPU
[  223.290602] rcu: 	0-....: (2 GPs behind) idle=9104/1/0x4000000000000000 softirq=0/0 fqs=46701 rcuc=210116 jiffies(starved)
[  223.290612] 	(t=210009 jiffies g=6881 q=1580 ncpus=4)
[  223.290622] CPU: 0 PID: 137 Comm: irq/289-tidss Tainted: G           O       6.1.46-rt13-BSP_12.4--g17da321871 #1
[  223.290628] Hardware name: Texas Instruments AM625 SK (DT)
[  223.290632] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  223.290639] pc : dispc_read_and_clear_irqstatus+0x58/0x1f0
[  223.290654] lr : tidss_irq_handler+0x1c/0x110
[  223.290664] sp : ffff000000eb3d50
[  223.290666] x29: ffff000000eb3d50 x28: ffff800008089000 x27: ffff000001ccfb10
[  223.290676] x26: ffff000001ccfadc x25: ffff000000070000 x24: ffff000001daee00
[  223.290684] x23: ffff000001ccfa00 x22: ffff0000015d6c00 x21: 0000000000000001
[  223.290691] x20: ffff000001ccfa00 x19: 0000000000000000 x18: ffff8000091ee000
[  223.290698] x17: 0000000000000000 x16: 0000000000000000 x15: 000000000000003c
[  223.290705] x14: ffffffffffffffff x13: 0000000000000000 x12: 0000000000000000
[  223.290712] x11: ffff000001ccf680 x10: ffff8000091ee000 x9 : ffff8000091eead8
[  223.290720] x8 : 0000000000000000 x7 : ffff000001ccf680 x6 : ffffffffffffffff
[  223.290727] x5 : ffff00007f668808 x4 : 0000000000000000 x3 : ffff80007674e000
[  223.290734] x2 : ffff80000857fec0 x1 : ffff800009413000 x0 : 0000000000000000
[  223.290745] Call trace:
[  223.290747]  dispc_read_and_clear_irqstatus+0x58/0x1f0
[  223.290754]  tidss_irq_handler+0x1c/0x110
[  223.290760]  irq_forced_thread_fn+0x38/0xb0
[  223.290766]  irq_thread+0x12c/0x1d0
[  223.290770]  kthread+0x120/0x12c
[  223.290778]  ret_from_fork+0x10/0x20
[  286.293578] rcu: INFO: rcu_preempt self-detected stall on CPU
[  286.293595] rcu: 	0-....: (2 GPs behind) idle=9104/1/0x4000000000000000 softirq=0/0 fqs=60579 rcuc=273119 jiffies(starved)
[  286.293606] 	(t=273012 jiffies g=6881 q=1589 ncpus=4)
[  286.293615] CPU: 0 PID: 137 Comm: irq/289-tidss Tainted: G           O       6.1.46-rt13-BSP_12.4--g17da321871 #1
[  286.293622] Hardware name: Texas Instruments AM625 SK (DT)
[  286.293627] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  286.293637] pc : dispc_k3_clear_irqstatus+0x188/0x1a0
[  286.293650] lr : dispc_read_and_clear_irqstatus+0xfc/0x1f0
[  286.293656] sp : ffff000000eb3d40
[  286.293658] x29: ffff000000eb3d40 x28: ffff800008089000 x27: ffff000001ccfb10
[  286.293667] x26: ffff000001ccfadc x25: ffff000000070000 x24: ffff000001daee00
[  286.293675] x23: ffff000001ccfa00 x22: ffff0000015d6c00 x21: 0000000000000001
[  286.293683] x20: ffff000001ccfa00 x19: 0000000000000000 x18: ffff8000091ee000
[  286.293690] x17: 0000000000000000 x16: 0000000000000000 x15: 000000000000003c
[  286.293697] x14: ffffffffffffffff x13: 0000000000000000 x12: 0000000000000000
[  286.293707] x11: ffff000001ccf680 x10: ffff8000091ee000 x9 : 0000000000000000
[  286.293715] x8 : ffff800008b594e8 x7 : 000000000000002b x6 : ffffffffffffffff
[  286.293722] x5 : 0000000000000015 x4 : 00000000003fffff x3 : 0000000000000002
[  286.293729] x2 : 000000000000002c x1 : 000000000000002c x0 : 0000000000000002
[  286.293737] Call trace:
[  286.293740]  dispc_k3_clear_irqstatus+0x188/0x1a0
[  286.293747]  dispc_read_and_clear_irqstatus+0xfc/0x1f0
[  286.293753]  tidss_irq_handler+0x1c/0x110
[  286.293758]  irq_forced_thread_fn+0x38/0xb0
[  286.293764]  irq_thread+0x12c/0x1d0
[  286.293769]  kthread+0x120/0x12c
[  286.293776]  ret_from_fork+0x10/0x20
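
The reports above repeat roughly every 63 seconds, always for the same thread (irq/289-tidss, PID 137) and the same grace period (g=6881), while the sampled pc moves between _raw_spin_unlock_irq, dispc_read_and_clear_irqstatus and dispc_k3_clear_irqstatus. A minimal sketch for pulling those fields out of a saved dmesg file is shown here. It is not part of the original report: the function names `parse_stalls` and `estimate_hz` are hypothetical helpers, and `SAMPLE` is a shortened excerpt of the log above. The regular expressions assume the line layout seen in this log.

```python
import re

# Patterns assume the dmesg layout seen in this thread's log.
STALL_RE = re.compile(r"\[\s*(\d+\.\d+)\] rcu: INFO: rcu_preempt self-detected stall on CPU")
T_RE = re.compile(r"\(t=(\d+) jiffies")
COMM_RE = re.compile(r"Comm: (\S+)")
PC_RE = re.compile(r"\bpc : ([\w.]+)\+0x")


def parse_stalls(text):
    """Return one dict per stall report: time (s), jiffies stalled, comm, pc symbol."""
    reports = []
    for line in text.splitlines():
        m = STALL_RE.search(line)
        if m:
            # A new stall report starts here.
            reports.append({"time": float(m.group(1)), "jiffies": None,
                            "comm": None, "pc": None})
            continue
        if not reports:
            continue  # ignore lines before the first report
        cur = reports[-1]
        for key, rx, conv in (("jiffies", T_RE, int),
                              ("comm", COMM_RE, str),
                              ("pc", PC_RE, str)):
            m = rx.search(line)
            if m and cur[key] is None:
                cur[key] = conv(m.group(1))
    return reports


def estimate_hz(reports):
    """Estimate jiffies per second from the first and last report."""
    first, last = reports[0], reports[-1]
    return (last["jiffies"] - first["jiffies"]) / (last["time"] - first["time"])


# Shortened excerpt of the log in this post.
SAMPLE = """\
[   34.281580] rcu: INFO: rcu_preempt self-detected stall on CPU
[   34.281624]  (t=21000 jiffies g=6881 q=1364 ncpus=4)
[   34.281637] CPU: 0 PID: 137 Comm: irq/289-tidss Tainted: G           O       6.1.46-rt13-BSP_12.4--g17da321871 #1
[   34.281654] pc : _raw_spin_unlock_irq+0x18/0x60
[   97.284578] rcu: INFO: rcu_preempt self-detected stall on CPU
[   97.284610]  (t=84003 jiffies g=6881 q=1434 ncpus=4)
[   97.284618] CPU: 0 PID: 137 Comm: irq/289-tidss Tainted: G           O       6.1.46-rt13-BSP_12.4--g17da321871 #1
[   97.284639] pc : dispc_read_and_clear_irqstatus+0x58/0x1f0
"""

if __name__ == "__main__":
    reports = parse_stalls(SAMPLE)
    for r in reports:
        print(r["time"], r["jiffies"], r["comm"], r["pc"])
    print("estimated HZ:", round(estimate_hz(reports)))
```

On this excerpt the estimate comes out at about 1000 jiffies per second, so t=21000 jiffies in the first report would correspond to roughly 21 seconds of stall. A pc that differs from one report to the next may hint that the IRQ thread keeps re-running the handler instead of blocking at one instruction, but the log alone does not confirm that.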

Full log:

dmesg_error.txt

U-Boot SPL 2023.04-ga37da23008 (Apr 02 2024 - 05:22:32 +0000)
SYSFW ABI: 3.1 (firmware rev 0x0009 '9.1.8--v09.01.08 (Kool Koala)')
SPL initial stack usage: 13384 bytes
Trying to boot from MMC1
Authentication passed
Authentication passed
Authentication passed
Authentication passed
Authentication passed
Starting ATF on ARM64 core...

NOTICE:  BL31: v2.9(release):v2.9.0-614-gd7a7135d32-dirty
NOTICE:  BL31: Built : 09:34:15, Aug 24 2023

U-Boot SPL 2023.04-ga37da23008 (Apr 02 2024 - 05:22:32 +0000)
SYSFW ABI: 3.1 (firmware rev 0x0009 '9.1.8--v09.01.08 (Kool Koala)')
SPL initial stack usage: 1856 bytes
Error: could not access storage.
Trying to boot from MMC1
Authentication passed
Authentication passed


U-Boot 2023.04-ga37da23008 (Apr 02 2024 - 05:22:32 +0000)

SoC:   AM62X SR1.0 HS-FS
Model: Texas Instruments AM625 SK
DRAM:  2 GiB
Core:  68 devices, 30 uclasses, devicetree: separate
MMC:   mmc@fa10000: 0, mmc@fa00000: 1
Loading Environment from MMC... OK
In:    serial
Out:   serial
Err:   serial
[BSP] gd->boot_dev = 9 
Saving Environment to MMC... Writing to redundant MMC(0)... OK
switch to partitions #0, OK
mmc0(part 0) is current device
SD/MMC found on device 0
Running eMMC boot ...
574 bytes read in 15 ms (37.1 KiB/s)
EMMC Loaded env from uEnv.txt
Importing environment from mmc0 ...
18477568 bytes read in 216 ms (81.6 MiB/s)
62950 bytes read in 16 ms (3.8 MiB/s)
Working FDT set to 88000000
## Flattened Device Tree blob at 88000000
   Booting using the fdt blob at 0x88000000
Working FDT set to 88000000
ERROR: reserving fdt memory region failed (addr=ff700000 size=8ca000 flags=4)
   Loading Device Tree to 000000008feed000, end 000000008fffffff ... OK
Working FDT set to 8feed000

Starting kernel ...

[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
[    0.000000] Linux version 6.1.46-rt13-BSP_12.4--g17da321871 (oe-user@oe-host) (aarch64-oe-linux-gcc (GCC) 11.4.0, GNU ld (GNU Binutils) 2.38.20220708) #1 SMP PREEMPT_RT Tue Apr 23 02:09:43 UTC 2024
[    0.000000] Machine model: Texas Instruments AM625 SK
[    0.000000] earlycon: ns16550a0 at MMIO32 0x0000000002800000 (options '')
[    0.000000] printk: bootconsole [ns16550a0] enabled
[    0.000000] efi: UEFI not found.
[    0.000000] Reserved memory: created CMA memory pool at 0x00000000f7600000, size 128 MiB
[    0.000000] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
[    0.000000] Reserved memory: created DMA memory pool at 0x000000009c800000, size 3 MiB
[    0.000000] OF: reserved mem: initialized node ipc-memories@9c800000, compatible id shared-dma-pool
[    0.000000] Reserved memory: created DMA memory pool at 0x000000009cb00000, size 1 MiB
[    0.000000] OF: reserved mem: initialized node m4f-dma-memory@9cb00000, compatible id shared-dma-pool
[    0.000000] Reserved memory: created DMA memory pool at 0x000000009cc00000, size 14 MiB
[    0.000000] OF: reserved mem: initialized node m4f-memory@9cc00000, compatible id shared-dma-pool
[    0.000000] Reserved memory: created DMA memory pool at 0x000000009da00000, size 1 MiB
[    0.000000] OF: reserved mem: initialized node r5f-dma-memory@9da00000, compatible id shared-dma-pool
[    0.000000] Reserved memory: created DMA memory pool at 0x000000009db00000, size 12 MiB
[    0.000000] OF: reserved mem: initialized node r5f-memory@9db00000, compatible id shared-dma-pool
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000080000000-0x00000000ffffffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000080000000-0x000000009c7fffff]
[    0.000000]   node   0: [mem 0x000000009c800000-0x000000009e6fffff]
[    0.000000]   node   0: [mem 0x000000009e700000-0x000000009e77ffff]
[    0.000000]   node   0: [mem 0x000000009e780000-0x000000009fffffff]
[    0.000000]   node   0: [mem 0x00000000a0000000-0x00000000ff6fffff]
[    0.000000]   node   0: [mem 0x00000000ff700000-0x00000000fffc9fff]
[    0.000000]   node   0: [mem 0x00000000fffca000-0x00000000ffffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x00000000ffffffff]
[    0.000000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv1.1 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: Trusted OS migration not required
[    0.000000] psci: SMC Calling Convention v1.4
[    0.000000] percpu: Embedded 21 pages/cpu s45952 r8192 d31872 u86016
[    0.000000] pcpu-alloc: s45952 r8192 d31872 u86016 alloc=21*4096
[    0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 
[    0.000000] Detected VIPT I-cache on CPU0
[    0.000000] CPU features: detected: GIC system register CPU interface
[    0.000000] CPU features: kernel page table isolation disabled by kernel configuration
[    0.000000] CPU features: detected: ARM erratum 845719
[    0.000000] alternatives: applying boot alternatives
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 516096
[    0.000000] Kernel command line: console=ttyS2,115200n8 earlycon=ns16550a,mmio32,0x02800000 mtdparts=spi-nand0:512k(ospi.tiboot3),2m(ospi.tispl),4m(ospi.u-boot),256k(ospi.env),256k(ospi.env.backup),98048k@32m(ospi.rootfs),256k@130816k(ospi.phypattern);omap2-nand.0:2m(NAND.tiboot3),2m(NAND.tispl),2m(NAND.tiboot3.backup),4m(NAND.u-boot),256k(NAND.u-boot-env),256k(NAND.u-boot-env.backup),-(NAND.file-system) root=PARTUUID=2bd9f996-ba91-4516-be87-edd972d88e33 rw rootfstype=ext4 rootwait systemd.show_status=no loglevel=8 syntec_extend=
[    0.000000] Unknown kernel command line parameters "syntec_extend=", will be passed to user space.
[    0.000000] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes, linear)
[    0.000000] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 1844132K/2097152K available (10752K kernel code, 1212K rwdata, 4084K rodata, 1856K init, 424K bss, 121948K reserved, 131072K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.000000] rcu: Preemptible hierarchical RCU implementation.
[    0.000000] rcu: 	RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
[    0.000000] rcu: 	RCU_SOFTIRQ processing moved to rcuc kthreads.
[    0.000000] 	No expedited grace period (rcu_normal_after_boot).
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] GICv3: GIC: Using split EOI/Deactivate mode
[    0.000000] GICv3: 256 SPIs implemented
[    0.000000] GICv3: 0 Extended SPIs implemented
[    0.000000] Root IRQ handler: gic_handle_irq
[    0.000000] GICv3: GICv3 features: 16 PPIs
[    0.000000] GICv3: CPU0: found redistributor 0 region 0:0x0000000001880000
[    0.000000] ITS [mem 0x01820000-0x0182ffff]
[    0.000000] GIC: enabling workaround for ITS: Socionext Synquacer pre-ITS
[    0.000000] ITS@0x0000000001820000: Devices Table too large, reduce ids 20->19
[    0.000000] ITS@0x0000000001820000: allocated 524288 Devices @80800000 (flat, esz 8, psz 64K, shr 0)
[    0.000000] ITS: using cache flushing for cmd queue
[    0.000000] GICv3: using LPI property table @0x0000000080020000
[    0.000000] GIC: using cache flushing for LPI property table
[    0.000000] GICv3: CPU0: using allocated LPI pending table @0x0000000080030000
[    0.000000] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[    0.000000] arch_timer: cp15 timer(s) running at 200.00MHz (phys).
[    0.000000] clocksource: arch_sys_counter: mask: 0x3ffffffffffffff max_cycles: 0x2e2049d3e8, max_idle_ns: 440795210634 ns
[    0.000000] sched_clock: 58 bits at 200MHz, resolution 5ns, wraps every 4398046511102ns
[    0.000238] Console: colour dummy device 80x25
[    0.528860] Calibrating delay loop (skipped), value calculated using timer frequency.. 400.00 BogoMIPS (lpj=200000)
[    0.528870] pid_max: default: 32768 minimum: 301
[    0.528954] LSM: Security Framework initializing
[    0.529078] Mount-cache hash table entries: 4096 (order: 3, 32768 bytes, linear)
[    0.529115] Mountpoint-cache hash table entries: 4096 (order: 3, 32768 bytes, linear)
[    0.530806] rcu: Hierarchical SRCU implementation.
[    0.530814] rcu: 	Max phase no-delay instances is 400.
[    0.530859] printk: bootconsole [ns16550a0] printing thread started
[    0.583325] Platform MSI: msi-controller@1820000 domain created
[    0.583554] PCI/MSI: /bus@f0000/interrupt-controller@1800000/msi-controller@1820000 domain created
[    0.583641] EFI services will not be available.
[    0.583895] smp: Bringing up secondary CPUs ...
[    0.584673] Detected VIPT I-cache on CPU1
[    0.584783] GICv3: CPU1: found redistributor 1 region 0:0x00000000018a0000
[    0.584800] GICv3: CPU1: using allocated LPI pending table @0x0000000080040000
[    0.584856] CPU1: Booted secondary processor 0x0000000001 [0x410fd034]
[    0.634936] Detected VIPT I-cache on CPU2
[    0.635028] GICv3: CPU2: found redistributor 2 region 0:0x00000000018c0000
[    0.635040] GICv3: CPU2: using allocated LPI pending table @0x0000000080050000
[    0.635073] CPU2: Booted secondary processor 0x0000000002 [0x410fd034]
[    0.660956] Detected VIPT I-cache on CPU3
[    0.661045] GICv3: CPU3: found redistributor 3 region 0:0x00000000018e0000
[    0.661058] GICv3: CPU3: using allocated LPI pending table @0x0000000080060000
[    0.661089] CPU3: Booted secondary processor 0x0000000003 [0x410fd034]
[    0.661152] smp: Brought up 1 node, 4 CPUs
[    0.661158] SMP: Total of 4 processors activated.
[    0.695342] CPU features: detected: 32-bit EL0 Support
[    0.695345] CPU features: detected: CRC32 instructions
[    0.695396] CPU: All CPU(s) started at EL2
[    0.695398] alternatives: applying system-wide alternatives
[    0.696873] devtmpfs: initialized
[    0.708270] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
[    0.708299] futex hash table entries: 1024 (order: 4, 65536 bytes, linear)
[    0.713493] pinctrl core: initialized pinctrl subsystem
[    0.714295] DMI not present or invalid.
[    0.714808] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[    0.728208] DMA: preallocated 256 KiB GFP_KERNEL pool for atomic allocations
[    0.728504] DMA: preallocated 256 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[    0.728694] DMA: preallocated 256 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[    0.728786] audit: initializing netlink subsys (disabled)
[    0.728942] audit: type=2000 audit(0.726:1): state=initialized audit_enabled=0 res=1
[    0.729570] thermal_sys: Registered thermal governor 'step_wise'
[    0.729574] thermal_sys: Registered thermal governor 'power_allocator'
[    0.729825] ASID allocator initialised with 65536 entries
[    0.744707] platform 30200000.dss: Fixed dependency cycle(s) with /display0
[    0.753789] HugeTLB: registered 1.00 GiB page size, pre-allocated 0 pages
[    0.753800] HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
[    0.753804] HugeTLB: registered 32.0 MiB page size, pre-allocated 0 pages
[    0.753807] HugeTLB: 0 KiB vmemmap can be freed for a 32.0 MiB page
[    0.753810] HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
[    0.753813] HugeTLB: 0 KiB vmemmap can be freed for a 2.00 MiB page
[    0.753817] HugeTLB: registered 64.0 KiB page size, pre-allocated 0 pages
[    0.753820] HugeTLB: 0 KiB vmemmap can be freed for a 64.0 KiB page
[    0.759777] k3-chipinfo 43000014.chipid: Family:AM62X rev:SR1.0 JTAGID[0x0bb7e02f] Detected
[    0.761914] iommu: Default domain type: Translated 
[    0.761924] iommu: DMA domain TLB invalidation policy: strict mode 
[    0.762282] SCSI subsystem initialized
[    0.762399] libata version 3.00 loaded.
[    0.762625] usbcore: registered new interface driver usbfs
[    0.762664] usbcore: registered new interface driver hub
[    0.762691] usbcore: registered new device driver usb
[    0.763034] pps_core: LinuxPPS API ver. 1 registered
[    0.763037] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    0.763052] PTP clock support registered
[    0.763199] EDAC MC: Ver: 3.0.0
[    0.764034] omap-mailbox 29000000.mailbox: omap mailbox rev 0x66fc9100
[    0.764404] FPGA manager framework
[    0.764504] Advanced Linux Sound Architecture Driver Initialized.
[    0.765341] vgaarb: loaded
[    0.765587] clocksource: Switched to clocksource arch_sys_counter
[    0.765818] VFS: Disk quotas dquot_6.6.0
[    0.765854] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.772355] NET: Registered PF_INET protocol family
[    0.772668] IP idents hash table entries: 32768 (order: 6, 262144 bytes, linear)
[    0.774397] tcp_listen_portaddr_hash hash table entries: 1024 (order: 3, 40960 bytes, linear)
[    0.774453] Table-perturb hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    0.774467] TCP established hash table entries: 16384 (order: 5, 131072 bytes, linear)
[    0.774701] TCP bind hash table entries: 16384 (order: 8, 1310720 bytes, linear)
[    0.776458] TCP: Hash tables configured (established 16384 bind 16384)
[    1.042393] printk: console [ttyS2] printing thread started
[    1.042412] printk: console [ttyS2] enabled
[    1.042416] printk: bootconsole [ns16550a0] disabled
[    1.054631] printk: bootconsole [ns16550a0] printing thread stopped
[    1.055873] 2810000.serial: ttyS1 at MMIO 0x2810000 (irq = 288, base_baud = 3000000) is a 8250
[    1.059914] [drm] Initialized tidss 1.0.0 20180215 for 30200000.dss on minor 0
[    1.064881] Console: switching to colour dummy device 80x25
[    1.107066] Console: switching to colour frame buffer device 171x48
[    1.123722] tidss 30200000.dss: [drm] fb0: tidssdrmfb frame buffer device
[    1.125660] davinci_mdio 8000f00.mdio: Configuring MDIO in manual mode
[    1.159619] davinci_mdio 8000f00.mdio: davinci mdio revision 9.7, bus freq 1000000
[    1.159907] mdio_bus 8000f00.mdio: MDIO device at address 0 is missing.
[    1.161029] davinci_mdio 8000f00.mdio: phy[3]: device 8000f00.mdio:03, driver unknown
[    1.161099] am65-cpsw-nuss 8000000.ethernet: initializing am65 cpsw nuss version 0x6BA01103, cpsw version 0x6BA81103 Ports: 3 quirks:00000006
[    1.161352] am65-cpsw-nuss 8000000.ethernet: Use random MAC address
[    1.161371] am65-cpsw-nuss 8000000.ethernet: initialized cpsw ale version 1.5
[    1.161376] am65-cpsw-nuss 8000000.ethernet: ALE Table size 512
[    1.162189] am65-cpsw-nuss 8000000.ethernet: CPTS ver 0x4e8a010c, freq:500000000, add_val:1 pps:0
[    1.182752] am65-cpsw-nuss 8000000.ethernet: set new flow-id-base 19
[    1.188374] mmc1: CQHCI version 5.10
[    1.188999] mmc0: CQHCI version 5.10
[    1.189033] mmc2: CQHCI version 5.10
[    1.189330] physmap-flash 50000000.fpgaBus: physmap platform flash device: [mem 0x50000000-0x50ffffff]
[    1.192406] physmap-flash 51000000.fpgaBus: physmap platform flash device: [mem 0x51000000-0x51ffffff]
[    1.194811] physmap-flash 52000000.fpgaBus: physmap platform flash device: [mem 0x52000000-0x52ffffff]
[    1.205046] debugfs: Directory 'pd:182' with parent 'pm_genpd' already present!
[    1.207751] debugfs: Directory 'pd:186' with parent 'pm_genpd' already present!
[    1.220511] ALSA device list:
[    1.220523]   No soundcards found.
[    1.226094] mmc0: SDHCI controller on fa10000.mmc [fa10000.mmc] using ADMA 64-bit
[    1.230489] mmc2: SDHCI controller on fa20000.mmc [fa20000.mmc] using ADMA 64-bit
[    1.238651] mmc1: SDHCI controller on fa00000.mmc [fa00000.mmc] using ADMA 64-bit
[    1.240737] Waiting for root device PARTUUID=2bd9f996-ba91-4516-be87-edd972d88e33...
[    1.304478] mmc0: Command Queue Engine enabled
[    1.304502] mmc0: new HS200 MMC card at address 0001
[    1.305286] mmcblk0: mmc0:0001 016GB0 14.7 GiB 
[    1.309683]  mmcblk0: p1 p2 p3 p4 p5
[    1.310841] mmcblk0boot0: mmc0:0001 016GB0 4.00 MiB 
[    1.312164] mmcblk0boot1: mmc0:0001 016GB0 4.00 MiB 
[    1.313283] mmcblk0rpmb: mmc0:0001 016GB0 4.00 MiB, chardev (241:0)
[    1.334639] EXT4-fs (mmcblk0p4): recovery complete
[    1.335050] EXT4-fs (mmcblk0p4): mounted filesystem with ordered data mode. Quota mode: none.
[    1.335211] VFS: Mounted root (ext4 filesystem) on device 179:4.
[    1.335369] devtmpfs: mounted
[    1.336835] Freeing unused kernel memory: 1856K
[    1.336962] Run /sbin/init as init process
[    1.336966]   with arguments:
[    1.336967]     /sbin/init
[    1.336969]   with environment:
[    1.336971]     HOME=/
[    1.336972]     TERM=linux
[    1.336974]     syntec_extend=
[    1.347275] mmc1: new MMC card at address 0001
[    1.348277] mmcblk1: mmc1:0001 004GA0 3.69 GiB 
[    1.350655]  mmcblk1: p1
[    1.351425] mmcblk1boot0: mmc1:0001 004GA0 2.00 MiB 
[    1.353329] mmcblk1boot1: mmc1:0001 004GA0 2.00 MiB 
[    1.354937] mmcblk1rpmb: mmc1:0001 004GA0 512 KiB, chardev (241:1)
[    1.464676] systemd[1]: System time before build time, advancing clock.
[    1.490854] NET: Registered PF_INET6 protocol family
[    1.492130] Segment Routing with IPv6
[    1.492163] In-situ OAM (IOAM) with IPv6
[    1.505969] systemd[1]: systemd 250.5+ running in system mode (+PAM -AUDIT -SELINUX -APPARMOR +IMA -SMACK +SECCOMP -GCRYPT -GNUTLS -OPENSSL +ACL +BLKID -CURL -ELFUTILS -FIDO2 -IDN2 -IDN -IPTC +KMOD -LIBCRYPTSETUP +LIBFDISK -PCRE2 -PWQUALITY -P11KIT -QRENCODE -BZIP2 -LZ4 -XZ -ZLIB +ZSTD -BPF_FRAMEWORK +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=hybrid)
[    1.506491] systemd[1]: Detected architecture arm64.
[    1.509008] systemd[1]: Hostname set to <am62xx-evm>.
[    1.639221] systemd-sysv-generator[182]: SysV service '/etc/init.d/thermal-zone-init' lacks a native systemd unit file. Automatically generating a unit file for compatibility. Please update package to include a native systemd unit file, in order to make it more safe and robust.
[    1.953335] systemd[1]: Configuration file /etc/systemd/system/syntec-rename-network.service is marked executable. Please remove executable permission bits. Proceeding anyway.
[    1.956253] systemd[1]: /etc/systemd/system/sync-clocks.service:11: Standard output type syslog is obsolete, automatically updating to journal. Please update your unit file, and consider removing the setting altogether.
[    2.019553] systemd[1]: Queued start job for default target Graphical Interface.
[    2.023037] systemd[1]: Created slice Slice /system/modprobe.
[    2.024657] systemd[1]: Created slice Slice /system/serial-getty.
[    2.025173] systemd[1]: Created slice User and Session Slice.
[    2.025530] systemd[1]: Started Dispatch Password Requests to Console Directory Watch.
[    2.025780] systemd[1]: Started Forward Password Requests to Wall Directory Watch.
[    2.026174] systemd[1]: Reached target Path Units.
[    2.026275] systemd[1]: Reached target Remote File Systems.
[    2.026341] systemd[1]: Reached target Slice Units.
[    2.026434] systemd[1]: Reached target Swaps.
[    2.031173] systemd[1]: Listening on RPCbind Server Activation Socket.
[    2.031344] systemd[1]: Reached target RPC Port Mapper.
[    2.037625] systemd[1]: Listening on Process Core Dump Socket.
[    2.038081] systemd[1]: Listening on initctl Compatibility Named Pipe.
[    2.038903] systemd[1]: Listening on Journal Audit Socket.
[    2.039489] systemd[1]: Listening on Journal Socket (/dev/log).
[    2.040135] systemd[1]: Listening on Journal Socket.
[    2.040953] systemd[1]: Listening on Network Service Netlink Socket.
[    2.042841] systemd[1]: Listening on udev Control Socket.
[    2.043356] systemd[1]: Listening on udev Kernel Socket.
[    2.043864] systemd[1]: Listening on User Database Manager Socket.
[    2.081231] systemd[1]: Mounting Huge Pages File System...
[    2.085314] systemd[1]: Mounting POSIX Message Queue File System...
[    2.089728] systemd[1]: Mounting Kernel Debug File System...
[    2.090513] systemd[1]: Kernel Trace File System was skipped because of a failed condition check (ConditionPathExists=/sys/kernel/tracing).
[    2.092257] systemd[1]: tmp.mount: Directory /tmp to mount over is not empty, mounting anyway.
[    2.095872] systemd[1]: Mounting Temporary Directory /tmp...
[    2.101514] systemd[1]: Starting Create List of Static Device Nodes...
[    2.110335] systemd[1]: Starting Load Kernel Module configfs...
[    2.115901] systemd[1]: Starting Load Kernel Module drm...
[    2.121022] systemd[1]: Starting Load Kernel Module fuse...
[    2.127966] systemd[1]: Starting RPC Bind...
[    2.128475] systemd[1]: File System Check on Root Device was skipped because of a failed condition check (ConditionPathIsReadWrite=!/).
[    2.130161] systemd[1]: systemd-journald.service: unit configures an IP firewall, but the local system does not support BPF/cgroup firewalling.
[    2.130189] systemd[1]: (This warning is only shown for the first unit using IP firewalling.)
[    2.134019] systemd[1]: Starting Journal Service...
[    2.136028] fuse: init (API version 7.37)
[    2.141463] systemd[1]: Starting Load Kernel Modules...
[    2.146331] systemd[1]: Starting Generate network units from Kernel command line...
[    2.156354] systemd[1]: Starting Remount Root and Kernel File Systems...
[    2.162104] systemd[1]: Starting Coldplug All udev Devices...
[    2.168176] systemd[1]: Starting Setup Virtual Console...
[    2.186519] cryptodev: loading out-of-tree module taints kernel.
[    2.190881] cryptodev: driver 1.12 loaded.
[    2.196773] systemd[1]: Started RPC Bind.
[    2.197725] systemd[1]: Mounted Huge Pages File System.
[    2.198449] systemd[1]: Mounted POSIX Message Queue File System.
[    2.199084] systemd[1]: Mounted Kernel Debug File System.
[    2.199664] systemd[1]: Mounted Temporary Directory /tmp.
[    2.201071] systemd[1]: Finished Create List of Static Device Nodes.
[    2.202765] systemd[1]: modprobe@configfs.service: Deactivated successfully.
[    2.203487] systemd[1]: Finished Load Kernel Module configfs.
[    2.205137] systemd[1]: modprobe@drm.service: Deactivated successfully.
[    2.205867] systemd[1]: Finished Load Kernel Module drm.
[    2.207328] systemd[1]: modprobe@fuse.service: Deactivated successfully.
[    2.208223] systemd[1]: Finished Load Kernel Module fuse.
[    2.209704] systemd[1]: Finished Load Kernel Modules.
[    2.211178] systemd[1]: Finished Generate network units from Kernel command line.
[    2.228257] systemd[1]: Mounting FUSE Control File System...
[    2.234386] EXT4-fs (mmcblk0p4): re-mounted. Quota mode: none.
[    2.244910] systemd[1]: Mounting Kernel Configuration File System...
[    2.252721] systemd[1]: Starting Apply Kernel Variables...
[    2.272991] systemd[1]: Finished Remount Root and Kernel File Systems.
[    2.276441] systemd[1]: Finished Setup Virtual Console.
[    2.277413] systemd[1]: Mounted FUSE Control File System.
[    2.278196] systemd[1]: Mounted Kernel Configuration File System.
[    2.283987] systemd[1]: Rebuild Hardware Database was skipped because of a failed condition check (ConditionNeedsUpdate=/etc).
[    2.284314] systemd[1]: Platform Persistent Storage Archival was skipped because of a failed condition check (ConditionDirectoryNotEmpty=/sys/fs/pstore).
[    2.284777] systemd[1]: Create System Users was skipped because of a failed condition check (ConditionNeedsUpdate=/etc).
[    2.299862] systemd[1]: Starting Create Static Device Nodes in /dev...
[    2.301520] systemd[1]: Started Journal Service.
[    2.338461] systemd-journald[192]: Received client request to flush runtime journal.
[    2.733641] random: crng init done
[    2.880263] mc: Linux media interface: v0.10
[    2.887760] rtc-s35390a 1-0030: registered as rtc0
[    2.888861] rtc-s35390a 1-0030: setting system clock to 2024-07-30T09:59:39 UTC (1722333579)
[    2.889635] systemd-journald[192]: Oldest entry in /run/log/journal/78255e5166e54a0298aad2c717ea6678/system.journal is older than the configured file retention duration (1month), suggesting rotation.
[    2.889666] systemd-journald[192]: /run/log/journal/78255e5166e54a0298aad2c717ea6678/system.journal: Journal header limits reached or header out-of-date, rotating.
[    2.890154] fram@0 enforce active low on chipselect handle
[    2.893275] systemd-journald[192]: Oldest entry in /run/log/journal/78255e5166e54a0298aad2c717ea6678/system.journal is older than the configured file retention duration (1month), suggesting rotation.
[    2.893298] systemd-journald[192]: /run/log/journal/78255e5166e54a0298aad2c717ea6678/system.journal: Journal header limits reached or header out-of-date, rotating.
[    2.930963] videodev: Linux video capture interface: v2.00
[    3.020325] k3-m4-rproc 5000000.m4fss: assigned reserved memory node m4f-dma-memory@9cb00000
[    3.020960] k3-m4-rproc 5000000.m4fss: configured M4 for remoteproc mode
[    3.021686] k3-m4-rproc 5000000.m4fss: local reset is deasserted for device
[    3.022864] remoteproc remoteproc0: 5000000.m4fss is available
[    3.024868] remoteproc remoteproc0: Direct firmware load for am62-mcu-m4f0_0-fw failed with error -2
[    3.024906] remoteproc remoteproc0: powering up 5000000.m4fss
[    3.024954] remoteproc remoteproc0: Direct firmware load for am62-mcu-m4f0_0-fw failed with error -2
[    3.024962] remoteproc remoteproc0: request_firmware failed: -2
[    3.197746] platform 78000000.r5f: R5F core may have been powered on by a different host, programmed state (0) != actual state (1)
[    3.204797] platform 31000000.usb: Fixed dependency cycle(s) with /bus@f0000/i2c@20000000/tps6598x@3f/connector
[    3.206785] platform 78000000.r5f: configured R5F for IPC-only mode
[    3.207063] platform 78000000.r5f: assigned reserved memory node r5f-dma-memory@9da00000
[    3.207523] remoteproc remoteproc1: 78000000.r5f is available
[    3.207704] remoteproc remoteproc1: attaching to 78000000.r5f
[    3.231294] platform 78000000.r5f: R5F core initialized in IPC-only mode
[    3.231343] rproc-virtio rproc-virtio.2.auto: assigned reserved memory node r5f-dma-memory@9da00000
[    3.232510] virtio_rpmsg_bus virtio0: rpmsg host is online
[    3.232562] rproc-virtio rproc-virtio.2.auto: registered virtio0 (type 7)
[    3.232570] remoteproc remoteproc1: remote processor 78000000.r5f is now attached
[    3.237812] virtio_rpmsg_bus virtio0: creating channel ti.ipc4.ping-pong addr 0xd
[    3.239665] virtio_rpmsg_bus virtio0: creating channel rpmsg_chrdev addr 0xe
[    3.241961] rtc-ti-k3 2b1f0000.rtc: registered as rtc1
[    3.370033] Init FPGA interrupt driver (1.0)
[    3.370877] Get kick_gpio number= (418)
[    3.977830] fm25_syntec spi2.0: 32 KByte fm25 fram, pagesize 4096
[    4.147279] remoteproc remoteproc2: 30074000.pru is available
[    4.149240] remoteproc remoteproc3: 30078000.pru is available
[    4.365457] xhci-hcd xhci-hcd.4.auto: xHCI Host Controller
[    4.365515] xhci-hcd xhci-hcd.4.auto: new USB bus registered, assigned bus number 1
[    4.366843] cdns-csi2rx: probe of 30101000.csi-bridge failed with error -22
[    4.371110] xhci-hcd xhci-hcd.4.auto: USB3 root hub has no ports
[    4.371135] xhci-hcd xhci-hcd.4.auto: hcc params 0x0258fe6d hci version 0x110 quirks 0x0000000000010010
[    4.384747] xhci-hcd xhci-hcd.4.auto: irq 468, io mem 0x31100000
[    4.385984] hub 1-0:1.0: USB hub found
[    4.386037] hub 1-0:1.0: 1 port detected
[    4.627740] usb 1-1: new high-speed USB device number 2 using xhci-hcd
[    4.801426] hub 1-1:1.0: USB hub found
[    4.807069] hub 1-1:1.0: 4 ports detected
[    4.919862] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[    4.985409] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
[    4.986545] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
[    4.986658] cfg80211: failed to load regulatory.db
[    5.095040] am65-cpsw-nuss 8000000.ethernet eth0: PHY [8000f00.mdio:03] driver [Generic PHY] (irq=POLL)
[    5.095076] am65-cpsw-nuss 8000000.ethernet eth0: configuring for phy/rmii link mode
[    5.257698] usb 1-1.3: new high-speed USB device number 3 using xhci-hcd
[    5.276841] EXT4-fs (mmcblk0p1): recovery complete
[    5.276878] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Quota mode: none.
[    5.292522] EXT4-fs (mmcblk0p2): recovery complete
[    5.292560] EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Quota mode: none.
[    5.505516] hub 1-1.3:1.0: USB hub found
[    5.507687] EXT4-fs (mmcblk0p3): recovery complete
[    5.507733] EXT4-fs (mmcblk0p3): mounted filesystem with ordered data mode. Quota mode: none.
[    5.509282] hub 1-1.3:1.0: 4 ports detected
[    5.509562] EXT4-fs (mmcblk0p5): Using encoding defined by superblock: utf8-12.1.0 with flags 0x1
[    5.572824] EXT4-fs (mmcblk0p5): warning: mounting fs with errors, running e2fsck is recommended
[    5.575077] EXT4-fs (mmcblk0p5): recovery complete
[    5.576178] EXT4-fs (mmcblk0p5): mounted filesystem with ordered data mode. Quota mode: none.
[    5.802108] am65-cpsw-nuss 8000000.ethernet lan1: renamed from eth0
[    5.828023] am65-cpsw-nuss 8000000.ethernet lan1: PHY [8000f00.mdio:03] driver [Generic PHY] (irq=POLL)
[    5.828058] am65-cpsw-nuss 8000000.ethernet lan1: configuring for phy/rmii link mode
[    5.889654] usb 1-1.3.1: new low-speed USB device number 4 using xhci-hcd
[    5.900381] EXT4-fs (mmcblk1p1): Using encoding defined by superblock: utf8-12.1.0 with flags 0x1
[    5.953213] EXT4-fs (mmcblk1p1): recovery complete
[    5.953980] EXT4-fs (mmcblk1p1): mounted filesystem with ordered data mode. Quota mode: none.
[    6.162350] usbcore: registered new interface driver usbhid
[    6.162372] usbhid: USB HID core driver
[    6.178639] input: USB OPTICAL MOUSE  as /devices/platform/bus@f0000/f910000.dwc3-usb/31100000.usb/xhci-hcd.4.auto/usb1/1-1/1-1.3/1-1.3.1/1-1.3.1:1.0/0003:30FA:0301.0001/input/input0
[    6.180337] hid-generic 0003:30FA:0301.0001: input,hidraw0: USB HID v1.11 Mouse [USB OPTICAL MOUSE ] on usb-xhci-hcd.4.auto-1.3.1/input0
[    6.313670] usb 1-1.3.3: new low-speed USB device number 5 using xhci-hcd
[    6.600563] input: Logitech USB Keyboard as /devices/platform/bus@f0000/f910000.dwc3-usb/31100000.usb/xhci-hcd.4.auto/usb1/1-1/1-1.3/1-1.3.3/1-1.3.3:1.0/0003:046D:C31C.0002/input/input1
[    6.653947] hid-generic 0003:046D:C31C.0002: input,hidraw1: USB HID v1.10 Keyboard [Logitech USB Keyboard] on usb-xhci-hcd.4.auto-1.3.3/input0
[    6.670340] input: Logitech USB Keyboard Consumer Control as /devices/platform/bus@f0000/f910000.dwc3-usb/31100000.usb/xhci-hcd.4.auto/usb1/1-1/1-1.3/1-1.3.3/1-1.3.3:1.1/0003:046D:C31C.0003/input/input2
***************************************************************
***************************************************************
NOTICE: This file system contains the following GPL-3.0 packages:
	adwaita-icon-theme-symbolic
	bash
	cifs-utils
	dosfstools
[    6.722139] input: Logitech USB Keyboard System Control as /devices/platform/bus@f0000/f910000.dwc3-usb/31100000.usb/xhci-hcd.4.auto/usb1/1-1/1-1.3/1-1.3.3/1-1.3.3:1.1/0003:046D:C31C.0003/input/input3
	grub-common
	grub-editenv
	grub-efi
	less
	lib32-libgcc1
[    6.722530] hid-generic 0003:046D:C31C.0003: input,hidraw2: USB HID v1.10 Device [Logitech USB Keyboard] on usb-xhci-hcd.4.auto-1.3.3/input1
	lib32-libstdc++6
	libdw1
	libelf1
	libgcc1
	libgdbm-compat4
	libgdbm6
	libgmp10
	libidn2-0
	libreadline8
	libstdc++6
	libunistring2
	libvte-2.91-0
	nettle
	onboard
	parted
	rxvt-unicode

If you do not wish to distribute GPL-3.0 components please remove
the above packages prior to distribution.  This can be done using
the opkg remove command.  i.e.:
    opkg remove <package>
Where <package> is the name printed in the list above

NOTE: If the package is a dependency of another package you
      will be notified of the dependent packages.  You should
      use the --force-removal-of-dependent-packages option to
      also remove the dependent packages as well
***************************************************************
***************************************************************
[    7.232256] am65-cpsw-nuss 8000000.ethernet lan1: PHY [8000f00.mdio:03] driver [Generic PHY] (irq=POLL)
[    7.232288] am65-cpsw-nuss 8000000.ethernet lan1: configuring for phy/rmii link mode


 _____                    _____           _         _   
|  _  |___ ___ ___ ___   |  _  |___ ___  |_|___ ___| |_ 
|     |  _| .'| . | . |  |   __|  _| . | | | -_|  _|  _|
|__|__|_| |__,|_  |___|  |__|  |_| |___|_| |___|___|_|  
              |___|                    |___|            

Arago Project am62xx-evm -

Arago 2023.10 am62xx-evm -

am62xx-evm login: 
[   34.281580] rcu: INFO: rcu_preempt self-detected stall on CPU
[   34.281611] rcu: 	0-....: (2 GPs behind) idle=9104/1/0x4000000000000000 softirq=0/0 fqs=4819 rcuc=21107 jiffies(starved)
[   34.281624] 	(t=21000 jiffies g=6881 q=1364 ncpus=4)
[   34.281637] CPU: 0 PID: 137 Comm: irq/289-tidss Tainted: G           O       6.1.46-rt13-BSP_12.4--g17da321871 #1
[   34.281643] Hardware name: Texas Instruments AM625 SK (DT)
[   34.281648] pstate: a0000005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   34.281654] pc : _raw_spin_unlock_irq+0x18/0x60
[   34.281669] lr : irq_finalize_oneshot.part.0+0x64/0x100
[   34.281687] sp : ffff000000eb3d90
[   34.281689] x29: ffff000000eb3d90 x28: ffff800008089000 x27: ffff000001ccfb10
[   34.281700] x26: ffff000001ccfadc x25: ffff800008089ee0 x24: ffff000001daee00
[   34.281708] x23: ffff000001ccfa00 x22: ffff000001ccfa60 x21: ffff000001ccfadc
[   34.281715] x20: ffff000001daee00 x19: ffff000001ccfa00 x18: ffff8000091ee000
[   34.281723] x17: 0000000000000000 x16: 0000000000000000 x15: 000000000000003c
[   34.281730] x14: ffffffffffffffff x13: 0000000000000000 x12: 0000000000000000
[   34.281737] x11: ffff000001ccf680 x10: ffff8000091ee000 x9 : 0000000000000000
[   34.281744] x8 : ffff800008b594e8 x7 : 000000000000002b x6 : ffffffffffffffff
[   34.281751] x5 : ffff000001ccfa60 x4 : ffff000001ccfa60 x3 : 0000000000100000
[   34.281760] x2 : ffff800009220000 x1 : ffff0000015d6c00 x0 : 0000000100000001
[   34.281769] Call trace:
[   34.281772]  _raw_spin_unlock_irq+0x18/0x60
[   34.281777]  irq_forced_thread_fn+0x84/0xb0
[   34.281782]  irq_thread+0x12c/0x1d0
[   34.281787]  kthread+0x120/0x12c
[   34.281795]  ret_from_fork+0x10/0x20
[   97.284578] rcu: INFO: rcu_preempt self-detected stall on CPU
[   97.284599] rcu: 	0-....: (2 GPs behind) idle=9104/1/0x4000000000000000 softirq=0/0 fqs=19031 rcuc=84110 jiffies(starved)
[   97.284610] 	(t=84003 jiffies g=6881 q=1434 ncpus=4)
[   97.284618] CPU: 0 PID: 137 Comm: irq/289-tidss Tainted: G           O       6.1.46-rt13-BSP_12.4--g17da321871 #1
[   97.284625] Hardware name: Texas Instruments AM625 SK (DT)
[   97.284634] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   97.284639] pc : dispc_read_and_clear_irqstatus+0x58/0x1f0
[   97.284656] lr : tidss_irq_handler+0x1c/0x110
[   97.284662] sp : ffff000000eb3d50
[   97.284664] x29: ffff000000eb3d50 x28: ffff800008089000 x27: ffff000001ccfb10
[   97.284674] x26: ffff000001ccfadc x25: ffff000000070000 x24: ffff000001daee00
[   97.284681] x23: ffff000001ccfa00 x22: ffff0000015d6c00 x21: 0000000000000001
[   97.284688] x20: ffff000001ccfa00 x19: 0000000000000000 x18: ffff8000091ee000
[   97.284696] x17: 0000000000000000 x16: 0000000000000000 x15: 000000000000003c
[   97.284706] x14: ffffffffffffffff x13: 0000000000000000 x12: 0000000000000000
[   97.284713] x11: ffff000001ccf680 x10: ffff8000091ee000 x9 : ffff8000091eead8
[   97.284720] x8 : 0000000000000000 x7 : ffff000001ccf680 x6 : ffffffffffffffff
[   97.284727] x5 : ffff00007f668808 x4 : 0000000000000000 x3 : ffff80007674e000
[   97.284734] x2 : ffff80000857fec0 x1 : ffff800009413000 x0 : 0000000000000000
[   97.284743] Call trace:
[   97.284745]  dispc_read_and_clear_irqstatus+0x58/0x1f0
[   97.284751]  tidss_irq_handler+0x1c/0x110
[   97.284756]  irq_forced_thread_fn+0x38/0xb0
[   97.284763]  irq_thread+0x12c/0x1d0
[   97.284767]  kthread+0x120/0x12c
[   97.284776]  ret_from_fork+0x10/0x20
[  160.287579] rcu: INFO: rcu_preempt self-detected stall on CPU
[  160.287609] rcu: 	0-....: (2 GPs behind) idle=9104/1/0x4000000000000000 softirq=0/0 fqs=32851 rcuc=147113 jiffies(starved)
[  160.287620] 	(t=147006 jiffies g=6881 q=1554 ncpus=4)
[  160.287632] CPU: 0 PID: 137 Comm: irq/289-tidss Tainted: G           O       6.1.46-rt13-BSP_12.4--g17da321871 #1
[  160.287639] Hardware name: Texas Instruments AM625 SK (DT)
[  160.287643] pstate: a0000005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  160.287648] pc : _raw_spin_unlock_irq+0x10/0x60
[  160.287668] lr : irq_finalize_oneshot.part.0+0x64/0x100
[  160.287676] sp : ffff000000eb3d90
[  160.287678] x29: ffff000000eb3d90 x28: ffff800008089000 x27: ffff000001ccfb10
[  160.287688] x26: ffff000001ccfadc x25: ffff800008089ee0 x24: ffff000001daee00
[  160.287695] x23: ffff000001ccfa00 x22: ffff000001ccfa60 x21: ffff000001ccfadc
[  160.287703] x20: ffff000001daee00 x19: ffff000001ccfa00 x18: ffff8000091ee000
[  160.287710] x17: 0000000000000000 x16: 0000000000000000 x15: 000000000000003c
[  160.287717] x14: ffffffffffffffff x13: 0000000000000000 x12: 0000000000000000
[  160.287724] x11: ffff000001ccf680 x10: ffff8000091ee000 x9 : 0000000000000000
[  160.287731] x8 : ffff800008b594e8 x7 : 000000000000002b x6 : ffffffffffffffff
[  160.287742] x5 : ffff000001ccfa60 x4 : ffff000001ccfa60 x3 : 0000000000100000
[  160.287749] x2 : ffff800009220000 x1 : 0000000000000000 x0 : 00000000000000e0
[  160.287758] Call trace:
[  160.287761]  _raw_spin_unlock_irq+0x10/0x60
[  160.287766]  irq_forced_thread_fn+0x84/0xb0
[  160.287771]  irq_thread+0x12c/0x1d0
[  160.287776]  kthread+0x120/0x12c
[  160.287785]  ret_from_fork+0x10/0x20
[  223.290579] rcu: INFO: rcu_preempt self-detected stall on CPU
[  223.290602] rcu: 	0-....: (2 GPs behind) idle=9104/1/0x4000000000000000 softirq=0/0 fqs=46701 rcuc=210116 jiffies(starved)
[  223.290612] 	(t=210009 jiffies g=6881 q=1580 ncpus=4)
[  223.290622] CPU: 0 PID: 137 Comm: irq/289-tidss Tainted: G           O       6.1.46-rt13-BSP_12.4--g17da321871 #1
[  223.290628] Hardware name: Texas Instruments AM625 SK (DT)
[  223.290632] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  223.290639] pc : dispc_read_and_clear_irqstatus+0x58/0x1f0
[  223.290654] lr : tidss_irq_handler+0x1c/0x110
[  223.290664] sp : ffff000000eb3d50
[  223.290666] x29: ffff000000eb3d50 x28: ffff800008089000 x27: ffff000001ccfb10
[  223.290676] x26: ffff000001ccfadc x25: ffff000000070000 x24: ffff000001daee00
[  223.290684] x23: ffff000001ccfa00 x22: ffff0000015d6c00 x21: 0000000000000001
[  223.290691] x20: ffff000001ccfa00 x19: 0000000000000000 x18: ffff8000091ee000
[  223.290698] x17: 0000000000000000 x16: 0000000000000000 x15: 000000000000003c
[  223.290705] x14: ffffffffffffffff x13: 0000000000000000 x12: 0000000000000000
[  223.290712] x11: ffff000001ccf680 x10: ffff8000091ee000 x9 : ffff8000091eead8
[  223.290720] x8 : 0000000000000000 x7 : ffff000001ccf680 x6 : ffffffffffffffff
[  223.290727] x5 : ffff00007f668808 x4 : 0000000000000000 x3 : ffff80007674e000
[  223.290734] x2 : ffff80000857fec0 x1 : ffff800009413000 x0 : 0000000000000000
[  223.290745] Call trace:
[  223.290747]  dispc_read_and_clear_irqstatus+0x58/0x1f0
[  223.290754]  tidss_irq_handler+0x1c/0x110
[  223.290760]  irq_forced_thread_fn+0x38/0xb0
[  223.290766]  irq_thread+0x12c/0x1d0
[  223.290770]  kthread+0x120/0x12c
[  223.290778]  ret_from_fork+0x10/0x20
[  286.293578] rcu: INFO: rcu_preempt self-detected stall on CPU
[  286.293595] rcu: 	0-....: (2 GPs behind) idle=9104/1/0x4000000000000000 softirq=0/0 fqs=60579 rcuc=273119 jiffies(starved)
[  286.293606] 	(t=273012 jiffies g=6881 q=1589 ncpus=4)
[  286.293615] CPU: 0 PID: 137 Comm: irq/289-tidss Tainted: G           O       6.1.46-rt13-BSP_12.4--g17da321871 #1
[  286.293622] Hardware name: Texas Instruments AM625 SK (DT)
[  286.293627] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  286.293637] pc : dispc_k3_clear_irqstatus+0x188/0x1a0
[  286.293650] lr : dispc_read_and_clear_irqstatus+0xfc/0x1f0
[  286.293656] sp : ffff000000eb3d40
[  286.293658] x29: ffff000000eb3d40 x28: ffff800008089000 x27: ffff000001ccfb10
[  286.293667] x26: ffff000001ccfadc x25: ffff000000070000 x24: ffff000001daee00
[  286.293675] x23: ffff000001ccfa00 x22: ffff0000015d6c00 x21: 0000000000000001
[  286.293683] x20: ffff000001ccfa00 x19: 0000000000000000 x18: ffff8000091ee000
[  286.293690] x17: 0000000000000000 x16: 0000000000000000 x15: 000000000000003c
[  286.293697] x14: ffffffffffffffff x13: 0000000000000000 x12: 0000000000000000
[  286.293707] x11: ffff000001ccf680 x10: ffff8000091ee000 x9 : 0000000000000000
[  286.293715] x8 : ffff800008b594e8 x7 : 000000000000002b x6 : ffffffffffffffff
[  286.293722] x5 : 0000000000000015 x4 : 00000000003fffff x3 : 0000000000000002
[  286.293729] x2 : 000000000000002c x1 : 000000000000002c x0 : 0000000000000002
[  286.293737] Call trace:
[  286.293740]  dispc_k3_clear_irqstatus+0x188/0x1a0
[  286.293747]  dispc_read_and_clear_irqstatus+0xfc/0x1f0
[  286.293753]  tidss_irq_handler+0x1c/0x110
[  286.293758]  irq_forced_thread_fn+0x38/0xb0
[  286.293764]  irq_thread+0x12c/0x1d0
[  286.293769]  kthread+0x120/0x12c
[  286.293776]  ret_from_fork+0x10/0x20
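
A rough way to summarize the repeated stall reports above is to pull out, for each report, the timestamp, the `t=` jiffies count, the task name, and the `pc` symbol. The Python sketch below does this; the function name `summarize_stalls` and the regular expressions are illustrative assumptions, not part of any TI or kernel tool, and the sample contains only a few lines copied from the first and third reports (with whitespace simplified).

```python
import re

# Patterns for the fields of interest in an RCU stall report (dmesg format).
STALL_RE = re.compile(r"\[\s*(\d+\.\d+)\] rcu: INFO: rcu_preempt self-detected stall")
FIELD_RES = {
    "jiffies": re.compile(r"\(t=(\d+) jiffies"),
    "comm": re.compile(r"Comm: (\S+)"),
    "pc": re.compile(r"\] pc : ([^+\s]+)"),
}


def summarize_stalls(log_text):
    """Return one dict per stall report: time (s), t= jiffies, task name, pc symbol."""
    reports = []
    for line in log_text.splitlines():
        start = STALL_RE.search(line)
        if start:
            reports.append({"time": float(start.group(1)),
                            "jiffies": None, "comm": None, "pc": None})
            continue
        if not reports:
            continue  # ignore everything before the first stall report
        current = reports[-1]
        for key, pattern in FIELD_RES.items():
            found = pattern.search(line)
            if found and current[key] is None:
                current[key] = found.group(1)  # keep the first match per report
    return reports


# Lines copied from the first and third reports in the log above.
SAMPLE = """\
[   34.281580] rcu: INFO: rcu_preempt self-detected stall on CPU
[   34.281624]  (t=21000 jiffies g=6881 q=1364 ncpus=4)
[   34.281637] CPU: 0 PID: 137 Comm: irq/289-tidss Tainted: G           O       6.1.46-rt13-BSP_12.4--g17da321871 #1
[   34.281654] pc : _raw_spin_unlock_irq+0x18/0x60
[  160.287579] rcu: INFO: rcu_preempt self-detected stall on CPU
[  160.287620]  (t=147006 jiffies g=6881 q=1554 ncpus=4)
[  160.287632] CPU: 0 PID: 137 Comm: irq/289-tidss Tainted: G           O       6.1.46-rt13-BSP_12.4--g17da321871 #1
[  160.287648] pc : _raw_spin_unlock_irq+0x10/0x60
"""

reports = summarize_stalls(SAMPLE)
first, last = reports[0], reports[-1]
# Jiffies elapsed per second of kernel time between the two reports.
ticks_per_second = (int(last["jiffies"]) - int(first["jiffies"])) / (last["time"] - first["time"])
for r in reports:
    print(r["time"], r["comm"], r["pc"], r["jiffies"])
print(round(ticks_per_second))
```

Between these two reports the jiffies counter advances by about 1000 per second of kernel time, so `t=21000` corresponds to roughly 21 seconds of stall. In the full log, all five reports name `irq/289-tidss`, with `pc` in `_raw_spin_unlock_irq`, `dispc_read_and_clear_irqstatus`, or `dispc_k3_clear_irqstatus`.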

Please help check if this error is caused by Tidss.

Thanks,
Allen

7 months ago

0 Bin Liu 6 months ago in reply to Jonathan Cormier

TI__Guru**** 153091 points

Hi Jonathan,

Please apply the kernel patch below and capture the ftrace output (for example, by running cat /sys/kernel/tracing/trace_pipe in a telnet session) along with the Linux console log.

kernel-61-dss-irq-dump.diff

diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c b/drivers/gpu/drm/tidss/tidss_dispc.c
index 5bcc9153a977..92ceff0a049f 100644
--- a/drivers/gpu/drm/tidss/tidss_dispc.c
+++ b/drivers/gpu/drm/tidss/tidss_dispc.c
@@ -870,6 +870,21 @@ void dispc_k2g_set_irqenable(struct dispc_device *dispc, dispc_irq_t mask)
 	dispc_k2g_read_irqenable(dispc);
 }
 
+void dispc_dump_irq_regs(struct dispc_device *dispc, int pos)
+{
+	u32 reg[5];
+	char *prefix = pos ? "<<" : "__";
+
+	reg[0] = dispc_read(dispc, DISPC_IRQSTATUS);
+	reg[1] = dispc_read(dispc, DISPC_VID_IRQSTATUS(0));
+	reg[2] = dispc_read(dispc, DISPC_VID_IRQSTATUS(1));
+	reg[3] = dispc_read(dispc, DISPC_VP_IRQSTATUS(0));
+	reg[4] = dispc_read(dispc, DISPC_VP_IRQSTATUS(1));
+
+	trace_printk("%s: irq 0x%x, vid0 0x%x, vid1 0x%x, vp0 0x%x, vp1 0x%x\n",
+			prefix, reg[0], reg[1], reg[2], reg[3], reg[4]);
+}
+
 static dispc_irq_t dispc_k3_vp_read_irqstatus(struct dispc_device *dispc,
 					      u32 vp_idx)
 {
diff --git a/drivers/gpu/drm/tidss/tidss_dispc.h b/drivers/gpu/drm/tidss/tidss_dispc.h
index dee647145d51..12e68dae8d01 100644
--- a/drivers/gpu/drm/tidss/tidss_dispc.h
+++ b/drivers/gpu/drm/tidss/tidss_dispc.h
@@ -154,5 +154,6 @@ int dispc_init(struct tidss_device *tidss);
 void dispc_remove(struct tidss_device *tidss);
 
 void dispc_splash_fini(struct dispc_device *dispc);
+void dispc_dump_irq_regs(struct dispc_device *dispc, int pos);
 
 #endif
diff --git a/drivers/gpu/drm/tidss/tidss_irq.c b/drivers/gpu/drm/tidss/tidss_irq.c
index a4f2757cb196..c2ec62652b5c 100644
--- a/drivers/gpu/drm/tidss/tidss_irq.c
+++ b/drivers/gpu/drm/tidss/tidss_irq.c
@@ -60,6 +60,8 @@ static irqreturn_t tidss_irq_handler(int irq, void *arg)
 	unsigned int id;
 	dispc_irq_t irqstatus;
 
+	dispc_dump_irq_regs(tidss->dispc, 0);
+
 	irqstatus = dispc_read_and_clear_irqstatus(tidss->dispc);
 
 	for (id = 0; id < tidss->num_crtcs; id++) {
@@ -81,6 +83,7 @@ static irqreturn_t tidss_irq_handler(int irq, void *arg)
 	if (irqstatus & DSS_IRQ_DEVICE_OCP_ERR)
 		dev_err_ratelimited(tidss->dev, "OCP error\n");
 
+	dispc_dump_irq_regs(tidss->dispc, 1);
 	return IRQ_HANDLED;
 }

0 Jonathan Cormier 6 months ago in reply to Bin Liu

Genius 3710 points

Okay testing restarted

0 Jonathan Cormier 5 months ago in reply to Jonathan Cormier

Genius 3710 points

I uploaded logs from 4 rcu_preempt failures over the weekend, with tracing enabled. Note that I don't see any difference in the dispc_dump_irq_regs output before and after the rcu_preempt stall happens, but maybe I missed something.

Connection-6252-TX-XXD-RI-23026345-2024-09-15-15-08-01.log

Connection-6252-TX-XXD-RI-23026301-2024-09-16-05-45-07.log

Connection-6252-TX-XXD-RI-23026301-2024-09-16-05-01-55.log

Connection-6252-TX-XXD-RI-23026301-2024-09-15-05-55-22.log

As a side note: with this trace code running, we saw a huge uptick in kernel panics related to the camera test. All of them point to csi2rx_get_frame_desc, even though the panic itself differed.

It would be interesting to see if these odd errors continue even if we aren't cat-ing the trace_pipe... But either way, this seems to instigate some error or instability in the camera drivers.

[50.117] TESTFIXTURE:Capturing test patterns
[50.120] 
[50.120] Setting pipeline to PAUSED ...
[50.280] Pipeline is live and does not need PREROLL ...
[50.284] Pipeline is PREROLLED ...
[50.286] Setting pipeline to PLAYING ...
[50.289] New clock: GstSystemClock
[50.291] [   31.852262] Unable to handle kernel paging request at virtual address ffff80008289ba5c
[50.299] [   31.860237] Mem abort info:
[50.302] [   31.863260]   ESR = 0x0000000096000061
[50.306] [   31.867073]   EC = 0x25: DABT (current EL), IL = 32 bits
[50.311] [   31.872424]   SET = 0, FnV = 0
[50.314] [   31.875498]   EA = 0, S1PTW = 0
[50.317] [   31.878646]   FSC = 0x21: alignment fault
[50.321] [   31.882661] Data abort info:
[50.324] [   31.885532]   ISV = 0, ISS = 0x00000061, ISS2 = 0x00000000
[50.329] [   31.891020]   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[50.334] [   31.896079]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[50.340] [   31.901391] swapper pgtable: 4k pages, 48-bit VAs, pgdp=000000008333b000
[50.346] [   31.908094] [ffff80008289ba5c] pgd=10000000f7fff003, p4d=10000000f7fff003, pud=10000000f7ffe003, pmd=10000000818ac003, pte=006800004bc9b713
[50.359] [   31.920637] Internal error: Oops: 0000000096000061 [#1] PREEMPT SMP
[50.365] [   31.926894] Modules linked in: rpmsg_ctrl rpmsg_char virtio_rpmsg_bus xhci_plat_hcd rpmsg_ns rpmsg_core dwc3 cdns_csi2rx irq_pruss_intc pru_rproc crct10dif_ce panel_simple k3_j72xx_bandgap rtc_ti_k3 rti_wdt dwc3_am62 ti_k3_r5_remoteproc j721e_csi2rx videobuf2_dma_contig videobuf2_memops imx219 videobuf2_v4l2 v4l2_cci cfg80211 v4l2_fwnode videobuf2_common v4l2_async ltc2945 rfkill tidss videodev sa2ul mcrc64 drm_dma_helper cdns_dphy_rx drm_kms_helper mc pruss at24 optee_rng rng_core overlay fuse drm drm_panel_orientation_quirks ipv6
[50.412] [   31.974022] CPU: 0 PID: 362 Comm: v4l2src0:src Not tainted 6.6.32-01383-g16404946caf8-dirty #3
[50.421] [   31.982620] Hardware name: Critical Link MitySOM-AM62x (DT)
[50.427] [   31.988177] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[50.433] [   31.995125] pc : csi2rx_get_frame_desc+0x11c/0x1b0 [cdns_csi2rx]
[50.440] [   32.001138] lr : csi2rx_get_frame_desc+0x98/0x1b0 [cdns_csi2rx]
[50.445] [   32.007049] sp : ffff8000827fb930
[50.449] [   32.010351] x29: ffff8000827fb930 x28: ffff800079d46b58 x27: ffff0000084819c0
[50.456] [   32.017475] x26: 0000000000000014 x25: ffff8000827fb984 x24: ffff000007017060
[50.463] [   32.024599] x23: 0000000000000000 x22: ffff000006e11cb8 x21: ffff000007017090
[50.470] [   32.031722] x20: 0000000000000001 x19: ffff8000827fba50 x18: ffffffffffffffff
[50.477] [   32.038845] x17: 0000000000000003 x16: ffff800080fc9bd8 x15: ffff000005de6dd4
[50.484] [   32.045968] x14: 0000000000000000 x13: 00346e6168632072 x12: 656c6c6f72746e6f
[50.491] [   32.053092] x11: 0000000000000002 x10: 000000000000003a x9 : ffff800079eee648
[50.499] [   32.060215] x8 : 0000000000003014 x7 : 0000000000000001 x6 : 0000000000000000
[50.506] [   32.067338] x5 : 0000000000008001 x4 : 001fa40000003014 x3 : 0000000000002a00
[50.513] [   32.074460] x2 : ffff80008289ba50 x1 : ffff0000071ad6c0 x0 : ffff80008289ba50
[50.520] [   32.081586] Call trace:
[50.522] [   32.084022]  csi2rx_get_frame_desc+0x11c/0x1b0 [cdns_csi2rx]
[50.528] [   32.089674]  ti_csi2rx_get_vc+0x108/0x130 [j721e_csi2rx]
[50.533] [   32.094981]  ti_csi2rx_start_streaming+0x128/0x2a8 [j721e_csi2rx]
[50.539] [   32.101064]  vb2_start_streaming+0x74/0x170 [videobuf2_common]
[50.545] [   32.106911]  vb2_core_streamon+0x120/0x1e8 [videobuf2_common]
[50.551] [   32.112658]  vb2_ioctl_streamon+0x64/0xb8 [videobuf2_v4l2]
[50.557] [   32.118154]  v4l_streamon+0x2c/0x40 [videodev]
[50.561] [   32.122684]  __video_do_ioctl+0x194/0x400 [videodev]
[50.581] [   32.127687]  video_usercopy+0x1e4/0x780 [videodev]
[50.581] [   32.132518]  video_ioctl2+0x20/0x40 [videodev]
[50.582] [   32.137001]  v4l2_ioctl+0x48/0x70 [videodev]
[50.582] [   32.141310]  __arm64_sys_ioctl+0xb0/0x100
[50.584] [   32.145317]  invoke_syscall+0x50/0x128
[50.587] [   32.149061]  el0_svc_common.constprop.0+0x48/0xf8
[50.592] [   32.153755]  do_el0_svc+0x28/0x40
[50.595] [   32.157063]  el0_svc+0x2c/0x88
[50.598] [   32.160115]  el0t_64_sync_handler+0x13c/0x158
[50.603] [   32.164462]  el0t_64_sync+0x190/0x198
[50.606] [   32.168121] Code: 8b000a60 79402883 f840c084 29009847 (f800c044) 
[50.613] [   32.174200] ---[ end trace 0000000000000000 ]---
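
The "Mem abort info" lines in the oops above can be cross-checked by decoding the ESR value by hand. The Python sketch below assumes the Arm ESR_ELx layout for data aborts (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, with WnR in bit 6 and the fault status code in bits 5:0); the function name `decode_esr` is purely illustrative.

```python
def decode_esr(esr):
    """Split an AArch64 ESR value into the fields printed in a data-abort oops."""
    iss = esr & 0x1FFFFFF            # ISS: bits 24:0
    return {
        "ec": (esr >> 26) & 0x3F,    # exception class: bits 31:26
        "il": (esr >> 25) & 0x1,     # instruction length bit (1 = 32-bit instruction)
        "iss": iss,
        "wnr": (iss >> 6) & 0x1,     # 1 = the faulting access was a write
        "fsc": iss & 0x3F,           # fault status code: bits 5:0
    }


# ESR value taken from the oops above.
fields = decode_esr(0x96000061)
print({name: hex(value) for name, value in fields.items()})
```

For `0x96000061` this gives EC = 0x25, ISS = 0x61, WnR = 1 and FSC = 0x21, matching the "DABT (current EL)", "WnR = 1" and "alignment fault" lines the kernel printed.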

0 Mukul Bhatnagar 5 months ago in reply to Jonathan Cormier

TI__Guru* 82365 points

Thank you for these logs; they are valuable. From the logs, this looks like your own board. Have you also been able to reproduce this on the TI EVM?

Bin has had a setup running since Friday but has not seen a failure, so we thought we would cross-check.

In your case, it seems that the DSS is somehow not clearing the interrupt, so an interrupt storm occurs, hogging A53 core 0 and eventually causing the RCU stall.

[31.195]           <idle>-0       [000] d.h1.    15.816868: dispc_dump_irq_regs: __: irq 0x1, vid0 0x0, vid1 0x0, vp0 0x2, vp1 0x0
[31.205]           <idle>-0       [000] d.h1.    15.816874: dispc_dump_irq_regs: <<: irq 0x0, vid0 0x0, vid1 0x0, vp0 0x0, vp1 0x0
[31.216] CPU:0 [LOST 39936 EVENTS]
[31.219]        memtester-301     [000] dNH1.    20.602278: dispc_dump_irq_regs: __: irq 0x1, vid0 0x0, vid1 0x0, vp0 0x0, vp1 0x0
[31.282]        memtester-301     [000] dNH1.    20.602282: dispc_dump_irq_regs: <<: irq 0x1, vid0 0x0, vid1 0x0, vp0 0x0, vp1 0x0
[31.282]        memtester-301     [000] dNH1.    20.602286: dispc_dump_irq_regs: __: irq 0x1, vid0 0x0, vid1 0x0, vp0 0x0, vp1 0x0
[31.283]        memtester-301     [000] dNH1.    20.602291: dispc_dump_irq_regs: <<: irq 0x1, vid0 0x0, vid1 0x0, vp0 0x0, vp1 0x0

0 Jonathan Cormier 5 months ago in reply to Mukul Bhatnagar

Genius 3710 points

We have not tried to reproduce this on the TI EVM.

Mukul Bhatnagar said:
In your case it seems that somehow the DSS is not clearing the interrupt and an interrupt storm happens, hogging A53 core 0 and eventually causing the RCU stall.

Okay, breaking this down so I understand it: I agree that the logs are consistent with an IRQ storm and getting stuck handling DSS interrupts.

[31.219]        memtester-301     [000] dNH1.    20.602278: dispc_dump_irq_regs: __: irq 0x1, vid0 0x0, vid1 0x0, vp0 0x0, vp1 0x0
[31.282]        memtester-301     [000] dNH1.    20.602282: dispc_dump_irq_regs: <<: irq 0x1, vid0 0x0, vid1 0x0, vp0 0x0, vp1 0x0

[000] -
This indicates the traces are being made by CPU0.
Thus CPU0 isn't hung

dNH1. -

d - irqs are disabled.
This makes sense since we are printing this from within an interrupt handler

N - both TIF_NEED_RESCHED and PREEMPT_NEED_RESCHED are set.
This seems to indicate that the scheduler has stuff to do that isn't getting done.

H - hard irq occurred inside a softirq
h - hard irq is running
The fact that it switches from h -> H when the problem starts seems consistent with an IRQ storm, since the hard irq would occur as soon as we switched to softirq and the interrupts were re-enabled.

1 - preempt-depth of 1

Couldn't find a description of the last ., but it's always a period so I guess it's not relevant.
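
As a rough illustration of the flag decoding above, here is a minimal C sketch that maps the third character of the flags field (the hardirq/softirq column) to a description. The helper name `irq_context_name` is made up for this example, and the 's' and '.' entries come from the ftrace documentation rather than from these traces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: describe the hardirq/softirq column (third
 * character) of an ftrace flags field such as "dNH1.".
 * Only a subset of the possible characters is handled here.
 */
static const char *irq_context_name(const char *flags)
{
	assert(flags != NULL);

	if (strlen(flags) < 3)
		return "unknown";

	switch (flags[2]) {
	case 'H':
		return "hard irq occurred inside a softirq";
	case 'h':
		return "hard irq is running";
	case 's':
		return "soft irq is running";
	case '.':
		return "normal context";
	default:
		return "unknown";
	}
}
```

With the flags seen in this thread, "d.h1." (before the problem) maps to "hard irq is running" and "dNH1." (during the storm) maps to "hard irq occurred inside a softirq".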

irq 0x1, vid0 0x0, vid1 0x0, vp0 0x0, vp1 0x0

This is showing that the VID/VP interrupts aren't the issue. Somehow the DISPC_IRQSTATUS interrupt field isn't getting cleared, as it should be 0x0 by the end of the handler...

0 Jonathan Cormier 5 months ago in reply to Jonathan Cormier

Genius 3710 points

Jonathan Cormier said:
This is showing that the VID/VP interrupts aren't the issue. Somehow the DISPC_IRQSTATUS interrupt field isn't getting cleared, as it should be 0x0 by the end of the handler...

Note that the current implementation of dispc_k3_clear_irqstatus uses the VP/VID interrupt statuses, rather than the value of DISPC_IRQSTATUS itself, to clear the DISPC_IRQSTATUS field. This means that if we ever (however unlikely) end up in a situation where an IRQ is set in DISPC_IRQSTATUS but not in either VP_IRQSTATUS or VID_IRQSTATUS, the DISPC_IRQSTATUS register will never be cleared, and we end up in this infinite loop.

static
void dispc_k3_clear_irqstatus(struct dispc_device *dispc, dispc_irq_t clearmask)
{
	unsigned int i;
	u32 top_clear = 0;

	for (i = 0; i < dispc->feat->num_vps; ++i) {
		if (clearmask & DSS_IRQ_VP_MASK(i)) {
			dispc_k3_vp_write_irqstatus(dispc, i, clearmask);
			top_clear |= BIT(i);
		}
	}
	for (i = 0; i < dispc->feat->num_planes; ++i) {
		if (clearmask & DSS_IRQ_PLANE_MASK(i)) {
			dispc_k3_vid_write_irqstatus(dispc, i, clearmask);
			top_clear |= BIT(4 + i);
		}
	}
	if (dispc->feat->subrev == DISPC_K2G)
		return;

	dispc_write(dispc, DISPC_IRQSTATUS, top_clear);

	/* Flush posted writes */
	dispc_read(dispc, DISPC_IRQSTATUS);
}

Note that the implementation of dispc_k2g_read_and_clear_irqstatus makes more sense as it is correctly using the value of DISPC_IRQSTATUS to clear DISPC_IRQSTATUS.

static
dispc_irq_t dispc_k2g_read_and_clear_irqstatus(struct dispc_device *dispc)
{
	dispc_irq_t stat = 0;

	/* always clear the top level irqstatus */
	dispc_write(dispc, DISPC_IRQSTATUS,
		    dispc_read(dispc, DISPC_IRQSTATUS));

I will test using the code from the k2g; however, this would have the side effect that we might clear interrupts that the function as written wasn't intending to clear. For example, dispc_k3_set_irqenable wants to clear only the interrupts it is currently enabling, so in that situation we'd clear too many interrupts.

dispc_write(dispc, DISPC_IRQSTATUS,
dispc_read(dispc, DISPC_IRQSTATUS));

0 Devarsh Thakkar 5 months ago in reply to Jonathan Cormier

TI__Prodigy 150 points

Jonathan Cormier Could you please try cherry picking https://github.com/torvalds/linux/commit/76c0b99d614127ceadcd3563dee4983c20627e09 and see if it fixes the camera issue reported in previous comment: csi2rx_get_frame_desc+0x11c/0x1b0 [cdns_csi2rx] ?

0 Bin Liu 5 months ago in reply to Jonathan Cormier

TI__Guru**** 153091 points

Hi Jonathan,

In addition to the CSI patch Devarsh mentioned above, please apply the following kernel patch to see if it resolves the rcu stall problem. (You don't need to dump the ftrace in this test.)

With this patch, I expect one of the following messages in kernel dmesg:

- tidss 30200000.dss: clearing irq 0x1 instead of 0x0
- tidss 30200000.dss: irq 0x1 not cleared

If the first message is printed, the rcu stall issue should be fixed. But if the second message is printed, I would expect the rcu issue still exists.

kernel-61-dss-irq-workaround.diff

diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c b/drivers/gpu/drm/tidss/tidss_dispc.c
index 5bcc9153a977..2cd18955d174 100644
--- a/drivers/gpu/drm/tidss/tidss_dispc.c
+++ b/drivers/gpu/drm/tidss/tidss_dispc.c
@@ -939,6 +954,7 @@ void dispc_k3_clear_irqstatus(struct dispc_device *dispc, dispc_irq_t clearmask)
 {
 	unsigned int i;
 	u32 top_clear = 0;
+	u32 val;
 
 	for (i = 0; i < dispc->feat->num_vps; ++i) {
 		if (clearmask & DSS_IRQ_VP_MASK(i)) {
@@ -955,10 +971,18 @@ void dispc_k3_clear_irqstatus(struct dispc_device *dispc, dispc_irq_t clearmask)
 	if (dispc->feat->subrev == DISPC_K2G)
 		return;
 
+	val = dispc_read(dispc, DISPC_IRQSTATUS);
+	if (val > top_clear) {
+		dev_warn_ratelimited(dispc->dev,
+			"clearing irq 0x%x instead of 0x%x\n", val, top_clear);
+		top_clear = val;
+	}
 	dispc_write(dispc, DISPC_IRQSTATUS, top_clear);
 
 	/* Flush posted writes */
-	dispc_read(dispc, DISPC_IRQSTATUS);
+	val = dispc_read(dispc, DISPC_IRQSTATUS);
+	if (val)
+		dev_warn_ratelimited(dispc->dev, "irq 0x%x not cleared\n", val);
 }
 
 static

0 Jonathan Cormier 5 months ago in reply to Bin Liu

Genius 3710 points

I don't understand the use of the > check on a bit field. Shouldn't we be checking for them being not equal or specifically checking for when top_clear is zero and val is not?

 val = dispc_read(dispc, DISPC_IRQSTATUS);
 if (val != top_clear) {
 if (val && !top_clear) {
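
For comparison, here is a small standalone sketch of how the candidate conditions behave on a few values. The function names are made up for this example; `uncovered_bits` is an additional, purely bitwise variant that is not part of any of the patches, and the third set of values is invented just to show where a numeric comparison and a bit test diverge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Condition from the workaround patch: numeric comparison. */
static bool cond_greater(uint32_t val, uint32_t top_clear)
{
	return val > top_clear;
}

/* Alternative: any difference between the two masks. */
static bool cond_not_equal(uint32_t val, uint32_t top_clear)
{
	return val != top_clear;
}

/* Alternative: status pending, but nothing queued to clear. */
static bool cond_pending_unqueued(uint32_t val, uint32_t top_clear)
{
	return val && !top_clear;
}

/* Hypothetical bitwise form: status bits top_clear would leave set. */
static uint32_t uncovered_bits(uint32_t val, uint32_t top_clear)
{
	return val & ~top_clear;
}

/* Worked cases; each assert documents the expected result. */
static void compare_conditions(void)
{
	/* The stall case from the logs: status 0x1, nothing queued. */
	assert(cond_greater(0x1, 0x0));
	assert(cond_not_equal(0x1, 0x0));
	assert(cond_pending_unqueued(0x1, 0x0));

	/* Status already 0 but a clear is queued: only != fires. */
	assert(!cond_greater(0x0, 0x1));
	assert(cond_not_equal(0x0, 0x1));
	assert(!cond_pending_unqueued(0x0, 0x1));

	/* Invented values: bit 4 pending, bits 0 and 5 queued. */
	assert(!cond_greater(0x10, 0x21));
	assert(uncovered_bits(0x10, 0x21) == 0x10);
}
```

All three conditions agree on the stall case; they only differ when the status is already clear or when the two masks have different bits set.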

0 Bin Liu 5 months ago in reply to Jonathan Cormier

TI__Guru**** 153091 points

Checking for (val != top_clear) is not correct. There are cases in which val is 0 but top_clear is not.

Checking for (val && !top_clear) probably is more accurate, I just didn't want to check for 2 conditions.

Please hold off on this patch. The sw dev team just shared a different solution with me; let me test it and add some debug printk, then share the patch with you.

0 Jonathan Cormier 5 months ago in reply to Bin Liu

Genius 3710 points

Bin Liu said:
Please hold off on this patch. The sw dev team just shared a different solution with me; let me test it and add some debug printk, then share the patch with you.

Gotcha. I do agree that adding some debug prints to try to catch how we ended up in this state would be good.

I've attached my proposed fix.

0001-drm-tidss-Fix-chance-of-irq-storm-with-k3_clear_irqs.diff

From 6a3e30438d518223c0613329c99240b8507df116 Mon Sep 17 00:00:00 2001
From: Jonathan Cormier <jcormier@criticallink.com>
Date: Tue, 17 Sep 2024 10:22:18 -0400
Subject: [PATCH] drm/tidss: Fix chance of irq storm with k3_clear_irqstatus

dispc_k3_clear_irqstatus was using a clearmask based on its submodule
irqstatuses to clear the main dispc_irqstatus register. This led to
the scenario where, if the submodule irqstatuses were ever clear but
the main dispc_irqstatus wasn't, the driver would never clear the irq
and an interrupt storm would occur.

Make sure we set the dispc clearmask based on the dispc_irqstatus
register.

Signed-off-by: Jonathan Cormier <jcormier@criticallink.com>
---
 drivers/gpu/drm/tidss/tidss_dispc.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c b/drivers/gpu/drm/tidss/tidss_dispc.c
index a56a6647f48f..dd34dd872fbb 100644
--- a/drivers/gpu/drm/tidss/tidss_dispc.c
+++ b/drivers/gpu/drm/tidss/tidss_dispc.c
@@ -885,27 +885,24 @@ static void dispc_k3_vid_set_irqenable(struct dispc_device *dispc,
 }
 
 static
-void dispc_k3_clear_irqstatus(struct dispc_device *dispc, dispc_irq_t clearmask)
+void dispc_k3_clear_irqstatus(struct dispc_device *dispc, dispc_irq_t clearmask, dispc_irq_t dispc_clearmask)
 {
 	unsigned int i;
-	u32 top_clear = 0;
 
 	for (i = 0; i < dispc->feat->num_vps; ++i) {
 		if (clearmask & DSS_IRQ_VP_MASK(i)) {
 			dispc_k3_vp_write_irqstatus(dispc, i, clearmask);
-			top_clear |= BIT(i);
 		}
 	}
 	for (i = 0; i < dispc->feat->num_planes; ++i) {
 		if (clearmask & DSS_IRQ_PLANE_MASK(i)) {
 			dispc_k3_vid_write_irqstatus(dispc, i, clearmask);
-			top_clear |= BIT(4 + i);
 		}
 	}
 	if (dispc->feat->subrev == DISPC_K2G)
 		return;
 
-	dispc_write(dispc, DISPC_IRQSTATUS, top_clear);
+	dispc_write(dispc, DISPC_IRQSTATUS, dispc_clearmask);
 
 	/* Flush posted writes */
 	dispc_read(dispc, DISPC_IRQSTATUS);
@@ -914,7 +911,8 @@ void dispc_k3_clear_irqstatus(struct dispc_device *dispc, dispc_irq_t clearmask)
 static
 dispc_irq_t dispc_k3_read_and_clear_irqstatus(struct dispc_device *dispc)
 {
-	dispc_irq_t status = 0;
+	dispc_irq_t status = 0;
+	dispc_irq_t dispc_irqstatus;
 	unsigned int i;
 
 	for (i = 0; i < dispc->feat->num_vps; ++i)
@@ -923,7 +921,9 @@ dispc_irq_t dispc_k3_read_and_clear_irqstatus(struct dispc_device *dispc)
 	for (i = 0; i < dispc->feat->num_planes; ++i)
 		status |= dispc_k3_vid_read_irqstatus(dispc, i);
 
-	dispc_k3_clear_irqstatus(dispc, status);
+	dispc_irqstatus = dispc_read(dispc, DISPC_IRQSTATUS);
+
+	dispc_k3_clear_irqstatus(dispc, status, dispc_irqstatus);
 
 	return status;
 }
@@ -952,7 +952,7 @@ static void dispc_k3_set_irqenable(struct dispc_device *dispc,
 	old_mask = dispc_k3_read_irqenable(dispc);
 
 	/* clear the irqstatus for newly enabled irqs */
-	dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & mask);
+	dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & mask, 0);
 
 	for (i = 0; i < dispc->feat->num_vps; ++i) {
 		dispc_k3_vp_set_irqenable(dispc, i, mask);
@@ -970,8 +970,11 @@ static void dispc_k3_set_irqenable(struct dispc_device *dispc,
 			main_disable |= BIT(i + 4);	/* VID IRQ */
 	}
 
-	if (main_enable)
+	if (main_enable) {
+		/* clear the irqstatus for newly enabled irqs */
+		dispc_k3_clear_irqstatus(dispc, 0, main_enable);
 		dispc_write(dispc, DISPC_IRQENABLE_SET, main_enable);
+	}
 
 	if (main_disable)
 		dispc_write(dispc, DISPC_IRQENABLE_CLR, main_disable);
-- 
2.25.1

0 Bin Liu 5 months ago in reply to Jonathan Cormier

TI__Guru**** 153091 points

Hi Jonathan,

Attached below is the patch from our sw dev, it is currently under discussion with the DSS driver maintainer. What the patch basically does is to clear DISPC_IRQSTATUS register with its current value at the end of dispc_k3_clear_irqstatus().

0001-drm-tidss-Clear-global-irq-status-unconditionally.patch

0 Jonathan Cormier 5 months ago in reply to Bin Liu

Genius 3710 points

Bin Liu said:
Attached below is the patch from our sw dev, it is currently under discussion with the DSS driver maintainer. What the patch basically does is to clear DISPC_IRQSTATUS register with its current value at the end of dispc_k3_clear_irqstatus().

I did test this change last night. We got about 1200 test cycles without any rcu_preempt stalls. This test was not long enough to guarantee hitting one, but we probably should have seen one.

The reason I didn't stick with this simple implementation is because of the irqenable implementation. It was possible the irqenable code could accidentally consume an interrupt that wouldn't then be handled. I don't know enough about the dss to know if this would be critical or not.

0 Bin Liu 5 months ago in reply to Jonathan Cormier

TI__Guru**** 153091 points

I guess last night's test is still running?

I don't work on DSS and don't know this module either, but I will relay your concern to our sw dev team.

0 Jonathan Cormier 5 months ago in reply to Bin Liu

Genius 3710 points

Bin Liu said:
I guess last night's test is still running?

I don't work on DSS and don't know this module either, but I will relay your concern to our sw dev team.

We had already stopped it and started a variation of your last post xD. I am curious exactly how we end up in this state.

I appreciate that, thanks for the help you've given so far.

0 Bin Liu 5 months ago in reply to Jonathan Cormier

TI__Guru**** 153091 points

Testing with my last patch is also fine; the last few patches all basically do the same thing.

Jonathan Cormier said:
I am curious exactly how we end up in this state.

Please see the inline comments in patch "0001-drm-tidss-Clear-global-irq-status-unconditionally.patch", the problem seems related to a hardware bug in which the software doesn't handle the corner case correctly.

Jonathan Cormier said:
I appreciate that, thanks for the help you've given so far.

No worries, I am glad we finally found the root cause of the rcu stall problem. Thank you for the patience.

0 Jonathan Cormier 5 months ago in reply to Devarsh Thakkar

Genius 3710 points

So far with this patch, we've not seen any of the camera induced kernel panics.

0 Bin Liu 5 months ago in reply to Jonathan Cormier

TI__Guru**** 153091 points

This is great! Thanks for the update.

Jonathan Cormier said:
The reason I didn't stick with this simple implementation is because of the irqenable implementation. It was possible the irqenable code could accidentally consume an interrupt that wouldn't then be handled. I don't know enough about the dss to know if this would be critical or not.

Here is the response from our sw dev about this concern:

Clearing an extra level 1 interrupt that was not enabled in the first place should not be a concern, since we do not rely on the level 1 interrupts at all and instead check all level 2 statuses directly.

0 Jonathan Cormier 5 months ago in reply to Bin Liu

Genius 3710 points

Bin Liu said:
Testing with my last patch is also fine; the last few patches all basically do the same thing.

At just shy of 2000 tests, no preempts/irq storms yet. Will check tests again tomorrow.

Note: I have seen 4 "clearing irq 0x1 instead of 0x0" prints between the two units. I assume that each of these would have put us in the irq storm or at least given us a chance of being stuck.

"clearing irq 0x0 instead of 0x1" happens a couple of times every boot. This does seem a little odd, as I'd assume you cannot have a VP or VID irq without the parent irqstatus also being set. But maybe that's part of the hardware bug mentioned above. Regardless, clearing an irq that isn't set should be okay, so I'm ignoring these.

0 Jonathan Cormier 5 months ago in reply to Bin Liu

Genius 3710 points

Bin Liu said:
Clearing an extra level 1 interrupt that was not enabled in the first place should not be a concern, since we do not rely on the level 1 interrupts at all and instead check all level 2 statuses directly.

Hmm, does that mean there are separate interrupt lines for the level 2 interrupts to the processor, so they don't have to go through the level 1? I had assumed this was an irq chain or vector? I don't recall the exact terminology.

0 Bin Liu 5 months ago in reply to Jonathan Cormier

TI__Guru**** 153091 points

Jonathan Cormier said:
Note: I have seen 4 "clearing irq 0x1 instead of 0x0" prints between the two units. I assume that each of these would have put us in the irq storm or at least given us a chance of being stuck.

Yes, this is the case which would trigger irq storm without the patch.

Jonathan Cormier said:
"clearing irq 0x0 instead of 0x1" happens a couple of times every boot.

I believe my version of the patch won't print this message, since I have "if (val > top_clear)", and '1' will be written to DISPC_IRQSTATUS register.

What have you changed the "if" condition in the patch to? Is '0' or '1' written to the DISPC_IRQSTATUS register in this case?

0 Jonathan Cormier 5 months ago in reply to Bin Liu

Genius 3710 points

Right now it's just doing !=. So 0 gets written, clearing no interrupts.

Bin Liu said:
Attached below is the patch from our sw dev, it is currently under discussion with the DSS driver maintainer. What the patch basically does is to clear DISPC_IRQSTATUS register with its current value at the end of dispc_k3_clear_irqstatus().

0001-drm-tidss-Clear-global-irq-status-unconditionally.patch

Note that this patch seems to accidentally get rid of the "Flush posted writes" line

0 Bin Liu 5 months ago in reply to Jonathan Cormier

TI__Guru**** 153091 points

Jonathan Cormier said:
Right now it's just doing !=. So 0 gets written, clearing no interrupts.

Okay. I guess this is fine, since the interrupt has been cleared in the level 2 register.

With !=, the patch basically does the same thing as our sw dev's version:
dispc_write(dispc, DISPC_IRQSTATUS, dispc_read(dispc, DISPC_IRQSTATUS));

Jonathan Cormier said:
Note that this patch seems to accidentally get rid of the "Flush posted writes" line

Thanks for catching it, I am not sure if this flush read() is important or not, but I will let our sw dev know.

0 Jonathan Cormier 5 months ago in reply to Bin Liu

Genius 3710 points

Bin Liu said:
dispc_write(dispc, DISPC_IRQSTATUS, dispc_read(dispc, DISPC_IRQSTATUS));

Agreed. I'm mostly running this version of the fix to get some indication of how many storms we've prevented.

0 Jonathan Cormier 5 months ago in reply to Bin Liu

Genius 3710 points

Bin Liu said:
Clearing an extra level 1 interrupt that was not enabled in the first place should not be a concern, since we do not rely on the level 1 interrupts at all and instead check all level 2 statuses directly.

Assuming the testing continues to look good tomorrow, we are planning on providing a patch to our customer. Can I get an updated copy of the patch, including commit message that TI plans to use?

0 Bin Liu 5 months ago in reply to Jonathan Cormier

TI__Guru**** 153091 points

I will ask for the final patch, but I'm not sure I will get it. As far as I know, the patch is still under review with the DSS maintainer in the kernel upstream.

Regarding missing "Flush posted writes" in the patch, it was intentional. But our sw dev told me that the DSS maintainer wants to keep it, so it should be added back in the final version.

0 Jonathan Cormier 5 months ago in reply to Bin Liu

Genius 3710 points

Is there a mailing list thread? I'd like to add my reviewed-by/tested-by on it.

0 Bin Liu 5 months ago in reply to Jonathan Cormier

TI__Guru**** 153091 points

Jonathan Cormier said:
Is there a mailing list thread? I'd like to add my reviewed-by/tested-by on it.

I will ask our sw dev for it.

To narrow down the irq storm issue, can you please help to run another test? Please revert the previous kernel patch which fixes the irq storm, but apply the following debug patch. The irq storm or rcu stall would happen with this patch, but the kernel log should provide more information.

Please note that I got the patch from our sw dev, and only did kernel compile test. Please let me know if you run into any issue with this patch.

kernel-61-dss-irq-dbg-0919.diff

diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c b/drivers/gpu/drm/tidss/tidss_dispc.c
index 5bcc9153a977..64fad287c7fd 100644
--- a/drivers/gpu/drm/tidss/tidss_dispc.c
+++ b/drivers/gpu/drm/tidss/tidss_dispc.c
@@ -38,6 +38,8 @@
 #include "tidss_dispc_regs.h"
 #include "tidss_scale_coefs.h"
 
+bool toggle  = 0;
+
 static const u16 tidss_k2g_common_regs[DISPC_COMMON_REG_TABLE_LEN] = {
 	[DSS_REVISION_OFF] =                    0x00,
 	[DSS_SYSCONFIG_OFF] =                   0x04,
@@ -884,6 +886,15 @@ static void dispc_k3_vp_write_irqstatus(struct dispc_device *dispc,
 	u32 stat = dispc_vp_irq_to_raw(vpstat, vp_idx);
 
 	dispc_write(dispc, DISPC_VP_IRQSTATUS(vp_idx), stat);
+
+	if (toggle && vp_idx == 0) {
+		pr_err("%s : Mismatch detected!, VP clear called without DISPC_IRQ clear for previous event! toggle = %d\n", __func__, toggle);
+		pr_err("%s : Clearing with vp stat =  %x\n", __func__, stat);
+		pr_err("%s : VP_IRQ_STATUS = %x, DISPC_IRQ status = %x\n", __func__,
+			dispc_read(dispc, DISPC_VP_IRQSTATUS(0)), dispc_read(dispc, DISPC_IRQSTATUS));
+	}
+
+	toggle = 1;
 }
 
 static dispc_irq_t dispc_k3_vid_read_irqstatus(struct dispc_device *dispc,
@@ -957,8 +968,23 @@ void dispc_k3_clear_irqstatus(struct dispc_device *dispc, dispc_irq_t clearmask)
 
 	dispc_write(dispc, DISPC_IRQSTATUS, top_clear);
 
+	if (toggle && top_clear == 0x0) {
+		pr_err("%s : Mismatch detected toggle = %d, top_clear = %d\n", __func__, toggle, top_clear);
+		pr_err("%s : Missed calling dispc clear after vp clear\n", __func__);
+		pr_err("%s: VP_IRQ_STATUS = %x, DISPC_IRQ status = %x\n", __func__,
+			dispc_read(dispc, DISPC_VP_IRQSTATUS(0)), dispc_read(dispc, DISPC_IRQSTATUS));
+	}
+
+	if (top_clear & 0x1)
+		toggle = 0;
+
 	/* Flush posted writes */
 	dispc_read(dispc, DISPC_IRQSTATUS);
+
+	if (dispc_read(dispc, DISPC_IRQSTATUS) && !top_clear) {
+		pr_err("%s: Mismatch detected : VP_IRQ_STATUS = %x, DISPC_IRQ status = %x toggle = %d\n", __func__,
+			dispc_read(dispc, DISPC_VP_IRQSTATUS(0)), dispc_read(dispc, DISPC_IRQSTATUS), toggle);
+	}
 }
 
 static

0 Jonathan Cormier 5 months ago in reply to Bin Liu

Genius 3710 points

I saw 4 irq storms overnight with the above debug. Each printed the following.

[24.006] [   11.970174] dispc_k3_clear_irqstatus: Mismatch detected : VP_IRQ_STATUS = 2, DISPC_IRQ status = 1 toggle = 0
[24.007] [   11.980029] dispc_k3_clear_irqstatus: Mismatch detected : VP_IRQ_STATUS = 0, DISPC_IRQ status = 1 toggle = 0
[24.007] [   11.989900] dispc_k3_clear_irqstatus: Mismatch detected : VP_IRQ_STATUS = 0, DISPC_IRQ status = 1 toggle = 0
... Repeat last line forever

We've added some additional print statements to track this down.

0 Bin Liu 5 months ago in reply to Jonathan Cormier

TI__Guru**** 153091 points

Thanks for the log.

Jonathan Cormier said:
Note: I have seen 4 "clearing irq 0x1 instead of 0x0" prints between the two units. I assume that each of these would have put us in the irq storm or at least given us a chance of being stuck.

"clearing irq 0x0 instead of 0x1" happens a couple of times every boot. This does seem a little odd, as I'd assume you cannot have a VP or VID irq without the parent irqstatus also being set. But maybe that's part of the hardware bug mentioned above. Regardless, clearing an irq that isn't set should be okay, so I'm ignoring these.

Do you still have the log showing both "clearing" messages? If so, please share.

0 Jonathan Cormier 5 months ago in reply to Bin Liu

Genius 3710 points

Bin Liu said:
Do you still have the log showing both "clearing" messages? If so, please share.

Connection-6252-TX-XXD-RI-23026301-2024-09-17-15-00-31.log

0 Jonathan Cormier 5 months ago in reply to Jonathan Cormier

Genius 3710 points

Jonathan Cormier said:

We've added some additional print statements to track this down.

Alright so some interesting developments. I added on to the Mismatch detected debug print to get some additional info.

I also thought it would be interesting to know if the irqenable function was getting called, since it can also call dispc_k3_clear_irqstatus. However, when I added a pr_err in that function, the irq storm error rate suddenly jumped from 1 in 600 tests to 7 in 10 tests. It seems that adding some amount of delay to the dispc_k3_set_irqenable() function greatly increases the chance of this bug.

Removing the print statement, or using trace_printk instead, returns the error rate to its normal low level.

Note: A debug countdown counter was added to ensure the next 10 dispc_k3_clear_irqstatus calls would print out after an irqenable. I wanted to make sure I wasn't missing calls.

In this log, right before we enter the irq storm, we see that some IRQs were disabled. These are bits 5 and 6, DSS_IRQ_VP_VSYNC_EVEN(0) | DSS_IRQ_VP_VSYNC_ODD(0). Presumably from tidss_irq_disable_vblank(). Note: From other logs, I have seen an irq storm trigger from the opposite tidss_irq_enable_vblank().

The clear_mask ends up as 1, which doesn't end up doing anything. And yet by the very next clear_irqstatus call, the VP_IRQ_STATUS register is no longer 2. I haven't yet found anything that clears that register between the two calls... So maybe, if the hardware is clearing the VP irq status itself, that's what is creating the difference between the vp status and the dispc status. Don't know, pretty strange.

[   11.970224] dispc_k3_set_irqenable : irqenabled - mask = 91, old = f0, clr = 1
[24.014] [   11.975895] dispc_k3_clear_irqstatus: Mismatch detected : VP_IRQ_STATUS = 2, DISPC_IRQ status = 1 toggle = 0, clearmask = 1, top_stat = 1, top_clear = 0
[24.014] [   11.989591] dispc_k3_clear_irqstatus: Mismatch detected : VP_IRQ_STATUS = 0, DISPC_IRQ status = 1 toggle = 0, clearmask = 0, top_stat = 1, top_clear = 0
[24.014] [   12.003395] dispc_k3_clear_irqstatus: Mismatch detected : VP_IRQ_STATUS = 0, DISPC_IRQ status = 1 toggle = 0, clearmask = 0, top_stat = 1, top_clear = 0
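
The mask arithmetic behind the first log line can be checked with a short standalone sketch. The function names are made up for this example, and it assumes the values in the debug print are hexadecimal; `(old_mask ^ mask) & mask` is the expression dispc_k3_set_irqenable passes to dispc_k3_clear_irqstatus.

```c
#include <assert.h>
#include <stdint.h>

/* Bits set in the new mask that were not set in the old one. */
static uint32_t newly_enabled(uint32_t old_mask, uint32_t mask)
{
	return (old_mask ^ mask) & mask;
}

/* Bits set in the old mask that are no longer set in the new one. */
static uint32_t newly_disabled(uint32_t old_mask, uint32_t mask)
{
	return (old_mask ^ mask) & old_mask;
}

/* Values from the log line: mask = 91, old = f0, clr = 1 (hex). */
static void check_logged_values(void)
{
	/* 0xf0 ^ 0x91 = 0x61; only bit 0 is newly enabled. */
	assert(newly_enabled(0xf0, 0x91) == 0x01);

	/* Bits 5 and 6 were set before and are now cleared. */
	assert(newly_disabled(0xf0, 0x91) == 0x60);
}
```

This matches the log: the clearmask is 1 (bit 0), and the bits being disabled are 5 and 6.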

Next, I commented out the dispc_k3_clear_irqstatus call in the irqenable function, and the irq storm triggered anyway, so it doesn't look like the clearmask of 1 is at fault, even though it's a bit out of place.

Note: Bit(0) is DSS_IRQ_DEVICE_OCP_ERR. This doesn't seem to exist on this hardware, and all the VP/VID irq functions ignore it. Not sure if it's an artifact from a different driver or an older version of the driver...

[   11.970224] dispc_k3_set_irqenable : irqenabled - mask = 91, old = f0, clr = 1
[24.042] [   11.977565] dispc_k3_clear_irqstatus: Mismatch detected : VP_IRQ_STATUS = 0, DISPC_IRQ status = 1 toggle = 0, clearmask = 0, top_stat = 0, top_clear = 0, counter = 9
[24.042] [   11.992459] dispc_k3_clear_irqstatus: Mismatch detected : VP_IRQ_STATUS = 0, DISPC_IRQ status = 1 toggle = 0, clearmask = 0, top_stat = 0, top_clear = 0, counter = 8
[24.042] [   12.007354] dispc_k3_clear_irqstatus: Mismatch detected : VP_IRQ_STATUS = 0, DISPC_IRQ status = 1 toggle = 0, clearmask = 0, top_stat = 0, top_clear = 0, counter = 7

dbg.diff

diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c b/drivers/gpu/drm/tidss/tidss_dispc.c
index 5bcc9153a977..e769cc95c590 100644
--- a/drivers/gpu/drm/tidss/tidss_dispc.c
+++ b/drivers/gpu/drm/tidss/tidss_dispc.c
@@ -38,6 +38,9 @@
 #include "tidss_dispc_regs.h"
 #include "tidss_scale_coefs.h"
 
+bool toggle  = 0;
+u32 counter = 0;
+
 static const u16 tidss_k2g_common_regs[DISPC_COMMON_REG_TABLE_LEN] = {
 	[DSS_REVISION_OFF] =                    0x00,
 	[DSS_SYSCONFIG_OFF] =                   0x04,
@@ -884,6 +887,15 @@ static void dispc_k3_vp_write_irqstatus(struct dispc_device *dispc,
 	u32 stat = dispc_vp_irq_to_raw(vpstat, vp_idx);
 
 	dispc_write(dispc, DISPC_VP_IRQSTATUS(vp_idx), stat);
+
+	if (toggle && vp_idx == 0) {
+		pr_err("%s : Mismatch detected!, VP clear called without DISPC_IRQ clear for previous event! toggle = %d\n", __func__, toggle);
+		pr_err("%s : Clearing with vp stat =  %x\n", __func__, stat);
+		pr_err("%s : VP_IRQ_STATUS = %x, DISPC_IRQ status = %x\n", __func__,
+			dispc_read(dispc, DISPC_VP_IRQSTATUS(0)), dispc_read(dispc, DISPC_IRQSTATUS));
+	}
+
+	toggle = 1;
 }
 
 static dispc_irq_t dispc_k3_vid_read_irqstatus(struct dispc_device *dispc,
@@ -940,6 +952,8 @@ void dispc_k3_clear_irqstatus(struct dispc_device *dispc, dispc_irq_t clearmask)
 	unsigned int i;
 	u32 top_clear = 0;
 
+	u32 top_stat = dispc_read(dispc, DISPC_IRQSTATUS);
+
 	for (i = 0; i < dispc->feat->num_vps; ++i) {
 		if (clearmask & DSS_IRQ_VP_MASK(i)) {
 			dispc_k3_vp_write_irqstatus(dispc, i, clearmask);
@@ -957,8 +971,25 @@ void dispc_k3_clear_irqstatus(struct dispc_device *dispc, dispc_irq_t clearmask)
 
 	dispc_write(dispc, DISPC_IRQSTATUS, top_clear);
 
+	if (toggle && top_clear == 0x0) {
+		pr_err("%s : Mismatch detected toggle = %d, top_clear = %d\n", __func__, toggle, top_clear);
+		pr_err("%s : Missed calling dispc clear after vp clear\n", __func__);
+		pr_err("%s: VP_IRQ_STATUS = %x, DISPC_IRQ status = %x\n", __func__,
+			dispc_read(dispc, DISPC_VP_IRQSTATUS(0)), dispc_read(dispc, DISPC_IRQSTATUS));
+	}
+
+	if (top_clear & 0x1)
+		toggle = 0;
+
 	/* Flush posted writes */
 	dispc_read(dispc, DISPC_IRQSTATUS);
+
+	if (dispc_read(dispc, DISPC_IRQSTATUS) || counter > 0) {
+		if (counter > 0)
+			counter -= 1;
+		pr_err("%s: Mismatch detected : VP_IRQ_STATUS = %x, DISPC_IRQ status = %x toggle = %d, clearmask = %x, top_stat = %x, top_clear = %x, counter = %u\n", __func__,
+			dispc_read(dispc, DISPC_VP_IRQSTATUS(0)), dispc_read(dispc, DISPC_IRQSTATUS), toggle, clearmask, top_stat, top_clear, counter);
+	}
 }
 
 static
@@ -999,10 +1030,14 @@ static void dispc_k3_set_irqenable(struct dispc_device *dispc,
 	u32 main_enable = 0, main_disable = 0;
 	dispc_irq_t old_mask;
 
+	counter = 10;
+
 	old_mask = dispc_k3_read_irqenable(dispc);
 
+	pr_err("%s : irqenabled - mask = %x, old = %x, clr = %x\n", __func__, mask, old_mask, (old_mask ^ mask) & mask);
+
 	/* clear the irqstatus for newly enabled irqs */
 	dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & mask);
 
 	for (i = 0; i < dispc->feat->num_vps; ++i) {
 		dispc_k3_vp_set_irqenable(dispc, i, mask);

I'm not sure where to look next. Hopefully, y'all have some insight that I'm missing.

Note: I do think the DISPC_IRQ clear change still makes sense, regardless of what we find to be the cause of the irq storm. It doesn't make sense to me to clear DISPC_IRQ bits based on assumptions about its level 2 interrupts.

0 Devarsh Thakkar 5 months ago in reply to Jonathan Cormier

TI__Prodigy 150 points

Thanks Jonathan, this information helps. Could you please revert all the patches and try with just the attached patch ? /cfs-file/__key/communityserver-discussions-components-files/791/0001_2D00_drm_2D00_tidss_2D00_Clear_2D00_the_2D00_interrupt_2D00_status_2D00_for_2D00_interrupts_2D00_.patch

0 Devarsh Thakkar 5 months ago in reply to Devarsh Thakkar

TI__Prodigy 150 points

> The clear_mask ends up as 1, which doesn't end up doing anything. And yet the very next clear_irqstatus, the VP_IRQ_STATUS register is no longer 2.

Yes, this looks pretty strange and, from my POV, is not something the driver intended. I think the attached patch should fix this behavior, although I am not 100% sure about the original author's intention for this code.

> haven't yet found anything that clears that register between the two calls... So maybe, if the hardware is clearing the VP irq status itself, thats what is creating the difference between the vp status and the dispc status. Don't know, pretty strange.

[24.014] [ 11.975895] dispc_k3_clear_irqstatus: Mismatch detected : VP_IRQ_STATUS = 2, DISPC_IRQ status = 1 toggle = 0, clearmask = 1, top_stat = 1, top_clear = 0

Yes, it is pretty strange that something clears the VP status after the above log :) We will look at the code again and also check with folks here for any clues. Thanks a lot for the observations.

0 Jonathan Cormier 5 months ago in reply to Devarsh Thakkar

Genius 3710 points

Devarsh Thakkar said:
Thanks Jonathan, this information helps. Could you please revert all the patches and try with just the attached patch ? /cfs-file/__key/communityserver-discussions-components-files/791/0001_2D00_drm_2D00_tidss_2D00_Clear_2D00_the_2D00_interrupt_2D00_status_2D00_for_2D00_interrupts_2D00_.patch

We started testing this patch this morning, so we still need some time to get good results.

I'm unclear if this patch makes sense. I would think you would want to clear any possible pending interrupts before you enable the interrupts, like it was doing before. And it could also make sense to clear them after disabling them, though not before disabling them as that would leave a small window for them to be set again.

But I guess all of this assumes the irq status bits are sticky and don't auto-clear when the irqs are disabled. If they did auto-clear, that might cause the problem we are seeing, at least if the 2nd level auto-cleared and the 1st level didn't.

0 Devarsh Thakkar 5 months ago in reply to Jonathan Cormier

TI__Prodigy 150 points

> We started testing this patch this morning, so we still need some time to get good results.

Thanks Jonathan.

> I would think you would want to clear any possible pending interrupts before you enable the interrupts, like it was doing before. And it could also make sense to clear them after disabling them, though not before disabling them as that would leave a small window for them to be set again.

Well, I thought about calling the clear function towards the end, i.e. after disabling the irq, but that may also unknowingly clear the top-level irq status for any newly enabled interrupt. Furthermore, I see this clearing as a fail-safe or backup option, since ideally the ISR should be clearing any pending interrupts anyhow.

>However when I added a pr_err in that function, suddenly irq storm error rates jumped from 1 in 600 tests to 7 in 10 tests. It seems that adding some amount of delay to the dispc_k3_set_irqenable() function, greatly increases the chance of this bug.

I looked into this and I see some possibility of a race condition between the different functions that handle clearing of the irq status registers. The attached patch should fix it, although I am not entirely sure that the issue we are facing is related to the race condition. If you could try just the attached patch, without any other changes, that would help root-cause this further.

/cfs-file/__key/communityserver-discussions-components-files/791/0001_2D00_drm_2D00_tidss_2D00_Avoid_2D00_race_2D00_condition_2D00_while_2D00_handling_2D00_interr.patch

> In this log, right before we enter the irq storm, we see that some IRQs were disabled. These are bits 5 and 6, DSS_IRQ_VP_VSYNC_EVEN(0) | DSS_IRQ_VP_VSYNC_ODD(0). Presumably from tidss_irq_disable_vblank(). Note: From other logs, I have seen an irq storm trigger from the opposite tidss_irq_enable_vblank().

That's strange. Is your system going into a suspend state for some reason, or is the display timing out after staying idle? If so, that may trigger tidss_irq_disable_vblank(). If you could put prints in dispc_runtime_resume() and dispc_runtime_suspend(), that may help confirm this.

>But I guess all of this assumes the irq status bits are sticky and don't auto-clear when the irqs are disabled. If they did auto-clear that might cause the problem we are seeing, atleast if the 2nd levels auto-cleared and the 1st level didn't.

[24.014] [ 11.975895] dispc_k3_clear_irqstatus: Mismatch detected : VP_IRQ_STATUS = 2, DISPC_IRQ status = 1 toggle = 0, clearmask = 1, top_stat = 1, top_clear = 0

This, I presume, is with respect to the above log. Per my conversation with the hardware team here, auto-clear on disable is not supported. However, if the IP is powered off, the status may get cleared, including both the VP status and the top-level IRQ status. As I understand it, you saw the VP status reset while the top-level IRQ status was still set to 0x1. If you could confirm that this happens right after a suspend/resume cycle, i.e. that the top-level irq status is still 0x1 after resume, then it is abnormal behaviour and I can follow up with the hardware team here.

Also, thanks for the help, and sorry for suggesting multiple experiments on this. BTW, is it possible to share details of your test setup, such as how you are testing, any test scripts you use, and any hacks to reproduce the issue quickly?

Regards

Devarsh

0 Jonathan Cormier 5 months ago in reply to Devarsh Thakkar

Genius 3710 points

Devarsh Thakkar said:
Thanks Jonathan, this information helps. Could you please revert all the patches and try with just the attached patch ? /cfs-file/__key/communityserver-discussions-components-files/791/0001_2D00_drm_2D00_tidss_2D00_Clear_2D00_the_2D00_interrupt_2D00_status_2D00_for_2D00_interrupts_2D00_.patch

Tests ran about 2400 times. I'd expect about 2-4 rcu_preempts but saw none.

This seems to push this issue more towards it being a software bug, not a hardware bug.

Based on this, I think what is happening is the VSYNC irqs get disabled, but the code doesn't clear the statuses. And unfortunately the level 2 statuses clear themselves, leaving the level 1 DISPC status still set. And the previous code will never clear the level 1 status if the level 2 aren't set.

I do think it still makes sense to clear the DISPC interrupts, like my patch and Bin's patch do, and to clear the level 2 interrupts in the irqenable.

I updated my original patch to include this new information. Let me know if you think it's worth testing this variant of the fix.

0001-v2-drm-tidss-Prevent-potential-IRQ-storm-in-k3_clear_irq.diff

From c0574ff68313e96dd975ec9ba8b441f79c0420de Mon Sep 17 00:00:00 2001
From: Jonathan Cormier <jcormier@criticallink.com>
Date: Tue, 17 Sep 2024 10:22:18 -0400
Subject: [PATCH] drm/tidss: Prevent potential IRQ storm in k3_clear_irqstatus

The function dispc_k3_clear_irqstatus was previously using a clear mask
derived from its submodule irqstatus values to clear the main
dispc_irqstatus register. This created a scenario where, if the
submodule irqstatus bits were cleared but the main dispc_irqstatus
wasn't, the driver would fail to clear the interrupt, leading to an
interrupt storm.

Additionally, when vp/vid irqenable values were cleared while the
corresponding IRQ status bits remained set, the vp/vid IRQ status bits
would unexpectedly clear themselves while the associated
dispc_irqstatus bits remained set. This mismatch would put us in the
interrupt storm mentioned above.

This patch ensures the main clear mask is correctly derived from the
dispc_irqstatus register itself, preventing the above issues. It also
ensures that the vp/vid irqstatus bits are cleared correctly.

Signed-off-by: Jonathan Cormier <jcormier@criticallink.com>
---
 drivers/gpu/drm/tidss/tidss_dispc.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c b/drivers/gpu/drm/tidss/tidss_dispc.c
index 5bcc9153a977..c3c86f914b7a 100644
--- a/drivers/gpu/drm/tidss/tidss_dispc.c
+++ b/drivers/gpu/drm/tidss/tidss_dispc.c
@@ -935,27 +935,24 @@ static void dispc_k3_vid_set_irqenable(struct dispc_device *dispc,
 }
 
 static
-void dispc_k3_clear_irqstatus(struct dispc_device *dispc, dispc_irq_t clearmask)
+void dispc_k3_clear_irqstatus(struct dispc_device *dispc, dispc_irq_t clearmask, dispc_irq_t dispc_clearmask)
 {
 	unsigned int i;
-	u32 top_clear = 0;
 
 	for (i = 0; i < dispc->feat->num_vps; ++i) {
 		if (clearmask & DSS_IRQ_VP_MASK(i)) {
 			dispc_k3_vp_write_irqstatus(dispc, i, clearmask);
-			top_clear |= BIT(i);
 		}
 	}
 	for (i = 0; i < dispc->feat->num_planes; ++i) {
 		if (clearmask & DSS_IRQ_PLANE_MASK(i)) {
 			dispc_k3_vid_write_irqstatus(dispc, i, clearmask);
-			top_clear |= BIT(4 + i);
 		}
 	}
 	if (dispc->feat->subrev == DISPC_K2G)
 		return;
 
-	dispc_write(dispc, DISPC_IRQSTATUS, top_clear);
+	dispc_write(dispc, DISPC_IRQSTATUS, dispc_clearmask);
 
 	/* Flush posted writes */
 	dispc_read(dispc, DISPC_IRQSTATUS);
@@ -964,7 +961,8 @@ void dispc_k3_clear_irqstatus(struct dispc_device *dispc, dispc_irq_t clearmask)
 static
 dispc_irq_t dispc_k3_read_and_clear_irqstatus(struct dispc_device *dispc)
 {
-	dispc_irq_t status = 0;
+	dispc_irq_t status = 0;
+	dispc_irq_t dispc_irqstatus;
 	unsigned int i;
 
 	for (i = 0; i < dispc->feat->num_vps; ++i)
@@ -973,7 +971,9 @@ dispc_irq_t dispc_k3_read_and_clear_irqstatus(struct dispc_device *dispc)
 	for (i = 0; i < dispc->feat->num_planes; ++i)
 		status |= dispc_k3_vid_read_irqstatus(dispc, i);
 
-	dispc_k3_clear_irqstatus(dispc, status);
+	dispc_irqstatus = dispc_read(dispc, DISPC_IRQSTATUS);
+
+	dispc_k3_clear_irqstatus(dispc, status, dispc_irqstatus);
 
 	return status;
 }
@@ -1001,8 +1001,8 @@ static void dispc_k3_set_irqenable(struct dispc_device *dispc,
 
 	old_mask = dispc_k3_read_irqenable(dispc);
 
-	/* clear the irqstatus for newly enabled irqs */
-	dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & mask);
+	/* clear the irqstatus for the changed irqs */
+	dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask), 0);
 
 	for (i = 0; i < dispc->feat->num_vps; ++i) {
 		dispc_k3_vp_set_irqenable(dispc, i, mask);
@@ -1020,8 +1020,15 @@ static void dispc_k3_set_irqenable(struct dispc_device *dispc,
 			main_disable |= BIT(i + 4);	/* VID IRQ */
 	}
 
-	if (main_enable)
+	/*
+	 * clear the vp/vid irqstatus after to handle any interrupts that were disabled
+	 * also specifically clear the main irq status bits that were cleared/set
+	 */
+	dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask), main_enable | main_disable);
+
+	if (main_enable) {
 		dispc_write(dispc, DISPC_IRQENABLE_SET, main_enable);
+	}
 
 	if (main_disable)
 		dispc_write(dispc, DISPC_IRQENABLE_CLR, main_disable);
-- 
2.25.1

0 Devarsh Thakkar 5 months ago in reply to Jonathan Cormier

TI__Prodigy 150 points

> Tests ran about 2400 times. I'd expect about 2-4 rcu_preempts but saw none.

Thanks Jonathan this is good information.

>Based on this, I think what is happening is the VSYNC irqs get disabled, but the code doesn't clear the statuses. And unfortunately the level 2 statuses clear themselves, leaving the level 1 DISPC status still set. And the previous code will never clear the level 1 status if the level 2 aren't set.

Well, I had discussed this with the IP team previously, and they said this scenario, where the HW clears the VP status automatically, is not possible. I am planning to run some experiments to see if this is actually happening, in which case it is an IP/HW bug.

> I updated my original patch to include this new information. Let me know if you think its worth testing this variant of the fix.

This looks similar to /cfs-file/__key/communityserver-discussions-components-files/791/0001_2D00_drm_2D00_tidss_2D00_Clear_2D00_global_2D00_irq_2D00_status_2D00_unconditionally.patch and the version Bin made, so I am pretty sure it would work :) since the previous test worked. However, when I last checked with the maintainer about these approaches, I was asked to clarify the exact HW issue that lands us in this state, and the HW team has denied seeing any such issue in their testing. So for now my focus is to get some proof and steps to reproduce that clearly show any HW abnormality, to rule out any SW bugs, and then to decide which version of the patch is the best approach. If we decide to go with this and the maintainer is okay with it, then irrespective of which version gets chosen we will add developed-by/signed-off for you and Bin.

Also, could you please run a test with all changes reverted and only the avoid-race-condition patch shared in my previous post applied: /cfs-file/__key/communityserver-discussions-components-files/791/0001_2D00_drm_2D00_tidss_2D00_Avoid_2D00_race_2D00_condition_2D00_while_2D00_handling_2D00_interr.patch ?

I just want to rule out any race conditions causing this behaviour :).

Regards

Devarsh

0 Jonathan Cormier 5 months ago in reply to Devarsh Thakkar

Genius 3710 points

Devarsh Thakkar said:
Also could you please help to put a test with all changes reverted and just applying the avoid race condition patch shared in previous patch : /cfs-file/__key/communityserver-discussions-components-files/791/0001_2D00_drm_2D00_tidss_2D00_Avoid_2D00_race_2D00_condition_2D00_while_2D00_handling_2D00_interr.patch ?

Can you walk me through this patch? The k2g functions don't run on the k3 hardware. The hw_init function gets called before the irq handlers are set up. I don't see where the race conditions are, at least on the am62x.

0 Devarsh Thakkar 5 months ago in reply to Jonathan Cormier

TI__Prodigy 150 points

There is a possibility of a race between the irq handler (tidss_irq_handler) and other functions that program, clear, or handle the interrupt status registers, e.g. tidss_irq_disable_vblank. This patch also adds locking to the irq handler to avoid this scenario.

0 Jonathan Cormier 5 months ago in reply to Devarsh Thakkar

Genius 3710 points

Ah okay, I didn't notice those locks were already in place.

0 Jonathan Cormier 5 months ago in reply to Devarsh Thakkar

Genius 3710 points

Devarsh Thakkar said:
Thanks Jonathan, this information helps. Could you please revert all the patches and try with just the attached patch ? /cfs-file/__key/communityserver-discussions-components-files/791/0001_2D00_drm_2D00_tidss_2D00_Clear_2D00_the_2D00_interrupt_2D00_status_2D00_for_2D00_interrupts_2D00_.patch

Okay, I stopped testing this patch. Two units ran about 5k tests each without an rcu_preempt.

Starting the spin_lock patch testing.

0 Jonathan Cormier 5 months ago in reply to Jonathan Cormier

Genius 3710 points

Jonathan Cormier said:
Also could you please help to put a test with all changes reverted and just applying the avoid race condition patch shared in previous patch : /cfs-file/__key/communityserver-discussions-components-files/791/0001_2D00_drm_2D00_tidss_2D00_Avoid_2D00_race_2D00_condition_2D00_while_2D00_handling_2D00_interr.patch ?

Both test units have had several rcu_preempts with only this patch.

0 Devarsh Thakkar 5 months ago in reply to Jonathan Cormier

TI__Prodigy 150 points

Thanks Jonathan, this helps. I will probably try some experiments and talk with the IP team about any unexpected behaviour from the IP before concluding. Anyway, I hope you are unblocked with the set of patches we have on this thread.

0 Bob Duke 5 months ago in reply to Devarsh Thakkar

Intellectual 271 points

Devarsh,

We have shared a version of the patches discussed here with our customer and initial testing has shown an elimination of the rcu_preempts during operation. Thanks for your help.

-Bob

0 Jonathan Cormier 5 months ago in reply to Devarsh Thakkar

Genius 3710 points

Devarsh Thakkar said:
But anyways I hope you are unblocked on this thread with the set of patches we have on this thread

Thanks Devarsh. Our customer has been testing with Bin's clear global irq patch, and it looks good. So yes, we are unblocked.

However, I am still waiting to see what we think the final official patch should look like. Let me know if we can help.

0 Jonathan Cormier 5 months ago in reply to Devarsh Thakkar

Genius 3710 points

Devarsh Thakkar said:
Thanks Jonathan, this information helps. Could you please revert all the patches and try with just the attached patch ? /cfs-file/__key/communityserver-discussions-components-files/791/0001_2D00_drm_2D00_tidss_2D00_Clear_2D00_the_2D00_interrupt_2D00_status_2D00_for_2D00_interrupts_2D00_.patch

We have been testing this patch by itself, and while it reduces the failure rate, we still see rcu preempts. Our latest testing showed a failure in ~1/5000 tests.

0 Devarsh Thakkar 5 months ago in reply to Jonathan Cormier

TI__Prodigy 150 points

Thanks Jonathan, that's an interesting observation. I would be curious to know the results with both /cfs-file/__key/communityserver-discussions-components-files/791/0001_2D00_drm_2D00_tidss_2D00_Avoid_2D00_race_2D00_condition_2D00_while_2D00_handling_2D00_interr.patch and https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/0001_2D00_drm_2D00_tidss_2D00_Clear_2D00_the_2D00_interrupt_2D00_status_2D00_for_2D00_interrupts_2D00_.patch applied together.

If the RCU stall is still seen with the above two patches applied, then I guess some HW IP bug could be landing us in a situation where the level 1 and level 2 irq status get mismatched, and we would then need /cfs-file/__key/communityserver-discussions-components-files/791/0001_2D00_drm_2D00_tidss_2D00_Clear_2D00_global_2D00_irq_2D00_status_2D00_unconditionally.patch to force-clear the global irq unconditionally.

0 Jonathan Cormier 5 months ago in reply to Devarsh Thakkar

Genius 3710 points

Devarsh Thakkar said:
Thanks Jonathan, that's an interesting observation, I would be curious to know the observations with both

Sure we will start this testing.

Devarsh Thakkar said:
If RCU stall is still seen with above two patches applied, then I guess it could be that some H/W IP bug is landing into a situation where level 1 and level 2 irq are getting mismatched and we then mandatorily need

We are currently planning on using both the clear irq status patch and the clear irq unconditionally patch, as that combination is super solid. Granted, just the clear irq unconditionally patch is enough to remove the chance of the infinite irq storm.

0 Devarsh Thakkar 5 months ago in reply to Jonathan Cormier

TI__Prodigy 150 points

Hi Jonathan,

Just wanted to check if tests ran well or not.

Regards

Devarsh
