Other Parts Discussed in Thread: TPS65219
Dear experts,
The current SDK version is Processor SDK 10.00.07.04. The issue occurs at normal room temperature (25 degrees Celsius).
Kernel stack dump:
[15166.525026] Unable to handle kernel paging request at virtual address ffff0080010fd828
[15166.533093] Mem abort info:
[15166.535958] ESR = 0x0000000096000004
[15166.539773] EC = 0x25: DABT (current EL), IL = 32 bits
[15166.539787] SET = 0, FnV = 0
[15166.539792] EA = 0, S1PTW = 0
[15166.539796] FSC = 0x04: level 0 translation fault
[15166.539801] Data abort info:
[15166.539804] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[15166.539810] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[15166.539815] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[15166.539821] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000082c43000
[15166.539836] [ffff0080010fd828] pgd=0000000000000000, p4d=0000000000000000
[15166.539852] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[15166.539860] Modules linked in: iptable_filter iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_tables x_tables rpmsg_ctrl rpmsg_char wlan_mt7961_sdio mtprealloc rti_wdt ti_k3_r5_remoteproc realtek rtc_ti_k3 ti_k3_m4_remoteproc sa2ul authenc mcrc64 tps65219_pwrbutton tps6598x typec cfg80211 cryptodev(O) fuse ipv6
[15166.594699] CPU: 0 PID: 1133 Comm: rx_thread Tainted: G O 6.6.32-ti #1
[15166.594716] Hardware name: Texas Instruments AM62x LP SK (DT)
[15166.594722] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[15166.594732] pc : am65_cpsw_nuss_rx_poll+0x164/0x994
[15166.594763] lr : am65_cpsw_nuss_rx_poll+0x160/0x994
[15166.594774] sp : ffff800080003da0
[15166.594778] x29: ffff800080003e30 x28: ffff00000b40ee00 x27: ffff0080010fd800
[15166.659437] x26: ffff00000042e4c0 x25: 0000000000000040 x24: ffff0000007f7000
[15166.666575] x23: ffff000001186080 x22: 0000000000000000 x21: 0000000000000027
[15166.680824] x20: ffff0000011875c8 x19: ffff0000006b7810 x18: 0000000000000036
[15166.680839] x17: ffff7fff9b1af000 x16: ffff800080000000 x15: 472fdd4e168924bf
[15166.687971] x14: c1cb1a37c6bd6673 x13: 0000497b00021850 x12: 0000000000000006
[15166.687984] x11: 000000000000d780 x10: ffff0000034207d8 x9 : ffff0000091ff10e
[15166.709348] x8 : ffff800080003e28 x7 : 007a0080810fd800 x6 : 007a0080810fd800
[15166.716483] x5 : ffff800080003de8 x4 : 00000000000001f4 x3 : 000000000000006a
[15166.723616] x2 : 00000000817a0000 x1 : 0000007fff95d800 x0 : ffff0080010fd800
[15166.730755] Call trace:
[15166.730762] am65_cpsw_nuss_rx_poll+0x164/0x994
[15166.730778] __napi_poll+0x38/0x178
[15166.741211] net_rx_action+0x128/0x270
[15166.744958] __do_softirq+0x100/0x26c
[15166.748618] ____do_softirq+0x10/0x1c
[15166.752278] call_on_irq_stack+0x24/0x4c
[15166.756198] do_softirq_own_stack+0x1c/0x2c
[15166.760378] do_softirq+0x54/0x6c
[15166.763692] __local_bh_enable_ip+0x8c/0x98
[15166.767873] netif_rx+0x6c/0x80
[15166.767888] kalRxIndicateOnePkt+0x120/0x3cc [wlan_mt7961_sdio]
[15166.776960] rx_thread+0x138/0x2c8 [wlan_mt7961_sdio]
[15166.782320] kthread+0x110/0x114
[15166.785555] ret_from_fork+0x10/0x20
[15166.789139] Code: d65f03c0 f9400a80 97ffec46 aa0003fb (f9401405)
[15166.795228] ---[ end trace 0000000000000000 ]---
[15166.827532] pstore: backend (ramoops) writing error (-28)
[15166.832962] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[15166.839832] SMP: stopping secondary CPUs
[15166.843760] Kernel Offset: disabled
[15166.847242] CPU features: 0x0,00000008,00020000,1000420b
[15166.852549] Memory Limit: none
[15166.883272] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
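a. The faulting instruction identified from the crash log is shown below (disassembly from ti-am65-cpsw-nuss.S and k3-cppi-desc-pool.S). The fault address ffff0080010fd828 equals x0 (ffff0080010fd800) plus the load offset #40.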
k3-cppi-desc-pool.S:
000000000000001c <k3_cppi_desc_pool_dma2virt>:
  1c: b40000a1 cbz x1, 30 <k3_cppi_desc_pool_dma2virt+0x14>
  20: a9408002 ldp x2, x0, [x0, #8]
  24: cb020021 sub x1, x1, x2
  28: 8b010000 add x0, x0, x1
  2c: d65f03c0 ret
ti-am65-cpsw-nuss.S:
ffff8000805dde64: 97ffec46 bl ffff8000805d8f7c <k3_cppi_desc_pool_dma2virt>
ffff8000805dde68: aa0003fb mov x27, x0
ffff8000805dde6c: f9401405 ldr x5, [x0, #40]
b. The faulting load corresponds to the following C code (line 2914 in include/linux/dma/ti-cppi5.h), where desc->org_buf_ptr is read:
488 static inline void cppi5_hdesc_get_obuf(struct cppi5_host_desc_t *desc,
489 dma_addr_t *obuf, u32 *obuf_len)
490 {
491 *obuf = desc->org_buf_ptr;
492 *obuf_len = desc->org_buf_len & CPPI5_OBUFINFO0_HDESC_BUF_LEN_MASK;
493 }
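To double-check that `ldr x5, [x0, #40]` really is the read of desc->org_buf_ptr, the field offset can be recomputed in userspace. This is only a sketch: the two `*_mirror` structs below are my own copy of the host descriptor layout as I understand it from ti-cppi5.h (trailing `epib[]` array omitted), and `org_buf_ptr_addr` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed mirror of struct cppi5_desc_hdr_t (four 32-bit words). */
struct cppi5_desc_hdr_mirror {
	uint32_t pkt_info0;
	uint32_t pkt_info1;
	uint32_t pkt_info2;
	uint32_t src_dst_tag;
} __attribute__((packed));

/* Assumed mirror of struct cppi5_host_desc_t, without the epib[] tail. */
struct cppi5_host_desc_mirror {
	struct cppi5_desc_hdr_mirror hdr;
	uint64_t next_desc;
	uint64_t buf_ptr;
	uint32_t buf_info1;
	uint32_t org_buf_len;
	uint64_t org_buf_ptr;
} __attribute__((packed));

/* Hypothetical helper: address touched when reading desc->org_buf_ptr. */
static uint64_t org_buf_ptr_addr(uint64_t desc_virt)
{
	return desc_virt + offsetof(struct cppi5_host_desc_mirror, org_buf_ptr);
}
```

With desc_virt set to the x0 value from the register dump (0xffff0080010fd800), the helper returns the faulting virtual address reported in the first line of the oops, provided the assumed layout matches the header.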
c. Call stack:
static int am65_cpsw_nuss_rx_poll(struct napi_struct *napi_rx, int budget)
--> static int am65_cpsw_nuss_rx_packets(struct am65_cpsw_common *common,
-------> ret = k3_udma_glue_pop_rx_chn(rx_chn->rx_chn, flow_idx, &desc_dma);
-------> desc_rx = k3_cppi_desc_pool_dma2virt(rx_chn->desc_pool, desc_dma);
---------> void *k3_cppi_desc_pool_dma2virt(struct k3_cppi_desc_pool *pool, dma_addr_t dma)
return dma ? pool->cpumem + (dma - pool->dma_addr) : NULL;
dma = 0x80810FD800, pool->dma_addr = 0x00000000817a0000
X1 = X1 - X2 (dma - pool->dma_addr)
0x0000007fff95d800 = 0x80810FD800 - 0x00000000817a0000
X0 = X0 + X1 (pool->cpumem + offset)
0xffff0080010fd800 = 0xFFFF0000017A0000 + 0x0000007fff95d800
-------> cppi5_hdesc_get_obuf(desc_rx, &buf_dma, &buf_dma_len);
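From the above analysis, the descriptor DMA address returned by k3_udma_glue_pop_rx_chn() (0x80810FD800) is far above pool->dma_addr (0x817a0000). The virtual address computed from pool->cpumem therefore falls outside the descriptor pool, and the read of desc->org_buf_ptr triggers the translation fault.
d. More logs are in the attachments.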
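The arithmetic above can be replayed in userspace with a small model. This is a sketch only: `struct pool_model`, `model_dma2virt` and `model_dma_in_pool` are hypothetical names, the `size` field is an assumption (the real pool size is not visible in the log), and the range check is a debugging aid I would try, not something the driver is known to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of the fields used by dma2virt. */
struct pool_model {
	uint64_t dma_addr;	/* pool->dma_addr, x2 in the register dump */
	uint64_t cpumem;	/* pool->cpumem, derived from x0 - x1 */
	uint64_t size;		/* pool size in bytes (assumed value) */
};

/* Same expression as k3_cppi_desc_pool_dma2virt(), on plain integers. */
static uint64_t model_dma2virt(const struct pool_model *p, uint64_t dma)
{
	return dma ? p->cpumem + (dma - p->dma_addr) : 0;
}

/* True only if dma lies inside [dma_addr, dma_addr + size). */
static bool model_dma_in_pool(const struct pool_model *p, uint64_t dma)
{
	return dma >= p->dma_addr && dma - p->dma_addr < p->size;
}
```

Using dma_addr = 0x817a0000 and cpumem = 0xffff0000017a0000 from the dump, model_dma2virt() reproduces x0 = 0xffff0080010fd800 for dma = 0x80810FD800, and the range check rejects that address for any realistic pool size. Note that the low 32 bits alone (0x810FD800) are also below pool->dma_addr, so they would be rejected as well.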