AM62A7: Internal error: Oops

Joshua Bourgeot

Part Number: AM62A7
Other Parts Discussed in Thread: TFP410

While running reboot tests with the new 6.6 kernel (SDK 10) on the AM62A7, we have noticed an "Internal error: Oops" related to PWM in about 1 in 4500 boots. See the log snippet:

[    6.897404] Mem abort info:
[    6.900210]   ESR = 0x0000000086000006
[    6.907004]   EC = 0x21: IABT (current EL), IL = 32 bits
[    6.912723]   SET = 0, FnV = 0
[    6.912745]   EA = 0, S1PTW = 0
[    6.912748]   FSC = 0x06: level 2 translation fault
[    6.912755] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000849f7000
[    6.912761] [0000000000000000] pgd=08000000848f9003, p4d=08000000848f9003, pud=08000000848cb003, pmd=0000000000000000
[    6.912783] Internal error: Oops: 0000000086000006 [#1] PREEMPT SMP
[    6.912791] Modules linked in: display_connector pwm_omap_dmtimer v4l2_jpeg dwc3_am62 k3_j72xx_bandgap rtc_ti_k3 rti_wdt j721e_csi2rx ti_k3_r5_remoteproc wave5 ti_k3_dsp_remoteproc videobuf2_dma_contig v4l2_mem2mem videobuf2_memops virtio_rpmsg_bus rpmsg_ns videobuf2_v4l2 rpmsg_core v4l2_async snd_soc_davinci_mcasp videobuf2_common snd_soc_ti_udma snd_soc_ti_edma tidss snd_soc_ti_sdma videodev ti_tfp410 drm_dma_helper sa2ul cdns_dphy_rx mcrc64 mc drm_kms_helper ltc2945 at24 spi_omap2_mcspi pwm_tiehrpwm optee_rng rng_core fuse drm drm_panel_orientation_quirks ipv6
[    6.912905] CPU: 3 PID: 48 Comm: kworker/u8:3 Not tainted 6.6.32-01382-g0718452195f9-dirty #23
[    6.912912] Hardware name: Critical Link MitySOM-AM62A (DT)
[    6.912919] Workqueue: events_unbound deferred_probe_work_func
[    6.912947] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    6.912955] pc : 0x0
[    6.912965] lr : of_pwm_get+0x158/0x238
[    6.912975] sp : ffff800081e6ba00
[    6.912978] x29: ffff800081e6ba00 x28: 0000000000000000 x27: ffff0000078e1c80
[    6.912988] x26: ffff00007fccd4c0 x25: ffff80008196ef30 x24: ffff000001123c10
[    6.912997] x23: 0000000000000000 x22: ffff00007fccd4a8 x21: ffff00007fccd948
[    6.913005] x20: ffff80008196ef50 x19: ffff0000063d8d80 x18: ffffffffffffffff
[    6.913014] x17: ffff7ffffc77b000 x16: ffff800081af0000 x15: ffff00000602a3da
[    6.913023] x14: ffffffffffffffff x13: 0000000000000038 x12: 0101010101010101
[    6.913032] x11: 7f7f7f7fffffffff x10: ffff840083413df4 x9 : ffff8000806e2688
[    6.913040] x8 : 0101010101010101 x7 : 000000000000736c x6 : 00000000000080ef
[    6.913049] x5 : fffffbfffde0a580 x4 : 0000000000000000 x3 : 0000000000000000
[    6.913057] x2 : 0000000000000000 x1 : ffff800081e6ba68 x0 : ffff0000063d8d80
[    6.913068] Call trace:
[    6.913071]  0x0
[    6.913080]  devm_fwnode_pwm_get+0x5c/0xb0
[    6.913087]  led_pwm_probe+0x174/0x420
[    6.913096]  platform_probe+0x70/0xf0
[    6.913104]  really_probe+0x150/0x2c0
[    6.913111]  __driver_probe_device+0x80/0x140
[    6.913119]  driver_probe_device+0xe0/0x170
[    6.913126]  __device_attach_driver+0xc0/0x148
[    6.913134]  bus_for_each_drv+0x88/0xf0
[    6.913143]  __device_attach+0xb0/0x1c0
[    6.913151]  device_initial_probe+0x1c/0x30
[    6.913159]  bus_probe_device+0xb4/0xc0
[    6.913166]  deferred_probe_work_func+0x90/0xd0
[    6.913173]  process_one_work+0x148/0x388
[    6.913184]  worker_thread+0x338/0x450
[    6.913190]  kthread+0x120/0x130
[    6.913197]  ret_from_fork+0x10/0x20
[    6.913225] Code: ???????? ???????? ???????? ???????? (????????) 
[    6.913235] ---[ end trace 0000000000000000 ]---
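As a reading aid for the first lines of the log: the ESR value 0x86000006 packs the EC, IL and FSC fields that the kernel prints on separate lines. Below is a minimal sketch of that decoding, assuming the standard AArch64 ESR_ELx layout (EC in bits 31:26, IL in bit 25, fault status code in the low six bits for aborts). The names `esr_fields` and `decode_esr` are made up for illustration and are not kernel APIs.

```c
#include <stdint.h>

/* Illustrative container for the ESR_ELx fields shown in the log. */
struct esr_fields {
    unsigned ec;   /* exception class, bits [31:26] */
    unsigned il;   /* instruction length bit, bit 25 (1 = 32-bit instruction) */
    unsigned fsc;  /* fault status code, bits [5:0] (valid for aborts) */
};

/* Split a raw ESR value into the fields above (assumed layout, see lead-in). */
struct esr_fields decode_esr(uint64_t esr)
{
    struct esr_fields f;

    f.ec  = (unsigned)((esr >> 26) & 0x3f);
    f.il  = (unsigned)((esr >> 25) & 0x1);
    f.fsc = (unsigned)(esr & 0x3f);
    return f;
}
```

Calling `decode_esr(0x86000006)` gives EC 0x21 (instruction abort at the current EL), IL 1 and FSC 0x06 (level 2 translation fault), which matches the log. Together with `pc : 0x0`, this points to a branch to address zero rather than a bad data access.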

We are unsure what this could be related to. Have there been any large changes to the PWM subsystem that could be related?

Thank you,

Joshua Bourgeot

over 1 year ago

0 Aparna Patra over 1 year ago

TI__Genius 12980 points

Hi,

Is this issue still seen?

Regards,
Aparna

0 Joshua Bourgeot over 1 year ago in reply to Aparna Patra

Prodigy 240 points

Hello Aparna,

Yes, this issue is still present. We have made some progress debugging, though. The line causing the error is in of_pwm_get in drivers/pwm/core.c, specifically the of_xlate function call:

pwm = chip->of_xlate(chip, &args);

It appears that, although rare (again, we saw this problem about once every several thousand boot cycles), the of_xlate function pointer can be NULL.

Our current problem is figuring out why that is and how we can address it.
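To illustrate the failure mode, here is a small userspace sketch of a guarded call through such a pointer. It is only a model under stated assumptions: `pwm_chip_model`, `pwm_args_model`, `model_xlate` and `guarded_xlate` are hypothetical stand-ins, not the kernel's types or functions, and `-EAGAIN` stands in for a "try again later" result. A guard like this would only hide the symptom; it does not explain why the pointer is unset.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's PWM types. */
struct pwm_device_model { unsigned int hwpwm; };
struct pwm_args_model { unsigned int index; };

struct pwm_chip_model {
    struct pwm_device_model *(*of_xlate)(struct pwm_chip_model *chip,
                                         const struct pwm_args_model *args);
    unsigned int of_pwm_n_cells;
    struct pwm_device_model *pwms;
    unsigned int npwm;
};

/* Example translate callback: maps an index to a device, or NULL if out of range. */
struct pwm_device_model *model_xlate(struct pwm_chip_model *chip,
                                     const struct pwm_args_model *args)
{
    if (args->index >= chip->npwm)
        return NULL;
    return &chip->pwms[args->index];
}

/*
 * Call of_xlate only if it has been set. An unguarded call through a
 * NULL pointer is a jump to address 0, which is what "pc : 0x0" shows.
 */
int guarded_xlate(struct pwm_chip_model *chip,
                  const struct pwm_args_model *args,
                  struct pwm_device_model **out)
{
    if (!chip->of_xlate)
        return -EAGAIN;  /* chip not fully initialised yet (stand-in code) */

    *out = chip->of_xlate(chip, args);
    return *out ? 0 : -EINVAL;
}
```

With a chip whose `of_xlate` is NULL, `guarded_xlate` returns `-EAGAIN` and leaves `*out` untouched; with `model_xlate` installed it returns 0 and the matching device.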

Thank you,

Joshua Bourgeot

0 Joshua Bourgeot over 1 year ago in reply to Joshua Bourgeot

Prodigy 240 points

Here is some additional debug information on the state of the chip struct right before the of_xlate call.

Bad boot:

[    7.121912] chip->dev: 00000000dfcdfcd7
[    7.121919] chip->ops: 00000000570db9b8
[    7.121922] chip->base: 2
[    7.121924] chip->npwm: 1
[    7.121926] chip->of_xlate: 0000000000000000
[    7.121929] chip->of_pwm_n_cells: 0
[    7.121932] chip->list.prev: 000000003b77431f
[    7.121934] chip->list.next: 000000009e5ade5e
[    7.121937] chip->pwms: 000000007463478b

Good boot:

[    7.255214] chip->dev: 000000008f62e400
[    7.255221] chip->ops: 000000002b5abdbb
[    7.255224] chip->base: 2
[    7.255226] chip->npwm: 1
[    7.255229] chip->of_xlate: 00000000617ff81a
[    7.255232] chip->of_pwm_n_cells: 3
[    7.255234] chip->list.prev: 00000000ad60268a
[    7.255237] chip->list.next: 000000000752ab6c
[    7.255239] chip->pwms: 00000000aa6b6bd7

Thank you,

Joshua Bourgeot

0 Aparna Patra over 1 year ago in reply to Joshua Bourgeot

TI__Genius 12980 points

Hi,

Are you still seeing the issue?

Regards,
Aparna

0 Jonathan Cormier over 1 year ago in reply to Aparna Patra

Genius 4061 points

Thanks for checking in Aparna.

We were able to solve the issue by reverting commit c8135b5174145a65c72c4303f2752cc8cecf8d08 ("pwm: Reduce time the pwm_lock mutex is held in pwmchip_add()").

Note that this revert should not be required on the 6.12 branch: the mutex change is irrelevant there, since that branch switched to the auto mutex syntax.

0001-Revert-pwm-Reduce-time-the-pwm_lock-mutex-is-held-in.diff

From 20c28ac9f32a264115e7e840c423ea01d583bdec Mon Sep 17 00:00:00 2001
From: Joshua Bourgeot <jbourgeot@criticallink.com>
Date: Wed, 20 Nov 2024 10:58:35 -0500
Subject: [PATCH] Revert "pwm: Reduce time the pwm_lock mutex is held in
 pwmchip_add()"

This reverts commit c8135b5174145a65c72c4303f2752cc8cecf8d08.

Releasing the lock early allows pwm_get to try and access chip before
initialization is complete.

Had issue with led_pwm_probe crashing kernel due to null pointer in
chip->of_xlate.
---
 drivers/pwm/core.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/pwm/core.c b/drivers/pwm/core.c
index 0c8c63239adb..51fabb8958fa 100644
--- a/drivers/pwm/core.c
+++ b/drivers/pwm/core.c
@@ -272,21 +272,20 @@ int pwmchip_add(struct pwm_chip *chip)
 	if (!pwm_ops_check(chip))
 		return -EINVAL;
 
-	chip->pwms = kcalloc(chip->npwm, sizeof(*pwm), GFP_KERNEL);
-	if (!chip->pwms)
-		return -ENOMEM;
-
 	mutex_lock(&pwm_lock);
 
 	ret = alloc_pwms(chip->npwm);
-	if (ret < 0) {
-		mutex_unlock(&pwm_lock);
-		kfree(chip->pwms);
-		return ret;
-	}
+	if (ret < 0)
+		goto out;
 
 	chip->base = ret;
 
+	chip->pwms = kcalloc(chip->npwm, sizeof(*pwm), GFP_KERNEL);
+	if (!chip->pwms) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
 	for (i = 0; i < chip->npwm; i++) {
 		pwm = &chip->pwms[i];
 
@@ -297,14 +296,18 @@ int pwmchip_add(struct pwm_chip *chip)
 
 	list_add(&chip->list, &pwm_chips);
 
-	mutex_unlock(&pwm_lock);
+	ret = 0;
 
 	if (IS_ENABLED(CONFIG_OF))
 		of_pwmchip_add(chip);
 
-	pwmchip_sysfs_export(chip);
+out:
+	mutex_unlock(&pwm_lock);
 
-	return 0;
+	if (!ret)
+		pwmchip_sysfs_export(chip);
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(pwmchip_add);
 
-- 
2.25.1
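The commit message above says that releasing the lock early lets a consumer reach the chip before initialization is complete. The following userspace sketch models that ordering deterministically, without real threads: an observer callback runs at the point where the lock would be dropped. It assumes that the OF setup step (of_pwmchip_add() in the patch) is what fills in of_xlate and of_pwm_n_cells; all names here (`chip_model`, `add_chip_unlock_early`, `add_chip_unlock_late`, `take_snapshot`) are hypothetical and not kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

struct chip_model {
    bool on_list;                  /* visible to lookups (list_add() done) */
    int (*of_xlate)(int index);    /* assumed to be set by the OF setup step */
    unsigned int of_pwm_n_cells;
};

/* What a consumer would see if it looked the chip up at a given moment. */
struct snapshot {
    bool found;
    bool xlate_set;
};

typedef void (*observer_fn)(const struct chip_model *chip, void *ctx);

int default_xlate(int index) { return index; }

void take_snapshot(const struct chip_model *chip, void *ctx)
{
    struct snapshot *s = ctx;

    s->found = chip->on_list;
    s->xlate_set = chip->of_xlate != NULL;
}

/* Ordering before the revert: publish, drop the lock, then do the OF setup. */
void add_chip_unlock_early(struct chip_model *chip, observer_fn at_unlock, void *ctx)
{
    chip->on_list = true;             /* list_add() */
    at_unlock(chip, ctx);             /* mutex_unlock(): consumers may run here */
    chip->of_xlate = default_xlate;   /* OF setup step */
    chip->of_pwm_n_cells = 3;
}

/* Ordering after the revert: the OF setup finishes before the lock is dropped. */
void add_chip_unlock_late(struct chip_model *chip, observer_fn at_unlock, void *ctx)
{
    chip->on_list = true;             /* list_add() */
    chip->of_xlate = default_xlate;   /* OF setup step */
    chip->of_pwm_n_cells = 3;
    at_unlock(chip, ctx);             /* mutex_unlock() */
}
```

In the early-unlock ordering the observer finds the chip on the list with `of_xlate` still NULL, which is consistent with the "bad boot" dump (of_xlate 0, of_pwm_n_cells 0). In the late-unlock ordering the observer only ever sees a fully set up chip.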

0 Suren Porwar over 1 year ago in reply to Jonathan Cormier

TI__Mastermind 34440 points

Thanks, Jonathan. Since reverting the commit resolved the issue, I am going to close this thread.

Best Regards,

Suren
