TDA4VM: A72 crashed and RTI0 ESM 344 is triggered

dakar

Part Number: TDA4VM

Hi,

During operation of the TDA4, the RTI0 A72 watchdog timed out after the A72 crashed at 2024/12/31 9:51:59 (after running for 52 minutes), but the kernel log (journalctl) did not record any exceptions.

1. What causes this situation?

2. Could you provide debugging ideas and suggestions about the A72 crash?

Using SDK 8.6. psdkla/board-support/linux-5.10.162+gitAUTOINC+76b3e88d56-g76b3e88d56/drivers/watchdog/rti_wdt.c

Kernel log (journalctl):

25 days ago

0 Keerthy J 25 days ago

TI__Guru**** 151910 points

dakar said:
During operation of the TDA4, the RTI0 A72 watchdog timed out after the A72 crashed at 2024/12/31 9:51:59 (after running for 52 minutes), but the kernel log (journalctl) did not record any exceptions.

1. What causes this situation?

How is this log indicating that watchdog timed out? Did you see a reset of Linux?

dakar said:
2. Could you provide debugging ideas and suggestions about the A72 crash?

Please share the complete logs as a text file attachment.

- Keerthy

0 dakar 22 days ago in reply to Keerthy J

Prodigy 90 points

“How is this log indicating that watchdog timed out? Did you see a reset of Linux?”

- Yes, we see that the ESM 344 event was triggered and the system was reset.

“Please share the complete logs as a text file attachment.”

We only found that the application log stopped (without any errors) and that ESM 344 was triggered.

We have identified the following issues that need to be optimized. Could you please provide some suggestions?

1. The watchdogd kworker task runs at FIFO priority 50, the same as the vxe_enc, mmc, and other IRQ threads. Can we set the watchdog priority to FIFO 99?

2. All system interrupts are bound to core0 by default. Can we move the vxe-enc and cpsw9g IRQs to core1? Would that cause any performance issues, for example with cache synchronization?

0 Keerthy J 21 days ago in reply to dakar

TI__Guru**** 151910 points

Hi,

https://www.geeksforgeeks.org/priority-of-process-in-linux-nice-value/

Linux Nice value could be one way. We do not have expertise on the user space side.
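
One point worth keeping in mind: nice values only influence normal (SCHED_OTHER) tasks, so they would not change the ordering among threads that already run under SCHED_FIFO. For a real-time thread, the util-linux `chrt` tool is the usual way to inspect or change the priority. The following is a minimal sketch, not a tested recommendation for this board. The helper name `set_fifo_prio` and the `DRY_RUN` switch are made up for illustration, and the thread name `watchdogd` is taken from the report above.

```shell
# Hypothetical helper: move one thread (by PID) to SCHED_FIFO at the given priority.
# With DRY_RUN=1 it only prints the chrt command instead of running it.
set_fifo_prio() {
    pid=$1
    prio=$2
    # SCHED_FIFO priorities on Linux range from 1 (lowest) to 99 (highest).
    if [ "$prio" -lt 1 ] || [ "$prio" -gt 99 ]; then
        echo "priority must be in 1..99" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "chrt -f -p $prio $pid"
    else
        chrt -f -p "$prio" "$pid"
    fi
}

# Example (needs root; assumes exactly one thread is named watchdogd):
#   chrt -p "$(pgrep -x watchdogd)"            # show current policy and priority
#   set_fifo_prio "$(pgrep -x watchdogd)" 99   # raise it to FIFO 99
```

A thread at FIFO 99 can starve every lower-priority thread on its core if it ever spins, so it is probably safer to try this on a test unit first and to confirm with `chrt -p` that the change took effect.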

dakar said:
All system interrupts are bound to core0 by default. Can we move the vxe-enc and cpsw9g IRQs to core1? Would that cause any performance issues, for example with cache synchronization?

Yes.

https://docs.kernel.org/core-api/irq/irq-affinity.html

cd to /proc/irq/n, where n is the CPSW9g IRQ number, then run:

echo 0x2 > smp_affinity
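
The value written to smp_affinity is a hexadecimal CPU bitmask: bit 0 selects core0, bit 1 selects core1, so 0x2 means "core1 only". The steps above can be sketched as a small script. The helper names `irq_cpu_mask` and `irqs_by_name` are made up for illustration, and the string used to match the interrupt is an assumption; check /proc/interrupts on the board for the actual name the CPSW9g interrupts are registered under.

```shell
# Convert a zero-based CPU index into the hex bitmask that smp_affinity expects.
irq_cpu_mask() {
    printf '%x\n' $((1 << $1))
}

# Print the IRQ numbers whose /proc/interrupts line contains the string $1.
# $2 optionally names a different file to read (useful for trying this offline).
irqs_by_name() {
    awk -v pat="$1" 'index($0, pat) { sub(":", "", $1); print $1 }' \
        "${2:-/proc/interrupts}"
}

# Example (needs root; "ethernet" is a placeholder match string):
#   for irq in $(irqs_by_name ethernet); do
#       irq_cpu_mask 1 > "/proc/irq/$irq/smp_affinity"
#       cat "/proc/irq/$irq/smp_affinity"    # read back to confirm
#   done
```

Reading the file back matters because some interrupts cannot be re-routed, and the write may be rejected or ignored for them.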

- Keerthy

0 dakar 13 days ago in reply to Keerthy J

Prodigy 90 points

Hi,

We have reproduced this bug: a crash on the cpsw interrupt...

0 Keerthy J 12 days ago in reply to dakar

TI__Guru**** 151910 points

Hi,

Is the crash now consistently reproducible?

What are the active use cases that need to be run to reproduce this?

Best Regards,

Keerthy

0 dakar 12 days ago in reply to Keerthy J

Prodigy 90 points

“Is the crash now consistently reproducible?”

We only caught this log once, but the crash reproduces with low probability.

“What are the active use cases that need to be run to reproduce this?”

We run the view tool, which sends a large amount of video data over the network.

0 Keerthy J 12 days ago in reply to dakar

TI__Guru**** 151910 points

Hi,

Okay. I am sharing a potential fix. Please try it and check whether it fixes the issue.

diff --git a/drivers/soc/ti/k3-ringacc.c b/drivers/soc/ti/k3-ringacc.c
index 148f54d96..164d3999b 100644
--- a/drivers/soc/ti/k3-ringacc.c
+++ b/drivers/soc/ti/k3-ringacc.c
@@ -1177,11 +1177,13 @@ static int k3_ringacc_ring_push_mem(struct k3_ring *ring, void *elem)
 
 static int k3_ringacc_ring_pop_mem(struct k3_ring *ring, void *elem)
 {
-       void *elem_ptr;
+       volatile dma_addr_t *elem_ptr;
 
        elem_ptr = k3_ringacc_get_elm_addr(ring, ring->state.rindex);
 
-       memcpy(elem, elem_ptr, (4 << ring->elm_size));
+       while (*elem_ptr == 0);
+       memcpy_fromio(elem, elem_ptr, (4 << ring->elm_size));
+       memset_io(elem_ptr, 0, (4 << ring->elm_size));
 
        ring->state.rindex = (ring->state.rindex + 1) % ring->size;
        ring->state.occ--;

- Keerthy
