AM6422: AM6422 BIST tool support

Part Number: AM6422

1. Scenario:

We need to combine a BIST tool with memtester as the basis for verifying the LPDDR4 timing parameters. This test must be included in the factory verification of every device and in the regression testing of faulty devices.

2. Problem:

We are currently seeing abnormal system hangs and data overwrites, including overwritten memory pointers, even though power-on initialization training succeeds 100% of the time. The problem only appears during long-term operation and is difficult to reproduce: the interval between failures ranges from 3 days to 2 weeks, and it recurs on a small number of units with faulty devices.

3. Tool Requirements

We need a BIST tool that can test the DDR controller and the memory chips, and that can identify the health of the devices and of the hardware link.

  • Zheng, 

    Please help me to understand what is being requested here. It sounds like there is some marginality in the DDR design or configuration which is resulting in memory corruption after a long period of operation. However, you then specify that there are data overwrites and memory pointer overwrites. Are you suspecting corruption of internal memory containing instruction code or corruption of the DDR memory contents and/or access to DDR?

    With regard to BIST, are you trying to focus on testing DDR health? How do you anticipate BIST identifying this issue during factory verification if failures may take weeks to observe?

    Do you have a sense of what % of units are demonstrating a failure?

    Any feedback you have to help understand the desired course of debug and support is appreciated. It is not clear to me if you are trying to optimize and resolve DDR failures or are looking for a runtime solution to identify and report that a failure has occurred.

    Thanks,

    Chris 

  • Hi Chris,

    The screenshot shows a captured kernel panic log. The crash occurred when referencing pointer X3 via the instruction: LDUR X5, [X3, #-16].
    Tracing back the origin of X3, the previous instruction was LDR X3, [X0, #0x18], which is an ARM64 load instruction. Internally, the CPU initiated a read request, fetching data from the memory address X0 + 0x18 and storing it into X3.
    The base address (X0) is ffff0000102661b8.
    For the target address calculation involving offset 0x18: ffff0000102661b8 + 0x18 = ffff0000102661d0.
    Similarly, for the instruction LDR X4, [X0, #0x8], the CPU reads data from address X0 + 0x8 into X4.
    The target address is ffff0000102661b8 + 0x8 = ffff0000102661c0.
    However, the values read into X3 and X4, as seen in the log, exhibit address residue in their lower bytes. As highlighted in the diagram (e.g., ..ff..00..26..c0 and ..ff..00..26..d0), the data appears to contain remnants of the address itself.
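The address arithmetic and the "residue" comparison above can be reproduced with a short script. This is a minimal sketch: the register values are copied from the first panic log in this thread, while the helper name `matching_bytes` is hypothetical and only compares each byte of the loaded value with the same byte of the address it was loaded from.

```python
# Register values copied from the first panic log in this thread.
X0 = 0xFFFF0000102661B8
X3_OBSERVED = 0x10FF6100FF2600D0  # loaded by LDR X3, [X0, #0x18]
X4_OBSERVED = 0x10FF6100FF2600C0  # loaded by LDR X4, [X0, #0x8]


def matching_bytes(address: int, value: int) -> list[int]:
    """Return byte positions (0 = least significant) where value equals address.

    Hypothetical helper for illustration; it says nothing about the cause.
    """
    addr_bytes = address.to_bytes(8, "little")
    value_bytes = value.to_bytes(8, "little")
    return [i for i in range(8) if addr_bytes[i] == value_bytes[i]]


for name, offset, observed in (("X4", 0x08, X4_OBSERVED), ("X3", 0x18, X3_OBSERVED)):
    target = X0 + offset  # address the load instruction reads from
    print(f"{name}: loaded from {target:#018x}, observed {observed:#018x}, "
          f"bytes equal to the address: {matching_bytes(target, observed)}")
```

For both registers the bytes at positions 0, 2, 4 and 6 coincide with the load address, which corresponds to the ..ff..00..26..c0 and ..ff..00..26..d0 patterns noted above.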

    Hardware Path Analysis of LDR Instruction
    We analyzed the hardware path for the LDR instruction: CPU Core → L1 Cache → L2 Cache (LLC) → DDR Controller → PHY → Physical DDR.
    Based on this, we formulated two hypotheses:
    1. Address residue on the external bus due to timing training issues.
    2. Signal crosstalk.
    Exclusion of Hypotheses
    The following analysis rules out both causes:
    1. In LPDDR4, pin functions are strictly segregated: CA pins are dedicated to commands and addresses (Row, Column, Bank), DQ pins handle data, and DMI pins manage mask control during Mask Writes. There is no mechanism that multiplexes address signals onto the DQ bus. Consequently, we can exclude bus residue caused by defects in the hardware link or in the DRAM device.
    2. PCB Stack-up and Crosstalk Analysis
    The CPU board utilizes a standard 10-layer stack-up. The DQS, DQ, and DM signals are routed on Layer 3, while the CA, CK, and CKE signals are on Layer 8. These signal layers are separated by two power planes and two ground planes.
    Given this standard structure and the effective shielding provided by the intermediate planes, we can effectively rule out external crosstalk on the PCB.
    Deep Dive Analysis: AM6422 PHY Direction Switching Fault
    Further analysis points to a "Read/Write Switching Logic Fault" within the AM6422 SoC's internal DDR PHY as the primary suspect. In chip-level debugging, this is often referred to as "Bus Turn-around Leakage" or "Direction Switch Deadlock."
    Architecture Context
    The DDR PHY defines the physical interface between the Memory Controller (MC) and external SDRAM. It houses data paths and I/O logic modules responsible for toggling DQ pin direction (Input/Output).
    Mechanism Deduction
    1. I/O Buffer Architecture: Inside the PHY, each DQ pin corresponds to a bidirectional I/O buffer (Pad), comprising a transmit driver, a receive comparator, and a direction control switch.
    2. Turn-around Timing: After a read command, the PHY must undergo a bus turn-around time to switch the buffer from "Transmit Mode" to "Receive Mode."
    3. Failure Trigger: Due to temperature drift, voltage droop, or a PHY state machine bug, the direction switch may fail to complete in time. The transmit driver may fail to shut down (go Hi-Z) before external data arrives.
    4. Formation of "Loopback Leakage":
      • Residual Drive: Prior to the read, the bus performed a write or command transmission, leaving residual address/data levels on the internal bus.
      • Contention: Due to the switching lag, the weak conduction of the internal transmit driver pulls these residual levels (the previous address) onto the DQ pin, overriding the weak signal from the external DRAM.
      • Erroneous Sampling: The PHY receiver samples this "contaminated" pin, capturing the SoC's own address residue instead of data from the DRAM.
    Phenomenon Explanation
    • Data Resembles Address: The read data is effectively the address that was on the bus previously.
    • Bit-wise Discrepancy: Load and delay variations across Byte Lanes can cause inconsistent switching. High-order bit buffers might switch fast enough to receive correct data, while low-order bit buffers switch slower, suffering from "leakage" and capturing address residue.
    Therefore, we request TI's assistance in investigating this primary suspect. We also ask for recommendations on diagnostic tools, such as utilizing the PHY internal loopback mode for verification.
  • Zheng, 

    It sounds like you are suspecting possible data corruption when accessing DDR. Can you run memtester overnight on a board that has been known to fail? This will help run data patterns that could possibly show if there are marginal failures which may be leading to this problem. If we rule out DDR stability we can start to investigate the runtime SW to look if the load instruction is causing the issue. If memtester shows failures we can inspect the DDR configuration and try to identify improvements to eliminate failures. 
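A long unattended run like this could be scripted roughly as follows. This is only a sketch: it assumes `memtester` is installed on the target and is run with sufficient privileges, and the function names, the `256M` size, and the loop count are illustrative placeholders. The exit-code meanings are taken from the memtester man page.

```python
import subprocess

# Exit-code bits documented in the memtester man page.
EXIT_BITS = {
    0x01: "error allocating or locking memory, or invocation error",
    0x02: "error during stuck address test",
    0x04: "error during one of the other tests",
}


def describe_exit(code: int) -> list[str]:
    """Translate a memtester exit code into the conditions it encodes."""
    return [text for bit, text in EXIT_BITS.items() if code & bit]


def build_command(size: str = "256M", loops: int = 1) -> list[str]:
    """Build the memtester command line: memtester <mem> [loops]."""
    return ["memtester", size, str(loops)]


def run_overnight(iterations: int, size: str = "256M") -> None:
    """Run memtester repeatedly and report any non-zero exit code."""
    for i in range(iterations):
        result = subprocess.run(build_command(size), capture_output=True, text=True)
        problems = describe_exit(result.returncode)
        print(f"run {i}: {'ok' if not problems else '; '.join(problems)}")
```

Calling `run_overnight(500)` on a known-bad board would then leave one status line per pass; the size should be chosen to cover as much free DDR as the running system allows.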

    I will also add DDR and SW experts to this thread to monitor once you have feedback. 

    Thanks,

    Chris 

  • Hi Chris,

    We have been analyzing this issue for a long time and initially suspected a software problem, so we ran memory stress tests at the very beginning. The attachment is the record of those earlier tests on a board with the known issue.

    We have added many module-isolation and status-monitoring functions to the software, but unfortunately they produced no useful findings. Based on the disassembly, the failing memory accesses of the underlying instructions appear to be attributable to a failure at the physical layer. Is this conclusion correct?
    Attached are two original error logs, as well as the memtester stress test logs, for reference.
    093843.221: [759069.422858] Unable to handle kernel paging request at virtual address 10ff6100ff2600c0
    093843.229: [759069.422885] Mem abort info:
    093843.229: [759069.422888]   ESR = 0x0000000096000004
    093843.229: [759069.422892]   EC = 0x25: DABT (current EL), IL = 32 bits
    093843.231: [759069.422899]   SET = 0, FnV = 0
    093843.231: [759069.422902]   EA = 0, S1PTW = 0
    093843.231: [759069.422905]   FSC = 0x04: level 0 translation fault
    093843.231: [759069.422909] Data abort info:
    093843.231: [759069.422912]   ISV = 0, ISS = 0x00000004
    093843.231: [759069.422915]   CM = 0, WnR = 0
    093843.231: [759069.422918] [10ff6100ff2600c0] address between user and kernel address ranges
    093843.231: [759069.422927] Internal error: Oops: 0000000096000004 [#1] PREEMPT_RT SMP
    093843.245: [759069.422943] Dumping ftrace buffer:
    093843.245: [759069.422952]    (ftrace buffer empty)
    093843.245: [759069.422955] Modules linked in: sch_mqprio xt_addrtype xt_limit xt_conntrack iptable_filter usb_f_ecm g_ether usb_f_rndis u_ether libcomposite rpmsg_pru rpmsg_ctrl rpmsg_char virtio_rpmsg_bus rpmsg_ns cdns3 cdns_usb_common cdns3_ti ti_k3_r5_remoteproc omap_mailbox irq_pruss_intc mwdriver(O) icssg_prueth pru_rproc icss_iep pruss
    093843.245: [759069.423045] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O       6.1.127-rt48-senux #1
    093843.245: [759069.423055] Hardware name: P3Y EP based on Texas Instruments AM642 EVM (DT)
    093843.245: [759069.423061] pstate: a00000c5 (NzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    093843.245: [759069.423070] pc : plist_del+0x20/0x74
    093843.245: [759069.423091] lr : pick_next_task_rt+0x8c/0x1c0
    093843.245: [759069.423103] sp : ffff800009023cd0
    093843.245: [759069.423106] x29: ffff800009023cd0 x28: ffff00001cf75980 x27: ffff800008c5ff70
    093843.245: [759069.423118] x26: ffff80000903b540 x25: ffff80000903bbf0 x24: 0000000000000000
    093843.245: [759069.423129] x23: ffff00001cf76180 x22: ffff00001cf761f0 x21: ffff000010265d00
    093843.245: [759069.423140] x20: ffff000010265ec0 x19: ffff00001cf75980 x18: 0000000000000000
    093843.245: [759069.423150] x17: 0000000000000000 x16: 0000000000000000 x15: 00000000ebc8fb30
    093843.245: [759069.423160] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
    093843.245: [759069.423170] x11: 0000000000000000 x10: 0000000000000b70 x9 : ffff80000896e05c
    093843.245: [759069.423181] x8 : ffff80000903c110 x7 : 0000000000000000 x6 : ffff800009034870
    093843.245: [759069.423191] x5 : 0000000000000000 x4 : 10ff6100ff2600c0 x3 : 10ff6100ff2600d0
    093843.245: [759069.423201] x2 : ffff0000102661c0 x1 : ffff00001cf761f0 x0 : ffff0000102661b8
    093843.245: [759069.423213] Call trace:
    093843.245: [759069.423216]  plist_del+0x20/0x74
    093843.245: [759069.423224]  __schedule+0x54c/0x734
    093843.245: [759069.423236]  schedule_idle+0x2c/0x60
    093843.245: [759069.423246]  do_idle+0xc8/0x134
    093843.245: [759069.423253]  cpu_startup_entry+0x38/0x40
    093843.245: [759069.423260]  kernel_init+0x0/0x12c
    093843.267: [759069.423268]  arch_post_acpi_subsys_init+0x0/0x18
    093843.267: [759069.423282]  start_kernel+0x590/0x5cc
    093843.267: [759069.423291]  __primary_switched+0xbc/0xc4
    093843.267: [759069.423306] Code: f9400c03 f9400404 eb03003f 540000a0 (f85f0065) 
    093843.267: [759069.663148] ---[ end trace 0000000000000000 ]---
    093843.267: [759069.663155] Kernel panic - not syncing: Oops: Fatal exception
    093843.267: [759069.673664] SMP: stopping secondary CPUs
    093843.267: [759069.673675] Dumping ftrace buffer:
    093843.267: [759069.673680]    (ftrace buffer empty)
    093843.267: [759069.673693] Kernel Offset: disabled
    093843.267: [759069.673696] CPU features: 0x000000,00800004,0000400b
    093843.277: [759069.673703] Memory Limit: none
    093843.309: [759069.696596] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
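The `Code:` line in the log above can be cross-checked against the instructions discussed earlier in this post. The sketch below decodes only the two A64 encodings it needs (64-bit LDR with an unsigned immediate offset, and 64-bit LDUR), following the Arm A64 instruction encoding layout. The function names are hypothetical, any other opcode is rejected, and register 31 is not treated specially.

```python
def decode_ldr_uimm64(word: int) -> str:
    """Decode a 64-bit LDR (immediate, unsigned offset)."""
    if word & 0xFFC00000 != 0xF9400000:
        raise ValueError(f"{word:#010x} is not a 64-bit LDR (unsigned offset)")
    imm12 = (word >> 10) & 0xFFF   # bits [21:10], scaled by 8 bytes
    rn = (word >> 5) & 0x1F        # base register, bits [9:5]
    rt = word & 0x1F               # destination register, bits [4:0]
    return f"LDR X{rt}, [X{rn}, #{imm12 * 8:#x}]"


def decode_ldur64(word: int) -> str:
    """Decode a 64-bit LDUR (unscaled signed 9-bit offset)."""
    if word & 0xFFE00C00 != 0xF8400000:
        raise ValueError(f"{word:#010x} is not a 64-bit LDUR")
    imm9 = (word >> 12) & 0x1FF    # bits [20:12], two's complement
    if imm9 & 0x100:
        imm9 -= 0x200
    rn = (word >> 5) & 0x1F
    rt = word & 0x1F
    return f"LDUR X{rt}, [X{rn}, #{imm9}]"


# Words taken from the "Code:" line; the word in parentheses is the faulting one.
print(decode_ldr_uimm64(0xF9400C03))
print(decode_ldr_uimm64(0xF9400404))
print(decode_ldur64(0xF85F0065))
```

The three words decode to the same LDR X3, LDR X4 and LDUR X5 instructions quoted in the analysis, with the LDUR being the instruction at the faulting program counter.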
    
    031059.624: 20260405_03:10:59[207313.155207] Unable to handle kernel paging request at virtual address ffff0000ff1a0000
    031059.636: 20260405_03:10:59[207313.155236] Mem abort info:
    031059.636: 20260405_03:10:59[207313.155239]   ESR = 0x0000000096000005
    031059.636: 20260405_03:10:59[207313.155243]   EC = 0x25: DABT (current EL), IL = 32 bits
    031059.636: 20260405_03:10:59[207313.155250]   SET = 0, FnV = 0
    031059.636: 20260405_03:10:59[207313.155253]   EA = 0, S1PTW = 0
    031059.640: 20260405_03:10:59[207313.155257]   FSC = 0x05: level 1 translation fault
    031059.645: 20260405_03:10:59[207313.155261] Data abort info:
    031059.645: 20260405_03:10:59[207313.155263]   ISV = 0, ISS = 0x00000005
    031059.645: 20260405_03:10:59[207313.155266]   CM = 0, WnR = 0
    031059.645: 20260405_03:10:59[207313.155270] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000082cff000
    031059.645: 20260405_03:10:59[207313.155276] [ffff0000ff1a0000] pgd=180000009cff8003, p4d=180000009cff8003, pud=0000000000000000
    031059.645: 20260405_03:10:59[207313.155295] Internal error: Oops: 0000000096000005 [#1] PREEMPT_RT SMP
    031059.645: 20260405_03:10:59[207313.155309] Dumping ftrace buffer:
    031059.645: 20260405_03:10:59[207313.155317] ---------------------------------
    031059.645: 20260405_03:10:59[207313.155420]      cat-1039      0....1.. 5223734us : __arm64_sys_mprotect <-invoke_syscall
    031059.645: 20260405_03:10:59[207313.155438]      cat-1039      0....1.. 5225269us : __arm64_sys_mprotect <-invoke_syscall
    031059.645: 20260405_03:10:59[207313.155455]      cat-1039      0....1.. 5226825us : __arm64_sys_mprotect <-invoke_syscall
    ......
    031104.502: 20260405_03:11:04[207313.165194] CPU:1 [LOST 149858340 EVENTS]
    ......
    031517.508: 20260405_03:15:17[207313.630660] tR_00010-711       1....1.. 206553164992us : __arm64_sys_mprotect <-invoke_syscall
    031517.508: 20260405_03:15:17[207313.630676] tR_00020-713       1....1.. 206553165565us : __arm64_sys_mprotect <-invoke_syscall
    031517.508: 20260405_03:15:17[207313.630692] tR_00005-709       1....1.. 206553165931us : __arm64_sys_mprotect <-invoke_syscall
    031517.508: 20260405_03:15:17[207313.630697] ---------------------------------
    031517.508: 20260405_03:15:17[207313.630702] Modules linked in: xt_addrtype xt_limit xt_conntrack iptable_filter usb_f_ecm g_ether usb_f_rndis u_ether libcomposite rpmsg_ctrl rpmsg_char rpmsg_pru virtio_rpmsg_bus rpmsg_ns cdns3 cdns_usb_common cdns3_ti ti_k3_r5_remoteproc omap_mailbox irq_pruss_intc mwdriver(O) icssg_prueth pru_rproc icss_iep pruss
    031517.524: 20260405_03:15:17[207313.630790] CPU: 1 PID: 402 Comm: fppCallback Tainted: G           O       6.1.127-rt48-senux #1
    031517.524: 20260405_03:15:17[207313.630801] Hardware name: P3Y EP based on Texas Instruments AM642 EVM (DT)
    031517.524: 20260405_03:15:17[207313.630807] pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    031517.524: 20260405_03:15:17[207313.630817] pc : plist_del+0x20/0x74
    031517.524: 20260405_03:15:17[207313.630837] lr : pick_next_task_rt+0x8c/0x1c0
    031517.539: 20260405_03:15:17[207313.630847] sp : ffff80000a873bc0
    031517.539: 20260405_03:15:17[207313.630850] x29: ffff80000a873bc0 x28: ffff00001cf8d980 x27: ffff800008c94650
    031517.539: 20260405_03:15:17[207313.630862] x26: ffff0000070f2040 x25: ffff0000070f26e8 x24: 0000000000000001
    031517.555: 20260405_03:15:17[207313.630873] x23: ffff00001cf8e180 x22: ffff00001cf8e1f0 x21: ffff0000071ae040
    031517.555: 20260405_03:15:17[207313.630884] x20: ffff0000071ae200 x19: ffff00001cf8d980 x18: 0000000000000000
    031517.571: 20260405_03:15:17[207313.630895] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
    031517.571: 20260405_03:15:17[207313.630905] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
    031517.571: 20260405_03:15:17[207313.630916] x11: 0000000000000000 x10: ffff800009241610 x9 : ffff800008987a54
    031517.586: 20260405_03:15:17[207313.630927] x8 : ffff000000009b80 x7 : 0000000000000001 x6 : 0000000000000000
    031517.586: 20260405_03:15:17[207313.630937] x5 : 000000000000011c x4 : 07ffe500ff1a0000 x3 : ffff0000ff1a0010
    031517.599: 20260405_03:15:17[207313.630948] x2 : ffff0000071ae500 x1 : ffff00001cf8e1f0 x0 : ffff0000071ae4f8
    031517.602: 20260405_03:15:17[207313.630960] Call trace:
    031517.602: 20260405_03:15:17[207313.630964]  plist_del+0x20/0x74
    031517.602: 20260405_03:15:17[207313.630973]  __schedule+0x564/0x760
    031517.602: 20260405_03:15:17[207313.630986]  schedule+0x5c/0xd0
    031517.602: 20260405_03:15:17[207313.630996]  schedule_timeout+0x180/0x1d4
    031517.602: 20260405_03:15:17[207313.631005]  wait_for_completion_interruptible+0x100/0x1ac
    031517.618: 20260405_03:15:17[207313.631016]  gp_timer_ioctl+0xc0/0x2e4
    031517.618: 20260405_03:15:17[207313.631027]  gp_timer_compat_ioctl+0x20/0x2c
    031517.618: 20260405_03:15:17[207313.631033]  __arm64_compat_sys_ioctl+0x148/0x170
    031517.618: 20260405_03:15:17[207313.631045]  invoke_syscall+0x4c/0x114
    031517.618: 20260405_03:15:17[207313.631053]  el0_svc_common.constprop.0+0xf8/0x11c
    031517.618: 20260405_03:15:17[207313.631061]  do_el0_svc_compat+0x24/0x40
    031517.618: 20260405_03:15:17[207313.631070]  el0_svc_compat+0x1c/0x60
    031517.618: 20260405_03:15:17[207313.631079]  el0t_32_sync_handler+0x90/0x11c
    031517.618: 20260405_03:15:17[207313.631087]  el0t_32_sync+0x14c/0x150
    031517.618: 20260405_03:15:17[207313.631104] Code: f9400c03 f9400404 eb03003f 540000a0 (f85f0065) 
    031517.744: 20260405_03:15:17[207571.891691] ---[ end trace 0000000000000000 ]---
    031517.744: 20260405_03:15:17[207572.047401] Kernel panic - not syncing: Oops: Fatal exception
    031519.346: 20260405_03:15:19[207572.057937] SMP: stopping secondary CPUs
    031519.346: 20260405_03:15:19[207573.154945] SMP: failed to stop secondary CPUs 0-1
    031519.346: 20260405_03:15:19[207573.154964] Dumping ftrace buffer:
    031519.346: 20260405_03:15:19[207573.154975]    (ftrace buffer empty)
    031519.346: 20260405_03:15:19[207573.154989] Kernel Offset: disabled
    031519.346: 20260405_03:15:19[207573.154992] CPU features: 0x000000,00800004,0000400b
    031519.346: 20260405_03:15:19[207573.154998] Memory Limit: none
    031519.346: 20260405_03:15:19[207573.155004] task:systemd         state:S stack:11408 pid:1     ppid:0      flags:0x00000004
    031519.346: 20260405_03:15:19[207573.155031] Call trace:
    031519.346: 20260405_03:15:19[207573.155035]  __switch_to+0xc0/0x100
    031519.346: 20260405_03:15:19[207573.155059]  __schedule+0x2a4/0x760
    031519.346: 20260405_03:15:19[207573.155069]  schedule+0x5c/0xd0
    031519.346: 20260405_03:15:19[207573.155078]  schedule_hrtimeout_range_clock+0xb4/0x154
    031519.346: 20260405_03:15:19[207573.155088]  schedule_hrtimeout_range+0x18/0x20
    031519.346: 20260405_03:15:19[207573.155096]  do_epoll_wait+0x540/0x620
    031519.346: 20260405_03:15:19[207573.155109]  do_compat_epoll_pwait.part.0+0x18/0x90
    031519.346: 20260405_03:15:19[207573.155119]  __arm64_sys_epoll_pwait+0x74/0x110
    031519.346: 20260405_03:15:19[207573.155128]  invoke_syscall+0x4c/0x114
    031519.346: 20260405_03:15:19[207573.155141]  el0_svc_common.constprop.0+0x64/0x11c
    031519.346: 20260405_03:15:19[207573.155150]  do_el0_svc+0x24/0x30
    031519.346: 20260405_03:15:19[207573.155158]  el0_svc+0x1c/0x60
    031519.346: 20260405_03:15:19[207573.155166]  el0t_64_sync_handler+0xb4/0x130
    031519.346: 20260405_03:15:19[207573.155175]  el0t_64_sync+0x148/0x14c
    031519.346: 20260405_03:15:19[207573.155186] task:kthreadd        state:S stack:13776 pid:2     ppid:0      flags:0x00000008
    031519.346: 20260405_03:15:19[207573.155208] Call trace:
    031519.346: 20260405_03:15:19[207573.155211]  __switch_to+0xc0/0x100
    031519.346: 20260405_03:15:19[207573.155222]  __schedule+0x2a4/0x760
    031519.346: 20260405_03:15:19[207573.155231]  schedule+0x5c/0xd0
    031519.346: 20260405_03:15:19[207573.155241]  kthreadd+0x1c0/0x200
    031519.346: 20260405_03:15:19[207573.155252]  ret_from_fork+0x10/0x20
    031519.346: 20260405_03:15:19[207573.155263] task:rcu_gp          state:I stack:15344 pid:3     ppid:2      flags:0x00000008
    031519.346: 20260405_03:15:19[207573.155303] Call trace:
    031519.346: 20260405_03:15:19[207573.155306]  __switch_to+0xc0/0x100
    031519.346: 20260405_03:15:19[207573.155317]  __schedule+0x2a4/0x760
    031519.346: 20260405_03:15:19[207573.155327]  schedule+0x5c/0xd0
    031519.346: 20260405_03:15:19[207573.155336]  rescuer_thread+0x2a0/0x3d0
    031519.346: 20260405_03:15:19[207573.155348]  kthread+0x11c/0x120
    031519.346: 20260405_03:15:19[207573.155356]  ret_from_fork+0x10/0x20
    031519.346: 20260405_03:15:19[207573.155366] task:rcu_par_gp      state:I stack:15344 pid:4     ppid:2      flags:0x00000008
    ......
    031540.018: 20260405_03:15:40[207573.177408] 
    031540.018: 20260405_03:15:40[207573.177409] Tick Device: mode:     1
    031540.018: 20260405_03:15:40[207573.177412] Per CPU device: 1
    031540.018: 20260405_03:15:40[207573.177414] Clock Event Device: 
    031540.018: 20260405_03:15:40[207573.177416] arch_sys_timer
    031540.018: 20260405_03:15:40[207573.177419]  max_delta_ns:   1374389514240
    031540.018: 20260405_03:15:40[207573.177422]  min_delta_ns:   1000
    031540.018: 20260405_03:15:40[207573.177424]  mult:           13421773
    031540.018: 20260405_03:15:40[207573.177427]  shift:          26
    031540.018: 20260405_03:15:40[207573.177429]  mode:           3
    031540.018: 20260405_03:15:40[207573.177431]  next_event:     207313147000000 nsecs
    031540.018: 20260405_03:15:40[207573.177434]  set_next_event: arch_timer_set_next_event_phys
    031540.018: 20260405_03:15:40[207573.177443]  shutdown:       arch_timer_shutdown_phys
    031540.018: 20260405_03:15:40[207573.177451]  oneshot stopped: arch_timer_shutdown_phys
    031540.018: 20260405_03:15:40[207573.177459]  event_handler:  hrtimer_interrupt
    031540.018: 20260405_03:15:40[207573.177466] 
    031540.018: 20260405_03:15:40[207573.177468]  retries:        2
    031540.018: 20260405_03:15:40[207573.177470] Wakeup Device: <NULL>
    031540.018: 20260405_03:15:40[207573.177472] 
    031540.018: 20260405_03:15:40[207573.177480] Dumping ftrace buffer:
    031540.033: 20260405_03:15:40[207573.177490]    (ftrace buffer empty)
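The `ESR`, `EC`, `WnR` and `FSC` values printed in both reports above are related by fixed bit positions in the Arm ESR_EL1 register. A minimal sketch of that relationship follows, with a hypothetical function name and only the fields needed for these two logs:

```python
def decode_esr(esr: int) -> dict[str, int]:
    """Split an ESR_EL1 value into the fields printed in the oops header."""
    iss = esr & 0x1FFFFFF               # instruction specific syndrome, bits [24:0]
    return {
        "EC": (esr >> 26) & 0x3F,       # exception class, bits [31:26]
        "IL": (esr >> 25) & 0x1,        # 1 = 32-bit instruction
        "ISS": iss,
        "WnR": (iss >> 6) & 0x1,        # data aborts: 1 = write, 0 = read
        "FSC": iss & 0x3F,              # fault status code, bits [5:0]
    }


for esr in (0x96000004, 0x96000005):    # values from the two reports
    fields = decode_esr(esr)
    print(f"ESR {esr:#010x}: " + ", ".join(f"{k}={v:#x}" for k, v in fields.items()))
```

Both values decode to EC 0x25 with WnR = 0, matching the "DABT (current EL)" and read-access lines in the logs; they differ only in the FSC (0x04 versus 0x05), that is, in the translation level at which the fault was reported.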
    
    Statistical Results:
    1. Total Test Count: 450 times
    2. Total Run Time: From 2026-01-23 14:20:04 to 2026-01-26 05:53:59
    3. Cumulative Duration: 2 days 15 hours 33 minutes 55 seconds
    4. Converted to approx. 63.57 hours
    5. 192.168.1.100-[192.168.1.100-20260123_14-19-01].log
    6. 192.168.1.100-[192.168.1.100-20260124_00-00-01].log
    7. [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel: ------------[ cut here ]------------
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel: WARNING: CPU: 0 PID: 10371 at rcu_note_context_switch+0x3dc/0x410
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel: Modules linked in: cdns3 cdns_usb_common irq_pruss_intc icssg_prueth pru_rproc icss_iep cdns3_ti ti_k3_r5_remoteproc pruss omap_mailbox mwdriver(O)
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel: CPU: 0 PID: 10371 Comm: memtester Tainted: G           O       6.1.127-rt48-senux #1
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel: Hardware name: P3Y EP based on Texas Instruments AM642 EVM (DT)
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel: pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel: pc : rcu_note_context_switch+0x3dc/0x410
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel: lr : rcu_note_context_switch+0xec/0x410
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel: sp : ffff80001054ba90
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel: x29: ffff80001054ba90 x28: ffff80000909f000 x27: ffff8000090a12a8
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel: x26: ffff000001ff0e40 x25: ffff00000827b4c0 x24: ffff000001ff0e40
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel: x23: 0000000000000001 x22: ffff000001ff0e40 x21: ffff800008f51ed0
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel: x20: ffff800008e97800 x19: ffff00001cd71f00 x18: 0000000000000000
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 00000000128dd2a0
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel: x14: 0000000000000000 x13: 38342020676e6974 x12: 7465730808080808
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel: x11: 0000000000000040 x10: ffff800008e1ca40 x9 : ffff80000884be10
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel: x8 : 02de171c99ade72c x7 : 02de84ba21fc1294 x6 : 0000000002e62f9f
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel: x5 : 0000000000000000 x4 : ffff800010547fff x3 : 0000000000000000
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel: x2 : 0000000000000001 x1 : ffff000001ff1200 x0 : ffff000001ff1290
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel: Call trace:
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel:  rcu_note_context_switch+0x3dc/0x410
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel:  __schedule+0x98/0x684
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel:  preempt_schedule+0x70/0x170
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel:  migrate_enable+0x14c/0x160
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel:  rt_spin_unlock+0x1c/0x70
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel:  remove_wait_queue+0x48/0x54
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel:  n_tty_write+0x414/0x470
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel:  file_tty_write.constprop.0+0x11c/0x2b0
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel:  tty_write+0x18/0x20
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel:  vfs_write+0x228/0x2b4
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel:  ksys_write+0x6c/0xf4
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel:  __arm64_sys_write+0x20/0x2c
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel:  invoke_syscall+0x4c/0x114
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel:  el0_svc_common.constprop.0+0xf8/0x11c
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel:  do_el0_svc+0x24/0x30
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel:  el0_svc+0x1c/0x60
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel:  el0t_64_sync_handler+0xb4/0x130
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel:  el0t_64_sync+0x148/0x14c
      [192.168.1.100-20260125_12:10:27]Jan 25 04:19:00 p3y kernel: ---[ end trace 0000000000000000 ]---
      
    8. 192.168.1.100-[192.168.1.100-20260125_00-00-14].log
    9. 192.168.1.100-[192.168.1.100-20260126_00-00-41].log
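The cumulative duration quoted in the statistics can be re-derived from the two timestamps. A small sketch using only the start and end times listed above (the variable names are illustrative):

```python
from datetime import datetime

# Start and end of the test run, as listed in the statistics.
start = datetime(2026, 1, 23, 14, 20, 4)
end = datetime(2026, 1, 26, 5, 53, 59)

elapsed = end - start                      # timedelta between the two timestamps
hours = elapsed.total_seconds() / 3600     # convert seconds to hours

print(elapsed, f"= {hours:.2f} hours")     # prints "2 days, 15:33:55 = 63.57 hours"
```

This agrees with the stated 2 days 15 hours 33 minutes 55 seconds, or about 63.57 hours.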
  • None of the four memtester logs reveals any issues.  Were these run across your expected operating temperature range?  Typically, temperature extremes help reveal marginality issues.  Try running memtester at your lowest and highest operating temperatures for an extended period of time.

    Can you provide your DDR datasheet and your DDR configuration (.dtsi and .syscfg) files from the DDR Register Configuration tool that you are using in your code?  I can try to review the configuration for any errors.

    Regards,

    James

  • We have tested the problematic boards at both a high temperature of 65° and a low temperature of -35°, and found no issues. We also ran long-term stress tests on the non-problematic boards at both high and low temperatures, in two rounds: the first lasted 4 days and the second 3 days.

    first round.zip, second round.zip

    Additionally, we have attached the DTSI file currently in use.

    currently in use.zip

    In parallel, we created many different configuration files in an attempt to speed up resolution of the worsening problem, but updating the DDR configuration produced no significant change. Therefore, at present, we cannot determine whether the problem is related to the configuration.

  • Can you post the latest configuration that is being used on the 4 boards that have been running for 8 days that we talked about on the call?  I would like to concentrate the testing on this configuration.  The original DDR config was generated with v0.8.10 of the tool which is several years old, and since then there have been several updates and optimizations to increase the robustness of the generated configuration files.  I believe the error that was shown on the call may be addressed by one of the tool updates.   I would also suggest adding a few more boards to test with this new DDR configuration, running those at temp extremes to further stress the configuration.

    Regards,

    James