TDA4VM: MCU2_0 crashes on prototype boards when running the camera app; it appears to be a hardware issue

Part Number: TDA4VM
Other Parts Discussed in Thread: PCM3168A, TIDEP-01020

We have built several TDA4VM boards. On some of them, MCU2_0 always crashes at a random time (after 3 minutes, 5 minutes, or 1 hour) while running the camera app. Other boards run for 2 days without any issue.

MCU2_1 seems to keep running even after MCU2_0 has crashed.

This does not look like a software issue; there may be some hardware difference between the boards.

How should I check these boards to find the difference behind this issue?

Best regards

Yantao. 


  • =========================
    Demo : Camera Demo
    =========================

    s: Save CSIx, VISS and LDC outputs

    p: Print performance statistics

    x: Exit

    Enter Choice:


    =========================
    Demo : Camera Demo
    =========================

    s: Save CSIx, VISS and LDC outputs

    p: Print performance statistics

    x: Exit

    Enter Choice: [ 4765.976276] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: hrtimer_nanosleep+0x118/0x118
    [ 4765.986780] CPU: 1 PID: 1148 Comm: vx_app_arm_remo Tainted: G O 5.10.41-g4c2eade9f7 #1
    [ 4765.995975] Hardware name: Texas Instruments K3 J721E SoC (DT)
    [ 4766.001789] Call trace:
    [ 4766.004228] dump_backtrace+0x0/0x1a0
    [ 4766.007877] show_stack+0x18/0x68
    [ 4766.011180] dump_stack+0xd0/0x12c
    [ 4766.014567] panic+0x16c/0x334
    [ 4766.017609] __stack_chk_fail+0x30/0x40
    [ 4766.021431] __arm64_sys_nanosleep+0x0/0xd0
    [ 4766.025599] __arm64_sys_nanosleep+0x94/0xd0
    [ 4766.029856] SMP: stopping secondary CPUs
    [ 4766.033772] Kernel Offset: disabled
    [ 4766.037247] CPU features: 0x0040022,20006008
    [ 4766.041500] Memory Limit: none
    [ 4766.044546] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: hrtimer_nanosleep+0x118/0x118 ]---

  • [ 1345.166290] Unable to handle kernel paging request at virtual address ffbb800010ffb990
    [ 1345.174216] Mem abort info:
    [ 1345.176996] ESR = 0x96000004
    [ 1345.180050] EC = 0x25: DABT (current EL), IL = 32 bits
    [ 1345.185352] SET = 0, FnV = 0
    [ 1345.188398] EA = 0, S1PTW = 0
    [ 1345.191530] Data abort info:
    [ 1345.194402] ISV = 0, ISS = 0x00000004
    [ 1345.198228] CM = 0, WnR = 0
    [ 1345.201181] [ffbb800010ffb990] address between user and kernel address ranges
    [ 1345.208303] Internal error: Oops: 96000004 [#1] PREEMPT SMP
    [ 1345.213858] Modules linked in: xfrm_user xfrm_algo md5 ecb aes_neon_bs aes_neon_blk des_generic libdes cbc xhci_plat_hcd xhci_hcd rpmsg_char ti_am335x_adc kfifo_buf cdns3 udc_core roles omap_rng rng_core irq_pruss_intc pru_rproc icss_iep usbcore usb_common crct10dif_ce ti_j721e_cpsw_virt_mac phy_can_transceiver ti_k3_r5_remoteproc ti_am335x_tscadc pruss sa2ul sha512_generic authenc snd_soc_pcm3168a_i2c snd_soc_pcm3168a ti_k3_dsp_remoteproc virtio_rpmsg_bus cdns_dphy vxd_dec bluetooth ecdh_generic pvrsrvkm(O) ecc rfkill videobuf2_dma_sg v4l2_mem2mem videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videobuf2_common cdns3_ti rti_wdt sch_fq_codel rpmsg_kdrv_switch cryptodev(O) ipv6
    [ 1345.274194] CPU: 1 PID: 267 Comm: snmpd Tainted: G O 5.10.41-g4c2eade9f7 #1
    [ 1345.282435] Hardware name: Texas Instruments K3 J721E SoC (DT)
    [ 1345.288250] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
    [ 1345.294245] pc : inode_permission+0xd0/0x170
    [ 1345.298502] lr : link_path_walk.part.0+0x2c0/0x358
    [ 1345.303276] sp : ffff800014eefaf0
    [ 1345.306576] x29: ffff800014eefaf0 x28: ffff0008219bb020
    [ 1345.311873] x27: 0000000000000000 x26: fefefefefefefeff
    [ 1345.317170] x25: ffff800014eefca8 x24: ffff0008219bb02b
    [ 1345.322466] x23: ffff000820317e00 x22: 61c8864680b583eb
    [ 1345.327762] x21: 0000000000000000 x20: 0000000000000001
    [ 1345.333058] x19: ffff000840506120 x18: 0000000000000000
    [ 1345.338354] x17: 0000000000000000 x16: 0000000000000000
    [ 1345.343649] x15: 0000000000000000 x14: ffff000820317e00
    [ 1345.348945] x13: 0000000000000000 x12: fefefefefefefeff
    [ 1345.354241] x11: 0000000000000028 x10: 0101010101010101
    [ 1345.359538] x9 : 000000002b1111ea x8 : 7f7f7f7f7f7f7f7f
    [ 1345.364834] x7 : 6f67ff7272606b62 x6 : 0000000000000000
    [ 1345.370130] x5 : 0000000000000064 x4 : 0000000000000015
    [ 1345.375426] x3 : ffff0008404f7558 x2 : 84e4f0484acdc300
    [ 1345.380722] x1 : 0000000000000001 x0 : ffbb800010ffb980
    [ 1345.386019] Call trace:
    [ 1345.388453] inode_permission+0xd0/0x170
    [ 1345.392360] link_path_walk.part.0+0x2c0/0x358
    [ 1345.396788] path_openat+0xb0/0xca8
    [ 1345.400263] do_filp_open+0x78/0x100
    [ 1345.403826] do_sys_openat2+0x1f0/0x2a0
    [ 1345.407647] do_sys_open+0x58/0xa0
    [ 1345.411034] __arm64_sys_openat+0x24/0x30
    [ 1345.415030] el0_svc_common.constprop.0+0x78/0x1a0
    [ 1345.419804] do_el0_svc+0x24/0x90
    [ 1345.423107] el0_svc+0x14/0x20
    [ 1345.426147] el0_sync_handler+0xb0/0xb8
    [ 1345.429969] el0_sync+0x180/0x1c0
    [ 1345.433272] Code: 3100043f 54fffb01 17ffffdd f9401260 (f9400802)
    [ 1345.439349] ---[ end trace 8a4824f10d4d154f ]---
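
The fields of the second oops above can be decoded by hand. The following is a minimal Python sketch, assuming the standard AArch64 ESR_EL1 layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0) and a 48-bit kernel virtual address space. The function names are hypothetical, and the "intended" pointer passed to `flipped_bits` is only a guess (the same value with the upper 16 bits set to all ones), not something the log states.

```python
def decode_esr(esr: int) -> dict:
    """Split an AArch64 ESR value into the fields printed in the oops."""
    iss = esr & 0x1FFFFFF           # ISS: bits 24:0
    return {
        "ec": (esr >> 26) & 0x3F,   # exception class, bits 31:26
        "il": (esr >> 25) & 0x1,    # 1 = 32-bit instruction
        "iss": iss,
        "wnr": (iss >> 6) & 0x1,    # data abort: 0 = read, 1 = write
        "dfsc": iss & 0x3F,         # data fault status code
    }


def classify_va(addr: int, va_bits: int = 48) -> str:
    """Classify a 64-bit virtual address (va_bits=48 is an assumption)."""
    top = addr >> va_bits
    if top == 0:
        return "user"
    if top == (1 << (64 - va_bits)) - 1:
        return "kernel"
    return "non-canonical"


def flipped_bits(observed: int, assumed: int) -> list:
    """Bit positions that differ between the observed and an assumed value."""
    diff = observed ^ assumed
    return [bit for bit in range(64) if (diff >> bit) & 1]


if __name__ == "__main__":
    fields = decode_esr(0x96000004)
    print({name: hex(value) for name, value in fields.items()})
    print(classify_va(0xFFBB800010FFB990))
    # Assumed intended pointer: x0 with the upper 16 bits all ones (a guess).
    print(flipped_bits(0xFFBB800010FFB980, 0xFFFF800010FFB980))
```

For the values in the log this gives EC 0x25 (data abort at the current EL), WnR 0 (a read), and DFSC 0x04 (level 0 translation fault), which matches the kernel's own printout. The faulting address is neither a user nor a kernel address, and under the assumption above it differs from a plausible kernel pointer in only two bits. That would be consistent with corrupted data in memory, but it does not prove it.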

  • Hi Yantao Dong,

    It seems Linux is crashing, not really mcu2_0. Could you help me understand how you see the issue in mcu2_0?

    Also, since this is a new board, did you run a memory test on it and confirm that the memory is fine? There are memory tester tools in the PDK as well as on Linux. Can you please run them first?

    Regards,

    Brijesh

  • Hi Jadav

    Thanks for your reply.

    We have run the memory test with the test tool in Linux, and everything passed.

    The crashes seem to be random.

    Sometimes Linux and MCU2_0 crash while MCU2_1 stays alive. We have a thermal diagnostic running on MCU2_1 that prints the IC temperature on the console, so we can tell it is still running.

    Sometimes MCU2_1 freezes as well: the temperature diagnostic stops producing output.

    Some boards run well for a long time, but others do not.

    We have tried to find a difference between them but could not.

    We are still looking for the root cause.

    Could some part of the hardware design be at fault, such as the power supply?

    Best regards

    Yantao 

  • Hi Yantao,

    We have run the memory test via the test tool in Linux, and all passed. 

    OK, but this typically does not test the entire memory. Can you also please test the entire memory using the PDK example available in the folder ti-processor-sdk-rtos-j721e-evm-07_03_00_07\pdk_jacinto_07_03_00_29\packages\ti\board\diag\mem\.

    Can we also get the schematic reviewed by the HW team?

    Regards,

    Brijesh
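
A Linux userspace memory tester can only exercise the pages it is able to allocate and lock, so memory used by the kernel or reserved for other cores is not covered. The following is a rough Python sketch of how one might estimate that coverage from the contents of /proc/meminfo. The function names, the 128 MiB headroom, and the sample numbers are all hypothetical and are not measurements from these boards.

```python
def parse_meminfo_kb(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def userspace_test_coverage(meminfo_text: str, dram_bytes: int,
                            headroom_kb: int = 128 * 1024) -> float:
    """Fraction of physical DRAM a userspace tester could plausibly lock.

    headroom_kb is an assumed safety margin left free for the rest of
    the system while the test runs.
    """
    info = parse_meminfo_kb(meminfo_text)
    testable_kb = max(info["MemAvailable"] - headroom_kb, 0)
    return testable_kb * 1024 / dram_bytes


# Hypothetical sample values for a board fitted with 4 GiB of DRAM.
SAMPLE_MEMINFO = """\
MemTotal:        3500000 kB
MemFree:         3200000 kB
MemAvailable:    3145728 kB
"""

if __name__ == "__main__":
    coverage = userspace_test_coverage(SAMPLE_MEMINFO, 4 * 1024**3)
    print(f"estimated coverage: {coverage:.1%}")
```

On a real board the input would be the text read from /proc/meminfo. Whatever fraction remains untested is the part that a bare-metal test such as the PDK diagnostic is meant to cover.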

  • Could you please give me an email address to send the schematic to? Thanks.

  • Hi Yantao Dong,

    Can you please contact your local TI FAE/support team for the schematic review?

    Also, can you please run the memtester tool to check the entire memory?

    Regards,

    Brijesh

  • Hi Brijesh,

    Thanks for your advice.

    Our local TI FAE and a TI FAE in the USA have already reviewed our schematic, and there seem to be no particular issues with it.

    Based on our analysis over the last few days, the most likely cause of this issue is the LPDDR4 speed. We cannot run the memory test you mentioned above because we do not have JTAG tools, so we have started by reducing the LPDDR4 speed from 4266 Mbps to 3733 Mbps and repeating the same camera test to see whether the issue still occurs.

    Our boards are 10-layer 370HR boards with no back-drilling (to save cost, we did not use I-Speed material with back-drilling), and the DDR layout is almost a copy of the official 10-layer TIDEP-01020 board. Based on the simulation results described in "Jacinto7 Reference PCB Designs and Simulations v1.0.pdf", we think such a board cannot run reliably at 4266 Mbps.

    The test needs a long time to verify this theory. If it is the right root cause, we have 3 possible solutions:

    1. Change the LPDDR4 data rate to 3733 Mbps without any HW change

    2. To achieve 4266 Mbps, change the material to I-Speed without any PCB layout change

    3. To achieve 4266 Mbps, keep the current material but add back-drilling

    We hope the test result matches our assumption.

         

    BR

    Yantao
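
To put the proposed speed reduction in numbers, here is a small Python sketch of the arithmetic. It assumes that "Mbps" in the post means mega-transfers per second per data pin and that the LPDDR4 interface is 32 bits wide. Both are assumptions to check against the device documentation, and the function names are hypothetical.

```python
def unit_interval_ps(data_rate_mtps: float) -> float:
    """Duration of one data bit on a pin, in picoseconds."""
    return 1e6 / data_rate_mtps


def clock_mhz(data_rate_mtps: float) -> float:
    """Memory clock frequency; DDR transfers two bits per clock cycle."""
    return data_rate_mtps / 2


def peak_bandwidth_gb_per_s(data_rate_mtps: float, bus_bits: int = 32) -> float:
    """Theoretical peak bandwidth in GB/s (bus_bits=32 is an assumption)."""
    return data_rate_mtps * 1e6 * bus_bits / 8 / 1e9


if __name__ == "__main__":
    for rate in (4266, 3733):
        print(f"{rate} MT/s: clock {clock_mhz(rate):.1f} MHz, "
              f"UI {unit_interval_ps(rate):.2f} ps, "
              f"peak {peak_bandwidth_gb_per_s(rate):.3f} GB/s")
    loss = 1 - 3733 / 4266
    print(f"peak bandwidth reduction: {loss:.1%}")
```

Under these assumptions, dropping from 4266 to 3733 widens each bit period from about 234.4 ps to about 267.9 ps, which is the extra timing margin the signal integrity concern is about, at the cost of roughly 12.5% of peak bandwidth.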

  • Hi Yantao,

    Thanks for the update.

    So from the TI side, are you looking to configure DDR at 3733 Mbps? I think one of the older releases ran DDR at 3733 Mbps. Let me check and share the exact release.

    Regards,

    Brijesh