TDA2SX: how to locate the problem and find a solution? SYSTEM: IPC: [DSP1] Notify recvfrom failed (Link has been severed, 67) !!!

Part Number: TDA2SX


Hi team,

Here's an issue from a customer that may need your help:

When the app is running and one gate link (on the IPU) is turned on, the following log is printed:

SYSTEM: IPC: [DSP1] Notify recvfrom failed (Link has been severed, 67) !!!

Does that indicate an IPC link issue? What does the error "Link has been severed" mean? Are there any documents that elaborate on this IPC issue?

Environment: TDA2SX, 2G DDR, VSDK, customer-made use case.

Usecase txt:


Select_xxx_only -> Gate_xxx -> Alg_FrameCopy (A15) -> Dup_xxx(A15) -> Alg_Arcxxx (DSP2) -> Merge_dsp (DSP1)

Dup_xxx(A15) -> Alg_Arcxxx2 (DSP1) -> Merge_dsp (DSP1)

The log is as follows:

[   67.894541] omap-iommu 40d01000.mmu: iommu fault: da 0x50d05000 flags 0x0
[   67.895417]  remoteproc1: crash detected in 40800000.dsp: type mmufault
[   67.896275] omap-iommu 40d01000.mmu: 40d01000.mmu: errs:0x00000002 da:0x50d05000 pgd:0xec26d434 *pgd:px00000000
[   67.897573] omap-iommu 41501000.mmu: iommu fault: da 0x50d06e00 flags 0x0
[   67.898438]  remoteproc2: crash detected in 41000000.dsp: type mmufault
[   67.899285] omap-iommu 41501000.mmu: 41501000.mmu: errs:0x00000002 da:0x50d06e00 pgd:0xecd39434 *pgd:px00000000
[   67.900788]  remoteproc1: handling crash #1 in 40800000.dsp
[   67.901501]  remoteproc1: recovering 40800000.dsp
enter device.cpp, func: deviceTypeParse, at line: 367.
device message recv from kernel: remove@/devices/platform/44000000.ocp/40800000.dsp/remoteproc1/virtio1/rpmsg1.
[HOST] [HOST  ]     70.149301 s:  SYSTEM: IPC: [DSP1] Notify recvfrom failed (Link has been severed, 67) !!!

Could you help resolve this case? Thanks.

Best Regards,

Cherry

  • Hi,

    Quick updates:

    As development of the app software has progressed, the customer has found that the OS kernel, drivers, and base modules (such as the memory map, MMU, and inter-processor communication) influence app performance.
    Because the TDA2x SoC has many cores, the customer now approaches problems at the system level: TDA2x is treated as a system of different cores (DSP, IPU, CPU, GPU) that share common resources. Resource conflicts between the cores can affect performance, and the cores must maintain a common reference such as the memory map. If the memory maps they hold are not consistent, a fault will occur.

    Thanks and regards,

    Cherry

  • Hi,

    May I know if there are any updates?

    Thanks and regards,

    Cherry

  • Hi,

    Here's something the customer has found:

    The customer has checked almost all related questions on the TI website and forum and found that this issue is quite common. The answer is usually "illegal memory access", such as reading a NULL pointer or freeing a pointer twice.
    In this case, the error occurs randomly: sometimes it happens, and at other times the app runs properly. So it does not seem to be an illegal memory access issue.

    One post pointed out that the MMU fault disappeared when the SR0 memory space was enlarged. But the customer is not sure whether this solution applies to this case or what the mechanism behind it is.

    Thanks and regards,

    Cherry

  • Hi Cherry,

    The message appears when there is a remoteproc crash and the remoteproc has gone through error recovery. The existing userspace handles that were used for communicating with remote processors are no longer valid after a crash, and the handles are marked as errored out, resulting in the above trace.

    The remoteproc error recovery mechanism is designed to provide some debug information and perform recovery of the remote processor, but the root cause of the crash needs to be analyzed and fixed within the firmware.

    The log above indicates MMU faults on both DSPs, at addresses 0x50d05000 and 0x50d06e00 respectively. What peripherals have you mapped at these addresses? If this is an ISP-related region, you need to make sure that the corresponding sub-module is powered ON.
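
To pull the faulting addresses out of a longer kernel log, a small script can help. The following is only a minimal sketch: the regular expression is written against the `omap-iommu ... iommu fault: da ... flags ...` line format seen in the log in this thread, and the names `FAULT_RE`, `parse_iommu_faults`, and `SAMPLE_LOG` are hypothetical helpers introduced for illustration, not part of any TI tool.

```python
import re

# Pattern for the omap-iommu fault lines as they appear in this thread's log.
# (Assumption: other kernel versions may format this line differently.)
FAULT_RE = re.compile(
    r"omap-iommu (\S+): iommu fault: da (0x[0-9a-fA-F]+) flags (0x[0-9a-fA-F]+)"
)


def parse_iommu_faults(log_text):
    """Return a list of (mmu_name, device_address) for each fault line."""
    faults = []
    for line in log_text.splitlines():
        match = FAULT_RE.search(line)
        if match:
            faults.append((match.group(1), int(match.group(2), 16)))
    return faults


# Three lines copied from the log posted in this thread.
SAMPLE_LOG = """\
[   67.894541] omap-iommu 40d01000.mmu: iommu fault: da 0x50d05000 flags 0x0
[   67.895417]  remoteproc1: crash detected in 40800000.dsp: type mmufault
[   67.897573] omap-iommu 41501000.mmu: iommu fault: da 0x50d06e00 flags 0x0
"""

if __name__ == "__main__":
    for mmu, da in parse_iommu_faults(SAMPLE_LOG):
        print(f"{mmu} faulted at device address {da:#010x}")
```

Feeding it the full `dmesg` output instead of `SAMPLE_LOG` would list every fault address, which makes it easier to see whether the random failures always hit the same region.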

    regards

    Suman

  • Hello,
    Thanks for your reply and kind analysis of the question. I'm the person who originally raised the question that was transferred here.

    I've checked addresses 0x50d05000 and 0x50d06e00, but found no function or module related to them. So it's hard to say which sub-module is not running.

    Below is a snapshot of the memory map:

    .../binaries/apps_jh6/tda2xx_evm_linux_all/vision_sdk/bin/tda2xx-evm/vision_sdk_c66xdsp_1_release.xe66.map

    MEMORY CONFIGURATION

             name            origin    length      used     unused   attr    fill
    ----------------------  --------  ---------  --------  --------  ----  --------
      L2SRAM                00800000   00038000  000377c0  00000840  RW X
      OCMC_RAM1             40300000   00080000  00000000  00080000  RW X
      OCMC_RAM2             40400000   00100000  00000000  00100000  RW X
      OCMC_RAM3             40500000   00100000  00000000  00100000  RW X
      DSP1_L2_SRAM          40800000   00048000  00000000  00048000  RWIX
      DSP2_L2_SRAM          41000000   00048000  00000000  00048000  RWIX
      NDK_MEM               84000000   00200000  00000000  00200000  RWIX
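
The lookup described above can be written as a simple range check. This is a minimal sketch: the region table is copied from the map snapshot above (only the seven rows shown, so it is incomplete by construction), and `REGIONS` and `find_region` are hypothetical names used for illustration.

```python
# (name, origin, length) rows copied from the linker map snapshot above.
# Assumption: the snapshot is partial, so a miss here does not prove the
# address is unmapped elsewhere in the full map or in the DSP MMU setup.
REGIONS = [
    ("L2SRAM",       0x00800000, 0x00038000),
    ("OCMC_RAM1",    0x40300000, 0x00080000),
    ("OCMC_RAM2",    0x40400000, 0x00100000),
    ("OCMC_RAM3",    0x40500000, 0x00100000),
    ("DSP1_L2_SRAM", 0x40800000, 0x00048000),
    ("DSP2_L2_SRAM", 0x41000000, 0x00048000),
    ("NDK_MEM",      0x84000000, 0x00200000),
]


def find_region(addr, regions=REGIONS):
    """Return the name of the region containing addr, or None if no row matches."""
    for name, origin, length in regions:
        if origin <= addr < origin + length:
            return name
    return None


if __name__ == "__main__":
    for fault_addr in (0x50D05000, 0x50D06E00):
        print(f"{fault_addr:#010x}: {find_region(fault_addr)}")
```

Both fault addresses fall outside every region listed in the snapshot, which matches the manual check. Note that a linker map lists the memory the firmware is linked into, so peripheral register ranges that the firmware reaches through pointers would not necessarily show up in it.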

    Since this MMU fault occurs randomly, it is hard to troubleshoot.

    thanks and best rgds.

    henry