TDA2SX: how to locate the question and find the solution? SYSTEM: IPC: [DSP1] Notify recvfrom failed (Link has been severed, 67) !!!

Cherry Zhou

Mastermind 22235 points

Part Number: TDA2SX

Hi team,

Here's an issue from a customer that may need your help:

When the app is running and one gate link (on the IPU) is turned on, the following log is printed:

SYSTEM: IPC: [DSP1] Notify recvfrom failed (Link has been severed, 67) !!!

Does that indicate an IPC link issue? What does the error "Link has been severed" mean? Are there any documents that elaborate on this kind of IPC issue?

Environment: TDA2SX, 2G DDR, VSDK, customer-made use case.

Usecase txt:

Select_xxx_only -> Gate_xxx -> Alg_FrameCopy (A15) -> Dup_xxx(A15) -> Alg_Arcxxx (DSP2) -> Merge_dsp (DSP1)

Dup_xxx(A15) -> Alg_Arcxxx2 (DSP1) -> Merge_dsp (DSP1)

Log as below:

[ 67.894541] omap-iommu 40d01000.mmu: iommu fault: da 0x50d05000 flags 0x0

[ 67.895417] remoteproc1: crash detected in 40800000.dsp: type mmufault

[ 67.896275] omap-iommu 40d01000.mmu: 40d01000.mmu: errs:0x00000002 da:0x50d05000 pgd:0xec26d434 *pgd:px00000000

[ 67.897573] omap-iommu 41501000.mmu: iommu fault: da 0x50d06e00 flags 0x0

[ 67.898438] remoteproc2: crash detected in 41000000.dsp: type mmufault

[ 67.899285] omap-iommu 41501000.mmu: 41501000.mmu: errs:0x00000002 da:0x50d06e00 pgd:0xecd39434 *pgd:px00000000

[ 67.900788] remoteproc1: handling crash #1 in 40800000.dsp

[ 67.901501] remoteproc1: recovering 40800000.dsp

enter device.cpp, func: deviceTypeParse, at line: 367.

device message recv from kernel: remove@/devices/platform/44000000.ocp/40800000.dsp/remoteproc1/virtio1/rpmsg1.

[HOST] [HOST ] 70.149301 s: SYSTEM: IPC: [DSP1] Notify recvfrom failed (Link has been severed, 67) !!!

Could you help resolve this case? Thanks.

Best Regards,

Cherry

over 3 years ago

Cherry Zhou over 3 years ago

TI Mastermind 22235 points

Hi,

Quick updates:

As development of the app software has progressed, the customer has found that the OS kernel, drivers, and base modules influence app behavior, for example through the memory map, the MMU, and inter-processor communication.
Because the TDA2x SoC has many cores, the customer now approaches the problem at the system level. They treat the TDA2x as a system of different cores (DSP, IPU, CPU, GPU) that share common resources. Resource conflicts between the cores can affect performance, and the cores must agree on common references such as the memory map. If the memory maps they use are not consistent, faults can occur.

Thanks and regards,

Cherry

Cherry Zhou over 3 years ago in reply to Cherry Zhou

TI Mastermind 22235 points

Hi,

May I know if there are any updates?

Thanks and regards,

Cherry

Cherry Zhou over 3 years ago in reply to Cherry Zhou

TI Mastermind 22235 points

Hi,

Here's something the customer has found:

The customer has checked almost all related questions on the TI website and forum and found that this issue is quite common. The usual answer is "illegal memory access", such as dereferencing a NULL pointer or freeing a pointer twice.
In this case the error occurs randomly: sometimes it happens, and at other times the app runs properly. So it does not appear to be an illegal memory access issue.

One post pointed out that the MMU fault disappeared after the SR0 memory space was enlarged. But the customer is not sure whether this solution applies to this case, or what the mechanism behind it is.

Thanks and regards,

Cherry

Suman Anna over 3 years ago in reply to Cherry Zhou

TI Guru 114375 points

Hi Cherry,

The message appears when a remoteproc has crashed and gone through error recovery. The existing userspace handles that were used for communicating with the remote processors are no longer valid after a crash and are marked as errored out, resulting in the above trace.
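For reference, the text and number in the message ("Link has been severed, 67") match the standard Linux errno value 67, ENOLINK. The snippet below is only a small sketch for decoding such a number on a Linux host; `describe_errno` is a hypothetical helper name, and the exact message text depends on the C library in use.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME (code): message' for a numeric errno value."""
    # errno.errorcode maps numbers to symbolic names on the current platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({code}): {os.strerror(code)}"


if __name__ == "__main__":
    # 67 is the number printed at the end of the log line in question.
    print(describe_errno(67))
```

Errno numbering is platform specific, so this should be run on Linux (ideally on the target) rather than on another OS.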

The remoteproc error recovery mechanism is designed to provide some debug information and to recover the remote processor, but the root cause of the crash needs to be analyzed and fixed within the firmware.

The log above indicates MMU faults on both DSPs, at addresses 0x50d05000 and 0x50d06e00 respectively. What peripherals have you mapped at these addresses? If this is an ISP-related region, you need to make sure that the corresponding sub-module is powered ON.
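Since the fault is reported as random, it may help to collect the faulting addresses over many runs and see whether they cluster. The following is a rough sketch, assuming the kernel log lines have the same shape as the ones posted above; `find_iommu_faults` and `SAMPLE` are illustrative names, not part of any TI tool.

```python
import re

# Matches lines such as:
#   omap-iommu 40d01000.mmu: iommu fault: da 0x50d05000 flags 0x0
FAULT_RE = re.compile(
    r"omap-iommu\s+(?P<mmu>[0-9a-f]+\.mmu):\s+iommu fault:\s+"
    r"da\s+(?P<da>0x[0-9a-f]+)",
    re.IGNORECASE,
)


def find_iommu_faults(log_text):
    """Return a list of (mmu_name, device_address) tuples found in log_text."""
    return [
        (m.group("mmu"), int(m.group("da"), 16))
        for m in FAULT_RE.finditer(log_text)
    ]


# Lines copied from the log in this thread.
SAMPLE = """\
[ 67.894541] omap-iommu 40d01000.mmu: iommu fault: da 0x50d05000 flags 0x0
[ 67.895417] remoteproc1: crash detected in 40800000.dsp: type mmufault
[ 67.897573] omap-iommu 41501000.mmu: iommu fault: da 0x50d06e00 flags 0x0
"""

if __name__ == "__main__":
    for mmu, da in find_iommu_faults(SAMPLE):
        print(f"{mmu}: fault at device address {da:#010x}")
```

Feeding it the saved `dmesg` output from several failing runs would show whether the device addresses always fall in the same range.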

regards

Suman

henry o over 3 years ago in reply to Suman Anna

Prodigy 90 points

Hello,
Thanks for your reply and kind analysis of the question. I'm the one who originally posted the question that was transferred here.

I've checked addresses 0x50d05000 and 0x50d06e00 but found no function or module related to them, so it's hard to say which sub-module is not running.

Below is a snapshot of the memory map:

.../binaries/apps_jh6/tda2xx_evm_linux_all/vision_sdk/bin/tda2xx-evm/vision_sdk_c66xdsp_1_release.xe66.map

MEMORY CONFIGURATION

name                    origin    length    used      unused    attr  fill
----------------------  --------  --------  --------  --------  ----  ----
L2SRAM                  00800000  00038000  000377c0  00000840  RW X
OCMC_RAM1               40300000  00080000  00000000  00080000  RW X
OCMC_RAM2               40400000  00100000  00000000  00100000  RW X
OCMC_RAM3               40500000  00100000  00000000  00100000  RW X
DSP1_L2_SRAM            40800000  00048000  00000000  00048000  RWIX
DSP2_L2_SRAM            41000000  00048000  00000000  00048000  RWIX
NDK_MEM                 84000000  00200000  00000000  00200000  RWIX
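As a quick cross-check, the two fault addresses can be tested against the regions listed in this snapshot. This is only a sketch: `REGIONS` holds just the entries shown above (the full map file has more), and `find_region` is a hypothetical helper.

```python
# (name, origin, length) taken from the map snapshot above; the snapshot is
# partial, so a miss here does not prove the address is unmapped everywhere.
REGIONS = [
    ("L2SRAM", 0x00800000, 0x00038000),
    ("OCMC_RAM1", 0x40300000, 0x00080000),
    ("OCMC_RAM2", 0x40400000, 0x00100000),
    ("OCMC_RAM3", 0x40500000, 0x00100000),
    ("DSP1_L2_SRAM", 0x40800000, 0x00048000),
    ("DSP2_L2_SRAM", 0x41000000, 0x00048000),
    ("NDK_MEM", 0x84000000, 0x00200000),
]


def find_region(addr):
    """Return the name of the listed region containing addr, or None."""
    for name, origin, length in REGIONS:
        if origin <= addr < origin + length:
            return name
    return None


if __name__ == "__main__":
    for fault_addr in (0x50D05000, 0x50D06E00):
        region = find_region(fault_addr)
        print(f"{fault_addr:#010x}: {region or 'not in any listed region'}")
```

Neither address falls inside any of the listed regions, which is consistent with what I saw when checking by hand.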

Since this MMU fault occurs randomly, it is hard to troubleshoot.

thanks and best rgds.

henry
