Other Parts Discussed in Thread: AM5718, DRA722, TCA9535, PCA9535, PMP, ADS7950
Hi,
Here is the issue customer met and the debug process.
1. Development environment description:
Kernel version: linux-rt-4.19.94
SDK version: processor_sdk_rtos_am57xx_08_01_00_09
2. Problem scenarios:
In the recent period, we have encountered two problems where the DSP and M4 core failed to start during the startup process of AM5716.
When I encountered the problem for the first time, I found through the application log that the M4 core IPC could not communicate. I tried to find error information in the trace log, but found that no trace log was generated.

If it is suspected that the M4 core fails to start, check whether the M4 core firmware file size is normal.

Kill the lad_dra7xx process, manually execute unbind and bind and try to pull up the M4 core again. It was found that the M4 core can start normally.

3. Analysis process:
Part of the dmesg information when the problem occurred is as follows:

Part of the dmesg information after the faulty device is restarted is as follows:

DSP and M4 core firmware are stored in the /lib/firmware directory of the Linux file system. By comparing the two startup logs, it is suspected that when the problem occurred, the root file system had not been mounted, causing the real-time core to fail to load the firmware.
When encountering the problem for the second time, the trace log was not generated. The initial judgment was that the DSP core failed to start. The difference is that the scene tried to pull up the DSP core again, but it still couldn't. Since the device has been restarted, the dmesg information is not retained, only lad.txt generated by the lad_dra7xx process is retained.
0.446170]
Initializing LAD... [0.446524] NameServer_setup: entered, refCount=0
[0.446582] NameServer_setup: creating listener thread
[0.446699] NameServer_setup: exiting, refCount=1
[0.446774] NameServer_create(): 'GateMP'
[0.446799]
opening FIFO: /tmp/LAD/LADCMDS
[0.449104] listener_cb: Entered Listener thread.
[0.449124] NameServer: waiting for unblockFd: 0, and socks: maxfd: 0
[5.467336] Retrieving command...
[5.467621]
LAD_CONNECT:
[5.467634] client FIFO name = /tmp/LAD/689
[5.467641] client PID = 689
[5.467648] assigned client handle = 0
[5.467680] FIFO /tmp/LAD/689 created
[5.467724]
WARN: Client /tmp/LAD/689 has not yet opened, will retry
[5.467965] FIFO /tmp/LAD/689 opened for writing
[5.468005] sent response
[5.468013] DONE
[5.468019] Retrieving command...
[5.468133] Sending response...
[5.468153] Retrieving command...
[5.468208] LAD_MULTIPROC_GETCONFIG: calling MultiProc_getConfig()...
[5.468221] MultiProc_getConfig() - 5 procs
[5.468227] # processors in cluster: 5
[5.468233] cluster baseId: 0
[5.468238] ProcId 0 - "HOST"
[5.468244] ProcId 1 - "IPU2"
[5.468250] ProcId 2 - "IPU1"
[5.468256] ProcId 3 - "DSP2"
[5.468261] ProcId 4 - "DSP1"
[5.468267] status = 0
[5.468273] DONE
[5.468278] Sending response...
[5.468292] Retrieving command...
[5.468347] LAD_NAMESERVER_SETUP: calling NameServer_setup()...
[5.468357] NameServer_setup: entered, refCount=1
[5.468364] NameServer_setup: already setup
[5.468369] NameServer_setup: exiting, refCount=2
[5.468375] status = 1
[5.468381] DONE
[5.468386] Sending response...
[5.468400] Retrieving command...
[5.468451] LAD_MESSAGEQ_GETCONFIG: calling MessageQ_getConfig()...
[5.468461] status = 0
[5.468467] DONE
[5.468472] Sending response...
[5.468486] Retrieving command...
[5.468536] LAD_MESSAGEQ_SETUP: calling MessageQ_setup()...
[5.468546] MessageQ_setup: entered, refCount=0
[5.468553] NameServer_create(): 'MessageQ'
[5.468564] MessageQ_setup: exiting, refCount=1
[5.468571] status = 0
[5.468576] DONE
[5.468581] Sending response...
[5.468596] Retrieving command...
[5.468876] NameServer_attach: --> procId=1, refCount=0
[5.468905] NameServer_attach: created send socket: 5, procId 1
[5.468935] NameServer_attach: connect failed: procId=1, errno=22 (Invalid argument)
[5.468953] closing send socket: 5
[5.468979] NameServer_attach: <-- refCount=0, status=-1
[5.468987] Sending response...
[5.469005] Retrieving command...
[5.469055] NameServer_attach: --> procId=2, refCount=0
[5.469074] NameServer_attach: created send socket: 5, procId 2
[5.469085] NameServer_attach: connect failed: procId=2, errno=22 (Invalid argument)
[5.469094] closing send socket: 5
[5.469113] NameServer_attach: <-- refCount=0, status=-1
[5.469121] Sending response...
[5.469137] Retrieving command...
[5.469184] NameServer_attach: --> procId=3, refCount=0
[5.469202] NameServer_attach: created send socket: 5, procId 3
[5.469212] NameServer_attach: connect failed: procId=3, errno=22 (Invalid argument)
[5.469221] closing send socket: 5
[5.469240] NameServer_attach: <-- refCount=0, status=-1
[5.469248] Sending response...
[5.469263] Retrieving command...
[5.469310] NameServer_attach: --> procId=4, refCount=0
[5.469327] NameServer_attach: created send socket: 5, procId 4
[5.469337] NameServer_attach: connect failed: procId=4, errno=22 (Invalid argument)
[5.469346] closing send socket: 5
[5.469364] NameServer_attach: <-- refCount=0, status=-1
[5.469373] Sending response...
[5.469387] Retrieving command...
[6.111141] LAD_MESSAGEQ_CREATE: calling MessageQ_create(0x2d7b0, 0x2d7d0)...
[6.111162] MessageQ_create: creating 'Master:MsgQ:00'
[6.111173] MessageQ_create: returning obj=0x32640, qid=0x80
[6.111180] status = 0
[6.111187] DONE
[6.111192] Sending response...
[6.111210] Retrieving command...
[6.688838] LAD_MESSAGEQ_ANNOUNCE: calling MessageQ_announce(0x2d7b0, 0x32640)...
[6.688856] MessageQ_announce: announcing 0x32640
[6.688868] NameServer_add: Entered key: 'Master:MsgQ:00', data: 0x80
[6.688877] status = 0
[6.688883] DONE
[6.688888] Sending response...
[6.688927] Retrieving command...
[6.689214] LAD_MESSAGEQ_CREATE: calling MessageQ_create(0x2d7b0, 0x2d7d0)...
[6.689227] MessageQ_create: creating 'Goose:MsgQ:01'
[6.689279] MessageQ_create: returning obj=0x326b0, qid=0x81
[6.689288] status = 0
[6.689294] DONE
[6.689299] Sending response...
[6.689314] Retrieving command...
[6.689586] LAD_MESSAGEQ_ANNOUNCE: calling MessageQ_announce(0x2d7b0, 0x326b0)...
[6.689597] MessageQ_announce: announcing 0x326b0
[6.689605] NameServer_add: Entered key: 'Goose:MsgQ:01', data: 0x81
[6.689612] status = 0
[6.689618] DONE
[6.689623] Sending response...
[6.689638] Retrieving command...
[6.689734] LAD_NAMESERVER_GETUINT32: calling NameServer_getUInt32(0x32558, 'DSP1:Master:00')...
[6.689747] NameServer_getLocal: entry key: 'DSP1:Master:00' not found!
[6.689755] NameServer_getRemote: no socket connection to processor 1
[6.689761] NameServer_getRemote: no socket connection to processor 2
[6.689767] NameServer_getRemote: no socket connection to processor 3
[6.689773] NameServer_getRemote: no socket connection to processor 4
[6.689779] value = 0x81
[6.689806] status = -5
[6.689813] DONE
[6.689818] Sending response...
[6.689832] Retrieving command...
[7.690062] LAD_NAMESERVER_GETUINT32: calling NameServer_getUInt32(0x32558, 'DSP1:Master:00')...
[7.690088] NameServer_getLocal: entry key: 'DSP1:Master:00' not found!
[7.690096] NameServer_getRemote: no socket connection to processor 1
[7.690103] NameServer_getRemote: no socket connection to processor 2
[7.690109] NameServer_getRemote: no socket connection to processor 3
[7.690115] NameServer_getRemote: no socket connection to processor 4
[7.690120] value = 0x81
[7.690126] status = -5
[7.690132] DONE
[7.690137] Sending response...
[7.690155] Retrieving command...
[8.690389] LAD_NAMESERVER_GETUINT32: calling NameServer_getUInt32(0x32558, 'DSP1:Master:00')...
[8.690425] NameServer_getLocal: entry key: 'DSP1:Master:00' not found!
[8.690434] NameServer_getRemote: no socket connection to processor 1
[8.690441] NameServer_getRemote: no socket connection to processor 2
[8.690447] NameServer_getRemote: no socket connection to processor 3
[8.690453] NameServer_getRemote: no socket connection to processor 4
[8.690459] value = 0x81
[8.690465] status = -5
[8.690470] DONE
[8.690476] Sending response...
[8.690492] Retrieving command...
[9.690703] LAD_NAMESERVER_GETUINT32: calling NameServer_getUInt32(0x32558, 'DSP1:Master:00')...
[9.690728] NameServer_getLocal: entry key: 'DSP1:Master:00' not found!
[9.690737] NameServer_getRemote: no socket connection to processor 1
[9.690744] NameServer_getRemote: no socket connection to processor 2
[9.690751] NameServer_getRemote: no socket connection to processor 3
[9.690757] NameServer_getRemote: no socket connection to processor 4
[9.690762] value = 0x81
[9.690768] status = -5
[9.690774] DONE
[9.690779] Sending response...
[9.690818] Retrieving command...
[10.691047] LAD_NAMESERVER_GETUINT32: calling NameServer_getUInt32(0x32558, 'DSP1:Master:00')...
[10.691073] NameServer_getLocal: entry key: 'DSP1:Master:00' not found!
[10.691082] NameServer_getRemote: no socket connection to processor 1
[10.691089] NameServer_getRemote: no socket connection to processor 2
[10.691096] NameServer_getRemote: no socket connection to processor 3
[10.691102] NameServer_getRemote: no socket connection to processor 4
[10.691108] value = 0x81
[10.691114] status = -5
[10.691119] DONE
[10.691124] Sending response...
[10.691142] Retrieving command...
[11.691368] LAD_NAMESERVER_GETUINT32: calling NameServer_getUInt32(0x32558, 'DSP1:Master:00')...
[11.691393] NameServer_getLocal: entry key: 'DSP1:Master:00' not found!
[11.691401] NameServer_getRemote: no socket connection to processor 1
[11.691409] NameServer_getRemote: no socket connection to processor 2
[11.691415] NameServer_getRemote: no socket connection to processor 3
[11.691421] NameServer_getRemote: no socket connection to processor 4
[11.691427] value = 0x81
[11.691432] status = -5
[11.691438] DONE
[11.691443] Sending response...
[11.691461] Retrieving command...
[12.691749] LAD_NAMESERVER_GETUINT32: calling NameServer_getUInt32(0x32558, 'DSP1:Master:00')...
[12.691776] NameServer_getLocal: entry key: 'DSP1:Master:00' not found!
[12.691808] NameServer_getRemote: no socket connection to processor 1
[12.691818] NameServer_getRemote: no socket connection to processor 2
[12.691824] NameServer_getRemote: no socket connection to processor 3
[12.691831] NameServer_getRemote: no socket connection to processor 4
[12.691837] value = 0x81
[12.691844] status = -5
[12.691850] DONE
[12.691856] Sending response...
[12.691873] Retrieving command...
[13.692110] LAD_NAMESERVER_GETUINT32: calling NameServer_getUInt32(0x32558, 'DSP1:Master:00')...
[13.692135] NameServer_getLocal: entry key: 'DSP1:Master:00' not found!
[13.692143] NameServer_getRemote: no socket connection to processor 1
[13.692151] NameServer_getRemote: no socket connection to processor 2
[13.692157] NameServer_getRemote: no socket connection to processor 3
[13.692162] NameServer_getRemote: no socket connection to processor 4
[13.692169] value = 0x81
[13.692175] status = -5
[13.692180] DONE
4.
Questions:
The background of the first real-time core startup failure is that developers replace applications. The steps are: (1) Kill the running application first; (2) Copy the application to the device through scp under vscode and replace the application; (3) )reboot.
The background of the second real-time core startup failure was that the tester upgraded the application. The steps were: (1) Upload the upgrade package via http on the web page; (2) Decompress the upgrade package; (3) Restart.
The two operations are similar, that is, replacing the application, and it is suspected that the problem is the same. The question is, compared with normal circumstances, why does it take nearly 10 seconds for the root file system to start mounting? And this problem is not easy to reproduce.
Regards
Zekun