I'm struggling with getting IPC working with the DSP. It seems like I'm on the right track, but I feel that I've hit sort of a roadblock with respect to finding more detailed documentation for troubleshooting. I am using SDK 02.00.02.11 and the unmodified DSP firmware and IPC libraries. Below is what I have been able to validate so far:
At startup, the DSP seems to be coming up and its firmware is loaded.
# dmesg | grep dsp [ 0.000000] Reserved memory: initialized node dsp1_cma@99000000, compatible id shared-dma-pool [ 20.689664] omap-rproc 40800000.dsp: assigned reserved memory node dsp1_cma@99000000 [ 20.689692] remoteproc2: 40800000.dsp is available [ 22.837667] remoteproc2: powering up 40800000.dsp [ 22.837680] remoteproc2: Booting fw image dra7-dsp1-fw.xe66, size 21365636 [ 22.843174] omap_hwmod: mmu0_dsp1: _wait_target_disable failed [ 22.943430] remoteproc2: remote processor 40800000.dsp is now up
The message: "_wait_target_disable failed" is a little disconcerting; but the last line shows the DSP is up, so perhaps it is not an issue...
I have created CMA memory pools for the two IPU's and the DSP per the am57xx-idk DTS example:
# dmesg | grep CMA [ 0.000000] Reserved memory: created CMA memory pool at 0x0000000095800000, size 56 MiB [ 0.000000] Reserved memory: created CMA memory pool at 0x0000000099000000, size 64 MiB [ 0.000000] Reserved memory: created CMA memory pool at 0x000000009d000000, size 32 MiB
# dmesg | grep cma [ 0.000000] Reserved memory: initialized node ipu2_cma@95800000, compatible id shared-dma-pool [ 0.000000] Reserved memory: initialized node dsp1_cma@99000000, compatible id shared-dma-pool [ 0.000000] Reserved memory: initialized node ipu1_cma@9d000000, compatible id shared-dma-pool [ 0.000000] cma: Reserved 128 MiB at 0x00000000b7c00000 [ 0.000000] Memory: 543704K/848896K available (6251K kernel code, 294K rwdata, 2212K rodata, 320K init, 254K bss, 18472K reserved, 286720K cma-reserved, 136192K highmem) [ 20.688756] omap-rproc 58820000.ipu: assigned reserved memory node ipu1_cma@9d000000 [ 20.689495] omap-rproc 55020000.ipu: assigned reserved memory node ipu2_cma@95800000 [ 20.689664] omap-rproc 40800000.dsp: assigned reserved memory node dsp1_cma@99000000
I am also making sure that the CMEM kernel module is loaded with:
# dmesg | grep cmem [ 1661.071894] cmemk initialized
And am succeeding in creating a contiguous memory partitions for use with the DSP.
An 80MB CMEM partition is created per below:
# cat /proc/iomem | grep CMEM a0000000-a4ffffff : CMEM
And it looks like that memory is available for use:
# cat /proc/cmem Block 0: Pool 0: 1 bufs size 0x5000000 (0x5000000 requested) Pool 0 busy bufs: Pool 0 free bufs: id 0: phys addr 0xa0000000
Below is a log trace from the LAD log after starting the LAD daemon with:
# lad_dra7xx -l lad.txt -p 777 Opened log file: lad.txt Set LAD's directory permissions to '777' numProcessors = 5 id = 0 baseId = 0 Spawned daemon: lad_dra7xx
Tail the LAD log:
# tail -f /tmp/LAD/lad.txt [0.802920] Initializing LAD... [0.804143] opening FIFO: /tmp/LAD/LADCMDS
And run one of the opencl examples such as:
# /usr/share/ti/examples/opencl/platforms/platforms
At this point the example program hangs until ^C, and in another terminal I have the following LAD log output. The -5 status code output is printed many times prior to the disconnect at the end of the trace.
Expand the source section below to view the LAD log trace:
[0.693855] Initializing LAD... [0.696416] NameServer_setup: entered, refCount=0 [0.696471] NameServer_setup: creating listener thread [0.696543] NameServer_setup: exiting, refCount=1 [0.696629] NameServer_create(): 'GateMP' [0.696647] opening FIFO: /tmp/LAD/LADCMDS [0.698442] listener_cb: Entered Listener thread. [0.698462] NameServer: waiting for unblockFd: 0, and socks: maxfd: 0 [17015278.660230] Retrieving command... [17015278.660586] LAD_CONNECT: [17015278.660600] client FIFO name = /tmp/LAD/1585 [17015278.660607] client PID = 1585 [17015278.660614] assigned client handle = 0 [17015278.660873] FIFO /tmp/LAD/1585 created [17015278.661349] FIFO /tmp/LAD/1585 opened for writing [17015278.661381] sent response [17015278.661389] DONE [17015278.661396] Retrieving command... [17015278.661464] Sending response... [17015278.661477] Retrieving command... [17015278.661510] LAD_MULTIPROC_GETCONFIG: calling MultiProc_getConfig()... [17015278.661522] MultiProc_getConfig() - 5 procs [17015278.661530] # processors in cluster: 5 [17015278.661536] cluster baseId: 0 [17015278.661543] ProcId 0 - "HOST" [17015278.661550] ProcId 1 - "IPU2" [17015278.661557] ProcId 2 - "IPU1" [17015278.661563] ProcId 3 - "DSP2" [17015278.661570] ProcId 4 - "DSP1" [17015278.661577] status = 0 [17015278.661583] DONE [17015278.661589] Sending response... [17015278.661598] Retrieving command... [17015278.661637] LAD_NAMESERVER_SETUP: calling NameServer_setup()... [17015278.661647] NameServer_setup: entered, refCount=1 [17015278.661654] NameServer_setup: already setup [17015278.661660] NameServer_setup: exiting, refCount=2 [17015278.661666] status = 1 [17015278.661673] DONE [17015278.661679] Sending response... [17015278.661688] Retrieving command... [17015278.661716] LAD_MESSAGEQ_GETCONFIG: calling MessageQ_getConfig()... [17015278.661725] status = 0 [17015278.661732] DONE [17015278.661738] Sending response... [17015278.661746] Retrieving command... [17015278.661773] LAD_MESSAGEQ_SETUP: calling MessageQ_setup()... [17015278.661781] MessageQ_setup: entered, refCount=0 [17015278.661789] NameServer_create(): 'MessageQ' [17015278.661800] MessageQ_setup: exiting, refCount=1 [17015278.661807] status = 0 [17015278.661813] DONE [17015278.661819] Sending response... [17015278.661828] Retrieving command... [17015278.661977] NameServer_attach: --> procId=1, refCount=0 [17015278.677036] NameServer_attach: created send socket: 6, procId 1 [17015278.677084] NameServer_attach: connect failed: procId=1, errno=22 (Invalid argument) [17015278.677108] closing send socket: 6 [17015278.677128] NameServer_attach: <-- refCount=0, status=-1 [17015278.677137] Sending response... [17015278.677154] Retrieving command... [17015278.677204] NameServer_attach: --> procId=2, refCount=0 [17015278.677220] NameServer_attach: created send socket: 6, procId 2 [17015278.677230] NameServer_attach: connect failed: procId=2, errno=22 (Invalid argument) [17015278.677240] closing send socket: 6 [17015278.677255] NameServer_attach: <-- refCount=0, status=-1 [17015278.677263] Sending response... [17015278.677275] Retrieving command... [17015278.677303] NameServer_attach: --> procId=3, refCount=0 [17015278.677317] NameServer_attach: created send socket: 6, procId 3 [17015278.677326] NameServer_attach: connect failed: procId=3, errno=22 (Invalid argument) [17015278.677335] closing send socket: 6 [17015278.677348] NameServer_attach: <-- refCount=0, status=-1 [17015278.677355] Sending response... [17015278.677367] Retrieving command... [17015278.677393] NameServer_attach: --> procId=4, refCount=0 [17015278.677407] NameServer_attach: created send socket: 6, procId 4 [17015278.677416] NameServer_attach: connect failed: procId=4, errno=22 (Invalid argument) [17015278.677425] closing send socket: 6 [17015278.677436] NameServer_attach: <-- refCount=0, status=-1 [17015278.677444] Sending response... [17015278.677456] Retrieving command... [17015278.677492] LAD_GATEMP_ISSETUP: calling GateMP_isSetup()... [17015278.677501] status = 0 [17015278.677508] DONE [17015278.677527] Sending response... [17015278.677539] Retrieving command... [17015278.677571] LAD_GATEHWSPINLOCK_GETCONFIG: calling GateHWSpinlock_getConfig()... [17015278.677580] GateHWSpinlock_getConfig() baseAddr = 0x4a0f6000 [17015278.677587] size = 0x1000 offset = 0x800 [17015278.677594] status = 0 [17015278.677601] DONE [17015278.677607] Sending response... [17015278.677617] Retrieving command... [17015278.677734] LAD_GATEMP_START: calling GateMP_start()... [17015278.677746] status = 0 [17015278.677753] DONE [17015278.677759] Sending response... [17015278.677770] Retrieving command... [17015278.677796] LAD_GATEMP_GETNUMRESOURCES: calling GateMP_getNumResources()... [17015278.677805] status = 0 [17015278.677812] DONE [17015278.677817] Sending response... [17015278.677828] Retrieving command... [17015278.677865] LAD_NAMESERVER_GET: calling NameServer_get(0x2f230, '_GateMP_TI_dGate'[17015278.677875] )... [17015278.677885] NameServer_getLocal: entry key: '_GateMP_TI_dGate' not found! [17015278.677893] NameServer_getRemote: no socket connection to processor 1 [17015278.677901] NameServer_getRemote: no socket connection to processor 2 [17015278.677908] NameServer_getRemote: no socket connection to processor 3 [17015278.677914] NameServer_getRemote: no socket connection to processor 4 [17015278.677921] value = 0x10 [17015278.677927] status = -5 [17015278.677934] DONE [17015278.677939] Sending response... [17015278.677950] Retrieving command... [17015278.678171] LAD_MESSAGEQ_CREATE: calling MessageQ_create(0x2ead8, 0x2eaf8)... [17015278.678185] MessageQ_create: creating 'OCL:MsgQ:1585' [17015278.678198] NameServer_add: Entered key: 'OCL:MsgQ:1585', data: 0x80 [17015278.678206] MessageQ_create: returning obj=0x2f658, qid=0x80 [17015278.678214] status = 0 [17015278.678220] DONE [17015278.678226] Sending response... [17015278.678239] Retrieving command... [17015278.678285] LAD_NAMESERVER_GETUINT32: calling NameServer_getUInt32(0x2f570, 'OCL:DSP1:MsgQ')... [17015278.678296] NameServer_getLocal: entry key: 'OCL:DSP1:MsgQ' not found! [17015278.678304] NameServer_getRemote: no socket connection to processor 1 [17015278.678346] NameServer_getRemote: no socket connection to processor 2 [17015278.678353] NameServer_getRemote: no socket connection to processor 3 [17015278.678360] NameServer_getRemote: no socket connection to processor 4 [17015278.678366] value = 0x80 [17015278.678372] status = -5 [17015278.678378] DONE ... [17243489.757855] Sending response... [17243489.760522] Retrieving command... [17243489.760624] LAD_MESSAGEQ_DESTROY: calling MessageQ_destroy()... [17243489.760634] MessageQ_destroy: entered, refCount=1 [17243489.760643] MessageQ_delete: deleting 0x2f658 [17243489.760660] MessageQ_delete: returning 0 [17243489.760673] MessageQ_destroy: exiting, refCount=0 [17243489.760680] status = 0 [17243489.760687] DONE [17243489.760694] Sending response... [17243489.760704] Retrieving command... [17243489.763787] EOF detected on FIFO, closing FIFO: /tmp/LAD/LADCMDS [17243489.763839] opening FIFO: /tmp/LAD/LADCMDS
I found the following IPC troubleshooting FAQ:
processors.wiki.ti.com/.../IPC_3.x_FAQ
This is because LAD is looking for the default GateMP instance, and did not find it. On the DRA7xx, verify
- SharedRegion 0 is defined on DSP1. GateMP implementation on the host requires use of SharedRegion
- DSP1 is the owner of SharedRegion 0
- Ipc_start() is called on DSP1. This ensures the default GateMP instance is created.
At the end of the log I am getting an error that is close to the one shown in the link above, but I don’t know how to proceed with the verification steps outlined in the FAQ.
These initial errors appear to be more like:
processors.wiki.ti.com/.../IPC_3.x_FAQ
Internally, LAD tries to connect to sockets created by the rpmsg_proto kernel driver. This connection can fail for several reasons, so this is a common "general" error code. Some examples that generate this error condition are below.
- Failure to load the slave image. If the slave executable fails to load, the sockets to communicate with it are not created. Make sure the slaves you need to communicate with are already loaded.
- Failure to provide the correct KERNEL_INSTALL_DIR when building the user-side IPC libraries. Some customers reported this error when using Linux 3.9+ kernels (where the AF_MAX value in socket.h has increased). This created a mismatch in what AF_RPMSG was set to in user libs and the kernel, and the connect call failed. The solution was to require users set KERNEL_INSTALL_DIR when building the user-space libraries so IPC can interrogate the kernel version and set AF_RPMSG appropriately.
- Failure to configure the slave image correctly. Internally, the slave-side MessageQ transport (TransportRpmsg) broadcasts its availability to the Linux host. If this broadcast doesn't occur (e.g., b/c the slave wasn't configured with TransportRpmsg), the socket connection will fail with error 22.
- General slave-to-host interrupt failure. Some devices (e.g. DRA7XX) have interrupt crossbars that must be configured correctly as part of system boot (e.g. uboot). If these crossbars aren't initialized correctly, the slave-side broadcast that TransportRpmsg is available won't be received, the underlying socket connection won't be available, and the first sign of failure will be LAD trying to connect to the socket.
Any help or advice that I could get to test the following functionality of the DSP would be much appreciated, e.g.:
- If the slave executable fails to load, the sockets to communicate with it are not created,
- How can I determine if the slave executable is loaded?
- How can I verify that there is no "mismatch in what AF_RPMSG was set to in user libs and the kernel"?
- How can I ensure the slave is configured with: TransportRpmsg?
- And lastly, this seems to be the most plausible issue: how can I ensure that the interrupt crossbar is configured correctly?
Thanks,
Weston