Ipc_start fails with NameServer issue

I have been trying to get the IPC demo working on the Keystone II, and I am having difficulty getting MessageQApp to work.

My procedure from a reboot is the following:

1) Start the DSP

root@keystone-evm:/lib/firmware# mpmcl reset dsp0
reset succeeded
root@keystone-evm:/lib/firmware# mpmcl reset dsp1
reset succeeded
root@keystone-evm:/lib/firmware# mpmcl load dsp0 messageq_single.xe66
load succeeded
root@keystone-evm:/lib/firmware# mpmcl run dsp0                       
run succeeded

Checking the trace buffer seems okay:

root@keystone-evm:/usr/local/bin# cat /debug/remoteproc/remoteproc0/trace0
3 Resource entries at 0x800000
messageq_single.c:main: MultiProc id = 1
registering rpmsg-proto service on 61 with HOST
tsk1Fxn: created MessageQ: SLAVE_CORE0; QueueID: 0x10000
Awaiting sync message from host...
root@keystone-evm:/usr/local/bin#

This is interesting.  Note that I do get the following log messages which indicate to me that some basic communication is happening between the ARM and the DSP.  

[   76.862704]  remoteproc0: powering up 2620040.dsp0
[   76.867348] virtio_rpmsg_bus virtio0: rpmsg host is online
[   76.871861]  remoteproc0: registered virtio0 (type 7)
[   76.876450] virtio_rpmsg_bus virtio0: creating channel rpmsg-proto addr 0x3d
[   76.882268] rpmsg_proto rpmsg0: inserting rpmsg src: 1024, dst: 61

2) Start the ARM

root@keystone-evm:/usr/local/bin# ./MessageQApp
Ipc_start: NameServer_setup() failed: -1
Using numLoops: 100; procId : 1
Ipc_start failed: status = -1
root@keystone-evm:/usr/local/bin#

Looking at the LAD output .....

root@keystone-evm:/usr/local/bin# cat /var/tmp/LAD/lad.txt

Initializing LAD...
    opening FIFO: /tmp/LAD/LADCMDS
Retrieving command...

LAD_CONNECT:
    client FIFO name = /tmp/LAD/1817
    client PID = 1817
    assigned client handle = 0
    FIFO /tmp/LAD/1817 created
    FIFO /tmp/LAD/1817 opened for writing
    sent response
DONE
Retrieving command...
LAD_MULTIPROC_GETCONFIG: calling MultiProc_getConfig()...
MultiProc_getConfig() - 9 procs
        Proc 0 - "HOST"
        Proc 1 - "CORE0"
        Proc 2 - "CORE1"
        Proc 3 - "CORE2"
        Proc 4 - "CORE3"
        Proc 5 - "CORE4"
        Proc 6 - "CORE5"
        Proc 7 - "CORE6"
        Proc 8 - "CORE7"
    status = 0
DONE
Sending response...
Retrieving command...
LAD_NAMESERVER_SETUP: calling NameServer_setup()...
NameServer_setup: entered, refCount=0
NameServer_setup: socket failed: 97, Address family not supported by protocol
NameServer_setup: socket failed: 97, Address family not supported by protocol
NameServer_setup: socket failed: 97, Address family not supported by protocol
NameServer_setup: socket failed: 97, Address family not supported by protocol
NameServer_setup: socket failed: 97, Address family not supported by protocol
NameServer_setup: socket failed: 97, Address family not supported by protocol
NameServer_setup: socket failed: 97, Address family not supported by protocol
NameServer_setup: socket failed: 97, Address family not supported by protocol
NameServer_setup: socket failed: 97, Address family not supported by protocol
NameServer_setup: socket failed: 97, Address family not supported by protocol
NameServer_setup: socket failed: 97, Address family not supported by protocol
NameServer_setup: socket failed: 97, Address family not supported by protocol
NameServer_setup: socket failed: 97, Address family not supported by protocol
NameServer_setup: socket failed: 97, Address family not supported by protocol
NameServer_setup: socket failed: 97, Address family not supported by protocol
NameServer_setup: socket failed: 97, Address family not supported by protocol
NameServer_setup: creating listener thread
NameServer_setup: exiting, refCount=1
    status = -1
DONE
listener_cb: Entered Listener thread.
Sending response...
NameServer: waiting for unblockFd: 2, and socks: maxfd: 3
Retrieving command...
    EOF detected on FIFO, closing FIFO: /tmp/LAD/LADCMDS

    opening FIFO: /tmp/LAD/LADCMDS
root@keystone-evm:/usr/local/bin#
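A LAD log like the one above can be summarized without reading it line by line. The following is a minimal sketch in C that counts the `socket failed` lines and extracts the error number from them. The function name `count_socket_failures` is hypothetical, and the line format is assumed to match the log quoted above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count "socket failed: <errno>" entries in LAD log text.
 * If last_errno is not NULL, it receives the errno of the last entry found.
 * The marker string is taken from the log quoted in this thread. */
int count_socket_failures(const char *log, int *last_errno)
{
    static const char marker[] = "socket failed: ";
    const char *p = log;
    int count = 0;

    assert(log != NULL);

    while ((p = strstr(p, marker)) != NULL) {
        int err;

        p += sizeof(marker) - 1;          /* skip past the marker text */
        if (sscanf(p, "%d", &err) == 1) { /* read the number that follows */
            count++;
            if (last_errno != NULL)
                *last_errno = err;
        }
    }
    return count;
}
```

Passing the contents of /var/tmp/LAD/lad.txt as `log` would report how many socket attempts failed and with which error number; 97 corresponds to "Address family not supported by protocol" in the log above.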

Two other items  are:

  1. NameServerApp works as long as it is the first to run.  A MessageQApp failure causes the LAD daemon to get into a funky state.
  2. ping_rpmsg does not work.  I get the following error:

root@keystone-evm:/usr/local/bin# ./ping_rpmsg
socket failed: Address family not supported by protocol (97)

It seems that if I can get past this socket problem, the system ought to work. 

  • Robert,

    This is turning out to be a sticky problem (I saw your post to the other thread asking William if he had found a solution).

    I guess we need to dig down into the dirty details...

    You say you have the kernel socket.h patch.  Would you please attach the file <linuxkernel>/include/linux/socket.h?

    Also, could you attach <linuxkernel>/include/net/rpmsg.h?

    Those files are for the kernel, and there needs to be a corresponding user-side definition for AF_RPMSG. The file <ipc>/linux/include/net/rpmsg.h defines AF_RPMSG as either 40 or 41, depending on the kernel version.  Do you know your kernel version?  The top-level Makefile in your kernel source tree has it at the top of the file. Here is mine for reference:
    VERSION = 3
    PATCHLEVEL = 8
    SUBLEVEL = 13
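The Makefile shows the version of the kernel source tree, which is not necessarily the kernel that is actually running on the board. As a complement, here is a minimal sketch in C that reads the running kernel's release string through uname(2) and splits it into the same three numbers. The helper names `parse_kernel_release` and `print_running_kernel` are hypothetical, and the sketch assumes the release string starts with "major.minor.patch".

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Parse "major.minor.patch" from a release string such as "3.8.13" or
 * "3.10.10-rt7". Returns 0 on success, -1 if the string does not start
 * with three dot-separated integers. */
int parse_kernel_release(const char *release, int *major, int *minor, int *patch)
{
    assert(release && major && minor && patch);
    return sscanf(release, "%d.%d.%d", major, minor, patch) == 3 ? 0 : -1;
}

/* Print the release of the running kernel as reported by uname(2).
 * Returns 0 on success, -1 on failure. */
int print_running_kernel(void)
{
    struct utsname u;
    int major, minor, patch;

    if (uname(&u) != 0) {
        perror("uname");
        return -1;
    }
    if (parse_kernel_release(u.release, &major, &minor, &patch) != 0) {
        fprintf(stderr, "unexpected release string: %s\n", u.release);
        return -1;
    }
    printf("running kernel: %d.%d.%d (%s)\n", major, minor, patch, u.release);
    return 0;
}
```

Calling `print_running_kernel()` from a small program on the target gives the numbers to compare against VERSION, PATCHLEVEL and SUBLEVEL in the Makefile.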
    Sometimes it's hard to know which value is being used, so could you print out AF_RPMSG from a program? For example:
        printf("%d\n", AF_RPMSG);

    Since the ping_rpmsg application is also failing in this way, it is a good candidate for the above printf("%d\n", AF_RPMSG). That way you'll know that you're printing the value from a program that fails when trying to use it.
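Besides printing the compiled-in value, it can help to ask the running kernel which family numbers it accepts. The following is a minimal sketch in C, not taken from the IPC sources: it tries to open a socket for each of the two candidate values mentioned in this thread (40 and 41) and prints the result. The function names are hypothetical, and the use of SOCK_SEQPACKET is an assumption about the socket type the IPC code requests.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try to open a socket in the given address family.
 * Returns 0 if the kernel accepts the family, otherwise the errno value
 * set by socket(). SOCK_SEQPACKET is an assumption, see the text above. */
int probe_family(int family)
{
    int fd;

    assert(family >= 0);

    fd = socket(family, SOCK_SEQPACKET, 0);
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}

/* Probe the two candidate AF_RPMSG values from this thread and print
 * one line per candidate. */
void report_rpmsg_candidates(void)
{
    static const int candidates[] = { 40, 41 };
    size_t i;

    for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
        int err = probe_family(candidates[i]);

        printf("family %d: %s\n", candidates[i],
               err ? strerror(err) : "accepted by kernel");
    }
}
```

Calling `report_rpmsg_candidates()` from main() on the target prints one line per candidate. A result of "Address family not supported by protocol" matches the errno 97 failure quoted earlier. Note that an accepted number only shows that some protocol is registered under it on that kernel, not necessarily rpmsg.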

    It's odd that NameServerApp works (when it is the first to run).  It makes the exact same LAD_connect() call: out of the box, NameServerApp.c #defines USE_NSD, which causes LAD_connect() to be called (unless you've rebuilt it after removing that #define).

    I recall that Keystone has the rpmsg_proto driver code built into the kernel (as opposed to being a loadable module, as with OMAP5).  Can you inspect your boot log and check whether there is some kind of failure in setting up rpmsg_proto?  The rpmsg_proto_init() function can fail with one of the following prints:
        pr_err("proto_register failed: %d\n", ret);
        pr_err("sock_register failed: %d\n", ret);
        pr_err("register_rpmsg_driver failed: %d\n", ret);
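Checking a long boot log for these three messages by eye is error-prone, so here is a minimal sketch in C that scans log text for them. The function name `find_rpmsg_proto_error` is hypothetical, and the prefixes are copied from the pr_err() calls quoted above. Other drivers may print similar text, so a match is only a hint to look closer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Prefixes of the three pr_err() messages quoted above. */
static const char *const rpmsg_proto_errors[] = {
    "proto_register failed:",
    "sock_register failed:",
    "register_rpmsg_driver failed:",
};

/* Return the first entry of the table above that occurs anywhere in the
 * log text, or NULL if none of them appears. */
const char *find_rpmsg_proto_error(const char *log)
{
    size_t i;

    assert(log != NULL);

    for (i = 0; i < sizeof(rpmsg_proto_errors) / sizeof(rpmsg_proto_errors[0]); i++) {
        if (strstr(log, rpmsg_proto_errors[i]) != NULL)
            return rpmsg_proto_errors[i];
    }
    return NULL;
}
```

Feeding it the captured kernel log (for example the saved output of dmesg) returns NULL when none of the three failures was logged.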

    Regards,

    - Rob

  • Hi, Robert,

    I saw you were running MessageQApp. Could you try /usr/bin/MessageQBench instead for keystone-II?

    Rex

  • Hi Rex --

    I tried that as well.

    I did make some progress.  I searched and found an FAQ that pointed to a mismatch between kernel versions regarding AF_RPMSG.  It turns out that there is a header file in the ipc directory that explicitly assigns the address family number based on kernel version.  The IPC library was choosing the wrong one, selecting 40 rather than 41.

    However, I now get the standard LAD error of 22, cannot connect.  Based on the log messages that appear on the ARM when I start the DSP applications, I am assuming that the DSP apps are configured correctly.  This behavior is the same with the default kernel supplied or my latest version. 

  • One more thing, there are no boot log errors with respect to rpmsg.

  • So, getting back to first principles, I  reinstalled the default file system.

    I copied the DSP executables (I built these) to the /lib/firmware directory and started messageq_single.

    I ran the /usr/bin/MessageQBench and it worked.  I took my built version of MessageQBench and it worked.  I only copied over the executable and none of the libraries.

    Next, I installed the ipc stuff via make install prefix=/..,.../usr/local.  That worked.

    Finally, I replaced the lad daemon (from /usr/local/bin to /usr/bin).  That worked.

    So, I can only conclude that something went awry in my root file system.

    So, it *appears* to be working now with my updated kernel (needed for RIO support).  I am currently using K2_LINUX_03.10.10_RIO_14.04.

  • Hi, Robert,

    I'm glad that it works for you now. Yes, the IPC and kernel need to be from the same MCSDK release.

    One note on the RIO branch: v3.8/rio-dev-dio is for kernel v3.8 and is more up to date than v3.10 because there has been more activity on the v3.8 kernel. Those changes should have been merged into v3.10/rio-dev-dio, but that has not happened yet.

    Rex

  • So, this is what I did at a high level

     

    1. Build the linux kernel with the following tag:   K2_LINUX_03.10.10_RIO_14.04

    2. Add the uio stuff to the device table (but I think that is not really needed) and build that.

    3. Modify a file in the ipc directory (linux/include/net/rpmsg.h) to set AF_RPMSG to 41.  The file contains a check that compares Linux version numbers; that check fails, which causes the protocol errors. [Note: using the default ARM cross compiler here.]

    4. I used the default Linux file system from TI

    5. Copy DSP executables to /lib/firmware

    6. Install ARM executables to /usr/local/bin

      1. In the ipc directory, you do the following:  sudo make install prefix=/home/klink/rootfs/usr/local

      2. You will need to remove the lad_xxxx daemon from /usr/bin, and copy the one over from /usr/local/bin (and reboot)

     

    That’s it.  I think the fundamental problem with the IPC stuff was the protocol mismatch, even when using the default Linux.

    I had an additional problem: something went wrong with my rootfs (which was the TI default one).  I may do a post-mortem on that later.