ARM Applications using Syslink at 100% CPU Utilization. Syslink interrupt count doesn't increment.

Hi,

We are using OMAP L138 and ipc_1_25_03_15, bios_6_35_04_50, xdctools_3_25_03_72, syslink_2_21_02_10.

We are observing that the ARM tasks that use syslink suddenly start consuming 100% of the CPU between them. The DSP functionality is also dead when this happens; the DSP may be locked up or crashed. While trying to understand the behavior, we found that if we power down the DSP, the same thing happens: the ARM tasks using syslink consume 100% of the CPU between them.

cat /proc/interrupts shows that there are no new interrupts for syslink:

28:         61     cp_intc  SYSLINK

This value remains the same.
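A small helper along these lines can make that check repeatable: sample the SYSLINK line of /proc/interrupts twice, some seconds apart, and compare the counts. This is a sketch that assumes the single-CPU /proc/interrupts layout shown above (the OMAP-L138 ARM926 is a single core); the name demo_irq_count is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the interrupt count from one /proc/interrupts
 * line if that line mentions `name`, or -1 otherwise.
 * Assumes a single CPU column, as on the OMAP-L138 ARM926. */
static long demo_irq_count(const char *line, const char *name)
{
    int irq;
    long count;

    assert(line != NULL && name != NULL);
    if (strstr(line, name) == NULL)
        return -1;
    /* Expected layout: "<irq>:  <count>  <controller>  <name>" */
    if (sscanf(line, " %d: %ld", &irq, &count) != 2)
        return -1;
    return count;
}
```

In practice you would fgets() each line of /proc/interrupts and pass it to this function; an unchanged count while the DSP is supposed to be sending suggests the notify interrupts are not reaching the ARM at all.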

In the setup the ARM and DSP talk over simplex and duplex channels.

For each channel the ARM creates the Heap. No separate Gate is created for each Heap.

Each core creates its own Rx MessageQ and registers the heap with MessageQ.

Each core opens the remote Tx MessageQ.

The Notify Driver and Transport mechanism used are:

SYSLINK_NOTIFYDRIVER=NOTIFYDRIVERSHM
SYSLINK_TRANSPORT=TRANSPORTSHM

The behavior is random: it has happened after many hours of operation and sometimes after only a few minutes.

I do not have any trace log at the moment.

Where should we start looking for the problem? Why, and how, does using syslink couple the two cores so tightly?

Even if the DSP is dead, the ARM should still continue: its tasks should sleep if the Gate is not available, or execute normally but report send-to-DSP errors once the heap runs out of memory because the DSP is no longer reading from the MessageQ.

Thanks.

Taran Tripathi

  • Hi Taran,

    Does the CPU load go to 100% upon entering any particular function?  I wonder if the DSP has crashed while holding a GateMP gate.  In that case, the Arm may try to acquire the gate and spin.

    Best regards,

        Janet

  • Hi Janet,

    Thanks for responding. We believe that is the case. Our ARM applications mostly pend on select(). One of the select() fds is a "syslink device" whose peek method calls MessageQ_count() on the Rx MessageQ to get the number of unprocessed messages in the queue.

    At the moment we are trying to isolate and fix the DSP lockup problem.

    Is there a way to decouple the cores and prevent this behavior? Can we change it so that, if the gate is not available, the call returns false and the data is dropped, instead of the ARM application just spinning there?
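    One generic way to get "return false and drop the data" semantics is a bounded try-acquire: attempt the lock a fixed number of times and give up instead of spinning forever. The sketch below uses C11 atomics on a local flag purely to illustrate the idea; it is not the GateMP API, and demo_spinlock, demo_try_enter, demo_send_or_drop and the 1000-try limit are hypothetical. Whether GateMP in this SysLink release offers a non-blocking enter is something to check in the SysLink documentation.

    ```c
    #include <assert.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical lock standing in for a gate shared with the DSP.
     * This is NOT the GateMP API; it only illustrates a bounded try-acquire. */
    typedef struct {
        atomic_flag locked;
    } demo_spinlock;

    /* Try to take the lock at most max_tries times. Returns false instead of
     * spinning forever if the owner (for example a crashed DSP) never releases it. */
    static bool demo_try_enter(demo_spinlock *lock, unsigned max_tries)
    {
        assert(lock != NULL);
        for (unsigned i = 0; i < max_tries; i++) {
            if (!atomic_flag_test_and_set_explicit(&lock->locked, memory_order_acquire))
                return true;            /* flag was clear: we own the lock now */
        }
        return false;                   /* gave up */
    }

    static void demo_leave(demo_spinlock *lock)
    {
        atomic_flag_clear_explicit(&lock->locked, memory_order_release);
    }

    /* Send path that drops the data when the gate cannot be taken. */
    static bool demo_send_or_drop(demo_spinlock *lock, unsigned *sent, unsigned *dropped)
    {
        if (!demo_try_enter(lock, 1000)) {
            (*dropped)++;               /* report/drop instead of hanging */
            return false;
        }
        (*sent)++;                      /* ...touch the shared structure here... */
        demo_leave(lock);
        return true;
    }
    ```

    Giving up only keeps the ARM responsive. If the DSP really died holding the gate, whatever that gate protects may be left inconsistent, so recovery would still involve resetting IPC on both cores.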

    Thanks,

    Taran Tripathi

  • Hi Taran,

    I think the MessageQ_count() function is using Gate_enterSystem(), which disables interrupts while getting the count.  So MessageQ_count() should not be blocking on a hardware spinlock.  How did you implement the select() function?  Is it calling MessageQ_get()?

    Thanks,

        Janet

  • Hi Janet,

    The select() does not use MessageQ_get(). It uses MessageQ_count(). MessageQ_get() is used only in the read.

    Our select() implementation checks MessageQ_count() and, if it is not > 0, does a poll_wait() on a wait_queue_head_t. The wait_queue_head_t is created during driver module initialization, where the HeapBufMP and MessageQ for each channel are created and the ARM driver specifics (char device creation and init) are done.

    The wait_queue_head_t is pulsed (wake_up_interruptible) in a function when the MessageQ_count returns >0 for the channel. This function is registered with the Notify using the Notify_registerEventSingle.

    Flow then would be:

    The DSP sends data, the Notify module on the ARM fires, and our function gets invoked as the handler. It checks whether there is a message in the MessageQ; if so, it pulses the wait_queue_head_t.

    The ARM application, through select(), pends on the wait_queue_head_t; the driver's poll returns POLLIN | POLLRDNORM when MessageQ_count > 0.
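    The driver-side logic described above can be modeled in user space as follows. This is a sketch only: msg_count stands in for MessageQ_count(), the wakeups counter stands in for wake_up_interruptible(), and demo_channel, demo_notify_cb and demo_poll are hypothetical names. The comment in demo_poll marks where the usual Linux idiom calls poll_wait() unconditionally, before checking the count, so that a wake-up racing with the check is not lost.

    ```c
    #define _POSIX_C_SOURCE 200809L
    #include <assert.h>
    #include <poll.h>
    #include <stddef.h>

    /* Hypothetical user-space model of one channel in the driver. */
    typedef struct {
        int msg_count;   /* stands in for MessageQ_count() on the Rx MessageQ */
        int wakeups;     /* stands in for wake_up_interruptible() calls */
    } demo_channel;

    /* Notify handler: runs when the DSP signals the ARM. */
    static void demo_notify_cb(demo_channel *ch)
    {
        assert(ch != NULL);
        if (ch->msg_count > 0)
            ch->wakeups++;               /* driver: wake_up_interruptible(&wq) */
    }

    /* poll() file operation for the channel's char device. */
    static int demo_poll(const demo_channel *ch)
    {
        assert(ch != NULL);
        /* driver: poll_wait(file, &wq, wait) would go here, BEFORE the check */
        return ch->msg_count > 0 ? (POLLIN | POLLRDNORM) : 0;
    }
    ```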

    Let me know if you need more information/details.

    Thanks,

    Taran Tripathi

  • Taran,

    Would you explain the overall goal to help me understand what your application is doing. I'm not clear on what you mean by "simplex" and "duplex" channels. Are you trying to write your own IPC transport?

    Why is your application thread calling select instead of simply calling MessageQ_get?

    It sounds like the DSP allocates a message and then sends it to the ARM. Is it using MessageQ_put?

    --------

    To help understand what is happening on the ARM, you can turn on some SysLink trace. Have a look at the SysLink User Guide to see the details.

    ~Ramsey

  • Hi Ramsey,

    Simplex: Unidirectional channels (either ARM or DSP sends on this channel)

    Duplex: Bi-directional channel (Both ARM and DSP can send on this channel).

    A channel is a logical entity. Refer figure 1 in the attachment.

    Initialization involves:

    1. The ARM creates a HeapBufMP for each ‘channel’.
    2. Each core (ARM/DSP) creates a MessageQ if it receives on that channel.
    3. Each core registers the HeapBufMP with its MessageQ.
    4. Each core then calls MessageQ_open() on the MessageQ of the remote core.

    It is not our own transport but a wrapper over syslink that does input validation. send() validates that the message is no larger than the HeapBufMP block size and that blocks are available in the heap. poll() gets the MessageQ_count(). recv() does MessageQ_get(). The wrapper also checks the ARM/DSP read/write permission on each channel. For example, in figure 1, a DSP write() on the channel ‘command’ will fail because that channel is read-only on the DSP.

    So the ARM-DSP Driver in figure 2 is the Linux kernel module that exposes each 'channel' as a char device, while the box below it provides the permission checks and input validation. We could get rid of the abstraction library, but then syslink would be exposed to everybody who reads from or writes to the other core.
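    The validation the wrapper performs on send() might look roughly like this. It is a sketch under assumptions: the demo_channel_cfg fields, the status codes and the per-core write flags (which also encode the simplex/duplex distinction) are hypothetical, and len is assumed to already include the MessageQ header.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical per-channel descriptor; field names are illustrative. */
    enum demo_core { DEMO_ARM = 0, DEMO_DSP = 1 };

    typedef struct {
        const char *name;
        bool can_write[2];   /* per core: simplex = one true, duplex = both true */
        size_t block_size;   /* HeapBufMP block size for this channel */
        size_t free_blocks;  /* blocks currently free in the heap (advisory) */
    } demo_channel_cfg;

    typedef enum { DEMO_OK, DEMO_EPERM, DEMO_ETOOBIG, DEMO_ENOMEM } demo_status;

    /* len is assumed to already include the MessageQ header. */
    static demo_status demo_validate_send(const demo_channel_cfg *ch,
                                          enum demo_core core, size_t len)
    {
        assert(ch != NULL && (core == DEMO_ARM || core == DEMO_DSP));
        if (!ch->can_write[core])
            return DEMO_EPERM;    /* e.g. DSP writing the read-only 'command' channel */
        if (len > ch->block_size)
            return DEMO_ETOOBIG;
        if (ch->free_blocks == 0)
            return DEMO_ENOMEM;   /* report an error rather than blocking */
        return DEMO_OK;
    }
    ```

    The free-block check is only advisory, since another sender can take the last block between the check and the allocation; a NULL return from MessageQ_alloc() remains the authoritative out-of-memory signal.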

    The software architecture drives the need for select(). Our applications get/send data using sockets, so communication with the DSP on a ‘channel’ has to look like communication with a device. That calls for providing the 'channel' as a device and being able to select() on its file descriptor. Calling MessageQ_get() directly would either block for the timeout period or, with no timeout, always return immediately with no messages. In our design, select() is the means to relinquish the CPU.
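    On the application side, select() puts the thread to sleep in the kernel until the descriptor becomes readable or the timeout expires, which is what lets it relinquish the CPU. A minimal sketch, using a pipe as a stand-in for the channel's char device (demo_wait_readable is a hypothetical name):

    ```c
    #define _POSIX_C_SOURCE 200809L
    #include <assert.h>
    #include <stddef.h>
    #include <sys/select.h>
    #include <sys/time.h>
    #include <unistd.h>

    /* Sleep until fd is readable or timeout_ms elapses.
     * Returns 1 if readable, 0 on timeout, -1 on error. */
    static int demo_wait_readable(int fd, long timeout_ms)
    {
        fd_set rfds;
        struct timeval tv = {
            .tv_sec  = timeout_ms / 1000,
            .tv_usec = (timeout_ms % 1000) * 1000,
        };

        assert(fd >= 0 && fd < FD_SETSIZE);
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        /* The thread blocks in the kernel here and uses no CPU while waiting. */
        int n = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (n < 0)
            return -1;
        return FD_ISSET(fd, &rfds) ? 1 : 0;
    }
    ```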

    Yes both DSP and ARM do a MessageQ_alloc(), fill in the data and then use MessageQ_put().

    Other than HeapBufMP creation during initialization, the abstraction library at runtime uses MessageQ_alloc, MessageQ_put, MessageQ_count, MessageQ_get and MessageQ_free, in that order, for a message to start on one core and reach the other.

    I haven't been able to get my hands on the target, so I cannot get the trace logs yet. I will post them as soon as I have them.

    TRACEENTER=1 gives a much more verbose log; should I enable it?

    4760.Document2.pdf

    Thanks,

    Taran Tripathi

  • Taran,

    Okay, I'm starting to understand the setup. Thank you for taking the time to explain it and the picture helps. I have some questions and comments.

    In Figure 2, the bottom box labeled "SysLink Module" should be labeled "IPC Module". In the software you are currently using, each side was developed independently (and by different teams). The ARM side is called SysLink and the DSP side is called IPC.

    In your description above, you talk about the ARM side abstraction layer and what it does at runtime. I may have misunderstood you, but it seems you are saying that for an outbound message (i.e. ARM to DSP), the ARM side calls MessageQ_free after sending the message. This is not correct. Here is what I think should be happening.

    1. Application (on ARM, in user space) sends a message to the DSP by calling send on a socket.

        send(socket, buffer, length, flags)

    2. Your driver handles the send call by allocating a MessageQ message, copying the data from user space into the message, and then sending the message to the DSP.

        msg = MessageQ_alloc(heapId, size)
        <copy user data into msg>
        MessageQ_put(queueId, msg)

    Note: there is no call to MessageQ_free for an outbound message.

    For an inbound message (i.e. DSP to ARM) your driver wakes any user thread currently sleeping on a wait_queue.

    1. Application (on ARM, in user space) calls select. The thread ends up in the kernel and then blocks waiting for a message.

        count = select(nfds, readfds, NULL, NULL, timeout)
        ---- thread thunks into kernel driver ----
        MessageQ_count(handle)
        <if no messages, block on wait_queue>

    2. When a new inbound message arrives, your kernel thread wakes up and signals the waiting user thread.

        <notify invokes your callback>
        MessageQ_count(handle)
        <figure out which message queue has new message>
        <wake sleeping user thread>

    3. Application thread wakes in kernel and returns to user space.

    4. Application calls read on file descriptor returned from select call.

        read(fd, &buf, size)
        ---- thread thunks into kernel driver ----
        MessageQ_get(handle, &msg, 0);
        <copy data into user buffer provided by read function>
        MessageQ_free(msg)
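    The ownership rule in the two paths above (the sender allocates and puts, the receiver gets and frees) can be illustrated with a toy queue. Everything here is hypothetical: calloc/free stand in for the HeapBufMP-backed MessageQ_alloc()/MessageQ_free(), and demo_put/demo_get stand in for MessageQ_put() and a no-wait MessageQ_get().

    ```c
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    #define DEMO_PAYLOAD 64   /* hypothetical fixed block payload */

    typedef struct demo_msg {
        struct demo_msg *next;
        size_t len;
        char data[DEMO_PAYLOAD];
    } demo_msg;

    typedef struct { demo_msg *head, *tail; } demo_queue;

    /* "put": append; ownership moves to whoever later gets the message. */
    static void demo_put(demo_queue *q, demo_msg *m)
    {
        m->next = NULL;
        if (q->tail) q->tail->next = m; else q->head = m;
        q->tail = m;
    }

    /* "get" with no wait: NULL if the queue is empty. */
    static demo_msg *demo_get(demo_queue *q)
    {
        demo_msg *m = q->head;
        if (m) {
            q->head = m->next;
            if (!q->head) q->tail = NULL;
        }
        return m;
    }

    /* Outbound (send path): alloc, copy, put. No free on this side. */
    static int demo_send(demo_queue *q, const void *buf, size_t len)
    {
        assert(q != NULL && buf != NULL);
        if (len > DEMO_PAYLOAD)
            return -1;
        demo_msg *m = calloc(1, sizeof *m);
        if (m == NULL)
            return -1;            /* heap exhausted: report an error */
        memcpy(m->data, buf, len);
        m->len = len;
        demo_put(q, m);           /* m now belongs to the receiver */
        return 0;
    }

    /* Inbound (read path): get, copy out, free. */
    static long demo_recv(demo_queue *q, void *buf, size_t cap)
    {
        assert(q != NULL && buf != NULL);
        demo_msg *m = demo_get(q);
        if (m == NULL)
            return 0;             /* nothing queued */
        size_t n = m->len < cap ? m->len : cap;
        memcpy(buf, m->data, n);
        free(m);                  /* the receiver returns the block to the heap */
        return (long)n;
    }
    ```

    Freeing on the sending side after the put would hand the receiver a block that is already back in the heap.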

    Have I understood your setup correctly?

    I'm not clear on how the DSP signals the ARM via Notify to invoke your ARM callback in the driver. But maybe that is not important right now.

    So, when you observe the CPU 100% usage and subsequent lockup, you suspect that MessageQ_count is spinning and not returning?

    Janet had mentioned above that MessageQ_count will disable interrupts, count the number of messages in the queue, and then re-enable interrupts. You can see this by looking for the implementation in the following file:

    syslink_2_21_02_10/packages/ti/syslink/ipc/hlos/knl/MessageQ.c

    I suspect a problem with the queue. Messages are stored in a queue by using a linked list. MessageQ_count walks the list to count the messages. If the list has a bad loop in it, then the function will never return. I suggest you add a print statement inside the loop and see if you end up spinning here forever.

    Int MessageQ_count(MessageQ_Handle handle)
    {
        /* ... */
        key = Gate_enterSystem ();
    
        List_traverse (elem, (List_Handle) &obj->highList) {
            printk(KERN_ALERT "MessageQ_count: highList count=%d\n", count);
            count++;
        }
    
        List_traverse (elem, (List_Handle) &obj->normalList) {
            printk(KERN_ALERT "MessageQ_count: normalList count=%d\n", count);
            count++;
        }
    
        Gate_leaveSystem (key);

        return count;
    }
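    If a print per element turns out to be too noisy, a bounded traversal can report corruption without hanging. This sketch assumes a circular doubly linked list with a sentinel head; whether SysLink's list element has exactly this layout should be confirmed in List.h, and demo_elem, demo_count_bounded and the step limit are hypothetical. In the driver you could printk once when it returns -1.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical node of a circular doubly linked list with a sentinel head. */
    typedef struct demo_elem {
        struct demo_elem *next;
        struct demo_elem *prev;
    } demo_elem;

    /* Count elements after the sentinel, giving up after max_steps so that a
     * corrupted list cannot hang the caller. Returns -1 on corruption. */
    static int demo_count_bounded(const demo_elem *head, int max_steps)
    {
        int count = 0;

        assert(head != NULL);
        for (const demo_elem *e = head->next; e != head; e = e->next) {
            if (e == NULL || count >= max_steps)
                return -1;   /* NULL link, or a loop that never returns to head */
            count++;
        }
        return count;
    }
    ```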

    You might be interested in looking at the implementation of Gate_enterSystem. You can find it here:

    syslink_2_21_02_10/packages/ti/syslink/utils/hlos/knl/Gate.c

    The implementation of List_traverse is a macro:

    syslink_2_21_02_10/packages/ti/syslink/utils/List.h

    Let me know what this test yields and then we can take the next step.

    PS. You are correct; enabling trace yields a lot of output. I suggest just adding print statements as needed.

    ~Ramsey