DSP Block due to MessageQ_alloc/free on ARM

Other Parts Discussed in Thread: SYSBIOS

I see an issue on a DM814x device, with Linux running on the ARM Cortex-A8.

Tool versions on DSP are bios_6_33_05_46, ipc_1_24_03_32, syslink_2_20_00_14, xdctools_3_23_03_53

Multiple processes are running on ARM, with one process taking 40% CPU Load.

Another process on the ARM controls the DSP, and the ARM and DSP communicate over SysLink MessageQ and RingIO.

We have a HeapMemMP created in the Shared Region (no GateMP is created for it) and registered with MessageQ as HeapId 1.

We send messages between an ARM thread and a DSP task:

The ARM allocates memory and sends the message to the DSP, and the DSP frees the memory.

The DSP allocates memory and sends the message to the ARM, and the ARM frees the memory.

The problem is that the DSP sometimes gets blocked for 2-6 ms and misses its real-time deadlines.

If the process that takes 40% CPU load on the ARM is not running, this issue is not observed.

If I change some messages from MessageQ_alloc to static allocation, the issue happens less often.

I see from GateMP doc, that when a remote processor has acquired the Gate, GateMP_enter enters a spinlock.

Does this mean higher priority task on dsp also gets blocked? What can I do to avoid blocking DSP due to this issue?

Let me know if you need more information.

-Kishor

  • Kishor,

    It sounds like the same HeapMemMP instance is used for sending messages in both directions. This puts heavy contention on the heap instance because both sides are doing alloc and free. You could try creating two heap instances, one for sending messages in each direction. This would reduce the contention.

    Creating a heap instance without a gate means the heap will inherit the default GateMP instance. This means that if any thread, running on any processor, takes the gate, it will cause the ARM and DSP tasks to spin waiting on the same gate. The default GateMP instance is used in many places (Notify, ListMP, etc.). I suggest you create a GateMP instance specifically for the heap you are using. This will reduce the contention. I would create a gate instance for each heap instance.
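
    The idea of one heap per direction, each with its own gate, can be illustrated with a toy model in plain C. This is only a sketch and not SysLink code: `Heap`, `Msg`, `heaps_init`, `msg_alloc`, `msg_free` and `heap_free_count` are hypothetical names, the heap ids 0 and 1 are arbitrary, and a C11 `atomic_flag` stands in for a per-heap GateMP. Each block remembers the heap it came from, so the receiving side returns it to the right heap and only touches that heap's gate.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_HEAPS       2   /* hypothetical: 0 = ARM->DSP, 1 = DSP->ARM */
#define BLOCKS_PER_HEAP 4

typedef struct Msg {
    struct Msg *next;     /* free-list link */
    int         heap_id;  /* heap this block belongs to */
} Msg;

typedef struct {
    atomic_flag gate;     /* private gate: only users of this heap contend */
    Msg        *free_list;
    Msg         blocks[BLOCKS_PER_HEAP];
} Heap;

static Heap heaps[NUM_HEAPS];

void heaps_init(void)
{
    for (int h = 0; h < NUM_HEAPS; h++) {
        atomic_flag_clear(&heaps[h].gate);
        heaps[h].free_list = NULL;
        for (int i = 0; i < BLOCKS_PER_HEAP; i++) {
            heaps[h].blocks[i].heap_id = h;
            heaps[h].blocks[i].next = heaps[h].free_list;
            heaps[h].free_list = &heaps[h].blocks[i];
        }
    }
}

/* Takes one block from the given heap, or returns NULL if it is empty. */
Msg *msg_alloc(int heap_id)
{
    assert(heap_id >= 0 && heap_id < NUM_HEAPS);
    Heap *heap = &heaps[heap_id];

    while (atomic_flag_test_and_set(&heap->gate)) { /* spin on this heap only */ }
    Msg *m = heap->free_list;
    if (m != NULL)
        heap->free_list = m->next;
    atomic_flag_clear(&heap->gate);
    return m;
}

/* Returns a block to the heap recorded in the block itself. */
void msg_free(Msg *m)
{
    assert(m != NULL);
    Heap *heap = &heaps[m->heap_id];

    while (atomic_flag_test_and_set(&heap->gate)) { /* spin on this heap only */ }
    m->next = heap->free_list;
    heap->free_list = m;
    atomic_flag_clear(&heap->gate);
}

/* Counts free blocks; not gate-protected, for illustration only. */
int heap_free_count(int heap_id)
{
    int n = 0;
    for (Msg *m = heaps[heap_id].free_list; m != NULL; m = m->next)
        n++;
    return n;
}
```

    In this model an alloc on heap 0 never waits for a free on heap 1, which is the reduction in contention being suggested. It does not remove spinning when both sides use the same heap at the same moment.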

    ~Ramsey

  • Ramsey,

    We have a highest priority task on DSP which does not call any IPC/Syslink API.

    It only pends on a SYS/BIOS Mailbox which is posted by a Hwi.

    We see that this task also gets blocked during this case.

    We see from BIOS logs that Hwi are happening and the task is ready to run but it does not run.

    Why do you think this will happen?

    Also sometimes we saw the task running but it takes too long to run. It runs every 1 ms and usually finishes processing in 0.2 ms.

    But in this case it takes >2ms to finish.  We put Hwi_disable and Hwi_enable around the processing of the task and still we were observing this behavior sometimes.

    -Kishor

  • Kishor,

    Perhaps the scheduler is disabled. Unfortunately, we don't have any log events to report this. You could add some Log_print statements in the following functions to add your own log events. This would help you to know if the scheduler has been disabled.

    Task_disable
    Task_enable
    Task_restore
    Swi_enable
    Swi_disable

    Then set your BIOS lib type to custom in your config file to rebuild the SYS/BIOS sources.

    BIOS.libType = BIOS.LibType_Custom;

    As for the long running time, it's hard to guess without knowing what the task is doing. Here are some things to consider.

    1. Cache maintenance. Maybe in the long run, there is a lot of cache write-back and reloading, in the fast run there might be lots of cache hits.

    2. If the task is writing to L3 addresses, there could be lots of bus contention.

    3. Is the task busy waiting on some peripheral?
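
    To narrow down which of these applies, it can help to record how long each run of the task takes and how often it exceeds its budget. The following is a minimal sketch in plain C. `ExecStats`, `exec_stats_init` and `exec_stats_record` are hypothetical names, and the timestamps are assumed to come from a free-running 32-bit counter (on the DSP that could be something like `Timestamp_get32()` from `xdc.runtime.Timestamp`, which is an assumption about your setup).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t budget_ticks;  /* longest duration considered acceptable */
    uint32_t max_ticks;     /* longest duration seen so far */
    uint32_t overruns;      /* number of runs longer than the budget */
} ExecStats;

void exec_stats_init(ExecStats *s, uint32_t budget_ticks)
{
    assert(s != NULL);
    s->budget_ticks = budget_ticks;
    s->max_ticks = 0;
    s->overruns = 0;
}

/* Records one run of the task and returns its duration in ticks.
 * Unsigned subtraction gives the right answer across one counter wrap. */
uint32_t exec_stats_record(ExecStats *s, uint32_t start_ticks, uint32_t end_ticks)
{
    assert(s != NULL);
    uint32_t elapsed = end_ticks - start_ticks;

    if (elapsed > s->max_ticks)
        s->max_ticks = elapsed;
    if (elapsed > s->budget_ticks)
        s->overruns++;
    return elapsed;
}
```

    At 600 MHz a 0.2 ms run is 120,000 ticks and a 2 ms run is 1,200,000 ticks, so a budget of a few hundred thousand ticks would separate the normal case from the slow one. Logging the overrun count next to the cache and bus statistics makes it easier to see which of the three causes correlates with the slow runs.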

    ~Ramsey

  • Ramsey

    Attached is the log I have from sysbios rta agent when the problem is observed.

    8880.bios_log.txt

    2678.task_rov.txt

    The interesting part starts at log ID 147195510 (ti.sysbios.knl.Swi, LM_begin).

    This Swi ends only at 147195540, after 4 ms (the DSP is running at 600 MHz).

    The task with priority 15 is ready to run at 147195514.

    So it seems this Swi is taking too long?

    This task does some bit formatting (14-bit to 16-bit conversion) and sample rate conversion. Most of the code/data is in DSP L2 IRAM.

    --Kishor

  • Ramsey,

    Can you let me know whether preemption is disabled in GateMP in the SysLink kernel module?
    This issue is observed only if kernel preemption is enabled with the Linux kernel config option CONFIG_PREEMPT=y.
    By default this option is disabled in the DM814x Linux PSP kernel release.
    I understand that GateMP is used for MessageQ send/receive as well as for MessageQ_alloc/free.
    I added an additional GateMP instance for the HeapMemMP, but it did not solve the issue.

    How can I change the GateMP params to disable Kernel Preemption?

  • Hi Kishor,

    If you have CONFIG_PREEMPT_RT enabled, then Gate_enterSystem() will be interruptible.  Otherwise,
    Gate_enterSystem() disables interrupts.  Here is the code for the kernel side of Gate_enterSystem():

    IArg  Gate_enterSystem (void)
    {
        unsigned long flags;

    #if defined (CONFIG_PREEMPT_RT)
      if( staticOwner != get_current() ) {
        mutex_lock_interruptible(&staticMutex);
        staticOwner = get_current();
      }
      staticDepth++;
    #else
        local_irq_save (flags);
    #endif /*!defined (CONFIG_PREEMPT_RT) */

        return (IArg)flags;
    }

    So maybe what's happening is that a lower priority thread enters the MP gate and is then
    preempted.  Meanwhile, a thread on the DSP tries to enter the same MP gate, but cannot
    do so until the ARM side thread has released it.
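
    The effect can be shown with a toy model in plain C. This is a sketch under assumptions, not the SysLink implementation: `SharedLock` and the `shared_lock_*` functions are hypothetical names, and a C11 `atomic_flag` stands in for the hardware spinlock. While one side holds the lock, the other side can do nothing but retry, however long the holder stays preempted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_flag taken;  /* stand-in for the hardware spinlock */
} SharedLock;

void shared_lock_init(SharedLock *l)
{
    atomic_flag_clear(&l->taken);
}

/* Single attempt: true if the lock was free and is now ours. */
bool shared_lock_try_enter(SharedLock *l)
{
    return !atomic_flag_test_and_set(&l->taken);
}

void shared_lock_leave(SharedLock *l)
{
    atomic_flag_clear(&l->taken);
}

/* Retries up to max_spins times. Returns the number of failed attempts
 * before the lock was taken, or -1 if every attempt failed. */
long shared_lock_spin_enter(SharedLock *l, long max_spins)
{
    assert(max_spins >= 0);
    for (long spins = 0; spins < max_spins; spins++) {
        if (shared_lock_try_enter(l))
            return spins;
    }
    return -1;
}
```

    If the "ARM" side calls `shared_lock_try_enter()` and is then descheduled, a "DSP" call to `shared_lock_spin_enter()` uses up all of its attempts and fails; once `shared_lock_leave()` runs, the same call succeeds on the first attempt. A real gate spins without a bound, so the time lost on the DSP is set by how long the ARM thread stays off the CPU.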

    From the MessageQ code (ti/syslink/ipc/hlos/knl/MessageQ.c), it does not appear that MessageQ_get()
    and MessageQ_put() are using GateMP, so it may be some other function you are calling.  Could it be
    MessageQ_alloc()?

    You can try to raise your thread's realtime priority before entering the system gate to see if the problem
    goes away.  Here is some example code for changing the thread priority:

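            #include <sched.h>
            #include <unistd.h>   /* getpid() */

            struct sched_param params;
            Int retc = 0;

            retc = sched_getparam(getpid(), &params);
            params.sched_priority = 55;

            /* Note: a nonzero priority is only accepted if the thread already
             * runs under a real-time policy (SCHED_FIFO or SCHED_RR). */
            retc = sched_setparam(getpid(), &params);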

    You can then lower the thread's priority back to what it was after making the call that entered
    the GateMP.
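
    The raise-then-restore pattern can be wrapped in one helper so that the restore is not forgotten. The following is a minimal sketch using standard Linux scheduling calls. `run_boosted` and `bump_counter` are hypothetical names, and the sketch switches explicitly to SCHED_FIFO with `sched_setscheduler()` because a nonzero priority is only valid under a real-time policy. Changing to a real-time policy normally needs the appropriate privilege; if the boost is refused, the function still runs the critical section and reports the error.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>

typedef void (*critical_fn)(void *arg);

/* Runs fn(arg) with the calling thread temporarily switched to SCHED_FIFO at
 * boost_prio, then restores the previous policy and priority.
 * Returns 0 on success or an errno value if the boost or restore failed.
 * fn is called exactly once in either case. */
int run_boosted(int boost_prio, critical_fn fn, void *arg)
{
    struct sched_param saved = { .sched_priority = 0 };
    struct sched_param boosted = { .sched_priority = boost_prio };
    int saved_policy;
    int boosted_ok = 0;
    int err = 0;

    assert(fn != NULL);

    saved_policy = sched_getscheduler(0);   /* 0 means the calling thread */
    if (saved_policy < 0 || sched_getparam(0, &saved) != 0) {
        err = errno;                        /* cannot save state: do not boost */
    } else if (sched_setscheduler(0, SCHED_FIFO, &boosted) == 0) {
        boosted_ok = 1;
    } else {
        err = errno;                        /* e.g. EPERM without privilege */
    }

    fn(arg);                                /* the call that enters the gate */

    if (boosted_ok && sched_setscheduler(0, saved_policy, &saved) != 0)
        err = errno;
    return err;
}

/* Example critical section: increments the int that arg points to. */
void bump_counter(void *arg)
{
    ++*(int *)arg;
}
```

    A caller would pass the function that makes the MessageQ call, for example `run_boosted(55, bump_counter, &n)` in this toy form. Because the helper has a single exit path, the priority is restored even when the boost itself was not granted.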

    Best regards,

        Janet

  • Janet,

    I use syslink_2_20_00_14.  Other tool versions are mentioned in the first post.
    CONFIG_PREEMPT_RT option is not there in that code.  I see that in syslink_2_20_2_20 CONFIG_PREEMPT_RT is present.
    So it seems interrupts are disabled during GateMP.
    So how can preemption happen during GateMP?
    Are you suggesting that preemption happens in User mode ?
    Is GateMP used in User mode?
    I put some printf calls in Gate_enterSystem in utils/hlos/usr/Gate.c, but execution seems to enter there only during initialisation and not afterwards.
    Only in kernel mode (utils/hlos/knl/Gate.c) are prints seen while MessageQ alloc/put is happening.
    I see that MessageQ_put calls ListMP functions, which in turn use GateMP.  Is that right?
    I would like to avoid changing task priorities, as these functions are called in multiple places.
  • Janet,

    I found that interrupts are not disabled and preemption can happen while the HW spinlock is held.
    I would like to disable preemption while the HW spinlock is held.
    I made the change below in knl/GateMP.c, and with it I did not observe the blocking problem on the DSP.
    Can you suggest a cleaner way to disable preemption while the HW spinlock is held?

    snip from GateMP.c

    <<<<

    unsigned long gateMP_flags=0;
    /*
     *  ======== GateMP_enter ========
     */
    IArg GateMP_enter (GateMP_Object * obj)
    {
        IArg key;

        local_irq_save (gateMP_flags);
        
        GT_1trace (curTrace, GT_ENTER, "GateMP_enter", obj);
        key = IGateProvider_enter(obj->gateHandle);
        GT_1trace (curTrace, GT_LEAVE, "GateMP_enter", key);

        return (key);
    }

    /*
     *  ======== GateMP_leave ========
     */
    Void GateMP_leave (GateMP_Object * obj, IArg key)
    {
        GT_2trace (curTrace, GT_ENTER, "GateMP_leave", obj, key);
        IGateProvider_leave(obj->gateHandle, key);
        GT_0trace (curTrace, GT_LEAVE, "GateMP_leave");

        local_irq_restore ((unsigned long) gateMP_flags);
    }

    >>>>

  • Janet,

    With the above change disabling interrupts, I get a stack dump when the process is started, but the process continues to work.
    Can you provide some insight into why this happens?

    ------------[ cut here ]------------
    WARNING: at kernel/softirq.c:159 local_bh_enable+0x54/0xd4()
    Modules linked in: radio_pll_pinmux syslink
    Backtrace:
    [<c004ae90>] (dump_backtrace+0x0/0x110) from [<c04071b8>] (dump_stack+0x18/0x1c)
     r7:00000000 r6:c0076184 r5:c04dd1db r4:0000009f
    [<c04071a0>] (dump_stack+0x0/0x1c) from [<c00701c0>] (warn_slowpath_common+0x54/0x6c)
    [<c007016c>] (warn_slowpath_common+0x0/0x6c) from [<c00701fc>] (warn_slowpath_null+0x24/0x2c)
     r9:00000001 r8:d5400980 r7:00000004 r6:ccb84b00 r5:c056e828 r4:c05c28c0
    [<c00701d8>] (warn_slowpath_null+0x0/0x2c) from [<c0076184>] (local_bh_enable+0x54/0xd4)
    [<c0076130>] (local_bh_enable+0x0/0xd4) from [<c00690f4>] (omap_mbox_msg_send+0xdc/0xec)
     r5:c056e828 r4:00000000
    [<c0069018>] (omap_mbox_msg_send+0x0/0xec) from [<c032fe04>] (notify_shm_drv_send_event+0x1c8/0x208)
     r5:00000080 r4:00000000
    [<c032fc3c>] (notify_shm_drv_send_event+0x0/0x208) from [<c032d594>] (notify_send_event+0x114/0x26c)
    [<c032d480>] (notify_send_event+0x0/0x26c) from [<bf03bce0>] (Notify_sendEvent+0x64/0x94 [syslink])
    [<bf03bc7c>] (Notify_sendEvent+0x0/0x94 [syslink]) from [<bf054de8>] (NameServerRemoteNotify_get+0x3a4/0x57c [syslink])
    [<bf054a44>] (NameServerRemoteNotify_get+0x0/0x57c [syslink]) from [<bf02cfc4>] (NameServer_get+0x268/0x3bc [syslink])
    [<bf02cd5c>] (NameServer_get+0x0/0x3bc [syslink]) from [<bf076ff8>] (_ClientNotifyMgr_create+0x5c4/0xcb8 [syslink])
    [<bf076a34>] (_ClientNotifyMgr_create+0x0/0xcb8 [syslink]) from [<bf0777ac>] (ClientNotifyMgr_create+0xc0/0x114 [syslink])
    [<bf0776ec>] (ClientNotifyMgr_create+0x0/0x114 [syslink]) from [<bf076238>] (RingIOShm_create+0x86c/0x970 [syslink])
     r4:00000180
    [<bf0759cc>] (RingIOShm_create+0x0/0x970 [syslink]) from [<bf06fb6c>] (RingIO_create+0x160/0x2f0 [syslink])
    [<bf06fa0c>] (RingIO_create+0x0/0x2f0 [syslink]) from [<bf084e18>] (RingIODrv_ioctl+0x2d8/0x10c0 [syslink])
     r5:c028f355 r4:00000000
    [<bf084b40>] (RingIODrv_ioctl+0x0/0x10c0 [syslink]) from [<c00e1ad0>] (vfs_ioctl+0x28/0x44)
     r8:41c8dcf8 r7:0000001d r6:0000001d r5:cb8a4a00 r4:cc025060
    [<c00e1aa8>] (vfs_ioctl+0x0/0x44) from [<c00e2264>] (do_vfs_ioctl+0x558/0x5a0)
    [<c00e1d0c>] (do_vfs_ioctl+0x0/0x5a0) from [<c00e2304>] (sys_ioctl+0x58/0x7c)
    [<c00e22ac>] (sys_ioctl+0x0/0x7c) from [<c0046f40>] (ret_fast_syscall+0x0/0x30)
     r8:c00470e8 r7:00000036 r6:001058d8 r5:c028f355 r4:41c8dcf8
    ---[ end trace e188b6cc692f473c ]---

  • Hi Kishor,

    Could you try replacing the local_irq_save() and local_irq_restore() calls with preempt_disable() and preempt_enable()
    in GateMP_enter() and GateMP_leave(), and see if that fixes the problem?  We looked into the kernel warning, and it
    happens because interrupts are disabled.

    Thanks,

        Janet

  • Hi Kishor,

    I dug through the Syslink code some more, and I have something else you can try.  When you create the HeapMemMP
    heap that you will register with MessageQ, you are not creating your own gate for it,  correct?  If that is the case, HeapMemMP
    will use GateMP's default gate (as Ramsey mentioned in an earlier post).  It looks like the default gate for GateMP is a mutex
    gate, so that probably wouldn't keep you from being preempted.  Could you try creating a GateMP gate for the heap that
    uses GateMP_LocalProtect_INTERRUPT, and see if that helps?  Here is some sample code for creating the gate:

        GateMP_Params   gateParams;

        GateMP_Params_init (&gateParams);

        gateParams.name             = GATEMP_NAME;  /*  Your GateMP name */
        gateParams.regionId         = SHARED_REGION_1;
        gateParams.localProtect     = GateMP_LocalProtect_INTERRUPT;
        gateParams.remoteProtect    = GateMP_RemoteProtect_SYSTEM;

        gateMPHandle = GateMP_create (&gateParams);

        HeapMemMP_Params    heapParams;

        HeapMemMP_Params_init (&heapParams);

        heapParams.name         =   GPP_HEAPMEMMP_NAME;  /* Your heap name */
        heapParams.regionId     =   SHARED_REGION_1;
        heapParams.gate         =  gateMPHandle;

    If this works, it would be a better way to fix the problem than adding preempt_disable()/preempt_enable()
    calls to GateMP_enter()/GateMP_leave(). But I haven't actually tried this out.

    Thanks,

        Janet


  • Janet,
    I understand GateMP is used by MessageQ_put/get as well (through the ListMP calls), apart from MessageQ_alloc/free. Is this right?
    I tried creating a new GateMP with the INTERRUPT option for the HeapMemMP, but it did not help much.
    I have also used two heaps with different gates, one for each direction, but there is not much improvement.
    When the preempt_disable/enable option is used, I get a kernel exception and our application crashes at the same point where the kernel warning appeared (RingIO_create).
    -Kishor
  • I'm only seeing Gate_enterSystem() calls in NotifyDriverShm_sendEvent()  when sending to a remote proc.  It looks like
    it waits for a previous event on the remote processor to be cleared.

        Janet

  • Kishor,

    Sorry, but I was looking at the wrong MessageQ transport function in my last post.  The default transport function,
    TransportShm, uses ListMP for the messages.  I was looking at the code in TransportShmNotify.c.  You might be
    able to get better performance on the MessageQ_put() calls by switching to TransportShmNotify.  Here is a link
    that explains the SysLink Notify drivers and transports, and how you can configure them:

    http://processors.wiki.ti.com/index.php/SysLink_Notify_Drivers_and_Transports

    Take note, though, that you will need to use the NotifyShmDrv if you are using a TI81XX device, but you may change
    the transport.  There are also some limitations to using TransportShmNotify.  The sender will have to wait until the
    receiver has moved the message.

    Best regards,

        Janet

  • Janet,

    We would have to make a lot of changes in our application to have only one message on the queue in order to move to NotifyShmDrv.
    I have avoided using MsgQ for the most frequent messages between ARM and DSP.
    With this change, along with the interrupt disable in GateMP, the problem is not observed.
    However, I would prefer to use preempt_disable instead of the interrupt disable in GateMP.
    I would like to know why preempt_disable/enable in GateMP_enter/leave causes a crash.

    Attached is the exception log with Syslink TRACEENTER=1

    Can you check this and see if you can find why it crashes? Let me know if you need some other logs.

    3465.syslink_trace.log

    -Kishor
  • Kishor,

    From the trace, it looks like the error:

    Leaving GateMPDrv_drvioctl
        osStatus    [0x0]
    BUG: scheduling while atomic: RadioApp/1259/0x00000002
    Modules linked in: radio_pll_pinmux syslink
    Backtrace:
    [<c004ae90>] (dump_backtrace+0x0/0x110) from [<c04071b8>] (dump_stack+0x18/0x1c)
     r7:00000036 r6:00106404 r5:cc011600 r4:00000000
    [<c04071a0>] (dump_stack+0x0/0x1c) from [<c006aad8>] (__schedule_bug+0x54/0x60)
    [<c006aa84>] (__schedule_bug+0x0/0x60) from [<c04074fc>] (schedule+0x78/0x3c0)
     r5:cc011600 r4:4242f7ec
    [<c0407484>] (schedule+0x0/0x3c0) from [<c0046fa0>] (ret_slow_syscall+0x0/0x10)

    is occurring because the kernel is calling 'schedule()' while preemption is disabled.   It looks like GateMP_enter()
    is called in user mode, which goes down to the kernel mode GateMP_enter() (where preemption is disabled).
    Then when leaving the GateMPDrv_drvioctl, the kernel calls schedule().  So it seems to me that calling schedule()
    with preemption disabled is not allowed.  That was the only place in the trace where I could find GateMP_enter()
    called from user mode.

    It was hard to tell from the trace what was really happening.  It looked to me like maybe one thread was blocked
    waiting for a message, and another was calling ClientNotifyMgr_create().

    Best regards,

        Janet

  • Janet,

    Thanks for pointing this out.
    I found that RingIO_registerNotifier calls GateMP_enter in user mode which seems to cause this crash.
    To work around this, I created copies of the GateMP_enter and GateMP_leave functions and named them GateMP_enter_user and GateMP_leave_user,
    and changed GateMpDrv.c to call GateMP_enter_user and GateMP_leave_user.
    I use preempt_disable and preempt_enable only if GateMP is called from kernel mode.
    With this change I don't see the crash.
    But sometimes I see the warnings below. I suppose this happens because the ISR tries to acquire the gate while preemption is disabled.
    Let me know what you think about this.
    BUG: scheduling while atomic: kworker/0:1/23/0x00000002
    Modules linked in: radio_pll_pinmux syslink
    Backtrace:
    [<c004ae90>] (dump_backtrace+0x0/0x110) from [<c04071b8>] (dump_stack+0x18/0x1c)
     r7:000b0ae6 r6:000003e8 r5:cc96a000 r4:00000000
    [<c04071a0>] (dump_stack+0x0/0x1c) from [<c006aad8>] (__schedule_bug+0x54/0x60)
    [<c006aa84>] (__schedule_bug+0x0/0x60) from [<c04074fc>] (schedule+0x78/0x3c0)
     r5:cc96a000 r4:000b0ece
    [<c0407484>] (schedule+0x0/0x3c0) from [<c0407fec>] (schedule_timeout+0x1c0/0x1f8)
    [<c0407e2c>] (schedule_timeout+0x0/0x1f8) from [<bf034224>] (OsalMutex_enter+0x120/0x1c4 [syslink])
     r7:d0aff018 r6:cc96c000 r5:d0aff000 r4:cc96a000
    [<bf034104>] (OsalMutex_enter+0x0/0x1c4 [syslink]) from [<bf02b698>] (GateMutex_enter+0x64/0xa4 [syslink])
     r8:00000000 r7:d5401880 r6:00000000 r5:bf0fb7d8 r4:d0afc000
    [<bf02b634>] (GateMutex_enter+0x0/0xa4 [syslink]) from [<bf0383e8>] (SharedRegion_getId+0xf0/0x1e4 [syslink])
     r5:d5402f00 r4:d0afc000
    [<bf0382f8>] (SharedRegion_getId+0x0/0x1e4 [syslink]) from [<bf0412f0>] (ListMP_getHead+0x17c/0x2a4 [syslink])
     r7:d5401880 r6:00000000 r5:d53ec000 r4:d5402f00
    [<bf041174>] (ListMP_getHead+0x0/0x2a4 [syslink]) from [<bf05ab2c>] (_TransportShm_notifyFxn+0xb4/0xfc [syslink])
     r8:00000002 r7:d5401880 r6:00000000 r5:d53e9000 r4:bf0fb7d8
    [<bf05aa78>] (_TransportShm_notifyFxn+0x0/0xfc [syslink]) from [<c032cde0>] (notify_exec+0x9c/0xb0)
     r5:cc322c00 r4:00000002
    [<c032cd44>] (notify_exec+0x0/0xb0) from [<c032f41c>] (notify_shmdrv_isr_callback+0x98/0xac)
     r6:ccb84600 r5:d5401780 r4:00000000
    [<c032f384>] (notify_shmdrv_isr_callback+0x0/0xac) from [<c032f45c>] (notify_shmdrv_dsp_isr+0x2c/0x3c)
     r7:00000000 r6:fffffffe r5:00000000 r4:00000002
    [<c032f430>] (notify_shmdrv_dsp_isr+0x0/0x3c) from [<c040bdbc>] (notifier_call_chain+0x34/0x78)
     r5:00000000 r4:00000000
    [<c040bd88>] (notifier_call_chain+0x0/0x78) from [<c008fb88>] (__blocking_notifier_call_chain+0x54/0x6c)
    [<c008fb34>] (__blocking_notifier_call_chain+0x0/0x6c) from [<c008fbc0>] (blocking_notifier_call_chain+0x20/0x28)
     r8:cc87d400 r7:cc96df34 r6:cc96c000 r5:00000004 r4:ccaaebc0
    [<c008fba0>] (blocking_notifier_call_chain+0x0/0x28) from [<c0068cb0>] (mbox_rx_work+0x54/0x118)
    [<c0068c5c>] (mbox_rx_work+0x0/0x118) from [<c008446c>] (process_one_work+0x210/0x358)
     r7:cc913300 r6:c0572670 r5:c057276c r4:ccaaebd4
    [<c008425c>] (process_one_work+0x0/0x358) from [<c0084a5c>] (worker_thread+0x1e0/0x310)
    [<c008487c>] (worker_thread+0x0/0x310) from [<c008a0c4>] (kthread+0x8c/0x94)
    [<c008a038>] (kthread+0x0/0x94) from [<c0073ad4>] (do_exit+0x0/0x680)
     r7:00000013 r6:c0073ad4 r5:c008a038 r4:cc837ee8
    Enabled McASP Transmit
    Enabled McASP Transmit
    BUG: scheduling while atomic: RadioApp/1259/0x00000002
    Modules linked in: radio_pll_pinmux syslink
    Backtrace:
    [<c004ae90>] (dump_backtrace+0x0/0x110) from [<c04071b8>] (dump_stack+0x18/0x1c)
     r7:000b0c35 r6:000003e8 r5:cc015b80 r4:00000000
    [<c04071a0>] (dump_stack+0x0/0x1c) from [<c006aad8>] (__schedule_bug+0x54/0x60)
    [<c006aa84>] (__schedule_bug+0x0/0x60) from [<c04074fc>] (schedule+0x78/0x3c0)
     r5:cc015b80 r4:000b101d
    [<c0407484>] (schedule+0x0/0x3c0) from [<c0407fec>] (schedule_timeout+0x1c0/0x1f8)
    [<c0407e2c>] (schedule_timeout+0x0/0x1f8) from [<bf034224>] (OsalMutex_enter+0x120/0x1c4 [syslink])
     r7:d0b0e018 r6:cc218000 r5:d0b0e000 r4:cc015b80
    [<bf034104>] (OsalMutex_enter+0x0/0x1c4 [syslink]) from [<bf02b698>] (GateMutex_enter+0x64/0xa4 [syslink])
     r8:0000ffff r7:fa0ca800 r6:d5402f80 r5:bf0fb7d8 r4:d0b0b000
    [<bf02b634>] (GateMutex_enter+0x0/0xa4 [syslink]) from [<bf0564b4>] (GateHWSpinlock_enter+0x94/0x1ac [syslink])
     r5:d53ce000 r4:d0b0b000
    [<bf056420>] (GateHWSpinlock_enter+0x0/0x1ac [syslink]) from [<bf04b580>] (GateMP_enter+0xac/0xf4 [syslink])
     r7:d53e9000 r6:d5402f80 r5:d53cb000 r4:d53ce000
    [<bf04b4d4>] (GateMP_enter+0x0/0xf4 [syslink]) from [<bf05b03c>] (TransportShm_put+0x1dc/0x2f4 [syslink])
     r5:d5402f80 r4:00000000
    [<bf05ae60>] (TransportShm_put+0x0/0x2f4 [syslink]) from [<bf03ddd8>] (MessageQ_put+0x258/0x4e4 [syslink])
     r8:0000ffff r7:00000000 r6:00000001 r5:d5402f80 r4:d53e6000
    [<bf03db80>] (MessageQ_put+0x0/0x4e4 [syslink]) from [<bf080644>] (MessageQDrv_ioctl+0x11c/0x9c0 [syslink])
    [<bf080528>] (MessageQDrv_ioctl+0x0/0x9c0 [syslink]) from [<c00e1ad0>] (vfs_ioctl+0x28/0x44)
    [<c00e1aa8>] (vfs_ioctl+0x0/0x44) from [<c00e2264>] (do_vfs_ioctl+0x558/0x5a0)
    [<c00e1d0c>] (do_vfs_ioctl+0x0/0x5a0) from [<c00e2304>] (sys_ioctl+0x58/0x7c)
    [<c00e22ac>] (sys_ioctl+0x0/0x7c) from [<c0046f40>] (ret_fast_syscall+0x0/0x30)
    -Kishor
  • Hi Kishor,

    I think what is happening here is that you have threads trying to enter an OsalMutex gate, but another thread
    is holding the gate.  GateMP_enter() tries to acquire a local gate (OsalMutex) but before that, it has called
    preempt_disable().  So if another thread is holding the OsalMutex, GateMP_enter() will time out trying to
    acquire the local gate.  Looking at the file ti/syslink/utils/hlos/knl/osal/OsalMutex.c, you can see that
    OsalMutex_enter() will call schedule_timeout() if it is blocked.

    So, I think calling preempt_disable() in the GateMP_enter() code is not a good idea after all.
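
    The reasoning can be captured in a small userspace model. This is only an illustration, not kernel code: `CpuModel` and the `model_*` functions are hypothetical names that mimic the idea of a preemption counter and a check made whenever something tries to sleep. Blocking on a held mutex needs the scheduler, and the scheduler must not be entered while the counter is nonzero.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int preempt_count;      /* > 0 means the scheduler must not run */
    int atomic_sleep_bugs;  /* sleeps attempted while preemption was off */
} CpuModel;

void model_preempt_disable(CpuModel *c)
{
    c->preempt_count++;
}

void model_preempt_enable(CpuModel *c)
{
    assert(c->preempt_count > 0);
    c->preempt_count--;
}

/* Stand-in for schedule(): legal only when preemption is enabled.
 * Returns false and records a bug when called in an atomic section. */
bool model_schedule(CpuModel *c)
{
    if (c->preempt_count > 0) {
        c->atomic_sleep_bugs++;
        return false;
    }
    return true;
}

/* Stand-in for a sleeping mutex. If the mutex is free it is taken and true
 * is returned. If it is held elsewhere the caller has to sleep, which goes
 * through model_schedule(); this attempt then returns false. */
bool model_mutex_enter(CpuModel *c, bool *locked)
{
    if (!*locked) {
        *locked = true;
        return true;
    }
    (void)model_schedule(c);
    return false;
}
```

    With the mutex held by another thread, calling `model_preempt_disable()` followed by `model_mutex_enter()` records one bug, which corresponds to the "scheduling while atomic" report in the trace. The same call with preemption enabled just sleeps and records nothing.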

    Going back to the suggestion of using TransportShmNotify instead of TransportShm, I don't understand
    why you would need to change the application to have only one message on the queue.  I think the isr will
    just transfer one message to the queue, instead of going through the list of messages to put on queues.
    So, I think there would be just some configuration code that you would need to add, as described in

    http://processors.wiki.ti.com/index.php/SysLink_Notify_Drivers_and_Transports#Shared_Memory_Notify_Transport

    This would allow the Arm side to get messages to the DSP faster, since it avoids going through ListMP.  And, if
    the DSP is really blocked waiting for messages from the Arm, the Arm side may not have to spin waiting for the receiver
    to move the message to the queue.  Maybe this wording in the link above is a little confusing:

    Using NotifyDriverShm will only allow one message at a time to be placed in the MessageQ.

    (This means using NotifyDriverShm with TransportShmNotify)  You can have multiple messages on the queue,
    it's just that the isr will just put one message on it. Anyway, I think it's still worth trying out to see if performance
    improves.
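
    A toy model of this behaviour, as I understand it, may make the distinction clearer. It is a sketch with hypothetical names (`NotifySlot`, `LocalQueue`, `slot_try_send`, `receiver_isr`) and is not taken from the SysLink sources: the sender can have only one message in flight, while the receiver's queue can still hold several.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_DEPTH 8

typedef struct {
    const void *pending;  /* message in flight, NULL when the slot is free */
} NotifySlot;

typedef struct {
    const void *items[QUEUE_DEPTH];
    int         count;
} LocalQueue;

/* Sender side: succeeds only if the previous message has been moved. */
bool slot_try_send(NotifySlot *s, const void *msg)
{
    assert(s != NULL);
    if (msg == NULL || s->pending != NULL)
        return false;   /* a real sender would wait and retry here */
    s->pending = msg;
    return true;
}

/* Receiver "ISR": moves the single in-flight message onto the local queue,
 * which frees the slot for the next send. */
bool receiver_isr(NotifySlot *s, LocalQueue *q)
{
    assert(s != NULL && q != NULL);
    if (s->pending == NULL || q->count == QUEUE_DEPTH)
        return false;
    q->items[q->count++] = s->pending;
    s->pending = NULL;
    return true;
}
```

    Two sends in a row fail on the second until `receiver_isr()` has run, which is the wait on the sender side. After two send/ISR rounds the local queue holds two messages, so the application does not have to be limited to one outstanding message.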

    Best regards,

        Janet

  • Janet,

    I tried using  TransportShmNotify.

    I got compile errors in SysLink for missing header files when I changed to this transport.
    After fixing them, when I ran it, after a few seconds I got an assert for MessageQ_INVALIDMESSAGEQ in knl/MessageQ_put. This assert does not appear in usr/MessageQ_put, which is strange.
    Then I removed MessageQ use for the most frequent messages in our app (request + response from ARM to DSP, on average every 5 ms).
    With this I did not observe the assert, and I did not see the DSP blocking problem over 2 hours of testing.
    I have below questions:
    1. What is the level of testing done with TransportShmNotify?
    2. I understand Sender will spin until Receiver receives the Message which can increase CPU consumption of Sender Task. This time depends on the Interrupt latency of the Receiver?
    If we decide to use TransportShmNotify,  how can we guarantee to fix this problem?
    One place I know GateMP is used is MessageQ_alloc/free with HeapMemMP. If I create a gateMP with GateMP_LocalProtect_INTERRUPT this cannot get preempted.
    Apart from MessageQ we have 2 DSP to ARM RingIO where DSP is the writer and we use Callback notification. How can we disable preemption in this case?
    -Kishor
  • Hi Kishor,

    When you rebuilt Syslink with TransportShmNotify, what were the missing header files?  I tried this with syslink 2.20.02.20
    and the MessageQ example for TI814X, and did not get any build errors.

    The reason you may have only seen the kernel assert for MessageQ_INVALIDMESSAGEQ, is that these trace messages
    come out immediately.  The user assert will come out after.  So the user assert is probably there.

    Testing for TransportShmNotify is the same as for TransportShm.  The same tests were run for both transports.

    The answer to question 2, is yes, the sender will have to wait until the receiver has gotten the message, if you use
    TransportShmNotify.

    From the HeapMemMP code, it looks like if you create a heap without a gate, it uses the GateMP default gate,
    which has local protection TASKLET (which ends up being a Linux mutex), and remote protection SYSTEM (which
    uses the hardware spin lock).  If you create a gate with localProtect INTERRUPT, you will do a local_irq_save()
    (disabling interrupts) before acquiring the hardware spin lock.  This is what you tried to do initially to fix the
    problem, by putting the local_irq_save() in the GateMP_enter() call.  But when you tried creating a gate with
    localProtect INTERRUPT for the HeapMemMP heap, you did not get much improvement, right?

    I'm not too familiar with RingIO, so I'll have to dig into that code a bit.  Where is it that you want to disable preemption?
    Is it in the callback notification?

    Best regards,

        Janet

  • Janet,
    After Changing to TransportShmNotify with SR0 uncached the problem is fixed.
    I faced DSP abort during long run when using a separate gate with HeapMemMP or RingIO. So I'm not using it for now. I'll check this later.
    Thanks for your support.

    -Kishor

  • I think I am observing the same issue. A periodic EDMA interrupt that is scheduled to fire every 750 microseconds on the DSP is sometimes delayed to be as long as 1200 microseconds. I am measuring times in the DSP's EDMA HWI interrupt routine. Can you elaborate on the background of what you think is happening and why changing the MessageQ's Gate protection method will have an effect.
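
    For reference, a measurement of this kind can be sketched in plain C as below. `PeriodMonitor` and the `period_monitor_*` functions are hypothetical names, and the timestamp unit is whatever the caller uses (microseconds in the numbers quoted here). The monitor is fed one timestamp per interrupt and keeps the worst interval seen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t nominal;  /* expected interval between interrupts */
    uint32_t last;     /* timestamp of the previous interrupt */
    uint32_t worst;    /* longest interval observed */
    bool     primed;   /* false until the first timestamp is seen */
} PeriodMonitor;

void period_monitor_init(PeriodMonitor *m, uint32_t nominal)
{
    assert(m != NULL);
    m->nominal = nominal;
    m->last = 0;
    m->worst = 0;
    m->primed = false;
}

/* Call once per interrupt with the current timestamp. Returns the interval
 * since the previous call, or 0 on the first call. */
uint32_t period_monitor_tick(PeriodMonitor *m, uint32_t now)
{
    uint32_t interval = 0;

    assert(m != NULL);
    if (m->primed) {
        interval = now - m->last;   /* wrap-safe unsigned subtraction */
        if (interval > m->worst)
            m->worst = interval;
    }
    m->last = now;
    m->primed = true;
    return interval;
}

/* How far the worst interval exceeded the nominal period (0 if never). */
uint32_t period_monitor_worst_lateness(const PeriodMonitor *m)
{
    return (m->worst > m->nominal) ? (m->worst - m->nominal) : 0;
}
```

    With a nominal period of 750 and an observed interval of 1200, the worst lateness reported is 450.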

    For the record, DSP is doing MessageQ_alloc(), MessageQ_put() and MessageQ_free().

    Is the theory that the ARM side does a MessageQ_alloc() and the DSP side tries to do a MessageQ_alloc() at the same time and the DSP spins in a loop with HWIs disabled?

    Is there documentation somewhere on what syslink operations disable interrupts? I can't seem to find anything written down.