Semaphore_pend not returning

Other Parts Discussed in Thread: AM3352, SYSBIOS

I am using SYS/BIOS 6.34.2.18

I am currently using Semaphore_pend() to drive traffic over a USB bulk endpoint.  I send one message and then wait for an interrupt to call Semaphore_post() before sending the next message.  I set up the pend as follows:

if (Semaphore_pend(semUSBTransferComplete, 5000) == FALSE) //wait up to 5 seconds before timing out

My problem is that the pend never times out, and it does not return even though the semaphore has been posted.

I have verified the bios clock is still active and the semaphore has been set by reading the clock tick and checking the semaphore count in an interrupt routine after I know the error has occurred.  I used the following code to do this:

    tickCount=Clock_getTicks();

    count=Semaphore_getCount(semUSBTransferComplete);

The tickCount is changing and the semaphore count is 1 in the interrupt routine, so I know Semaphore_post() was called.

I send two types of messages: the first is 32 bytes and the second can be very large.  The problem occurs only occasionally, and only on the 32-byte message.  It seems that when I start the USB transfer and then go on to execute the semaphore wait, something in the semaphore gets corrupted.  My guess is that the USB code posting the semaphore runs while Semaphore_pend() is still being set up.  If the post lands during that setup, the blocking of the task may not be set up correctly.

I can make the problem go away by inserting a small delay between starting the USB transfer and calling Semaphore_pend().  This allows the post to complete before the pend starts.  The larger messages take longer to finish, so with the small delay the pend is still set up before the USB transfer can complete, and the post happens after the task is in a pending state.

Inserting a small delay doesn't seem to be a great way to fix the problem, since the USB transfer could be stalled for a time equivalent to the delay.

Any help with this issue would be greatly appreciated.

Regards,

Bill

  • Hi William,

    What target are you using?

    Semaphore_pend() blocks when the count == 0. The count value in the ISR should be 0 before you perform a Semaphore_post().

    What's your Semaphore starting value? I think you want it to be 0.

  • Tom,

    We are using the AM3352.  I start with the count at 0 and in my interrupt routine post the semaphore.

    Yesterday I conducted another test to see if posting the semaphore a second time would release the pending task.  It did not.  I also checked which task was running during the interrupt, and it was the task pending on the semaphore.

    Looking at the code for Semaphore_pend(), it appears there is a portion of the code that shuts off scheduling in SYS/BIOS.  It seems like scheduling was never re-enabled.  Is there a way I can tell if the scheduler is shut off?

  • Bill,

    On which core are you running this? Does the Swi Example work? Can you post your .cfg and your .c file?

    William Kozlowski said:
    I can make the problem go away by inserting a small delay between starting the USB transfer and calling Semaphore_pend().  This allows the post to complete before the pend starts.  The larger messages take longer to finish, so with the small delay the pend is still set up before the USB transfer can complete, and the post happens after the task is in a pending state.

    Doing this just means that the Task isn't being put into a blocked state (as the Semaphore has already been posted). If interrupts are disabled, that will prevent the scheduler from switching to the next highest-priority ready Task.

    In ROV, you can check the Clock -> Module tab and confirm that the tick count advances by ~10000 if you let the core run for ~10 s between halts. There is also the Hwi view, which has a "Detailed" tab that can tell you whether the interrupt for the Timer driving the system tick is enabled. (It should be enabled.)

    If you are using BIOS.libType = BIOS.LibType_Instrumented, you should also be able to look at the Logs and see how your Semaphores are being incremented/decremented.

    Are you calling a Task_disable() without a Task_restore()? Or a Hwi_disable() without a Hwi_restore()?
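
On the point above that the delay only lets the post land before the pend: here is a minimal sketch of counting-semaphore semantics in plain C. It is a toy model, not SYS/BIOS code, and `ModelSem`, `model_post`, and `model_try_pend` are hypothetical names. It shows that a post arriving before the pend is not lost; the count records it, so the pend returns at once without the task ever blocking.

```c
#include <stdbool.h>

/* Toy model of a counting semaphore (hypothetical, not the SYS/BIOS implementation). */
typedef struct {
    unsigned count;   /* number of posts not yet consumed */
} ModelSem;

/* Post: record one event. A real RTOS would also wake a blocked task here. */
static void model_post(ModelSem *s)
{
    s->count++;
}

/*
 * Non-blocking pend: consumes one post and returns true if the count is
 * nonzero; returns false in the case where a real pend would have to block.
 */
static bool model_try_pend(ModelSem *s)
{
    if (s->count == 0) {
        return false;   /* a real pend would block the task here */
    }
    s->count--;
    return true;        /* the post arrived first, so no blocking is needed */
}
```

With the delay in place, every 32-byte transfer takes the `return true` path, which is why the blocking path (the one that appears to misbehave) is never exercised.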

  • Tom,

    This code is running in our actual product and not on a development board.  It is part of a much larger program.  We have very good tracing capabilities with timing information.

    The majority of the time the code works fine.  We have found a particular PC which on occasion causes this issue.  I believe it is a timing issue.  Our USB code is based on the StarterWare code.  I just realized the StarterWare code (version 5) will lock up in a HWI for extended periods of time (25 ms) waiting for a USB transaction to complete.  I will be moving this code out of the interrupt routine shortly.  Hopefully this is part of the issue.

    The clock is still advancing: interrupts are still working, and the tick count I read keeps incrementing.  The semaphore has also been incremented.  In this interrupt routine I check which task is running, and it indicates the task that should be waiting for the semaphore.

    I believe there has been a Task_disable() and therefore no other task will run.

    By the way, I tried replacing the semaphore post with a message to another task, which would then call Semaphore_post().  This message never reaches that task, so I believe we are locked into the currently running task.

    I never call Task_disable() myself.  The operating system appears to call it inside Semaphore_pend().

  • I've just looked at the code for the Task_restore function

    /*
     *  ======== Task_restore ========
     */
    Void Task_restore(UInt tskKey)
    {
        if (tskKey == FALSE) {
            Hwi_disable();
            if (Task_module->workFlag
                && (!BIOS_swiEnabled || (BIOS_swiEnabled && Swi_enabled()))) {
                Task_schedule();
            }
            Task_module->locked = FALSE;
            Hwi_enable();
        }
    }

    Shouldn't the Hwi_disable() and Hwi_enable() calls be saving and restoring a key at this point?  If interrupts are disabled coming into Task_restore(), they will not be disabled coming out.
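
The concern above is about nesting: a bare disable/enable pair leaves interrupts enabled on exit whatever the state was on entry, while a key-based pair puts back the state it found. A minimal sketch of the difference in plain C, using a simulated interrupt-enable flag. The names `sim_irq_enabled`, `sim_disable`, `sim_enable`, `sim_restore`, `section_unconditional`, and `section_keyed` are hypothetical stand-ins, not the SYS/BIOS Hwi API, and the sketch takes no position on whether Task_restore() is ever legitimately entered with interrupts disabled.

```c
#include <stdbool.h>

/* Simulated global interrupt-enable flag (hypothetical stand-in for CPU state). */
static bool sim_irq_enabled = true;

/* Disable interrupts and return the previous state as a key. */
static bool sim_disable(void)
{
    bool key = sim_irq_enabled;
    sim_irq_enabled = false;
    return key;
}

/* Unconditionally enable interrupts, ignoring the state on entry. */
static void sim_enable(void)
{
    sim_irq_enabled = true;
}

/* Restore the state captured by sim_disable(). */
static void sim_restore(bool key)
{
    sim_irq_enabled = key;
}

/* Critical section that always leaves interrupts enabled on exit. */
static void section_unconditional(void)
{
    (void)sim_disable();
    /* ... protected work ... */
    sim_enable();
}

/* Critical section that leaves interrupts as it found them. */
static void section_keyed(void)
{
    bool key = sim_disable();
    /* ... protected work ... */
    sim_restore(key);
}
```

Called with interrupts already disabled, `section_keyed()` returns with them still disabled, while `section_unconditional()` returns with them enabled.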

     

  • Bill,

    Sorry for the long wait. I'm waiting to hear back from our local SYS/BIOS kernel expert.

    Is the semaphore issue happening to just this particular semaphore instance or to all semaphores in your system? Was it created statically or dynamically? Have you gotten around to trying ROV at all? Can you use instrumented BIOS libraries (or custom libraries with Asserts enabled)?

  • Bill,

    From the description, it does sound like Semaphore_pend() was called with the Task or Swi scheduler already disabled.

    When that happens, the task will NOT block on the Semaphore as it should and the internal Task scheduling state variables become corrupted.

    In 6.35.00 and newer versions of SYS/BIOS we put an Assert in Semaphore_pend() to check for this condition as this issue has come up before.

    A common way for this situation to occur is if your application explicitly calls Task_disable() or Swi_disable() and then calls Semaphore_pend() before calling Task_restore() or Swi_restore().

    A more subtle way for this to occur is if your application uses a GateSwi or GateTask to arbitrate access to a resource. These Gates are wrappers for Task_disable() and Swi_disable(). Calling Semaphore_pend() after GateTask_enter() or GateSwi_enter() is usually fatal.

    Another thought: are you sure that the Semaphore is not being posted faster than the task can process the associated work?

    Alan
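
The failure mode described above (a blocking pend issued while the Task or Swi scheduler is disabled, for example inside a gate) can be caught early with a guard in front of the pend. A minimal sketch in plain C with simulated scheduler state follows. The names `sim_task_sched_enabled`, `sim_swi_sched_enabled`, `sim_gate_enter`, `sim_gate_leave`, and `pend_is_safe` are hypothetical; a real application would query the kernel rather than these flags.

```c
#include <stdbool.h>

/* Simulated scheduler state (hypothetical; stands in for the kernel's own flags). */
static bool sim_task_sched_enabled = true;
static bool sim_swi_sched_enabled  = true;

/* Model of a task-level gate: entering disables task scheduling and returns a key. */
static bool sim_gate_enter(void)
{
    bool key = sim_task_sched_enabled;
    sim_task_sched_enabled = false;
    return key;
}

/* Leaving the gate restores the scheduling state captured on entry. */
static void sim_gate_leave(bool key)
{
    sim_task_sched_enabled = key;
}

/* A blocking pend is only safe when both schedulers are enabled. */
static bool pend_is_safe(void)
{
    return sim_task_sched_enabled && sim_swi_sched_enabled;
}
```

In application code the check would sit immediately before the pend, for instance as an assert in debug builds, in the same spirit as the Assert mentioned above for 6.35.00 and newer.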

  • We are still in testing, but I believe I have fixed this issue.  The StarterWare code for USB transactions has a routine, dmaTxCompletion, which can block the operating system in an interrupt for an extended period of time.  In our case the PC we were using would occasionally take 25 ms to accept a packet of data from our board.  The completion interrupt fires when the last packet has been sent to the FIFO, but because dmaTxCompletion checks that the data has actually been read from the FIFO before returning, the interrupt routine blocked for 25 ms.  From other posts I found, it appears the SYS/BIOS scheduler can have issues when an interrupt blocks for more than one tick period.

    I split this routine so that the interrupt is acknowledged in the interrupt routine, while the wait for the data to be removed from the FIFO is done at the task level.  This seems to have fixed the problem.

    I'm not sure what the Linux version of the USB code is like, but the StarterWare version of the USB stack has numerous places where the code will block for extended periods if the data is not read out of the FIFO in a timely manner.

    I also still think the Task_disable is not restoring HWI correctly, but for now this issue seems to be resolved.

    Thanks,

    Bill
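
The restructuring described above (acknowledge in the ISR, wait for the FIFO to drain at task level) can be sketched as follows. This is a simulation in plain C, not StarterWare or SYS/BIOS code: `SimUsb`, `isr_tx_complete`, `sim_host_read`, `task_wait_fifo_drained`, the 8-byte read size, and the poll budget are all hypothetical.

```c
#include <stdbool.h>

/* Simulated USB endpoint state (hypothetical; stands in for hardware registers). */
typedef struct {
    bool     irq_pending;   /* TX-complete interrupt flag                    */
    unsigned fifo_bytes;    /* bytes the host has not yet read from the FIFO */
    unsigned sem_count;     /* stand-in for the transfer-complete semaphore  */
} SimUsb;

/* ISR: acknowledge the interrupt, signal the task, return immediately. */
static void isr_tx_complete(SimUsb *u)
{
    u->irq_pending = false; /* acknowledge */
    u->sem_count++;         /* post; no waiting on the host inside the ISR */
}

/* Simulate the host reading up to n bytes out of the FIFO. */
static void sim_host_read(SimUsb *u, unsigned n)
{
    u->fifo_bytes = (n >= u->fifo_bytes) ? 0 : u->fifo_bytes - n;
}

/*
 * Task level: consume the post, then poll until the FIFO is empty or the
 * poll budget runs out. Returns true if the FIFO drained. Each poll
 * simulates the host consuming 8 bytes; a real task would sleep or yield
 * between polls so that other tasks can run.
 */
static bool task_wait_fifo_drained(SimUsb *u, unsigned max_polls)
{
    if (u->sem_count == 0) {
        return false;       /* nothing posted; a real task would block here */
    }
    u->sem_count--;
    for (unsigned i = 0; i < max_polls && u->fifo_bytes > 0; i++) {
        sim_host_read(u, 8);
    }
    return u->fifo_bytes == 0;
}
```

The point of the split is that a slow host now costs time only in the waiting task, where the scheduler can still run other work, instead of holding the CPU inside the interrupt for many tick periods.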