CC3220MODA: sendto() seems to block when station loses power

Part Number: CC3220MODA

Hi,

We have a CC3220 based AP connected to a CC3220 based station. The sendto() function is used on the AP to send UDP data to the station.

We have noticed that, on rare occasions, when the station's power is removed, the sendto() function does not return; it blocks. This only happens on power removal, not on power-up.

Is this possible or is there something else going on?

Regards,

Tunstall_User

  • Hi,

    Do you see sendto() blocked forever, or just temporarily?

    Because the AP is trying to send a packet to a station that is unavailable (it has not disconnected gracefully), it will keep retransmitting until it decides that the connection is lost. So there would be some delay (due to the L2/Wi-Fi connection and not due to L4/UDP).

    Br,

    Kobi 

     

  • Kobi,

    It blocked 'forever' though we have only waited around 5 minutes. I assume 5 minutes would be considered as 'forever' from a software point of view.

    It is more noticeable in our application because a semaphore is released after the sendto() call.

    When it blocks, we can see in the CCS8 debugger that the send UDP task is stuck at the sendto() call line and the other task is blocked on the semaphore.

    Funnily enough, the other task blocked on the semaphore is the sl_task, which is stuck at the start of the disconnect code as it tries to grab the semaphore, so the DISCONNECT event has been sent by the NWP to the CPU.

    Regards,

  • This sounds like a bug. 

    I'll check this and report back once I find anything.

    Thanks,

    Kobi  

  • Kobi,

    The AP sendto() command is surrounded by a semaphore, which it releases when the sendto() returns.

    If the station dies and the DISCONNECT event comes in, the sl_task code processing the event starts by trying to grab the semaphore.

    Now, if this event comes in while the AP is in the middle of the sendto() call, the attempt by the sl_task to grab the semaphore is held up on a sem_wait() until the sendto() completes.

    On the occasions when we do see the blocking, the sendto() task is blocked on the sendto() and the sl_task is blocked on the sem_wait() call. The sendto() never seems to complete, so the deadlock continues.

    One way to explain what we are seeing is if the NWP event code is single-threaded: it calls SimpleLinkWlanEventHandler() in the user application and waits for the function to complete. While it is waiting it cannot deal with any other events for the user application, e.g. sendto() completing, so there is no sendto() processing, no semaphore release, and a locked sl_task.

    [Interestingly, when the blocking does occur the recvfrom() task also blocks on the recvfrom(), even though the socket has a 1 sec timeout.]

    To overcome this I tried moving the sendto() out of the semaphore block to see if it would fix the problem. So far it has.
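
    A minimal sketch of that rearrangement, assuming desktop POSIX sockets and pthreads rather than the CC3220 SDK: the lock is held only long enough to copy the station record, and sendto() runs with the lock released. The names station_info_t, g_station, g_lock, station_set, station_clear and send_to_station are hypothetical and are not taken from the actual application.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical shared record that CONNECT/DISCONNECT handling would update. */
typedef struct {
    struct sockaddr_in addr;
    int connected;
} station_info_t;

static station_info_t g_station;
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;

/* Record a station address under the lock. Returns 0 on success, -1 on a bad IP string. */
int station_set(const char *ip, unsigned short port)
{
    station_info_t s;

    memset(&s, 0, sizeof s);
    s.addr.sin_family = AF_INET;
    s.addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &s.addr.sin_addr) != 1)
        return -1;
    s.connected = 1;

    pthread_mutex_lock(&g_lock);
    g_station = s;
    pthread_mutex_unlock(&g_lock);
    return 0;
}

/* Mark the station as gone under the lock. */
void station_clear(void)
{
    pthread_mutex_lock(&g_lock);
    g_station.connected = 0;
    pthread_mutex_unlock(&g_lock);
}

/* Copy the record under the lock, then send with the lock released, so a
 * slow or stuck sendto() can never keep another task waiting on g_lock.
 * Returns bytes sent, or -1 if no station is recorded or sendto() fails. */
ssize_t send_to_station(int sock, const void *buf, size_t len)
{
    station_info_t snapshot;

    pthread_mutex_lock(&g_lock);
    snapshot = g_station;
    pthread_mutex_unlock(&g_lock);

    if (!snapshot.connected)
        return -1;
    return sendto(sock, buf, len, 0,
                  (const struct sockaddr *)&snapshot.addr,
                  sizeof snapshot.addr);
}
```

    The cost of this arrangement is that a packet may still be sent to an address that a DISCONNECT has just invalidated, which is harmless for UDP audio but worth knowing.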

    So does the NWP event processing loop pause until user application posted events return?

    Regards,

    tunstall_user

  • Hi,

    "To overcome this I tried moving the sendto() out of the semaphore block to see if it would fix the problem. "

    Are you referring to a semaphore in your application code or within sl_SendTo driver?

    Have you tried to use a non-blocking socket?

    So far, I couldn't reproduce this (sl_SendTo always returns and the disconnect event comes later).

    Does it happen for you constantly? 

    Br,

    Kobi

  • Kobi,

    The semaphore is in my application code; it protects a data structure that holds station info, e.g. the IP address. The CONNECT/DISCONNECT station events also update this structure when they come in via the sl_task. Hence the semaphore.

    The sockets were initially blocking so I assumed that making them non-blocking would solve the problem. But as I mentioned above, both the 1 sec timeout on the recvfrom() socket and making the sendto() socket non-blocking did not solve the problem. The sl_task got stuck on the sem_wait() and the recvfrom/sendto blocked. The only explanation I could think of to explain this behavior was the single threaded event processing on the NWP side.
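
    For reference, the two socket settings mentioned here look roughly like this with plain BSD calls. This is a desktop POSIX sketch only; the option names accepted by the CC3220 BSD layer may differ, and the helper names set_recv_timeout, set_nonblocking and recv_gave_up are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Give recvfrom() on this socket a receive timeout. Returns 0 on success. */
int set_recv_timeout(int sock, long seconds, long microseconds)
{
    struct timeval tv;

    tv.tv_sec = seconds;
    tv.tv_usec = microseconds;
    return setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}

/* Switch the socket to non-blocking mode. Returns 0 on success. */
int set_nonblocking(int sock)
{
    int flags = fcntl(sock, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(sock, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Try one receive. Returns 1 if the call gave up because no data arrived
 * (timeout or non-blocking), 0 if data was read, -1 on any other error. */
int recv_gave_up(int sock)
{
    char buf[64];
    ssize_t n = recvfrom(sock, buf, sizeof buf, 0, NULL, NULL);

    if (n >= 0)
        return 0;
    return (errno == EAGAIN || errno == EWOULDBLOCK) ? 1 : -1;
}
```

    On a desktop stack both settings make the call return when no data arrives. Neither had that effect in the blocked state described above.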

    And I'm not surprised you cannot reproduce it. I never noticed it until one of the testers had a non-responsive AP and called me over.

    In our application we have a station and an AP, both continuously sending and receiving UDP packets to each other. The AP receives audio from a hub and sends it to the station and the station sends mic audio back to the AP and the AP forwards it to the hub.

    Both units use a 3104 codec and send/receive 20ms audio packets. So for the AP, it waits for the codec to fill a 20ms buffer and then sends the audio packet to the station and then repeats. The send takes approx 1.5 ms. Initially the semaphore surrounded the sendto() so the semaphore was locked for approx 1.5ms.

    The problem only occurs if the DISCONNECT event comes in during this 1.5ms, if later or earlier everything is fine. When I try to reproduce it I catch it only 1 in 20 station power cycles. So it is tricky.

    By the way, I use the BSD calls, not the TI sl_xxx ones. I know the BSD calls map to sl_xxx calls, but I mention it 'just in case'.

    I have added the CCS8 callstack when the tasks blocked, step 4 in the sl_Task list is where the sem_wait() block occurs in my code. And step 6 in the lower callstack is the blocked sendto(), the semaphore was acquired just before the sendto call.

    Regards

  • The NWP is multi-threaded, but on the host MCU we have only one task ( which is the sl_Task) that handles the NWP events.

    If you block (on a semaphore) within the SimpleLinkWlanEventHandler, the sl_Task will not be able to process the next event (which will probably be the Response Event to the sl_SendTo), so the sl_SendTo (which waits for that Response Event) will not complete.

    The solution is to remove the critical section protection (i.e. semaphore use) from the SimpleLinkWlanEventHandler and instead use this callback to trigger (using mailbox or by posting a semaphore) another thread that will handle the disconnection event.
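
    The pattern can be sketched roughly as follows. This is a POSIX stand-in (pthreads plus a counting semaphore), not SimpleLink SDK code: on_wlan_event plays the role of SimpleLinkWlanEventHandler, and every name here (the EVT_ values, events_init, event_worker, station_lock, station_connected) is hypothetical. EVT_QUIT exists only so the sketch can shut down cleanly.

```c
#include <pthread.h>
#include <semaphore.h>

enum { EVT_CONNECT = 1, EVT_DISCONNECT, EVT_QUIT };

#define QUEUE_LEN 8

/* Small mailbox: the handler enqueues, the worker thread dequeues. */
static int q_buf[QUEUE_LEN];
static unsigned q_head, q_tail, q_count;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t q_items;

/* Application data that the sender task also locks around its own work. */
static pthread_mutex_t station_lock = PTHREAD_MUTEX_INITIALIZER;
static int station_connected;

int events_init(void)
{
    q_head = q_tail = q_count = 0;
    return sem_init(&q_items, 0, 0);
}

/* Stand-in for the event callback. It never touches station_lock; q_lock is
 * held only for the enqueue and never across a blocking call, so the
 * callback always returns promptly. Returns 0, or -1 if the mailbox is full. */
int on_wlan_event(int evt)
{
    int rc = -1;

    pthread_mutex_lock(&q_lock);
    if (q_count < QUEUE_LEN) {
        q_buf[q_tail] = evt;
        q_tail = (q_tail + 1) % QUEUE_LEN;
        q_count++;
        rc = 0;
    }
    pthread_mutex_unlock(&q_lock);

    if (rc == 0)
        sem_post(&q_items);   /* wake the worker */
    return rc;
}

/* Separate thread: the only place where events take station_lock. */
void *event_worker(void *arg)
{
    (void)arg;
    for (;;) {
        int evt;

        while (sem_wait(&q_items) != 0) { /* retry if interrupted */ }

        pthread_mutex_lock(&q_lock);
        evt = q_buf[q_head];
        q_head = (q_head + 1) % QUEUE_LEN;
        q_count--;
        pthread_mutex_unlock(&q_lock);

        if (evt == EVT_QUIT)
            break;

        pthread_mutex_lock(&station_lock);      /* may wait; callback does not */
        station_connected = (evt == EVT_CONNECT);
        pthread_mutex_unlock(&station_lock);
    }
    return NULL;
}
```

    With this split, a task holding station_lock around a send delays only the worker thread, while the callback still returns and later events can be dispatched.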