Tool/software: Linux
I have some code that runs correctly on other ARMv7 and ARMv5 systems but hangs in pthread_cond_timedwait() on the AM5728. Relevant background, sample code, GDB output, and strace output are below.
Background:
- Processor SDK 03.03.00.04
- Linux 4.4.41 within the Processor SDK
- Compiler/libc from ELDK 5.8
Code Snippet (open source POCO C++ Library):
The hang occurs at the pthread_cond_timedwait() call:
bool EventImpl::waitImpl(long milliseconds)
{
    int rc = 0;
    struct timespec abstime;
    struct timeval tv;
    gettimeofday(&tv, NULL);
    abstime.tv_sec = tv.tv_sec + milliseconds / 1000;
    abstime.tv_nsec = tv.tv_usec*1000 + (milliseconds % 1000)*1000000;
    if (abstime.tv_nsec >= 1000000000)
    {
        abstime.tv_nsec -= 1000000000;
        abstime.tv_sec++;
    }
    if (pthread_mutex_lock(&_mutex) != 0)
        throw SystemException("wait for event failed (lock)");
    while (!_state)
    {
        if ((rc = pthread_cond_timedwait(&_cond, &_mutex, &abstime)))
        {
            if (rc == ETIMEDOUT) break;
            pthread_mutex_unlock(&_mutex);
            throw SystemException("cannot wait for event");
        }
    }
    if (rc == 0 && _auto) _state = false;
    pthread_mutex_unlock(&_mutex);
    return rc == 0;
}
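To separate the library call from the rest of the application, a standalone timed wait can be run on the target in a loop. The following is only a minimal sketch under stated assumptions: timedWaitOnce() is a hypothetical helper (it is not part of POCO), it waits on a private condition variable that nothing ever signals, and it uses the same gettimeofday()-based deadline arithmetic as waitImpl() above. It reports how long the call really took according to CLOCK_MONOTONIC.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <pthread.h>
#include <sys/time.h>
#include <time.h>

// Hypothetical helper (not part of POCO): performs one timed wait on a
// condition variable that is never signalled. Returns the result of
// pthread_cond_timedwait() (ETIMEDOUT is the expected value) and stores the
// elapsed CLOCK_MONOTONIC time in milliseconds in *elapsed_ms.
int timedWaitOnce(long milliseconds, long* elapsed_ms)
{
    assert(milliseconds >= 0 && elapsed_ms != NULL);

    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

    // Same deadline arithmetic as waitImpl(): wall-clock "now" plus timeout.
    struct timeval tv;
    gettimeofday(&tv, NULL);
    struct timespec abstime;
    abstime.tv_sec = tv.tv_sec + milliseconds / 1000;
    abstime.tv_nsec = tv.tv_usec * 1000 + (milliseconds % 1000) * 1000000;
    if (abstime.tv_nsec >= 1000000000)
    {
        abstime.tv_nsec -= 1000000000;
        abstime.tv_sec++;
    }

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    pthread_mutex_lock(&mutex);
    int rc = 0;
    while (rc == 0)  // rc == 0 here can only be a spurious wakeup; wait again
        rc = pthread_cond_timedwait(&cond, &mutex, &abstime);
    pthread_mutex_unlock(&mutex);

    clock_gettime(CLOCK_MONOTONIC, &end);
    *elapsed_ms = (end.tv_sec - start.tv_sec) * 1000
                + (end.tv_nsec - start.tv_nsec) / 1000000;

    pthread_cond_destroy(&cond);
    pthread_mutex_destroy(&mutex);
    return rc;
}
```

Calling timedWaitOnce(998, &elapsed) repeatedly from several threads and logging any call that does not return ETIMEDOUT after roughly one second would show whether the hang reproduces without the application code. Build with -pthread.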
GDB Snippet:
From the snippet below you can deduce:
- The sample code above successfully acquired the mutex (otherwise it would not have reached the pthread_cond_timedwait() call) and then passed it to pthread_cond_timedwait()
- As documented (man 3 pthread_cond_timedwait), pthread_cond_timedwait() then atomically released the mutex (you can see __lock is 0) and caused the calling thread to block on the condition variable
(gdb) bt
#0 0xb5f56398 in __pthread_cond_timedwait (cond=<optimized out>, mutex=<optimized out>, abstime=0xb442ecec) at pthread_cond_timedwait.c:198
#1 0xb6543294 in Poco::EventImpl::waitImpl (this=0x798f2e8, milliseconds=998) at src/Event_POSIX.cpp:105
#2 0x01654088 in Poco::Event::tryWait (this=0x798f2e8, milliseconds=998) at open_source/poco/poco-2015.1-cscb/include/Poco/Event.h:112
(gdb) frame 2
#2 0x01654088 in Poco::Event::tryWait (this=0x798f2e8, milliseconds=998) at open_source/poco/poco-2015-cscb/include/Poco/Event.h:112
112 return waitImpl(milliseconds);
(gdb) print *this
$47 = {
  <Poco::EventImpl> = {
    _auto = true,
    _state = false,
    _mutex = {
      __data = {
        __lock = 0,
        __count = 0,
        __owner = 0,
        __kind = 0,
        __nusers = 1,
        {
          __spins = 0,
          __list = {
            __next = 0x0
          }
        }
      },
      __size = '\000' <repeats 16 times>, "\001\000\000\000\000\000\000",
      __align = 0
    },
    _cond = {
      __data = {
        __lock = 0,
        __futex = 29,
        __total_seq = 15,
        __wakeup_seq = 14,
        __woken_seq = 14,
        __mutex = 0x798f2ec,
        __nwaiters = 2,
        __broadcast_seq = 9
      },
      __size = "\000\000\000\000\035\000\000\000\017\000\000\000\000\000\000\000\016\000\000\000\000\000\000\000\016\000\000\000\000\000\000\000\354\362\230\a\002\000\000\000\t\000\000\000\000\000\000",
      __align = 124554051584
    }
  }, <No data fields>}
strace Snippet:
Thread 1033 below is the same thread as in the GDB backtrace above. After the futex() system call there is no further activity from thread 1033. The futex value of 29 matches the __futex field of the pthread condition variable contained in the Poco::Event above.
1033 clock_gettime(CLOCK_MONOTONIC, {7567, 448404485}) = 0
1033 gettimeofday({1493658053, 414445}, NULL) = 0
1033 futex(0x798f30c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 29, {1493658054, 412445000}, ffffffff <unfinished ...>
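As a sanity check on the trace, the deadline arithmetic from waitImpl() can be replayed against the gettimeofday() value shown above. This is a small sketch only: deadlineFrom() is a hypothetical helper (not part of POCO) that takes "now" as a parameter instead of calling gettimeofday() itself, so that a traced value can be fed in.

```cpp
#include <cassert>
#include <sys/time.h>
#include <time.h>

// Hypothetical helper (not part of POCO): the deadline computation from
// waitImpl(), with the current time passed in rather than read from the clock.
struct timespec deadlineFrom(const struct timeval& now, long milliseconds)
{
    assert(milliseconds >= 0);

    struct timespec abstime;
    abstime.tv_sec = now.tv_sec + milliseconds / 1000;
    abstime.tv_nsec = now.tv_usec * 1000 + (milliseconds % 1000) * 1000000;
    if (abstime.tv_nsec >= 1000000000)  // carry whole seconds out of tv_nsec
    {
        abstime.tv_nsec -= 1000000000;
        abstime.tv_sec++;
    }
    return abstime;
}
```

With now = {1493658053, 414445} and milliseconds = 998, this yields {1493658054, 412445000}, which is exactly the timeout passed to futex() in the trace. So the absolute deadline handed to the kernel was 998 ms after the gettimeofday() reading, as intended.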
The custom application that executes this code does not hang in pthread_cond_timedwait() on every run. Some runs go 30+ minutes before the application is killed, while others hang in pthread_cond_timedwait() within 5-10 seconds of starting.
Any pointers/debug tips would be greatly appreciated.
Thanks