Why do pthreads’ condition variable functions require a mutex?

I’m reading up on pthread.h; the condition variable related functions (like pthread_cond_wait(3)) require a mutex as an argument. Why? As far as I can tell, I’m going to be creating a mutex just to use as that argument? What is that mutex supposed to do?

It's just the way that condition variables are (or were originally) implemented.

The mutex is used to protect the condition variable itself. That's why you need it locked before you do a wait.

The wait will "atomically" unlock the mutex, allowing others access to the condition variable (for signalling). Then when the condition variable is signalled or broadcast to, one or more of the threads on the waiting list will be woken up and the mutex will be magically locked again for that thread.

You typically see the following operation with condition variables, illustrating how they work. The following example is a worker thread which is given work via a signal to a condition variable.

thread:
    initialise.
    lock mutex.
    while thread not told to stop working:
        wait on condvar using mutex.
        if work is available to be done:
            do the work.
    unlock mutex.
    clean up.
    exit thread.

The work is done within this loop provided that there is some available when the wait returns. When the thread has been flagged to stop doing work (usually by another thread setting the exit condition then kicking the condition variable to wake this thread up), the loop will exit, the mutex will be unlocked and this thread will exit.

The code above is a single-consumer model as the mutex remains locked while the work is being done. For a multi-consumer variation, you can use, as an example:

thread:
    initialise.
    lock mutex.
    while thread not told to stop working:
        wait on condvar using mutex.
        if work is available to be done:
            copy work to thread local storage.
            unlock mutex.
            do the work.
            lock mutex.
    unlock mutex.
    clean up.
    exit thread.

which allows other consumers to receive work while this one is doing work.

The condition variable relieves you of the burden of polling some condition instead allowing another thread to notify you when something needs to happen. Another thread can tell that thread that work is available as follows:

lock mutex.
flag work as available.
signal condition variable.
unlock mutex.

The vast majority of what are often erroneously called spurious wakeups was generally always because multiple threads had been signalled within their pthread_cond_wait call (broadcast), one would return with the mutex, do the work, then re-wait.

Then the second signalled thread could come out when there was no work to be done. So you had to have an extra variable indicating that work should be done (this was inherently mutex-protected with the condvar/mutex pair here - other threads needed to lock the mutex before changing it however).

It was technically possible for a thread to return from a condition wait without being kicked by another process (this is a genuine spurious wakeup) but, in all my many years working on pthreads, both in development/service of the code and as a user of them, I never once received one of these. Maybe that was just because HP had a decent implementation :-)

In any case, the same code that handled the erroneous case also handled genuine spurious wakeups as well since the work-available flag would not be set for those.

A condition variable is quite limited if you could only signal a condition, usually you need to handle some data that's related to to condition that was signalled. Signalling/wakeup have to be done atomically in regards to achieve that without introducing race conditions, or be overly complex

pthreads can also give you , for rather technical reasons, a spurious wakeup . That means you need to check a predicate, so you can be sure the condition actually was signalled - and distinguish that from a spurious wakeup. Checking such a condition in regards to waiting for it need to be guarded - so a condition variable needs a way to atomically wait/wake up while locking/unlocking a mutex guarding that condition.

Consider a simple example where you're notified that some data are produced. Maybe another thread made some data that you want, and set a pointer to that data.

Imagine a producer thread giving some data to another consumer thread through a 'some_data' pointer.

while(1) {
    pthread_cond_wait(&cond); //imagine cond_wait did not have a mutex
    char *data = some_data;
    some_data = NULL;
    handle(data);
}

you'd naturally get a lot of race condition, what if the other thread did some_data = new_data right after you got woken up, but before you did data = some_data

You cannot really create your own mutex to guard this case either .e.g

while(1) {

    pthread_cond_wait(&cond); //imagine cond_wait did not have a mutex
    pthread_mutex_lock(&mutex);
    char *data = some_data;
    some_data = NULL;
    pthread_mutex_unlock(&mutex);
    handle(data);
}

Will not work, there's still a chance of a race condition in between waking up and grabbing the mutex. Placing the mutex before the pthread_cond_wait doesn't help you, as you will now hold the mutex while waiting - i.e. the producer will never be able to grab the mutex. (note, in this case you could create a second condition variable to signal the producer that you're done with some_data - though this will become complex, especially so if you want many producers/consumers.)

Thus you need a way to atomically release/grab the mutex when waiting/waking up from the condition. That's what pthread condition variables does, and here's what you'd do:

while(1) {
    pthread_mutex_lock(&mutex);
    while(some_data == NULL) { // predicate to acccount for spurious wakeups,would also 
                               // make it robust if there were several consumers
       pthread_cond_wait(&cond,&mutex); //atomically lock/unlock mutex
    }

    char *data = some_data;
    some_data = NULL;
    pthread_mutex_unlock(&mutex);
    handle(data);
}

(the producer would naturally need to take the same precautions, always guarding 'some_data' with the same mutex, and making sure it doesn't overwrite some_data if some_data is currently != NULL)

POSIX condition variables are stateless. So it is your responsibility to maintain the state. Since the state will be accessed by both threads that wait and threads that tell other threads to stop waiting, it must be protected by a mutex. If you think you can use condition variables without a mutex, then you haven't grasped that condition variables are stateless.

Condition variables are built around a condition. Threads that wait on a condition variable are waiting for some condition. Threads that signal condition variables change that condition. For example, a thread might be waiting for some data to arrive. Some other thread might notice that the data has arrived. "The data has arrived" is the condition.

Here's the classic use of a condition variable, simplified:

while(1)
{
    pthread_mutex_lock(&work_mutex);

    while (work_queue_empty())       // wait for work
       pthread_cond_wait(&work_cv, &work_mutex);

    work = get_work_from_queue();    // get work

    pthread_mutex_unlock(&work_mutex);

    do_work(work);                   // do that work
}

See how the thread is waiting for work. The work is protected by a mutex. The wait releases the mutex so that another thread can give this thread some work. Here's how it would be signalled:

void AssignWork(WorkItem work)
{
    pthread_mutex_lock(&work_mutex);

    add_work_to_queue(work);           // put work item on queue

    pthread_cond_signal(&work_cv);     // wake worker thread

    pthread_mutex_unlock(&work_mutex);
}

Notice that you need the mutex to protect the work queue. Notice that the condition variable itself has no idea whether there's work or not. That is, a condition variable must be associated with a condition, that condition must be maintained by your code, and since it's shared among threads, it must be protected by a mutex.

Not all condition variable functions require a mutex: only the waiting operations do. The signal and broadcast operations do not require a mutex. A condition variable also is not permanently associated with a specific mutex; the external mutex does not protect the condition variable. If a condition variable has internal state, such as a queue of waiting threads, this must be protected by an internal lock inside the condition variable.

The wait operations bring together a condition variable and a mutex, because:

a thread has locked the mutex, evaluated some expression over shared variables and found it to be false, such that it needs to wait.
the thread must atomically move from owning the mutex, to waiting on the condition.

For this reason, the wait operation takes as arguments both the mutex and condition: so that it can manage the atomic transfer of a thread from owning the mutex to waiting, so that the thread does not fall victim to the lost wake up race condition.

A lost wakeup race condition will occur if a thread gives up a mutex, and then waits on a stateless synchronization object, but in a way which is not atomic: there exists a window of time when the thread no longer has the lock, and has not yet begun waiting on the object. During this window, another thread can come in, make the awaited condition true, signal the stateless synchronization and then disappear. The stateless object doesn't remember that it was signaled (it is stateless). So then the original thread goes to sleep on the stateless synchronization object, and does not wake up, even though the condition it needs has already become true: lost wakeup.

The condition variable wait functions avoid the lost wake up by making sure that the calling thread is registered to reliably catch the wakeup before it gives up the mutex. This would be impossible if the condition variable wait function did not take the mutex as an argument.

Why do pthreads’ condition variable functions require a mutex?

Related

Recent Posts