An Introduction to Programming with Threads

by Andrew D. Birrell
summarized by Adam M. Costello

[Note: The paper is presented in terms of the particular thread facility used in DEC SRC. The basic design has become conventional, so this summary uses a more neutral presentation, but retains the SRC terminology.]

Primary Concepts

A thread is a sequential flow of control. Each thread has its own program counter and call stack, but multiple threads may share an address space containing global variables, which can be used for communication among the threads.

Threads are one way (not the only way) to achieve concurrency, which is useful for exploiting multiprocessors, overlapping computation with slow devices such as disks and networks, keeping user interfaces responsive while work proceeds in the background, and serving multiple clients in a distributed system.

One thread can create another via a fork operation. Some thread facilities also provide a join operation to wait for a thread to terminate.
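
As a concrete illustration (mine, not the paper's, which uses the SRC facility), here is a minimal sketch in C using POSIX threads, where pthread_create plays the role of fork and pthread_join the role of join:

  #include <pthread.h>
  #include <stdio.h>

  /* The forked thread runs this function. */
  static void *worker(void *arg)
  {
      printf("hello from the forked thread\n");
      return NULL;              /* the return value is delivered to join */
  }

  int main(void)
  {
      pthread_t t;
      pthread_create(&t, NULL, worker, NULL);  /* fork a new thread */
      pthread_join(t, NULL);    /* join: block until worker terminates */
      return 0;
  }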

To avoid incorrectness caused by multiple threads accessing the same data structures at the same time, programmers can use monitors, invented by Tony Hoare. Monitors are based on mutexes and condition variables.

A mutex is an object that can be held by at most one thread at a time. A thread attempting to lock (acquire) a mutex that is already held blocks until the mutex is released and the thread scheduler grants it the mutex.

A condition variable is like a mutex in that threads can block on it, but unlike a mutex, a condition variable is not held. The wait(m,c) operation atomically unlocks a mutex and blocks on a condition variable. The signal(c) operation unblocks one or more of the threads currently blocked on the condition variable. The broadcast(c) operation is the same except that it unblocks all of the threads blocked on the condition variable. When a waiting thread is signaled, the wait operation does not return until it has relocked the mutex (possibly blocking on the mutex).

A monitor consists of some shared data, a mutex, and zero or more condition variables. Each condition variable is always used with the same mutex, but I don't know whether that is a programming convention or an actual requirement.

A monitor is typically used as follows. A thread desiring access to a “resource” locks the associated mutex and examines the associated data. If the resource is “available”, the thread performs its task, then unlocks the mutex. If the resource was unavailable, the thread unlocks the mutex and blocks on a condition variable (via wait). When another thread alters the data in a way that might render the resource available, it reawakens the first thread (via signal or broadcast), which relocks the mutex and rechecks the shared data.

  lock(m)
    while not available(data)
      wait(m,c)
    endwhile
    access(data)
  unlock(m)

It is important to use while rather than if, because other threads may lock the mutex and falsify the condition between the signal and the return from the wait, and because rechecking the condition makes the code locally verifiable: its correctness does not depend on where or why signals occur.
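
Translated into POSIX threads (a sketch of mine, assuming the shared data is a single availability flag), the pattern looks like this:

  #include <pthread.h>
  #include <stdbool.h>

  static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
  static bool available = false;     /* shared data, protected by m */

  void consumer(void)
  {
      pthread_mutex_lock(&m);
      while (!available)             /* while, not if: recheck on waking */
          pthread_cond_wait(&c, &m); /* atomically unlocks m and blocks */
      available = false;             /* access the data */
      pthread_mutex_unlock(&m);
  }

  void producer(void)
  {
      pthread_mutex_lock(&m);
      available = true;              /* make the resource available */
      pthread_cond_signal(&c);       /* the waiter relocks m inside wait */
      pthread_mutex_unlock(&m);
  }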

Some thread facilities provide an alert operation to allow one thread to request that another terminate early.

Concurrency introduces new challenges in terms of safety (correctness), liveness (progress), and performance.

To ensure safety, all shared mutable data must be associated with a mutex, and all threads must hold the mutex while reading or writing the data. For complex data structures, it is helpful to associate an invariant (a boolean function of the data structure) with each mutex. The truth of the invariant is the only assumption a thread may make about the data when it locks the mutex, and the thread is responsible for restoring the truth of the invariant before it unlocks the mutex (including before any call to wait).
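
For example (a sketch of mine, not from the paper), suppose the invariant is that two balances protected by one mutex always sum to 100. A transfer necessarily breaks the invariant mid-update, but restores it before unlocking:

  #include <pthread.h>

  static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
  static long a = 50, b = 50;   /* invariant, protected by m: a + b == 100 */

  void transfer(long amount)
  {
      pthread_mutex_lock(&m);   /* may assume only that a + b == 100 */
      a -= amount;              /* invariant temporarily broken... */
      b += amount;              /* ...and restored */
      pthread_mutex_unlock(&m); /* invariant must hold again here */
  }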

To ensure liveness, deadlock must be avoided. A deadlock is a state in which a set of threads are each blocked waiting for a resource held by another, so none will ever proceed. This can happen when two threads each hold a resource while trying to acquire the resource held by the other. The easiest way to avoid this situation is to order the resources, and require all threads to acquire resources in that order. Sometimes it is easier to impose this order if the resources are divided up at a finer granularity.
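
For instance (my sketch), if every thread that needs both resources always locks m1 before m2, no cycle of threads holding one lock while waiting for the other can form:

  #include <pthread.h>

  static pthread_mutex_t m1 = PTHREAD_MUTEX_INITIALIZER;  /* first in order */
  static pthread_mutex_t m2 = PTHREAD_MUTEX_INITIALIZER;  /* second in order */

  void use_both(void)
  {
      pthread_mutex_lock(&m1);   /* always acquire in the agreed order */
      pthread_mutex_lock(&m2);
      /* ... use both resources ... */
      pthread_mutex_unlock(&m2);
      pthread_mutex_unlock(&m1);
  }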

The other threat to liveness is starvation, a situation in which a thread waits forever for a resource. Avoiding starvation depends on the particular scheduling policies used by the thread facility, and on the policies used by the programmer in deciding when to wait and signal.

Secondary Concepts

There are many potential threats to performance:

lock conflicts
Very similar to head-of-line blocking: A slow thread impedes the progress of other threads by holding needed resources. When the slow thread has lower priority than the blocked thread, it is called priority inversion.

spurious lock conflicts
A thread signals another while still holding the mutex that the signaled thread will immediately want. [Note: I think this could be dealt with by the runtime system, and therefore may not be a concern these days.]
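
One common mitigation (my sketch; POSIX permits signaling a condition variable without holding the mutex) is to unlock before signaling, so the awakened thread can acquire the mutex immediately:

  #include <pthread.h>

  static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
  static int ready = 0;          /* protected by m */

  void produce(void)
  {
      pthread_mutex_lock(&m);
      ready = 1;
      pthread_mutex_unlock(&m);  /* release the mutex first... */
      pthread_cond_signal(&c);   /* ...so the waiter won't block on it */
  }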

spurious wake-ups
Some threads get signaled and immediately discover that they cannot make progress, so they wait again.

fairness
Which threads are granted access to resources, and how long do threads need to wait for them?

These threats can all be dealt with, but at the cost of added complexity, which increases the risk of incorrectness, so the programmer should use restraint.

Tertiary Concepts

There are some common techniques for using threads:

pipelining
If a task to be performed on many objects can be divided into a sequence of equally expensive subtasks arranged in a pipeline (with a queue of objects between each pair of subtasks), each subtask can be assigned to one thread, and run on a separate processor.
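
Here is a minimal two-stage pipeline (my sketch, with a one-slot queue between the stages; a real pipeline would use a deeper queue):

  #include <pthread.h>
  #include <stdio.h>

  /* One-slot queue between the two stages, protected by m. */
  static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
  static int slot, full = 0;

  static void put(int v)              /* stage 1 deposits an object */
  {
      pthread_mutex_lock(&m);
      while (full)
          pthread_cond_wait(&c, &m);
      slot = v;
      full = 1;
      pthread_cond_signal(&c);        /* safe: only one other thread here */
      pthread_mutex_unlock(&m);
  }

  static int get(void)                /* stage 2 removes an object */
  {
      pthread_mutex_lock(&m);
      while (!full)
          pthread_cond_wait(&c, &m);
      int v = slot;                   /* copy out while holding the lock */
      full = 0;
      pthread_cond_signal(&c);
      pthread_mutex_unlock(&m);
      return v;
  }

  static void *stage1(void *arg)      /* first subtask: square each object */
  {
      for (int i = 1; i <= 5; i++)
          put(i * i);
      return NULL;
  }

  static void *stage2(void *arg)      /* second subtask: print each result */
  {
      for (int i = 1; i <= 5; i++)
          printf("%d\n", get());
      return NULL;
  }

  int main(void)
  {
      pthread_t t1, t2;
      pthread_create(&t1, NULL, stage1, NULL);
      pthread_create(&t2, NULL, stage2, NULL);
      pthread_join(t1, NULL);
      pthread_join(t2, NULL);
      return 0;
  }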

up-calls
In situations featuring asynchronous events occurring at the lowest of multiple layers of abstraction (like network protocol stacks), it is inefficient to use a thread per layer, because too many context switches are required to process each input (packet). Another approach is to allow lower layers to call up into higher layers, so that each input is processed by one thread. This is generally much faster, but makes deadlock avoidance much more difficult.

version stamps
It can be difficult for a lower layer to invalidate data cached at a higher layer. One solution is to associate a version number with each set of data. Whenever a higher layer uses cached data to call a lower layer, it passes the version number. The lower layer can return a special out-of-date value if the version number does not match.
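
A minimal sketch of the idea (the names are mine, not the paper's; locking is omitted for brevity):

  #include <stdbool.h>

  /* Lower layer's authoritative data, with its version stamp. */
  static int data = 42;
  static unsigned version = 1;

  /* Called within the lower layer whenever the data changes. */
  void update(int v) { data = v; version++; }

  /* The higher layer caches data together with the version it saw,
     and passes that version back on later calls. */
  bool act_on_cached(unsigned cached_version)
  {
      if (cached_version != version)
          return false;   /* special out-of-date result: refresh, retry */
      /* ... act, knowing the caller's cached copy of data is current ... */
      return true;
  }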

work crews
When there are so many tasks that could be done in parallel that forking a separate thread for each would be too expensive, a fixed-size pool of threads can be used. The tasks are enqueued, and whenever a thread finishes one task, it dequeues another.
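
A sketch of a work crew (mine, not the paper's; the task queue is a fixed ring buffer, and overflow handling is omitted):

  #include <pthread.h>

  #define CREW_SIZE 4
  #define MAX_TASKS 128

  /* Ring buffer of pending tasks, protected by m; head == tail means empty. */
  static void (*tasks[MAX_TASKS])(void);
  static int head = 0, tail = 0;
  static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;

  void enqueue(void (*task)(void))
  {
      pthread_mutex_lock(&m);
      tasks[tail] = task;
      tail = (tail + 1) % MAX_TASKS;
      pthread_cond_signal(&nonempty);   /* one new task, wake one thread */
      pthread_mutex_unlock(&m);
  }

  static void *crew_member(void *arg)
  {
      for (;;) {
          pthread_mutex_lock(&m);
          while (head == tail)          /* sleep until a task arrives */
              pthread_cond_wait(&nonempty, &m);
          void (*task)(void) = tasks[head];
          head = (head + 1) % MAX_TASKS;
          pthread_mutex_unlock(&m);
          task();                       /* run the task outside the lock */
      }
      return NULL;
  }

  void start_crew(void)
  {
      for (int i = 0; i < CREW_SIZE; i++) {
          pthread_t t;
          pthread_create(&t, NULL, crew_member, NULL);
      }
  }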

