Concurrency and concurrency control

Concurrency is good: it means we can do multiple things at once (parallelism, multitasking). But concurrency introduces nondeterminism: the exact order in which things are done (the schedule) is not known in advance. Nondeterminism occurs because the schedule results from the interaction of a system and its scheduling policies with various asynchronous external processes, including physical processes, whose timings are not known or controlled by the system.

It is often the case that some schedules will lead to correct outcomes and some schedules will not. Thus it becomes necessary for programmers to express constraints to prevent the system from selecting or allowing a schedule that yields an incorrect outcome. This is concurrency control. Understanding concurrency is a difficult intellectual challenge. Controlling concurrency is an art and a craft. It is important to understand the problem of concurrency and the tools of the art of concurrent programming.
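
To make the problem concrete, here is a minimal Java sketch (illustrative only; the class and field names are invented for this example) in which two threads increment a shared counter with no concurrency control. The final value depends on the schedule: interleaved read-modify-write sequences can overwrite each other's updates.

    public class RaceDemo {
        static int counter = 0;   // shared, unsynchronized

        public static void main(String[] args) throws InterruptedException {
            Runnable work = () -> {
                for (int i = 0; i < 100_000; i++) {
                    counter++;    // read-modify-write: not atomic
                }
            };
            Thread t1 = new Thread(work);
            Thread t2 = new Thread(work);
            t1.start(); t2.start();
            t1.join();  t2.join();
            // Some schedules print 200000; others print less, because
            // increments from the two threads can interleave and lose updates.
            System.out.println(counter);
        }
    }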

In earlier days concurrency control was something that operating system kernel designers worried about. To the extent that concurrency was a property of the hardware it was up to the kernel to control it. The process model of early operating systems (single-threaded processes with strong isolation interacting through kernel abstractions such as pipes, files, and process fork/wait) limited concurrent interactions to the kernel.

Today concurrency has leaked into application programs. Rich GUIs must respond to user events while other activities are in progress. Concurrency is inherent to distributed applications, which must process messages arriving asynchronously from peers. In particular, high-throughput servers must process concurrent requests from many clients.

Abstractions for concurrent programming

The problem of concurrency control is fundamental and independent of the various programming models that exist to express or provide concurrency: events, threads, processes. There is a perennial disagreement about which abstractions are easiest for programmers to use in writing correct concurrent programs. For example, the recent interest in the MapReduce abstraction for cluster computing ("cloud computing") is based in part on a belief that it enables even naive programmers to write correct parallel programs that scale to very large clusters. The problem of safe concurrent programming is even more important with the prevalence of multicore processors.

Environments for safe concurrent programming generally create higher-level constructs that minimize direct sharing of data structures in memory by multiple threads. Instead they pass data among threads using messages or data streams. For example, an event-driven program might consist of single-threaded objects that receive streams of events. The systems research community generates a steady stream of papers exploring the relative merits of various models for safe concurrent programming from the standpoint of ease of use and performance.
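
As an illustrative sketch of this style (assuming Java; the queue-based event loop below uses java.util.concurrent and invented names, and is not a specific system from the literature), a single-threaded consumer owns its private state and receives events as messages, so the only structure shared between threads is the queue itself:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class EventLoopDemo {
        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<String> events = new ArrayBlockingQueue<>(16);

            // The consumer is the only thread that touches its internal state,
            // so that state needs no locking; threads share only the queue.
            Thread consumer = new Thread(() -> {
                int handled = 0;   // private to this thread
                try {
                    while (true) {
                        String event = events.take();   // blocks until an event arrives
                        if (event.equals("quit")) break;
                        handled++;
                        System.out.println("handled " + event + " (" + handled + " so far)");
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            consumer.start();

            events.put("click");
            events.put("keypress");
            events.put("quit");
            consumer.join();
        }
    }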

However, these higher-level models use threads under the hood. And threads are still the most commonly encountered model for concurrent programming.

The best way to understand concurrency is to study threads and related abstractions (synchronization primitives) for controlling concurrency in multi-threaded programs.

Threads and synchronization primitives

Modern programming languages have support for threads (or other concurrency models) built in. For example, the Java programming language includes support for threads. Older languages like C and C++ can use thread support provided by libraries such as Pthreads (POSIX threads). These languages and libraries provide a standard programming interface to threading facilities in the underlying hardware and operating system.
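
For example, creating and joining a thread in Java looks like the following minimal sketch (the Pthreads analogues are pthread_create and pthread_join):

    public class HelloThreads {
        public static void main(String[] args) throws InterruptedException {
            Thread worker = new Thread(() -> System.out.println("hello from a worker thread"));
            worker.start();                 // runs concurrently with main
            System.out.println("hello from the main thread");
            worker.join();                  // wait for the worker to finish
        }
    }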

Java and Pthreads both support variants of the classical synchronization abstractions: mutexes (locks) and condition variables. Mutexes and condition variables derive from the notion of monitors introduced by C. A. R. Hoare. There are several variants of the semantics for these primitives (e.g., Mesa semantics vs. Hoare semantics for condition variables). Mutexes can also be viewed as a restricted form of semaphore (binary semaphores), which generalize to counting semaphores. Semaphores are the basis for synchronization in the Solaris operating system.
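
To illustrate how these primitives relate, here is a sketch (illustrative, not taken from any particular system) of a counting semaphore built on a Java monitor. The while loop around wait() reflects Mesa semantics, under which a woken waiter must re-check its condition before proceeding:

    class CountingSemaphore {
        private int permits;

        CountingSemaphore(int initial) { permits = initial; }

        synchronized void p() throws InterruptedException {   // acquire (down)
            while (permits == 0) {
                wait();       // Mesa semantics: re-check the condition after waking
            }
            permits--;
        }

        synchronized void v() {                                // release (up)
            permits++;
            notify();         // wake one waiter; it will re-check permits
        }
    }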

The best introductory materials on multi-threaded programming are the tutorial papers by Andrew Birrell, notably "An Introduction to Programming with Threads" (DEC SRC Research Report 35, 1989).

It is important to understand the concurrency control abstractions and their variants, and use them to cope with the challenges of data races, thread synchronization, starvation, and deadlock. To gain this understanding it is necessary to study specific examples. Some canonical examples that you should be familiar with are bounded buffer (e.g., pipes or Soda Machine), reader/writer lock, rendezvous or Sleeping Professor, EventPair or ping-pong, barrier, process manager fork/exit/wait, and Dining Philosophers.
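
As one example, a bounded buffer can be written directly with a mutex and condition variables. The sketch below is illustrative: it uses a Java monitor and folds the "not full" and "not empty" conditions into a single notifyAll.

    class BoundedBuffer<T> {
        private final Object[] items;
        private int head = 0, tail = 0, count = 0;

        BoundedBuffer(int capacity) { items = new Object[capacity]; }

        synchronized void put(T item) throws InterruptedException {
            while (count == items.length) wait();   // buffer full: block the producer
            items[tail] = item;
            tail = (tail + 1) % items.length;
            count++;
            notifyAll();                             // wake any waiting consumers
        }

        @SuppressWarnings("unchecked")
        synchronized T take() throws InterruptedException {
            while (count == 0) wait();               // buffer empty: block the consumer
            T item = (T) items[head];
            items[head] = null;
            head = (head + 1) % items.length;
            count--;
            notifyAll();                             // wake any waiting producers
            return item;
        }
    }

A Unix pipe is essentially this structure with bytes as the items; the same wait-in-a-loop pattern appears in all of the canonical examples listed above.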

Kernel synchronization

You should also understand the architectural basis for these synchronization primitives on multiprocessor systems. Their implementation requires some form of special atomic instructions as a "toehold" for more general synchronization. Examples include test-and-set, compare-and-swap, fetch-and-add, and load-linked/store-conditional.
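
For instance, compare-and-swap alone is enough to build other atomic operations. The sketch below (using Java's AtomicInteger, which exposes a hardware CAS; the class and method names are invented for illustration) builds fetch-and-add from a CAS retry loop:

    import java.util.concurrent.atomic.AtomicInteger;

    class CasCounter {
        private final AtomicInteger value = new AtomicInteger(0);

        // Atomic increment built from compare-and-swap: retry until no other
        // thread changed the value between our read and our update.
        int fetchAndAdd(int delta) {
            while (true) {
                int old = value.get();
                if (value.compareAndSet(old, old + delta)) {
                    return old;
                }
            }
        }
    }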

Multiprocessor kernels may use these atomic instructions directly for fast and simple primitives to synchronize among processors within the kernel (e.g., spinlocks). You should understand how spinlocks are implemented, the performance implications of synchronization with spinlocks, the impact of interrupts and interrupt priority levels on kernel synchronization, and the interaction of spinlocks and interrupts.
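
A minimal test-and-set spinlock might look like the following sketch (using Java's AtomicBoolean in place of a raw test-and-set instruction; a kernel spinlock would also have to manage interrupt priority levels, which is omitted here):

    import java.util.concurrent.atomic.AtomicBoolean;

    class SpinLock {
        private final AtomicBoolean held = new AtomicBoolean(false);

        void lock() {
            // test-and-set loop: getAndSet atomically reads the old value and writes true;
            // if the old value was true, someone else holds the lock, so keep spinning.
            while (held.getAndSet(true)) {
                Thread.onSpinWait();   // busy-wait hint to the CPU (Java 9+)
            }
        }

        void unlock() {
            held.set(false);
        }
    }

Spinning is attractive only when critical sections are short and the lock holder is running on another processor; otherwise the spinning processor wastes cycles that the holder could have used to make progress.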

Spinlocks can serve as a basis for implementing the higher-level primitives for synchronizing logical processes or threads (mutexes, condition variables, semaphores). The latter are blocking primitives that may transition threads to a blocked (suspended or sleeping) state to prevent the scheduler from running them until some event occurs. Blocking synchronization primitives must interact directly with the scheduling system, e.g., using primitives such as the classical Unix kernel primitives sleep and wakeup. You should understand the performance implications of spinning vs. blocking synchronization.
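
The block/wakeup interaction can be illustrated with the following sketch (using Java's LockSupport.park and unpark as user-level stand-ins for the kernel's sleep and wakeup; the class name is invented). While parked, the waiter consumes no CPU, unlike a thread spinning on a lock:

    import java.util.concurrent.locks.LockSupport;

    public class ParkDemo {
        public static void main(String[] args) throws InterruptedException {
            Thread waiter = new Thread(() -> {
                System.out.println("waiter: blocking (parked), consuming no CPU");
                LockSupport.park();             // block until unparked
                // Note: park may also return spuriously, so real code re-checks
                // the awaited condition in a loop, just as with condition variables.
                System.out.println("waiter: woken up, running again");
            });
            waiter.start();
            Thread.sleep(100);                  // give the waiter time to park
            System.out.println("main: waking the waiter");
            LockSupport.unpark(waiter);         // analogous to a kernel wakeup
            waiter.join();
        }
    }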