# Quarantining Weakness (Extended Abstract) Compositional Reasoning Under Relaxed Memory Models\*

Radha Jagadeesan<sup>1</sup>, Gustavo Petri<sup>2</sup>, Corin Pitcher<sup>1</sup>, and James Riely<sup>1</sup>

<sup>1</sup> DePaul University
 <sup>2</sup> Purdue University

## 1 Introduction

In sequential computing, every method of an object can be described in isolation via preconditions and postconditions. However, reasoning in a concurrent setting requires a characterization of all possible interactions across method invocations. Herlihy and Wing [1990]'s notion of linearizability simplifies such reasoning by intuitively ensuring that each method invocation "takes effect" between its invocation and response events.

This approach had two basic shortcomings. Firstly, in Herlihy and Wing's definition of linearizability, the interfaces are not expressive enough to codify external calls emanating from the component. Thus, objects are closed and passive.

Secondly, the definitions are for a memory model with a global total order on memory operations, thus satisfying *sequential consistency* (SC). SC is not realized by all architectures or runtime systems [Adve and Gharachorloo 1996; Adve and Boehm 2010], motivating models of relaxed memory in hardware, such as TSO [Sewell et al. 2010], PSO [SPARC, Inc. 1994], Power [Sarkar et al. 2011], and runtime systems, such as Java [Manson et al. 2005; Sevcík 2008] and C++ [Boehm and Adve 2008; Batty et al. 2011]. This has motivated recent definitions of linearizability specific to the TSO [Burckhardt et al. 2012; Gotsman et al. 2012] and C11 [Batty et al. 2013] memory models.

We propose new definitions to address both of these limitations. Our methodology aims to keep the interfaces free of the intricacies of particular relaxed memory models. Our approach has the following characteristics.

(1) We model calls to component functions process-algebraically. This allows us to treat callbacks and to give a symmetric definition of composition between clients and libraries. Thus, our definitions encompass active components (that can evolve autonomously even without input from the environment) and open components (that invoke methods on components provided by the environment) and environment assumptions (pre/postconditions and the permitted sequences of method calls to a component).

(2) Our definitions are not specific to a particular memory model. Rather, we identify the criteria that a relaxed memory model needs to satisfy in order to fit into our framework: the examples that satisfy our criteria include SC, TSO, PSO and a variant of the Java Memory Model (JMM).

We establish an abstraction theorem: a component can safely be substituted for its interface in a non-interfering program. Moreover, for special classes of programs, we simplify the reasoning further by quarantining the effects of relaxed memory, allowing

<sup>\*</sup> Research supported by NSF 0916741.

programmers to program to sequential interfaces, even when the code has data races. Recall the definition of data race free (DRF) models: Informally, a program is DRF if no SC execution of the program leads to a state in which a write happens concurrently with another operation on the same location. A *DRF model* requires that the programmer's view of relaxed computation coincides with SC computations for programs that are DRF. TSO, PSO and the JMM are all DRF models. We establish the following.

(1) If a stateful component is DRF and the underlying memory model satisfies the DRF requirement, our notion of linearizability usually coincides with that of Herlihy and Wing, so classical techniques to verify linearizability can be used directly. Thus, in many cases, our definitions permit the use of standard proof techniques.

(2) If a client is DRF, and the underlying memory model satisfies the DRF requirement, the client can ignore all memory model subtleties when using a library that is linearizable as per our definitions, even if the library itself is *racy*. More precisely, it is sound for the client to reason solely with the sequential interface of the component, as in [Herlihy and Wing 1990].

*Rest of the paper.* In Section 2, we describe background material on linearizability in order to clarify the difficulties caused by relaxed memory. We discuss related work in Section 3 and develop our semantic framework in Sections 4–6. We define linearizability in Section 7 and provide several examples. In Section 8, we turn to techniques for establishing linearizability under relaxed memory using techniques developed for sequential consistency. In Section 9–10, we establish the basic properties of linearizability. Many definitions and all proofs are elided in this extended abstract.

## 2 Background: linearizability

To illustrate the issues that arise when reasoning compositionally, we describe the specification and implementation of a lock and a one-place buffer implemented using the lock.

Specifying the lock. To begin, we give the specification of a lock using an regular expression. We use regular expressions informally; the actual specifications are sets of traces. Let s and t be thread identifiers. Because we are interested in overlapping executions, we separate call and return into separate actions:  $\langle s?call f u \rangle$  represents a call by s to function f with argument u, and  $\langle s!ret f v \rangle$  represents the corresponding return with result v. (The ? and ! indicate that these are calls *in* to the lock and returns *out*; we shall see the symmetric case shortly.)

 $( \langle s?call rl \rangle \langle s!ret rl \rangle)^+ \langle t?call aq \rangle \langle t!ret aq \rangle)^*$ 

According to the specification, the lock is initially in its "acquired" state. Only after one or more calls to the "release" method rl, can the lock be "reacquired" using aq. This regular expression is not meant to refer to specific concrete thread names s and t. Rather, it is meant to convey the idea that calls and returns have matching thread names.

Let  $\Psi_{lock}$  be the prefix-closed set of traces that satisfy this regular expression. This is a "sequential" specification of the lock, in that no two function calls overlap.

We now turn to implementation of the lock. Here we use an *atomic* variable, which we define to be similar to volatile variables in Java, with an additional compare-and-set

(cas): w.cas(u,v) returns false if  $w \neq u$ , otherwise it returns true and sets w to v.

Initially, calls to aq will spin, only returning after another thread calls rl. In the vocabulary of [Lamport 1979], a call to rl *happens-before* the return from aq. The happensbefore relation allows a partial order to be recovered from the total order prescribed by a trace: actions of a single thread are ordered sequentially, but actions of different threads are unordered. Inter-thread order requires *synchronization*, which is we implement using atomic variables, such as w.

Every write to an atomic variable happens-before every subsequent read of the same atomic. An unsuccessful cas acts like a read, whereas a successful cas acts like both a read and a write. In traces, atomics produce three types of action: writes produce  $\langle s \text{ rel } w \rangle$  actions, reads produce  $\langle s \text{ acq } w \rangle$  actions, and successful cas produce  $\langle s \text{ cas } w \rangle$  actions; unsuccessful cas produce nothing. The happens-before relation orders every  $\langle \text{rel } w \rangle$  and  $\langle \text{cas } w \rangle$  with every subsequent  $\langle \text{acq } w \rangle$  and  $\langle \text{cas } w \rangle$ . These relations are based on the identity, w, of the atomic.

Let  $\Phi_{\text{lock}}$  be the set of implementation traces generated by the implementation code above. These include traces of the form

 $(\langle s?call rl \rangle \langle s rel w \rangle \langle s!ret rl \rangle)^+ \langle t?call aq \rangle \langle t cas w \rangle \langle t!ret aq \rangle)^*.$ 

(This regular expression is not exhaustive, since the implementation also generates overlapping function calls; however, it is sufficient for the discussion at hand.)

Herlihy and Wing [1990] propose linearizability as a way to relate the implementation of a concurrent component to its specification. An implementation is *linearizable* if for every trace of the implementation, there exists a trace in the specification such that (1) each thread makes the same method invocations in the same order, and (2) the order of non-overlapping invocations is preserved. We write  $\Phi_{lock} \models \Psi_{lock}$  to indicate that  $\Phi_{lock}$  is a valid implementation of  $\Psi_{lock}$  in this sense.

*Specifying the buffer.* We now give the specification and implementation of a one-place buffer using Lock. The buffer's sequential specification can be given as follows.

 $(\langle s?call put v \rangle \langle s!ret put \rangle \langle t?call get \rangle \langle t!ret get v \rangle)^*$ 

As before, let  $\Psi_{buf}$  be the prefix-closed set of traces that satisfy this regular expression. The implementation of the one place buffer uses two locks. We use subscripts to distinguish them. One of the locks has interface  $acq_{empty}/rel_{empty}$  (initially "released", with w==0) and the other has interface  $acq_{full}/rel_{full}$  (initially "acquired" with w==1). Thus, the buffer is initially empty. (Note that two "instances of a class" are represented here as two separate components.)

Let  $\Phi_{buf}$  be the set of traces derived from this implementation, including traces such as

 $(\langle s?call put v \rangle$ 

$$\begin{split} &\langle s! call \; acq_{empty} \rangle \langle s? ret \; acq_{empty} \rangle \langle s \; wr \; x \; v \rangle \langle s! call \; rel_{full} \rangle \langle s? ret \; rel_{full} \rangle \\ &\langle s! ret \; put \rangle \langle t? call \; get \rangle \\ &\langle t! call \; acq_{full} \rangle \langle t? ret \; acq_{full} \rangle \langle t \; rd \; x \; v \rangle \langle t! call \; rel_{empty} \rangle \langle t? ret \; rel_{empty} \rangle \end{split}$$

 $\langle t! ret put v \rangle \rangle^*$ .

This trace contains actions of the form  $\langle s! call f u \rangle$  which represent a call out to another component; likewise,  $\langle s?ret f v \rangle$  represents the corresponding return. In this case, the implementation is using services provided by other components.

We would like to be able to verify the correctness of Buffer using the sequential specification of Lock. That is, conclude  $\Phi_{buf} \otimes \Phi_{lock} \models \Psi_{buf}$  from  $\Phi_{buf} \otimes \Psi_{lock} \models \Psi_{buf}$ , where  $\otimes$  is a suitable notion of composition. Herlihy and Wing validate this approach under SC semantics. Burckhardt, Gotsman, Musuvathi, and Yang [2012] show that Herlihy and Wing's results fail for relaxed memory models and adapt them to TSO. Here we provide a different solution to that problem.

Traditional linearizability fails here, because it is impossible to establish the premise  $\Phi_{buf} \otimes \Psi_{lock} \models \Psi_{buf}$ . To see why, observe that any reasonable definition  $\Phi_{buf} \otimes \Psi_{lock}$  admits the following trace under relaxed memory. (For brevity, the calls to the locks are shown as elipses.)

$$\langle s?call put 1 \rangle \cdots \langle s wr \times 1 \rangle \cdots \langle s!ret put \rangle \langle t?call get \rangle \cdots \langle t rd \times 1 \rangle \cdots \langle t!ret get 1 \rangle$$

$$\langle r?call put 2 \rangle \cdots \langle r wr \times 2 \rangle \cdots \langle r!ret put \rangle \langle t?call get \rangle \cdots \langle t rd \times 1 \rangle \cdots \langle t!ret get 1 \rangle$$

$$(†)$$

The final call to get returns a stale value. The race on variable x is not resolved, and thus the earlier write on x remains visible.

Of course, if one looks at the specification of Lock, the problem is immediately apparent: it's too weak! In relaxed models, data structures have memory effects which are not captured by their functional interface. Indeed, the documentation in APIs such as java.util.concurrent [Sun Microsystems 2004] pays significant attention to exactly this fact. These APIs detail the happens-before behavior of the methods using happensbefore edges that go from the beginning of one method activation to the end of another (or a set of others); that is, from call to return.

We allow happens-before to be captured in specifications by introducing names, a, on actions. Each  $\langle ?call \rangle$  gets a unique name, and each  $\langle !ret \rangle$  gets a set of names. The interpretation is that  $\langle s?call f \vec{u} a \rangle$  happens-before  $\langle t!ret f \vec{v} A \rangle$  if  $a \in A$ .

With this addition, Lock can be specified as follows

 $(\langle r: call r \rangle \langle r: ret r \rangle)^* \langle s: call r \rangle \langle s: ret r \rangle \langle t: call aq \rangle \langle t: ret aq \{a\} \rangle)^*$ 

This specification is now strong enough to deduce happens-before edges from each put to get that it enables, and vice versa. Thus, in trace (†) above, the write to x in the first put is no longer visible to the second get. More generally, we are able to establish  $\Phi_{buf} \otimes \Psi_{lock} \models \Psi_{buf}$ .

## 3 Related work

We discuss the most closely related papers here, referring to others in context. Herlihy and Wing [1990] defined linearizability. From a client perspective, the set of linearizations of a linearizable object is an operational refinement of the object [Filipovic, O'Hearn, Rinetzky, and Yang 2010], i.e. the client is unable to distinguish the implementation from the specification. Thus, a client of a linearizable object can take an atomic view of method invocations. The verification method for object linearizability relies on finding linearization points for methods. For each function call, the linearization point is the moment at which the function appears to execute atomically. Composition of non-interfering objects preserves linearizability. Gotsman and Yang [2012] mitigate the stricture of interference-freedom in this framework using ownership ideas.

The papers cited above make a sharp distinction between clients and libraries; clients are permitted to make method invocations and libraries accept method invocations. Thus, they are unable to describe the interface of open components such as a thread pool that relies on an external bounded buffer library. In contrast, our enhanced notion of interfaces is able to describe such components. In terms of implementations, our library can both make and receive method invocations in external interactions, in addition to also being able to invoke internal library methods. Indeed, we stop short of adding full objects, as suggested by Filipovic, O'Hearn, Rinetzky, and Yang [2010], only to avoid cluttering the presentation with heavy syntactic machinery.

The definition of linearizability relies on an SC view of shared memory. Batty, Dodds, and Gotsman [2013] address linearizability in the context of the C/C++ memory models. When specialized to SC, their definition of linearizability is stricter than that of Herlihy and Wing. In contrast, when specialized to SC, our definitions are *not* stricter.

In TSO, an update to a variable might be buffered and may not be seen by a reader in a different thread until the update is committed to the main memory. Burckhardt et al. [2012] address linearizability for the TSO memory model. In contrast both to Herlihy and Wing and to our definitions, their paper incorporates two extra actions for each method invocation in the sequential specification of an object: one to record when buffer updates made by the client are seen by the library, and the other to record when the updates made by library are committed to main memory. In our work we maintain the atomicity of methods of Herlihy and Wing by only associating call and return actions with each method invocation.

More generally, our methodology keeps the interface of a component free of the intricacies of the particular relaxed memory model under consideration. In this paper, we are thus able to address SC, TSO, PSO and a JMM variant. In particular, our analysis of TSO is subtle enough to address all the examples of Burckhardt et al. [2012], even though, from a purely formal TSO perspective, there is clearly greater expressiveness in their definition. Consequently, any data race free client can work precisely against a SC interface in our setting, whereas Gotsman, Musuvathi, and Yang [2012] explore the conditions on compilation necessary to validate the use of SC interfaces under TSO.

## 4 Traces

The semantics of a component is given by a set of *traces*, defined below. We build the syntax from the following disjoint sets. Let  $u, v \in \mathbb{Z}$  range over values,  $a, b \in Act$  over action names,  $A, B \subseteq Act$  over sets of action names,  $f, g \in Fun$  over function names,  $F \subseteq Fun$  over sets of function names,  $s, t \in Thrd$  over thread names (including the reserved thread names "tinit" and "tcom") and  $S, T \subseteq Thrd$  over sets thread names. Let  $\eta \in Fun \uplus Thrd$  range over names, which include both function and thread names, and H, G over sets of names.

Traces are strings of *actions*. These are divided into *communication actions*, described below, and *memory actions*, described in Section 5. For now, let *Mem* be the set of all memory actions.

 $\alpha, \gamma ::= \langle s! \mathsf{call} f \vec{u} a A \rangle | \langle s? \mathsf{call} f \vec{u} a A \rangle | \langle s. \mathsf{call} f \vec{u} a A \rangle$  $| \langle s? \mathsf{ret} f \vec{u} a A \rangle | \langle s! \mathsf{ret} f \vec{u} a A \rangle | \langle s. \mathsf{ret} f \vec{u} a A \rangle | \cdots$ 

Communication actions include seven components, discussed below: thread identifier *s*, polarity in  $\{!, ?, .\}$ , action type in  $\{call, ret\}$ , function name *f*, vector of arguments or return values  $\vec{u}$ , definition *a*, and use set *A*.

We typically elide the uninteresting parts of an action; missing parts are existentially quantified. For example, we write  $\langle ! \text{call } f \vec{u} a A \rangle$  to abbreviate  $(\exists s) \langle s ! \text{call } f \vec{u} a A \rangle$ , and similarly for other abbreviations such as  $\langle s ! \text{call } f \rangle$ ,  $\langle s ! f \rangle$  and  $\langle ! \rangle$ .

The thread identifier identifies the thread that performed the action.

As in Jeffrey and Rathke [2005], call and return actions include a *polarity*. Actions containing a "?" are *input*; those containing "!" are *output*; actions containing "." are *internal*, as are memory actions. Input actions are offered by *quiescent* threads, whereas all others are initiated by *active* threads. Two actions are *complementary* if one is an input, the other an output and they are identical when action names and "?" and "!" are ignored. If  $\alpha \in \{\langle ! \rangle, \langle ? \rangle\}$ , we say  $\alpha$  *is I/O*.

Actions  $\langle ! \operatorname{call} f \rangle$  and  $\langle :\operatorname{ret} f \rangle$  occur in the traces of components that do *not* define f; whereas  $\langle :\operatorname{call} f \rangle$ ,  $\langle :\operatorname{ret} f \rangle$ ,  $\langle :\operatorname{call} f \rangle$  and  $\langle :\operatorname{ret} f \rangle$  occur those that do. Action  $\langle :\operatorname{call} \rangle$  represents a call from outside the component, whereas  $\langle :\operatorname{call} \rangle$  represents a call from the component to itself. Thus, input and output actions cause a shift across the boundary of the component for that thread, whereas the internal actions do not.

Call actions include the vector of actual parameters. Return actions include a vector of return values. Several examples require multiple return values. An obvious generalization would be to support first-class tuples, but this would complicate the presentation.

The action names decorating actions are used to specify ordering properties (Section 5). Each action *defines* a unique action name *a*. For the purposes of defining traces and trace composition, these names are mere decorations: we identify traces up to the renaming of action names. In  $\langle ?A \rangle$ , the set *A* contains names defined by "!" actions and represents an order relied upon by the component. In  $\langle !A \rangle$ , the set *A* contains names defined by "?" actions and represents an order guaranteed the component. In  $\langle .A \rangle$ , the set *A* contains names defined by "?" actions and represents an order guaranteed the component. In  $\langle .A \rangle$ , the set *A* contains names defined by "?" actions and represent the interaction of two components, one which relies upon *A* and one which guarantees it. In operationally generated traces, *A* is empty for any  $\langle !A \rangle$  or  $\langle .A \rangle$ ; these sets or nonempty when working with specification interfaces.

*Definition 4.1 (Trace).* For any given thread, define a *single-threaded balanced trace* to be one generated by the following grammar.

| $B ::= A \mid Q$                                                                                                                | (Single-threaded balanced trace) |
|---------------------------------------------------------------------------------------------------------------------------------|----------------------------------|
| $\mathbf{A} ::= \langle .call f \rangle \mathbf{A} \langle .ret f \rangle \mid \mathbf{A} \mathbf{A} \mid \mathbf{a}$           | e (Active trace)                 |
| $ \left< !call  f \right> \mathrm{Q} \left< ?ret  f \right> \; \; \mathrm{M}$                                                   |                                  |
| $\mathbf{Q}  ::= \langle ?call  f \rangle  \mathbf{A}  \langle !ret  f \rangle  \mid  \mathbf{Q}  \mathbf{Q}  \mid  \mathbf{a}$ | e (Quiescent trace)              |
| $M \in Mem$                                                                                                                     | (Memory action)                  |

(We elide uninteresting metavariables within actions. Because they are single-threaded, all actions have the same thread name.)

A *balanced trace* is any interleaving of single-threaded balanced traces with distinct thread names. A *trace* is a trace of actions that is well-formed and is also a prefix of a balanced trace. Let  $\sigma$ ,  $\rho$ ,  $\pi$  range over traces.

We give an inductive characterization of traces in the full version of this paper.

We expose, and nest, calls and returns as with VPLs [Alur and Madhusudan 2009]. As seen from the grammar, prefixes of single-threaded balanced trace are divided into two *polarities*: quiescent and active. By convention,  $\varepsilon$  is quiescent. For all other traces, the polarity is determined by the first action of the trace: if it is  $\langle ?call \rangle$ , then the trace is quiescent; otherwise the trace is active.

Traces have three forms of bracketing, indexed by thread: call/return, input/output and output/input. (Internal actions provide no interesting bracketing other than call/ return.) In the trace  $\langle s? \text{call } f \rangle \langle s! \text{call } g \rangle \langle s? \text{ret } g \rangle \langle s! \text{ret } f \rangle$ , the call/return matches are  $\langle s? \text{call } f \rangle / \langle s! \text{ret } f \rangle$  and  $\langle s! \text{call } g \rangle / \langle s? \text{ret } g \rangle$ ; the input/output matches are  $\langle s? \text{call } f \rangle / \langle s! \text{ret } g \rangle$ .

Here are some further examples:  $\langle s! \rangle \langle s? \rangle$  is a trace, but  $\langle s! \rangle \langle s! \rangle$  is not.  $\langle s! \rangle \langle t? \rangle \langle s? \rangle$  is a trace, but  $\langle s! \rangle \langle s? \rangle \langle s? \rangle$  is not.  $\langle s? \rangle \langle s. \rangle$  is a trace, but  $\langle s! \rangle \langle s. \rangle$  is not.

Definition 4.2. Define the function *thrd* to return the thread name occurring inside an action and *thrds* to return the set of threads in a sequence of actions. Similarly, define the partial functions *fun* and *funs* to return the function name. For example, if  $\alpha = \langle s! \text{ call } f \ \vec{u} \ a A \rangle$ , then *thrd*( $\alpha$ ) = *s* and *fun*( $\alpha$ ) = *f*.

Given a trace  $\sigma$ , define the *thread projection*  $\sigma|_s$  of that trace, which includes only the actions attributed to thread *s*; this is always a prefix of a single-threaded balanced trace. Define the following functions over traces.

*intern*(
$$\alpha_1 \cdots \alpha_n$$
)  $\triangleq \{f \mid \exists i. \ \alpha_i = \langle \text{?call } f \rangle \text{ or } \alpha_i = \langle \text{.call } f \rangle \}$   
 $\cup \{s \mid (\sigma|_s) \neq \varepsilon \text{ is an active trace} \} \setminus \{\text{tinit, tcom} \}$   
*extern*( $\alpha_1 \cdots \alpha_n$ )  $\triangleq \{f \mid \exists i. \ \alpha_i = \langle \text{!call } f \rangle \}$   
 $\cup \{s \mid (\sigma|_s) \neq \varepsilon \text{ is an quiescent trace} \}$ 

These definitions lift to trace sets via set union. When interpreted over trace sets, *intern* identifies the functions and threads defined by the component, whereas *extern* identifies the functions and threads mentioned in a component, but not defined by it.

A trace  $\sigma$  is *coherent* if *intern*( $\sigma$ )  $\cap$  *extern*( $\sigma$ ) =  $\emptyset$ . We assume that all traces are coherent. We also assume other well-formedness criteria, detailed in the full version of this paper.

A set  $\Sigma$  of traces is *coherent* if *intern*( $\Sigma$ )  $\cap$  *extern*( $\Sigma$ ) =  $\emptyset$ . Note that this is stronger than requiring only that each individual trace be coherent. Let  $\Phi$ ,  $\Psi$  range over coherent sets of traces.

A trace is *sequential* if it can be extended in such a way that every  $\langle s? \rangle$  is followed by actions exclusively by *s*, up to a terminating  $\langle s! \rangle$ . A trace set is *sequential* if it contains only sequential traces.

A trace set is an *interface* if it contains only I/O actions.

#### 5 Memory actions and memory orders

Our approach is parametric with respect to the specific memory model considered. For concreteness, we will consider four models here: seq, hb, tso and pso. To keep the formalism simple, we assume that (1) memory stores only integers, (2) atomics provide the only form of synchronization and (3) components are specified as sets of functions, variables and threads.

Let  $z \in Reg$  range over registers (local variables),  $x, y \in DataVar$  over data variables and  $w \in SyncVar$  over synchronization variables. We use the general term variable to include data variables and synchronization variables, but not registers. Memory actions are as follows.

 $\begin{array}{l} \alpha, \gamma ::= \cdots \mid \langle s \ {\rm wr} \ x \ u \ a \rangle \mid \langle s \ {\rm rd} \ x \ u \ a \rangle \mid \langle {\rm com} \ s \ x \ a \rangle \\ \mid \langle s \ {\rm rel} \ w \rangle \quad \mid \langle s \ {\rm acq} \ w \rangle \quad \mid \langle s \ {\rm cas} \ w \rangle \end{array}$ 

For data variables, the actions record writes, reads and commits. For synchronization variables, the actions record releases, acquires and compare-and-sets. Action names (metavariable a, as before) are used to record relations between data actions. Commit actions are used by buffering models, such as tso and pso, to indicate the point at which a write is moved from the local buffer to main memory. Non-buffering models, such as seq and hb, have no commit actions.

Neither initializations nor commits are performed by the program, but by the underlying operational machinery. Initialization actions are normal writes attributed to the reserved pseudo-thread "tinit". Commit actions are only performed by the reserved pseudo-thread "tcom"; thus we simply define *thrd*( $\langle com s x a \rangle$ ) = tcom. In  $\langle com s x a \rangle$ , the identifiers s and x are redundant with the corresponding  $\langle s \text{ wr } x u a \rangle$ .

Synchronization variables carry memory effects whereas data variables do not. Registers are used to write programs, but are not shared between threads; thus, we do not require actions relating to registers.

The name a is defined in  $\langle wr a \rangle$  and used in  $\langle rd a \rangle$  and  $\langle com a \rangle$ . We expect that every write action is committed at most once and that the redundant information in read and commit actions should match the corresponding write. In addition, initialization writes by thread "tinit" must appear at the beginning of a trace. These bookkeeping requirements are included in the notion of well-formed trace, formalized in the full version of this paper. Most of the requirements are unsurprising. We note only that well-formedness does not require that a read be proceeded by the matching write, since this is not true under all of the models we consider.

Example 5.1. Consider the following traces, each containing actions from three different threads (eliding initialization and commit actions).

$$|\mathbf{s} \operatorname{wr} \mathbf{x} \mathbf{a}\rangle \langle \mathbf{t} \operatorname{wr} \mathbf{x} \mathbf{b}\rangle \langle \mathbf{r} \operatorname{rd} \mathbf{x} \mathbf{b}\rangle$$
 (a)

$$\langle \mathbf{t} \, \mathbf{wr} \, \mathbf{x} \, \mathbf{b} \rangle \langle \mathbf{s} \, \mathbf{wr} \, \mathbf{x} \, \mathbf{a} \rangle \langle \mathbf{r} \, \mathbf{rd} \, \mathbf{x} \, \mathbf{b} \rangle \tag{b}$$

$$\langle s wr \times a \rangle \langle t wr \times b \rangle \langle r rd \times b \rangle$$
 (a)  
 
$$\langle t wr \times b \rangle \langle s wr \times a \rangle \langle r rd \times b \rangle$$
 (b)  
 
$$\langle s wr \times a \rangle \langle r rd \times b \rangle \langle t wr \times b \rangle$$
 (c)  
 
$$uru b \rangle \langle t wr u d \rangle \langle r rd u d \rangle \langle r rd u d \rangle$$
 (d)

$$\langle \mathsf{s} \, \mathsf{wr} \, \mathsf{x} \, \mathsf{a} \rangle \langle \mathsf{s} \, \mathsf{wr} \, \mathsf{y} \, \mathsf{b} \rangle \langle \mathsf{t} \, \mathsf{wr} \, \mathsf{x} \, \mathsf{c} \rangle \langle \mathsf{t} \, \mathsf{wr} \, \mathsf{y} \, \mathsf{d} \rangle \langle \mathsf{r} \, \mathsf{rd} \, \mathsf{y} \, \mathsf{d} \rangle \langle \mathsf{r} \, \mathsf{rd} \, \mathsf{x} \, \mathsf{a} \rangle \tag{d}$$

- Under seq, reads and writes are atomic; thus, a read must be fulfilled by the previous write. Only trace (a) is allowable; the others require that a read see a stale write.

- Under tso, writes are placed in a buffer which is not visible to other threads; for any given thread, the buffered writes are committed to main memory in FIFO order, but the order between threads is nondeterministic. Thus, traces (a) and (b) are allowable.
- pso is similar to tso, except that each thread has a separate buffer for each variable.
   Thus, traces (a), (b) and (d) are allowable.
- Under hb, a write may be seen by a reader even before it is generated by the writer.
   Thus, all four executions are allowable.

*Example 5.2.* Consider the following unsynchronized implementation of a one place buffer (on the left) and client (on the right).

| var y=0                           | var x=0                                             |
|-----------------------------------|-----------------------------------------------------|
| fun put (z){y=z}                  | <pre>thrd s {x=1; put(3); wait(4); let z' =x}</pre> |
| fun wait (z) {do skip until y==z} | thrd t {wait(3);x=2;put(4)}                         |

Ignoring initialization and commits, here is a single trace of the library and of the client, each in isolation. (The label sets decorating return actions are specification elements. Those on the library output actions are *guarantees*, whereas those on client input actions are *relies*.)

| $\langle s?call put 3 a \rangle \langle s wr y 3 \rangle \langle s! ret \emptyset \rangle$    | $\langle s  wr  x  1 \rangle \langle s  !  call  put  3  a \rangle \langle s  ?ret  \emptyset \rangle$       |
|-----------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------|
| $\langle t$ ?call wait 3 b $\rangle \langle t$ rd y 3 $\rangle \langle t$ ! ret {a} $\rangle$ | $\langle t! call wait 3 b \rangle \langle t?ret \{a\} \rangle \langle t wr x 2 \rangle$                      |
| $\langle t$ ?call put 4 c $\rangle \langle t $ wr y 4 $\rangle \langle t!$ ret Ø $\rangle$    | $\langle t! call put 4 c \rangle \langle t? ret \emptyset \rangle$                                           |
| $\langle$ s?call wait 4 d $\rangle$ $\langle$ s rd y 4 $\rangle$ $\langle$ s!ret {c           | $\rangle \qquad \langle s! call wait 4 d \rangle \langle s? ret \{c\} \rangle \langle s rd \times 1 \rangle$ |

Composing the traces, we have the following trace (on the left), which, if we elide "." actions, is equivalent to the trace on the right.

| $\langle s wr x 1 \rangle \langle s.call put 3 a \rangle \langle s wr y 3 \rangle \langle s.ret \emptyset \rangle$                  | $\langle s  wr  x  1 \rangle \langle s  wr  y  3 \rangle$       |
|-------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------|
| $\langle t. call wait 3 b \rangle \langle t rd y 3 \rangle \langle t. ret \{a\} \rangle \langle t wr x 2 \rangle$                   | $\langle t \ rd \ y \ 3 \rangle \langle t \ wr \ x \ 2 \rangle$ |
| $\langle t. call put 4 c \rangle \langle t wr y 4 \rangle \langle t. ret \emptyset \rangle$                                         | $\langle t wr y 4 \rangle \langle s rd y 4 \rangle$             |
| $\langle$ s.call wait 4 d $\rangle$ $\langle$ s rd y 4 $\rangle$ $\langle$ s.ret $\{$ c $\}$ $\rangle$ $\langle$ s rd x 1 $\rangle$ | $\langle s rd \times 1 \rangle$                                 |

Ignoring calls and returns, under what circumstances should such a trace be allowed?

On the one hand, it is clearly *not* allowed under sequential semantics, since  $\langle s rd \times 1 \rangle$  does not see the most recent write. On the other hand, it is clearly *allowed* under a happens-before semantics, since there is no synchronization between thread s and t.

For tso and pso, the situation is less obvious. In fact, pso will allow the trace, but tso will not. The difference is that tso enforces an ordering between  $\langle t wr \times 2 \rangle$  and  $\langle t wr y 4 \rangle$ , whereas pso does not.

Moving from the combined trace back to the trace of the library in isolation, for each memory model, we may ask "does the library implementation meets its specification?" In this case, the answer is positive for seq and tso, and negative for pso and hb.

Similarly, moving from the combined trace back to the trace of the client in isolation, for each memory model, we may ask "is the final client read valid?" For this question, the answers are reversed: valid for pso and hb, and invalid for seq and tso.  $\Box$ 

To formalize these properties, we introduce a notion of memory ordering, which is derivable from a trace. Recall that tinit is a reserved name.

Definition 5.3. The partial function var is undefined for commit and nonmemory actions and otherwise returns the variable mentioned:  $var(\alpha) \stackrel{\scriptscriptstyle \Delta}{=} x$  if  $\alpha \in \{ \langle wr x \rangle, \langle rd x \rangle \}$ ;  $var(\alpha) \stackrel{\scriptscriptstyle \Delta}{=} w$  if  $\alpha \in \{ \langle rel w \rangle, \langle acq w \rangle, \langle cas w \rangle \}$ ; and  $var(\alpha)$  is undefined otherwise. From a trace  $\sigma = \alpha_1 \cdots \alpha_n$ , we derive several relations.

-  $i <_{rf}^{\sigma}$  j if  $\alpha_i = \langle wr a \rangle$ ,  $\alpha_j = \langle rd a \rangle$  (reads-from relation) -  $i <_{cb}^{\sigma}$  j if  $\alpha_i = \langle wr a \rangle$ ,  $\exists \ell < j$ .  $\alpha_\ell = \langle \operatorname{com} a \rangle$  (committed-before relation) -  $i <_{rely}^{\sigma}$  j if  $\alpha_i = \langle ! a \rangle$ ,  $\alpha_j = \langle ? A \cup \{a\} \rangle$  or  $\alpha_i = \langle . a \rangle$ ,  $\alpha_j = \langle . A \cup \{a\} \rangle$  (rely order) -  $i <_{guar}^{\sigma}$  j if  $\alpha_i = \langle ? a \rangle$ ,  $\alpha_j = \langle ! A \cup \{a\} \rangle$  or  $\alpha_i = \langle . a \rangle$   $\alpha_i = \langle . A \cup \{a\} \rangle$  (rely order)  $-i \langle \sigma_{guar}^{\sigma} j \text{ if } \alpha_i = \langle ? a \rangle, \ \alpha_j = \langle ! A \cup \{a\} \rangle \text{ or } \alpha_i = \langle . a \rangle, \ \alpha_j = \langle . A \cup \{a\} \rangle (guarantee)$  $-i \langle \sigma_{init}^{\sigma} j \text{ if } i \langle j, thrd(\alpha_i) = tinit \neq thrd(\alpha_j) \qquad (init order)$ -  $i <_{\mathsf{thrd}}^{\sigma} j \text{ if } i < j, thrd(\alpha_i) = thrd(\alpha_j) \notin \{\mathsf{tinit}, \mathsf{tcom}\}$ (thread order)  $\begin{array}{l} -i <_{\text{var}}^{\sigma} j \text{ if } i < j, \ var(\alpha_i) = var(\alpha_j) \\ -i <_{\text{sync}}^{\sigma} j \text{ if } i < j, \ \alpha_i \in \{\langle \operatorname{rel} w \rangle, \langle \operatorname{cas} w \rangle\}, \ \alpha_j \in \{\langle \operatorname{acq} w \rangle, \langle \operatorname{cas} w \rangle\} \\ -i <_{\text{wr}}^{\sigma} j \text{ if } i' < j', \ \alpha_{i'} = \langle \operatorname{com} a \rangle, \ \alpha_{j'} = \langle \operatorname{com} b \rangle, \ \alpha_i = \langle \operatorname{wr} x a \rangle, \ \alpha_j = \langle \operatorname{wr} x b \rangle \end{array}$ (variable order)

Here,  $<_{sync}$  is synchronization order and  $<_{wr}$  is (unbuffered) write order. Using these relations, we define four memory orders and two commit orders.

- Define <<sup>σ</sup><sub>seq</sub> to be the transitive closure of (<<sup>σ</sup><sub>thrd</sub> ∪ <<sup>σ</sup><sub>rely</sub> ∪ <<sup>σ</sup><sub>init</sub> ∪ <<sup>σ</sup><sub>var</sub>).
   Define <<sup>σ</sup><sub>hb</sub> to be the transitive closure of (<<sup>σ</sup><sub>thrd</sub> ∪ <<sup>σ</sup><sub>rely</sub> ∪ <<sup>σ</sup><sub>init</sub> ∪ <<sup>σ</sup><sub>sync</sub>).
   Define <<sup>σ</sup><sub>tso</sub> to be the least transitive relation that includes (<<sup>σ</sup><sub>rely</sub> ∪ <<sup>σ</sup><sub>init</sub> ∪ <<sup>σ</sup><sub>sync</sub>) and satisfies the following, where  $\sigma = \alpha_1 \cdots \alpha_n$ .
  - (1) If  $thrd(\alpha_i) \neq thrd(\alpha_j)$  then  $i <_{tso}^{\sigma} j$  whenever  $i <_{rf}^{\sigma} j$  or  $i <_{wr}^{\sigma} j$ . (2) If  $thrd(\alpha_i) = thrd(\alpha_j)$  then  $i <_{tso}^{\sigma} j$  whenever  $i < j, \alpha_i \neq \langle com \rangle, \alpha_j \neq \langle com \rangle$ , and either (a)  $\alpha_i \neq \langle wr \rangle$ , (b)  $\alpha_j \neq \langle rd \rangle$ , or (c)  $\alpha_i = \langle wr a \rangle$ ,  $\alpha_j = \langle rd a \rangle$  and  $i <_{cb}^{\sigma} j$ .
- Define  $<_{pso}^{\sigma}$  similarly to  $<_{tso}^{\sigma}$ , replacing clause (b) with (b') and adding (d): (b')  $\alpha_j \notin \{\langle \mathsf{rd} \rangle, \langle \mathsf{wr} \rangle\}$ , (d)  $\alpha_j = \langle \mathsf{wr} x \rangle$  and  $\alpha_i \in \{\langle \mathsf{rd} \rangle, \langle \mathsf{wr} x \rangle\}$ .
- Define  $i <_{\text{compso}}^{\sigma} j$  whenever i < j and one of the following holds. (1)  $\exists a. \ \alpha_i = \langle \text{wr } a \rangle$  and  $\alpha_j = \langle \text{com } a \rangle$ . (2)  $\exists a, s, t. \ s \neq t, \ \alpha_i = \langle \text{com } s \ a \rangle$  and  $\alpha_j = \langle \text{com } s \ a \rangle$ .  $\langle t \text{ rd } a \rangle$ . (3)  $\exists s. \ \alpha_i = \langle \text{com } s \rangle$  and  $\alpha_j \in \{ \langle s \text{ rel} \rangle, \langle s \text{ cas} \rangle \}$ . (4)  $\exists i' < j' < i. \exists a, b. \ \alpha_{i'} = i$  $\langle \text{wr } a \rangle, \alpha_{j'} = \langle s! \text{ call } b \rangle, \alpha_i = \langle \text{com } a \rangle, \alpha_j = \langle \text{?ret } B \rangle \text{ and } b \in B. (5) \exists x. \alpha_i = \langle \text{com } x \rangle$ and  $\alpha_j = \langle \operatorname{com} x \rangle$ .
- Define  $\langle \sigma_{\text{comtso}} \text{ similarly to } \langle \sigma_{\text{compso}}, \text{ adding } (6) \exists s. \alpha_i = \langle \text{com } s \rangle \text{ and } \alpha_j = \langle \text{com } s \rangle.$

Let  $\mathcal{W}$  range over the memory orders in {seq, hb, tso, pso}.

For each  $\mathscr{W}$ , define  $\triangleleft_{\mathscr{W}}^{\sigma}$  similarly to  $<_{\mathscr{W}}^{\sigma}$ , simply replacing  $<_{\mathsf{rely}}^{\sigma}$  with  $<_{\mathsf{guar}}^{\sigma}$ . 

The memory orders relate actions that affect the visibility of values. The (nontransitive) commit orders,  $<_{compso}^{\sigma}$  and  $<_{compso}^{\sigma}$ , relate commit actions to conflicting actions.

All four memory orders include  $<_{relv}$ , which specifies orderings guaranteed by the environment, and  $<_{init}$ , which specifies initialization. Initial writes are performed by the reserved thread "tinit". For traces of interfaces (which include only I/O actions), the four memory orders coincide.

The definitions of  $<_{seq}$  and  $<_{hb}$  are standard. Relative to hb, clause (1) of the definition of  $<_{tso}$  captures tso's stronger inter-thread dependencies, and clause (2) captures tso's weaker intra-thread dependencies. Two actions of the same thread are ordered unless the first is a write and the second is a read; in this case, they are ordered if the write is committed before the read. With respect to tso, the definition of  $<_{pso}$  removes the ordering between writes of different variables by the same thread.

For each  $\mathcal{W}$ , we define an operational semantics. The order-theoretic properties that require are  $\mathcal{W}$ -consistency (no stale reads) and  $\mathcal{W}$ -closure (no stalled threads).

A trace is *W*-consistent if none of its read actions are matched with stale writes.

Definition 5.4. Trace  $\sigma = \alpha_1 \cdots \alpha_n$  is  $\mathscr{W}$ -consistent if  $<^{\sigma}_{\mathscr{W}}$  is antisymmetric and  $\forall i, j \in [1, n]$ .  $\alpha_j = \langle \operatorname{rd} x \rangle$  and  $i <^{\sigma}_{\operatorname{rf}} j$  imply  $j \not<^{\sigma}_{\mathscr{W}} i$  and  $( \not\exists k. \ \alpha_k = \langle \operatorname{wr} x \rangle \text{ and } i <^{\sigma}_{\mathscr{W}} k <^{\sigma}_{\mathscr{W}} j)$ . A semantic function is  $\mathscr{W}$ -consistent if every trace it produces is  $\mathscr{W}$ -consistent.

A trace set is  $\mathscr{W}$ -closed if, whenever  $\sigma$  is an allowed trace, then any interleaving consistent with  $<_{\mathscr{W}}^{\sigma}$  is also allowed. For example, the following trace is seq-closed, but not tso-, pso- or hb-closed:  $\langle \text{tinit wr } x \rangle \langle \text{s wr } y \rangle \langle \text{t wr } y \rangle$ .

*Definition 5.5.* Trace  $\rho = \gamma_1 \cdots \gamma_n$  is a  $\mathscr{W}$ -permutation of  $\sigma = \alpha_1 \cdots \alpha_n$  via  $\delta$ , if  $\delta$  is an injective total function in  $[1, n] \to [1, n]$  such that  $\forall i, j \in [1, n]$ . we have that (1)  $\alpha_i \neq \langle ? \rangle$  implies  $\gamma_{\delta(i)} = \alpha_i$ , (2)  $\alpha_i = \langle ? A \rangle$  implies  $\gamma_{\delta(i)} = \alpha_i \{ {}^B/A \}$  and  $B \subseteq A$ , (3) *thrd* $(\alpha_i) =$  tinit implies  $\delta(i) = i$ , (4)  $i <_{\text{thrd}}^{\sigma} j$  iff  $\delta(i) <_{\text{thrd}}^{\rho} \delta(j)$ , (5)  $i <_{\mathscr{W}}^{\sigma} j$  iff  $\delta(i) <_{\mathscr{W}}^{\rho} \delta(j)$ , and (6) i < j iff  $\delta(i) < \delta(j)$  whenever  $\exists w. w = var(\alpha_i) = var(\alpha_j)$ . When  $\mathscr{W} =$  tso, we additionally require (7)  $i <_{\text{comtso}}^{\sigma} j$  iff  $\delta(i) <_{\text{comtso}}^{\rho} \delta(j)$ , and similarly for pso.  $\Box$ 

Definition 5.6. Trace set  $\Phi$  is  $\mathscr{W}$ -closed if whenever  $\sigma \in \Phi$  and  $\rho$  is an  $\mathscr{W}$ -permutation of  $\sigma$ , then  $\rho \in \Phi$ . A semantic function is  $\mathscr{W}$ -closed if every set it produces is  $\mathscr{W}$ -closed.

## 6 Components

Components, M, N, are built using abstractions,  $\Lambda$ , and expressions, C, D. A component declares variables (with an initial value), threads (with an initial expression) and functions (with an abstraction). In addition to base components, there are component constructors for composition and restriction.

 $\Lambda ::= (\vec{z}) \{C\}$   $C, D ::= u \mid z \mid x \mid w \mid x=C \mid w=C \mid w. cas(C,D) \mid \text{let } \vec{z}=C;D \mid \cdots$   $M, N ::= M \mid N \mid M \setminus f \mid \text{var } x_1=u_1; \cdots \text{ var } x_\ell=u_\ell; \text{ atomic } w_1=v_1; \cdots \text{ atomic } w_m=v_m;$   $\text{thrd } s_1 C_1; \cdots \text{ thrd } s_n C_n; \text{ fun } f_1 \Lambda_1 \cdots \text{ fun } f_j \Lambda_j$ 

Data variables are introduced by the keyword var; synchronization variables are introduced by the keyword atomic; registers are introduced by abstractions and let; expressions. When unspecified, variables initially hold 0. It is important to note that the formal parameters to a function are registers, not shared variables. We require that each component uniquely declare every function and thread name that occurs within it. Variables that are declared in more than one subcomponent are shared, allowing the possibility of interference.

Definition 6.1. A component is well formed if (1) it contains at most one declaration for each thread and function name, and (2) all declarations of a variable agree on the initial value. Two components are *compatible* if their composition is well formed.  $\Box$ 

Henceforth we consider only well formed components.

For a base component  $M = \text{``var } \vec{x} = \vec{u}$ ; atomic  $\vec{w} = \vec{v}$ ; thrd  $\vec{s}$   $\vec{C}$ ; fun  $\vec{f}$   $\vec{A}$ '', define *funs*  $(M) \triangleq \vec{f}$  and  $thrds(M) \triangleq \vec{s}$ . For aggregate components, define  $funs(M \parallel N) = funs$  $(M) \cup funs(N)$  and  $funs(M \setminus f) = funs(M)$ , and similarly for *thrds*. Note that *funs* returns the set of functions defined by a component, regardless of whether those functions are restricted. For a well formed component  $M \parallel N$ , we have that  $funs(M) \cap funs(N) = \emptyset$ .

Definition 6.2. For each memory order  $<_{\mathcal{W}}$ , the full version of this paper provides a corresponding operational semantics, defined as a partial function  $\mathscr{O}_{\mathcal{W}}$ . If  $thrds(M) \cap S = \emptyset$  then  $\mathscr{O}_{\mathcal{W}}[M](S)$  returns a set traces that is coherent,  $\mathcal{W}$ -consistent and  $\mathcal{W}$ -closed.  $\Box$ 

In  $\mathcal{O}_{\mathcal{W}}[M](S)$ , the threads of *thrds*(*M*) are initially active in the component (and quiescent in the environment) whereas the threads of *S* are initially active in the environment (and quiescent in the component). The operational semantics are unsurprising. We comment only on the role of commit actions. These have a clear operational interpretation under tso and pso; however, both seq- and hb-consistency ignore commit actions. Both  $\mathcal{O}_{seq}$  and  $\mathcal{O}_{hb}$  generate a commit action immediately after each write. This ensures that  $\mathcal{O}_{seq}$  traces are tso-consistent; we do not attempt to interpret  $\mathcal{O}_{hb}$  traces under tso.

To understand the examples, it is important to understand how the operational semantics generates actions from expressions involving memory. (1) Register writes do not create actions; neither do reads. (2) Data variable writes create  $\langle wr \rangle$  actions; reads create  $\langle rd \rangle$  actions.  $\langle com \rangle$  actions are generated immediately after a write in seq and hb; they are generated nondeterministically by tso and pso. (3) Synchronization variable writes create  $\langle rel \rangle$  actions; reads create  $\langle acq \rangle$  actions. Successful cas operations create  $\langle cas \rangle$  actions; unsuccessful cas operations do not create actions.

## 7 Linearizability

Linearizability is defined in terms of I/O permutations.

Definition 7.1. Write  $\alpha \approx \gamma$  if either  $\alpha = \gamma$  or  $\alpha = \langle ! A \rangle$  and  $\gamma = \alpha \{ [B/A] \}$ . Trace  $\sigma = \alpha_1 \cdots \alpha_n$  has *l/O-permutation*  $\rho = \gamma_1 \cdots \gamma_m$  via  $\delta$ , if  $\delta$  is an injective partial function over  $[1, n] \rightarrow [1, m]$  such that

- 
$$\forall i \in [1, n]$$
. if  $\alpha_i$  is I/O then  $\exists k \in [1, m]$ .  $\alpha_i \approx \gamma_k$  and  $\delta(i) = k$ , and  
-  $\forall k \in [1, m]$ . if  $\gamma_k$  is I/O then  $\exists i \in [1, n]$ .  $\alpha_i \approx \gamma_k$  and  $\delta(i) = k$ .

*Definition 7.2 (Linearizability).* Define  $\Phi \vDash_{\mathscr{W}} \Psi$  if every  $\sigma = \alpha_1 \cdots \alpha_n \in \Phi$  has an I/O permutation  $\rho = \gamma_1 \cdots \gamma_m \in \Psi$  via  $\delta$ , such that

-  $\forall i, j \in [1, n]$ . if  $\alpha_i, \alpha_j$  are I/O and  $\delta(i) \triangleleft_{\mathscr{W}}^{\rho} \delta(j)$  then either  $i <_{\mathscr{W}}^{\sigma} j$  or  $i \triangleleft_{\mathscr{W}}^{\sigma} j$ , and -  $\forall i, j \in [1, n]$ . if  $\alpha_i, \alpha_j$  are I/O and  $i <_{\mathscr{W}}^{\sigma} j$  then  $\delta(i) < \delta(j)$ .

The first condition ensures that the orderings required by the specification are preserved in the implementation. The last condition ensures that the ordering on I/O actions in the implementation is reflected in the specification. As the next example illustrates, this is different from the traditional requirement that the ordering on non-overlapping I/O actions be reflected in the specification. Example 7.3. As a simple example, consider the following unsynchronized counter.

At first glance, we might expect this implementation to satisfy a specification which requires that the return values be non-decreasing; that is, we expect traces of form

 $\langle$ s?call inc $\rangle$  $\langle$ s!ret  $u_0$  $\rangle$  $\langle$ t?call inc $\rangle$  $\langle$ t!ret  $u_1$  $\rangle$  $\langle$ r?call inc $\rangle$  $\langle$ r!ret  $u_2$  $\rangle$ ...

where  $u_i \ge u_{i-1}$ . Although this specification contains no ordering on actions, the implementation does not satisfy it, for seq, tso or pso, due to the lack of synchronization. To see this, consider a call by one thread with overlapping and following calls by another.

Our results allow us to consider whether the implementation satisfies the specification if clients are constrained so that each thread may call inc() at most once. In this case, we can answer affirmatively for all four models.

To illuminate the definition of linearizability, consider the following traces. (We elide the commit actions that immediately follow each write.) Inc generates the first trace under all memory models, but the second, only under hb.

$$\begin{array}{l} \langle t? call \ inc \rangle \langle s? call \ inc \rangle \langle s \ rd \ x \ 0 \ init \rangle \langle s \ wr \ x \ 1 \ a \rangle \langle s! \ ret \ 1 \rangle \langle t \ rd \ x \ 1 \ a \rangle \langle t \ wr \ x \ 2 \ b \rangle \langle t! \ ret \ 2 \rangle \\ \langle t? call \ inc \rangle \langle t \ rd \ x \ 1 \ a \rangle \langle t \ wr \ x \ 2 \ b \rangle \langle t! \ ret \ 2 \rangle \\ \langle s? call \ inc \rangle \langle s \ rd \ x \ 1 \ a \rangle \langle s \ wr \ x \ 1 \ a \rangle \langle s! \ ret \ 1 \rangle \\ \end{array}$$

For each  $\mathcal{W} \in \{\text{seq}, \text{tso}, \text{pso}\}\)$ , the first trace is linearizable under  $\vDash_{\mathcal{W}}$ , whereas the second trace is not. The write and subsequent read of the shared variable creates order between threads (condition (2c) and (2d) for tso and pso) and thus we have  $\langle t?\text{call inc} \rangle <_{\mathcal{W}} \langle s!\text{ret } 1 \rangle$  in the second trace. This causes the last clause of Definition 7.2 to fail.

Touching a shared data variable creates no ordering under hb, and therefore both traces are linearizable under  $\vDash_{hb}$ . This would not be the case if we were to adopt the traditional requirement for linearizability: that the order of non-overlapping method calls be respected. This would also not be the case if the last clause of Definition 7.2 required  $\delta(i) <_{\mathscr{W}}^{\rho} \delta(j)$  rather than  $\delta(i) < \delta(j)$ , since  $(<_{\mathscr{W}}^{\rho})$  is the empty relation for every specification trace  $\rho$ .

Example 7.4. Suppose we have an implementation trace of the form

 $\langle s?call inc \rangle \langle s!ret u_0 a \emptyset \rangle \langle t?call inc \{a\} \rangle \langle t!ret u_1 b \rangle \langle r?call inc \{b\} \rangle \langle r!ret u_2 \rangle \cdots$ 

where the client has imposed ordering between each method return and the subsequent call. The definition of linearizability requires that the specification have exactly the same use sets, and thus the same client ordering. In this case, the specification may be more constrained. For example, it might require that  $u_i > u_{i-1}$ .

*Example 7.5.* The following example is drawn from java.lang.String.hashCode. The specification requires that every call to hashCode return the same value. The implementation has a benign write-write data race.

```
var hash;
fun hashCode() { let h=hash;
    if h!=0 then { return h } else { let h=42; hash=h; return h } }
(Hash)
```

Here, we set hash to 42; in a real implementation, the value is derived from immutable fields of the object. hash is always set to the same value, regardless of the number of threads that call hashCode simultaneously. The intended sequential interface specification for Hash is:

 $(\langle s?call hashCode \rangle \langle s!ret hashCode 42 \rangle)^*$ 

Hash satisfies its sequential specification under all memory models.

We consider two implementations of an atomic pair, inspired by an example in [Burckhardt et al. 2012]. The specification requires that the get return the pair of values specified by the preceding set:

 $(\langle s: call set (u, v) a \rangle \langle s: ret \rangle ( \langle t: call get \rangle \langle t: ret (u, v) \{a\} \rangle)^*)^*$ 

Example 7.6. The first implementation is fully synchronized using locks.

var x<sub>1</sub>; var x<sub>2</sub>; atomic lock; fun set( $z_1, z_2$ ) { do skip until lock.cas(0,1); x<sub>1</sub>=z<sub>1</sub>; x<sub>2</sub>=z<sub>2</sub>; lock=0 } (Pair1) fun get() { do skip until lock.cas(0,1); let z<sub>1</sub>=x<sub>1</sub>; let z<sub>2</sub>=x<sub>2</sub>; lock=0; return z<sub>1</sub>, z<sub>2</sub> }

Pair1 is linearizable under all memory models. The cas on the atomic variable provides the required order relation. The linearization point can be chosen to be the successful cas operation in both the methods. The specification also requires an order relationship from the call of set to the return of get as seen in the subsequence  $\langle s?call set (v_1, v_2) a \rangle \cdots \langle t!ret get (v_1, v_2) \{a\} \rangle$ . The order from the write of the atomic variable lock in set to the successful cas on lock in get establishes this relationship in the implementation.  $\Box$ 

*Example 7.7.* The second implementation uses locking for set, but not get. The version variable i is odd if and only if there is a write in progress.

var x<sub>1</sub>; var x<sub>2</sub>; var i; atomic lock; fun set(z<sub>1</sub>,z<sub>2</sub>) { do skip until lock.cas(0,1); i++; x<sub>1</sub>=z<sub>1</sub>; x<sub>2</sub>=z<sub>2</sub>; i++; lock=0 } fun get() { while (1) { let j=i; if even(j) then let  $z_1=x_1$ ; let  $z_2=x_2$ ; if j==i then return  $z_1,z_2$  } }

Pair2 exemplifies a publication idiom characteristic of tso, allowing data races between writes and reads. Pair2 is also not linearizable under pso or hb.

Pair2 is linearizable under tso. A candidate linearization point for set is the first increment of i. The linearization point for get is the successful check of the counter i. Pair1 and Pair2 share the same specification, so the specification requires the same order relationship from the call of set to the return from get. The second condition of the definition of  $<_{tso}$  on the counter i, from the write in set to the read in get, yields the required order. Neither pso nor hb provide this ordering.

*Example 7.8.* The next example is an "active" component, which implements an asynchronous function handler. This can be seen as a simplified thread pool, with a single, one-shot thread. Let v' be the result of performing the operation op on v.

 $\langle$ s?call send v a $\rangle$  $\langle$ s!ret true $\rangle$  ( (  $\langle$ t?call get $\rangle$  $\langle$ t!ret v' {a} $\rangle$  ) |  $\langle$ r?call send u $\rangle$  $\langle$ r?ret false $\rangle$  )\*

The first call to send succeeds, and calls to get return a value derived from its parameter. Subsequent calls to send return false.

Async satisfies its sequential specification for all four memory models.

A candidate linearization point for send is the successful cas or reading start==1, depending on which path is taken. The linearization point for the worker thread wrk and get is the point of exit from the loops, via the variables start and stop, respectively. The specification requires an order relationship as seen in the subsequence  $\langle s?call send v a \rangle \cdots \langle t!ret v' \{a\} \rangle$ . The implementation establishes this by combining two order relations yielded by atomic variables: start links send to wrk and stop links wrk to get.

*Example 7.9.* Async can be generalized to a thread pool which satisfies interface traces such as the following, where let v' be the result of performing some computation on v and j is a job identifier.

 $\langle$ s?call send v a $\rangle$  $\langle$ s!ret j $\rangle$  $\langle$ r?call get j $\rangle$  $\langle$ r!ret v' {a} $\rangle$ 

If the thread pool generates unique job identifiers, then it should be able to guarantee the happens-before relation given in the specification.

We describe an implementation parameterized on a bounded buffer and map. The bounded buffer holds waiting jobs and the map holds waiting results. Due to the complexity of the possible interleavings, we give exemplary traces rather than complete specifications. The implementation is straightforward.

The bounded buffer is an adaption of Buffer given in the introduction. To accomodate the example, the buffer holds pairs of values. If the buffer is FILO, then the sequential interface will include traces such as the following.

 $\begin{array}{l} \langle s? \mathsf{call \ bput} \ (1, 10) \ \mathsf{a} \rangle \langle s! \mathsf{ret} \rangle \ \langle \mathsf{t}? \mathsf{call \ bput} \ (1, 10) \ \mathsf{b} \rangle \langle \mathsf{t}! \mathsf{ret} \rangle \\ & \langle r? \mathsf{call \ bget} \rangle \langle \mathsf{r}! \mathsf{ret} \ (1, 10) \ \{\mathsf{b}\} \rangle \ \langle \mathsf{q}? \mathsf{call \ bget} \rangle \langle \mathsf{q}! \mathsf{ret} \ (1, 10) \ \{\mathsf{a}\} \rangle \end{array}$ 

Note that the same value is put twice, by different threads. The use sets in the get actions indicate the FILO order, even though the values do not.

The map is similar. Here is an example showing a value that is retrieved twice.

 $\begin{aligned} \langle s? \mathsf{call} \ \mathsf{mput} \ (1, 10) \ \mathsf{a} \rangle \langle s! \mathsf{ret} \rangle \ \langle t? \mathsf{call} \ \mathsf{mput} \ (1, 10) \ \mathsf{b} \rangle \langle t! \mathsf{ret} \rangle \\ & \langle r? \mathsf{call} \ \mathsf{mget} \ 1 \rangle \langle r! \mathsf{ret} \ 10 \ \{\mathsf{b}\} \rangle \ \langle \mathsf{q}? \mathsf{call} \ \mathsf{mget} \ 1 \rangle \langle \mathsf{q!ret} \ 10 \ \{\mathsf{b}\} \rangle \end{aligned}$ 

Assuming a bounded buffer and map, the general thread pool has traces such as the following. For clarity, we show the function name on return actions.

 $\begin{array}{l} \langle s?call \; send \; v \; a \rangle \; \langle s!call \; bput \; (v,j) \; b \rangle \langle s?ret \; bput \rangle \; \langle s!ret \; send \; j \rangle \\ \langle wrk!call \; bget \rangle \langle wrk?ret \; bget \; (v,j) \; \{b\} \rangle \langle wrk!call \; mput \; (j,v') \; c \rangle \langle wrk?ret \; mput \rangle \\ \langle r?call \; get \; j \rangle \; \langle r!call \; mget \; j \rangle \langle r?ret \; mget \; v' \; \{c\} \rangle \; \langle r!ret \; get \; v' \; \{a\} \rangle \end{array}$ 

The first line shows a client calling send with argument v. The thread pool creates a new job id j, stores the job in the buffer and returns j. Subsequently, the second line show a

worker thread retrieving the job from the buffer, computing v', and storing the result in the map. Finally, the third line shows a client thread retrieving the result using a call to get j; in response, the thread pool retrieves j from the map and returns the corresponding value. In  $\langle !ret \{a\} \rangle$ , the decoration is a guarantee, similar to the decorations in previous examples: the thread pool guarantees that there will be memory effects between the call and corresponding return.

Consider the projection of this trace of the thread pool on the methods of the bounded buffer. We get:

```
\langle s! call bput (v, j) b \rangle \langle s? ret bput \rangle \langle wrk! call bget \rangle \langle wrk? ret bget (v, j) \{b\} \rangle
```

The sequence of calls to the buffer methods, and the values returned by them, line up with the trace of the buffer presented above. Furthermore, so do the label sets. In  $\langle s! call bput (v, j) b \rangle$  and  $\langle wrk?ret bget (v, j) \{b\} \rangle$ , the label b indicates an assumption made by the thread pool on the bounded buffer. In the matching actions,  $\langle s? call bput (v, j) b \rangle$  and  $\langle wrk!ret bget (v, j) \{b\} \rangle$ ), the label b indicates a guarantee provided by the bounded buffer interface to the thread pool. Here one can recognize the semantic ingredients necessary for a full higher-order multiplicative linear logic of interfaces, perhaps in the style of Interaction Categories [Abramsky et al. 1996]. In this paper, however, we do not pursue this further.  $\Box$ 

## 8 Proving Linearizability

We explore methods to quarantine data race free programs from the subtleties of relaxed memory models. First, we define a component to be *locally sequential consistent* when its SC traces provide a complete description of all its traces—or in the terminology of [Filipovic et al. 2010], when the set of its SC traces is an operational refinement of all of its traces.

Definition 8.1. Define  $\sigma \sim_{\mathscr{W}} \rho$  when (1)  $\sigma = \sigma_0 \gamma_1 \sigma_1 \cdots \gamma_n \sigma_n$  and  $\rho = \rho_0 \gamma_1 \rho_1 \cdots \gamma_n \rho_n$ for some  $\vec{\sigma}, \vec{\rho}, \vec{\gamma}$  such that each  $\vec{\sigma}$  and  $\vec{\rho}$  contains only write and commit actions, and (2) for every read action  $\alpha, \sigma \alpha$  is  $\mathscr{W}$ -consistent if and only if  $\rho \alpha$  is  $\mathscr{W}$ -consistent.

A set of traces  $\Phi$  is *locally sequentially consistent (LSC)* for  $\mathscr{W}$  if

$$\forall \sigma \in \Phi. \ \exists \sigma' \in \Phi. \ \sigma \sim_{\mathscr{W}} \sigma' \text{ and } \sigma' \text{ is seq-consistent.} \qquad \Box$$

Intuitively, a set is LSC if every trace can be matched by a seq-consistent trace in the set, where all non-write/non-commit actions must match exactly and in the same order (condition 1), and the reads available at the end are the same (condition 2).

*Example 8.2.* Inc is not LSC for any of the weak models. Hash is LSC for all four memory models. This demonstrates that LSC does not require the absence of data races.

Pair1 and Async are LSC for all four memory models; however, Pair2 is not LSC under any of the relaxed models. To see this, consider traces in which there is a completed call to set with parameters (1,1) and a subsequent call to get returning (1,1). In every such trace, the write actions must occur before the call to get. Of these traces, choose one in which the loop in get initially fails because  $i \neq j$ . This trace will not be equivalent to any SC trace, since it must see a stale value.

We describe a sufficient condition to establish that a set is LSC.

Definition 8.3. Actions conflict if one is a write to a data variable and the other is a read or write to the same variable. Trace  $\sigma = \alpha_1 \alpha_1 \cdots \alpha_n$  is *locally data race free (LDRF)* if whenever  $\alpha_i$  and  $\alpha_j$  conflict then either  $i <_{hb}^{\sigma} j$  or  $j <_{hb}^{\sigma} i$ . A set of traces is LDRF if every member is LDRF.

*Example 8.4.* All of the examples from Section 7 are LDRF for all four memory models, with the exception of Inc, Hash and Pair2, which are not LDRF for any model.  $\Box$ 

Proposition 8.5. Any trace that is LDRF and  $\mathcal{W}$ -consistent is also seq-consistent.  $\Box$ 

Proposition 8.5 demonstrates that to establish that a component is LSC, it suffices to show that all of its traces are LDRF. This, in turn, can be established by various standard techniques for detecting data races. For tso, there is a weaker condition, "triangular race freedom", that suffices to establish that a component is LSC [Owens 2010].

In order to reason about a program using the SC semantics, we must ensure that the weak semantics is consistent with  $\mathcal{O}_{seq}$ , in the sense that any seq-consistent trace generated by the weak semantics can also be generated by  $\mathcal{O}_{seq}$ . All of the semantic functions we consider have this property.

Definition 8.6. A semantic function  $\mathscr{S}$  is consistent with  $\mathscr{O}_{seq}$  if whenever  $\sigma \in \mathscr{S}[\![M]\!]$ (S) and  $\sigma$  is seq-consistent then  $\sigma \in \mathscr{O}_{seq}[\![M]\!](S)$ .

LSC components can be quarantined. For LSC programs, it is sometimes possible to use traditional SC techniques to reason about linearizability, even in a relaxed setting. The restrictions should be unsurprising to readers familiar with [Filipovic et al. 2010], which states that "OSC observationally refines OSA iff OSC is linearizable with respect to OSA, assuming that client operations may use at *least one shared global variable*." For such programs, our results allow proof techniques developed in the SC setting to apply to relaxed models.

A trace is *I/O-ordered for*  $\mathscr{W}$  if there is a  $<_{\mathscr{W}}$  order between every input and output. Formally,  $\sigma = \alpha_1 \cdots \alpha_n$  is *I/O*-ordered for  $\mathscr{W}$  if whenever  $\alpha_i$  and  $\alpha_j$  are input/output bracketed (Section 4) then  $i <_{\mathscr{W}}^{\sigma} j$ . Let  $erase(\sigma)$  be the trace derived from  $\sigma$  by replacing every name set occurring in return actions by the empty set; this has the effect of removing all of the happens-before relations from an interface.

Proposition 8.7. Let  $\mathscr{S}$  be a semantic function that is  $\mathscr{W}$ -consistent and consistent with  $\mathscr{O}_{seq}$ . Let  $\Psi$  be a sequential interface. Let  $\mathscr{S}\llbracket M\rrbracket(S)$  be I/O-ordered and LSC for  $\mathscr{W}$ . Then  $\mathscr{O}_{seq}\llbracket M\rrbracket(S) \vDash_{seq} erase(\Psi)$  implies  $\mathscr{S}\llbracket M\rrbracket(S) \vDash_{\mathscr{W}} \Psi$ .  $\Box$ 

Here  $\mathscr{O}_{seq}\llbracket M \rrbracket(S) \vDash_{seq} erase(\Psi)$  is similar to traditional linearizability. The use of *erase*  $(\Psi)$  ensures that the proof obligation is indeed the traditional one: ordering requirements are removed. Touchiness of the implementation and sequentiality of the specification are required to ensure that the order can be recovered.

In Corollary 10.4, we show that LSC clients can be isolated from the subtleties of relaxed memory used in the implementations of (even racy) libraries.

### 9 Composition

In order to state properties of linearizability, we must first define semantic versions of restriction and composition. Restriction is straightforward.

*Definition 9.1.* Let 
$$incalls(\alpha_1 \cdots \alpha_n) \triangleq \{f \mid \exists i. \ \alpha = \langle ?call f \rangle \}.$$
  
Then  $\Phi \setminus F \triangleq \{\sigma \in \Phi \mid incalls(\sigma) \cap F = \emptyset \}.$ 

Definition 9.2. An action sequence  $\pi$  is a *collapsed interleaving* of  $\sigma$  and  $\rho$  if there exists a  $\pi'$  such that (1) all actions of tinit occur at the beginning of  $\pi'$ , (2)  $\pi'$  is an interleaving of  $\sigma$  and  $\rho$ , and (3)  $\pi$  is derived from  $\pi'$  by (3a) replacing every subsequence  $\langle s! \text{call } f \ \vec{u} \ a A \rangle \langle s? \text{call } f \ \vec{u} \ a A \rangle$  by  $\langle s. \text{call } f \ \vec{u} \ a A \rangle$ .

*Definition 9.3 (Composition).* Let  $intern(\Phi) = H$  and  $intern(\Psi) = G$ . If  $H \cap G = \emptyset$ , then define  $\Phi \otimes \Psi$  to be the set of traces,  $\pi$ , such that  $extern(\pi) \cap (H \cup G) = \emptyset$ , and  $\pi$  is a collapsed interleaving of some  $\sigma \in \Phi$  and  $\rho \in \Psi$ .

In the full version of this paper, we provide an inductive characterization of composition and discuss its properties.

*Example 9.4.* Here are some single threaded examples to illustrate the definition. We elide the thread identifier.  $\{\langle ?call f \rangle\} \otimes \{\langle ?call f \rangle\}$  and  $\{\langle wr \rangle\} \otimes \{\langle wr \rangle\}$  are undefined because their *intern* overlap; the first pair on f, the second, on the thread identifier.

Composition forces complete synchronization on invocations of functions that are defined in either component, but permits interleaving of invocations of functions that are undefined in both components. Let  $\mathscr{C}$  perform prefix closure.

Consider the following traces, where  $\alpha_{11} - \alpha_{32}$  are arbitrary memory actions. Both the first and second traces include calls to f, which is defined by third trace. The first trace also includes a call to g, which is defined by the second trace.

 $\begin{array}{l} \alpha_{11} \langle ! \, \text{call } g \rangle \langle ? \text{ret} \rangle \alpha_{12} \langle ! \, \text{call } f \rangle \langle ? \text{ret} \rangle \\ \langle ? \, \text{call } g \rangle \alpha_{21} \langle ! \, \text{call } f \rangle \langle ? \text{ret} \rangle \alpha_{22} \langle ! \, \text{ret} \rangle \\ \langle ? \, \text{call } f \rangle \alpha_{31} \langle ! \, \text{ret} \rangle \langle ? \, \text{call } f \rangle \alpha_{32} \langle ! \, \text{ret} \rangle \end{array}$ 

The first two compose to  $\alpha_{11}\langle .call g \rangle \alpha_{21} \langle !call f \rangle \langle ?ret \rangle \alpha_{22} \langle .ret \rangle \alpha_{12} \langle !call f \rangle \langle ?ret \rangle$ . Composing the second and third gives  $\langle ?call f \rangle \alpha_{31} \langle !ret \rangle \langle ?call g \rangle \alpha_{21} \langle .call f \rangle \alpha_{32} \langle .ret \rangle \alpha_{22} \langle !ret \rangle$  and  $\langle ?call g \rangle \alpha_{21} \langle .call f \rangle \alpha_{31} \langle .ret \rangle \alpha_{22} \langle !ret \rangle \langle ?call f \rangle \alpha_{32} \langle !ret \rangle$ . Composing the first and third gives  $\alpha_{11} \langle !call g \rangle \langle ?call f \rangle \alpha_{31} \langle !ret \rangle \langle ?ret \rangle \alpha_{12} \langle .call f \rangle \alpha_{32} \langle .ret \rangle \langle .ret \rangle$ . Composing the first and third gives  $\alpha_{11} \langle !call g \rangle \langle ?call f \rangle \alpha_{31} \langle !ret \rangle \langle ?ret \rangle \alpha_{12} \langle .call f \rangle \alpha_{32} \langle .ret \rangle$ .

$$\alpha_{11}\langle . \operatorname{call} g \rangle \alpha_{21} \langle . \operatorname{call} f \rangle \alpha_{31} \langle . \operatorname{ret} \rangle \alpha_{22} \langle . \operatorname{ret} \rangle \alpha_{12} \langle . \operatorname{call} f \rangle \alpha_{32} \langle . \operatorname{ret} \rangle.$$

*Example 9.5.* For any single trace, the order of cross-thread actions is fixed. Thus, composing  $\langle s?call f \rangle \langle t wr \rangle$  and  $\langle s!call f \rangle$  produces only  $\langle s.call f \rangle \langle t wr \rangle$ .  $\Box$ 

## **10** Properties of linearizability

We present the results using the most general client. More general results can be found in the appendix.

*Definition 10.1 (Interference freedom).* Two components are *interference free* if they are compatible (Definition 6.1) and declare disjoint variables.

Definition 10.2 (Compositionality). A semantic function  $\mathscr{S}$  is compositional if (1)  $\mathscr{S}\llbracket M \setminus F \rrbracket(S) = \mathscr{S}\llbracket M \rrbracket(S) \setminus F$  and, (2)  $\mathscr{S}\llbracket M \parallel N \rrbracket(S) \subseteq \mathscr{S}\llbracket M \rrbracket(S) \otimes \mathscr{S}\llbracket N \rrbracket(S)$ , whenever *M* and *N* are interference free.  $\Box$ 

Proposition 10.3 (Abstraction). Let  $\mathscr{S}$  be coherent, compositional and  $\mathscr{W}$ -closed. Let  $M_L$  and  $M_C$  be interference free. If  $\mathscr{S}[\![M_L]\!](S) \vDash_{\mathscr{W}} \Psi_L$  and  $\mathscr{S}[\![M_C]\!](S) \otimes \Psi_L \vDash_{\mathscr{W}} \Psi_C$  then  $\mathscr{S}[\![M_C]\!](S) \vDash_{\mathscr{W}} \Psi_C$ .

Consider the Lock discussed in the introduction. If we are given that (1) the lock implements its specification (that is,  $\mathscr{S}[[Lock]](S) \vDash_{\mathscr{W}} \mathcal{H}_{ock}$ ) and (2) the one place buffers implements its specification when it uses the lock specification (that is,  $\mathscr{S}[[Buffer]](S) \otimes \mathcal{H}_{lock} \vDash_{\mathscr{W}} \mathcal{H}_{buf}$ ), then the theorem allows us to deduce that the implementation of the buffer realizes its specification ( $\mathscr{S}[[Buffer]] Lock]](S) \vDash_{\mathscr{W}} \mathcal{H}_{buf}$ ).

Corollary 10.4 (Quarantining weakness). Let  $\mathscr{S}$  be coherent, compositional and  $\mathscr{W}$ closed. Let  $M_L$  and  $M_C$  be interference free. Let  $\Psi_L$  and  $\Psi_C$  be sequential interfaces. Suppose  $\Psi_L = erase(\Psi_L)$ ,  $\mathscr{S}[\![M_C]\!](S)$  is LSC and either (1)  $erase(\Psi_C) = \Psi_C$  or (2)  $\mathscr{S}$  $[\![M_C]\!](S)$  is I/O-ordered. If  $\mathscr{S}[\![M_L]\!](S) \vDash_{\mathscr{W}} \Psi_L$  and  $\mathscr{S}[\![M_C]\!](S) \otimes \Psi_L \vDash_{seq} \Psi_C$  then  $\mathscr{S}$  $[\![M_C]\!](S) \vDash_{\mathscr{W}} \Psi_C$ .

Corollary 10.4 demonstrates that well-synchronized clients (that do not depend on the library for synchronization), are not affected by data races in the library. Consider the unsynchronized counter lnc from Examples 7.3-7.4. A fully-synchronized client can safely use the library without regard to its data races; for example, a fully-synchronized counter can be built using the unsynchronized one.

## 11 Conclusion

This paper investigates reasoning about concurrent data structures, with a special focus on isolating the complexity wrought by relaxed memory models. We have presented an adaptation of linearizability that accounts for relaxed memory and provided ways to reason compositionally. Our treatment is parametric with respect to the memory model, with the required properties of the memory model confined to a couple of key properties. We have been able to address SC, TSO, PSO and (a variant of) the JMM in this style.

## References

S. Abramsky, S. J. Gay, and R. Nagarajan. Interaction categories and the foundations of typed concurrent programming. In NATO ASI DPD, pages 35–113, 1996.

- 20 Jagadeesan, Petri, Pitcher and Riely
- S. V. Adve and H.-J. Boehm. Memory models: a case for rethinking parallel languages and hardware. *Commun. ACM*, 53:90–101, 2010.
- S. V. Adve and K. Gharachorloo. Shared memory consistency models: A tutorial. *Computer*, 29(12):66–76, 1996.
- R. Alur and P. Madhusudan. Adding nesting structure to words. J. ACM, 56(3), 2009.
- M. Batty, S. Owens, S. Sarkar, P. Sewell, and T. Weber. Mathematizing C++ concurrency. In *POPL*, pages 55–66. ACM, 2011.
- M. Batty, M. Dodds, and A. Gotsman. Library abstraction for C/C++ concurrency. In *POPL*, 2013. To appear.
- H.-J. Boehm and S. V. Adve. Foundations of the C++ concurrency memory model. In *PLDI*, pages 68–78. ACM, 2008.
- S. Burckhardt, A. Gotsman, M. Musuvathi, and H. Yang. Concurrent library correctness on the TSO memory model. In *ESOP*, pages 87–107, 2012.
- D. Demange, V. Laporte, L. Zhao, S. Jagannathan, D. Pichardie, and J. Vitek. Plan B: A buffered memory model for Java. In *POPL*, 2013. To appear.
- R. Ferreira, X. Feng, and Z. Shao. Parameterized memory models and concurrent separation logic. In ESOP, volume 6012 of LNCS, pages 267–286, 2010.
- I. Filipovic, P. O'Hearn, N. Rinetzky, and H. Yang. Abstraction for concurrent objects. *Theoretical Comp. Sci.*, 411:4379–4398, 2010.
- A. Gotsman and H. Yang. Linearizability with ownership transfer. In CONCUR, volume 7454 of LNCS, pages 256–271, 2012.
- A. Gotsman, M. Musuvathi, and H. Yang. Show no weakness: sequentially consistent specifications of TSO libraries. In *DISC'12*, 2012. To appear.
- M. Herlihy and J. M. Wing. Linearizability: A correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst., 12(3):463–492, 1990.
- R. Jagadeesan, C. Pitcher, and J. Riely. Generative operational semantics for relaxed memory models. In ESOP, pages 307–326, 2010.
- A. Jeffrey and J. Rathke. A fully abstract may testing semantics for concurrent objects. *Theoretical Comp. Sci.*, 338:17–63, 2005.
- L. Lamport. How to make a multiprocessor computer that correctly executes multiprocess program. *IEEE Trans. Comput.*, 28(9):690–691, 1979.
- J. Manson, W. Pugh, and S. V. Adve. The Java memory model. In *POPL*, pages 378– 391, 2005.
- S. Owens. Reasoning about the implementation of concurrency abstractions on x86-TSO. In *ECOOP*, volume 6183 of *LNCS*, pages 478–503, 2010.
- S. Sarkar, P. Sewell, J. Alglave, L. Maranget, and D. Williams. Understanding power multiprocessors. In *PLDI*, pages 175–186. ACM, 2011.
- J. Sevcík. Program Transformations in Weak Memory Models. PhD thesis, Laboratory for Foundations of Computer Science, University of Edinburgh, 2008.
- P. Sewell, S. Sarkar, S. Owens, F. Z. Nardelli, and M. O. Myreen. x86-TSO: a rigorous and usable programmer's model for x86 multiprocessors. *Commun. ACM*, 53(7): 89–97, 2010.
- SPARC, Inc. The SPARC Architecture Manual (version 9). Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1994.
- Sun Microsystems. http://docs.oracle.com/javase/1.5.0/docs/api/java/util/ concurrent/atomic/package-summary.html, 2004.