Acquire and Release
Next, we’re going to try to implement one of the simplest concurrent utilities possible — a mutex, but without support for waiting (since that’s not really related to what we’re doing now). It will hold both an atomic flag that indicates whether it is locked, and the protected data itself. In code this translates to:
use std::cell::UnsafeCell;
use std::sync::atomic::{self, AtomicBool};

pub struct Mutex<T> {
    locked: AtomicBool,
    data: UnsafeCell<T>,
}

impl<T> Mutex<T> {
    pub const fn new(data: T) -> Self {
        Self {
            locked: AtomicBool::new(false),
            data: UnsafeCell::new(data),
        }
    }
}
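One detail worth noting: UnsafeCell is not Sync, so as defined this type can’t be shared across threads at all. To actually use it from multiple threads, we’d also have to assert its thread-safety ourselves; a minimal sketch of that impl (not part of the definition above) might be:

// SAFETY (sketch): the `locked` flag ensures that at most one thread can
// access `data` at a time, so sharing the mutex between threads is sound
// whenever T itself can be sent between threads.
unsafe impl<T: Send> Sync for Mutex<T> {}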
Now for the lock function. We need to use an RMW here, since we need to both check whether it is locked and lock it if it isn’t in a single atomic step; this can be most simply done with a compare_exchange (unlike before, it doesn’t need to be in a loop this time). For the ordering, we’ll just use Relaxed, since we don’t know of any others yet.
impl<T> Mutex<T> {
    pub fn lock(&self) -> Option<Guard<'_, T>> {
        match self.locked.compare_exchange(
            false,
            true,
            atomic::Ordering::Relaxed,
            atomic::Ordering::Relaxed,
        ) {
            Ok(_) => Some(Guard(self)),
            Err(_) => None,
        }
    }
}

pub struct Guard<'mutex, T>(&'mutex Mutex<T>);
// Deref impl omitted…
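A quick design note: we use the strong compare_exchange rather than compare_exchange_weak because we aren’t retrying in a loop; compare_exchange_weak is allowed to fail spuriously, which here would make lock() return None even when the mutex was actually free.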
We also need to implement Drop for Guard to make sure the lock on the mutex is released once the guard is destroyed. Again, we’re just using the Relaxed ordering.
impl<T> Drop for Guard<'_, T> {
    fn drop(&mut self) {
        self.0.locked.store(false, atomic::Ordering::Relaxed);
    }
}
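Note that unlocking can be a plain store rather than an RMW: only the thread currently holding the lock can release it, so there is no competing write for the store to race with.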
Great! In normal operation, then, this primitive should allow unique access to the mutex’s data to be transferred across different threads. Typical usage could look like this:
// Initial state
let mutex = Mutex::new(0);

// Thread 1
if let Some(mut guard) = mutex.lock() {
    *guard += 1;
}

// Thread 2
if let Some(guard) = mutex.lock() {
    println!("{}", *guard);
}
Now, there are many possible executions of this code. For example, Thread 2 (the reader thread) could lock the mutex first, and Thread 1 (the writer thread) could fail to lock it:
Thread 1 locked data Thread 2
╭───────╮ ┌────────┐ ┌───┐ ╭───────╮
│ cas ├─┐ │ false │ │ 0 ├╌┐ ┌─┤ cas │
╰───────╯ │ └────────┘ └───┘ ┊ │ ╰───╥───╯
│ ┌────────┬───────┼─┘ ╭───⇓───╮
└─┤ true │ └╌╌╌┤ guard │
└────────┘ ╰───╥───╯
┌────────┬─────────┐ ╭───⇓───╮
│ false │ └─┤ store │
└────────┘ ╰───────╯
Or potentially Thread 1 could lock the mutex first, and Thread 2 could fail to lock it:
Thread 1 locked data Thread 2
╭───────╮ ┌────────┐ ┌───┐ ╭───────╮
│ cas ├─┐ │ false │ ┌─│ 0 │───┤ cas │
╰───╥───╯ │ └────────┘ │┌┼╌╌╌┤ ╰───────╯
╭───⇓───╮ └─┬────────┐ │├┼╌╌╌┤
│ += 1; ├╌┐ │ true ├─┘┊│ 1 │
╰───╥───╯ ┊ └────────┘ ┊└───┘
╭───⇓───╮ └╌╌╌╌╌╌╌╌╌╌╌╌╌┘
│ store ├───┬────────┐
╰───────╯ │ false │
└────────┘
But the interesting case comes in when Thread 1 successfully locks and unlocks the mutex, and then Thread 2 locks it. Let’s draw that one out too:
Thread 1 locked data Thread 2
╭───────╮ ┌────────┐ ┌───┐ ╭───────╮
│ cas ├─┐ │ false │ │ 0 │ ┌───┤ cas │
╰───╥───╯ │ └────────┘ ┌┼╌╌╌┤ │ ╰───╥───╯
╭───⇓───╮ └─┬────────┐ ├┼╌╌╌┤ │ ╭───⇓───╮
│ += 1; ├╌┐ │ true │ ┊│ 1 │ │ ?╌┤ guard │
╰───╥───╯ ┊ └────────┘ ┊└───┘ │ ╰───╥───╯
╭───⇓───╮ └╌╌╌╌╌╌╌╌╌╌╌╌╌┘ │ ╭───⇓───╮
│ store ├───┬────────┐ │ ┌─┤ store │
╰───────╯ │ false │ │ │ ╰───────╯
└────────┘ │ │
┌────────┬─────────┘ │
│ true │ │
└────────┘ │
┌────────┬───────────┘
│ false │
└────────┘
Look at the second operation Thread 2 performs (the read of data), for which we haven’t yet joined the line. Where should it connect to? Well actually, it has multiple options…wait, we’ve seen this before! It’s a data race!

That’s not good. Last time the solution was to use atomics instead — but in this case that doesn’t seem to be enough, since even if atomics were used, the read would still have the option of seeing 0 instead of 1, and really, if we want our mutex to be sane, it should only be able to read 1.
So it seems that what we want is to be able to apply the coherence rules from before to completely rule out zero from the set of possible values — if we were able to draw a large arrow from Thread 1’s += 1; to Thread 2’s guard, then we could trivially use the rule to rule out 0 as a value that could be read.
This is where the Acquire and Release orderings come in. Informally put, a release store will cause an arrow instead of a line to be drawn from the operation to the destination; similarly, an acquire load will cause an arrow to be drawn from the destination to the operation. To give a useless example that illustrates this, for the given program:
use std::sync::atomic::{self, AtomicU32};

// Initial state
let a = AtomicU32::new(0);

// Thread 1
a.store(1, atomic::Ordering::Release);

// Thread 2
a.load(atomic::Ordering::Acquire);
The two possible executions look like this:
Possible Execution 1 ┃ Possible Execution 2
┃
Thread 1 a Thread 2 ┃ Thread 1 a Thread 2
╭───────╮ ┌───┐ ╭──────╮ ┃ ╭───────╮ ┌───┐ ╭──────╮
│ store ├─┐ │ 0 │ ┌─→ load │ ┃ │ store ├─┐ │ 0 ├───→ load │
╰───────╯ │ └───┘ │ ╰──────╯ ┃ ╰───────╯ │ └───┘ ╰──────╯
└─↘───┐ │ ┃ └─↘───┐
│ 1 ├─┘ ┃ │ 1 │
└───┘ ┃ └───┘
These arrows are a new kind of arrow we haven’t seen yet; they are known as happens-before (or happens-after) relations and are represented as thin arrows (→) on these diagrams. They are weaker than the sequenced-before double-arrows (⇒) that occur inside a single thread, but can still be used with the coherence rules to determine which values of a memory location are valid to read.
When the value stored to an atomic by a release operation is then loaded by an acquire operation, we say that the release operation synchronizes-with the acquire operation, which establishes that the release operation happens-before the acquire operation. Therefore, we can say that in the first possible execution, Thread 1’s store synchronizes-with Thread 2’s load, which causes that store and everything sequenced-before it to happen-before the load and everything sequenced-after it.
More formally, we can say that A happens-before B if any of the following conditions are true:
- A is sequenced-before B (i.e. A occurs before B on the same thread)
- A synchronizes-with B (i.e. A is a Release operation and B is an Acquire operation that reads the value written by A)
- A happens-before X, and X happens-before B (transitivity)
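To see how these rules combine, here is a small message-passing sketch (the DATA and FLAG names and the spin loop are just for illustration):

use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::thread;

static DATA: AtomicU32 = AtomicU32::new(0);
static FLAG: AtomicBool = AtomicBool::new(false);

fn main() {
    let t = thread::spawn(|| {
        // Sequenced-before the release store below.
        DATA.store(42, Ordering::Relaxed);
        FLAG.store(true, Ordering::Release);
    });

    // Spin until the acquire load reads the value written by the release
    // store; at that point the two operations synchronize.
    while !FLAG.load(Ordering::Acquire) {}

    // By transitivity: the DATA store is sequenced-before the FLAG release
    // store, which happens-before the FLAG acquire load, which is
    // sequenced-before this load, so it must read 42.
    assert_eq!(DATA.load(Ordering::Relaxed), 42);

    t.join().unwrap();
}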
There is one more rule required for these to be useful, and that is release sequences: after a release store is performed on an atomic, happens-before arrows will connect together each subsequent value of the atomic, as long as each new value is produced by an RMW and not a plain store (meaning any subsequent plain store, no matter its ordering, will end the sequence).
In the C++11 memory model, any subsequent store by the same thread that performed the original Release store would also contribute to the release sequence. However, this was removed in C++20 for simplicity and better optimizations, and so must not be relied upon.
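As a sketch of what this allows (the thread labels and values are just for illustration):

use std::sync::atomic::{AtomicU32, Ordering};

static A: AtomicU32 = AtomicU32::new(0);

// Thread 1
A.store(1, Ordering::Release); // heads a release sequence

// Thread 2
A.fetch_add(1, Ordering::Relaxed); // an RMW, so the sequence continues

// Thread 3
if A.load(Ordering::Acquire) == 2 {
    // Even though the 2 was written by Thread 2's relaxed RMW, this acquire
    // load still synchronizes-with Thread 1's release store, because the RMW
    // is part of that store's release sequence.
}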
With those rules in mind, converting Thread 1’s second store to use a Release ordering, as well as converting Thread 2’s CAS to use an Acquire ordering, allows us to effectively draw the arrow we needed before:
Thread 1 locked data Thread 2
╭───────╮ ┌───────┐ ┌───┐ ╭───────╮
│ cas ├─┐ │ false │ │ 0 │ ┌───→ cas │
╰───╥───╯ │ └───────┘ ┌┼╌╌╌┤ │ ╰───╥───╯
╭───⇓───╮ └─┬───────┐ ├┼╌╌╌┤ │ ╭───⇓───╮
│ += 1; ├╌┐ │ true │ ┊│ 1 ├╌│╌╌╌┤ guard │
╰───╥───╯ ┊ └───────┘ ┊└───┘ │ ╰───╥───╯
╭───⇓───╮ └╌╌╌╌╌╌╌╌╌╌╌╌┘ │ ╭───⇓───╮
│ store ├───↘───────┐ │ ┌─┤ store │
╰───────╯ │ false │ │ │ ╰───────╯
└───┬───┘ │ │
┌───↓───┬─────────┘ │
│ true │ │
└───────┘ │
┌───────┬───────────┘
│ false │
└───────┘
We can now trace back along the reverse direction of the arrows from the guard bubble to the += 1 bubble; we have established that Thread 2’s load happens-after the += 1 side effect, because Thread 1’s store synchronizes-with Thread 2’s CAS. This both avoids the data race and guarantees that Thread 2 will always read 1 (as long as it locks after Thread 1, of course).
However, that is not the only possible execution of this program. Even with this setup, there is another execution that can also cause UB: if Thread 2 locks the mutex before Thread 1 does.
Thread 1 locked data Thread 2
╭───────╮ ┌───────┐ ┌───┐ ╭───────╮
│ cas ├───┐ │ false │┌──│ 0 │────→ cas │
╰───╥───╯ │ └───────┘│ ┌┼╌╌╌┤ ╰───╥───╯
╭───⇓───╮ │ ┌───────┬┘ ├┼╌╌╌┤ ╭───⇓───╮
│ += 1; ├╌┐ │ │ true │ ┊│ 1 │ ?╌┤ guard │
╰───╥───╯ ┊ │ └───────┘ ┊└───┘ ╰───╥───╯
╭───⇓───╮ └╌│╌╌╌╌╌╌╌╌╌╌╌╌┘ ╭───⇓───╮
│ store ├─┐ │ ┌───────┬────────────┤ store │
╰───────╯ │ │ │ false │ ╰───────╯
│ │ └───────┘
│ └─┬───────┐
│ │ true │
│ └───────┘
└───↘───────┐
│ false │
└───────┘
Once again, guard has multiple options for values to read. This one’s a bit more counterintuitive than the previous one, since it requires “travelling forward in time” to understand why the 1 is even there in the first place — but since the abstract machine has no concept of time, it’s just as valid a source of UB as any other.
Luckily, we’ve already solved this problem once, so it’s easy to solve again: just like before, we’ll have the CAS become acquire and the store become release, and then we can use the second coherence rule from before to follow the arrow forward from the guard bubble all the way to the += 1;, determining that it is only possible for that read to see 0 as its value, as in the execution below.
Thread 1 locked data Thread 2
╭───────╮ ┌───────┐ ┌───┐ ╭───────╮
│ cas ←───┐ │ false │┌──│ 0 ├╌┐──→ cas │
╰───╥───╯ │ └───────┘│ ┌┼╌╌╌┤ ┊ ╰───╥───╯
╭───⇓───╮ │ ┌───────┬┘ ├┼╌╌╌┤ ┊ ╭───⇓───╮
│ += 1; ├╌┐ │ │ true │ ┊│ 1 │ └─╌┤ guard │
╰───╥───╯ ┊ │ └───────┘ ┊└───┘ ╰───╥───╯
╭───⇓───╮ └╌│╌╌╌╌╌╌╌╌╌╌╌╌┘ ╭───⇓───╮
│ store ├─┐ │ ┌───────↙────────────┤ store │
╰───────╯ │ │ │ false │ ╰───────╯
│ │ └───┬───┘
│ └─┬───↓───┐
│ │ true │
│ └───────┘
└───↘───────┐
│ false │
└───────┘
This leads us to the proper memory orderings for any mutex (and other locks, like RW locks, too): use Acquire to lock it, and Release to unlock it. So let’s go back and update our original mutex definition with this knowledge.
But wait, compare_exchange takes two ordering parameters, not just one! That’s right — it also takes a second one to apply when the exchange fails (in our case, when the mutex is already locked). But we don’t need an Acquire there, since in that case we won’t be reading from the data value anyway, so we’ll just stick with Relaxed.
impl<T> Mutex<T> {
    pub fn lock(&self) -> Option<Guard<'_, T>> {
        match self.locked.compare_exchange(
            false,
            true,
            atomic::Ordering::Acquire,
            atomic::Ordering::Relaxed,
        ) {
            Ok(_) => Some(Guard(self)),
            Err(_) => None,
        }
    }
}

impl<T> Drop for Guard<'_, T> {
    fn drop(&mut self) {
        self.0.locked.store(false, atomic::Ordering::Release);
    }
}
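Assuming the omitted Deref and DerefMut impls for Guard are filled in (along with the Sync impl sketched earlier), the finished mutex could be exercised like this sketch:

use std::thread;

fn demo(mutex: &Mutex<u32>) {
    thread::scope(|s| {
        for _ in 0..2 {
            s.spawn(|| {
                // Spin until the CAS succeeds; a real mutex would park the
                // thread instead of busy-waiting.
                loop {
                    if let Some(mut guard) = mutex.lock() {
                        *guard += 1;
                        break;
                    }
                }
            });
        }
    });
}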
Note that, similarly to how atomic operations only make sense when paired with other atomic operations on the same locations, Acquire only makes sense when paired with Release, and vice versa. That is, both an Acquire with no corresponding Release and a Release with no corresponding Acquire are useless, since the arrows will be unable to connect to anything.
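For instance, in this deliberately broken sketch, the Acquire has nothing to pair with (the names are again just for illustration):

use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};

static DATA: AtomicU32 = AtomicU32::new(0);
static FLAG: AtomicBool = AtomicBool::new(false);

// Thread 1
DATA.store(42, Ordering::Relaxed);
FLAG.store(true, Ordering::Relaxed); // not a release: no arrow starts here

// Thread 2
if FLAG.load(Ordering::Acquire) {
    // No synchronizes-with was established, so no happens-before relation
    // exists between the threads: this load may still read 0.
    let _ = DATA.load(Ordering::Relaxed);
}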