Fences
As well as loads, stores, and RMWs, there is one more kind of atomic operation
to be aware of: fences. Fences can be triggered by the
core::sync::atomic::fence
function, which accepts a single ordering
parameter and returns nothing. They don’t do anything on their own, but can be
thought of as events that strengthen the ordering of nearby atomic operations.
Acquire fences
The most common kind of fence is an acquire fence, which can be triggered in three different ways:
atomic::fence(atomic::Ordering::Acquire)
atomic::fence(atomic::Ordering::AcqRel)
atomic::fence(atomic::Ordering::SeqCst)
An acquire fence retroactively makes every single non-Acquire
operation that
was sequenced-before it act like an Acquire
operation that occurred at the
fence — in other words, it causes every prior Release
d value that was
previously loaded on the thread to synchronize-with the fence. For example, the
following code:
#![allow(unused)] fn main() { use std::sync::atomic::{self, AtomicU32}; static X: AtomicU32 = AtomicU32::new(0); // t_1 X.store(1, atomic::Ordering::Release); // t_2 let value = X.load(atomic::Ordering::Relaxed); atomic::fence(atomic::Ordering::Acquire); }
Can result in two possible executions:
Possible Execution 1 ┃ Possible Execution 2
┃
t_1 X t_2 ┃ t_1 X t_2
╭───────╮ ┌───┐ ╭───────╮ ┃ ╭───────╮ ┌───┐ ╭───────╮
│ store ├─┐ │ 0 │ ┌─┤ load │ ┃ │ store ├─┐ │ 0 ├───┤ load │
╰───────╯ │ └───┘ │ ╰───╥───╯ ┃ ╰───────╯ │ └───┘ ╰───╥───╯
└─↘───┐ │ ╭───⇓───╮ ┃ └─↘───┐ ╭───⇓───╮
│ 1 ├─┘┌→ fence │ ┃ │ 1 │ │ fence │
└───┴──┘╰───────╯ ┃ └───┘ ╰───────╯
In the first execution, t_1
’s store synchronizes-with and therefore
happens-before t_2
’s fence due to the prior load, but note that it does not
happen-before t_2
’s load.
Acquire fences work on any number of atomics, and on release sequences too. A more complex example is as follows:
#![allow(unused)] fn main() { use std::sync::atomic::{self, AtomicU32}; static X: AtomicU32 = AtomicU32::new(0); static Y: AtomicU32 = AtomicU32::new(0); // t_1 X.store(1, atomic::Ordering::Release); X.fetch_add(1, atomic::Ordering::Relaxed); // t_2 Y.store(1, atomic::Ordering::Release); // t_3 let x = X.load(atomic::Ordering::Relaxed); let y = Y.load(atomic::Ordering::Relaxed); atomic::fence(atomic::Ordering::Acquire); }
This can result in an execution like so:
t_1 X t_3 Y t_2
╭───────╮ ┌───┐ ╭───────╮ ┌───┐ ╭───────╮
│ store ├─┐ │ 0 │ ┌─┤ load │ │ 0 │ ┌─┤ store │
╰───╥───╯ │ └───┘ │ ╰───╥───╯ └───┘ │ ╰───────╯
╭───⇓───╮ └─↘───┐ │ ╭───⇓───╮ ┌───↙─┘
│ rmw ├─┐ │ 1 │ │ │ load ├───┤ 1 │
╰───────╯ │ └─┬─┘ │ ╰───╥───╯ ┌─┴───┘
└─┬─↓─┐ │ ╭───⇓───╮ │
│ 2 ├─┘┌→ fence ←─┘
└───┴──┘╰───────╯
There are two common scenarios in which acquire fences are used:
- When an
Acquire
ordering is only necessary when a specific value is loaded. For example, you may only wish to acquire when aninitialized
boolean istrue
, since otherwise you won’t be reading the shared state at all. In this case, you can load with aRelaxed
ordering and then issue anAcquire
fence afterward only if that condition is met, which can aid in performance sometimes (since the acquire operation is avoided wheninitialized == false
). - When several
Acquire
operations on different locations need to be performed in a row, but individually each operation doesn’t needAcquire
ordering; it is often faster to perform all the loads asRelaxed
first and use a singleAcquire
fence at the end then it is to make each one separately useAcquire
.
Release fences
Release fences are the natural complement to acquire fences, and they similarly can be triggered in three different ways:
atomic::fence(atomic::Ordering::Release)
atomic::fence(atomic::Ordering::AcqRel)
atomic::fence(atomic::Ordering::SeqCst)
Release fences convert every subsequent atomic access in the same thread into a
release operation that has its arrow starting from the fence — in other words,
every Acquire
operation that sees a value that was written by the fence’s
thread after the release fence will synchronize-with the release fence. For
example, the following code:
#![allow(unused)] fn main() { use std::sync::atomic::{self, AtomicU32}; static X: AtomicU32 = AtomicU32::new(0); // t_1 atomic::fence(atomic::Ordering::Release); X.store(1, atomic::Ordering::Relaxed); // t_2 X.load(atomic::Ordering::Acquire); }
Can result in this execution:
t_1 X t_2
╭───────╮ ┌───┐ ╭───────╮
│ fence ├─┐ │ 0 │ ┌─→ load │
╰───╥───╯ │ └───┘ │ ╰───────╯
╭───⇓───╮ └─↘───┐ │
│ store ├───┤ 1 ├─┘
╰───────╯ └───┘
As well as it being possible for a release fence to synchronize-with an acquire
load (fence–atomic synchronization) and a release store to synchronize-with an
acquire fence (atomic–fence synchronization), it is also possible for release
fences to synchronize with acquire fences (fence–fence synchronization). In this
code snippet, only fences and Relaxed
operations are used to establish a
happens-before relation (in some executions):
#![allow(unused)] fn main() { use std::sync::atomic::{self, AtomicU32}; static X: AtomicU32 = AtomicU32::new(0); // t_1 atomic::fence(atomic::Ordering::Release); X.store(1, atomic::Ordering::Relaxed); // t_2 X.load(atomic::Ordering::Relaxed); atomic::fence(atomic::Ordering::Acquire); }
The execution with the relation looks like this:
t_1 X t_2
╭───────╮ ┌───┐ ╭───────╮
│ fence ├─┐ │ 0 │ ┌─┤ load │
╰───╥───╯ │ └───┘ │ ╰───╥───╯
╭───⇓───╮ └─↘───┐ │ ╭───⇓───╮
│ store ├───┤ 1 ├─┘┌→ fence │
╰───────╯ └───┴──┘╰───────╯
Like with acquire fences, release fences can be used to optimize over a series
of atomic stores that don’t individually need to be Release
, since in some
conditions and on some architectures it’s faster to put a single release fence
at the start and use Relaxed
from that point on than it is to use Release
every time.
AcqRel
fences
AcqRel
fences are just the combined behaviour of an Acquire
fence and a
Release
fence in one operation. There isn’t much special to note about them,
other than that they behave more like an acquire fence followed by a release
fence than the other way around, which is useful to know in situations like the
following:
t_1 X t_2 Y t_3
╭───────╮ ┌───┐ ╭───────╮ ┌───┐ ╭───────╮
│ A │ │ 0 │ ┌─┤ load │ │ 0 │ ┌─→ load │
╰───╥───╯ └───┘ │ ╰───╥───╯ └───┘ │ ╰───╥───╯
╭───⇓───╮ ┌─↘───┐ │ ╭───⇓───╮┌──↘───┐ │ ╭───⇓───╮
│ store ├─┘ │ 1 ├─┘┌→ fence ├┘┌─┤ 1 ├─┘ │ B │
╰───────╯ └───┴──┘╰───╥───╯ │ └───┘ ╰───────╯
╭───⇓───╮ │
│ store ├─┘
╰───────╯
Here, A happens-before B, which is singularly due to the AcqRel
fence’s
ability to “carry over” happens-before relations within itself.
SeqCst
fences
SeqCst
fences are the strongest kind of fence. They first of all inherit the
behaviour from an AcqRel
fence, meaning they have both acquire and release
semantics at the same time, but being SeqCst
operations they also participate
in S. Just as with all other SeqCst
operations, their placement in S is
primarily determined by strongly happens-before relations (including the
mixed-SeqCst
caveat that comes with it), which then gives additional
guarantees to your code.
Namely, the power of SeqCst
fences can be summarized in three points:
- Everything that happens-before a
SeqCst
fence is not coherence-ordered-after anySeqCst
operation that the fence precedes in S. - Everything that happens-after a
SeqCst
fence is not coherence-ordered-before anySeqCst
operation that the fence succeeds in S. - Everything that happens-before a
SeqCst
fence X is not coherence-ordered-after anything that happens-after anotherSeqCst
fence Y, if X preceeds Y in S.
In C++11, the above three statements were similar, except they only talked about what was sequenced-before and sequenced-after the
SeqCst
fences; C++20 strengthened this to also include happens-before, because in practice this theoretical optimization was not being exploited by anybody. However do note that as of the time of writing, Miri only implements the old, weaker semantics and so you may see false positives when testing with it.
The “motivating use-case” for SeqCst
demonstrated in the SeqCst
chapter can
also be rewritten to use exclusively SeqCst
fences and Relaxed
operations,
by inserting fences in between the operations in the two threads:
a static X static Y b
╭─────────╮ ┌───────┐ ┌───────┐ ╭─────────╮
│ store X ├─┐ │ false │ │ false │ ┌─┤ store Y │
╰────╥────╯ │ └───────┘ └───────┘ │ ╰────╥────╯
╭────⇓────╮ └─┬───────┐ ┌───────┬─┘ ╭────⇓────╮
│ *fence* │ │ true │ │ true │ │ *fence* │
╰────╥────╯ └───────┘ └───────┘ ╰────╥────╯
╭────⇓────╮ ╭────⇓────╮
│ load Y ├─? ?─┤ load X │
╰─────────╯ ╰─────────╯
There are two executions to consider here, depending on which way round the
fences appear in S. Should a
’s fence appear first, the fence–fence SeqCst
guarantee tells us that b
’s load of X
is not coherence-ordered-after a
’s
store of X
, which forbids b
’s load of X
from seeing the value false
. The
same logic can be applied should the fences appear the other way around, proving
that at least one thread must load true
in the end.