Modular Errors in Rust
It is thankfully common wisdom nowadays that documentation must be placed as near as possible to the code it documents, and should be fine-grained to a minimal unit of describability (the thing being documented). The practice provides numerous benefits to the codebase and project as a whole:
- When editing the source code, contributors are less likely to forget to update the documentation as well, ensuring it is kept up-to-date and accurate.
- When reading the source code, reviewers can easily jump back and forth between the docs and the code it documents, helping them understand it and allowing them to contrast the expected with actual behaviour.
- The codebase becomes more modular. Individual parts can be extracted into different crates or projects if necessary, and strong abstraction boundaries make the code easier to understand in small pieces.
But you probably already knew this; after all, Rust made the excellent design choice of making it the by far easiest method of writing documentation at all. And you probably also know that these same principles apply to tests: when unit tests are kept next to their minimum unit of checkability, you get the same benefits of convenient updating, assisted understanding and modularity. And most Rust projects do use unit tests in this way (when they can, for often there are limitations that prevent it from working), which again we can thank the tooling for.
But that’s all old news. What I’m here to convince you of today is that this principle applies additionally to error types: that is, error types should be located near to their unit of fallibility. To illustrate this point, I will follow the initial development and later API improvement of a hypothetical Rust library.
Case Study: A Blocks.txt Parser
Suppose you’re a library author, and you’re working on a crate to implement the parsing of Blocks.txt in the Unicode Character Database. If you’re not familiar with this file, it defines the list of so-called Unicode blocks, which are non-overlapping contiguous categories that Unicode characters can be sorted into. It looks a bit like this:
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
0100..017F; Latin Extended-A
0180..024F; Latin Extended-B
0250..02AF; IPA Extensions
This file tells you that, for example, the character “½”, U+00BD
, is in the block “Latin-1 Supplement” because 0x0080 ≤ 0x00BD ≤ 0x00FF
. Every character has an associated block; characters which have not yet been assigned a block in the file above are considered to be in the special pseudo-block No_Block
.
So let’s get started on a Rust parser. The specification for the format is given by section 4.2 of Unicode Annex #44, but the format is so trivial you could almost guess it. Upon seeing this task, a typical Rustacean may write code like this:
//! This crate provides tools for working with Unicode blocks and its data files.
Now we need to define an error type, so let’s just follow the “big enum
” convention and bash out some boilerplate that gets the job done:
/// An error in this library.
Lastly, a couple other bits and imports go at the end:
pub const LATEST_URL: &str = "https://www.unicode.org/Public/UCD/latest/ucd/Blocks.txt";
use cmp;
use fmt;
use Display;
use Formatter;
use fs;
use io;
use ParseIntError;
use RangeInclusive;
use Path;
use FromStr;
And we’re done.
There are a few small things to note with this code just before we move on:
- I omitted documentation, since it’s not relevant to the real example; in actual code, all the public items would be documented. Similarly, unit tests are omitted.
- In a real library, one would not hard-depend on ureq and
std
and would use feature-flags instead, but again I omitted that for this example. - You might have noticed I put my imports on separate lines each at bottom — I do have my reasons for this, but that’s best saved for another day ;)
Blocks
implementsFromStr
, but not. This is actually intentional, because despite being nearly identical traits signature-wise they mean two very different things:
FromStr
implies parsing from a string whereasis for when your data type is a subset of all strings. In our case,
FromStr
is the correct one to use.- The
Display
implementation ofError
formats error messages likeno semicolon
in lowercase and without a full stop at the end — this is in accordance with conventions established by the Standard Library (“Error messages are typically concise lowercase sentences without trailing punctuation”). A common pitfall of both new and experienced Rustaceans is using incorrect casing for error messages. - Another common pitfall is naming things like what we’ve named
Error::Io
asError::IoError
instead. Simply: you don’t need theError
suffix, it says it in the name already! - One could use the thiserror crate to shorten the code by using a
. Personally, I would never use this for a library crate since it’s really not that many lines of code saved for a whole extra dependency, but you might want to know about it.
- The
Ureq
variant of theError
enum is boxed becauseureq::Error
is actually very large and Clippy complains about it.
So there we have it: our perfect little library, let’s go off and publish it to crates.io.
What we’ve written so far, with regard to error handling, is what I’d say most libraries on crates.io do. It’s by far the most common way of handling errors: just stick everything in a big enum
of “different ways things can go wrong in the library” and don’t think about it after that. But unfortunately, while it is common it is not exactly good, for a few reasons the rest of this post will be covering.
Problem 1: Backtraces
Suppose you then decide to use your library in a CLI application; and as per usual advice and your own experience, you decide to use anyhow to handle the errors in it. So you write out all your code and it looks a little like this:
use Blocks;
Looks good, so you go ahead and run it — only, you’re rather abruptly met with:
Error: invalid digit found in string
Um, okay. That doesn’t help us very much at all. What went wrong here?
Well, much pain and many dbg!
statements later, you discover that the culprit is that somehow, on line 223 of Blocks.txt
you replaced a 0
with an O
. Oops!
-10840..1O85F; Imperial Aramaic
+10840..1085F; Imperial Aramaic
10860..1087F; Palmyrene
10800..1083F; Cypriot Syllabary
And then you run it again and it works fine.
But it didn’t have to be this hard. The error message could have displayed something more useful, and maybe this is just a pipe dream, but I’ve seen anyhow
emit this sort of thing before:
Error: error reading `Blocks.txt`
Caused by:
0: invalid Blocks.txt data on line 223
1: one end of range is not a valid hexidecimal integer
2: invalid digit found in string
That’s so much more helpful — you wouldn’t ever have had to suspect init_something
and init_something_else
as potential causes of the error, or even search Blocks.txt
for mistakes, it completely guides you to exactly where it went wrong!
Oh well, you say to yourself, at least this time it was decently obvious where the source of the error came from; at least I wasn’t getting a file not found error from TcpListener::bind
(the natural conclusion to this kind of “flat”-style error handling). But wouldn’t it be nice if all errors came with backtrace and context tracking built-in?
Problem 2: Inextensibility
At least one of the things in the above image looks feasible to fix though: adding line numbers as context to the error messages. All we have to do is return to our Error
enum and add more fields to the NoSemicolon
, and NoDotDot
, and ParseInt
, variants:
Except… we can’t do that without breaking backward compatibility, because while the enum itself is the individual variants aren’t, meaning you’ve fixed them to forever have the fields they do currently (without breaking changes).
Problem 3: Error Matching
Okay, so back to the application. You’ve now realized that you still want to call from_file
, but if it fails with a “file not found” error you actually want to download the file automatically instead of exiting the program entirely. We have to match on the Result
for that:
let blocks = match from_file ;
Great! But the compiler is yelling that the match
arms aren’t exhaustive. Not too hard to fix, let’s look at the cases we need to deal with:
NoSemicolon
,NoDotDot
,ParseInt
: Those are pretty obvious, they look like parsing errors, so we can just propagate them.Io
: Other I/O errors than “file not found” can also safely be propagated.Ureq
: Ummm…? Wait, is this function doing HTTP requests? Let me check the source code again… [please stand by…] oh okay, so it’s not. Then I could add anunreachable!
here which would be correct and indicates semantics nicely; on the other hand, nowhere is it written in the documentation of the API that it won’t ever return this, so maybe I should just propagate it anyway?- Oh, and I forgot, we added
#[non_exhaustive]
toso there’s always the possibility of it returning a variant that doesn’t exist yet. Well, I guess we can just propagate it anyway.
So, this situation isn’t ideal. The library doesn’t document anywhere what errors a given function can return, so users are often left shooting in the dark. From personal experience, there have been many times I have seen an error variant which was appropriate for me to catch, then I had to spend ages digging around in the source code to find out whether it was actually generated or not — and even an answer to that doesn’t constitute an API guarantee that it will or won’t be in future.
Another issue with the code that we’ve written is that it’s entirely non-obvious that our match
arm refers specifically to the Blocks.txt
file not being found. The arm itself just says “check if an I/O not found error occurred”, but in theory, and especially for more complex functions, an I/O not found error could mean one of several different things that the user can no longer differentiate between because they were all put together in a single Io
variant.
Problem 4: Privacy and Stability
One very common mistake libraries make with this style of big-enum
error is accidentally exposing dependencies intended to be private in their public API through error types. In our example code, suppose std::fs
and io::Error
weren’t part of the standard library but were rather types from an external library that was on version 0.4
. Now, when they bump their version to 0.5
I also have to make a breaking change to update it to the newer version, because I exposed the io::Error
type in my public API through the Error
enum, even though I never expose my usage of the library anywhere else (it’s covered up by the opaque interface of from_file
). The same issue occurs if I tried to switch out my usage of that library for a different one; it also forbids me from ever releasing 1.0 until the dependency library also reaches 1.0 as per the C-STABLE API requirement.
This is hard to fix with this approach to errors, because enum
data is hardcoded to always use inherited visibility, meaning if the outer enum
fields are public all inner fields are too. Private fields are also useful in errors in general, for reasons other than stability: private fields are just generally a nice feature to have on types.
Problem 5: Non-Modularity
And lastly, touching back on what I mentioned at the beginning of this article: this approach to error handling is non-modular. I couldn’t easily take a component alone, like the parser, and extract it to a different crate, because I’d have to change many APIs or otherwise hack around it. Every API is interconnected with each other through the underlying error type, tying the crate together in a big knot that makes it difficult to untangle and remove stuff.
This kind of non-modularity also makes the codebase more difficult to understand: one is forced, to a greater degree, to learn the entire codebase at once to work on it, rather than learn it piece by piece, a far preferable way of learning.
Guidelines for Good Errors
So the current error type we have has problems. But how do we fix them? And this is where we bring in that principle from the start:
Error types should be located near to their unit of fallibility.
The key phrase here is “unit of fallibility”. What are the units of fallibility in our library? Well, it’s certainly not the library itself — the library is just a way of interacting with Unicode blocks, and it’s not like that can particularly fail. The only libraries that would have the entire library as a unit of fallibility are those whose only purpose is to perform a single operation (they typically have an API surface of no more than two functions, maybe a Params
builder type, and nothing more).
This tells us that the unicode_blocks::Error
type is inherently misguided. Rather, the units of fallibility in our case are the operations we do, like downloading, reading a file, and parsing.
Now, things get a little subjective at this point on deciding what counts as two separate units or the same unit. In general, you should ask yourself the following two questions:
- Do they have different ways in which they can fail?
- Should they show different error messages should they fail?
If the answer to either of those questions is “yes”, then they should normally be separate error types.
For us, this means we actually want three separate error types:
FromFileError
, for errors inBlocks::from_file
;DownloadError
, for errors inBlocks::download
;ParseError
, for errors infrom_str
.
Leveraging the .source()
method
Earlier, we said we wanted our error messages (printed with anyhow
) to look good, like this:
Error: error reading `Blocks.txt`
Caused by:
0: invalid Blocks.txt data on line 223
1: one end of range is not a valid hexidecimal integer
2: invalid digit found in string
So how do we get anyhow
to print this? It turns out what the library calls internally is the Error::source()
method, a default-implemented method of the Error
trait that tells you the cause of an error. What we see in the above graphic depicts:
- an error type (we know to be
FromFileError
) whoseDisplay
implementation prints “error readingBlocks.txt
”, and whosesource
is… - …another error type, whose
Display
implementation prints “invalid Blocks.txt data on line 223”, and whosesource
is… - …another error type, whose
Display
implementation prints “one end of range is not a valid hexidecimal integer”, and whosesource
is… - …another error type (we know to be
ParseIntError
) whoseDisplay
implementation prints “invalid digit found in string” and whosesource
isNone
.
That might seem like a lot of layers, but they all map very nicely to our code: layer 1 is a FromFileError
, layer 2 has to be our ParseError
, layer 3 has to be something contained within the ParseError
, and layer 4 is ParseIntError
.
This leads us to a much nicer structure for the error types in the from_file
API.
This error:
- has very good backtraces, as it implements
Display
andsource()
well; - is extensible, as the
struct
is attributed with;
- supports precise error matching, as we’ve now automatically given the public API guarantee that we won’t produce HTTP errors from our function, so our users needn’t worry about dealing with that case;
- makes it clear where the
io::Error
s can come from, because the variant is namedReadFile
instead of simplyIo
; - would easily be able to adjust to support hiding
io::Error
from the public API surface simply by makingkind
andFromFileErrorKind
private; - is entirely modular, being conceptually contained within the
from_file
logic portion of the code, so it can be extracted, learnt independently, et cetera.
ParseError
can be defined in a somewhat similar fashion, also with the above benefits.
Note that the enum
variants themselves are , so that they can be extended in future with more information.
There is a slight deviation from FromFileError
’s design here, that its corresponding *Kind
type actually implements Display
and Error
in and of itself instead of simply existing as a data holder for other error types. The logic is that while we could separate make unit structs for NoSemicolon
, NoDotDot
and ParseInt
, it just isn’t very necessary here (where on the other hand io::Error
is an external type and ParseError
is required to be a distinct type because of FromStr
). However, sometimes it is still better to make unit structs: it depends on the use case.
Finally, DownloadError
showcases a similar pattern (although it’s not that interesting at this point):
Note that we could have merged DownloadErrorKind
and DownloadError
into a single type; I chose not to here in favour of extensibility, because it seems quite possible that one would want to add more fields to DownloadError
in future. But for some cases it definitely makes sense.
Constructing the error types
If you try to implement the functions that return these error types, you’ll quickly run into something rather annoying: they require quite a bit of boilerplate to use. For example, the body of from_file
now looks like this:
Yeah, not the prettiest. Unfortunately, I don’t think there’s much we can actually do here; once we get try
blocks it’ll definitely be nicer, but it seems to be an unavoidable cost of many good error-handling schemes.
On From
One thing notably omitted from the definitions of the new error types was implementations of From
for inner types. There is no problem with them really, one just has to be careful that it (a) works with extensibility and (b) actually makes sense. For example, taking FromFileErrorKind
:
While it does make sense to implement From<ParseError>
, because Parse
is literally the name of one of the variants of FromFileErrorKind
, it does not make sense to implement From<io::Error>
because such an implementation would implicitly add meaning that one failed during the process of reading the file from disk (as the variant is named ReadFile
instead of Io
). Constraining the meaning of “any I/O error” to “an error reading the file from the disk” is helpful but should not be done implicitly, thus rendering From
inappropriate.
On “nearness”
One part of my principle of errors I haven’t yet touched on is the aspect of “nearness”; that errors should, as well as having an appropriate associated unit of fallibility, be sufficiently near to it. The fact is, with Rust’s current design you can’t put them as close as I’d like without sacrificing documentation quality. That is, while you’d ideally write something like:
This just makes your rustdoc look bad, since the blocks are needlessly separated. So usually I end up writing something more like:
It’s unfortunate, but I don’t think it’s terrible — you still get most the benefits of nearness.
The only thing to make sure of is that they stay in the same module; this same concept of “nearness” is a similar reason why one should be extremely wary of any module named “errors”, which is of equal organizational value to having a drawer labelled “medium-sized and flat”.
Verbosity
Possibly the biggest objection to this style of error is the sheer number of lines of code required to implement it; error types aren’t a trivial number of lines, and making a new error type for every function can easily hugely increase the number of lines a library needs. This is definitely a valid criticism, I also find it tiresome to write the same things over and over again, but let me also offer an alternate perspective: rather than seeing it as simply a more verbose way to do the same thing, see it as due treatment for an oft ignored area.
Traditionally, errors as something to be pushed to the side as soon as possible to get on with “real” logic. But the art of resilient, reliable and user-friendly systems considers all outcomes, not just the successful one. As a success story, look no further than the Rust compiler itself; I don’t think it would be an exaggeration to say that Rust enjoys the current popularity it does because of how good its error messages are, and how much effort was put into it.
Conclusion
This post is not here to give you a structure that you should follow for your errors. The structure I used as an example in this post had one specific use case, and filled it appropriately. If you find you can apply the same structure to your own code and it works well, then great! But really, what post is for is to get people to start caring about errors, putting actual thought into their designs, and learning how to elegantly pull off ever-present balancing act between the five goals of good backtraces, extensibility, inspectability (matching), stability and modularity.
If there’s one thing I wish for you to take away, it’s that error handling is hard, but it’s worth it to learn. Because I’m tired of having to deal with lazy kitchen-sink-type errors.
The final code
//! This crate provides types for UCD’s `Blocks.txt`.
pub const LATEST_URL: &str = "https://www.unicode.org/Public/UCD/latest/ucd/Blocks.txt";
use cmp;
use Error;
use fmt;
use Display;
use Formatter;
use fs;
use io;
use ParseIntError;
use RangeInclusive;
use Path;
use FromStr;