Modular Errors in Rust
It is thankfully common wisdom nowadays that documentation must be placed as near as possible to the code it documents, and should be fine-grained to a minimal unit of describability (the thing being documented). The practice provides numerous benefits to the codebase and project as a whole:
- When editing the source code, contributors are less likely to forget to update the documentation as well, ensuring it is kept up-to-date and accurate.
- When reading the source code, reviewers can easily jump back and forth between the docs and the code they document, helping them understand it and allowing them to contrast the expected behaviour with the actual behaviour.
- The codebase becomes more modular. Individual parts can be extracted into different crates or projects if necessary, and strong abstraction boundaries make the code easier to understand in small pieces.
But you probably already knew this; after all, Rust made the excellent design choice of making it by far the easiest method of writing documentation at all. And you probably also know that these same principles apply to tests: when unit tests are kept next to their minimum unit of checkability, you get the same benefits of convenient updating, assisted understanding and modularity. And most Rust projects do use unit tests in this way (when they can, for often there are limitations that prevent it from working), which again we can thank the tooling for.
But that’s all old news. What I’m here to convince you of today is that this principle applies additionally to error types: that is, error types should be located near to their unit of fallibility. To illustrate this point, I will follow the initial development and later API improvement of a hypothetical Rust library.
Case Study: A Blocks.txt Parser
Suppose you’re a library author, and you’re working on a crate to implement the parsing of Blocks.txt in the Unicode Character Database. If you’re not familiar with this file, it defines the list of so-called Unicode blocks, which are non-overlapping contiguous categories that Unicode characters can be sorted into. It looks a bit like this:
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
0100..017F; Latin Extended-A
0180..024F; Latin Extended-B
0250..02AF; IPA Extensions
This file tells you that, for example, the character “½”, U+00BD, is in the block “Latin-1 Supplement” because 0x0080 ≤ 0x00BD ≤ 0x00FF. Every character has an associated block; characters which have not yet been assigned a block in the file above are considered to be in the special pseudo-block No_Block.
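To make the range check concrete, here is a minimal sketch of a block lookup over just the five rows shown in the excerpt. The `BLOCKS` table and the `block_of` function are hypothetical names invented for this illustration, not part of the library developed in this post.

```rust
use std::ops::RangeInclusive;

/// The five rows from the excerpt above; a real table would hold every block.
const BLOCKS: &[(RangeInclusive<u32>, &str)] = &[
    (0x0000..=0x007F, "Basic Latin"),
    (0x0080..=0x00FF, "Latin-1 Supplement"),
    (0x0100..=0x017F, "Latin Extended-A"),
    (0x0180..=0x024F, "Latin Extended-B"),
    (0x0250..=0x02AF, "IPA Extensions"),
];

/// Returns the name of the block containing `c`, or `No_Block` if no row
/// of the (deliberately tiny) table covers it.
pub fn block_of(c: char) -> &'static str {
    let code_point = u32::from(c);
    BLOCKS
        .iter()
        .find(|(range, _)| range.contains(&code_point))
        .map_or("No_Block", |(_, name)| *name)
}
```

Because the table here is only an excerpt, anything past U+02AF falls through to `No_Block`, which would not be the case with the full file.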
So let’s get started on a Rust parser. The specification for the format is given by section 4.2 of Unicode Annex #44, but the format is so trivial you could almost guess it. Upon seeing this task, a typical Rustacean may write code like this:
//! This crate provides tools for working with Unicode blocks and its data files.
Now we need to define an error type, so let’s just follow the “big enum” convention and bash out some boilerplate that gets the job done:
/// An error in this library.
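As a rough illustration of the “big enum” style, here is a minimal sketch of a flat error type together with a line parser that uses it. The `Io` and `ParseInt` variants and the “no semicolon” message are mentioned in this post; the `NoSemicolon` and `NoDotDot` names and the `parse_line` function are assumptions made up for the example, and the `Ureq` variant is left out so that the sketch only needs the standard library.

```rust
use std::fmt::{self, Display, Formatter};
use std::io;
use std::num::ParseIntError;
use std::ops::RangeInclusive;

/// An error in this library (flat, "big enum" style).
#[derive(Debug)]
#[non_exhaustive]
pub enum Error {
    NoSemicolon, // hypothetical variant name
    NoDotDot,    // hypothetical variant name
    ParseInt(ParseIntError),
    Io(io::Error),
}

impl Display for Error {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        match self {
            Self::NoSemicolon => f.write_str("no semicolon"),
            Self::NoDotDot => f.write_str("no `..` in range"),
            // The flat style simply forwards the inner message, losing context.
            Self::ParseInt(e) => Display::fmt(e, f),
            Self::Io(e) => Display::fmt(e, f),
        }
    }
}

impl std::error::Error for Error {}

impl From<ParseIntError> for Error {
    fn from(error: ParseIntError) -> Self {
        Self::ParseInt(error)
    }
}

/// Parses one data line such as `0080..00FF; Latin-1 Supplement`.
pub fn parse_line(line: &str) -> Result<(RangeInclusive<u32>, &str), Error> {
    let (range, name) = line.split_once(';').ok_or(Error::NoSemicolon)?;
    let (start, end) = range.split_once("..").ok_or(Error::NoDotDot)?;
    let start = u32::from_str_radix(start.trim(), 16)?;
    let end = u32::from_str_radix(end.trim(), 16)?;
    Ok((start..=end, name.trim()))
}
```

Note that a malformed hexadecimal digit surfaces as nothing more than the inner `ParseIntError` message, with no hint of which file or line was at fault.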
Lastly, a couple other bits and imports go at the end:
pub const LATEST_URL: &str = "https://www.unicode.org/Public/UCD/latest/ucd/Blocks.txt";

use std::cmp;
use std::fmt;
use std::fmt::Display;
use std::fmt::Formatter;
use std::fs;
use std::io;
use std::num::ParseIntError;
use std::ops::RangeInclusive;
use std::path::Path;
use std::str::FromStr;
And we’re done.
There are a few small things to note with this code just before we move on:
- I omitted documentation, since it’s not relevant to the real example; in actual code, all the public items would be documented. Similarly, unit tests are omitted.
- In a real library, one would not hard-depend on ureq and std and would use feature flags instead, but again I omitted that for this example.
- You might have noticed I put my imports each on a separate line at the bottom. I do have my reasons for this, but that’s best saved for another day ;)
- I implemented FromStr, but not TryFrom<&str>. This is actually intentional, because despite being nearly identical traits signature-wise they mean two very different things: FromStr implies parsing from a string, whereas TryFrom<&str> is for when your data type is a subset of all strings. In our case, FromStr is the correct one to use.
- Error formats error messages like “no semicolon” in lowercase and without a full stop at the end; this is in accordance with conventions established by the Standard Library (“Error messages are typically concise lowercase sentences without trailing punctuation”). A common pitfall of both new and experienced Rustaceans is using incorrect casing for error messages.
- Another common pitfall is naming what we’ve named Error::Io as Error::IoError instead. Simply: you don’t need the Error suffix, it says it in the name already!
- One could use the thiserror crate to shorten the code by using a #[derive(Error)]. Personally, I would never use this for a library crate since it’s really not that many lines of code saved for a whole extra dependency, but you might want to know about it.
- The Ureq variant of the Error enum is boxed because ureq::Error is actually very large and Clippy complains about it.
So there we have it: our perfect little library. Let’s go off and publish it to crates.io.
What we’ve written so far, with regard to error handling, is what I’d say most libraries on crates.io do. It’s by far the most common way of handling errors: just stick everything in a big enum of “different ways things can go wrong in the library” and don’t think about it after that. But unfortunately, while it is common it is not exactly good, for a few reasons the rest of this post will be covering.
Problem 1: Backtraces
Suppose you then decide to use your library in a CLI application; and as per usual advice and your own experience, you decide to use anyhow to handle the errors in it. So you write out all your code and it looks a little like this:
Looks good, so you go ahead and run it — only, you’re rather abruptly met with:
Error: invalid digit found in string
Um, okay. That doesn’t help us very much at all. What went wrong here?
Well, much pain and many dbg! statements later, you discover that the culprit is that somehow, on line 223 of Blocks.txt, you replaced a 0 with an O:
 10800..1083F; Cypriot Syllabary
-10840..1O85F; Imperial Aramaic
+10840..1085F; Imperial Aramaic
 10860..1087F; Palmyrene
And then you run it again and it works fine.
But it didn’t have to be this hard. The error message could have displayed something more useful, and maybe this is just a pipe dream, but I’ve seen anyhow emit this sort of thing before:
Error: error reading `Blocks.txt`

Caused by:
    0: invalid Blocks.txt data on line 223
    1: one end of range is not a valid hexadecimal integer
    2: invalid digit found in string
That’s so much more helpful: you wouldn’t ever have had to suspect init_something_else as a potential cause of the error, or even search Blocks.txt for mistakes; it guides you to exactly where it went wrong!
Oh well, you say to yourself, at least this time it was decently obvious where the source of the error came from; at least I wasn’t getting a file not found error from TcpListener::bind (the natural conclusion to this kind of “flat”-style error handling). But wouldn’t it be nice if all errors came with backtrace and context tracking built-in?
Problem 2: Inextensibility
At least one of the things in the above image looks feasible to fix though: adding line numbers as context to the error messages. All we have to do is return to our Error enum and add more fields to the relevant variants.
Except… we can’t do that without breaking backward compatibility, because while the enum itself is #[non_exhaustive], the individual variants aren’t, meaning you’ve fixed them to forever have the fields they do currently (without breaking changes).
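For reference, this is a minimal sketch of what variant-level extensibility could look like if it had been planned from the start. The variant and field names are hypothetical; the only real feature being shown is that `#[non_exhaustive]` can be applied to individual variants as well as to the enum.

```rust
use std::num::ParseIntError;

#[derive(Debug)]
#[non_exhaustive] // new variants may be added later
pub enum Error {
    #[non_exhaustive] // new fields may be added to this variant later
    NoSemicolon { line: usize },
    #[non_exhaustive]
    ParseInt { line: usize, source: ParseIntError },
}

/// Extracts the line number from either variant. The `..` in each pattern is
/// what downstream crates are required to write, so that adding a field later
/// does not break their code.
pub fn line_of(error: &Error) -> usize {
    match error {
        Error::NoSemicolon { line, .. } | Error::ParseInt { line, .. } => *line,
    }
}
```

Inside the defining crate the attribute has no effect, which is why the match above needs no wildcard arm; other crates must add one and cannot construct the variants directly.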
Problem 3: Error Matching
Okay, so back to the application. You’ve now realized that you still want to call from_file, but if it fails with a “file not found” error you actually want to download the file automatically instead of exiting the program entirely. We have to match on the Result for that:
let blocks = match from_file ;
Great! But the compiler is yelling that the match arms aren’t exhaustive. Not too hard to fix, let’s look at the cases we need to deal with:
- ParseInt and similar variants: Those are pretty obvious, they look like parsing errors, so we can just propagate them.
- Io: Other I/O errors than “file not found” can also safely be propagated.
- Ureq: Ummm…? Wait, is this function doing HTTP requests? Let me check the source code again… [please stand by…] oh okay, so it’s not. Then I could add an unreachable! here which would be correct and indicates semantics nicely; on the other hand, nowhere is it written in the documentation of the API that it won’t ever return this, so maybe I should just propagate it anyway?
- Oh, and I forgot, we added #[non_exhaustive], so there’s always the possibility of it returning a variant that doesn’t exist yet. Well, I guess we can just propagate it anyway.
So, this situation isn’t ideal. The library doesn’t document anywhere what errors a given function can return, so users are often left shooting in the dark. From personal experience, there have been many times I have seen an error variant which was appropriate for me to catch, then I had to spend ages digging around in the source code to find out whether it was actually generated or not — and even an answer to that doesn’t constitute an API guarantee that it will or won’t be in future.
Another issue with the code that we’ve written is that it’s entirely non-obvious that our match arm refers specifically to the Blocks.txt file not being found. The arm itself just says “check if an I/O not found error occurred”, but in theory, and especially for more complex functions, an I/O not found error could mean one of several different things that the user can no longer differentiate between because they were all put together in a single Io variant.
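The matching problem can be sketched with only the standard library. The cut-down `Error` enum and the `or_fallback` helper below are assumptions made for illustration; the real application would call from_file and a download function instead of taking a ready-made `Result`.

```rust
use std::io;
use std::num::ParseIntError;

/// A cut-down flat error enum (hypothetical; the Ureq variant is omitted).
#[derive(Debug)]
#[non_exhaustive]
pub enum Error {
    Io(io::Error),
    ParseInt(ParseIntError),
}

/// Runs `fallback` only when `result` failed with an I/O "not found" error;
/// every other error is propagated unchanged.
pub fn or_fallback<T>(
    result: Result<T, Error>,
    fallback: impl FnOnce() -> Result<T, Error>,
) -> Result<T, Error> {
    match result {
        Ok(value) => Ok(value),
        // Nothing here says *which* file was not found: the arm matches any
        // I/O "not found" error the callee happened to produce.
        Err(Error::Io(e)) if e.kind() == io::ErrorKind::NotFound => fallback(),
        // Everything else, including variants that may not exist yet.
        Err(other) => Err(other),
    }
}
```

The catch-all arm is the only safe option here, precisely because the flat enum gives no guarantee about which variants a given function can produce.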
Problem 4: Privacy and Stability
One very common mistake libraries make with this style of big-enum error is accidentally exposing dependencies intended to be private in their public API through error types. In our example code, suppose io::Error weren’t part of the standard library but were rather a type from an external library that was on version 0.4. Now, when that library bumps its version to 0.5, I also have to make a breaking change to update to the newer version, because I exposed the io::Error type in my public API through the Error enum, even though I never expose my usage of the library anywhere else (it’s covered up by the opaque interface of from_file). The same issue occurs if I try to switch out my usage of that library for a different one; it also forbids me from ever releasing 1.0 until the dependency library also reaches 1.0, as per the C-STABLE API requirement.
This is hard to fix with this approach to errors, because enum data is hardcoded to always use inherited visibility, meaning that if the outer enum is public, all of its variants’ fields are too. Private fields are also useful in errors for reasons other than stability: they are just generally a nice feature to have on types.
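By contrast, a struct can keep a dependency’s error type out of its signature. The following is a minimal sketch under the assumption that io::Error stands in for the external library’s error; `ReadError`, `new` and `is_not_found` are hypothetical names for the example.

```rust
use std::error::Error;
use std::fmt::{self, Display, Formatter};
use std::io;

/// An error whose underlying cause is an implementation detail.
#[derive(Debug)]
pub struct ReadError {
    // Private: the dependency's error type never appears in a public signature.
    inner: io::Error,
}

impl ReadError {
    pub(crate) fn new(inner: io::Error) -> Self {
        Self { inner }
    }

    /// Exposes only the piece of information callers actually need.
    pub fn is_not_found(&self) -> bool {
        self.inner.kind() == io::ErrorKind::NotFound
    }
}

impl Display for ReadError {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        f.write_str("error reading file")
    }
}

impl Error for ReadError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        // Returned only as a trait object, so the cause still shows up in
        // error reports.
        Some(&self.inner)
    }
}
```

One caveat of this sketch: callers can still downcast the value returned by `source()`, so whether to return the inner error there at all is itself a design decision.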
Problem 5: Non-Modularity
And lastly, touching back on what I mentioned at the beginning of this article: this approach to error handling is non-modular. I couldn’t easily take a component alone, like the parser, and extract it to a different crate, because I’d have to change many APIs or otherwise hack around it. All the APIs are interconnected through the underlying error type, tying the crate together in a big knot that makes it difficult to untangle and remove stuff.
This kind of non-modularity also makes the codebase more difficult to understand: one is forced, to a greater degree, to learn the entire codebase at once to work on it, rather than learn it piece by piece, a far preferable way of learning.
Guidelines for Good Errors
So the current error type we have has problems. But how do we fix them? And this is where we bring in that principle from the start:
Error types should be located near to their unit of fallibility.
The key phrase here is “unit of fallibility”. What are the units of fallibility in our library? Well, it’s certainly not the library itself — the library is just a way of interacting with Unicode blocks, and it’s not like that can particularly fail. The only libraries that would have the entire library as a unit of fallibility are those whose only purpose is to perform a single operation (they typically have an API surface of no more than two functions, maybe a Params builder type, and nothing more).
This tells us that the unicode_blocks::Error type is inherently misguided. Rather, the units of fallibility in our case are the operations we do, like downloading, reading a file, and parsing.
Now, things get a little subjective at this point on deciding what counts as two separate units or the same unit. In general, you should ask yourself the following two questions:
- Do they have different ways in which they can fail?
- Should they show different error messages if they fail?
If the answer to either of those questions is “yes”, then they should normally be separate error types.
For us, this means we actually want three separate error types:
- FromFileError, for errors in from_file
- DownloadError, for errors in downloading the file
- ParseError, for errors in the FromStr implementation
Earlier, we said we wanted our error messages (printed with anyhow) to look good, like this:
Error: error reading `Blocks.txt`

Caused by:
    0: invalid Blocks.txt data on line 223
    1: one end of range is not a valid hexadecimal integer
    2: invalid digit found in string
So how do we get anyhow to print this? It turns out what the library calls internally is the Error::source() method, a default-implemented method of the Error trait that tells you the cause of an error. What we see in the above graphic depicts:
- an error type (we know to be FromFileError), whose Display implementation prints “error reading Blocks.txt”, and whose source() is…
- …another error type, whose Display implementation prints “invalid Blocks.txt data on line 223”, and whose source() is…
- …another error type, whose Display implementation prints “one end of range is not a valid hexadecimal integer”, and whose source() is…
- …another error type (we know to be ParseIntError), whose Display implementation prints “invalid digit found in string” and whose source() is None.
That might seem like a lot of layers, but they all map very nicely to our code: layer 1 is a FromFileError, layer 2 has to be our ParseError, layer 3 has to be something contained within the ParseError, and layer 4 is ParseIntError.
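To see how such a chain turns into a report, here is a minimal sketch that walks Error::source() by hand. It only approximates the layout shown above and is not anyhow’s actual implementation; `Layer` and `render_chain` are hypothetical names invented for this example.

```rust
use std::error::Error;
use std::fmt::{self, Display, Formatter};

/// A generic stand-in for one layer of context in an error chain.
#[derive(Debug)]
pub struct Layer {
    pub message: String,
    pub source: Option<Box<dyn Error + 'static>>,
}

impl Display for Layer {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        f.write_str(&self.message)
    }
}

impl Error for Layer {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.source.as_deref()
    }
}

/// Renders an error followed by a numbered list of its causes, obtained by
/// repeatedly calling `source()` until it returns `None`.
pub fn render_chain(err: &(dyn Error + 'static)) -> String {
    let mut out = format!("Error: {err}");
    let mut next = err.source();
    let mut index = 0;
    while let Some(cause) = next {
        if index == 0 {
            out.push_str("\n\nCaused by:");
        }
        out.push_str(&format!("\n    {index}: {cause}"));
        next = cause.source();
        index += 1;
    }
    out
}
```

Each layer only has to describe itself and point at its cause; the reporting code does the rest, which is why implementing `source()` correctly matters so much.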
This leads us to a much nicer structure for the error types in the library. Defined this way, FromFileError:
- has very good backtraces, as it implements Error::source();
- is extensible, as the struct is attributed with #[non_exhaustive];
- supports precise error matching, as we’ve now automatically given the public API guarantee that we won’t produce HTTP errors from our function, so our users needn’t worry about dealing with that case;
- makes it clear where the io::Errors can come from, because the variant is named ReadFile instead of simply Io;
- would easily be able to adjust to support hiding io::Error from the public API surface simply by making the relevant field private;
- is entirely modular, being conceptually contained within the from_file logic portion of the code, so it can be extracted, learnt independently, et cetera.
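A minimal sketch of a FromFileError with these properties might look as follows. The ReadFile and Parse variant names come from this post; the `path` and `kind` field names, and the stub `ParseError`, are assumptions made to keep the example self-contained.

```rust
use std::error::Error;
use std::fmt::{self, Display, Formatter};
use std::io;
use std::path::Path;

/// Stub standing in for the real parse error type.
#[derive(Debug)]
pub struct ParseError {
    pub line: usize,
}

impl Display for ParseError {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        write!(f, "invalid Blocks.txt data on line {}", self.line)
    }
}

impl Error for ParseError {}

/// An error reading and parsing a file.
#[derive(Debug)]
#[non_exhaustive] // more fields can be added without a breaking change
pub struct FromFileError {
    pub path: Box<Path>,         // hypothetical field name
    pub kind: FromFileErrorKind, // hypothetical field name
}

/// What went wrong; note there is no HTTP variant to worry about.
#[derive(Debug)]
pub enum FromFileErrorKind {
    ReadFile(io::Error),
    Parse(ParseError),
}

impl Display for FromFileError {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        write!(f, "error reading `{}`", self.path.display())
    }
}

impl Error for FromFileError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match &self.kind {
            FromFileErrorKind::ReadFile(e) => Some(e),
            FromFileErrorKind::Parse(e) => Some(e),
        }
    }
}
```

The Display implementation describes only this layer (“error reading …”) and leaves the details to `source()`, which is what produces the layered report shown earlier.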
ParseError can be defined in a somewhat similar fashion, also with the above benefits.
Note that the enum variants themselves are #[non_exhaustive], so that they can be extended in future with more information.
There is a slight deviation from FromFileError’s design here: its corresponding *Kind type actually implements Error in and of itself instead of simply existing as a data holder for other error types. The logic is that while we could make separate unit structs for variants like ParseInt, it just isn’t very necessary here (whereas io::Error is an external type and ParseError is required to be a distinct type because of FromStr). However, sometimes it is still better to make unit structs: it depends on the use case.
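As a minimal sketch of that design, the *Kind type below carries its own Display and source() implementations. The `line` and `kind` field names, the `ParseErrorKind` name, the `NoSemicolon` variant and the shape of the ParseInt variant are assumptions for the example; the messages are taken from the report shown earlier.

```rust
use std::error::Error;
use std::fmt::{self, Display, Formatter};
use std::num::ParseIntError;

#[derive(Debug)]
#[non_exhaustive]
pub struct ParseError {
    pub line: usize,          // hypothetical field name
    pub kind: ParseErrorKind, // hypothetical field name
}

#[derive(Debug)]
#[non_exhaustive]
pub enum ParseErrorKind {
    #[non_exhaustive]
    NoSemicolon, // hypothetical variant name
    #[non_exhaustive]
    ParseInt { source: ParseIntError },
}

impl Display for ParseError {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        write!(f, "invalid Blocks.txt data on line {}", self.line)
    }
}

impl Error for ParseError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        // The kind is itself an error, so it forms its own layer in the chain.
        Some(&self.kind)
    }
}

impl Display for ParseErrorKind {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            Self::NoSemicolon => "no semicolon",
            Self::ParseInt { .. } => "one end of range is not a valid hexadecimal integer",
        })
    }
}

impl Error for ParseErrorKind {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            Self::NoSemicolon => None,
            Self::ParseInt { source, .. } => Some(source),
        }
    }
}
```

This gives three of the four layers of the report: the ParseError, its kind, and the underlying ParseIntError.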
DownloadError showcases a similar pattern (although it’s not that interesting at this point):
Note that we could have merged DownloadError and its *Kind type into a single type; I chose not to here in favour of extensibility, because it seems quite possible that one would want to add more fields to DownloadError in future. But for some cases it definitely makes sense.
Constructing the error types
If you try to implement the functions that return these error types, you’ll quickly run into something rather annoying: they require quite a bit of boilerplate to use. For example, the body of from_file now looks like this:
Yeah, not the prettiest. Unfortunately, I don’t think there’s much we can actually do here; once we get try blocks it’ll definitely be nicer, but it seems to be an unavoidable cost of many good error-handling schemes.
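To give a feel for that boilerplate, here is a minimal sketch of a from_file-like function that attaches the path to every failure. The `parse` stand-in (which merely counts lines), the `usize` return type, the field names and the `wrap` closure are all assumptions for the example rather than the library’s real code.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Stub standing in for the real parse error type.
#[derive(Debug)]
pub struct ParseError;

#[derive(Debug)]
pub enum FromFileErrorKind {
    ReadFile(io::Error),
    Parse(ParseError),
}

#[derive(Debug)]
pub struct FromFileError {
    pub path: Box<Path>,
    pub kind: FromFileErrorKind,
}

/// Hypothetical stand-in parser: counts the lines, rejecting any line that
/// has no semicolon.
fn parse(data: &str) -> Result<usize, ParseError> {
    let mut count = 0;
    for line in data.lines() {
        if !line.contains(';') {
            return Err(ParseError);
        }
        count += 1;
    }
    Ok(count)
}

pub fn from_file(path: &Path) -> Result<usize, FromFileError> {
    // Every failure needs wrapping by hand so that the path gets attached.
    let wrap = |kind| FromFileError { path: path.into(), kind };
    let data = fs::read_to_string(path)
        .map_err(|e| wrap(FromFileErrorKind::ReadFile(e)))?;
    parse(&data).map_err(|e| wrap(FromFileErrorKind::Parse(e)))
}
```

A small helper closure like `wrap` trims the repetition a little, but each fallible call still needs its own explicit `map_err`.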
One thing notably omitted from the definitions of the new error types was implementations of From for inner types. There is no real problem with them; one just has to be careful that each one (a) works with extensibility and (b) actually makes sense. For example, take FromFileErrorKind. While it does make sense to implement From<ParseError>, since Parse is literally the name of one of the variants of FromFileErrorKind, it does not make sense to implement From<io::Error>, because such an implementation would implicitly add the meaning that the operation failed while reading the file from disk (as the variant is named ReadFile instead of Io). Constraining the meaning of “any I/O error” to “an error reading the file from the disk” is helpful but should not be done implicitly, thus rendering a From<io::Error> implementation inappropriate.
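In code, the distinction could be sketched like this; the stub `ParseError` and the `read_failure` helper are hypothetical and exist only to make the example self-contained.

```rust
use std::io;

/// Stub standing in for the real parse error type.
#[derive(Debug)]
pub struct ParseError;

#[derive(Debug)]
pub enum FromFileErrorKind {
    ReadFile(io::Error),
    Parse(ParseError),
}

// Reasonable: a `ParseError` can only ever mean "parsing failed".
impl From<ParseError> for FromFileErrorKind {
    fn from(error: ParseError) -> Self {
        Self::Parse(error)
    }
}

// Deliberately no `impl From<io::Error>`: callers have to state explicitly
// that an I/O error came from reading the file.
pub fn read_failure(error: io::Error) -> FromFileErrorKind {
    FromFileErrorKind::ReadFile(error)
}
```

With this in place, `?` converts a ParseError automatically, while an io::Error has to be wrapped by naming the ReadFile variant at the call site.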
One part of my principle of errors I haven’t yet touched on is the aspect of “nearness”; that errors should, as well as having an appropriate associated unit of fallibility, be sufficiently near to it. The fact is, with Rust’s current design you can’t put them as close as I’d like without sacrificing documentation quality. That is, while you’d ideally write something like:
This just makes your rustdoc look bad, since the impl blocks are needlessly separated. So usually I end up writing something more like:
It’s unfortunate, but I don’t think it’s terrible: you still get most of the benefits of nearness.
The only thing to make sure of is that they stay in the same module; this same concept of “nearness” is a similar reason why one should be extremely wary of any module named “errors”, which is of equal organizational value to having a drawer labelled “medium-sized and flat”.
Possibly the biggest objection to this style of error is the sheer number of lines of code required to implement it; error types aren’t a trivial number of lines, and making a new error type for every function can easily hugely increase the number of lines a library needs. This is definitely a valid criticism; I also find it tiresome to write the same things over and over again. But let me also offer an alternate perspective: rather than seeing it as simply a more verbose way to do the same thing, see it as due treatment for an oft-ignored area.
Traditionally, errors are seen as something to be pushed to the side as soon as possible to get on with “real” logic. But the art of resilient, reliable and user-friendly systems considers all outcomes, not just the successful one. As a success story, look no further than the Rust compiler itself; I don’t think it would be an exaggeration to say that Rust enjoys the current popularity it does because of how good its error messages are, and how much effort was put into them.
This post is not here to give you a structure that you should follow for your errors. The structure I used as an example in this post had one specific use case, and filled it appropriately. If you find you can apply the same structure to your own code and it works well, then great! But really, what this post is for is to get people to start caring about errors, putting actual thought into their designs, and learning how to elegantly pull off the ever-present balancing act between the five goals of good backtraces, extensibility, inspectability (matching), stability and modularity.
If there’s one thing I wish for you to take away, it’s that error handling is hard, but it’s worth it to learn. Because I’m tired of having to deal with lazy kitchen-sink-type errors.
The final code
//! This crate provides types for UCD’s `Blocks.txt`.

pub const LATEST_URL: &str = "https://www.unicode.org/Public/UCD/latest/ucd/Blocks.txt";

use std::cmp;
use std::error::Error;
use std::fmt;
use std::fmt::Display;
use std::fmt::Formatter;
use std::fs;
use std::io;
use std::num::ParseIntError;
use std::ops::RangeInclusive;
use std::path::Path;
use std::str::FromStr;