The implementation of WithMeta() is flawed. Not only is it not concurrency-safe, every nested call will be modifying the parent map.
The way to do this in a safe and performant manner is to structure the metadata as a tree, with a parent pointing to the previous metadata. You'd probably want to do some pooling and other optimizations to avoid allocating a map every time. Then all the maps can be immutable and therefore not require any locks. To construct the final map at error time, you simply traverse the map depth-first, building a merged map.
I'm not sure I agree with the approach, however. This system will incur a performance and memory penalty every time you descend into a new metadata context, even when no errors are occurring. Building up this contextual data (which presumably already exists on the call stack in the form of local variables) will be constantly going on and causing trouble in hot paths.
A better approach is to return a structured error describing the failed action that includes data known to the returner, which should have enough data to be meaningful. Then, every time you pass an error up the stack, you augment it with additional data so that everything can be gleaned from it. Rather than:
val, err := GetStuff()
if err != nil {
return err
}
You do: val, err := GetStuff()
if err != nil {
return fmt.Errorf("getting stuff: %w")
}
Or maybe: val, err := GetStuff()
if err != nil {
return wrapWithMetadata(err, meta.KV("database", db.Name))
}
Here, wrapWithMetadata() can construct an efficient error value that implements Unwrap().This pays the performance cost only at error time, and the contextual information travels up the stack with a tree of error causes that can be gotten with `errors.Unwrap()`. The point is that Go errors already are a tree of causes.
Sometimes tracking contextual information in a context is useful, of course. But I think the benefit of my approach is that a function returning an error only needs to provide what it knows about the failing error. Any "ambient" contextual information can be added by the caller at no extra cost when following the happy path.
I think this is the way to bubble up error messages that I like the most. Simple, not needing any additional tools, and very practical (sometimes even better than a stack trace).
The idea is to only add information that the caller isn't already aware of. Error messages shouldn't include the function name or any of its arguments, because the caller will include those in its own wrapping of that error.
This is done with fmt.Errorf():
userId := "A0101"
err := database.Store(userId);
if err != nil {
return fmt.Errorf("database.Store({userId: %q}): %w", userId, err)
}
If this is done consistently across all layers, and finally logged in the outermost layer, the end result will be nice error messages with all the context needed to understand the exact call chain that failed: fmt.Printf("ERROR %v\n", err)
Output: ERROR app.run(): room.start({name: "participant5"}): UseStorage({type: "sqlite"}): Store({userId: "A0101"}): the transaction was interrupted
This message shows at a quick glance which participant, which database selection, and which integer value where used when the call failed. Much more useful than Stack Traces, which don't show argument values.Of course, longer error messages could be written, but it seems optimal to just convey a minimal expression of what function call and argument was being called when the error happened.
Adding to this, the Go code linter forbids writing error messages that start with Upper Case, precisely because it assumes that all this will be done and error messages are just parts of a longer sentence:
> This message shows at a quick glance which participant, which database selection, and which integer value where used when the call failed.
And it's completely useless for looking up the errors linked to a participant in an aggregator, which is pretty much the first issue the article talks about, unless you add an entire parsing and extraction layer overtop.
> Much more useful than Stack Traces, which don't show argument values.
No idea how universal it is, but there are at least some languages where you can get full stackframes out of the stacktrace.
That's how pytest can show the locals of the leaf function out of the box in case of traceback:
def test_a():
> a(0)
test.py:11:
test.py:2: in a
b(f)
test.py:5: in b
c(f, 5/g)
f = 0, x = 5.0
def c(f, x):
> assert f
E assert 0
test.py:8: AssertionError
and can do so in every function of the trace if requested: def test_a():
> a(0)
test.py:11:
f = 0
def a(f):
> b(f)
test.py:2:
f = 0, g = 1
def b(f, g=1):
> c(f, 5/g)
test.py:5:
f = 0, x = 5.0
def c(f, x):
> assert f
E assert 0
test.py:8: AssertionError
So this is just a matter of formatting.You have to add all of that detail manually which sucks. You can get the function name from the runtime package and generate that metadata easily with a helper function. Otherwise when you rename the function, you have to rename all of the error messages.
I agree except for the part about using strings, as you lose structure they way. You should instead return structured errors:
return CouldNotStoreUser{
UserID: userId,
}
and now this struct is available to anyone looking up the chain of wrapped errors.This is covered by the “ergonomics” section of TFA:
> With custom error structs however, it's a lot of writing to create your own error type and thus it becomes more of a burden to encourage your team members to do this.
Because you need a type per layer, and that type needs to implement both error and unwrap.
In practice, I have found the benefit of explicit typing to outweigh the downsides of needing to declare the types.
As a concrete example, it means you can target types with precision in the API layer:
switch e := err.(type) {
case UserNotFound:
writeJSONResponse(w, 404, "User not found")
case interface { Timeout() bool }:
if e.Timeout() {
writeJSONResponse(w, 503, "Timeout")
}
}
I skimmed the article and didn't see the author proposing a way to do that with their arbitrary key/value map.Of course, you could use something else like error codes to translate groups of errors. But then why not just use types?
But as I suggested in my other comment, you could also generalize it. For example:
return meta.Wrap(err, "storing user", "userID", userID)
Here, Wrap() is something like: func Wrap(err error, msg string, kvs ...any) {
return &KV{
KV: kvs,
cause: err,
msg: msg,
}
}
This is the inverse of the context solution. The point is to provide data at the point of error, not at every call site.You can always merge these later into a single map and pay the allocation cost there:
var fields map[string]any
for err != nil {
if e, ok := err.(*KV); ok {
for i := 0; i < len(KV.KV); i += 2 {
fields[e.KV[i].(string)] = e.KV[i+1]
}
}
err = errors.Unwrap(err)
}
You can also just assign your errors to variables, and typically you only need to do so for the innermost error. Wrapping calls can just use Errorf with the %w operator.
Horses for courses.
err.(type) fails when errors are wrapped, which is common and needs to be accommodated. To do what you're trying to do, you need to use errors.As.
Sure. It was not intended to be a complete example.
It's not that the example is incomplete, it's that it's incorrect! :)
No, it's just incomplete. The same code just needs to unwrap:
for err != nil {
switch e := err.(type) {
case UserNotFound:
writeJSONResponse(w, 404, "User not found")
return
case interface { Timeout() bool }:
if e.Timeout() {
writeJSONResponse(w, 503, "Timeout")
return
}
}
err = errors.Unwrap(err)
}
Just a little more work, and you will reinvent exceptions with stack traces! Proper error handling in Go is now tantalisingly close!
Go already has exceptions with stack traces...
errors aren’t exception
These are good general tips applicable to other languages too. I strongly dislike when code returns errors as arbitrary strings rather than classes, as it makes errors extremely difficult to handle; one would presumably want to handle a http 502 diffrernetly to a 404, but if a programmer returns that in a string, I have to do some wonky regex instead of checking the type of error class (or pulling a property from an error class). I've commonly found JS and Go code particularly annoying as they tend to use strings, as the author mentioned.
An additional thing that is useful here would be a stack trace. So even when you catch, wrap & rethrow the error, you'll be able to see exactly where the error came from. The alternative is searching in the code for the string.
For the hate they seem to get, checked exceptions with error classes do give you a lot of stuff for free.
I couldn't agree more. I was surprised to see the default go error handling when I switched to the language a few years ago. Any meaningful REST API implementation, as you say, needs to know what to return to the user. Perhaps there is an error for the user, and then an error for the logs. With the default go mechanism, it's too easy to return system information to the user, potentially revealing database schemas.
I think what you want are not dedicated classes but error codes.
If you find yourself needing to branch on error classes it may mean error handling is too high up.
ps. personally I always prefer string error codes, ie. "not-found" as opposed to numeric ones ie. 404.
No, I want dedicated classes. Be they thrown or returned as a value. Error codes are limiting and serve a different purpose.
Error codes contain only the type of error that occurred and cannot contain any more data. With an error class you can provide context - a 400 happened when making a request, which URL was hit? What did the server say? Which fields in our request were incorrect? From a code perspective, if an error happens I want to know as much detail as possible about it, and that simply cannot be summarised by an error code.
If I want to know the type of an error and do different things based on its type, I can think of no better tool to use than my language's type system handling error classes. I could invent ways to switch on error codes (I hope I'm using a language like Rust that would assert that my handling of the enum of errors is exhaustive), but that doesn't seem very well-founded. For example, using error enums, how do I describe that an HTTP_404 is a type of REQUEST_ERROR, but not a type of NETWORK_CONN_ERROR? It's important to know if the problem is with us or the network. I could write some one-off code to do it, or I could use error classes and have my language's typing system handle the polymorphism for me.
Not that error codes are not useful. You can include an error code within an error class. Error codes are useful for presenting to users so they can reference an operator manual or provide it to customer support. Present the user with a small code that describes the exact scenario instead of an incomprehensible stack trace, and they have a better support experience.
Side note: please don't use strings for things that have discrete values that you switch on. Use enums.
You need some kind of grouping otherwise any simple action ie. involving single io would require handling of dozens different classes.
Agree on error codes, but I disagree on branching. An api request failing with 429 is retry able (after a period), but a 401 isn’t. A 401 might require a refreshing an authorisation token, but a 400 maybe needs the user to change their input.
> I always prefer string error codes
My parent company provides an API for us to use for “admin-y” things, but they use stringy errors in the response payload of the body. Except they’re in mandarin, and you’d be surprised (or maybe not) at how many tools barf at it. Getting them to localise the error codes is as likely to happen as us fixing the referer heading. The really nice thing about status codes is that there’s only a fixed amount of them so you don’t get two slightly different responses from two different endpoints (not-found vs not_found), and there’s no locale issues involved.
I didn't say not to branch on error code.
Error _code_ is code, shouldn't be localized.
I wrote my own error wrapper: github.com/samber/oops
Example:
err := oops. Code("iam_missing_permission"). In("authz"). Tags("authz"). Time(time.Now()). With("user_id", 1234). With("permission", "post.create"). Hint("Runbook: https://doc.acme.org/doc/abcd.md"). User("user-123", "firstname", "john", "lastname", "doe"). Errorf("permission denied")
For easier debugging, the error contains the full stacktrace.
The `WithMeta` func will panic if not passed a multiple of 2 varargs. This is exactly what makes go error handling difficult in the first place. Imagine panicking in production because you passed a key without value to your error wrapper for some reason.
My solution handled that by inserting default value for the unbalanced key, and a new error key describing the log error
There are several existing Go error libraries with a similar approach sans context.
The approach I ended up taking is to use slog attributes. It allows for reuse of existing logging attributes.
This is explained here (skip to the “adding metadata” portion). https://blog.gregweber.info/blog/go-errors-library/
Go package: https://pkg.go.dev/github.com/gregwebs/errors/slogerr
Not a Go engineer but Go-curious - shouldn’t this use slog[0] for structured logging rather than a third party?
The article is from 2022 which predates slog in stdlib. Article also references Wrap() & Wrapf() which are also not part of stdlib (odd they don't mention that imo).
It was likely using the previous default error lib github.com/pkg/errors
Depends on if you're making a public library. Using slog is "polite" but I don't really see it as the endgame of logging libraries. It has quite a few rough edges for CLI apps, like no control on attr order. But it is zomgfast, and speed is important, right?
The biggest advantage is being third-party-dependency-free. I tend to reject libraries which have no updated to use the stdlib approach at this stage as “likely unmaintained” - similar for those that pull in things like pkg/errors transitively. Apps can do what they like, though.
This is good stuff. After riding a unicorn and seeing systems mature over time as things change and scale grows, I've learned that Structured (Error) Logs are the only way to run systems at scale with sanity.
I wrote my Go version of this same error wrapping utility for the same reasons: https://github.com/sethgrid/kverr
My current work uses python and I am hoping to change us over to structured logs that play well with structured exceptions.
We've been storing values on errors using https://github.com/levenlabs/errctx for close to a decade now. We have a separate logging library that stores a KV map but technically you can store anything on errors. I recently just made it unwrap errors to find deeply stored keys.
Hopefully with some kind of synchronization, since errctx is pretty clearly not safe for concurrent use by multiple goroutines.. !
One thing that seemingly is missing is the ability to tag a specific error with an error code. You typically want to know that all of a sudden the ”failed to get user” error is being returned a lot. Since the message is a dynamic string you can’t just group by the string so unless you build it as part of your abstraction it becomes very hard to do.
Edit: looking more carefully at the lib I assume that ”tag” is the concept that is supposed to cover this?
in standard go you'd have errors implement:
func (myError) Is(err error) bool
and it can match different sentinel errors.
Or you can make your own wrapper to have the error chain match.In standard go you would not implement Is but would use the errors.Is call. How would this work if it is a sentinel error as you would use with errors.Is??
> An error is considered to match a target if it is equal to that target or if it implements a method Is(error) bool such that Is(target) returns true.
by default errors.Is matches the exact error variable, but you can use it to match other errors as well.
I stand corrected. Still you cannot implement an Is on a sentinel error and is only applicable to concrete error types. And if I'm using concrete types I'm going to us errors.As
Alright, so this looks pretty comprehensive for error handling. But I gotta ask – for smaller to mid-size projects, is there a point where this level of structure becomes more work than it's worth?
IMO error handling is the sort of thing you really want to get right early on, even in toy projects. It’s very hard to retrofit, and the actual payoff is low until you need it - at which point you definitely don’t want to do the work.
As antithetical as it might be, I tend to just stuff sentry in (no affiliation just a happy user) when I’m setting up the scaffolding, and insert rich context at the edges (in the router, at a DB/serialization/messagebus layer) and the rest usually just works itself out.
why do you think it's hard to retrofit? it's just an SDK to setup up, right (point at some DSN)?
The sdk setup is a breeze, but retrofitting all of the bespoke one off error logs and tracing to a common subset to send via the SDK is not.
Error handling always comes in useful at one time in particular - when something has gone wrong! At that point, having something that is structured and full of context is really useful and makes debugging much, much easier.
Even for small projects, this is a small thing to introduce but it will pay you dividends in the future. The earlier you start the more you'll thank yourself (it's not very helpful to frantically try and refactor this into a codebase after you've already been bitten!)
i'm not sure the point here, you can carry along information in errors at API boundaries and collect the info when you emit a log line. Using built in errors.Wrapf and slog, you get everything you need already.
I wrote an article covering the same problems but with a slightly different solution a few years ago: https://alexrichey.com/articles/2021-09-09-error-handling-in...
This approach has been successful in major projects at work and on the side for several years now.
Yeah always thought error handling is a bit wonky in Go. (Un)fortunately, most of my tinkering with Go are just toy level scripts. Thanks for the write up, will check the library!
Go itself is wonky, yet another programming language that is a fine example of worse is better mentality in the industry, whose adoption was helped by having critical infrastructure software written in it.
I think it depends a bit on perception. If you're coming from C, Go is a step up in my opinion. Everything surrounding Go's type system and lack luster error handling seems worse to me coming from almost any other language - except C.
Concurrency is also something they got more or less right. The most important thing is that they invented their (Google) language that they could exert complete control over. From a business perspective specifically, that was much better than Java.
> If you're coming from C, Go is a step up
Walking on burning coals is a step up if you're coming from C.
We shouldn't grade languages on that much of a curve by comparing them to garbage.
> Concurrency is also something they got more or less right
Except data-races are an incredibly common bug in Go, and an incredibly rare bug in Rust.
Data-races are way more common in Go than in modern C++ or Java, if only because mutex semantics in Go are awful, with many programmers opting for manual "lock" and "unlock" calls, and mentally reasoning about the correct scope for the critical section with no compiler assistance.
I will give you that they made concurrency very easy.
> whose adoption was helped by having critical infrastructure software written in it
Doesn't this contradict the first part of your post? Kubernetes for instance was ported from Java to Go (albeit, poorly). Is Java worse than Go?
Java is too advanced for Go folks, it is a PhD level language, when compared with Go minimalism, and their disdain for modern type systems (modern as in, invented in 1976, e.g. CLU and ML).
Also it is quite ironic how given the Java bashing on Go community, there was so little learned from Java evolution and design mistakes.
They even ended up having to reach for the same folks that helped designing Java generics.
As for your remark, the actual rewrite history as told at FOSDEM, is that the rewrite only happened as two strong minded Go devs joined the Kubernetes team and heavily pushed for the rewrite.
> and their disdain for modern type systems
To be fair, it was made abundantly clear when it was first released unto the world that it was intended to feel like a dynamically-typed language, but with performance characteristics closer to statically-typed languages. What little type system it has is there merely to support the performance goals. If they had figured out how to deliver on the performance end as a strictly dynamically-typed language, it is likely it would have gone without a static type system entirely.
Call it distain if you will, but it is not like there weren't already a million other languages with modern, advanced type systems. That market was already, and continues to be, flooded with many lovely languages to choose from. Go becoming yet another just like all the rest would have been rather pointless. "Like Python, but faster" was the untapped market at the time – and serving that market is why it is now a household name instead of being added to the long list of obscure languages that, while technically interesting, all do the same thing.
Hmm, many people writing Java without a PhD...
[dead]
[dead]