SOA Bits and Ramblings: Error handling considerations and best practices

A recurring topic in REST and Web API discussions is error handling (see for instance https://groups.google.com/d/topic/api-craft/GLz_nNbK-6U/discussion or http://stackoverflow.com/questions/942951/rest-api-error-return-good-practices): what information should be included in error responses, how should HTTP status codes be used, and what media type should the response be encoded in? In this blog post I will try to address these issues and give some guidelines based on my own experience and existing solutions.

Existing solutions

Let us first take a look at some existing solutions to get started:

The Twitter API uses a list of descriptive error messages and error codes. Twitter has both JSON and XML representations with the property names "errors", "error" and "code".
The Facebook Graph API has a single descriptive error message, an error code and even a sub-code. Facebook uses a JSON representation with property names: "error", "message", "type", "code" and "error_subcode".
The GitHub API has a top-level descriptive error message and an optional list of additional error elements. The items in the error list refer to resources, fields and codes. GitHub uses a JSON representation with the property names "message", "errors", "resource", "field" and "code".
The US White House has a set of guidelines for its APIs on GitHub. The error message used here contains the HTTP status code, a developer message, a user message, an error code and links to further information.
Ben Longden has proposed a media type for error reporting. This specification includes a "logref" identifier that somehow refers to a log entry on the server side - such a feature can help when debugging server errors later on.
Mark Nottingham has introduced "Problem Details for HTTP APIs" as an IETF draft. This proposal makes use of URIs for identifying errors and is as such meant as a general and extensible format for "problem reporting".

All of these response formats share similar content: one or more descriptive messages, status codes and links to further information. But, as can be seen, there is a wide variety in the actual implementations and wire formats.

Considerations and guidelines

So, what should you do with your web API? Well, here are some considerations and guidelines you can base your error reporting format on ...

Target audience

Remember that your audience includes the end user, the client developer, the client application and your frontline support (which may just happen to be you). Your error responses should include information that caters to all of these parties:

The end user needs a short descriptive message.
The client developer needs as much detailed information as possible to debug the application.
The client application needs error codes (HTTP status codes) for error recovery actions.
The frontline support people need detailed information and/or keywords to look for in their knowledge base.

Use the HTTP status codes correctly

The HTTP status codes are standardized all over the web and your clients will know immediately how to handle them. Make sure to use them correctly:

Do NOT just return HTTP status code 200 (OK) regardless of success or failure.
Use 2xx when a request succeeds.
Use 4xx when a request fails and the client should be able to fix it by modifying its own request.
Use 5xx when a request fails due to some internal server error.
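These ranges let a client react sensibly without knowing anything else about the service. A minimal Python sketch of that classification (the function name and the returned labels are illustrative, not taken from any library):

```python
from http import HTTPStatus


def classify_status(code: int) -> str:
    """Map an HTTP status code to the broad classes described above."""
    if 200 <= code <= 299:
        return "success"        # 2xx: the request succeeded
    if 400 <= code <= 499:
        return "client_error"   # 4xx: the client should fix its own request
    if 500 <= code <= 599:
        return "server_error"   # 5xx: something failed on the server side
    return "other"              # 1xx/3xx and anything out of range


# HTTPStatus members are ints, so they can be passed in directly.
print(classify_status(HTTPStatus.NOT_FOUND))            # client_error
print(classify_status(HTTPStatus.SERVICE_UNAVAILABLE))  # server_error
```

Note that a 200 response carrying an error payload lands in "success" here, which is exactly why always returning 200 misleads clients.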

Use descriptive error messages

Be descriptive in your error messages and include as much context as possible. Failure to do so will cost you dearly in support later on: if your client developers cannot figure out why their request went wrong, they will look for help - and eventually it will be you who spends time tracking down client errors instead of coding new and exciting features for your service.

If it is a validation error, be sure to include why it failed, where it failed and what part of the input failed. A message like "Invalid input" is horrible, and client developers will bug you about it over and over again, wasting your precious development time. Be descriptive and include context: "Could not place order: the field 'Quantity' should be an integer between 0 and 99 (got 127)".

You may want to include both a short version for end users and a more verbose version for the client developer.
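As a sketch of the quantity example above, a hypothetical `validate_quantity` helper could produce both a short message for the end user and a verbose one for the client developer. The 0 to 99 range simply mirrors the example; the `message`/`details` split is an assumption.

```python
from typing import Optional


def validate_quantity(value: object) -> Optional[dict]:
    """Return None if valid, otherwise a short and a verbose message.

    validate_quantity is a hypothetical helper; the 0..99 range mirrors the text.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool) and 0 <= value <= 99:
        return None
    return {
        # Short message aimed at the end user.
        "message": "Could not place order: the quantity is not valid.",
        # Verbose message aimed at the client developer: what, where and why.
        "details": (
            "Could not place order: the field 'Quantity' should be an integer "
            f"between 0 and 99 (got {value!r})"
        ),
    }


print(validate_quantity(5))  # None
print(validate_quantity(127)["details"])
```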

Localization

Error messages for end users should be localized (translated into other languages) if your service is already a multi-language service. Personally I don't think developer messages should be localized: it is difficult to translate technical terms correctly, and localized messages make it more difficult to search online for more information.

When localization is introduced it may also be necessary to include language codes and maybe even allow for a list of different translations to be returned in the error response.
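A minimal sketch of picking the end-user message language from the "Accept-Language" header, assuming translations live in a plain dictionary keyed by lower-case language tag (the catalog and function names are hypothetical). It honours q-values and falls back to the server's primary language; it ignores finer matching rules such as wildcards.

```python
def parse_accept_language(header: str) -> list[str]:
    """Return lower-cased language tags from an Accept-Language header, best first."""
    ranked = []
    for index, part in enumerate(header.split(",")):
        pieces = part.strip().split(";")
        tag = pieces[0].strip().lower()
        if not tag:
            continue
        quality = 1.0
        for param in pieces[1:]:
            name, _, val = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    quality = float(val)
                except ValueError:
                    quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            # Sort by quality (descending), then by position in the header.
            ranked.append((-quality, index, tag))
    return [tag for _, _, tag in sorted(ranked)]


def pick_message(translations: dict[str, str], header: str, default: str = "en") -> tuple[str, str]:
    """Return (language, message), falling back to the default language."""
    for tag in parse_accept_language(header):
        if tag in translations:
            return tag, translations[tag]
        primary = tag.split("-")[0]  # e.g. "da-dk" falls back to "da"
        if primary in translations:
            return primary, translations[primary]
    return default, translations[default]


MESSAGES = {  # hypothetical message catalog
    "en": "The order could not be placed.",
    "da": "Ordren kunne ikke afgives.",
}
print(pick_message(MESSAGES, "da-DK, en;q=0.8"))
```

The chosen language could then be echoed in the response's Content-Language header so the client knows which translation it received.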

Allow for more than one message

Make it possible to include more than one message in the error response. Then try to collect all possible errors on the server side and return the complete list in a single response. This is not always possible, and it requires some more coding on the server side (compared to simply throwing an exception the first time some invalid input is detected).
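The difference between failing on the first error and collecting all of them can be sketched as follows. The field names and rules are hypothetical, borrowed loosely from the examples in this post; the point is that each check appends to a list instead of raising.

```python
from datetime import date


def validate_order(data: dict) -> list[str]:
    """Run every check and return all error messages (empty list if valid)."""
    errors: list[str] = []

    if not data.get("Title"):
        errors.append("The field 'Title' must have a value.")

    start = data.get("StartDate")
    try:
        # date.fromisoformat rejects impossible dates such as month 20.
        date.fromisoformat(str(start))
    except ValueError:
        errors.append(
            "The field 'StartDate' did not contain a valid date (the value provided "
            f"was '{start}'). Dates should be formatted as YYYY-MM-DD."
        )
    return errors


errors = validate_order({"Title": "", "StartDate": "2013-20-23"})
if errors:
    # One response listing everything, instead of one round trip per mistake.
    response = {
        "message": "There was something wrong with the input (see below)",
        "messages": errors,
    }
    print(len(response["messages"]))  # 2
```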

Additional status codes

If your business domain calls for more detailed information than can be found in the normal HTTP status codes, then include a business-specific status code in the response. Make sure all of the codes are documented.

You may be tempted to include more technical error codes, but consider who your audience is for that. It won't help your end user. It may help your client application recover from errors - but probably not in any way that was not already covered by the HTTP status codes. Your client developer may have some need for it - but why make them look up error codes in online documentation when you can include descriptive error text and links that refer directly to the documentation? It may help your support - but if client developers have enough information in the error response, they won't need to call your support anyway - right?

Use letters for status codes

I often find myself searching for online resources that can help me when I get some error while interacting with third party APIs. Usually I search for a combination of the API name, error messages and codes. If you include additional error codes in your response then you might want to use letters instead of digits: it is simply more likely to get a relevant hit for something like "OAUTH_AUTHSERVER_UNAVAILABLE" than "1625".
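One way to keep business-specific codes both searchable and documented is a string enumeration whose values are the codes themselves, each paired with a documentation link. The codes and example.com URLs below are hypothetical placeholders in the spirit of the "OAUTH_AUTHSERVER_UNAVAILABLE" example.

```python
from enum import Enum


class ErrorCode(str, Enum):
    """Hypothetical business error codes; the value is what goes on the wire."""
    OAUTH_AUTHSERVER_UNAVAILABLE = "OAUTH_AUTHSERVER_UNAVAILABLE"
    ORDER_QUANTITY_OUT_OF_RANGE = "ORDER_QUANTITY_OUT_OF_RANGE"

    @property
    def doc_url(self) -> str:
        # Every code gets a documentation page, so none can go undocumented.
        return f"http://example.com/docs/errors/{self.value.lower()}"


code = ErrorCode.OAUTH_AUTHSERVER_UNAVAILABLE
print(code.value)    # OAUTH_AUTHSERVER_UNAVAILABLE
print(code.doc_url)  # http://example.com/docs/errors/oauth_authserver_unavailable
```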

Include links to online resources

Include links to online help and other resources that will either clarify what went wrong or in some other way help the client developer to solve the problem.

Support multiple media types

If you have a RESTful service that allows both client applications and developers to explore it, then you might want to support a human-readable media type for your error responses. HTML is perfect for this as it allows client developers to view the error information right in their browsers without installing any additional plugins. A fallback to plain text could also be useful (but is probably overkill).
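A rough sketch of choosing between JSON, HTML and plain text for the same error based on the request's Accept header. It uses naive substring checks rather than full HTTP content negotiation (q-values, wildcards), and the function name is hypothetical. The important detail is that the HTML variant escapes messages with `html.escape`, so user input echoed in an error cannot inject markup.

```python
import html
import json


def render_error(error: dict, accept: str) -> tuple[str, str]:
    """Return (content_type, body) for an error, chosen from the Accept header."""
    accept = accept.lower()
    if "text/html" in accept:
        # Human-readable variant for developers exploring the API in a browser.
        items = "".join(f"<li>{html.escape(m)}</li>" for m in error.get("messages", []))
        body = (
            "<!DOCTYPE html><html><head><title>Error</title></head><body>"
            f"<h1>{html.escape(error['message'])}</h1><ul>{items}</ul></body></html>"
        )
        return "text/html; charset=utf-8", body
    if "text/plain" in accept:
        lines = [error["message"], *error.get("messages", [])]
        return "text/plain; charset=utf-8", "\n".join(lines)
    # Default: machine-readable JSON.
    return "application/json", json.dumps(error)


ctype, body = render_error({"message": "Bad <input>"}, "text/html,application/xhtml+xml")
print(ctype)  # text/html; charset=utf-8
```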

Include a timestamp or log-reference

It can help support and bug hunting if the error report contains a timestamp (in server time zone or UTC). This may help in locating the right logfile entries later on.

Another possibility is to include some other kind of information that refers back to the logfiles such that server developers and support people can track what happened.
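A minimal sketch, assuming the server uses Python's standard `logging` module: generate a UTC timestamp and a random log reference (a UUID here, which is just one choice), write both to the server log and return them in the error body so support can find the matching entry. The `logRef` property name is hypothetical, loosely inspired by Ben Longden's "logref".

```python
import logging
import uuid
from datetime import datetime, timezone

logger = logging.getLogger("api.errors")


def error_context(message: str) -> dict:
    """Build error fields that can be matched against the server log."""
    log_ref = uuid.uuid4().hex
    timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    # The same reference is written to the log and returned to the client.
    logger.error("logref=%s time=%s message=%s", log_ref, timestamp, message)
    return {"message": message, "time": timestamp, "logRef": log_ref}


body = error_context("Could not place order.")
print(body["time"])  # an ISO 8601 UTC timestamp ending in +00:00
```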

Field-by-field messages

In some cases it makes sense to be explicit about the fields in the input that caused the errors and include field names in separate elements of the error response. For instance, a JSON response like this:

{
  "message": "One or more inputs were not entered correctly",
  "errors":
  [
    { "field": "Weight", "message": "The value of 'Weight' exceeds 100 - the value should be between 0 and 100" },
    { "field": "Height", "message": "A value must be entered for 'Height'" }
  ]
}

This would make it possible for the client to highlight those fields in the UI and draw the end user's attention to them. However, it is difficult to keep clients and servers in sync, and it requires a lot of coding on both sides to get it to work. Usually field-by-field information is handled by client-side validation logic anyway. So a clear error message like "The value of 'Weight' exceeded 100 - the value should be between 0 and 100" should be enough for most applications.

Include the HTTP status code

This may sound a bit odd, but according to people on api-craft there are some client-side environments where the application code does not have access to the HTTP headers and status codes. To cater to these clients it may be necessary to include the HTTP status code in the error message payload.
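On the client side, a defensive reader might prefer the real HTTP status when it is available and fall back to the copy in the payload otherwise. This is a sketch: `effective_status` is a hypothetical helper, and `httpStatusCode` is the property name used in the format defined later in this post.

```python
from typing import Optional


def effective_status(header_status: Optional[int], body: dict) -> Optional[int]:
    """Prefer the transport-level status; fall back to the copy in the payload."""
    if header_status is not None:
        return header_status
    value = body.get("httpStatusCode")
    # Only trust the payload copy if it looks like a real HTTP status code.
    if isinstance(value, int) and 100 <= value <= 599:
        return value
    return None


print(effective_status(None, {"httpStatusCode": 503}))  # 503
```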

Do not include stack traces

It may be tempting to include a stack trace for easier support when something goes wrong. Don't do it! This kind of information is far too valuable to attackers and should never be exposed in a response.
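A common compromise is to log the full stack trace on the server and return only a generic message with a reference to that log entry. The sketch below assumes a hypothetical `handle_request` wrapper around a handler function; `logging.exception` records the traceback in the log, and nothing from the exception reaches the response body.

```python
import logging
import uuid

logger = logging.getLogger("api")


def handle_request(handler, *args):
    """Run a handler; on an unexpected exception return a safe 500 response."""
    try:
        return 200, handler(*args)
    except Exception:
        log_ref = uuid.uuid4().hex
        # The traceback goes to the server log only.
        logger.exception("Unhandled error, logref=%s", log_ref)
        return 500, {
            "message": "An internal error occurred - please try again later.",
            "httpStatusCode": 500,
            "logRef": log_ref,  # lets support find the logged stack trace
        }


def broken_handler():
    raise KeyError("secret_db_column")


status, body = handle_request(broken_handler)
print(status)  # 500
```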

Implementation

Now that we have our "requirements" ready, we should be able to design a useful solution. Let's first try to define the response without considering an actual wire format:

message (string): the primary descriptive error message - either in the primary language of the server or translated into a language negotiated via the HTTP header "Accept-Language".
messages (List of string): an optional list of descriptive error messages (with the same language rules as above).
details (string): an optional descriptive text targeted at the client developer. This text should always be in the primary language of the expected developer community (that would be English in my case).
errorCode (string): an optional error code.
httpStatusCode (integer): an optional copy of the HTTP status code.
time (date-time): an optional timestamp of when the error occurred.
additional (any data): a placeholder for any kind of business specific data.
links (List of <string,string,string>): an optional list of links to other resources that can be helpful for debugging (but should probably not be shown to the end user). Each link consists of <href, rel, title> just like an ATOM link element.
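The fields above map naturally onto a small data type. This Python sketch is one possible encoding (class and method names are illustrative, and field names mirror the wire format rather than PEP 8): optional fields are left out when serializing, and each link is an ATOM-like href/rel/title triple. The usage at the end reproduces the data of Example 3 below.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Link:
    href: str
    rel: str
    title: str


@dataclass
class ErrorResponse:
    message: str                                   # required, user-facing
    messages: list[str] = field(default_factory=list)
    details: Optional[str] = None                  # developer-facing, not localized
    errorCode: Optional[str] = None
    httpStatusCode: Optional[int] = None
    time: Optional[str] = None                     # ISO 8601 date-time
    additional: Any = None                         # business-specific data
    links: list[Link] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize, omitting optional fields that are unset or empty."""
        data: dict[str, Any] = {"message": self.message}
        if self.messages:
            data["messages"] = self.messages
        for name in ("details", "errorCode", "httpStatusCode", "time", "additional"):
            value = getattr(self, name)
            if value is not None:
                data[name] = value
        if self.links:
            data["links"] = [vars(link) for link in self.links]
        return json.dumps(data)


err = ErrorResponse(
    message="Could not authorize user due to an internal problem - please try again later.",
    details="The OAuth2 service is down for maintenance.",
    errorCode="O2SERUNAV",
    httpStatusCode=503,
    time="2013-04-30T10:27:12",
    links=[Link("http://example.com/oauth2status.html", "help", "Service status information")],
)
print(err.to_json())
```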

I have ignored the possibility of having multiple translations of the messages. Neither does this implementation include any field-by-field validation since I expect that to be performed by the client. That doesn't mean the server shouldn't do the validation - it just doesn't have to return the detailed field information in a format readable by the client application.

JSON format example

Now it is time to select a wire format for the error information. I will choose JSON since that is a widespread and well-known format that can be handled by just about any piece of infrastructure nowadays. The format is straightforward and is probably best illustrated with a few examples:

Example 1 - the simplest possible instantiation

{
  "message": "The field 'StartDate' did not contain a valid date (the value provided was '2013-20-23'). Dates should be formatted as YYYY-MM-DD."
}

Example 2 - handling multiple validation errors

{
  "message": "There was something wrong with the input (see below)",
  "messages":
  [
    "The field 'StartDate' did not contain a valid date (the value provided was '2013-20-23'). Dates should be formatted as YYYY-MM-DD.",
    "The field 'Title' must have a value."
  ]
}

Example 3 - using most of the features

{
  "message": "Could not authorize user due to an internal problem - please try again later.",
  "details": "The OAuth2 service is down for maintenance.",
  "errorCode": "O2SERUNAV",
  "httpStatusCode": 503,
  "time": "2013-04-30T10:27:12",
  "links":
  [
    {
      "href": "http://example.com/oauth2status.html",
      "rel": "help",
      "title": "Service status information"
    }
  ]
}

Client implementation and media types - a matter of perspective

The client implementation should, at a suitably high level, be straightforward:

Client makes an HTTP request.
Request fails for some reason, server returns HTTP status code 4xx or 5xx and includes error information in the HTTP body.
Client checks HTTP status code, sees that it is 4xx or 5xx and decodes the error information.
Client tries to recover from the error - either by showing the error message to the end user, writing the error to a log, giving up or maybe retrying the request - all depending on the error and the client's own capabilities.
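The steps above can be sketched with nothing but the Python standard library. The retry policy (retry a GET once on 503), the action labels and the assumption that error bodies are JSON objects with a "message" property are illustrative choices, not requirements; the decision logic is kept separate from the network code so it can be reasoned about on its own.

```python
import json
import time
import urllib.error
import urllib.request


def decide_action(status: int, body: bytes) -> tuple[str, str]:
    """Decide how to recover from a failed request; returns (action, message)."""
    try:
        message = json.loads(body).get("message", "")
    except (ValueError, AttributeError):
        message = ""  # body was not a JSON object; rely on the status code alone
    if status == 503:
        return "retry", message
    if 400 <= status <= 499:
        return "show_user", message or f"Request rejected (HTTP {status})."
    return "log_and_give_up", message or f"Server error (HTTP {status})."


def fetch(url: str) -> bytes:
    """GET a resource, retrying once if the service is temporarily unavailable."""
    for attempt in range(2):
        try:
            with urllib.request.urlopen(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:  # raised for 4xx and 5xx responses
            action, message = decide_action(err.code, err.read())
            if action == "retry" and attempt == 0:
                time.sleep(1)
                continue
            raise RuntimeError(f"{action}: {message}") from err
    raise RuntimeError("not reached: every iteration returns or raises")


print(decide_action(400, b'{"message": "The field Title must have a value."}'))
```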

But, hey, wait a minute ... how does the client know how to decode the payload? I mean, perhaps the client asked for a resource representation containing medical records, but then it got an HTTP status code 400 - how is it supposed to know the format of the error information?

If the client is working with a vendor-specific service, like Twitter or GitHub, then chances are that the client is hard-wired to extract the error information based on the vendor-specific service documentation. My guess is that this is how most clients are implemented.

But what if the client is working with a more, shall we say, RESTful service? That is, the client doesn't know what actual implementation it is interacting with. This could for instance be the case for clients consuming an ATOM feed (application/atom+xml). How would the client know how to decode the error response payload? Actually this seems like an unanswered question for ATOM since the spec is rather vague on this point (see for instance http://stackoverflow.com/questions/9874319/how-to-represent-error-messages-in-atom-feeds).

A RESTful service specification may call for a media type dedicated to error reporting; let's call such a media type "application/error+json". When the client receives a 4xx or 5xx HTTP status, it can then look at the Content-Type header: if it matches "application/error+json", the client knows exactly what to look for in the HTTP body.
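A client following that rule might dispatch on the media type like this. The sketch strips parameters such as `charset` and compares case-insensitively (media type names are case-insensitive); `application/error+json` is the hypothetical media type named above, and the helper names are illustrative.

```python
import json
from typing import Optional

ERROR_MEDIA_TYPE = "application/error+json"  # hypothetical, as in the text


def media_type(content_type: Optional[str]) -> str:
    """Return the bare, lower-cased media type without parameters."""
    if not content_type:
        return ""
    return content_type.split(";", 1)[0].strip().lower()


def decode_error(content_type: Optional[str], body: bytes) -> Optional[dict]:
    """Decode the body only if it is declared as the error media type."""
    if media_type(content_type) != ERROR_MEDIA_TYPE:
        return None  # unknown format: fall back to the status code alone
    return json.loads(body)


print(decode_error("application/error+json; charset=utf-8", b'{"message": "Oops"}'))
```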

Alternatively, the base media type itself could include a detailed specification of error payloads.

I would prefer one of the last two options: either specify error handling in the base media type of the service, or use an existing standard media type for errors. The latter is what Mark Nottingham has done with https://tools.ietf.org/html/draft-nottingham-http-problem-03.
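Nottingham's draft was later published as RFC 7807 (since superseded by RFC 9457), which defines the `application/problem+json` media type with the members `type`, `title`, `status`, `detail` and `instance`, plus optional extension members; member names in the early drafts may differ. A rough sketch of translating the format from this post into such a problem document follows. The mapping (`message` to `detail`, `errorCode` to a `type` URI under a hypothetical example.com base) is an assumption, not something either document prescribes.

```python
import json
from http import HTTPStatus


def to_problem(error: dict, type_base: str = "http://example.com/problems/") -> str:
    """Translate this post's error format into an RFC 7807-style problem document.

    The member mapping is an assumption; fields without a standard
    counterpart are carried over as extension members.
    """
    status = error.get("httpStatusCode", 500)
    problem = {
        # Without a specific error code, RFC 7807 suggests "about:blank".
        "type": (type_base + error["errorCode"]) if "errorCode" in error else "about:blank",
        # For "about:blank" the title SHOULD be the status phrase; reusing it
        # for other types is a simplification made here.
        "title": HTTPStatus(status).phrase,
        "status": status,
        "detail": error["message"],
    }
    for key in ("details", "time", "links"):
        if key in error:
            problem[key] = error[key]  # extension members
    return json.dumps(problem)


print(to_problem({"message": "Could not authorize user.", "httpStatusCode": 503, "errorCode": "O2SERUNAV"}))
```

Such a body would be sent with the header `Content-Type: application/problem+json`.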

So it is a matter of perspective: vendor specific "one-of-a-kind" services tend to invent their own error formats whereas RESTful services (like ATOM) should standardize error reporting via media types for everyone to reuse all over the web.

Have fun, Jørn

7 comments:

Kijana Woodard, 20 December 2013 at 17:57:00 CET
This is excellent. Thank you!
kevin, 28 January 2014 at 22:57:00 CET
Let's say you're using a library, such as a JSON decoder, and it throws an error. Would you return that library's error to the developer in the error message?

E.g.
{
...
details: "invalid character 'e' looking for beginning of value on line 1"
...
}

or would you return a generic message like
{
details: "Check to ensure your input format is correct. Also ensure you provided the correct Content-Type."
}
Unknown, 29 May 2015 at 11:06:00 CEST
This comment has been removed by a blog administrator.
Melbourne Web Developer, 28 June 2017 at 09:46:00 CEST
Your blog has given me that thing which I never expect to get from all over the websites. Nice post guys!
Unknown, 1 July 2017 at 13:27:00 CEST
Address Standardisation
Nice Article.


Wednesday, May 15, 2013
