A Few Tips for Cutting Down Exception Noise

No app survives first contact with actual users. Once people start to use it, they’re going to run into errors.

So, once in production, most apps will have a way of tracking and reporting errors. You could go simple with exception_notification or use a webapp like Honeybadger or Raygun.

But soon, you’ll see the same few exceptions over and over. Maybe a web service you depend on isn’t totally stable. Or the people using your site typo’d their email, so none of your messages are going through. Exceptions should be exceptional. They should be unexpected. But how unexpected can an error be if you see it thirty times a day?

There are better ways of solving these problems than reporting and ignoring. Most noisy exceptions fall into a few basic categories. And for each of these categories, you can use patterns to cut down the noise and make your users happier at the same time.

The network is down!

Few apps work alone. Most communicate with other apps. But when your geolocation API goes down or EC2 has a hiccup, you don’t want to get spammed with thousands of exceptions that you can’t do anything about.

When you deal with unreliable services, try the Circuit Breaker pattern, from Michael Nygard’s Release It!:

The basic idea behind the circuit breaker is very simple. You wrap a protected function call in a circuit breaker object, which monitors for failures. Once the failures reach a certain threshold, the circuit breaker trips, and all further calls to the circuit breaker return with an error, without the protected call being made at all.

So, when a service goes down, you’ll automatically stop trying to connect to it. You’ll just go on without that functionality. You can even make it self-healing, so your app will automatically check the service again after a certain amount of time.

The circuit breaker pattern is designed to prevent cascading failures, but you can also use it to limit exception notifications. With this pattern, you really only need to get notified when it trips and when it fails to recover. It can turn thousands of exceptions into a few. If that’s still too many, you can tell the breaker to only report after it’s failed a few retries in a row. (Or find a different, more reliable service!)
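Here’s a rough sketch of what a breaker like that could look like in plain Ruby. It isn’t taken from Release It! or from any gem. The `CircuitBreaker` class, its `threshold`, `reset_after`, `notifier`, and `clock` options are all names I made up for illustration. The point is that the notifier only fires when the breaker trips and when a retry fails, not on every failed call.

```ruby
# A minimal circuit breaker sketch. Every name here is hypothetical.
class CircuitBreaker
  # Raised instead of making the protected call while the breaker is open.
  class OpenError < StandardError; end

  def initialize(threshold: 3, reset_after: 60,
                 notifier: ->(message) { warn(message) },
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @threshold = threshold      # failures in a row before tripping
    @reset_after = reset_after  # seconds to wait before trying again
    @notifier = notifier        # stand-in for your exception tracker
    @clock = clock
    @failures = 0
    @opened_at = nil
  end

  def open?
    !@opened_at.nil?
  end

  def call
    # While open, skip the protected call until a retry is due.
    raise OpenError, "circuit is open" if open? && !retry_due?

    begin
      result = yield
    rescue StandardError => error
      record_failure(error)
      raise
    end

    # Success: close the breaker and start counting from zero again.
    @failures = 0
    @opened_at = nil
    result
  end

  private

  def retry_due?
    @clock.call - @opened_at >= @reset_after
  end

  def record_failure(error)
    if open?
      # The trial call failed, so restart the wait and report it.
      @opened_at = @clock.call
      @notifier.call("Service still failing after retry: #{error.message}")
    else
      @failures += 1
      return if @failures < @threshold

      @opened_at = @clock.call
      @notifier.call("Circuit tripped after #{@failures} failures: #{error.message}")
    end
  end
end
```

You’d wrap each call to the flaky service in `breaker.call { ... }` and rescue `CircuitBreaker::OpenError` wherever you want to fall back to a “this isn’t available right now” message.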

Using this pattern takes work, but it also makes the experience better for your users. Instead of a hard error page, they’ll see a message saying that this feature isn’t working right now, and they should try again later. It’s better information for them, delivered at the right time.

Turns out gmaaaail.com isn’t what you meant

Another kind of exception I see a lot comes from bad user data.

For example, say someone typo’d their email when they signed up. They said, “justinweiss@gmaill.com”, but meant “justinweiss@gmail.com”. The first could theoretically be a valid email address, but all the emails you send bounce. And you get notified about those bounces by your email provider.

These notifications are just noise.

Instead, take a two-sided approach. Prevent the bad data up-front, and disable the feature and notify the user if it fails later on.

For email, I’ve used the mailcheck-js gem to spellcheck things like “gmail.com” and “yahoo.com” when new users register:

{% img img-responsive /images/posts/email-spellcheck.gif 477 451 Ooh, fancy. %}
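If you’d rather check on the server too, the same idea can be sketched in a few lines of Ruby. This is not how mailcheck works internally. It’s a simplified illustration, and `COMMON_DOMAINS`, `edit_distance`, `suggest_email`, and the `max_distance` cutoff are all hypothetical names and values I picked for the example.

```ruby
# A small list of domains to check against. A real list would be longer.
COMMON_DOMAINS = %w[gmail.com yahoo.com hotmail.com outlook.com].freeze

# Plain Levenshtein distance between two strings.
def edit_distance(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost].min
    end
    previous = current
  end
  previous.last
end

# Returns a suggested address, or nil if the email already looks fine
# or nothing in the list is close enough to guess at.
def suggest_email(email, max_distance: 2)
  local, domain = email.split("@", 2)
  return nil if local.nil? || domain.nil? || domain.empty?

  domain = domain.downcase
  return nil if COMMON_DOMAINS.include?(domain)

  closest = COMMON_DOMAINS.min_by { |candidate| edit_distance(domain, candidate) }
  return nil if edit_distance(domain, closest) > max_distance

  "#{local}@#{closest}"
end
```

Treat the result as a suggestion to show the user (“Did you mean…?”), not something to apply automatically, since the original address could still be valid.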

Then, if an email still bounces later on, turn off email to that user.

Once you turn the feature off for someone, you also need to tell them that it’s disabled and how to fix it. A banner on the top of the site is usually a good answer. Something like “We weren’t able to send your last few emails, so we’ve turned off sending emails to you. Click here to update your email address, and we’ll turn them right back on.”
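As a sketch, the bookkeeping for this can be pretty small. The `Subscriber` class below is a hypothetical stand-in for however your app stores users, and the limit of three bounces is an assumption. In a real app, `record_bounce!` would be called from whatever handles your email provider’s bounce notifications.

```ruby
# Hypothetical model of a user's email status. Not a Rails model.
class Subscriber
  BOUNCE_LIMIT = 3 # assumed value; tune it for your app

  attr_reader :email, :bounce_count

  def initialize(email)
    @email = email
    @bounce_count = 0
    @email_enabled = true
  end

  def email_enabled?
    @email_enabled
  end

  # Call this when a bounce comes in, instead of raising an exception.
  def record_bounce!
    @bounce_count += 1
    @email_enabled = false if @bounce_count >= BOUNCE_LIMIT
  end

  # Fixing the address turns email back on and resets the count.
  def update_email!(new_email)
    @email = new_email
    @bounce_count = 0
    @email_enabled = true
  end

  # Text for the banner at the top of the site, or nil if all is well.
  def banner
    return nil if email_enabled?

    "We weren’t able to send your last few emails, so we’ve turned off " \
      "sending emails to you. Update your email address, and we’ll turn " \
      "them right back on."
  end
end
```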

You’ll get better data, and the user’s emails won’t just go into the void. Way better than those errors you’ve just been ignoring.

404s and RoutingErrors

You probably want to know about broken links or assets on your site. But those things don’t belong in your exception tracker.

For these, and other “half-expected errors”, batch them up and handle them all at once. You don’t need to get notified about them as they happen. You want pull, not push.

Things like RoutingErrors and 404s can be handled with something like Google Webmaster Tools, which will show you the pages Google knows about that are throwing 404s. Or you could run something like link-checker to check the links on your site as part of your pre-release process.
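If you’d like to keep your own list as well, “pull, not push” can be as simple as counting missing paths and reading the totals when you have time. This is a minimal in-memory sketch. `NotFoundTally` is a made-up class, and where you call `record` from (and where you’d persist the counts) depends on how your app handles 404s.

```ruby
# Hypothetical tally of paths that returned a 404.
class NotFoundTally
  def initialize
    @counts = Hash.new(0)
  end

  # Call this where you would otherwise have sent a notification.
  def record(path)
    @counts[path] += 1
  end

  # The most frequently missing paths, as [path, count] pairs.
  def report(limit: 10)
    @counts.sort_by { |path, count| [-count, path] }.first(limit)
  end
end
```

Then you can look at `report` once a week, fix the worst offenders, and never get an email about any of them.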

Exceptions should be actionable

It should be rare to get an exception email. Too much noise in your error tracker will keep you from seeing and fixing real problems right away.

If you’re more annoyed than embarrassed about the exceptions you see, you have a noise problem. Use the patterns here to cut down on that noise and give your users a better experience at the same time.

I’ve talked about a few of the noisy exception categories I’ve seen most often. But I’m sure I haven’t seen them all. Which exceptions annoy you the most in your apps? Do they fit into any of these categories, or do they define a new one? How do you keep them from bothering you a few hundred times a day?
