“Don’t Repeat Yourself” is one of the most valuable ideas from one of the most valuable books I read during my software development career. If you can refactor away duplicate code, you will produce more general, more stable code. When you start DRY-ing up your code, though, you’ll start to run into some problems: code that doesn’t handle edge cases well, code that’s too generalized and hard to read, or code that’s hard to find. If refactoring toward DRYness doesn’t work all the time, how do you know when you should refactor?
Too-DRY code comes from a misunderstanding of what kind of duplication you should try to refactor away. You need to be able to identify the difference between essential duplication and accidental duplication.
Essential duplication is code that solves the class of problems you’re working on. This is the kind of duplication that you should kill right away.
Accidental duplication, though, is duplication that’s unrelated to the problem at hand, “duplication by coincidence.” If you clean up accidental duplication too aggressively, you’ll end up with more brittle code that has to be un-refactored as new cases are added.
But how can you tell the difference? With experience, you might get better at this, but even after decades of programming, I still can’t do this right even half the time. Luckily, there’s a general rule that helps a lot: Three Strikes and You Refactor.
The first time you do something, you just do it.
The second time you do something similar, you wince at the duplication, but you do the duplicate thing anyway.
The third time you do something similar, you refactor.
Can you see why this helps? By the third time, you should start to see where the patterns are. You should have an idea of which duplicate code is necessary for solving your problem, and which code just looks the same coincidentally. You can start to generalize (and DRY up) just the code that’s fundamentally the same across all instances of the problem you’re trying to solve.
Take a moment and think about the last time refactoring caused you pain. Were you trying to DRY up something that was duplicated between two copies of the problem, but the third copy was just a little different?
And the next time you feel like you have to duplicate some code, try waiting until the third copy before you refactor. (It’s really hard and feels terrible, but close your eyes and try it anyway). Then, think about it as you DRY up the code. Are you refactoring differently than you would have if you refactored right after writing the second copy?
Good developers know they should test their code. Too often, though, the tests get skipped, rushed through, or never started. There are some really common traps I’ve seen people fall into, and they’ll kill your motivation to test every time.
1. Should I use RSpec? Cucumber? Capybara? Minitest?
When you start a new project, it’s way too easy to spend a lot of time trying to pick the best tools. This kind of procrastination hides the fact that you don’t know where to start, and until you pick your tools, you can feel productive without actually being productive.
Instead, just pick the stack you know best. If you’re not experienced with any particular test stack, take the default—start with what Rails gives you. You can always add more later. There’s nothing wrong with trying out new tools, but it’s a much better idea to introduce them over time, as you get to know your project and workflow better.
2. Do I start with unit tests? Functional tests? Integration tests?
When you begin a project, you should have an idea of which features or screens should be built first. This is a great place to start! Pick the first thing off your feature list and write a failing integration test for it. This should tell you what kind of controllers and routes you’re missing, which means you need to write some failing functional tests. The controllers need data and logic to do their job, so you’ll write unit tests and your models next. Then, once you get your unit tests passing, the functional tests should pass, and so should the integration test. Now you can move onto the next feature.
If you have a testing process that always has the next step defined, it’s a lot easier to stay motivated. If you don’t have to make decisions, there’s less of an opportunity for procrastination to creep in.
3. I don’t know how to test this network code, command line utility, or rake task!
The easiest thing to do here is to move as much code as you can out of the file or class that's hard to test, and into a new object that's easy to test. That way, the hard-to-test thing just creates and passes parameters to your new object.
Of course, you still have the original thing that’s hard to test. But it’s probably just a few lines of code now, and it should be easier to stub or fake it out.
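As a sketch of the idea (every name here is hypothetical, not from a real project): the logic moves into a plain object, and the rake task shrinks to glue.

```ruby
# All the interesting logic lives in a plain, easy-to-test object.
class ReportGenerator
  def initialize(orders)
    @orders = orders
  end

  def total_revenue
    @orders.sum { |order| order[:amount] }
  end
end

# The hard-to-test rake task becomes a few lines of glue:
#
#   task :revenue_report do
#     orders = Order.all.map { |o| { amount: o.amount } }
#     puts ReportGenerator.new(orders).total_revenue
#   end
```

Now the interesting code can be unit tested with plain Ruby values, and only the thin task remains untested.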
4. “The project’s almost done, now I just have to write the tests for it!”
Every developer I’ve met craves shipping code. If the tests are the last thing to do before you can ship, you’ll write the minimum number of tests you need to feel kind of confident that the code works, probably. If you get into this habit, you’ll start to see tests as annoying instead of helpful, and it’ll be that much harder to motivate yourself to write them.
One of my favorite things about TDD is that it mixes the testing with the design and the coding, which means you start to see testing as coding, which makes it much more fun (and you get the benefit of having tests a lot earlier).
5. What if I’m doing it wrong?
The Ruby community is known for really pushing code quality, unit testing, and object oriented design principles. This is a great thing! Unfortunately, it means that it’s really common to feel an enormous amount of pressure to ship perfect code with 100% test coverage on your first try.
This makes it really hard to start a project, especially an open source one, where you know other people will see the code. What will people say if they see that it doesn’t follow all of the SOLID principles? But there are a few things I’ve learned that have helped me deal with this pressure:
- Every good developer writes code they’re embarrassed by later.
- Good code that ships is infinitely better than perfect code that doesn’t.
- Some people are just jerks and will make fun of your code. This really sucks. It’s ruined entire weeks of mine. But good developers want to help you instead of criticize you. And I’d bet that if you showed that code to a programming hero of yours, they wouldn’t make fun of it—they’d help you make it better.
What else have you noticed?
Which of these traps have you fallen into? What’s helped you pull yourself out? And are there any that I didn’t mention that you’ve noticed?
If you do recognize yourself in any of these traps, what are you going to do to get yourself out?
Luckily, Rails includes scopes, which can provide you with a lot of what you need for simple searching, filtering, and sorting. If you take advantage of scope chaining, you can build the features you want without taking on big dependencies or writing a bunch of repetitive search code yourself.
Searching with scopes
Say you have an #index method on your RESTful controller that shows a table of products. The products can be active, pending, or inactive, are available in a single location, and have a name.
If you want to be able to filter these products, you can write some scopes:
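In a Rails model these would be ActiveRecord scopes, along the lines of `scope :status, ->(status) { where(status: status) }`. As a runnable stand-in, here’s the same chaining idea in plain Ruby, with assumed names:

```ruby
# Plain-Ruby stand-in for ActiveRecord scope chaining: each "scope"
# returns a new, narrower ProductList, so calls can be chained.
class ProductList
  def initialize(products)
    @products = products
  end

  def status(value)
    ProductList.new(@products.select { |p| p[:status] == value })
  end

  def location(value)
    ProductList.new(@products.select { |p| p[:location] == value })
  end

  def starts_with(prefix)
    ProductList.new(@products.select { |p| p[:name].start_with?(prefix) })
  end

  def names
    @products.map { |p| p[:name] }
  end
end
```

Each call narrows the list and returns a new one, so the “scopes” chain just like ActiveRecord’s.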
Each of these scopes defines a class method on Product that you can use to limit the results you get back.
Your controller can then chain these scopes to filter your results, applying each one only when its param is present.
And now you can show just the active products with names that start with ‘Ruby’.
Clearly, this needs some cleanup
You can see how this code starts to get unwieldy and repetitive! Of course, you’re using Ruby, so you can collapse the repetition into a loop over the filter params.
A more reusable solution
You can move this code into a module and include it into any model that supports filtering:
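The real Filterable starts from `self.where(nil)` and chains ActiveRecord scopes; the full version is in the gist mentioned below. Here’s a runnable plain-Ruby sketch of the same idea (the ProductList class is an assumption):

```ruby
# Filterable calls one scope-style method per param, skipping blank
# values, and chains the results.
module Filterable
  def filter(filtering_params)
    filtering_params.reduce(self) do |results, (key, value)|
      value.to_s.empty? ? results : results.public_send(key, value)
    end
  end
end

class ProductList
  include Filterable

  def initialize(products)
    @products = products
  end

  def status(value)
    ProductList.new(@products.select { |p| p[:status] == value })
  end

  def starts_with(prefix)
    ProductList.new(@products.select { |p| p[:name].start_with?(prefix) })
  end

  def names
    @products.map { |p| p[:name] }
  end
end
```

In a Rails app, the model includes the module and the controller collapses to something like `Product.filter(params.slice(:status, :starts_with))`.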
You now have filtering and searching of your models with one line in the controller and one line in the model. How easy is that? You can also get sorting by using the built-in order class method, but it’s probably a better idea to write your own scopes for sorting. That way you can sanity-check the input.
To save you some effort, I put Filterable into a gist. Give it a try in your own project; it’s saved me a lot of time and code.
I’ve heard from some people that are worried about breaking apart large Rails views into smaller partials: How much time does rendering a partial really take? Will the performance impact of calling the partials outweigh the benefits in code readability?
I ran some numbers to compare simple partial rendering with inline rendering, so we can discuss the tradeoff later. I benchmarked in a brand new Rails app, with config.cache_classes = true.
I ran the benchmarks using Ruby 2.1 and Rails 4.0.2, on a 2013 15” Retina MacBook Pro.
So, a partial render runs at an average of about 0.1ms on a speedy machine. It’s much slower than rendering inline, but fast enough that it would be hard to notice when you also have things like URL generation, Rails helpers, and browser rendering time to think about.
Speed isn’t everything
If the performance of rendering partials were a lot worse, I’d still probably make the decision to break apart huge views.
There’s a phrase: “Make it work, make it right, make it fast.” When you’re trying to fix a view that’s too big, you’re trying to get out of “make it work” mode, and you can’t just skip to “make it fast.” You’ll make your code less maintainable, and you could miss some opportunities for refactoring, which could lead to opportunities for caching, which could lead to even bigger performance benefits.
You won’t know until you measure
You don’t have to believe my numbers. I’d recommend that you didn’t! Ruby’s Benchmark library gives you some simple tools to run your own benchmarks. There are also tools for profiling your full stack, like New Relic and MiniProfiler. I use all three of these pretty regularly in my day-to-day work.
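For example, a minimal script using the stdlib Benchmark; the string-building workloads are made up for illustration:

```ruby
require "benchmark"

# Compare two ways of building the same string, 100k times each.
n = 100_000
Benchmark.bm(14) do |x|
  x.report("interpolation") { n.times { "Hello, #{:world}!" } }
  x.report("concatenation") { n.times { "Hello, " + "world" + "!" } }
end
```

Running it prints a small table of user, system, and real times for each labeled block.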
When you profile, always profile in production mode! The Rails development environment auto-reloads classes and views, which is useful while developing, but will kill your performance numbers.
While you’re writing your Rails app, you might run into a problem that you could solve more easily with a different data store. For example, you might have a leaderboard that ranks users by the number of points they got for answering questions on your site. With Redis Sorted Sets, a lot of the implementation is done for you. Awesome! But where do you put the code that interacts with Redis?
The simplest option would be to let your User model talk to Redis directly, with the leaderboard methods living right on the model.
But now your User model holds responsibility for representing a user, talking to Redis, and managing the leaderboard! This makes User harder to understand and harder to test, and that’s the opposite of what you want.
Instead, you could wrap the Redis requests in a new object that represents what you’re using Redis for. For example, you could write a Leaderboard class that wraps the Redis communication, doesn’t inherit from ActiveRecord::Base, but still lives in the app/models directory. This would look like:
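As a sketch of what Leaderboard might look like, backed by an in-memory Hash so it runs anywhere (the method names are assumptions; a Redis-backed version would issue ZINCRBY and ZREVRANK commands instead):

```ruby
# In-memory stand-in for a Redis-backed leaderboard.
class Leaderboard
  def initialize
    @scores = Hash.new(0)
  end

  def add_points(user_id, points)
    @scores[user_id] += points
  end

  # 1-based rank, highest score first.
  def rank(user_id)
    sorted_ids.index(user_id) + 1
  end

  def top(count)
    sorted_ids.first(count)
  end

  private

  def sorted_ids
    @scores.sort_by { |_id, score| -score }.map(&:first)
  end
end
```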
Both of these classes can live in app/models, but now you’re not contaminating your ActiveRecord models with extra logic. The Leaderboard class manages the communication with Redis, so the User class no longer has to care about how leaderboards are implemented. This separation also makes each class easier to test.
You gain a lot by creating new classes for new responsibilities, and app/models is a great place to keep them. You get the benefits of leaning on another service for your feature’s implementation, while also keeping your code easy to work with.
Try it out
Can you think of any network service communication that’s called directly from your ActiveRecord models, controllers, or views? Try moving that code into a non-ActiveRecord data model, and see if your code becomes easier to understand, work with, and test. Then, send me an email and let me know how it went!
All problems in computer science can be solved by another level of indirection.
…except for the problem of too many layers of indirection.
Writing code feels so much easier than writing tests for it, and does that one-line method really need to be tested, anyway? It’s so trivial! Any tests you add would just double or triple development time, and the next time you change the code, you’ll have to change the test, too. It seems like such a waste, especially when you only have a little bit of time left on your estimate.
But soon, your code is only 20% covered by tests, and any change you make to your code feels like trying to replace the middle layer of a house of cards without knocking the whole thing over. Somewhere, something went wrong and even though the decisions you made seemed right at the time, you still ended up with a totally unmaintainable codebase.
How did you get here? You wanted your tests to provide a safety net and allow you to confidently refactor. They should have helped make your code better! Instead, you came back to low test coverage on code you don’t understand anymore, and the tests you have somehow make it harder to change the code.
This isn’t a skill failure. It happens to the best developers. It’s a process failure. With a few changes in how you write new features, you can protect your code without your tests slowing you down. Your tests can make your code more understandable and more flexible. You’ll be able to change your code with confidence, knowing every path through it is tested.
You shouldn’t have to make a decision
If you’re sitting at your keyboard trying to decide whether a bit of code needs to be tested, you’re already going down the wrong path. You should always default to “Test it!” Even if it seems too trivial to need tests, write the test.
If the code is trivial to write, it should be easy to test. And complicated code will never seem as trivial as it does right after you wrote it. How do you know it’ll still seem as trivial six months from now?
But you don’t want to overtest
A gigantic test suite can be its own problem. Tests that take 20 minutes to run are as good as no tests at all, since you won’t run them all the time. (You say you will, but I know you won’t). Even worse, if you have too many brittle tests, it makes refactoring even more of a pain than it was before, so you won’t do it. You’ll end up with methods longer than most novels.
Does this contradict my earlier point? Not necessarily. Your tests should always focus on your code’s interface, not its implementation. For example:
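As a sketch of the kind of code this section describes, with LineItem kept as an internal detail of Cart (the details are assumed):

```ruby
# LineItem never leaves Cart, so tests can cover it entirely
# through Cart's public interface: add and total.
class Cart
  LineItem = Struct.new(:price, :quantity) do
    def total
      price * quantity
    end
  end

  def initialize
    @items = []
  end

  def add(price, quantity = 1)
    @items << LineItem.new(price, quantity)
  end

  def total
    @items.sum(&:total)
  end
end
```

Tests can drive both classes through `cart.add` and `cart.total`, without ever naming LineItem.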
It really feels like the code here needs tests for both the Cart class and the LineItem class. But is the LineItem class used by anything else? If it’s just an implementation detail of Cart, and isn’t exposed to the outside world, how many tests does it really need? Can’t it just be tested through your Cart?
Classes that are extracted by refactoring often don’t need their own test suite. They’re just an implementation detail. It’s only when they’re used on their own that they need those extra tests.
With a great test suite against a public interface, you have the flexibility to change your implementation without rewriting all your tests. You can do this with a lot less effort than writing even an average test suite against your object’s implementation.
Amortize your test costs with Test-Driven Development
In the first section, you learned that you should test everything. In the second section, you learned that you should only test public interfaces. It’s Test-Driven Development that brings these two opposing goals together.
With Test-Driven Development, your tests drive the design and implementation of your code by following this process:
- Write a failing test that assumes that the code you need is already there.
- Write the simplest implementation of the code that passes the test.
- Refactor to remove duplication (or make the code more expressive).
- Run the tests again (to make sure they still pass).
- Return to step 1.
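One trip through that loop might look like this, for a tiny hypothetical slugify method (using Minitest, though any framework works):

```ruby
require "minitest/autorun"

# Step 1: a failing test that assumes slugify already exists.
class TestSlugify < Minitest::Test
  def test_turns_a_title_into_a_url_slug
    assert_equal "three-strikes-and-you-refactor",
                 slugify("Three Strikes and You Refactor")
  end
end

# Step 2: the simplest implementation that makes the test pass.
def slugify(title)
  title.downcase.gsub(/\s+/, "-")
end
```

Steps 3 and 4 would follow: refactor (maybe extract the regexp into a named constant) and re-run the tests to make sure they still pass.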
By following these steps, you’ll test everything (since no code should be written without a failing test), while only testing public interfaces (since you don’t write new tests right after refactoring).
It’s never exactly that easy. But there are still ways to test-drive even the most complicated code.
TDD has some side benefits:
- You’ll find yourself with a more flexible, tested object model (this is arguably the main benefit).
- Your system is by definition testable, making future tests less expensive to write.
- You amortize your testing costs across the development process, making your estimates more accurate.
- It keeps you in flow, because you never have to decide what to do next.
So how do I get started?
Starting is the hard part! Once you get in the rhythm of writing test-driven code, it’s hard to stop.
The next time you work on a new feature, try following the TDD steps above. You’ll find yourself with close to 100% code coverage with the least amount of work possible, you’ll have a solid foundation to build on, and you’ll have total confidence that your test suite will protect you when you have to change the code next year. Or later today, when your requirements change again.
When you’re done, send me an email and let me know how it went.
By following a straightforward testing process, you can spend less time making simple decisions and chasing down bugs, and more time writing code that solves your customers’ and business’ needs.
You’ve started a new project and it’s time for your code to depend on a third-party service. It could be something like ElasticSearch, Resque, a billing provider, or just an arbitrary HTTP API. You’re a good developer, so you want this code to be well tested. But how do you test code that fires off requests to a service that’s totally out of your control?
You could skip the tests, but you’ll soon be piling more code on a shaky foundation. Untested code tends to attract more complex code to it, and you’ll eventually feel like the code is too dangerous to refactor because you don’t have the test coverage you need to feel safe. You want to build a stable foundation for your future work, but instead you’ve ended up with an unmaintainable mess.
Avoiding this situation is a lot easier than it seems! With a few tools and a little up-front effort, you can decouple your tests from the services your code depends on, write simpler code, and have the confidence to improve the code you’ve written—without introducing bugs. Instead of procrastinating because you don’t know how to approach writing those next tests, you could look at an interaction between your code and the outside world, and know exactly how to jump right into the middle of it.
Mocha: the quick-and-dirty approach
Mocha is the easiest way to get in between your code and the outside world.
As an example, say you have a Cart object that triggers a credit card charge when it gets checked out. You want to make sure that the cart has an error message attached to it if the charge fails.
You probably won’t want your tests to actually hit the billing system every time the tests run. Even if you did, it might be hard to force that service to return a failure. Here’s what it would look like with Mocha:
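Mocha’s version would be close to `cart.stubs(:charge_credit_card).raises(Cart::BillingError)`; since Mocha is a gem, this runnable sketch hand-rolls the same stub with a singleton method. Cart and its API are assumptions:

```ruby
class Cart
  BillingError = Class.new(StandardError)

  attr_reader :errors

  def initialize
    @errors = []
  end

  def charge_credit_card
    raise "this would hit the real billing service!"
  end

  def check_out
    charge_credit_card
  rescue BillingError
    @errors << "There was a problem charging your card."
  end
end

cart = Cart.new
# The stub: force the charge to fail, without touching the real service.
def cart.charge_credit_card
  raise Cart::BillingError
end

cart.check_out
cart.errors # => ["There was a problem charging your card."]
```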
Mocha can also fail your tests if methods aren’t called the way you expect them to: set an expectation with expects instead of stubs, and the test fails unless that method is actually called.
Mocha is simple to use, but can be incredibly handy. You have to be careful that you’re mocking out only the behavior you don’t want to happen; it’s easy to mock too much and hide real bugs. You also don’t want to go overboard with this approach: tests full of stubs are hard to read and think about.
Test fakes: my preferred approach
If you mock or stub the same methods on the same objects all the time, you can promote your mocks to full-fledged objects (sometimes called test fakes), like this:
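As a sketch of such a fake (the total_charges helper is described in the list below; everything else is assumed):

```ruby
# A test fake for the billing service: it records charges in memory
# instead of talking to the real service.
class FakeBillingService
  def initialize
    @charges = []
  end

  def charge(amount)
    @charges << amount
    true
  end

  # Helper that makes assertions easier to write.
  def total_charges
    @charges.sum
  end
end
```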
Fakes are great:
Your fake can keep track of its internal state
The fake can have custom assertion messages and helper functions that make writing your tests easier, like the total_charges method in the example above.
As a full-fledged object, you get extra editor and language support
If you’re using an editor that supports it, you can get autocomplete, inline documentation, and other things you won’t get by stubbing out individual methods with Mocha. You’ll also get better validations, exception handling, and whatever else you want to build into your fake.
If you use a fake in development mode, you don’t have to have a connection to the real service
You can write your app on the bus, you don’t have to have a forest of services running down your laptop’s battery, and you can set these fake services up to return the data you need to work through edge cases, without a lot of setup.
These objects can be used outside of your tests
This is probably my favorite part of fakes. You can have a logging client log to both a 3rd party service and your fake, backed by an in-memory array. You could then dump the contents of this array in an admin view on your site, making it much easier to verify that you’re logging what you think you’re logging.
You could do something like this:
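As a sketch of that setup, with assumed names:

```ruby
# Fans each message out to every backend logger.
class BroadcastLogger
  def initialize(*loggers)
    @loggers = loggers
  end

  def log(message)
    @loggers.each { |logger| logger.log(message) }
  end
end

# The fake backend: an in-memory array you can dump in an admin view.
class InMemoryLogger
  attr_reader :messages

  def initialize
    @messages = []
  end

  def log(message)
    @messages << message
  end
end
```

In production you’d hand BroadcastLogger the real client plus the in-memory fake, then render the fake’s messages in the admin view.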
Writing a fake takes more effort than stubbing out individual methods, but with practice it shouldn’t take more than an hour or two to get a helpful fake built. If you build one that would be useful to other people, share it! I built resque-unit a long time ago, and lots of people still use it today.
How do I get these objects injected, anyway?
You’ll have to get your objects under test to talk to these fakes somehow. Luckily, Ruby is so easy to abuse that injecting fakes usually isn’t hard.
If you control the API of the object under test, it’s best to add a default parameter, an attribute, or a constructor option where you can set your fake:
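As a sketch (every class name here is an assumption):

```ruby
class RealBillingService
  def charge(amount)
    raise "this would call the real billing API"
  end
end

class FakeBillingService
  attr_reader :charges

  def initialize
    @charges = []
  end

  def charge(amount)
    @charges << amount
  end
end

class Cart
  # Production code gets the real service by default; tests inject
  # the fake through the keyword argument.
  def initialize(billing_service: RealBillingService.new)
    @billing_service = billing_service
  end

  def check_out(total)
    @billing_service.charge(total)
  end
end
```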
This is clean when you are talking to the real service and gives you a hook to add flexibility later.
If you don’t control the object or don’t want to add the extra parameter, you can always monkey patch:
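As a sketch with hypothetical names, redefining the class method that the code under test calls:

```ruby
# The code under test talks to a hard-coded collaborator...
class PaymentGateway
  def self.charge(amount)
    raise "this would call the real gateway"
  end
end

class Order
  def check_out(total)
    PaymentGateway.charge(total)
  end
end

# ...so the test monkey patches that collaborator's method directly.
def PaymentGateway.charge(amount)
  (@charges ||= []) << amount
end

def PaymentGateway.charges
  @charges
end

Order.new.check_out(100)
```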
It’s uglier in test, but cleaner in environments that don’t use the fake.
Start building your own fake right now
Building fakes gets easier with practice, so you should give it a try now:
- Find a test that talks to an external service. Tests that would fail if you disconnected from the internet are good candidates.
- Figure out what object actually does the communication, and what calls your code makes to that object.
- Create a mostly empty duplicate of that object’s class, and have it log the calls you make to an array.
- Add a method to your fake to return the list of calls made.
- Swap out the real object with your new fake object, and write some assertions against the calls your code makes.
If you give it a try, let me know how it goes!
With these techniques, it won’t be long until you’re able to tame the craziest interactions between your application and the outside world. A simple stub in the right place will let you ship your well-tested code with confidence.
I’ll answer that with a screenshot:
We upgraded from Ruby 1.9 to 2.1 in production about a week ago, and this is what we saw. The gray section at the bottom is our garbage collection time dropping to almost nothing. Overall, the upgrade gave us a 15-20% improvement in server response time. That’s without doing any additional GC tuning!
We saw a big boost in Google’s and Bing’s crawl rates after shipping the upgrade, which is especially awesome for sites like ours that benefit heavily from search engine traffic.
Cool, so how hard was it?
On a large Rails app with close to 150 gem dependencies, the upgrade took a single small commit and a few days of testing. We run rvm on all of our servers and provision with chef, so upgrading all our servers to 2.1 was as simple as changing a version number in a chef role file.
What problems did we run into?
We still use iconv for transliteration, so we had to include the iconv gem, which replaces built-in functionality that was removed in 2.x. I was never able to get ruby-debug or the debugger gem working under 2.x, so we switched to pry and pry-byebug. I still prefer the debugger gem to pry-byebug, but pry is so much better than irb that it’s kind of a wash. Deivid Rodriguez, byebug’s author, pointed out that my problems with pry-byebug were probably not caused by byebug. He was totally right! (It was actually caused by the interaction between byebug and pry.)
In a mixed 1.9 and 2.1 environment, you’ll also have to upgrade bundler to 1.5.1. With the new version of bundler, you can specify 2.1 as a platform requirement in your Gemfile:
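With bundler 1.5+, that looks something like this in the Gemfile (the gem name is a placeholder):

```ruby
# Gemfile: only install these gems when running on Ruby 2.1
platforms :ruby_21 do
  gem "some_ruby_21_only_gem"
end
```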
It sounds like GitHub ran into some other problems with 2.1, mentioned by Aman Gupta in this gist: https://gist.github.com/tmm1/8393897, but we haven’t noticed any issues yet.
You’ll also want to tune the GC after you ship to get even more speed. Aman has a good post with lots of details on how to do that: http://tmm1.net/ruby21-rgengc/. The entire 2.1 series on his blog is worth reading if you want to get the most benefit from your cheap, simple Ruby upgrade.
“How long would it take to do this?”
“How hard would it be to make this change?”
If you’re starting to get angry and frustrated, you’re probably a software developer. Estimation is painful, especially when the person asking you to estimate a task (Let’s call them ‘Bob’) just has a rough idea of what they’re asking for. A lot of this pain is caused by a difference in what Bob wants and what you think Bob wants.
Estimates are not a goal
As a developer, I think of an estimate as “How long, on average, will it take me to accomplish a task?” I’m frustrated because I usually don’t have enough information to answer that question, and I know I’m going to be held to what I say. I assume that my ability to hit this estimate will be an indicator of my overall performance, so I get stressed and annoyed.
They’re a communication tool
Any product that ships has things that depend on it. Depending on how large the company is, it could be marketing, publicity, business analysis, future product planning, or lots of other things. This means that the answer Bob is looking for is “When can I start assuming this thing will be done, so I can start planning things that depend on it?”
I’m answering with “This should be done by…”, but Bob is looking for a “This will be done by…”.
David Bryant Copeland has a great book, The Senior Software Engineer, which talks about this in its first chapter:
A senior software engineer is trusted and expected to do the most important work and to do it reliably. The way to accomplish that is to focus everything you do on the delivery of results.
Keeping that in mind, it’s best to meet Bob on his terms. This means that:
Estimates should be made at 90% confidence, not 50%
Bob wants to know when he can start planning things that depend on the work being done. If this means that the estimate needs to be padded (it probably does), that’s fine. If this means you have to get back to him in an hour or two, that’s fine too.
Estimates should be based on time-to-ship, not time-to-test-handoff or time-to-deploy-queue or anything else
This goes back to the focus on the delivery of results. If Bob is waiting for a project to ship, it does no good to him if it’s stuck in QA. This usually means the estimates need to be padded again, which is fine.
The best estimate is often “I can’t answer that now, but I’ll get back to you.”
It’s much easier to get to a 90%-confidence time-to-ship estimate if the estimate is broken up into smaller parts, each of which is easier to understand. Many books discuss the tactics of making accurate estimates, but that’s out of the scope of this post.
What if I’m told that I’m not moving fast enough?
Everywhere I’ve worked, I’ve been told that management wishes we could move faster. No matter how fast products are delivered, even if quality is completely sacrificed, even if the estimates are absolutely insane, even if you work 20 hours a day, management will still wish you could move faster. It’s the job of a developer to see that it’s just a wish and do their job of keeping a fast but maintainable pace, being predictable, delivering results, and keeping a high bar of quality.
What about when I beat my estimates by a hilarious amount and it looks like my estimates are incredibly padded?
I’ve never seen someone who regularly overestimates get punished for it, and it’ll pay off the next time you have a project that goes much longer than you expected. Overestimation by a given amount is much, much better than underestimation by the same amount (I think it’s probably similar to Loss Aversion). Most of the time, there will be a backlog and you can start the next task earlier. Great!
If you targeted the 50% case, unexpected complications could demolish an entire product schedule. You’d do that about half the time! If you target the 90% case, even if things go badly, the rest of the schedule can operate as intended. The product ships on time, everyone’s happy, you get raises and promotions and the best reward of all: You get to make more estimates!