Have you ever wanted to import a bunch of data into your app from a CSV file? Or maybe you need to fix badly encoded characters in some of your customer reviews. Or you’ve changed your mind about how you want to store data in Redis, and need to move everything from the old format to the new one.
At Avvo, we called these “ad-hoc tasks.” As in, you probably only need to run them once. So what’s the best way to handle an ad-hoc task in Rails?
Write a database migration
A migration works well if you need to change the structure of the data in your database. It tracks whether the task was run, it carries over changes to other environments – it’s what migrations were built for. It’s also what you’re probably already using them for.
If you’re changing data at the same time, a migration might work well. But there are some things to watch out for.
Calling something like Permissions.create(...) in your migration can cause you trouble. If you later rename or delete the model, the migration will break, because the class won’t exist when the migration runs. Or the model’s columns, validations, or callbacks might have changed between the time you wrote the migration and the time it runs. There are ways to get around this, like defining a minimal model class inside the migration itself, but they’re error-prone and can fail in weird ways.
Migrations are also less useful if your task doesn’t involve ActiveRecord.
These aren’t deal-breakers. But I tend not to import or change much data in migrations. There are better options.
Write a rake task
You have a task. You probably only want to run it once. And you want to be able to test it on your machine and run it in production.
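Rake tasks work really well for this. Rails can even generate them for you: running rails generate task locations import gives you a skeleton task.
It creates lib/tasks/locations.rake, with an empty locations:import task for you to stash your code into.
Because the generated task depends on :environment, which loads your Rails app, you can use all your models and the rest of your app’s code inside the task block. It’s easy to import and change data, because you can write your code just as if you were sitting at a Rails console.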
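Here’s a rough sketch of what a filled-in locations:import task might look like. It’s written to run under plain Rake outside of Rails, so several pieces are assumptions for illustration: the Location struct stands in for a hypothetical Location model, the empty :environment task replaces the one Rails provides to boot your app, and the CSV_PATH variable and the name/city columns are made up for the example.

```ruby
require "rake"
require "csv"

# Make task/namespace/desc callable at the top level of this script.
extend Rake::DSL

# Hypothetical stand-in for a Location model. In a Rails app you'd call
# Location.create! on your real ActiveRecord model instead.
Location = Struct.new(:name, :city) do
  def self.all_records
    @all_records ||= []
  end

  def self.create!(attrs)
    record = new(attrs.fetch(:name), attrs.fetch(:city))
    all_records << record
    record
  end
end

# Rails defines :environment to load your app; this empty stand-in
# only exists so the sketch runs under plain Rake.
task :environment

namespace :locations do
  desc "Import locations from a CSV file (path given in CSV_PATH)"
  task import: :environment do
    path = ENV.fetch("CSV_PATH", "locations.csv")
    imported = 0

    # headers: true yields each row as a CSV::Row keyed by the header line
    CSV.foreach(path, headers: true) do |row|
      Location.create!(name: row["name"], city: row["city"])
      imported += 1
    end

    puts "Imported #{imported} locations from #{path}"
  end
end
```

In a real app you’d drop the stand-ins, keep the namespace block in lib/tasks/locations.rake, and run something like rake locations:import CSV_PATH=data/locations.csv. Rake copies NAME=value arguments into ENV, which is why the sketch reads its path from there.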
Once you’ve written your task, you can run it with rake locations:import. If you’re using Heroku, you can run it with heroku run rake locations:import. If you’re using Capistrano, you can use the capistrano-rake gem to run your task. You might have an even better option, though.
Write a scheduled job, using sidekiq-scheduler
Most background job processors can schedule jobs to run later. For Sidekiq, the sidekiq-scheduler gem adds recurring, cron-style schedules. And with sidekiq-scheduler, there’s a trick you can do.
What if you had a job that never automatically scheduled itself, but let you manually schedule it whenever you wanted? That would work great for “one-off” jobs that you might want to run again later, or that you’d rather run using a UI.
In sidekiq-scheduler, you can schedule the job far in the future and set it to disabled, so it never runs on its own.
Then, when you visit the Sidekiq web UI, you’ll see a button to enqueue the job manually.
With this, you can run your job whenever you’re ready, in both development and production. And if you ever need to run it again, it’s right there in the UI.
This isn’t the best option if your job is dangerous, because it’s too easy to click that button by accident. It’s also not great if the job takes a while to complete, because Sidekiq works best when jobs finish quickly. A long job ties up one of Sidekiq’s worker threads, and you won’t be able to safely restart Sidekiq until it finishes. But if your job is fast and can safely run more than once, this works well. And if it’s a cleanup kind of task, you might decide to run it regularly.
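To make “safe to run more than once” concrete, here’s a plain-Ruby sketch of the review cleanup mentioned at the start of this post. It assumes the bad characters come from UTF-8 text that was mis-decoded as Windows-1252 (one common cause, not the only one), and it uses a hypothetical Review struct and FixReviewEncodingJob class in place of a real model and Sidekiq job. Because it only rewrites text when the repair produces valid UTF-8, a second run finds nothing left to change.

```ruby
# Hypothetical stand-in for a Review model; a real job would load
# Review records from the database instead of receiving them as an argument.
Review = Struct.new(:id, :body)

# Hypothetical job class. In a Rails app it would also `include Sidekiq::Job`
# (Sidekiq::Worker on older versions) and take JSON-friendly arguments.
class FixReviewEncodingJob
  # Tries to undo UTF-8 text that was mis-decoded as Windows-1252
  # ("cafÃ©" -> "café"). Returns the text unchanged when the repair
  # doesn't yield valid UTF-8, so already-correct text is left alone.
  def self.repair(text)
    return text if text.ascii_only?

    candidate = text.encode("Windows-1252").force_encoding("UTF-8")
    candidate.valid_encoding? ? candidate : text
  rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError
    # Characters outside Windows-1252 (or invalid bytes) mean this
    # particular mix-up isn't what happened, so leave the text alone.
    text
  end

  # Repairs each review in place and returns how many were changed.
  # Running it twice is safe: the second run finds nothing to fix.
  def perform(reviews)
    changed = 0
    reviews.each do |review|
      fixed = self.class.repair(review.body)
      next if fixed == review.body

      review.body = fixed
      changed += 1
    end
    changed
  end
end
```

The check is a heuristic: correct text that happens to look like mis-decoded Windows-1252 would also be rewritten, so it’s worth spot-checking the results on a copy of your data before running it in production.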
If you don’t need recurring schedules and only want to trigger jobs by hand, or you need to pass params to your one-off jobs, take a look at sidekiq-enqueuer, which a reader, Dmitry, pointed me to. With sidekiq-enqueuer, you can schedule jobs and set params, all through the Sidekiq web interface.
SSH into production and paste code into the Rails console
Which should you choose?
I’ve used all of these ways to run one-off tasks. But I’ll usually go for a rake task first. It works, it’s hard to run accidentally, and it’s easy to get rid of when you’re done with it. I don’t choose rake tasks every time, though.
I might choose a migration if:
- The job fixes up data using SQL as part of a database schema change.
- The job is very simple data work, like changing data in a column or adding a few records.
- I want to easily track whether the job has been run, and not run it again.
I might choose a Sidekiq job if:
- I think I might want to run the job again later.
- Someone who’s not me has to run it. All they’ll have to do is click a button.
- It’s a short data import or data cleanup job. I’ll probably have to run those regularly, even if I don’t expect to at first.
How about you? Do you have any other options, or make different choices? Leave a comment and let me know!