Avoid data migrations in Rails schema migrations
Could you re-run every migration in your project right now? How often have you had to fix them while production was on fire? What can you do to prevent migration problems without wasting much time on them?
Solution
Do not mix schema and data migrations! Treat them as separate problems. There is a fairly common strategy for this:
- Use migrations only for schema changes.
- Use one-off tasks to seed, transform, or import data.
- Drop outdated migrations and merge redundant tasks into efficient workflows.
How can you adopt it?
As usual, there are ready-made gems with plenty of helpers for this.
But you can also do it without gems by following a simple convention.
DIY Algorithm
Add a Service to migrate the data. For more on Services, see How to use a Transaction Script (aka Service Objects) in Ruby on Rails.
(Optional) If you practice TDD, add tests for it as well. Make sure the migration does not corrupt production data.
Create a one-off rake task with a timestamp. The timestamp makes it easier to find tasks that are no longer relevant and clean them up.
In the task, just run the Service. The task needs no logic of its own: run the Service and print the result.
On release: invoke the rake task.
Once it has succeeded, schedule the removal of the rake task and the obsolete code.
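The steps above can be sketched roughly as follows. Everything named here is hypothetical: the `BackfillUserFullNames` service, the `users` columns, the `data` namespace, the file path, and the timestamp. The service receives its scope from the caller so it can be tested against in-memory objects, and `require "rake"` is there only so the sketch also loads outside a Rails app.

```ruby
require "rake" # only needed to run this sketch standalone; Rails loads rake for .rake files

# app/services/backfill_user_full_names.rb (hypothetical name and path)
class BackfillUserFullNames
  Result = Struct.new(:updated, :skipped)

  # `scope` is any enumerable of records responding to first_name, last_name
  # and update!. Injecting it keeps the service testable without a database.
  def initialize(scope:)
    @scope = scope
  end

  def call
    result = Result.new(0, 0)
    # On a large table, an Active Record relation could be iterated with
    # `find_each` to load rows in batches; `each` keeps the sketch generic.
    @scope.each do |user|
      full_name = [user.first_name, user.last_name].compact.join(" ").strip
      if full_name.empty?
        result.skipped += 1
      else
        user.update!(full_name: full_name)
        result.updated += 1
      end
    end
    result
  end
end

# lib/tasks/data/20240101120000_backfill_user_full_names.rake (hypothetical path)
namespace :data do
  desc "One-off: backfill users.full_name (remove after a successful release)"
  task backfill_user_full_names_20240101120000: :environment do
    # No logic in the task itself: run the service and print what it reports.
    # `User` is an assumed model; it is only resolved when the task is invoked.
    result = BackfillUserFullNames.new(scope: User.where(full_name: nil)).call
    puts "updated=#{result.updated} skipped=#{result.skipped}"
  end
end
```

Under these assumptions, the release step would run `bin/rails data:backfill_user_full_names_20240101120000`. Because the timestamp appears in both the file name and the task name, old one-off tasks are easy to spot and delete later.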
More details about the problem
Ruby on Rails developers rarely re-run migrations. Yet migrations are the most “perishable product” in a project.
Model schemas, methods, and any other application logic referenced in migrations usually go out of date very quickly. We do not run old migrations day to day, so we do not notice when they break.
As a result, the day you finally need to run the migrations, you discover that they are broken. It takes time and energy to clean them up and make them work again.
And as usual, production is on fire at that moment, and the last thing you want to fix is the migrations.
Other solutions
To keep migrations working for as long as possible, avoid direct model references, method calls, and Active Record queries in them. Use raw SQL instead.
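As a hedged illustration of the raw SQL advice, the sketch below is written in plain Ruby so that it runs on its own. In a Rails app the class would inherit from `ActiveRecord::Migration[...]`, which supplies `execute`; here a hypothetical `connection` object is injected in its place. The `users` table, its columns, and the class name are assumptions, and the `||` concatenation assumes PostgreSQL or SQLite syntax.

```ruby
# Standalone sketch. In a Rails app this would be declared as
#   class BackfillUsersFullName < ActiveRecord::Migration[7.1]
# and `execute` would be provided by the migration itself.
class BackfillUsersFullName
  # The statement depends only on table and column names, not on the User
  # class, its validations, callbacks, or scopes, so renaming or removing
  # the model later does not break this migration.
  UP_SQL = <<~SQL
    UPDATE users
    SET full_name = TRIM(COALESCE(first_name, '') || ' ' || COALESCE(last_name, ''))
    WHERE full_name IS NULL
  SQL

  def initialize(connection)
    @connection = connection
  end

  def up
    execute(UP_SQL)
  end

  def down
    # A data backfill has nothing to roll back in this sketch.
  end

  private

  # Stand-in for the `execute` helper a real migration gets from Rails.
  def execute(sql)
    @connection.execute(sql)
  end
end
```

The version to avoid would call something like `User.where(full_name: nil).find_each { ... }` inside `up`, which stops working as soon as the model changes in an incompatible way.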
You can also clean up and squash migrations regularly, but in practice nobody cares much about migrations once they have been delivered to production.