ratchet approach
…The ratchet approach is a valuable tool for managing technical [[Migrations]]. One example is the colorfully-named [shitlist approach](https://sirupsen…
A form of technical Change management and an essential engineering skill, migrations involve replacing elements of a running system over time. On the smaller end, you might be renaming a column on a large and frequently-accessed database table over the course of a day. On the larger end, replacing all of Github’s permissions code incrementally over the course of two years or even, when necessary, gradually replacing entire legacy systems.
Migrations come into play when you can’t easily or safely replace the old element with the new element in a single change. The general pattern I’ve seen work is to:
A simple example is renaming a frequently-accessed column on a large database table. There are at least two reasons not to just use SQL’s RENAME COLUMN
:
RENAME COLUMN
takes out an ACCESS EXCLUSIVE
lock on the table, preventing reads and writes for the duration of the operation. If this takes more than a few seconds (such as on a large table), queries on this table from your application will hang, potentially leading to timeouts and downtime.RENAME COLUMN
, one version of the code will fail for the duration of that operation, resulting in errors.To safely rename a column requires a multi-stage approach:
new_column_name
to the table while keeping old_column_name
in placeold_column_name
.new_column_name
with the value of old_column_name
for all records created prior to step 2 (often called a “backfill”). At this point, we should have a guarantee that new_column_name
and old_column_name
contain the same data for all rows. A query like SELECT COUNT(*) FROM table WHERE new_column_name <> old_column_name
, which should remain 0
, can help confirm this.new_column_name
.old_column_name
.Of course, this is quite a bit more work than a single RENAME COLUMN
query, so it’s worth considering the tradeoff between development effort and avoiding downtime. But as your app and user-base grow the tradeoff typically tilts in favor of avoiding downtime.
The most common anti-pattern I’ve seen is migrations that are never completed. This can happen for a number of reasons, most of them sociotechnical, e.g.
Unfinished migrations impose a significant burden, so it’s best to avoid these by ensuring migrations are followed through on or rolled back. Some of those burdens are
…The ratchet approach is a valuable tool for managing technical [[Migrations]]. One example is the colorfully-named [shitlist approach](https://sirupsen…
…experience](https://martinfowler.com/articles/patterns-legacy-displacement/) with failed [[Migrations]] off of legacy systems resulting from focusing only on the…