One year of automatic DB migrations from git

For the last year-plus, for most of my solo work, I’ve used a tool called automig to automatically turn my SQL schema changes into deltas that can be applied to a DB (plug – I wrote it).

I hate writing migrations because it feels like work a computer should know how to do, and because in general there’s no guarantee that the migrations produce something equivalent to your ‘schema.sql’ or ORM definition.

(SQLAlchemy’s alembic has an autogenerate feature which compares a running DB to your ORM spec; having a running DB in the loop for migration generation creates a different set of tradeoffs than automig, which analyzes the sql files directly).

This article is a pros and cons retrospective on that year.

Pros
Cons

Pros

I haven’t written a migration in 1+ years. And I don’t love writing migrations. My process to add a column has fewer steps. This is a win.

Simple and readable source of truth. As long as you trust that the tool knows what it’s doing, you can open up the schema.sql file (or whatever you choose to name yours) and get a schema that is both a readable doc of what the database should have and a reliable indicator of what the database actually has.

Standard tool across different languages. I’ve used automig on different python and golang projects and it doesn’t care. It’s not linked to any design or tool decision inside the codebase. Automig isn’t a standard tool, but if it were, it would be a portable skillset.

No cluttered migrations dir. More of a personal hygiene decision than a legit gripe, but migration directories aren’t my favorite; hundreds of files that do very little good. Automig is also faster at reinitialization because you can start from git HEAD rather than applying hundreds of changes from the last 36 months.

Turns something complicated into something simple and almost as good. There are cons (see below) but there’s a bunch of migration-related work that I no longer think about. I no longer dread adding a DB column or an index. If my capabilities are less because the tool is simpler and declarative, that’s a tradeoff, but it’s one that I’ve lived with happily.

Cons

Data migrations not supported. Automig is good at schema migrations but doesn’t have an easy way to transform columns or run code on your DB. The tool has an answer to this in the roadmap. For my own needs I’ve been able to work around this by doing two-step migrations with default values.

For larger users, data migrations involve lots of design (see for example github’s GH-OST tool). In the future I think migrations should be a native feature in the DB – you should upload a schema and specify whether migrations run up-front or on read. And we shouldn’t tie type to storage locality.

When something goes wrong, I have to fix it. This is 50% a gripe about using a tool that I maintain and am the only user of. But 50% a legit point that a ‘declarative diffing’ tool has more logic in it than migrations that you write yourself in SQL. Running arbitrary SQL gives you a lot of flexibility and gives you infinite freedom to choose incompatible dialects.

Extra lifting to integrate with ORMs. Automig can generate SQLAlchemy definitions from your schema.sql, but that’s it. If you use a single language / framework, defining your DB in an ORM is probably more useful than having it specified using SQL.

Dialect support is no picnic. When I switched from postgres to sqlite for some projects, it was a pain to support the different dialects. I ran into things like different support for transactional DDL.

(Since I first wrote this I’ve seen comments from the skeema and migra committers. Both of these tools support only one DB: mysql and postgres respectively, and rely on the DB to do the heavy lifting. There are a lot of positives to this approach).

Branch conflict issues + rebasing. Any nonlinear git history can be a source of errors. Automig has an --opaque switch to work around these, but manually-specified migrations are likely better at branches, especially if you need to support out-of-order changes. I haven’t encountered these problems because I’m in solo codebases, but I can see there being issues in big teams who sometimes deploy from non-main branches.

Migrating production involves up-front work. If you use your main backend language / framework to run migrations, life is easy. Automig has extra requirements: it needs to bundle the .git folder (i.e. whole history). When I ran this on lambda, I had to also bundle a git binary, and my ubuntu binary didn’t work. I spent a whole day learning how to build git statically before I realized I could just grab the centos one. The good news is that this work only has to be done once per platform.

Testing is annoying. Because automig only works on committed changes, I sometimes have to do a few rounds of git commit --amend before things work.

Column order. Automig doesn’t guarantee column order (it does add column but not add column b after a). This has caused issues with backup / restore. It’s a problem with the tool but not necessarily with the approach of using git + sql as the source of truth.

Weird parser. My parser library is easily confused, especially by uppercase / lowercase and names that look like keywords. And it’s multi-layer (I use python sqlparse and then wrap it), i.e. janky. This isn’t an issue with the approach so much as the specific tools I use, but it causes problems.