TLDR

I’m tired of carefully ordering my Dockerfiles to take advantage of caching. So are you. Some of the new docker buildkit features sort of solve this, but I’m not sure how they work in CI.

Example dockerfile

(The ‘syntax’ comment is load-bearing).

#syntax=docker/dockerfile:experimental
FROM python:3.8

WORKDIR /app
COPY requirements.txt .
# `pip cache list` spits out cache contents -- doesn't impact functionality but shows that the cache is working.
RUN --mount=type=cache,id=custom-pip,target=/root/.cache/pip pip cache list
RUN --mount=type=cache,id=custom-pip,target=/root/.cache/pip pip install -r requirements.txt

And you have to set the DOCKER_BUILDKIT environment variable when you run the build:

DOCKER_BUILDKIT=1 docker build -t whatever .

The improvement is pretty big

From-scratch builds are ~10 seconds, and incremental builds are ~2 seconds.

Details:

  • My caching setup caches wheels, the build output of pip install. Some packages ship wheels to PyPI, others don’t. A cached wheel always saves download work and sometimes saves build work.
  • My requirements.txt has a transitive dependency on pyyaml, which has a compilation step and doesn’t ship a wheel that pip can use in the python:3.8 image, so it gets built from source.
  • I created ‘incremental builds’ by adding a comment to requirements.txt; a real-life version of this would be adding a single new dependency to a large requirements file. (There’s a sketch of this right after the list.)
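For illustration, here’s a hypothetical requirements.txt set up this way (the pins are guessed from the build output below, not copied from my real file):

automig==0.1.1
gitpython==3.1.0
sqlparse==0.3.0
# touch this comment (or add a new dependency) to invalidate the COPY layer;
# the pip cache mount is what keeps the re-run of pip install fast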

(Here’s the build output from both runs, if you want to see the details.)

Uncached output (10 seconds)
#10 [stage-0 4/5] RUN --mount=type=cache,id=custom-pip,target=/root/.cache/p...
#10 0.873 Nothing cached.
#10 DONE 0.9s

#11 [stage-0 5/5] RUN --mount=type=cache,id=custom-pip,target=/root/.cache/p...
#11 1.276 Collecting automig==0.1.1
#11 1.387   Downloading automig-0.1.1.tar.gz (16 kB)
#11 1.811 Collecting sqlparse==0.3.0
#11 1.831   Downloading sqlparse-0.3.0-py2.py3-none-any.whl (39 kB)
#11 1.917 Collecting gitpython==3.1.0
#11 1.939   Downloading GitPython-3.1.0-py3-none-any.whl (450 kB)
#11 2.237 Collecting pyyaml==5.1
#11 2.256   Downloading PyYAML-5.1.tar.gz (274 kB)
#11 2.721 Collecting gitdb<5,>=4.0.1
#11 2.742   Downloading gitdb-4.0.5-py3-none-any.whl (63 kB)
#11 2.850 Collecting smmap<4,>=3.0.1
#11 2.873   Downloading smmap-3.0.4-py2.py3-none-any.whl (25 kB)
#11 2.888 Building wheels for collected packages: automig, pyyaml
#11 2.888   Building wheel for automig (setup.py): started
#11 3.155   Building wheel for automig (setup.py): finished with status 'done'
#11 3.155   Created wheel for automig: filename=automig-0.1.1-py3-none-any.whl size=15793 sha256=3ea320007aa5ddc9de90a947e80628eaf3dbf40ec50a5da4fca448fe0f9c1e0b
#11 3.155   Stored in directory: /root/.cache/pip/wheels/06/c0/46/8299287bb8a6d390a81fda57644f12ab77dc0f773bd5cab4dc
#11 3.156   Building wheel for pyyaml (setup.py): started
#11 8.901   Building wheel for pyyaml (setup.py): finished with status 'done'
#11 8.903   Created wheel for pyyaml: filename=PyYAML-5.1-cp38-cp38-linux_x86_64.whl size=568218 sha256=e25710020c0bbcfaaea9c2a4ba76a3b4b76e5c0f1c5df3b73d0d82f543d3fb30
#11 8.903   Stored in directory: /root/.cache/pip/wheels/52/dd/2b/10ff8b0ac81b93946bb5fb9e6749bae2dac246506c8774e6cf
#11 8.904 Successfully built automig pyyaml
#11 8.949 Installing collected packages: sqlparse, smmap, gitdb, gitpython, pyyaml, automig
#11 9.322 Successfully installed automig-0.1.1 gitdb-4.0.5 gitpython-3.1.0 pyyaml-5.1 smmap-3.0.4 sqlparse-0.3.0
#11 DONE 9.6s

Subsequent run with requirements.txt changed (2 seconds)
#10 [stage-0 4/5] RUN --mount=type=cache,id=custom-pip,target=/root/.cache/p...
#10 0.894 Cache contents:
#10 0.894
#10 0.894  - PyYAML-5.1-cp38-cp38-linux_x86_64.whl (568 kB)
#10 0.894  - automig-0.1.1-py3-none-any.whl (15 kB)
#10 DONE 0.9s

#11 [stage-0 5/5] RUN --mount=type=cache,id=custom-pip,target=/root/.cache/p...
#11 1.318 Processing /root/.cache/pip/wheels/06/c0/46/8299287bb8a6d390a81fda57644f12ab77dc0f773bd5cab4dc/automig-0.1.1-py3-none-any.whl
#11 1.392 Processing /root/.cache/pip/wheels/52/dd/2b/10ff8b0ac81b93946bb5fb9e6749bae2dac246506c8774e6cf/PyYAML-5.1-cp38-cp38-linux_x86_64.whl
#11 1.466 Collecting gitpython==3.1.0
#11 1.469   Using cached GitPython-3.1.0-py3-none-any.whl (450 kB)
#11 1.543 Collecting sqlparse==0.3.0
#11 1.544   Using cached sqlparse-0.3.0-py2.py3-none-any.whl (39 kB)
#11 1.612 Collecting gitdb<5,>=4.0.1
#11 1.615   Using cached gitdb-4.0.5-py3-none-any.whl (63 kB)
#11 1.655 Collecting smmap<4,>=3.0.1
#11 1.657   Using cached smmap-3.0.4-py2.py3-none-any.whl (25 kB)
#11 1.711 Installing collected packages: pyyaml, smmap, gitdb, gitpython, sqlparse, automig
#11 2.098 Successfully installed automig-0.1.1 gitdb-4.0.5 gitpython-3.1.0 pyyaml-5.1 smmap-3.0.4 sqlparse-0.3.0
#11 DONE 2.2s

Not sure how this works in CI

I couldn’t navigate the docs or read the source code well enough to figure out how the RUN --mount=type=cache contents get preserved when you’re not always building on the same box. Does it mirror caches to the registry? Is buildkit supposed to manage remote caches?
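One possibly relevant piece is buildx’s registry cache export, something like the sketch below (the registry ref is a placeholder). As far as I can tell that handles layer cache; whether it also carries the contents of RUN cache mounts is exactly the part I’m unsure about.

docker buildx build \
  --cache-to type=registry,ref=registry.example.com/whatever:buildcache,mode=max \
  --cache-from type=registry,ref=registry.example.com/whatever:buildcache \
  -t whatever .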

In my personal life / small project work, all my builds run on my laptop or on a dedicated build server, so caching is great. In my paid work, CI often runs in clusters with network volumes or docker registries, where taking advantage of caching means a lot of network traffic for limited speedup.

In practice, the better answer might be a local PyPI mirror that builds & caches wheels for your platform, rather than better cache awareness in docker builds. I’m not sure how that translates to a modern compiled language like go/rust, where the package manager isn’t supposed to serve built binaries.
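As a sketch of what I mean (the mirror URL is made up, and devpi is just one example of a tool that can play this role), you’d point pip at the mirror inside the Dockerfile and let the mirror worry about caching downloads and platform wheels:

# hypothetical: use an internal index instead of pypi.org
ENV PIP_INDEX_URL=https://pypi-mirror.internal/simple/
RUN pip install -r requirements.txt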

If you understand these topics better than me, email me with details and I’ll update this section.

Docker should integrate better with package management

Thesis:

  • CI is expensive for everyone (vendors who provide it for free, developers who wait for it to finish, package servers who have to provide bandwidth)
  • Package caching + docker caching are incompatible as-is
  • But package caching is conceptually easy to do on a per-cluster basis – cache the download, cache the build per platform. Not doing this = wasted CPU + time = wasted money.
  • All involved parties benefit from cheaper / faster builds

I don’t know why package management integration isn’t the first priority of docker build improvements. Maybe I don’t understand the demand and this isn’t general enough.

Alternatively: it’s very conceivable that we’ll standardize on non-docker tools for container builds. I could see using bazel or something with a language-aware build cache to build containers within containers. I think there’s no way around complexity if you want fast builds.

This seems to be the way of things with docker – good concepts, pioneering implementation, lunch gets stolen when execution quality starts to matter. Just like kubernetes was a more reasonable approach to running container workloads than docker’s swarm, watch builds get taken from them too.

Update: docker build caching in golang isn’t as simple

Compiled languages have more steps to cache (the package download plus the binary build).
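For reference, the relevant part of the Dockerfile looks roughly like this. It’s a sketch reconstructed from the build output below: the base image, the stage name, and the go.mod/go.sum COPY are my guesses; the two cache-mount RUN lines mirror the output.

# syntax=docker/dockerfile:experimental
FROM golang:1.14 AS builder
WORKDIR /src
COPY go.mod go.sum ./
RUN --mount=type=cache,id=gopkg,target=/go/pkg go mod download
COPY *.go .
COPY cmd/ cmd/
COPY Makefile .
RUN --mount=type=cache,id=gobuild,target=/root/.cache/go-build \
    --mount=type=cache,id=gopkg,target=/go/pkg \
    go build ./cmd/...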

I ran a quick test of something similar, and it worked:

=> [stage-1  4/11] RUN --mount=type=cache,id=gopkg,target=/go/pkg go mod download                                                               1.4s
=> [stage-1  5/11] COPY *.go .                                                                                                                  0.0s
=> [stage-1  6/11] COPY cmd/ cmd/                                                                                                               0.0s
=> [stage-1  7/11] COPY Makefile .                                                                                                              0.0s
=> [stage-1  8/11] RUN --mount=type=cache,id=gobuild,target=/root/.cache/go-build --mount=type=cache,id=gopkg,target=/go/pkg go build ./cmd/... 2.9s

This is rebuilding with a version change to a package and a one-liner change to a source file. The speedup is:

step          full    cached
mod download  9.5s    1.4s
build         30s     2.9s
total         39.5s   3.3s

10x faster!

BUT when you subsequently try to use the built image to run a test, the cache mount isn’t there. Cache mounts are only available during builds, not at run time.

I worked around this by making the test command into a build rather than a run, but (1) yuck, and (2) there’s network and storage overhead associated with docker build that isn’t incurred for docker run. The go test CI step takes ~20 seconds now, which is more than the ~5 seconds it takes on my laptop after a small change, and less than the ~40 seconds it took with no build cache.
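Concretely, the workaround looks something like this (a sketch that builds on the Dockerfile sketch above, with its hypothetical ‘builder’ stage): add a test stage that reuses the same cache mounts, and have CI build that stage instead of running go test in the finished image.

FROM builder AS test
RUN --mount=type=cache,id=gobuild,target=/root/.cache/go-build \
    --mount=type=cache,id=gopkg,target=/go/pkg \
    go test ./...

CI then invokes it as a build:

DOCKER_BUILDKIT=1 docker build --target test .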

I think making builds faster is life or death for languages. It’s not just a problem for compiled languages; nodejs / webpack can get very tricky if you’re using a lot of babel.

Whatever compiled language uses distributed build caching for packages and other tricks to provide a python-speed edit-test-run loop will rule the roost. Even if it doesn’t take market share from the other system langs (cpp / java / rust, maybe go is on this list), it will get users from the interpreted lang community. Some insider said this about go when it launched – they expected it to replace java but it actually replaced python. Expect more of the same from the next fast-to-build compiled language.