Remote Caching

Mill can share task outputs between builds in different folders, or even on different machines, via a remote cache. Outputs computed by one build are stored in the cache and transparently re-used by any other build whose inputs are unchanged. The same work is never computed twice, whether across worktrees or checkouts on one machine or across a distributed fleet of developer machines and CI workers.

The simplest cache backend is a plain local or shared (e.g. NFS) directory: point the cache at a filesystem path and Mill reads and writes cache entries there directly, with no server to run. We enable it in the build header below, so it applies to every command without passing --remote-cache-location on the command line.

build.mill
//| mill-remote-cache-location: ~/mill-remote-cache-folder

package build
import mill.*, javalib.*

object foo extends JavaModule

> ./mill foo.run
Hello World!

> ./mill clean foo.compile

> ./mill foo.run
Hello World!

./mill clean foo.compile removed the local out/ entry for the compile, so the second ./mill foo.run could only skip recompiling by fetching the result from the shared cache folder. out/mill-profile.json confirms it with "cached": true for foo.compile, rather than a local recompute:

> cat out/mill-profile.json | jq '.[] | select(.label == "foo.compile").cached'
true

Point any other checkout of the project at the same folder, local or on a shared mount, to share results between them.
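
To see every cache hit from a run rather than querying one task at a time, the jq query above can be generalized. The helper below is a minimal sketch: the name cached_tasks is hypothetical, it requires jq, and it assumes only what the query above already implies, namely that the profile is a JSON array of objects with "label" and "cached" fields.

```shell
# Print the label of every task the profile marks as cached.
# Defaults to the profile of the most recent build in this checkout.
cached_tasks() {
  local profile="${1:-out/mill-profile.json}"
  jq -r '.[] | select(.cached == true) | .label' "$profile"
}
```

Running cached_tasks after the second ./mill foo.run should list foo.compile among its output.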

A Bazel-Remote Cache Server

To share a cache across a fleet of machines or CI workers you want a real cache server. Mill speaks the widely-supported Bazel Remote Execution HTTP cache protocol, so it works against off-the-shelf servers like bazel-remote (as well as Buildbarn, EngFlow, and others). Point mill-remote-cache-location at the server’s base URL, here in the build header so every command uses it.

Below we download the bazel-remote binary (the two curl commands fetch the macOS/arm64 and Linux/x86-64 builds; see the releases page for other platforms), then run it serving an HTTP cache on a local port and build against it.

build.mill
//| mill-remote-cache-location: http://localhost:8378

package build
import mill.*, javalib.*

object foo extends JavaModule

> curl -sL -o bazel-remote https://github.com/buchgr/bazel-remote/releases/download/v2.6.1/bazel-remote-2.6.1-darwin-arm64 && chmod +x bazel-remote # mac

> curl -sL -o bazel-remote https://github.com/buchgr/bazel-remote/releases/download/v2.6.1/bazel-remote-2.6.1-linux-amd64 && chmod +x bazel-remote # linux

> ./bazel-remote --dir bazel-remote-data --max_size 1 --http_address localhost:8378 > bazel-remote.log 2>&1 & echo $! > bazel-remote.pid; sleep 3

> ./mill foo.run
Hello World!

> ./mill clean foo.compile

> ./mill foo.run
Hello World!

> kill $(cat bazel-remote.pid)

After ./mill clean foo.compile, the second ./mill foo.run skipped recompiling and instead downloaded the compile output from bazel-remote, as "cached": true for foo.compile in out/mill-profile.json confirms:

> cat out/mill-profile.json | jq '.[] | select(.label == "foo.compile").cached'
true

Because bazel-remote validates each uploaded ActionResult against the blobs it references, a successful hit also confirms Mill’s protocol implementation is correct on the wire. For authenticated servers, set the MILL_REMOTE_CACHE_AUTHORIZATION environment variable and Mill sends it as the Authorization header on every request.
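
The session above waits a fixed three seconds for the server to come up. On a slow machine, polling until the port answers may be more reliable. The helper below is one possible sketch; wait_for_http is a hypothetical name, and it relies only on standard curl behavior, where any HTTP response (even an error status) yields exit code 0.

```shell
# Poll a URL until the server answers over HTTP, or give up.
# Usage: wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_http() {
  local url=$1 attempts=${2:-30} delay=${3:-1} i=0
  while [ "$i" -lt "$attempts" ]; do
    # Without -f, curl exits non-zero only when the connection itself
    # fails, so any status code counts as "the server is listening".
    if curl -s -o /dev/null --max-time 2 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

In the session above, wait_for_http http://localhost:8378 could stand in for the trailing sleep 3.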

An Apache httpd Cache Server

Mill’s remote cache speaks the Bazel HTTP cache protocol, which per the Bazel docs works against "any HTTP/1.1 server that supports PUT and GET" — not just purpose-built caches. So you can also back the cache with a general-purpose web server you may already run, such as Apache httpd with its mod_dav WebDAV module.

The httpd.conf next to this build serves an apache-data/ folder over WebDAV; we create that folder, start a stock Apache with the config, and point mill-remote-cache-location at it in the build header.

build.mill
//| mill-remote-cache-location: http://localhost:8379

package build
import mill.*, javalib.*

object foo extends JavaModule

> mkdir -p apache-data/ac apache-data/cas

> /usr/sbin/httpd -d "$(pwd)" -f httpd.conf -k start # mac

> apache2 -d "$(pwd)" -f httpd.conf -k start # linux

> ./mill foo.run
Hello World!

> ./mill clean foo.compile

> ./mill foo.run
Hello World!

> kill $(cat httpd.pid)

After ./mill clean foo.compile, the second ./mill foo.run fetched the compile output from Apache rather than recompiling, as "cached": true for foo.compile in out/mill-profile.json shows:

> cat out/mill-profile.json | jq '.[] | select(.label == "foo.compile").cached'
true

The cache entries Apache stored end up under apache-data/ac/ and apache-data/cas/, mirroring the /ac/ and /cas/ URL paths that a dedicated Bazel remote cache serves.
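
Because the protocol is plain PUT and GET, a cache server can be checked by hand before pointing Mill at it. The sketch below assumes the usual Bazel HTTP cache convention that a content-addressed blob lives at /cas/ followed by the SHA-256 hex digest of its bytes. The function names are hypothetical, and nothing here is part of Mill itself.

```shell
# Print the SHA-256 hex digest of a file, using whichever tool is installed
# (sha256sum on most Linux systems, shasum on macOS).
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d ' ' -f 1
  else
    shasum -a 256 "$1" | cut -d ' ' -f 1
  fi
}

# Print the URL a blob would be stored under: <base>/cas/<sha256 of content>.
# A trailing slash on the base URL is tolerated.
cas_url() {
  printf '%s/cas/%s\n' "${1%/}" "$(sha256_of "$2")"
}

# Upload a file with PUT, fetch it back with GET, and compare the bytes.
# Returns 0 only if all three steps succeed.
cas_roundtrip() {
  local base=$1 file=$2 url fetched status=0
  url=$(cas_url "$base" "$file")
  fetched=$(mktemp)
  { curl -sf --upload-file "$file" "$url" &&
    curl -sf -o "$fetched" "$url" &&
    cmp -s "$file" "$fetched"; } || status=$?
  rm -f "$fetched"
  return "$status"
}
```

With one of the servers from this page running, cas_roundtrip http://localhost:8379 build.mill should succeed and leave a file named after the digest under apache-data/cas/.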

Configuration

Remote caching is controlled by three settings, set in your build header so they apply to every build, or as command-line flags for a one-off run:

  • --remote-cache-location is the cache location: the base URL of a Bazel-remote-protocol HTTP cache, or a file: URL / path (a leading ~/ is resolved against your home directory) to a shared folder used as the cache with no server. This is the only setting needed to turn remote caching on.

  • --remote-cache-filter restricts caching to the tasks matching a task query, e.g. __.compile. When omitted, every cacheable task is eligible.

  • --remote-cache-salt adds an extra string to the cache key to partition the cache, e.g. to keep entries computed on different operating systems from being shared.
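
As a sketch of a one-off run, the wrapper below passes all three flags on the command line and derives the salt from the current machine. The names platform_salt, mill_with_cache and CACHE_URL are hypothetical; only the three flags come from the list above.

```shell
# Build a salt such as "Linux-x86_64" or "Darwin-arm64", so entries are
# partitioned by operating system and CPU architecture.
platform_salt() {
  printf '%s-%s\n' "$(uname -s)" "$(uname -m)"
}

# Run Mill once with remote caching configured purely from the command line.
# CACHE_URL is a hypothetical variable; it falls back to the shared folder
# used earlier on this page. Only compile tasks are cached.
mill_with_cache() {
  ./mill \
    --remote-cache-location "${CACHE_URL:-$HOME/mill-remote-cache-folder}" \
    --remote-cache-filter __.compile \
    --remote-cache-salt "$(platform_salt)" \
    "$@"
}
```

Calling mill_with_cache foo.compile would then run ./mill with the three flags ahead of the task name.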

For caches that require authentication, set the MILL_REMOTE_CACHE_AUTHORIZATION environment variable; Mill sends its value as the Authorization header on every request, supporting bearer tokens, HTTP basic auth, and similar schemes.
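
Since the variable holds the complete header value, it has to include the scheme as well as the credential. The helpers below sketch the two common cases using the standard HTTP formats; the function names and CACHE_TOKEN are hypothetical.

```shell
# Value for HTTP basic auth: "Basic " followed by base64("user:password").
# tr strips the line breaks some base64 implementations insert.
basic_auth() {
  printf 'Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}

# Value for bearer-token auth: "Bearer " followed by the token.
bearer_auth() {
  printf 'Bearer %s\n' "$1"
}

# Example, with CACHE_TOKEN standing in for wherever the secret is kept:
#   export MILL_REMOTE_CACHE_AUTHORIZATION="$(bearer_auth "$CACHE_TOKEN")"
```

Exporting the variable in the shell or CI job, rather than writing it into the build, keeps the credential out of version control.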

Limitations

As with every build tool’s remote cache (Bazel, Buck, Pants, …), remote caching comes with trade-offs. None of these are unique to Mill:

  • Remote caching assumes all of a task’s inputs are tracked. It cannot detect a task that produces different output for the same inputsHash because of an untracked input, e.g. a task that shells out to a system tool whose version differs between machines. Use --remote-cache-salt to give such environments different cache keys, for example one salt on macOS developer machines and another on Linux CI.

  • Anyone with write access to the cache can run code on every reader. A task output uploaded by one machine, including compiled binaries and scripts, is downloaded and used on every other machine that reads from the cache, so all machines sharing a cache must be within the same trust boundary. A common safe topology is to grant write access only to trusted CI (e.g. main-branch builds) while letting developer laptops and PR builds read from it.

  • Not every cacheable task benefits from remote caching. Some tasks are faster to recompute locally than to fetch over the network, and Mill caches at whole-task granularity: each task’s output directory is a single cache entry. Use --remote-cache-filter to target expensive, high-reuse tasks such as __.compile where caching pays off.

  • Only declared output paths are restored from Task.dest. For custom tasks, remote caching restores files and directories under Task.dest that are returned as PathRefs by the task. Incidental files written to Task.dest by a task returning a plain value, such as a String or Int, are not portable through the remote cache even if the task’s value itself is served from cache. If downstream tasks or users need those files, make sure the returned value contains the PathRefs pointing at them, e.g. as fields of a case class or members of a tuple or named tuple.

  • Only cacheable task types participate. Normal cached tasks and persistent tasks, including compile tasks whose dest/ holds incremental-compiler state, can be stored in and restored from the remote cache. A remote miss leaves a persistent task’s local dest/ intact so an incremental recompute can still re-use it. Side-effecting task types such as Task.Input, Task.Command, Task.Worker, and other non-cached tasks are never sent to the remote cache. This means commands like run or testForked may still execute even when their upstream compile, link, or environment-setup tasks are remote-cache hits.