Remote Caching
Mill can share task outputs between builds in different folders, or even on different machines, via a remote cache. Outputs computed by one build are stored in the cache and transparently re-used by any other build whose inputs are unchanged, so you never need to compute the same thing twice, whether across different worktrees or checkouts on the same machine or across a distributed fleet of developer machines or CI workers.
The simplest cache backend is a plain local or shared (e.g. NFS) directory: point the
cache at a filesystem path and Mill reads and writes cache entries there directly, with no
server to run. We enable it with the mill-remote-cache-location key in the build header, so it
applies to every command without passing --remote-cache-location on the command line.
//| mill-remote-cache-location: ~/mill-remote-cache-folder
package build
import mill.*, javalib.*
object foo extends JavaModule
> ./mill foo.run
Hello World!
> ./mill clean foo.compile
> ./mill foo.run
Hello World!
./mill clean foo.compile removed the local out/ entry for the compile, so the second
./mill foo.run could only skip recompiling by fetching the result from the shared cache
folder. out/mill-profile.json confirms it with "cached": true for foo.compile, rather
than a local recompute:
> cat out/mill-profile.json | jq '.[] | select(.label == "foo.compile").cached'
true
Point any other checkout of the project at the same folder, local or on a shared mount, to share results between them.
A Bazel-Remote Cache Server
To share a cache across a fleet of machines or CI workers you want a real cache server.
Mill speaks the widely-supported Bazel Remote
Execution HTTP cache protocol, so it works against off-the-shelf servers like
bazel-remote (as well as Buildbarn, EngFlow, and
others). Point mill-remote-cache-location at the server’s base URL, here in the
build header so every command uses it.
Below we download the bazel-remote binary (the two curl commands fetch the macOS/arm64 and
Linux/x86-64 builds; see the releases page for other platforms), then run it serving an HTTP
cache on a local port and build against it.
//| mill-remote-cache-location: http://localhost:8378
package build
import mill.*, javalib.*
object foo extends JavaModule
> curl -sL -o bazel-remote https://github.com/buchgr/bazel-remote/releases/download/v2.6.1/bazel-remote-2.6.1-darwin-arm64 && chmod +x bazel-remote # mac
> curl -sL -o bazel-remote https://github.com/buchgr/bazel-remote/releases/download/v2.6.1/bazel-remote-2.6.1-linux-amd64 && chmod +x bazel-remote # linux
> ./bazel-remote --dir bazel-remote-data --max_size 1 --http_address localhost:8378 > bazel-remote.log 2>&1 & echo $! > bazel-remote.pid; sleep 3
> ./mill foo.run
Hello World!
> ./mill clean foo.compile
> ./mill foo.run
Hello World!
> kill $(cat bazel-remote.pid)
After ./mill clean foo.compile, the second ./mill foo.run skipped recompiling and instead
downloaded the compile output from bazel-remote, as "cached": true for foo.compile in
out/mill-profile.json confirms:
> cat out/mill-profile.json | jq '.[] | select(.label == "foo.compile").cached'
true
Because bazel-remote validates each uploaded ActionResult against the blobs it references,
a successful hit is also an end-to-end check that the entries Mill uploads are well-formed on the wire. For
authenticated servers, set the MILL_REMOTE_CACHE_AUTHORIZATION environment variable and Mill
sends it as the Authorization header on every request.
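The validation mentioned above can be pictured with a simplified model. This is a hypothetical sketch, not bazel-remote's code: Digest and ActionResult here are cut-down stand-ins for the Remote Execution API protobuf messages, missingBlobs is an illustrative helper, and the size comparison is an assumption of the model. The idea is that an action result only makes sense if every output blob it points at is already in the content-addressable store (CAS).

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Simplified, hypothetical stand-ins for the Remote Execution API messages;
// the real Digest and ActionResult are protobuf messages with more fields.
final case class Digest(hash: String, sizeBytes: Long)
final case class ActionResult(outputFiles: Seq[Digest])

// Lowercase hex SHA-256, the digest used to key CAS entries.
def sha256Hex(bytes: Array[Byte]): String =
  MessageDigest.getInstance("SHA-256").digest(bytes).map(b => f"${b & 0xff}%02x").mkString

def digestOf(bytes: Array[Byte]): Digest = Digest(sha256Hex(bytes), bytes.length.toLong)

// Returns every digest the ActionResult references that the CAS cannot back:
// the blob is absent, or the stored blob's size differs from the digest's size.
// A validating server would reject the upload if this result is non-empty.
def missingBlobs(result: ActionResult, cas: Map[String, Array[Byte]]): Seq[Digest] =
  result.outputFiles.filterNot { d =>
    cas.get(d.hash).exists(_.length.toLong == d.sizeBytes)
  }
```

In this model an uploader has to PUT the blobs before the ActionResult that references them, otherwise the result is rejected as dangling.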
An Apache httpd Cache Server
Mill’s remote cache speaks the Bazel HTTP cache protocol, which per the
Bazel docs works against "any HTTP/1.1 server that
supports PUT and GET", not just purpose-built caches. So you can also back the cache with a
general-purpose web server you may already run, such as Apache
httpd with its mod_dav WebDAV module.
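In the Bazel HTTP cache protocol, blobs live under /cas/ keyed by the SHA-256 of their content and action results live under /ac/ keyed by an action key, and both are written with PUT and read with GET. That is all a plain WebDAV server needs to support. The sketch below only builds the two kinds of request with the JDK's java.net.http API and never sends them; putBlob and getActionResult are illustrative names, not part of Mill.

```scala
import java.net.URI
import java.net.http.HttpRequest
import java.security.MessageDigest

def sha256Hex(bytes: Array[Byte]): String =
  MessageDigest.getInstance("SHA-256").digest(bytes).map(b => f"${b & 0xff}%02x").mkString

// Hypothetical helper: upload a blob to the content-addressable store.
// The key is the blob's own SHA-256, so identical content always lands at the same URL.
def putBlob(base: String, blob: Array[Byte]): HttpRequest =
  HttpRequest.newBuilder(URI.create(s"${base.stripSuffix("/")}/cas/${sha256Hex(blob)}"))
    .PUT(HttpRequest.BodyPublishers.ofByteArray(blob))
    .build()

// Hypothetical helper: look up an action result by its action key (a hex SHA-256).
def getActionResult(base: String, actionKey: String): HttpRequest =
  HttpRequest.newBuilder(URI.create(s"${base.stripSuffix("/")}/ac/$actionKey"))
    .GET()
    .build()
```

With mod_dav, a PUT to /cas/<hash> simply stores a file at that path and a later GET serves it back, which is why no cache-specific logic is needed on the server.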
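The httpd.conf next to this build serves an apache-data/ folder over WebDAV. We create its
ac/ and cas/ subfolders up front (a WebDAV PUT fails with 409 Conflict when the parent
collection does not exist), start a stock Apache with the config, and point
mill-remote-cache-location at it in the build header.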
//| mill-remote-cache-location: http://localhost:8379
package build
import mill.*, javalib.*
object foo extends JavaModule
> mkdir -p apache-data/ac apache-data/cas
> /usr/sbin/httpd -d "$(pwd)" -f httpd.conf -k start # mac
> apache2 -d "$(pwd)" -f httpd.conf -k start # linux
> ./mill foo.run
Hello World!
> ./mill clean foo.compile
> ./mill foo.run
Hello World!
> kill $(cat httpd.pid)
After ./mill clean foo.compile, the second ./mill foo.run fetched the compile output from
Apache rather than recompiling, as "cached": true for foo.compile in
out/mill-profile.json shows:
> cat out/mill-profile.json | jq '.[] | select(.label == "foo.compile").cached'
true
The cache files Apache stored end up under apache-data/ac/ and apache-data/cas/, the same
on-the-wire layout as a dedicated Bazel remote cache.
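Because apache-data/cas/ is content-addressed, a quick sanity check on the Apache side is that every file in it is named after the SHA-256 of its own bytes (files under ac/ are keyed by action key instead, so they are not checked). The sketch below is a hypothetical helper, corruptCasEntries, and assumes the flat cas/<sha256> layout produced by the PUT requests described above.

```scala
import java.nio.file.{Files, Path}
import java.security.MessageDigest
import scala.jdk.CollectionConverters.*

def sha256Hex(bytes: Array[Byte]): String =
  MessageDigest.getInstance("SHA-256").digest(bytes).map(b => f"${b & 0xff}%02x").mkString

// Lists regular files directly inside casDir whose name is not the SHA-256 of
// their contents, e.g. truncated uploads or files copied in by hand.
def corruptCasEntries(casDir: Path): List[Path] = {
  val stream = Files.list(casDir)
  try {
    stream.iterator().asScala
      .filter(p => Files.isRegularFile(p))
      .filter(p => sha256Hex(Files.readAllBytes(p)) != p.getFileName.toString)
      .toList // force evaluation before the directory stream is closed
  } finally stream.close()
}
```

For example, corruptCasEntries(Paths.get("apache-data/cas")) should return an empty list after the build above.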
Configuration
Remote caching is controlled by three settings, set in your build header so they apply to every build, or as command-line flags for a one-off run:
- --remote-cache-location is the cache location: the base URL of a Bazel-remote-protocol HTTP cache, or a file: URL or path (a leading ~/ is resolved against your home directory) to a shared folder used as the cache with no server. This is the only setting needed to turn remote caching on.
- --remote-cache-filter restricts caching to the tasks matching a task query, e.g. __.compile. When omitted, every cacheable task is eligible.
- --remote-cache-salt adds an extra string to the cache key to partition the cache, e.g. to keep entries computed on different operating systems from being shared.
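The location and salt rules above can be modeled in a few lines. This is a hypothetical sketch, not Mill's implementation: CacheLocation, parseLocation, and saltedKey are illustrative names, treating both http:// and https:// as server URLs is an assumption, and Mill's real cache-key derivation is not described on this page. The point of the salt is that the same inputs under two salts map to unrelated keys, so entries never cross between partitions.

```scala
import java.net.URI
import java.nio.charset.StandardCharsets
import java.nio.file.{Path, Paths}
import java.security.MessageDigest

// Hypothetical model of how a --remote-cache-location value is classified.
enum CacheLocation {
  case Http(base: URI)
  case Folder(dir: Path)
}

def parseLocation(raw: String, home: Path): CacheLocation =
  if raw.startsWith("http://") || raw.startsWith("https://") then CacheLocation.Http(URI.create(raw))
  else if raw.startsWith("file:") then CacheLocation.Folder(Paths.get(URI.create(raw)))
  else if raw.startsWith("~/") then CacheLocation.Folder(home.resolve(raw.drop(2)))
  else CacheLocation.Folder(Paths.get(raw))

def sha256Hex(s: String): String =
  MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8)).map(b => f"${b & 0xff}%02x").mkString

// Hypothetical key derivation: hashing the salt together with the inputs hash.
// ':' cannot occur in a hex inputs hash, so the concatenation is unambiguous.
def saltedKey(inputsHash: String, salt: String): String =
  sha256Hex(s"$inputsHash:$salt")
```

Under this model, setting one salt on macOS laptops and another on Linux CI yields disjoint key spaces in the same cache.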
For caches that require authentication, set the MILL_REMOTE_CACHE_AUTHORIZATION environment
variable; Mill sends its value as the Authorization header on every request, supporting
bearer tokens, HTTP basic auth, and similar schemes.
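Since the variable's value is sent verbatim, it must already include its scheme, e.g. "Bearer <token>" or "Basic <base64 of user:password>". The sketch below shows how such a value ends up on a request using the JDK's java.net.http API; cacheGet and basicAuth are hypothetical helpers for illustration, not Mill's code.

```scala
import java.net.URI
import java.net.http.HttpRequest
import java.nio.charset.StandardCharsets
import java.util.Base64

// Hypothetical helper: formats HTTP basic credentials as an Authorization value.
def basicAuth(user: String, password: String): String =
  "Basic " + Base64.getEncoder().encodeToString(s"$user:$password".getBytes(StandardCharsets.UTF_8))

// Builds a cache GET; if a credential is available it is attached unchanged as
// the Authorization header, mirroring the behavior described above.
def cacheGet(
    url: String,
    authorization: Option[String] = sys.env.get("MILL_REMOTE_CACHE_AUTHORIZATION")
): HttpRequest = {
  val builder = HttpRequest.newBuilder(URI.create(url)).GET()
  authorization.foreach(value => builder.header("Authorization", value))
  builder.build()
}
```

For example, basicAuth("user", "pass") produces "Basic dXNlcjpwYXNz", which is the value you would put in MILL_REMOTE_CACHE_AUTHORIZATION for those credentials.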
Limitations
As with every build tool’s remote cache (Bazel, Buck, Pants, …), remote caching comes with trade-offs. None of these are unique to Mill:
- Remote caching assumes all of a task’s inputs are tracked. It cannot detect a task that produces different output for the same inputsHash because of an un-tracked input, e.g. a task that shells out to a system tool whose version differs between machines. Use --remote-cache-salt to give such environments different cache keys, for example a different salt on macOS developer machines and on Linux CI.
- Anyone with write access to the cache can run code on every reader. A task output uploaded by one machine, including compiled binaries and scripts, is downloaded and used on every other machine that reads from the cache, so all machines sharing a cache must be within the same trust boundary. A common safe topology is to grant write access only to trusted CI (e.g. main-branch builds) while letting developer laptops and PR builds read from it.
- Not every cacheable task benefits from remote caching. Some tasks are faster to recompute locally than to fetch over the network, and Mill caches at a whole-task granularity, where each task’s output directory is a single cache entry. Use --remote-cache-filter to target expensive, high-reuse tasks such as __.compile where caching pays off.
- Only declared output paths are restored from Task.dest. For custom tasks, remote caching restores files and directories under Task.dest that are returned as PathRefs by the task. Incidental files written to Task.dest by a task returning a plain value, such as a String or Int, are not portable through the remote cache even if the task’s value itself is served from cache. If downstream tasks or users need those files, make sure the returned value contains the PathRefs pointing at them, e.g. as fields of a case class or members of a tuple or named tuple.
- Only cacheable task types participate. Normal cached tasks and persistent tasks, including compile tasks whose dest/ holds incremental-compiler state, can be stored in and restored from the remote cache. A remote miss leaves a persistent task’s local dest/ intact so an incremental recompute can still re-use it. Side-effecting task types such as Task.Input, Task.Command, Task.Worker, and other non-cached tasks are never sent to the remote cache. This means commands like run or testForked may still execute even when their upstream compile, link, or environment-setup tasks are remote-cache hits.
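The declared-output rule can be seen by contrasting two custom tasks. This is an illustrative build fragment with hypothetical task names (lineCountOnly, lineCountWithReport); both write the same report.txt into Task.dest, but only the second returns it as a PathRef, so only the second would have the file restored on a remote-cache hit.

```scala
package build
import mill.*, javalib.*

object foo extends JavaModule {
  // Returns only an Int: report.txt is written to Task.dest but is not part of
  // the declared output, so a remote-cache hit restores the count, not the file.
  def lineCountOnly = Task {
    val text = "a\nb\n"
    os.write(Task.dest / "report.txt", text)
    text.linesIterator.size
  }

  // Returns the file as a PathRef alongside the count, so report.txt is part of
  // the declared output and is restored together with the value on a cache hit.
  def lineCountWithReport = Task {
    val report = Task.dest / "report.txt"
    val text = "a\nb\n"
    os.write(report, text)
    (text.linesIterator.size, PathRef(report))
  }
}
```

A case class with a PathRef field works the same way; the tuple is used here only because it needs no extra serializer definition.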