Tasks
Task Graphs
One of Mill’s core abstractions is its Task Graph: this is how Mill defines, orders and caches work it needs to do, and exists independently of any support for building Scala.
The following is a simple self-contained example using Mill to compile Java:
import mill._, mill.modules.Jvm

// sourceRoot -> allSources -> classFiles
//                                 |
//                                 v
//           resourceRoot ----> jar

def sourceRoot = T.sources { os.pwd / "src" }
def resourceRoot = T.sources { os.pwd / "resources" }

def allSources = T { sourceRoot().flatMap(p => os.walk(p.path)).map(PathRef(_)) }

def classFiles = T {
  os.makeDir.all(T.dest)
  os.proc("javac", allSources().map(_.path.toString()), "-d", T.dest)
    .call(cwd = T.dest)
  PathRef(T.dest)
}

def jar = T { Jvm.createJar(Agg(classFiles().path) ++ resourceRoot().map(_.path)) }

def run(mainClsName: String) = T.command {
  os.proc("java", "-cp", classFiles().path, mainClsName).call()
}
Here, we have two T.sources, sourceRoot and resourceRoot, which act as the roots of our task graph. allSources depends on sourceRoot by calling sourceRoot() to extract its value, classFiles depends on allSources the same way, and jar depends on both classFiles and resourceRoot.
Filesystem operations in Mill are done using the os-lib library.
The above build defines the following task graph:
sourceRoot -> allSources -> classFiles
                                |
                                v
          resourceRoot ----> jar
When you first evaluate jar (e.g. via mill jar at the command line), it will evaluate all the defined targets: sourceRoot, allSources, classFiles, resourceRoot and jar.
Subsequent invocations of mill jar will evaluate only as much as is necessary, depending on what input sources changed:
- If the files in sourceRoot change, it will re-evaluate allSources, compiling to classFiles, and building the jar
- If the files in resourceRoot change, it will only re-evaluate jar and use the cached output of allSources and classFiles
Primary Tasks
There are three primary kinds of Tasks that you should care about:
Targets
def allSources = T { sourceRoot().flatMap(p => os.walk(p.path)).map(PathRef(_)) }
Targets are defined using the def foo = T {…} syntax, and dependencies on other targets are defined using foo() to extract the value from them.
Apart from the foo() calls, the T {…} block contains arbitrary code that does some work and returns a result.
Each target, e.g. classFiles, is assigned a path on disk as scratch space & to store its output files at out/classFiles/dest/, and its returned metadata is automatically JSON-serialized and stored at out/classFiles/meta.json.
The return-value of targets has to be JSON-serializable via uPickle.
In case you want to return your own case class (e.g. MyCaseClass), you can make it JSON-serializable by adding the following implicit def to its companion object:
object MyCaseClass {
  implicit def rw: upickle.default.ReadWriter[MyCaseClass] = upickle.default.macroRW
}
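For example, a target could then return MyCaseClass directly. The following is a minimal self-contained sketch, not part of the build above; the summary target and the message/count fields are hypothetical:
case class MyCaseClass(message: String, count: Int)
object MyCaseClass {
  implicit def rw: upickle.default.ReadWriter[MyCaseClass] = upickle.default.macroRW
}
// the returned MyCaseClass is JSON-serialized to out/summary/meta.json like any other target result
def summary = T { MyCaseClass("compiled sources", allSources().length) }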
If you want to return a file or a set of files as the result of a Target, write them to disk within your T.dest folder (available through the Task Context API) and return a PathRef(T.dest) that hashes the files you wrote.
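For instance, a target along these lines writes its output files into its own T.dest and returns a PathRef to them. This is a minimal sketch; the generatedIndex target is hypothetical:
def generatedIndex = T {
  os.makeDir.all(T.dest)
  // write the output files into this target's private scratch folder...
  os.write(T.dest / "index.txt", allSources().map(_.path.last).mkString("\n"))
  // ...and return a PathRef that hashes those files, so downstream targets
  // only re-evaluate when the written contents actually change
  PathRef(T.dest)
}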
If a target’s inputs change but its output does not, e.g. someone changes a
comment within the source files that doesn’t affect the classfiles, then
downstream targets do not re-evaluate.
This is determined using the .hashCode of the Target's return value.
For targets returning os.Paths that reference files on disk, you can wrap the Path in a PathRef (shown above) whose .hashCode() will include the hashes of all files on disk at time of creation.
The graph of inter-dependent targets is evaluated in topological order; that
means that the body of a target will not even begin to evaluate if one of its
upstream dependencies has failed.
This is unlike normal Scala functions: a plain old function foo would evaluate halfway and then blow up if one of foo's dependencies throws an exception.
Targets cannot take parameters and must be 0-argument defs defined directly within a Module body.
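For instance, a target defined inside a module might look like the following minimal sketch, where the backend module and version target names are hypothetical:
object backend extends Module {
  // a 0-argument def directly inside a Module body
  def version = T { "1.0.0" }
}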
Sources
def sourceRootPath = os.pwd / "src"
def sourceRoots = T.sources { sourceRootPath }
Sources are defined using T.sources {…}, taking one-or-more os.Paths as arguments.
A Source is a subclass of Target[Seq[PathRef]]: this means that its build signature/hashCode depends not just on the path it refers to (e.g. foo/bar/baz) but also the MD5 hash of the filesystem tree under that path.
T.sources also has an overload which takes Seq[PathRef], to let you override-and-extend source lists the same way you would any other T {…} definition:
def additionalSources = T.sources { os.pwd / "additionalSources" }
override def sourceRoots = T.sources { super.sourceRoots() ++ additionalSources() }
Commands
def run(mainClsName: String) = T.command {
  os.proc("java", "-cp", classFiles().path, mainClsName).call()
}
Defined using T.command {…} syntax, Commands can run arbitrary code, with dependencies declared using the same foo() syntax (e.g. classFiles() above).
Commands can be parametrized, but their output is not cached, so they will
re-evaluate every time even if none of their inputs have changed.
A command with no parameters is defined as def myCommand() = T.command {…}. It is a compile error if the () is missing.
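For example, the following minimal sketch defines a zero-parameter command (the showClasspath name is hypothetical); note the required empty parameter list:
def showClasspath() = T.command {
  println(classFiles().path)
}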
Like Targets, a command only evaluates after all its upstream dependencies have completed, and will not begin to run if any upstream dependency has failed.
Commands are assigned the same scratch/output folder out/run/dest/ as Targets, and their returned metadata is stored at the same out/run/meta.json path for consumption by external tools.
Commands can only be defined directly within a Module body.
Other Tasks
- Anonymous Tasks, defined using T.task {…}
- Persistent Targets, defined using T.persistent {…}
- Inputs, defined using T.input {…}
- Workers, defined using T.worker {…}
Anonymous Tasks
def foo(x: Int) = T.task { ... x ... bar() ... }
You can define anonymous tasks using the T.task {…} syntax.
These are not runnable from the command-line, but can be used to share common code you find yourself repeating in Targets and Commands.
def downstreamTarget = T { ... foo(42)() ... }
def downstreamCommand(x: Int) = T.command { ... foo(x)() ... }
An anonymous task's output does not need to be JSON-serializable, its output is not cached, and it can be defined with or without arguments. Unlike Targets or Commands, anonymous tasks can be defined anywhere and passed around any way you want, until you finally make use of them within a downstream target or command.
While an anonymous task foo's own output is not cached, if it is used in a downstream target baz and the upstream target bar hasn't changed, baz's cached output will be used and foo's evaluation will be skipped altogether.
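Concretely, that relationship might look like the following minimal sketch, using the hypothetical names from the paragraph above:
def bar = T { os.read(os.pwd / "config.txt") }
// anonymous task: not cached, just shared helper code
def foo(x: Int) = T.task { bar() + " repeated " + x + " times" }
// if bar is unchanged, baz's cached output is reused and foo never runs
def baz = T { foo(42)() }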
Persistent Targets
def foo = T.persistent { ... }
Identical to Targets, except that the dest/ folder is not cleared in between runs.
This is useful if you are running external incremental-compilers, such as Scala's Zinc or Javascript's WebPack, which rely on filesystem caches to speed up incremental execution of their particular build step.
Since Mill no longer forces a "clean slate" re-evaluation of T.persistent
targets, it is up to you to ensure your code (or the third-party incremental
compilers you rely on!) are deterministic. They should always converge to the
same outputs for a given set of inputs, regardless of what builds and what
filesystem states existed before.
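A minimal sketch of the pattern follows; the compressedSources target is hypothetical and the byte-copy stands in for real compression work. Files already present in dest/ from a previous run are skipped:
def compressedSources = T.persistent {
  os.makeDir.all(T.dest / "compressed")
  for (ref <- allSources()) {
    val out = T.dest / "compressed" / (ref.path.last + ".gz")
    // dest/ survives between runs, so only newly-seen files are processed
    if (!os.exists(out)) os.write(out, os.read.bytes(ref.path))
  }
  PathRef(T.dest / "compressed")
}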
Inputs
def foo = T.input { ... }
A generalization of Sources, T.inputs are tasks that re-evaluate every time (unlike Anonymous Tasks), containing an arbitrary block of code.
Inputs can be used to force re-evaluation of some external property that may affect your build. For example, if I have a Target bar that makes use of the current git version:
def bar = T { ... os.proc("git", "rev-parse", "HEAD").call().out.text() ... }
bar will not know that git rev-parse can change, and will not know to re-evaluate when your git rev-parse HEAD does change. This means bar will continue to use any previously cached value, and bar's output will be out of date!
To fix this, you can wrap your git rev-parse HEAD in a T.input:
def foo = T.input { os.proc("git", "rev-parse", "HEAD").call().out.text() }
def bar = T { ... foo() ... }
This makes foo re-evaluate every build; if git rev-parse HEAD does not change, that will not invalidate bar's caches. But if git rev-parse HEAD does change, foo's output will change and bar will be correctly invalidated and re-compute using the new version of foo.
Note that because T.inputs re-evaluate every time, you should ensure that the code you put in T.input runs quickly. Ideally it should just be a simple check "did anything change?", and any heavy-lifting should be delegated to downstream targets.
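Another quick check along the same lines is reading an environment variable. The following is a minimal sketch; the deployEnv and deployConfig names are hypothetical, and it assumes the T.env context API for reading the environment:
// re-evaluates every run, but only invalidates downstream targets
// when the variable's value actually changes
def deployEnv = T.input { T.env.getOrElse("DEPLOY_ENV", "dev") }
def deployConfig = T { "environment=" + deployEnv() }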
Workers
def foo = T.worker { ... }
Most tasks dispose of their in-memory return-value every evaluation; in the case
of Targets, this is stored on disk and loaded next time if
necessary, while Commands just re-compute them each time.
Even if you use --watch
or the Build REPL to keep the Mill process running, all this state is still discarded and re-built every evaluation.
Workers are unique in that they store their in-memory return-value between evaluations. This makes them useful for storing in-memory caches or references to long-lived external worker processes that you can re-use.
Mill uses workers to manage long-lived instances of the Zinc Incremental Scala Compiler and the Scala.js Optimizer. This lets us keep them in-memory with warm caches and fast incremental execution.
Like Persistent Targets, Workers inherently involve mutable state, and it is up to the implementation to ensure that this mutable state is only used for caching/performance and does not affect the externally-visible behavior of the worker.
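A minimal sketch of the idea follows; the FileHasher class and the hasherWorker/hashedSources names are hypothetical. The worker instance, and therefore its in-memory cache, survives across evaluations for as long as the Mill process is alive:
class FileHasher {
  // lives as long as the worker does, i.e. across many evaluations
  private val cache = collection.mutable.Map.empty[os.Path, Int]
  def hash(p: os.Path): Int =
    cache.getOrElseUpdate(p, java.util.Arrays.hashCode(os.read.bytes(p)))
}
def hasherWorker = T.worker { new FileHasher() }
def hashedSources = T { allSources().map(ref => hasherWorker().hash(ref.path)) }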
Task Cheat Sheet
The following table might help you make sense of the small collection of different Task types:
| | Target | Command | Source/Input | Anonymous Task | Persistent Target | Worker |
|---|---|---|---|---|---|---|
| Cached to Disk | X | | | | X | |
| Must be JSON Writable | X | X | | | X | |
| Must be JSON Readable | X | | | | X | |
| Runnable from the Command Line | X | X | X | | X | |
| Can Take Arguments | | X | | X | | |
| Cached between Evaluations | | | | | | X |