Tasks
Task Graphs
One of Mill’s core abstractions is its Task Graph: this is how Mill defines, orders and caches work it needs to do, and exists independently of any support for building Scala.
The following is a simple self-contained example using Mill to compile Java:
import mill._, mill.modules.Jvm

// sourceRoot -> allSources -> classFiles
//                                 |
//                                 v
//           resourceRoot ----> jar

def sourceRoot = T.sources { os.pwd / "src" }
def resourceRoot = T.sources { os.pwd / "resources" }

def allSources = T { sourceRoot().flatMap(p => os.walk(p.path)).map(PathRef(_)) }

def classFiles = T {
  os.makeDir.all(T.dest)
  os.proc("javac", allSources().map(_.path.toString()), "-d", T.dest)
    .call(cwd = T.dest)
  PathRef(T.dest)
}

def jar = T { Jvm.createJar(Agg(classFiles().path) ++ resourceRoot().map(_.path)) }

def run(mainClsName: String) = T.command {
  os.proc("java", "-cp", classFiles().path, mainClsName).call()
}
Here, we have two T.sources, sourceRoot and resourceRoot, which act as the roots of our task graph. allSources depends on sourceRoot by calling sourceRoot() to extract its value, classFiles depends on allSources the same way, and jar depends on both classFiles and resourceRoot.
Filesystem operations in Mill are done using the os-lib library.
The above build defines the following task graph:
sourceRoot -> allSources -> classFiles
                                |
                                v
          resourceRoot ----> jar
When you first evaluate jar (e.g. via mill jar at the command line), it will evaluate all the defined targets: sourceRoot, allSources, classFiles, resourceRoot and jar.
Subsequent invocations of mill jar will evaluate only as much as is necessary, depending on what input sources changed:
- If the files in sourceRoot change, it will re-evaluate allSources, compiling to classFiles, and building the jar
- If the files in resourceRoot change, it will only re-evaluate jar and use the cached output of allSources and classFiles
Primary Tasks
There are three primary kinds of Tasks that you should care about:
Targets
def allSources = T { sourceRoot().flatMap(p => os.walk(p.path)).map(PathRef(_)) }
Targets are defined using the def foo = T {…} syntax, and dependencies on other targets are defined using foo() to extract the value from them.
Apart from the foo() calls, the T {…} block contains arbitrary code that does some work and returns a result.
Each target, e.g. classFiles, is assigned a path on disk as scratch space & to store its output files at out/classFiles/dest/, and its returned metadata is automatically JSON-serialized and stored at out/classFiles/meta.json.
The return-value of targets has to be JSON-serializable via uPickle.
In case you want to return your own case class (e.g. MyCaseClass), you can make it JSON-serializable by adding the following implicit def to its companion object:
object MyCaseClass {
  implicit def rw: upickle.default.ReadWriter[MyCaseClass] = upickle.default.macroRW
}
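For example, a target could then return MyCaseClass directly. The following is a minimal self-contained sketch, not part of the build above; the summary target and the message/count fields are hypothetical:
case class MyCaseClass(message: String, count: Int)
object MyCaseClass {
  implicit def rw: upickle.default.ReadWriter[MyCaseClass] = upickle.default.macroRW
}
// the returned MyCaseClass is JSON-serialized to out/summary/meta.json like any other target result
def summary = T { MyCaseClass("compiled sources", allSources().length) }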
If you want to return a file or a set of files as the result of a Target, write them to disk within your T.dest folder (available through the Task Context API) and return a PathRef(T.dest) that hashes the files you wrote.
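For instance, a target along these lines writes its output files into its own T.dest and returns a PathRef to them. This is a minimal sketch; the generatedIndex target is hypothetical:
def generatedIndex = T {
  os.makeDir.all(T.dest)
  // write the output files into this target's private scratch folder...
  os.write(T.dest / "index.txt", allSources().map(_.path.last).mkString("\n"))
  // ...and return a PathRef that hashes those files, so downstream targets
  // only re-evaluate when the written contents actually change
  PathRef(T.dest)
}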
If a target’s inputs change but its output does not, e.g. someone changes a
comment within the source files that doesn’t affect the classfiles, then
downstream targets do not re-evaluate.
This is determined using the .hashCode of the Target's return value.
For targets returning os.Paths that reference files on disk, you can wrap the Path in a PathRef (shown above) whose .hashCode() will include the hashes of all files on disk at time of creation.
The graph of inter-dependent targets is evaluated in topological order; that
means that the body of a target will not even begin to evaluate if one of its
upstream dependencies has failed.
This is unlike normal Scala functions: a plain old function foo would evaluate halfway and then blow up if one of foo's dependencies throws an exception.
Targets cannot take parameters and must be 0-argument defs defined directly within a Module body.
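For instance, a target defined inside a module might look like the following minimal sketch, where the backend module and version target names are hypothetical:
object backend extends Module {
  // a 0-argument def directly inside a Module body
  def version = T { "1.0.0" }
}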
Sources
def sourceRootPath = os.pwd / "src"
def sourceRoots = T.sources { sourceRootPath }
Sources are defined using T.sources {…}, taking one-or-more os.Paths as arguments.
A Source is a subclass of Target[Seq[PathRef]]: this means that its build signature/hashCode depends not just on the path it refers to (e.g. foo/bar/baz) but also the MD5 hash of the filesystem tree under that path.
T.sources also has an overload which takes Seq[PathRef], to let you override-and-extend source lists the same way you would any other T {…} definition:
def additionalSources = T.sources { os.pwd / "additionalSources" }
override def sourceRoots = T.sources { super.sourceRoots() ++ additionalSources() }
Commands
def run(mainClsName: String) = T.command {
  os.proc("java", "-cp", classFiles().path, mainClsName).call()
}
Defined using T.command {…} syntax, Commands can run arbitrary code, with dependencies declared using the same foo() syntax (e.g. classFiles() above).
Commands can be parametrized, but their output is not cached, so they will
re-evaluate every time even if none of their inputs have changed.
A command with no parameters is defined as def myCommand() = T.command {…}. It is a compile error if the () is missing.
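For example, the following minimal sketch defines a zero-parameter command (the showClasspath name is hypothetical); note the required empty parameter list:
def showClasspath() = T.command {
  println(classFiles().path)
}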
Like Targets, a command only evaluates after all its upstream dependencies have completed, and will not begin to run if any upstream dependency has failed.
Commands are assigned the same scratch/output folder out/run/dest/ as Targets, and their returned metadata is stored at the same out/run/meta.json path for consumption by external tools.
Commands can only be defined directly within a Module body.
Other Tasks
- Anonymous Tasks, defined using T.task {…}
- Persistent Targets, defined using T.persistent {…}
- Inputs, defined using T.input {…}
- Workers, defined using T.worker {…}
Anonymous Tasks
def foo(x: Int) = T.task { ... x ... bar() ... }
You can define anonymous tasks using the T.task {…} syntax.
These are not runnable from the command-line, but can be used to share common code you find yourself repeating in Targets and Commands.
def downstreamTarget = T { ... foo(42)() ... }
def downstreamCommand(x: Int) = T.command { ... foo(x)() ... }
An anonymous task's output does not need to be JSON-serializable, its output is not cached, and it can be defined with or without arguments. Unlike Targets or Commands, anonymous tasks can be defined anywhere and passed around any way you want, until you finally make use of them within a downstream target or command.
While an anonymous task foo's own output is not cached, if it is used in a downstream target baz and the upstream target bar hasn't changed, baz's cached output will be used and foo's evaluation will be skipped altogether.
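Concretely, that relationship might look like the following minimal sketch, using the hypothetical names from the paragraph above:
def bar = T { os.read(os.pwd / "config.txt") }
// anonymous task: not cached, just shared helper code
def foo(x: Int) = T.task { bar() + " repeated " + x + " times" }
// if bar is unchanged, baz's cached output is reused and foo never runs
def baz = T { foo(42)() }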
Persistent Targets
def foo = T.persistent { ... }
Identical to Targets, except that the dest/ folder is not cleared in between runs.
This is useful if you are running external incremental-compilers, such as Scala's Zinc or Javascript's WebPack, which rely on filesystem caches to speed up incremental execution of their particular build step.
Since Mill no longer forces a "clean slate" re-evaluation of T.persistent
targets, it is up to you to ensure your code (or the third-party incremental
compilers you rely on!) are deterministic. They should always converge to the
same outputs for a given set of inputs, regardless of what builds and what
filesystem states existed before.
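A minimal sketch of the pattern follows; the compressedSources target is hypothetical and the byte-copy stands in for real compression work. Files already present in dest/ from a previous run are skipped:
def compressedSources = T.persistent {
  os.makeDir.all(T.dest / "compressed")
  for (ref <- allSources()) {
    val out = T.dest / "compressed" / (ref.path.last + ".gz")
    // dest/ survives between runs, so only newly-seen files are processed
    if (!os.exists(out)) os.write(out, os.read.bytes(ref.path))
  }
  PathRef(T.dest / "compressed")
}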
Inputs
def foo = T.input { ... }
A generalization of Sources, T.inputs are tasks that re-evaluate every time (unlike Anonymous Tasks), containing an arbitrary block of code.
Inputs can be used to force re-evaluation of some external property that may affect your build. For example, if I have a Target bar that makes use of the current git version:
def bar = T { ... os.proc("git", "rev-parse", "HEAD").call().out.text() ... }
bar will not know that git rev-parse can change, and will not know to re-evaluate when your git rev-parse HEAD does change. This means bar will continue to use any previously cached value, and bar's output will be out of date!
To fix this, you can wrap your git rev-parse HEAD in a T.input:
def foo = T.input { os.proc("git", "rev-parse", "HEAD").call().out.text() }
def bar = T { ... foo() ... }
This makes foo re-evaluate every build; if git rev-parse HEAD does not change, that will not invalidate bar's caches. But if git rev-parse HEAD does change, foo's output will change and bar will be correctly invalidated and re-compute using the new version of foo.
Note that because T.inputs re-evaluate every time, you should ensure that the code you put in T.input runs quickly. Ideally it should just be a simple check "did anything change?", and any heavy-lifting should be delegated to downstream targets.
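Another quick check along the same lines is reading an environment variable. The following is a minimal sketch; the deployEnv and deployConfig names are hypothetical, and it assumes the T.env context API for reading the environment:
// re-evaluates every run, but only invalidates downstream targets
// when the variable's value actually changes
def deployEnv = T.input { T.env.getOrElse("DEPLOY_ENV", "dev") }
def deployConfig = T { "environment=" + deployEnv() }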
Workers
def foo = T.worker { ... }
Most tasks dispose of their in-memory return-value every evaluation; in the case
of Targets, this is stored on disk and loaded next time if
necessary, while Commands just re-compute them each time.
Even if you use --watch
or the Build REPL to keep the Mill process running, all this state is still discarded and re-built every evaluation.
Workers are unique in that they store their in-memory return-value between evaluations. This makes them useful for storing in-memory caches or references to long-lived external worker processes that you can re-use.
Mill uses workers to manage long-lived instances of the Zinc Incremental Scala Compiler and the Scala.js Optimizer. This lets us keep them in-memory with warm caches and fast incremental execution.
Like Persistent Targets, Workers inherently involve mutable state, and it is up to the implementation to ensure that this mutable state is only used for caching/performance and does not affect the externally-visible behavior of the worker.
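A minimal sketch of the idea follows; the FileHasher class and the hasherWorker/hashedSources names are hypothetical. The worker instance, and therefore its in-memory cache, survives across evaluations for as long as the Mill process is alive:
class FileHasher {
  // lives as long as the worker does, i.e. across many evaluations
  private val cache = collection.mutable.Map.empty[os.Path, Int]
  def hash(p: os.Path): Int =
    cache.getOrElseUpdate(p, java.util.Arrays.hashCode(os.read.bytes(p)))
}
def hasherWorker = T.worker { new FileHasher() }
def hashedSources = T { allSources().map(ref => hasherWorker().hash(ref.path)) }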
Task Cheat Sheet
The following table might help you make sense of the small collection of different Task types:
| | Target | Command | Source/Input | Anonymous Task | Persistent Target | Worker |
|---|---|---|---|---|---|---|
| Cached to Disk | X | | | | X | |
| Must be JSON Writable | X | X | | | X | |
| Must be JSON Readable | X | | | | X | |
| Runnable from the Command Line | X | X | X | | X | |
| Can Take Arguments | | X | | X | | |
| Cached between Evaluations | | | | | | X |