Tasks
One of Mill’s core abstractions is its Task Graph: this is how Mill defines, orders and caches work it needs to do, and exists independently of any support for building Scala.
Mill target graphs are primarily built using methods and macros defined on `mill.define.Target`, aliased as `T` for conciseness:
Task Cheat Sheet
The following table might help you make sense of the small collection of different Task types:
| | Target | Command | Source/Input | Anonymous Task | Persistent Target | Worker |
|---|---|---|---|---|---|---|
| Cached to Disk | X | | | | X | |
| JSON Writable | X | X | X | | X | |
| JSON Readable | X | | | | X | |
| CLI Runnable | X | X | | | X | |
| Takes Arguments | | X | | X | | |
| Cached In-Memory | | | | | | X |
The following is a simple self-contained example using Mill to compile Java:
```scala
package build
import mill._

def mainClass: T[Option[String]] = Some("foo.Foo")

def sources = Task.Source(millSourcePath / "src")
def resources = Task.Source(millSourcePath / "resources")

def compile = Task {
  val allSources = os.walk(sources().path)
  os.proc("javac", allSources, "-d", Task.dest).call()
  PathRef(Task.dest)
}

def assembly = Task {
  for (p <- Seq(compile(), resources())) os.copy(p.path, Task.dest, mergeFolders = true)

  val mainFlags = mainClass().toSeq.flatMap(Seq("-e", _))
  os.proc("jar", "-c", mainFlags, "-f", Task.dest / "assembly.jar", ".")
    .call(cwd = Task.dest)

  PathRef(Task.dest / "assembly.jar")
}
```
This code defines the following task graph, with the boxes being the tasks and the arrows representing the data-flow between them:
This example does not use any of Mill’s builtin support for building Java or Scala projects, and instead builds a pipeline "from scratch" using Mill tasks and `javac`/`jar`/`java` subprocesses. We define `Task.Source` folders, plain `T{…}` targets that depend on them, and a `Task.Command`.
```bash
> ./mill show assembly
".../out/assembly.dest/assembly.jar"

> java -jar out/assembly.dest/assembly.jar i am cow
Foo.value: 31337
args: i am cow

> unzip -p out/assembly.dest/assembly.jar foo.txt
My Example Text
```
When you first evaluate `assembly` (e.g. via `mill assembly` at the command line), it will evaluate all the defined targets: `mainClass`, `sources`, `compile`, and `assembly`.
Subsequent invocations of `mill assembly` will evaluate only as much as is necessary, depending on what input sources changed:

- If the files in `sources` change, it will re-evaluate `compile` and `assembly` (red)
- If the files in `resources` change, it will only re-evaluate `assembly` (red) and use the cached output of `compile` (green)
Primary Tasks
There are three primary kinds of Tasks that you should care about:
Sources
```scala
package build
import mill.{Module, T, _}

def sources = Task.Source { millSourcePath / "src" }
def resources = Task.Source { millSourcePath / "resources" }
```
`Source`s are defined using `Task.Source{…}`, taking one `os.Path`, or `Task.Sources{…}`, taking multiple `os.Path`s as arguments. A `Source`'s build signature/`hashCode` depends not just on the path it refers to (e.g. `foo/bar/baz`) but also on the MD5 hash of the filesystem tree under that path.

`Task.Source` and `Task.Sources` are the most common inputs in any Mill build: they watch source files and folders and cause downstream targets to re-compute if a change is detected.
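For instance, a minimal sketch of a `Task.Sources` definition covering more than one folder (the folder names here are hypothetical):

```scala
// Hypothetical: watch two source folders at once; downstream targets
// referencing allSourceRoots() re-compute when either folder changes.
def allSourceRoots = Task.Sources(
  millSourcePath / "src",
  millSourcePath / "src-gen"
)
```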
Targets
```scala
def allSources = Task {
  os.walk(sources().path)
    .filter(_.ext == "java")
    .map(PathRef(_))
}

def lineCount: T[Int] = Task {
  println("Computing line count")
  allSources()
    .map(p => os.read.lines(p.path).size)
    .sum
}
```
`Target`s are defined using the `def foo = Task {…}` syntax, and dependencies on other targets are declared using `foo()` to extract the value from them. Apart from the `foo()` calls, the `Task {…}` block contains arbitrary code that does some work and returns a result.

The `os.walk` and `os.read.lines` statements above are from the OS-Lib library, which provides all common filesystem and subprocess operations for Mill builds. You can see the OS-Lib library documentation for more details.
If a target’s inputs change but its output does not, e.g. someone changes a comment within the source files that doesn’t affect the classfiles, then downstream targets do not re-evaluate. This is determined using the `.hashCode` of the target’s return value.
```bash
> ./mill show lineCount
Computing line count
16

> ./mill show lineCount # line count already cached, doesn't need to be computed
16
```
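As a minimal sketch of this behavior (the `lineCountReport` target here is hypothetical, building on the `lineCount` target above): a downstream target re-evaluates only when the `.hashCode` of the value it extracts changes, not merely because an upstream target re-ran.

```scala
// Hypothetical downstream target: if the sources change but the total
// line count stays the same, lineCount() returns an identical value and
// lineCountReport's cached output is reused without re-evaluation.
def lineCountReport = Task {
  println("Generating report")
  s"Total lines: ${lineCount()}"
}
```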
Furthermore, when code changes occur, targets only invalidate if the code change may directly or indirectly affect them. e.g. adding a comment to `lineCount` will not cause it to recompute:
```diff
 def lineCount: T[Int] = Task {
   println("Computing line count")
+  // Hello World
   allSources()
     .map(p => os.read.lines(p.path).size)
     .sum
```
But changing the code of the target or any upstream helper method will cause the old value to be invalidated and a new value re-computed (with a new `println`) next time it is invoked:
```diff
 def lineCount: T[Int] = Task {
-  println("Computing line count")
+  println("Computing line count!!!")
   allSources()
     .map(p => os.read.lines(p.path).size)
     .sum
```
For more information on how the bytecode analysis necessary for invalidating targets based on code changes works, see PR#2417 that implemented it.
The return-value of targets has to be JSON-serializable via uPickle. You can run targets directly from the command line, or use `show` if you want to see the JSON content or pipe it to external tools. See the uPickle library documentation for more details.
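As a quick sketch (the `sourceSummary` name is hypothetical), builtin data structures such as tuples and collections serialize out of the box:

```scala
// Hypothetical target returning a (count, names) tuple; uPickle
// serializes it automatically to out/sourceSummary.json.
def sourceSummary = Task {
  val names = allSources().map(_.path.last)
  (names.length, names)
}
```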
Task.dest
Each target, e.g. `classFiles`, is assigned a `Task.dest` folder, e.g. `out/classFiles.dest/` on disk, as scratch space and as a place to store its output files, and its returned metadata is automatically JSON-serialized and stored at `out/classFiles.json`. If you want to return a file or a set of files as the result of a `Target`, write them to disk within your `Task.dest` folder and return a `PathRef()` referencing the files or folders you want to return:
```scala
def classFiles = Task {
  println("Generating classfiles")

  os.proc("javac", allSources().map(_.path), "-d", Task.dest)
    .call(cwd = Task.dest)

  PathRef(Task.dest)
}

def jar = Task {
  println("Generating jar")
  os.copy(classFiles().path, Task.dest, mergeFolders = true)
  os.copy(resources().path, Task.dest, mergeFolders = true)

  os.proc("jar", "-cfe", Task.dest / "foo.jar", "foo.Foo", ".").call(cwd = Task.dest)

  PathRef(Task.dest / "foo.jar")
}
```
```bash
> ./mill jar
Generating classfiles
Generating jar

> ./mill show jar
".../out/jar.dest/foo.jar"
```
Note that the `os.pwd` of the Mill process is set to an empty `sandbox/` folder by default. This is to stop you from accidentally reading and writing files to the base repository root, which would cause problems with Mill’s caches not invalidating properly or files from different tasks colliding and causing issues.

You should never use `os.pwd` or rely on the process working directory, and always explicitly use `Task.dest` or the `.path` of upstream `PathRef`s when accessing files. In the rare case where you truly need the Mill project root folder, you can access it via `Task.workspace`.
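A minimal sketch of that rare case (the command name and the idea of listing the root folder are hypothetical):

```scala
// Hypothetical command: explicitly reference the project root via
// Task.workspace rather than relying on os.pwd, which points at an
// empty sandbox/ folder.
def printRootFiles() = Task.Command {
  os.list(Task.workspace).foreach(p => println(p.last))
}
```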
Dependent Targets
Targets can depend on other targets via the `foo()` syntax. The graph of inter-dependent targets is evaluated in topological order; that means that the body of a target will not even begin to evaluate if one of its upstream dependencies has failed. Similarly, even if an upstream target is only used in one branch of an `if` condition, it will be computed regardless, before the `if` condition is even considered.
The following example demonstrates this behavior, with the `println` defined in `def largeFile` running even though the `largeFile()` branch of the `if` conditional does not get used:
```scala
def largeFile = Task {
  println("Finding Largest File")
  allSources()
    .map(_.path)
    .filter(_.ext == "java")
    .maxBy(os.read.lines(_).size)
}

def hugeFileName = T{
  if (lineCount() > 999) largeFile().last
  else "<no-huge-file>"
}
```
```bash
> ./mill show lineCount
16

> ./mill show hugeFileName # This still runs `largeFile` even though `lineCount() < 999`
Finding Largest File
"<no-huge-file>"
```
Custom Types
uPickle comes with built-in support for most Scala primitive types and builtin data structures: tuples, collections, `PathRef`s, etc. can be returned and automatically serialized/de-serialized as necessary. One notable exception is `case class`es: if you want to return your own `case class`, you must mark it JSON-serializable by adding the following `implicit` to its companion object:
```scala
case class ClassFileData(totalFileSize: Long, largestFile: String)
object ClassFileData {
  implicit val rw: upickle.default.ReadWriter[ClassFileData] = upickle.default.macroRW
}

def summarizeClassFileStats = T{
  val files = os.walk(classFiles().path)
  ClassFileData(
    totalFileSize = files.map(os.size(_)).sum,
    largestFile = files.maxBy(os.size(_)).last
  )
}
```
```bash
> ./mill show summarizeClassFileStats
{
  "totalFileSize": ...,
  "largestFile": "..."
}
```
Commands
```scala
def run(mainClass: String, args: String*) = Task.Command {
  os.proc(
    "java",
    "-cp", s"${classFiles().path}:${resources().path}",
    mainClass,
    args
  )
    .call(stdout = os.Inherit)
}
```
Defined using the `Task.Command {…}` syntax, `Command`s can run arbitrary code, with dependencies declared using the same `foo()` syntax (e.g. `classFiles()` above). Commands can be parametrized, but their output is not cached, so they will re-evaluate every time even if none of their inputs have changed.

A command with no parameters is defined as `def myCommand() = Task.Command {…}`. It is a compile error if the `()` is missing.
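For example, a minimal sketch of a zero-argument command (the `printLineCount` name is hypothetical, reusing the `lineCount` target from above):

```scala
// Hypothetical zero-argument command; note the mandatory `()` in the
// definition. Its body re-runs on every invocation.
def printLineCount() = Task.Command {
  println(lineCount())
}
```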
Commands can take command line params, parsed by the MainArgs library. Thus the signature `def run(mainClass: String, args: String*)` takes params of the form `--main-class <str> <arg1> <arg2> … <argn>`:
```bash
> ./mill run --main-class foo.Foo hello world
Foo.value: 31337
args: hello world
foo.txt resource: My Example Text
```
Command line arguments can take most primitive types: `String`, `Int`, `Boolean`, etc., along with `Option[T]` representing optional values, `Seq[T]` representing repeatable values, `mainargs.Flag` representing flags, and `mainargs.Leftover[T]` representing any command line arguments not parsed earlier. Default values for command line arguments are also supported; a sketch of these parameter kinds follows the link below. See the mainargs documentation for more details:

- [MainArgs Library Documentation](MainArgs)
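A minimal sketch of a command exercising these parameter kinds (the command name and parameters are hypothetical):

```scala
// Hypothetical command demonstrating the supported parameter kinds, e.g.
//   ./mill describe --name foo --verbose extra args here
def describe(name: String,                    // required named argument
             count: Int = 1,                  // argument with a default value
             verbose: mainargs.Flag,          // --verbose flag, no value
             rest: mainargs.Leftover[String]) = Task.Command {
  println(s"$name x$count verbose=${verbose.value} rest=${rest.value}")
}
```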
By default, all command parameters need to be named, except for variadic parameters of type `T*` or `mainargs.Leftover[T]`. You can use the flag `--allow-positional-command-args` to allow arbitrary arguments to be passed positionally, as shown below:
```bash
> ./mill run foo.Foo hello world # this raises an error because `--main-class` is not given
error: Missing argument: --mainClass <str>
Expected Signature: run
  --mainClass <str>
  args <str>...
...

> ./mill --allow-positional run foo.Foo hello world # this succeeds due to --allow-positional
Foo.value: 31337
args: hello world
foo.txt resource: My Example Text
```
Like Targets, a command only evaluates after all its upstream dependencies have completed, and will not begin to run if any upstream dependency has failed.
Commands are assigned the same scratch/output folder `out/run.dest/` as Targets are, and their returned metadata is stored at the same `out/run.json` path for consumption by external tools.

Commands can only be defined directly within a `Module` body.
Overrides
Tasks can be overridden, with the overridden task callable via `super`. You can also override a task with a different type of task; e.g. below we override `sourceRoots`, which is a `Task.Sources`, with a `T{}` target:
```scala
trait Foo extends Module {
  def sourceRoots = Task.Sources(millSourcePath / "src")

  def sourceContents = T{
    sourceRoots()
      .flatMap(pref => os.walk(pref.path))
      .filter(_.ext == "txt")
      .sorted
      .map(os.read(_))
  }
}

trait Bar extends Foo {
  def additionalSources = Task.Sources(millSourcePath / "src2")
  def sourceRoots = Task { super.sourceRoots() ++ additionalSources() }
}

object bar extends Bar
```
```bash
> ./mill show bar.sourceContents # includes both source folders
[
  "File Data From src/",
  "File Data From src2/"
]
```
Other Tasks
Anonymous Tasks
```scala
package build
import mill._, define.Task

def data = Task.Source(millSourcePath / "data")

def anonTask(fileName: String): Task[String] = Task.Anon {
  os.read(data().path / fileName)
}

def helloFileData = Task { anonTask("hello.txt")() }

def printFileData(fileName: String) = Task.Command {
  println(anonTask(fileName)())
}
```
You can define anonymous tasks using the `Task.Anon {…}` syntax. These are not runnable from the command-line, but can be used to share common code you find yourself repeating in `Target`s and `Command`s.

An anonymous task’s output does not need to be JSON-serializable, it is not cached, and the task can be defined with or without arguments. Unlike Targets or Commands, anonymous tasks can be defined anywhere and passed around any way you want, until you finally make use of them within a downstream target or command.
While an anonymous task `foo`'s own output is not cached, if it is used in a downstream target `baz` and the upstream target `bar` hasn’t changed, `baz`'s cached output will be used and `foo`'s evaluation will be skipped altogether.
```bash
> ./mill show helloFileData
"Hello"

> ./mill printFileData --file-name hello.txt
Hello

> ./mill printFileData --file-name world.txt
World!
```
Inputs
```scala
package build
import mill._

def myInput = Task.Input {
  os.proc("git", "rev-parse", "HEAD").call(cwd = Task.workspace)
    .out
    .text()
    .trim()
}
```
A generalization of Sources, `Task.Input`s are tasks that re-evaluate every time (unlike Anonymous Tasks), containing an arbitrary block of code.

Inputs can be used to force re-evaluation of some external property that may affect your build. For example, if I have a target `gitStatusTarget` that calls out to `git` to compute the latest commit hash and message directly, that target does not have any `Task` inputs and so will never re-compute even if the external `git` status changes:
```scala
def gitStatusTarget = Task {
  "v-" +
    os.proc("git", "log", "-1", "--pretty=format:%h-%B ")
      .call(cwd = Task.workspace)
      .out
      .text()
      .trim()
}
```
```bash
> git init .
> git commit --allow-empty -m "Initial-Commit"

> ./mill show gitStatusTarget
"v-...-Initial-Commit"

> git commit --allow-empty -m "Second-Commit"

> ./mill show gitStatusTarget # Mill didn't pick up the git change!
"v-...-Initial-Commit"
```
`gitStatusTarget` will not know that the output of `git log` can change, and will not know to re-evaluate when your `git log` does change. This means `gitStatusTarget` will continue to use any previously cached value, and `gitStatusTarget`'s output will be out of date!

To fix this, you can wrap your `git log` in a `Task.Input`:
```scala
def gitStatusInput = Task.Input {
  os.proc("git", "log", "-1", "--pretty=format:%h-%B ")
    .call(cwd = Task.workspace)
    .out
    .text()
    .trim()
}

def gitStatusTarget2 = Task { "v-" + gitStatusInput() }
```
This makes `gitStatusInput` always re-evaluate on every build; only if the output of `gitStatusInput` changes will `gitStatusTarget2` re-compute:
```bash
> git commit --allow-empty -m "Initial-Commit"

> ./mill show gitStatusTarget2
"v-...-Initial-Commit"

> git commit --allow-empty -m "Second-Commit"

> ./mill show gitStatusTarget2 # Mill picked up git change
"v-...-Second-Commit"
```
Note that because `Task.Input`s re-evaluate every time, you should ensure that the code you put in `Task.Input` runs quickly. Ideally it should just be a simple check "did anything change?", and any heavy-lifting should be delegated to downstream targets where it can be cached if possible.
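A minimal sketch of that split (the folder name and both task names are hypothetical): the input is a cheap timestamp scan, while the expensive work lives in a downstream target where Mill can cache it.

```scala
// Hypothetical: a fast Input that just scans modification times...
def dataLastModified = Task.Input {
  os.walk(Task.workspace / "data").map(os.mtime(_)).maxOption.getOrElse(0L)
}

// ...and a cached downstream target that only re-runs when the
// timestamp scan's result actually changes.
def expensiveDataSummary = Task {
  dataLastModified()
  // heavy processing of the data folder would go here
}
```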
Persistent Targets
Persistent targets, defined using `Task.Persistent`, are similar to normal `Target`s, except their `Task.dest` folder is not cleared before every evaluation. This makes them useful for caching things on disk in a more fine-grained manner than Mill’s own Target-level caching.

Below is a semi-realistic example of using a `Task.Persistent` target:
```scala
package build
import mill._, scalalib._
import java.util.Arrays
import java.io.ByteArrayOutputStream
import java.util.zip.GZIPOutputStream

def data = Task.Source(millSourcePath / "data")

def compressedData = Task.Persistent{
  println("Evaluating compressedData")
  os.makeDir.all(Task.dest / "cache")
  os.remove.all(Task.dest / "compressed")

  for (p <- os.list(data().path)) {
    val compressedPath = Task.dest / "compressed" / s"${p.last}.gz"
    val bytes = os.read.bytes(p)
    val hash = Arrays.hashCode(bytes)
    val cachedPath = Task.dest / "cache" / hash.toHexString
    if (!os.exists(cachedPath)) {
      println("Compressing: " + p.last)
      os.write(cachedPath, compressBytes(bytes))
    } else {
      println("Reading Cached from disk: " + p.last)
    }
    os.copy(cachedPath, compressedPath, createFolders = true)
  }

  os.list(Task.dest / "compressed").map(PathRef(_))
}

def compressBytes(input: Array[Byte]) = {
  val bos = new ByteArrayOutputStream(input.length)
  val gzip = new GZIPOutputStream(bos)
  gzip.write(input)
  gzip.close()
  bos.toByteArray
}
```
In this example, we implement a `compressedData` target that takes the folder of files in `data` and compresses them, while maintaining a cache of compressed contents for each file. That means that if the `data` folder is modified, but some files remain unchanged, those files would not be unnecessarily re-compressed when `compressedData` evaluates.
Since persistent targets have long-lived state on disk that lives beyond a single evaluation, this raises the possibility of the disk contents getting into a bad state and causing all future evaluations to fail. It is left up to the person implementing the `Task.Persistent` target to ensure their implementation is eventually consistent. You can also use `mill clean` to manually purge the disk contents and start fresh.
```bash
> ./mill show compressedData
Evaluating compressedData
Compressing: hello.txt
Compressing: world.txt
[
  ".../hello.txt.gz",
  ".../world.txt.gz"
]

> ./mill compressedData # when no input changes, compressedData does not evaluate at all

> sed -i.bak 's/Hello/HELLO/g' data/hello.txt

> ./mill compressedData # when one input file changes, only that file is re-compressed
Compressing: hello.txt
Reading Cached from disk: world.txt

> ./mill clean compressedData
> ./mill compressedData
Evaluating compressedData
Compressing: hello.txt
Compressing: world.txt
```
Workers
Mill workers, defined using `Task.Worker`, are long-lived in-memory objects that can persist across multiple evaluations. These are similar to persistent targets in that they let you cache things, but caching the worker object in-memory allows for greater performance and flexibility: you are no longer limited to caching only serializable data and paying the cost of serializing it to disk every evaluation. This example uses a Worker to provide simple in-memory caching for compressed files.
```scala
package build
import mill._, scalalib._
import java.util.Arrays
import java.io.ByteArrayOutputStream
import java.util.zip.GZIPOutputStream

def data = Task.Source(millSourcePath / "data")

def compressWorker = Task.Worker { new CompressWorker(Task.dest) }

def compressedData = T{
  println("Evaluating compressedData")
  for (p <- os.list(data().path)) {
    os.write(
      Task.dest / s"${p.last}.gz",
      compressWorker().compress(p.last, os.read.bytes(p))
    )
  }
  os.list(Task.dest).map(PathRef(_))
}

class CompressWorker(dest: os.Path) {
  val cache = collection.mutable.Map.empty[Int, Array[Byte]]
  def compress(name: String, bytes: Array[Byte]): Array[Byte] = {
    val hash = Arrays.hashCode(bytes)
    if (!cache.contains(hash)) {
      val cachedPath = dest / hash.toHexString
      if (!os.exists(cachedPath)) {
        println("Compressing: " + name)
        cache(hash) = compressBytes(bytes)
        os.write(cachedPath, cache(hash))
      } else {
        println("Cached from disk: " + name)
        cache(hash) = os.read.bytes(cachedPath)
      }
    } else {
      println("Cached from memory: " + name)
    }
    cache(hash)
  }
}

def compressBytes(input: Array[Byte]) = {
  val bos = new ByteArrayOutputStream(input.length)
  val gzip = new GZIPOutputStream(bos)
  gzip.write(input)
  gzip.close()
  bos.toByteArray
}
```
Common things to put in workers include:

- References to third-party daemon processes, e.g. Webpack or wkhtmltopdf, which perform their own in-memory caching
- Classloaders containing plugin code, to avoid classpath conflicts while also avoiding the classloading cost every time the code is executed (see the sketch after this list)
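A minimal sketch of the classloader case (the `pluginClasspath` task and the idea of loading plugin code this way are hypothetical):

```scala
// Hypothetical worker holding a classloader for plugin code; the
// classloader stays warm in memory across evaluations, assuming
// pluginClasspath is a task returning the plugin jars as PathRefs.
def pluginWorker = Task.Worker {
  new java.net.URLClassLoader(
    pluginClasspath().map(_.path.toNIO.toUri.toURL).toArray,
    getClass.getClassLoader
  )
}
```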
Workers live as long as the Mill process. By default, consecutive `mill` commands in the same folder will re-use the same Mill process and workers, unless `--no-server` is passed, which will terminate the Mill process and workers after every command. Commands run repeatedly using `--watch` will also preserve the workers between them.
Workers can also make use of their `Task.dest` folder as a cache that persists when the worker shuts down, as a second layer of caching. The example usage below demonstrates how using the `--no-server` flag will make the worker read from its disk cache, where it would have normally read from its in-memory cache:
```bash
> ./mill show compressedData
Evaluating compressedData
Compressing: hello.txt
Compressing: world.txt
[
  ".../hello.txt.gz",
  ".../world.txt.gz"
]

> ./mill compressedData # when no input changes, compressedData does not evaluate at all

> sed -i.bak 's/Hello/HELLO/g' data/hello.txt

> ./mill compressedData # without --no-server, we read the data from memory
Compressing: hello.txt
Cached from memory: world.txt

> ./mill --no-server compressedData # with --no-server, we read the data from disk
Compressing: hello.txt
Cached from disk: world.txt
```
Mill uses workers to manage long-lived instances of the Zinc Incremental Scala Compiler and the Scala.js Optimizer. This lets us keep them in-memory with warm caches and fast incremental execution.
Autocloseable Workers
As Workers may also hold limited resources, it may be necessary to free up these resources once a worker is no longer needed. This is especially the case when your worker tasks depend on other tasks and those tasks change, as Mill will then also create a new worker instance.
To implement resource cleanup, your worker can implement `java.lang.AutoCloseable`. Once the worker is no longer needed, Mill will call the `close()` method on it before any newer version of this worker is created.
```scala
import mill._
import java.lang.AutoCloseable

class MyWorker() extends AutoCloseable {
  // ...
  override def close() = { /* cleanup and free resources */ }
}

def myWorker = Task.Worker { new MyWorker() }
```
Using ScalaModule.run as a task
```scala
package build
import mill._, scalalib._
import mill.util.Jvm

object foo extends ScalaModule {
  def scalaVersion = "2.13.8"
  def moduleDeps = Seq(bar)
  def ivyDeps = Agg(ivy"com.lihaoyi::mainargs:0.4.0")

  def sources = T{
    bar.runner().run(args = super.sources())
    Seq(PathRef(Task.dest))
  }
}

object bar extends ScalaModule {
  def scalaVersion = "2.13.8"
  def ivyDeps = Agg(ivy"com.lihaoyi::os-lib:0.10.7")
}
```
This example demonstrates using Mill `ScalaModule`s as build tasks: rather than defining the task logic in the `build.mill`, we instead put the build logic within the `bar` module as `bar/src/Bar.scala`. In this example, we use `Bar.scala` as a source-code pre-processor on the `foo` module source code: we override `foo.sources`, passing the `super.sources()` and `bar.runClasspath` to `bar.runner().run` along with a `Task.dest`, and returning a `PathRef(Task.dest)` as the new `foo.sources`.
```bash
> mill foo.run
...
Foo.value: HELLO
```
This example does a trivial string-replace of "hello" with "HELLO", but is enough to demonstrate how you can use Mill `ScalaModule`s to implement your own arbitrarily complex transformations. This is useful for build logic that may not fit nicely inside a `build.mill` file, whether due to the sheer lines of code or due to dependencies that may conflict with the Mill classpath present in `build.mill`.
`bar.runner().run` by default inherits the `mainClass`, `forkEnv`, and `forkArgs` from the owning module `bar`, and the working directory from the calling task’s `Task.dest`. You can also pass these parameters explicitly to `run()` as named arguments if you wish to override the defaults:
```scala
trait Runner{
  def run(args: os.Shellable,
          mainClass: String = null,
          forkArgs: Seq[String] = null,
          forkEnv: Map[String, String] = null,
          workingDir: os.Path = null,
          useCpPassingJar: java.lang.Boolean = null)
         (implicit ctx: Ctx): Unit
}
```