The Mill Execution Model
This page does a deep dive on how Mill evaluates your build tasks, so you can better understand what Mill is doing behind the scenes when building your project.
Example Project
For the purposes of this article, we will be using the following example build as the basis for discussion:
```scala
// build.mill
package build
import mill._, javalib._

object foo extends JavaModule {}

object bar extends JavaModule {
  def moduleDeps = Seq(foo)

  /** Total number of lines in module source files */
  def lineCount = Task {
    allSourceFiles().map(f => os.read.lines(f.path).size).sum
  }

  /** Generate resources using lineCount of sources */
  override def resources = Task {
    os.write(Task.dest / "line-count.txt", "" + lineCount())
    Seq(PathRef(Task.dest))
  }
}
```
This is a simple two-module build with two JavaModules, one depending on the other. The custom task bar.lineCount feeds an override of bar.resources that replaces the default resources/ folder with a generated resource file for use at runtime, as a simple example of Custom Build Logic.
This expects the source layout:

```
foo/
    src/
        *.java files
    package.mill (optional)
bar/
    src/
        *.java files
    package.mill (optional)
build.mill
```
You can operate on this build via commands such as:

```bash
> ./mill bar.compile
> ./mill foo.run
> ./mill _.assembly # evaluates both foo.assembly and bar.assembly
```

For the purposes of this article, we will consider what happens when you run ./mill _.assembly on the above example codebase.
Primary Phases
Compilation
Initial .mill build files:

```
bar/
    package.mill # optional
foo/
    package.mill # optional
build.mill
```
This stage involves compiling your build.mill and any subfolder package.mill files into JVM classfiles. Mill build files are written in Scala, so this is done using the normal Mill Scala compilation toolchain (mill.scalalib.ScalaModule), with some minor pre-processing to turn .mill files into valid .scala files.
Compilation of your build is global but incremental: running any ./mill command requires compiling all build.mill and package.mill files in your entire project, which can take some time the first time you run a ./mill command in a project. However, once that is done, updates to any .mill file are re-compiled incrementally, so that edits take effect relatively quickly even in large projects.
After compilation, the .mill files are converted into JVM classfiles as shown below:

```
bar/
    package.class
foo/
    package.class
build.class
```
These classfiles are dynamically loaded into the Mill process and instantiated into a concrete Mill RootModule object, which is then used in the subsequent phases below:
Resolution
Resolution converts the Mill task selector _.assembly given from the command line into a list of Tasks. This explores the build and package files compiled in the Compilation step above, instantiates the Modules and Tasks as necessary, and returns the list of final tasks that were selected by the selector:
Mill starts from the RootModule instantiated after Compilation, and uses Java reflection to walk the tree of modules and tasks to find the tasks that match your given selector.

Task and module resolution is lazy, so only modules that are required by the given selector _.assembly are instantiated. This can help keep task resolution fast even when working within a large codebase, by avoiding instantiation of modules that are unrelated to the selector you are running.
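The wildcard expansion above can be sketched in plain Scala. This is an illustrative model only: Mill's real resolver walks the RootModule via reflection, and the names Mod and resolve here are made up.

```scala
// Illustrative sketch: how a wildcard selector like `_.assembly` expands
// against the modules of a build. Not Mill's real internals.
case class Mod(name: String, tasks: Set[String])

val modules = Seq(
  Mod("foo", Set("compile", "assembly")),
  Mod("bar", Set("compile", "assembly", "lineCount"))
)

// `_` in the module position matches every top-level module
def resolve(selector: String): Seq[String] = {
  val parts = selector.split('.')
  val (modSel, taskSel) = (parts(0), parts(1))
  for {
    m <- modules
    if modSel == "_" || modSel == m.name
    if m.tasks.contains(taskSel)
  } yield s"${m.name}.$taskSel"
}
```

Note that only modules whose names survive the modSel filter ever have their tasks inspected, loosely mirroring how Mill skips instantiating unrelated modules.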
Planning
Planning is the step of turning the tasks selected during Resolution into a full build graph that includes all transitive upstream dependencies. This is done by traversing the graph of task dependencies, generating a (simplified) task graph as shown below:
In this graph, we can see that even though Resolution only selected foo.assembly and bar.assembly, their upstream task graph requires tasks such as foo.compile and bar.compile, as well as our custom task bar.lineCount and our override of bar.resources.
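The transitive walk that Planning performs can be sketched as follows. The edges below approximate the example build's task graph for illustration; they are not Mill's exact internal representation.

```scala
// Illustrative sketch: Planning collects all transitive upstream
// dependencies of the selected tasks by walking dependency edges.
val dependsOn: Map[String, Seq[String]] = Map(
  "foo.sources"   -> Nil,
  "foo.compile"   -> Seq("foo.sources"),
  "foo.classPath" -> Seq("foo.compile"),
  "foo.assembly"  -> Seq("foo.classPath"),
  "bar.sources"   -> Nil,
  "bar.compile"   -> Seq("bar.sources", "foo.compile"),
  "bar.classPath" -> Seq("bar.compile"),
  "bar.lineCount" -> Seq("bar.sources"),
  "bar.resources" -> Seq("bar.lineCount"),
  "bar.assembly"  -> Seq("bar.classPath", "bar.resources")
)

// The full plan for a selection is every task reachable upstream from it
def plan(selected: Seq[String]): Set[String] = {
  def upstream(t: String): Set[String] =
    dependsOn(t).flatMap(upstream).toSet + t
  selected.flatMap(upstream).toSet
}
```

For example, planning bar.assembly pulls in foo.compile (via bar.compile) even though nothing under foo was selected.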
Evaluation
The last phase is evaluation. Which tasks execute depends not only on the tasks you selected at the command line and those discovered during Resolution, but also on which input files changed on disk. Tasks that were not affected by input changes may have their value loaded from cache (if already evaluated earlier) or be skipped entirely (e.g. due to Selective Execution).
For example, a change to foo/src/*.java would affect the foo.sources task, which would invalidate and cause evaluation of the tasks highlighted in red below:

On the other hand, a change to bar/src/*.java would affect the bar.sources task, which would invalidate and cause evaluation of the tasks highlighted in red below:
In the example change to bar/src/*.java, Mill may also take the opportunity to parallelize things:

- bar.compile and bar.classPath can run on a separate thread from bar.lineCount and bar.resources
- bar.assembly must wait for both bar.classPath and bar.resources to complete before proceeding
This parallelization is automatically done by Mill, and requires no effort from the user to enable. The exact parallelism may depend on the number of CPU cores available and exactly when each task starts and how long it takes to run, but Mill will generally parallelize things where possible to minimize the time taken to execute your tasks.
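One way to picture this automatic parallelism is to model each task as a Future that completes only after the Futures of its dependencies. This is a sketch of the idea, not Mill's actual scheduler; the deps map and run helper are made up:

```scala
import scala.concurrent._
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Illustrative sketch: the bar.compile -> bar.classPath chain and the
// bar.lineCount -> bar.resources chain can proceed on separate threads,
// while bar.assembly waits for both to finish.
val deps: Map[String, Seq[String]] = Map(
  "bar.compile"   -> Nil,
  "bar.lineCount" -> Nil,
  "bar.classPath" -> Seq("bar.compile"),
  "bar.resources" -> Seq("bar.lineCount"),
  "bar.assembly"  -> Seq("bar.classPath", "bar.resources")
)

// Records the order in which task bodies actually complete
val order = new java.util.concurrent.ConcurrentLinkedQueue[String]()
val started = collection.mutable.Map.empty[String, Future[Unit]]

// Memoize each task's Future so shared dependencies only run once
def run(task: String): Future[Unit] =
  started.getOrElseUpdate(task,
    Future.sequence(deps(task).map(run)).map { _ => order.add(task); () }
  )
```

Awaiting run("bar.assembly") executes every task exactly once, with each task guaranteed to finish after all of its dependencies.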
Some other things to note:

- Tasks have their metadata cached to <task>.json files in the out/ folder, while any files created by the task are cached in <task>.dest/ folders. These file paths are all automatically assigned by Mill.
- Mill treats builtin tasks (e.g. compile) and user-defined tasks (e.g. lineCount) exactly the same. Both are automatically cached or skipped when not needed, and parallelized where possible. This happens without the task author needing to do anything to enable caching or parallelization.
- Mill evaluation does not care about the module structure of foo and bar. Mill modules are simply a way to define and re-use parts of the task graph; it is the task graph that matters during evaluation.
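As a small illustration of the first point, the mapping from a task label to its paths under out/ is purely mechanical. The helper names below are made up, approximating that layout rather than reproducing Mill's real API:

```scala
// Illustrative sketch: Mill stores each task's metadata and output files
// under out/, at paths derived from the task's label.
def metadataPath(taskLabel: String): String =
  "out/" + taskLabel.split('.').mkString("/") + ".json"

def destPath(taskLabel: String): String =
  "out/" + taskLabel.split('.').mkString("/") + ".dest"
```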
Bootstrapping
One part of the Mill evaluation model that is skimmed over above is what happens before Compilation: how does Mill actually get everything necessary to compile your build.mill and package.mill files? This is called bootstrapping, and proceeds roughly in the following phases:
- Mill's bootstrap script first checks if the right version of Mill is already present, and if not it downloads the assembly jar to ~/.mill/download
- Mill instantiates an in-memory MillBuildRootModule.BootstrapModule, which is a hard-coded build.mill used for bootstrapping Mill
- If there is a meta-build present at mill-build/build.mill, Mill processes that first and uses the MillBuildRootModule returned for the next steps. Otherwise it uses the MillBuildRootModule.BootstrapModule directly
- Mill evaluates the MillBuildRootModule to parse the build.mill, generate a list of ivyDeps as well as appropriately wrapped Scala code that it can compile, and compiles it to classfiles (Compilation above)
Most users do not need to care about the details of the Mill bootstrapping process, except to know that you only need a JVM installed to begin with: Mill will download everything else necessary from the standard Maven Central package repository, starting from just the bootstrap script (available as ./mill for Linux/Mac and ./mill.bat for Windows). The documentation for The Mill Meta Build goes into more detail on how you can configure and make use of it.
Consequences of the Mill Execution Model
This four-phase evaluation model has consequences for how you structure your build. For example:
- You can have arbitrary code outside of Tasks that helps set up your task graph and module hierarchy, e.g. computing what keys exist in a Cross module, or specifying your def moduleDeps. This code runs during Resolution
- You can have arbitrary code inside of Tasks, to perform your build actions. This code runs during Evaluation
- But your code inside of Tasks cannot influence the shape of the task graph or module hierarchy, as all Resolution logic happens before any Evaluation of the Tasks' bodies
This should not be a problem for most builds, but it is something to be aware of. In general, we have found that having "two places" to put code - outside of Tasks to run during Resolution, or inside of Tasks to run during Evaluation - provides enough flexibility for most use cases. You can generally just write the "direct style" business logic you need - in the example above, counting the lines in allSourceFiles - and Mill handles all the caching, invalidation, and parallelism for you without any additional work.
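A sketch of the "two places" split in a hypothetical build.mill fragment follows. The module, task, and cross keys are invented for illustration, and the Cross syntax shown assumes a recent Mill version:

```scala
// Hypothetical build.mill fragment illustrating the "two places" rule.
package build
import mill._, javalib._

// OUTSIDE any Task: this declaration runs during Resolution and shapes
// the module hierarchy, here by fixing which Cross keys exist
object app extends Cross[AppModule]("11", "17")

trait AppModule extends Cross.Module[String] with JavaModule {
  // INSIDE a Task: runs during Evaluation; plain "direct style" logic,
  // but it cannot add or remove modules or tasks from the graph
  def versionFile = Task {
    os.write(Task.dest / "version.txt", crossValue)
    PathRef(Task.dest / "version.txt")
  }
}
```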
Caching in Mill
Apart from fine-grained caching of Tasks during Evaluation, Mill also performs incremental evaluation of the other phases. This helps ensure the overall workflow remains fast even for large projects:

- Compilation:
  - Done on-demand and incrementally using the Scala incremental compiler Zinc.
  - If some of the files build.mill imported changed but not others, only the changed files are re-compiled before the RootModule is re-instantiated
  - In the common case where build.mill was not changed at all, this step is skipped entirely and the RootModule object is simply re-used from the last run.
- Resolution:
  - If the RootModule was re-used, then all previously-instantiated modules are simply re-used
  - Any modules that are lazily instantiated during Resolution are also re-used.
- Planning:
  - Planning is relatively quick most of the time, and is not currently cached.
- Evaluation:
  - Tasks are evaluated in dependency order
  - Cached Tasks only re-evaluate if their input Tasks change.
  - Persistent Tasks preserve the Task.dest folder on disk between runs, allowing for finer-grained caching than Mill's default task-by-task caching and invalidation
  - Workers are kept in-memory between runs where possible, and only invalidated if their input Tasks change as well.
  - Tasks in general are invalidated if the code they depend on changes, at a method-level granularity via callgraph reachability analysis. See #2417 for more details
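The core idea behind cached Tasks re-evaluating only when their inputs change can be sketched with a simple input-hash check. The cached helper below is made up for illustration; Mill's real implementation persists these hashes in the out/ folder:

```scala
// Illustrative sketch: a task body re-runs only when the hash of its
// inputs differs from what was recorded on the previous run.
val cache = collection.mutable.Map.empty[String, (Int, Int)]
var bodyRuns = 0 // counts how often a task body actually executes

def cached(task: String, inputs: Seq[Any])(body: => Int): Int = {
  val inputHash = inputs.hashCode
  cache.get(task) match {
    case Some((h, v)) if h == inputHash => v // inputs unchanged: reuse
    case _ =>
      bodyRuns += 1
      val v = body
      cache(task) = (inputHash, v)
      v
  }
}
```

Re-running with unchanged inputs returns the cached value without executing the body; changing any input invalidates the entry and forces a re-run.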
This approach to caching does assume a certain programming style inside your Mill build:

- Mill may-or-may-not instantiate the modules in your build.mill the first time you run something (due to laziness)
- Mill may-or-may-not re-instantiate the modules in your build.mill in subsequent runs (due to caching)
- Mill may-or-may-not re-execute any particular task, depending on caching, but your code needs to work either way
- Execution of any task may-or-may-not happen in parallel with other unrelated tasks, and may happen in arbitrary order; your build code needs to work regardless of the order in which tasks execute

However, for code written in a typical Scala style, which tends to avoid side effects and limits filesystem operations to the Task.dest folder, this is not a problem at all.
One thing to note is for code that runs during Resolution: any reading of external mutable state needs to be wrapped in an interp.watchValue{...} wrapper. This ensures that Mill knows where these external reads are, so that it can check if their value changed and, if so, re-instantiate RootModule with the new value.
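For instance, a sketch of what that looks like in a build.mill fragment follows; the environment variable and task names are invented for illustration:

```scala
// Hypothetical build.mill fragment: reading external mutable state (an
// environment variable) during Resolution, wrapped in interp.watchValue
// so Mill can detect when it changes between runs
package build
import mill._, javalib._

// If DEPLOY_ENV changes between runs, Mill re-instantiates the
// RootModule with the new value
val deployEnv = interp.watchValue(sys.env.getOrElse("DEPLOY_ENV", "dev"))

object foo extends JavaModule {
  def deployConfig = Task {
    os.write(Task.dest / "env.txt", deployEnv)
    PathRef(Task.dest / "env.txt")
  }
}
```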