The Mill Execution Model

This page takes a deep dive into how Mill evaluates your build tasks, so you can better understand what Mill is doing behind the scenes when building your project.

Example Project

For the purposes of this article, we will be using the following example build as the basis for discussion:

// build.mill
package build
import mill._, javalib._

object foo extends JavaModule {}

object bar extends JavaModule {
  def moduleDeps = Seq(foo)

  /** Total number of lines in module source files */
  def lineCount = Task {
    allSourceFiles().map(f => os.read.lines(f.path).size).sum
  }

  /** Generate resources using lineCount of sources */
  override def resources = Task {
    os.write(Task.dest / "line-count.txt", "" + lineCount())
    Seq(PathRef(Task.dest))
  }
}

This is a simple two-module build, with two JavaModules where one depends on the other. It defines a custom task bar.lineCount and overrides bar.resources to replace the default resources/ folder with a generated resource file for use at runtime, as a simple example of custom build logic.

This expects the source layout:

foo/
    src/
        *.java files
    package.mill (optional)
bar/
    src/
        *.java files
    package.mill (optional)
build.mill

You can operate on this build via commands such as

> ./mill bar.compile

> ./mill foo.run

> ./mill _.assembly # evaluates both foo.assembly and bar.assembly

For the purposes of this article, we will consider what happens when you run ./mill _.assembly on the above example codebase.

Primary Phases

Compilation

Initial .mill build files:

bar/
    package.mill #optional
foo/
    package.mill #optional
build.mill

This stage involves compiling your build.mill and any subfolder package.mill files into JVM classfiles. Mill build files are written in Scala, so this is done using the normal Mill Scala compilation toolchain (mill.scalalib.ScalaModule), with some minor pre-processing to turn .mill files into valid .scala files.

Compilation of your build is global but incremental: running any ./mill command requires that all build.mill and package.mill files in your entire project be compiled, which can take some time the first time you run a ./mill command in a project. Once that is done, however, updates to any .mill file are re-compiled incrementally, so updates happen relatively quickly even in large projects.

After compilation, the .mill files are converted into JVM classfiles as shown below:

bar/
    package.class
foo/
    package.class
build.class

These classfiles are dynamically loaded into the Mill process and instantiated into a concrete Mill RootModule object, which is then used in the subsequent phases described below.
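As a rough sketch of what "dynamically loaded" means here, loading a compiled Scala singleton from classfiles looks something like the following (illustrative only; the class name and directory are hypothetical, and Mill's actual internals differ):

import java.net.URLClassLoader
import java.nio.file.Paths

// Load the compiled build classfiles into the running JVM
val classesDir = Paths.get("out/mill-build/compile.dest/classes") // hypothetical path
val classLoader = new URLClassLoader(Array(classesDir.toUri.toURL), getClass.getClassLoader)

// A top-level Scala `object` compiles to a class with a static MODULE$ field
// holding the singleton instance
val rootModuleClass = classLoader.loadClass("build.package$")
val rootModule = rootModuleClass.getField("MODULE$").get(null)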

Resolution

Resolution converts the Mill task selector _.assembly given on the command line into a list of Tasks. This explores the build and package files compiled in the Compilation step above, instantiates the Modules and Tasks as necessary, and returns the list of final tasks selected by the selector:

[Diagram: module tree and selected tasks]

build ─┬─> foo ─> foo.assembly
       └─> bar ─> bar.assembly

Mill starts from the RootModule instantiated after Compilation, and uses Java reflection to walk the tree of modules and tasks to find the tasks that match your given selector.

Task and module resolution is lazy, so only modules that are required by the given selector _.assembly are instantiated. This can help keep task resolution fast even when working within a large codebase by avoiding instantiation of modules that are unrelated to the selector you are running.
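This laziness falls out of how Scala objects behave: an object body only runs the first time the object is referenced. A standalone illustration (plain Scala, not Mill code):

object LazyDemo {
  object foo { println("foo instantiated") }
  object bar { println("bar instantiated") }

  def main(args: Array[String]): Unit = {
    val _ = foo // prints "foo instantiated"; `bar` is never referenced, so never instantiated
  }
}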

Planning

Planning is the step of turning the tasks selected during Resolution into a full build graph that includes all transitive upstream dependencies. This is done by traversing the graph of task dependencies, producing the (simplified) task graph shown below:

[Diagram: simplified task graph]

foo.sources ─> foo.compile ─> foo.classPath ─> foo.assembly
foo.resources ─> foo.assembly
foo.classPath ─> bar.compile, bar.classPath
bar.sources ─> bar.compile ─> bar.classPath ─> bar.assembly
bar.sources ─> bar.lineCount ─> bar.resources ─> bar.assembly

In this graph, we can see that even though Resolution only selected foo.assembly and bar.assembly, their upstream task graph requires tasks such as foo.compile, bar.compile, as well as our custom task bar.lineCount and our override of bar.resources.
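Conceptually, Planning is a transitive closure over task inputs. A simplified sketch of that expansion, using a hypothetical Task type rather than Mill's real one:

// Minimal stand-in for a task and its upstream dependencies
case class Task(name: String, inputs: Seq[Task])

// Expand the selected tasks into the full set of transitive upstream tasks
def plan(selected: Seq[Task]): Set[Task] = {
  def walk(task: Task, seen: Set[Task]): Set[Task] =
    if (seen(task)) seen
    else task.inputs.foldLeft(seen + task)((acc, dep) => walk(dep, acc))

  selected.foldLeft(Set.empty[Task])((acc, task) => walk(task, acc))
}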

Evaluation

The last phase is Evaluation, where tasks actually execute. Whether a task runs depends not only on the tasks you selected at the command line and those discovered during Resolution, but also on which input files changed on disk. Tasks that were not affected by input changes may have their values loaded from cache (if already evaluated earlier) or be skipped entirely (e.g. due to Selective Execution).

For example, a change to foo/src/*.java would affect the foo.sources task, which would invalidate and cause evaluation of the tasks highlighted in red below:

[Diagram: the same task graph, with foo.sources and everything downstream of it highlighted in red: foo.sources, foo.compile, foo.classPath, foo.assembly, bar.compile, bar.classPath, bar.assembly]

On the other hand, a change to bar/src/*.java would affect the bar.sources task, which would invalidate and cause evaluation of the tasks highlighted in red below:

[Diagram: the same task graph, with bar.sources and everything downstream of it highlighted in red: bar.sources, bar.compile, bar.lineCount, bar.classPath, bar.resources, bar.assembly]

In the example where bar/src/*.java changed, Mill may also take the opportunity to parallelize things:

  • bar.compile and bar.classPath can run on a separate thread from bar.lineCount and bar.resources

  • bar.assembly must wait for both bar.classPath and bar.resources to complete before proceeding.

This parallelization is done automatically by Mill and requires no effort from the user to enable. The exact parallelism depends on the number of CPU cores available, when each task starts, and how long each takes to run, but Mill will generally parallelize where possible to minimize the time taken to execute your tasks.
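One way to picture this scheduling: each task is launched as soon as all of its inputs complete, so independent subgraphs naturally overlap. A minimal sketch using Scala Futures (illustrative only; Mill's scheduler is more sophisticated):

import scala.collection.mutable
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

case class Node(name: String, deps: Seq[Node])

val scheduled = mutable.Map.empty[String, Future[Unit]]

// Each node starts once all of its dependencies have completed; memoizing in
// `scheduled` ensures shared dependencies like bar.sources run only once
def run(node: Node): Future[Unit] = scheduled.getOrElseUpdate(node.name, {
  Future.sequence(node.deps.map(run)).map(_ => println(s"executing ${node.name}"))
})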

Some other things to note:

  • Tasks have their metadata cached to <task>.json files in the out/ folder, and any files created by the task are cached in <task>.dest/ folders. These file paths are all automatically assigned by Mill (see the example layout after this list).

  • Mill treats builtin tasks (e.g. compile) and user-defined tasks (e.g. lineCount) exactly the same. Both get automatically cached or skipped when not needed, and parallelized where possible. This happens without the task author needing to do anything to enable caching or parallelization.

  • Mill evaluation does not care about the module structure of foo and bar. Mill modules are simply a way to define and re-use parts of the task graph, but it is the task graph that matters during evaluation.
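For the example build above, the out/ folder would contain roughly the following entries for the tasks discussed (simplified; Mill also generates other bookkeeping files):

out/
    foo/
        compile.json
        compile.dest/
        assembly.json
        assembly.dest/
    bar/
        compile.json
        compile.dest/
        lineCount.json
        resources.json
        resources.dest/
            line-count.txt
        assembly.json
        assembly.dest/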

Bootstrapping

One part of the Mill evaluation model that is skimmed over above is what happens before Compilation: how does Mill actually get everything necessary to compile your build.mill and package.mill files? This is called bootstrapping, and proceeds roughly in the following phases:

  1. Mill’s bootstrap script first checks if the right version of Mill is already present, and if not it downloads the assembly jar to ~/.mill/download

  2. Mill instantiates an in-memory MillBuildRootModule.BootstrapModule, which is a hard-coded build.mill used for bootstrapping Mill

  3. If there is a meta-build present at mill-build/build.mill, Mill processes that first and uses the resulting MillBuildRootModule for the next steps. Otherwise it uses the MillBuildRootModule.BootstrapModule directly

  4. Mill evaluates the MillBuildRootModule to parse the build.mill, generate the list of ivyDeps as well as appropriately wrapped Scala code, and compile that code to classfiles (the Compilation phase above)

For most users, you do not need to care about the details of the bootstrapping process, except to know that you only need a JVM installed to begin with: Mill will download everything else necessary from the standard Maven Central package repository, starting from just the bootstrap script (available as ./mill for Linux/Mac and ./mill.bat for Windows). The documentation for The Mill Meta Build goes into more detail on how you can configure and make use of it.
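If you do want to hook into the bootstrapping process, the meta-build is the place to do it. For example, a meta-build that makes a third-party library available inside build.mill might look like the following sketch (assuming the documented mill-build/ layout; the dependency shown is just an example):

// mill-build/build.mill
package build
import mill._, scalalib._

object `package` extends MillBuildRootModule {
  // make this library available for import in the main build.mill
  def ivyDeps = Agg(ivy"com.lihaoyi::scalatags:0.12.0")
}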

Consequences of the Mill Execution Model

This four-phase evaluation model has consequences for how you structure your build. For example:

  1. You can have arbitrary code outside of Tasks that helps set up your task graph and module hierarchy, e.g. computing what keys exist in a Cross module, or specifying your def moduleDeps. This code runs during Resolution

  2. You can have arbitrary code inside of Tasks, to perform your build actions. This code runs during Evaluation

  3. But your code inside of Tasks cannot influence the shape of the task graph or module hierarchy, as all Resolution logic completes before any Evaluation of Task bodies begins (see the sketch below).
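For example, in the following sketch (hypothetical module and task names), the Seq of cross keys is computed during Resolution and fixes the module hierarchy, while the task body only ever runs during Evaluation:

import mill._, javalib._

// Resolution-time code: this value determines the shape of the module hierarchy
val variants = Seq("dev", "prod")

object qux extends Cross[QuxModule](variants)
trait QuxModule extends Cross.Module[String] {
  // Evaluation-time code: runs only when the task is executed
  def banner = Task { "variant: " + crossValue }
}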

This should not be a problem for most builds, but it is something to be aware of. In general, we have found that having "two places" to put code - outside of Tasks to run during Resolution, or inside of Tasks to run during Evaluation - provides enough flexibility for most use cases. You can generally just write the "direct style" business logic you need - in the example above, counting the lines in allSourceFiles - and Mill handles all the caching, invalidation, and parallelism for you without any additional work.

The hard boundary between these two phases is what lets users easily query and visualize their module hierarchy and task graph without running them, using inspect, plan, visualize, etc. This helps keep your Mill build discoverable even as the build.mill codebase grows.

Caching in Mill

Apart from fine-grained caching of Tasks during Evaluation, Mill also performs incremental evaluation of the other phases. This helps ensure the overall workflow remains fast even for large projects:

  1. Compilation:

    • Done on-demand and incrementally using the Scala incremental compiler Zinc.

    • If some of the files imported by build.mill changed but not others, only the changed files are re-compiled before the RootModule is re-instantiated

    • In the common case where build.mill was not changed at all, this step is skipped entirely and the RootModule object simply re-used from the last run.

  2. Resolution:

    • If the RootModule was re-used, then all previously-instantiated modules are simply re-used

    • Any modules that are lazily instantiated during Resolution are also re-used.

  3. Planning:

    • Planning is relatively quick most of the time, and is not currently cached.

  4. Evaluation:

    • Tasks are evaluated in dependency order

    • Cached Tasks only re-evaluate if their input Tasks change.

    • Persistent Tasks preserve the Task.dest folder on disk between runs, allowing for finer-grained caching than Mill's default task-by-task caching and invalidation (see the sketch after this list)

    • Workers are kept in-memory between runs where possible, and only invalidated if their input Tasks change as well.

    • Tasks in general are invalidated if the code they depend on changes, at a method-level granularity via callgraph reachability analysis. See #2417 for more details
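As an illustration of persistent tasks, the sketch below (assuming the Task(persistent = true) syntax from recent Mill versions; the processing logic is hypothetical) re-uses the contents of Task.dest across runs to skip files that were already processed:

// inside a JavaModule such as `bar` above
def processed = Task(persistent = true) {
  for (file <- allSourceFiles()) {
    val out = Task.dest / (file.path.last + ".processed")
    // Task.dest persists between runs, so previously-processed files are skipped
    if (!os.exists(out)) os.write(out, os.read(file.path).toUpperCase)
  }
  PathRef(Task.dest)
}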

This approach to caching does assume a certain programming style inside your Mill build:

  • Mill may-or-may-not instantiate the modules in your build.mill the first time you run something (due to laziness).

  • Mill may-or-may-not re-instantiate the modules in your build.mill in subsequent runs (due to caching).

  • Mill may-or-may-not re-execute any particular task, depending on caching, but your code needs to work either way.

  • Execution of any task may-or-may-not happen in parallel with other unrelated tasks, and may happen in arbitrary order.

Your build code needs to work regardless of the order in which tasks are executed. However, for code written in a typical Scala style, which tends to avoid side effects and limits filesystem operations to the Task.dest folder, this is not a problem at all.

One thing to note is that for code which runs during Resolution, any reading of external mutable state needs to be wrapped in an interp.watchValue{…​} wrapper. This ensures that Mill knows where these external reads are, so that it can check whether their value changed, and if so re-instantiate the RootModule with the new value.
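For example, a value read from disk during Resolution might be wrapped like the following sketch (build-env.txt is a hypothetical file):

// build.mill
package build
import mill._, javalib._

// This read happens during Resolution, outside any Task. Wrapping it in
// interp.watchValue lets Mill notice when the file's contents change and
// re-instantiate the RootModule with the new value.
val buildEnv: String = interp.watchValue(os.read(os.pwd / "build-env.txt").trim)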