Selective Execution

Mill allows you to filter the tests and other tasks you execute by limiting them to those affected by a code change. This is useful in large codebases, where running the entire test suite in CI is often very slow and you only want to run the tests or tasks affected by the changes you are making.

This is done via the following commands:

  • mill selective.prepare <selector>: run on the codebase before the code change, stores a snapshot of task inputs and implementations

  • mill selective.run <selector>: run on the codebase after the code change, runs tasks in the given <selector> which are affected by the code changes that have happened since selective.prepare was run

  • mill selective.resolve <selector>: a dry-run version of selective.run, prints out the tasks in <selector> that are affected by the code changes and would have run, without actually running them.

For example, if you want to run all tests related to the code changes in a pull request branch, you can do that as follows:

> git checkout main # start from the target branch of the PR

> ./mill selective.prepare

> git checkout pull-request-branch # go to the pull request branch

> ./mill selective.run __.test
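
For a CI job, the four steps above can be wrapped in a small shell function. This is only a minimal sketch and not part of Mill: the function name selective_pr_tests and the MILL and GIT variables (overridable for dry runs) are assumptions introduced here, and the sketch passes the same selector to both commands.

```shell
# Hypothetical helper, not part of Mill.
# MILL and GIT default to the real tools; override them to dry-run.
MILL="${MILL:-./mill}"
GIT="${GIT:-git}"

selective_pr_tests() {
  base_branch="$1"   # target branch of the PR, e.g. main
  pr_branch="$2"     # branch containing the code change
  selector="$3"      # tasks to consider, e.g. __.test

  # Snapshot task inputs and implementations on the base branch.
  $GIT checkout "$base_branch" || return 1
  $MILL selective.prepare "$selector" || return 1

  # Switch to the change and run only the affected tasks.
  $GIT checkout "$pr_branch" || return 1
  $MILL selective.run "$selector"
}
```

It would be called as selective_pr_tests main pull-request-branch __.test. Setting MILL="echo mill" and GIT="echo git" beforehand prints the commands instead of running them.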

The example below demonstrates selective test execution on a small 3-module Java build, where foo depends on bar but qux is standalone:

build.mill
package build
import mill._, javalib._

trait MyModule extends JavaModule {
  object test extends JavaTests with TestModule.Junit4
}

object foo extends MyModule {
  def moduleDeps = Seq(bar)
}

object bar extends MyModule

object qux extends MyModule

In this example, qux.test starts off failing with an error, while foo.test and bar.test pass successfully. Normally, running __.test will run all three test suites and show both successes and the one failure:

> mill __.test
error: Test run foo.FooTests finished: 0 failed, 0 ignored, 1 total, ...
Test run bar.BarTests finished: 0 failed, 0 ignored, 1 total, ...
Test run qux.QuxTests finished: 1 failed, 0 ignored, 1 total, ...

However, this is not always what you want. For example:

  • If you are validating a pull request in CI that only touches bar/, you do not want the failure in qux.test to fail your CI run, because you know that qux.test does not depend on bar/ and thus the failure cannot be related to your changes.

  • Even if qux.test wasn’t failing, running it on a pull request that changes bar/ is wasteful, taking up compute resources to run tests that could not possibly be affected by the code change in question.

To solve this, you can run selective.prepare before the code change, then selective.run after the code change, to only run the tests downstream of the change (below, foo.test and bar.test):

> mill selective.prepare __.test

> echo '//' >> bar/src/bar/Bar.java # emulate the code change

> mill selective.resolve __.test # dry-run selective execution to show what would get run
foo.test.test
bar.test.test

> mill selective.run __.test
Test run foo.FooTests finished: 0 failed, 0 ignored, 1 total, ...
Test run bar.BarTests finished: 0 failed, 0 ignored, 1 total, ...

Similarly, if we make a change to qux/, using selective execution will only run tests in qux.test, and skip those in foo.test and bar.test. These examples all use __.test to selectively run tasks named .test, but you can use selective execution on any subset of tasks by specifying them in the selector.

Selective execution is very useful for larger codebases, where you are usually changing only small parts of it, and thus only want to run the tests related to your changes. This keeps CI times fast and prevents unrelated breakages from affecting your CI runs.

selective.run relies on an out/mill-selective-execution.json file generated by selective.prepare in order to work, and will report an error if that file is missing. You can also zero out that file to explicitly tell selective.run to run all given tasks non-selectively, which is convenient if you want to conditionally disable selective execution (e.g. perhaps you want to perform selective execution pre-merge on pull requests but not post-merge on the main branch).
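
The conditional disabling described above could look roughly like the following in a CI script. This is a sketch under stated assumptions: "zero out" is taken to mean truncating the file to zero bytes, the "pre-merge"/"post-merge" event names and the run_tests function are hypothetical, and MILL and SELECTIVE_FILE are variables introduced here so the helper can be dry-run.

```shell
# Hypothetical CI helper, not part of Mill.
MILL="${MILL:-./mill}"
SELECTIVE_FILE="${SELECTIVE_FILE:-out/mill-selective-execution.json}"

run_tests() {
  event="$1"      # assumed values: "pre-merge" or "post-merge"
  selector="$2"   # e.g. __.test

  if [ "$event" = "post-merge" ]; then
    # Assumption: an empty (zero-byte) file is what "zeroing out" means.
    # It tells selective.run to run every task in the selector.
    mkdir -p "$(dirname "$SELECTIVE_FILE")"
    : > "$SELECTIVE_FILE"
  fi

  $MILL selective.run "$selector"
}
```

With this shape the CI job always calls selective.run, and only the contents of the snapshot file decide whether the run is selective.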

Reproducibility and Determinism

Selective execution relies on the inputs to your project being deterministic and reproducible, except for the code changes between the two versions, so that Mill can compare the state of the build inputs before and after and only run tasks downstream of those that changed. This is usually the case, but there are some subtleties to be aware of:

  • Dynamic Task.Input tasks that capture Git metadata (e.g. via mill-vcs-version) must be disabled. The easiest way to do this is to guard such dynamic inputs on an environment variable, such that in most scenarios the task returns a constant "SNAPSHOT" string, and only when necessary do you pass in the environment variable to compute a real version (e.g. during publishing).

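// VcsVersion is provided by the mill-vcs-version plugin, which must be imported in the build
def myProjectVersion: T[String] = Task.Input {
  // Only compute the real Git-derived version when explicitly requested, e.g. when publishing
  if (Task.env.contains("MY_PROJECT_STABLE_VERSION")) VcsVersion.calcVcsState(Task.log).format()
  // Otherwise return a constant so this input does not change between commits
  else "SNAPSHOT"
}

  • The filesystem layout and position of the before/after codebases must be exactly the same. This is not an issue when running selective.prepare/selective.run on the same folder on one machine, but if the two calls are run on separate machines you need to make sure the directory path is the same.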

  • You must use the same operating system and filesystem, as differences there will cause the filesystem signatures to change and thus spuriously trigger downstream tasks. E.g. you cannot run selective.prepare on a Windows machine and selective.run on Linux.

  • Filesystem permissions must be preserved before/after. E.g. running selective.{prepare,run} on different GitHub Actions machines sharing artifacts can cause issues, as upload-artifact/download-artifact does not preserve filesystem permissions. If this is an issue, you can run chmod -R 777 . before each of selective.{prepare,run} to ensure they see the exact same filesystem permissions.
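
When selective.prepare and selective.run happen on different machines, a small guard can catch a mismatched directory path or operating system before Mill reports spurious changes. The helpers below are a hypothetical sketch, not a Mill feature: record_env and check_env are names introduced here, and the fingerprint file is assumed to be passed between the two machines alongside your other artifacts.

```shell
# Hypothetical helpers, not part of Mill.

# Run next to selective.prepare: save the working directory and OS name.
record_env() {
  # $1: file to write the fingerprint to
  printf '%s\n%s\n' "$(pwd)" "$(uname -s)" > "$1"
}

# Run next to selective.run: fail if the path or OS differs from the saved one.
check_env() {
  # $1: fingerprint file written by record_env
  expected="$(cat "$1")" || return 1
  actual="$(printf '%s\n%s' "$(pwd)" "$(uname -s)")"
  if [ "$expected" != "$actual" ]; then
    echo "warning: directory path or OS differs from the selective.prepare run" >&2
    return 1
  fi
}
```

This only checks the path and OS name. It does not detect permission differences, which still need the chmod step described above.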

Debugging Selective Execution

  • Use selective.resolve before selective.run: this prints the tasks that selective.run would execute, giving you a chance to check whether the list makes sense.

  • Look at out/mill-invalidation-tree.json, whether on disk locally or by printing it out (e.g. via cat) on your CI machines to diagnose issues there. This gives you a richer view of which source tasks or inputs actually triggered the invalidation, and which tasks were invalidated only because they are downstream of them.
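
Both debugging steps can be combined into one CI step that prints everything needed to diagnose an unexpected selection. This is a minimal sketch: debug_selective is a hypothetical name, MILL and INVALIDATION_TREE are variables introduced here, and the sketch makes no assumption about which earlier Mill command wrote the invalidation tree, so it only prints the file if it is present.

```shell
# Hypothetical debugging helper, not part of Mill.
MILL="${MILL:-./mill}"
INVALIDATION_TREE="${INVALIDATION_TREE:-out/mill-invalidation-tree.json}"

debug_selective() {
  selector="$1"   # e.g. __.test

  # Dry run: list the tasks selective.run would execute.
  echo "== tasks selected by selective.resolve =="
  $MILL selective.resolve "$selector" || return 1

  # Show which inputs triggered the invalidation, if the file exists.
  echo "== invalidation tree =="
  if [ -f "$INVALIDATION_TREE" ]; then
    cat "$INVALIDATION_TREE"
  else
    echo "($INVALIDATION_TREE not found)"
  fi
}
```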