Selective Test Execution

Mill allows you to filter the tests and other tasks you execute by limiting them to those affected by a code change. This is useful in large codebases, where running the entire test suite in CI is often very slow and you only want to run the tests or tasks affected by the changes you are making.

This is done via the following commands:

  • mill selective.prepare <selector>: run on the codebase before the code change, stores a snapshot of task inputs and implementations

  • mill selective.run <selector>: run on the codebase after the code change, runs tasks in the given <selector> which are affected by the code changes that have happened since selective.prepare was run

  • mill selective.resolve <selector>: a dry-run version of selective.run, prints out the tasks in <selector> that are affected by the code changes and would have run, without actually running them.

For example, if you want to run all tests related to the code changes in a pull request branch, you can do that as follows:

> git checkout main # start from the target branch of the PR

> ./mill selective.prepare

> git checkout pull-request-branch # go to the pull request branch

> ./mill selective.run __.test

The example below demonstrates selective test execution on a small 3-module Java build, where foo depends on bar but qux is standalone:

build.mill
package build
import mill._, javalib._

trait MyModule extends JavaModule {
  object test extends JavaTests with TestModule.Junit4
}

object foo extends MyModule {
  def moduleDeps = Seq(bar)
}

object bar extends MyModule

object qux extends MyModule

[Dependency graph: bar -> bar.test, bar -> foo, foo -> foo.test, qux -> qux.test]

In this example, qux.test starts off failing with an error, while foo.test and bar.test pass successfully. Normally, running __.test will run all three test suites and show both successes and the one failure:

> mill __.test
error: Test run foo.FooTests finished: 0 failed, 0 ignored, 1 total, ...
Test run bar.BarTests finished: 0 failed, 0 ignored, 1 total, ...
Test run qux.QuxTests finished: 1 failed, 0 ignored, 1 total, ...

However, this is not always what you want. For example:

  • If you are validating a pull request in CI that only touches bar/, you do not want the failure in qux.test to fail your tests, because you know that qux.test does not depend on bar/ and thus the failure cannot be related to your changes.

  • Even if qux.test wasn’t failing, running it on a pull request that changes bar/ is wasteful, taking up compute resources to run tests that could not possibly be affected by the code change in question.

To solve this, you can run selective.prepare before the code change, then selective.run after the code change, to only run the tests downstream of the change (below, foo.test and bar.test):

> mill selective.prepare __.test

> echo '//' >> bar/src/bar/Bar.java # emulate the code change

> mill selective.resolve __.test # dry-run selective execution to show what would get run
foo.test.test
bar.test.test

> mill selective.run __.test
Test run foo.FooTests finished: 0 failed, 0 ignored, 1 total, ...
Test run bar.BarTests finished: 0 failed, 0 ignored, 1 total, ...

As we only touched bar's source files, we only need to run the tests for bar.test and foo.test, and qux.test is skipped.

Similarly, if we make a change to qux/, using selective execution will only run tests in qux.test, and skip those in foo.test and bar.test. These examples all use __.test to selectively run tasks named .test, but you can use selective execution on any subset of tasks by specifying them in the selector.
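
The "downstream of the change" reasoning can be illustrated with a small conceptual sketch. This is not how Mill implements selective execution (Mill works on its own task graph and input snapshots); the EDGES list simply hard-codes the module graph of the example build, and the downstream function is a hypothetical helper written for this illustration.

```shell
#!/bin/sh
# Conceptual sketch only: the example build's dependency edges as "upstream:downstream".
EDGES="bar:bar.test bar:foo foo:foo.test qux:qux.test"

# Print every test module reachable downstream of the changed module "$1".
downstream() {
  seen=" $1 "
  changed=1
  # Keep sweeping over the edge list until no new node gets added.
  while [ "$changed" = 1 ]; do
    changed=0
    for e in $EDGES; do
      from="${e%%:*}"
      to="${e#*:}"
      case "$seen" in
        *" $from "*)
          case "$seen" in
            *" $to "*) ;;                          # already recorded
            *) seen="$seen$to "; changed=1 ;;      # newly affected node
          esac ;;
      esac
    done
  done
  # Report only the affected test modules.
  for t in $seen; do
    case "$t" in
      *.test) echo "$t" ;;
    esac
  done
}
```

Calling downstream bar lists bar.test and foo.test, while downstream qux lists only qux.test, which matches the selections described above.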

Selective execution is very useful for larger codebases, where you are usually changing only small parts of it, and thus only want to run the tests related to your changes. This keeps CI times fast and prevents unrelated breakages from affecting your CI runs.

selective.run relies on an out/mill-selective-execution.json file generated by selective.prepare in order to work, and will report an error if that file is missing. You can also zero out that file to explicitly tell selective.run to run all given tasks non-selectively, which is convenient if you want to conditionally disable selective execution (e.g. perhaps you want to perform selective execution pre-merge on pull requests but not post-merge on the main branch).
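
A minimal sketch of such a conditional setup for a CI script is shown here. The prepare_state function, its "selective"/"full" argument and the MILL override variable are hypothetical names introduced for illustration, and the sketch assumes that "zeroing out" the file means truncating it to an empty file; only the selective.* commands and the out/mill-selective-execution.json path come from the text above.

```shell
#!/bin/sh
# Path to the Mill launcher; overridable so the script can be adapted (assumption).
MILL="${MILL:-./mill}"

# Call this while the "before" version of the code is checked out.
#   prepare_state selective  -> snapshot inputs so selective.run can narrow the tasks
#   prepare_state full       -> empty the metadata file so selective.run runs everything
prepare_state() {
  if [ "$1" = "selective" ]; then
    "$MILL" selective.prepare __.test
  else
    # Assumption: an empty metadata file tells selective.run to run non-selectively.
    mkdir -p out && : > out/mill-selective-execution.json
  fi
}
```

A pull request job might call prepare_state selective on the target branch, while a post-merge job calls prepare_state full; both can then finish with the same ./mill selective.run __.test step.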

Although selective execution is most commonly used for testing, it is a general-purpose tool that can be used to selectively run any Mill tasks based on the code that changed.

Reproducibility and Determinism

Selective execution relies on the inputs to your project being deterministic and reproducible, except for the code changes between the two versions, so that Mill can compare the state of the build inputs before and after and only run tasks downstream of those that changed. This is usually the case, but there are some subtleties to be aware of:

  • Dynamic Task.Input tasks that capture Git metadata (e.g. via mill-vcs-version) must be disabled. The easiest way to do this is to guard such dynamic inputs behind an environment variable, so that in most scenarios the input returns a constant "SNAPSHOT" string, and you pass in the environment variable to compute a real version only when necessary (e.g. during publishing):

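// Returns a constant "SNAPSHOT" unless MY_PROJECT_STABLE_VERSION is set, so that
// this input does not change from commit to commit during selective execution.
def myProjectVersion: T[String] = Task.Input {
  if (Task.env.contains("MY_PROJECT_STABLE_VERSION")) VcsVersion.calcVcsState(Task.log).format()
  else "SNAPSHOT"
}
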
  • The filesystem layout and position of the before/after codebases must be exactly the same. This is not an issue when running selective.prepare/selective.run on the same folder on one machine, but if the two calls are run on separate machines you need to make sure the directory path is the same.

  • You must use the same operating system and filesystem, as differences there will cause the filesystem signatures to change and thus spuriously trigger downstream tasks. e.g. you cannot run selective.prepare on a Windows machine and selective.run on Linux.

  • Filesystem permissions must be preserved before/after. e.g. running selective.{prepare,run} on different GitHub Actions machines sharing artifacts can cause issues, as upload-artifact/download-artifact does not preserve filesystem permissions. If this is an issue, you can run chmod -R 777 . before each of selective.{prepare,run} to ensure they have the exact same filesystem permissions.
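
The permission workaround from the last point can be wrapped in a small helper, sketched here under a few assumptions: with_normalized_perms and the MILL override variable are hypothetical names, and the helper is meant to be run from the root of the workspace on both machines.

```shell
#!/bin/sh
# Path to the Mill launcher; overridable so the script can be adapted (assumption).
MILL="${MILL:-./mill}"

# Reset permissions in the current workspace, then run the given Mill command.
# Using the same wrapper for selective.prepare and selective.run keeps the
# permissions seen by both commands identical.
with_normalized_perms() {
  chmod -R 777 . || return 1
  "$MILL" "$@"
}
```

For example, the first machine would run with_normalized_perms selective.prepare __.test and the second with_normalized_perms selective.run __.test. Note that chmod -R 777 is very permissive, so this is best kept to throwaway CI workspaces.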

Debugging Selective Execution

  • Use selective.resolve before selective.run: this prints out what selective.run would execute, and gives you a chance to check whether the list of tasks makes sense.

  • Look at out/mill-invalidation-tree.json, either on disk locally or by printing it out (e.g. via cat) on your CI machines. This gives you a richer view of which source tasks or inputs actually triggered the invalidation, and which tasks were invalidated only because they are downstream of them.
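
Both steps can be combined in a CI debugging helper, sketched here. The debug_selective function and the MILL override variable are hypothetical names; the sketch makes no assumption about which Mill command writes out/mill-invalidation-tree.json or what the file contains, and simply prints it when it is present.

```shell
#!/bin/sh
# Path to the Mill launcher; overridable so the script can be adapted (assumption).
MILL="${MILL:-./mill}"

# Print the tasks selective.run would execute for selector "$1",
# followed by the invalidation tree if one exists on disk.
debug_selective() {
  selector="$1"
  echo "== tasks that selective.run would execute =="
  "$MILL" selective.resolve "$selector" || return 1
  echo "== invalidation tree =="
  if [ -f out/mill-invalidation-tree.json ]; then
    cat out/mill-invalidation-tree.json
  else
    echo "out/mill-invalidation-tree.json not found"
  fi
}
```

Running debug_selective __.test in the CI job just before selective.run leaves both pieces of information in the build log.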

Limitations

  • Selective execution can only work at the Mill Task granularity. e.g. when working with Java/Scala/Kotlin modules and test modules, selection happens at the granularity of entire modules. That means that if your modules are individually large, selective execution may not be able to significantly narrow down the set of tests that need to run.

  • Selective execution usually cannot narrow down the set of integration tests to run. Integration tests by their nature depend on the entire application or system, and run test cases that exercise different parts of it. But selective execution works at the task level: it can only see that every integration test depends on the entire codebase, so any change anywhere in the codebase could potentially affect any integration test, and selective execution will select all of them.

  • Selective execution is coarser-grained than runtime task caching. e.g. If you add a newline to a foo/src/Foo.java file and run foo.testCached, selective testing only knows that foo.sources changed and foo.testCached is downstream of it, but it cannot know that when you run foo.compile on the changed sources, the compilation output is unchanged, and so .testCached can be skipped. This is inherent in the nature of selective execution, which does its analysis without evaluation-time information and thus will always be more conservative than the task skipping and cache-reuse that is done during evaluation.