Selective Test Execution

Mill allows you to filter the tests and other tasks you execute by limiting them to those affected by a code change. This is useful in managing large codebases where running the entire test suite in CI is often very slow, so you only want to run the tests or tasks that are affected by the changes you are making.

This is done via the following commands:

mill selective.prepare <selector>: run on the codebase before the code change, stores a snapshot of task inputs and implementations
mill selective.run <selector>: run on the codebase after the code change, runs tasks in the given <selector> which are affected by the code changes that have happened since selective.prepare was run
mill selective.resolve <selector>: a dry-run version of selective.run, prints out the tasks in <selector> that are affected by the code changes and would have run, without actually running them.

For example, if you want to run all tests related to the code changes in a pull request branch, you can do that as follows:

> git checkout main # start from the target branch of the PR

> ./mill selective.prepare

> git checkout pull-request-branch # go to the pull request branch

> ./mill selective.run __.test

The example below demonstrates selective test execution on a small 3-module Java build, where foo depends on bar but qux is standalone:

build.mill

package build
import mill.*, javalib.*

trait MyModule extends JavaModule {
  object test extends JavaTests, TestModule.Junit4
}

object foo extends MyModule {
  def moduleDeps = Seq(bar)
}

object bar extends MyModule

object qux extends MyModule

In this example, qux.test starts off failing with an error, while foo.test and bar.test pass successfully. Normally, running __.test will run all three test suites and show both successes and the one failure:

> ./mill __.test
error: Test run foo.FooTests finished: 0 failed, 0 ignored, 1 total, ...
Test run bar.BarTests finished: 0 failed, 0 ignored, 1 total, ...
Test run qux.QuxTests finished: 1 failed, 0 ignored, 1 total, ...

However, this is not always what you want. For example:

If you are validating a pull request in CI that only touches bar/, you do not want the failure in qux.test to fail your tests, because you know that qux.test does not depend on bar/ and thus the failure cannot be related to your changes.
Even if qux.test wasn’t failing, running it on a pull request that changes bar/ is wasteful, taking up compute resources to run tests that could not possibly be affected by the code change in question.

To solve this, you can run selective.prepare before the code change, then selective.run after the code change, to only run the tests downstream of the change (below, foo.test and bar.test):

> ./mill selective.prepare __.test

> echo '//' >> bar/src/bar/Bar.java # emulate the code change

> ./mill selective.resolve __.test # dry-run selective execution to show what would get run
foo.test.testForked
bar.test.testForked

> ./mill selective.run __.test
Test run foo.FooTests finished: 0 failed, 0 ignored, 1 total, ...
Test run bar.BarTests finished: 0 failed, 0 ignored, 1 total, ...

As we only touched bar's source files, we only need to run tests for bar and for foo, which depends on it. qux.test is not downstream of the change and is skipped.

For a more detailed report of how the changed inputs resulted in the selected tasks being chosen, two further commands are available. selective.resolveChanged prints the upstream input tasks that Mill found had changed and that cause downstream tasks to be selected. selective.resolveTree prints the selected tasks as a JSON tree illustrating the relationships between the invalidated inputs (at the root of the tree) and the selected tasks (at the leaves of the tree).

> ./mill selective.resolveChanged __.test
bar.sources

> ./mill selective.resolveTree __.test
{
  "bar.sources": {
    "bar.allSources": {
      "bar.allSourceFiles": {
        "bar.compile": {
          "bar.localRunClasspath": {
            "bar.localClasspath": {
              "bar.test.transitiveLocalClasspath": {
                "bar.test.runClasspath": {
                  "bar.test.testForked": {}
                }
              },
              "foo.test.transitiveLocalClasspath": {
                "foo.test.runClasspath": {
                  "foo.test.testForked": {}
                }
              }
            }
          }
        }
      }
    }
  }
}

Similarly, if we make a change to qux/, selective execution will only run the tests in qux.test and skip those in foo.test and bar.test. These examples all use __.test to selectively run tasks named .test, but you can use selective execution on any subset of tasks by specifying them in the selector.

Selective execution is very useful for larger codebases, where you are usually changing only small parts of it, and thus only want to run the tests related to your changes. This keeps CI times fast and prevents unrelated breakages from affecting your CI runs.

selective.run relies on an out/mill-selective-execution.json file generated by selective.prepare in order to work, and will report an error if that file is missing. You can also zero out that file to explicitly tell selective.run to run all given tasks non-selectively, which is convenient if you want to conditionally disable selective execution (e.g. you may want to perform selective execution pre-merge on pull requests but not post-merge on the main branch).
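The conditional setup described above could be wired into a CI script roughly as follows. This is a minimal sketch under stated assumptions: the selective_ci function, its mode argument, and the MILL override variable are hypothetical names introduced here for illustration, and "zero out" is taken to mean truncating the metadata file to zero bytes.

```shell
# Hypothetical CI helper; the function and variable names are illustrative, not part of Mill.
MILL="${MILL:-./mill}"                       # path to the Mill launcher (overridable)
METADATA="out/mill-selective-execution.json" # file written by selective.prepare

# usage: selective_ci <selective|full> <selector>
#   selective: run only the tasks affected since selective.prepare was run
#   full:      empty the metadata file first so that selective.run runs everything
selective_ci() {
  mode="$1"
  selector="$2"
  if [ "$mode" = "full" ]; then
    mkdir -p "$(dirname "$METADATA")"
    : > "$METADATA"   # assumption: "zero out" means truncating to an empty file
  fi
  "$MILL" selective.run "$selector"
}
```

A pull-request job might then call `selective_ci selective __.test`, while a post-merge job on the main branch calls `selective_ci full __.test`, so both jobs share the same entry point.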

Although selective execution is most commonly used for testing, it is a general-purpose tool that can be used to selectively run any Mill tasks based on the code that changed.

Reproducibility and Determinism

Selective execution relies on the inputs to your project being deterministic and reproducible, except for the code changes between the two versions, so that Mill can compare the state of the build inputs before and after and only run tasks downstream of those that changed. This is usually the case, but there are some subtleties to be aware of:

Dynamic Task.Input tasks that capture Git metadata, e.g. when using mill-vcs-version, must be disabled. The easiest way to do this is to guard such dynamic inputs on an environment variable, so that in most scenarios the task returns a constant "SNAPSHOT" string, and you only pass in the environment variable to compute a real version when necessary (e.g. during publishing).

def myProjectVersion: T[String] = Task.Input {
  if (Task.env.contains("MY_PROJECT_STABLE_VERSION")) VcsVersion.calcVcsState(Task.log).format()
  else "SNAPSHOT"
}

The filesystem layout and position of the before/after codebases must be exactly the same. This is not an issue when running selective.prepare/selective.run on the same folder on one machine, but if the two calls are run on separate machines you need to make sure the directory path is the same.
You must use the same operating system and filesystem, as differences there will cause the filesystem signatures to change and thus spuriously trigger downstream tasks. For example, you cannot run selective.prepare on a Windows machine and selective.run on Linux.
Filesystem permissions must be preserved before/after. For example, running selective.{prepare,run} on different GitHub Actions machines that share artifacts can cause issues, as upload-artifact/download-artifact does not preserve filesystem permissions. If this is an issue, you can run chmod -R 777 . before each of selective.{prepare,run} to ensure they have the exact same filesystem permissions.
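The permission workaround in the last item can be wrapped in a small helper so that both steps always normalize permissions before invoking Mill. This is only a sketch: the run_selective_step function and the MILL override variable are hypothetical names introduced for illustration, and it assumes the script is run from the root of the checkout.

```shell
# Hypothetical wrapper; names are illustrative and not part of Mill.
MILL="${MILL:-./mill}"   # path to the Mill launcher (overridable)

# usage: run_selective_step <prepare|run> <selector>
run_selective_step() {
  step="$1"
  selector="$2"
  # Give every file and directory the same mode bits on every machine,
  # so permission differences cannot show up as spurious input changes.
  chmod -R 777 .
  "$MILL" "selective.$step" "$selector"
}
```

For example, the first machine would call `run_selective_step prepare __.test` and the second `run_selective_step run __.test` after restoring the shared artifacts.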

Debugging Selective Execution

Use selective.resolve before selective.run: this prints out what selective.run would execute, giving you a chance to eyeball whether the list of tasks makes sense.
Look at out/mill-invalidation-tree.json, either on disk locally or by printing it out (e.g. via cat) on your CI machines, to diagnose issues there. This gives you a richer view of which source tasks or inputs actually triggered the invalidation, and which tasks were invalidated only because they are downstream of them.
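When the tree is large, it can help to pull out just its roots (the changed inputs) and its leaves (the selected tasks). The sketch below assumes that out/mill-invalidation-tree.json is pretty-printed with two-space indentation in the same shape as the selective.resolveTree output shown earlier; the changed_roots and selected_leaves helpers are hypothetical names, and a JSON-aware tool such as jq would be more robust than this line-based matching.

```shell
# Hypothetical helpers; they rely on the two-space pretty-printed layout (an assumption).

# Print the top-level keys, i.e. the inputs that triggered the invalidation.
# usage: changed_roots [file]
changed_roots() {
  file="${1:-out/mill-invalidation-tree.json}"
  # Top-level keys are the only ones indented by exactly two spaces.
  sed -n 's/^  "\([^"]*\)": .*$/\1/p' "$file"
}

# Print the keys whose value is an empty object, i.e. the selected leaf tasks.
# usage: selected_leaves [file]
selected_leaves() {
  file="${1:-out/mill-invalidation-tree.json}"
  sed -n 's/^ *"\([^"]*\)": {}.*$/\1/p' "$file"
}
```

Run against a tree like the one shown earlier, changed_roots would list bar.sources, and selected_leaves would list the two testForked tasks.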

Limitations

Selective execution can only work at the granularity of Mill tasks. For example, when working with Java/Scala/Kotlin modules and test modules, the granularity of selection is the entire module. That means that if your modules are individually large, selective execution may not be able to significantly narrow down the set of tests that need to run.
Selective execution usually cannot narrow down the set of integration tests to run. Integration tests by their nature depend on the entire application or system, and run test cases that exercise different parts of it. But selective execution works at the task level and can only see that every integration test depends on the entire codebase, and so any change in the entire codebase could potentially affect any integration test, so selective execution will select all of them.
Selective execution is coarser-grained than runtime task caching. e.g. If you add a newline to a foo/src/Foo.java file and run foo.testCached, selective testing only knows that foo.sources changed and foo.testCached is downstream of it, but it cannot know that when you run foo.compile on the changed sources, the compilation output is unchanged, and so .testCached can be skipped. This is inherent in the nature of selective execution, which does its analysis without evaluation-time information and thus will always be more conservative than the task skipping and cache-reuse that is done during evaluation.