Selective Test Execution
Mill allows you to filter the tests and other tasks you execute by limiting them to those affected by a code change. This is useful in managing large codebases where running the entire test suite in CI is often very slow, so you only want to run the tests or tasks that are affected by the changes you are making.
This is done via the following commands:
- mill selective.prepare <selector>: run on the codebase before the code change; stores a snapshot of task inputs and implementations
- mill selective.run <selector>: run on the codebase after the code change; runs the tasks in the given <selector> that are affected by the code changes that have happened since selective.prepare was run
- mill selective.resolve <selector>: a dry-run version of selective.run; prints out the tasks in <selector> that are affected by the code changes and would have run, without actually running them
For example, if you want to run all tests related to the code changes in a pull request branch, you can do that as follows:
> git checkout main # start from the target branch of the PR
> ./mill selective.prepare
> git checkout pull-request-branch # go to the pull request branch
> ./mill selective.run __.test
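In a CI job, the four commands above could be wrapped in a small helper so that a failure at any step stops the sequence. The following is a minimal sketch, not something Mill provides: the function name selective_pr_tests and the MILL variable are hypothetical, and it assumes the same selector is passed to both selective.prepare and selective.run.

```shell
#!/bin/sh
# Hypothetical CI wrapper around the commands above; the function and
# variable names are illustrative and not part of Mill.
MILL="${MILL:-./mill}"   # path to the Mill launcher (assumption)

# selective_pr_tests TARGET_BRANCH PR_BRANCH SELECTOR
# Snapshots the build on the target branch, then runs only the affected
# tasks matching SELECTOR on the pull request branch.
selective_pr_tests() {
  target="$1"
  pr="$2"
  selector="$3"
  git checkout "$target" &&
    "$MILL" selective.prepare "$selector" &&
    git checkout "$pr" &&
    "$MILL" selective.run "$selector"
}
```

For the example above, this would be called as selective_pr_tests main pull-request-branch __.test.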
The example below demonstrates selective test execution on a small 3-module Java build, where foo depends on bar but qux is standalone:
package build
import mill._, javalib._
trait MyModule extends JavaModule {
object test extends JavaTests with TestModule.Junit4
}
object foo extends MyModule {
def moduleDeps = Seq(bar)
}
object bar extends MyModule
object qux extends MyModule
In this example, qux.test starts off failing with an error, while foo.test and bar.test pass successfully. Normally, running __.test will run all three test suites and show both successes and the one failure:
> mill __.test
error: Test run foo.FooTests finished: 0 failed, 0 ignored, 1 total, ...
Test run bar.BarTests finished: 0 failed, 0 ignored, 1 total, ...
Test run qux.QuxTests finished: 1 failed, 0 ignored, 1 total, ...
However, this is not always what you want. For example:
- If you are validating a pull request in CI that only touches bar/, you do not want the failure in qux.test to fail your tests, because you know that qux.test does not depend on bar/ and thus the failure cannot be related to your changes.
- Even if qux.test wasn’t failing, running it on a pull request that changes bar/ is wasteful, taking up compute resources to run tests that could not possibly be affected by the code change in question.
To solve this, you can run selective.prepare before the code change, then selective.run after the code change, to only run the tests downstream of the change (below, foo.test and bar.test):
> mill selective.prepare __.test
> echo '//' >> bar/src/bar/Bar.java # emulate the code change
> mill selective.resolve __.test # dry-run selective execution to show what would get run
foo.test.test
bar.test.test
> mill selective.run __.test
Test run foo.FooTests finished: 0 failed, 0 ignored, 1 total, ...
Test run bar.BarTests finished: 0 failed, 0 ignored, 1 total, ...
As we only touched bar's source files, only the tests downstream of that change need to run, so the failing qux.test is skipped. Similarly, if we make a change in qux/, selective execution will only run the tests in qux.test, and skip those in foo.test and bar.test.
These examples all use __.test to selectively run tasks named .test, but you can use selective execution on any subset of tasks by specifying them in the selector.
Selective execution is very useful for larger codebases, where you are usually changing only small parts of it, and thus only want to run the tests related to your changes. This keeps CI times fast and prevents unrelated breakages from affecting your CI runs.
selective.run relies on an out/mill-selective-execution.json file generated by selective.prepare in order to work, and will report an error if that file is missing. You can also zero out that file to explicitly tell selective.run to run all given tasks non-selectively, which is convenient if you want to conditionally disable selective execution (e.g. perhaps you want to perform selective execution pre-merge on pull requests but not post-merge on the main branch).
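The conditional disabling described above might be scripted along the following lines. This is a minimal sketch rather than a Mill feature: the function name run_tests, its mode argument, and the MILL variable are hypothetical, and it assumes that "zeroing out" the file means truncating it to zero bytes.

```shell
#!/bin/sh
# Hypothetical CI helper; the function name and mode values are
# illustrative and not part of Mill.
MILL="${MILL:-./mill}"   # path to the Mill launcher (assumption)

# run_tests MODE SELECTOR
#   MODE=selective : use the snapshot written earlier by selective.prepare
#   MODE=full      : empty the snapshot file so that selective.run runs
#                    every task matching SELECTOR
run_tests() {
  mode="$1"
  selector="$2"
  if [ "$mode" = "full" ]; then
    mkdir -p out
    # Assumption: "zeroing out" means truncating the file to zero bytes.
    : > out/mill-selective-execution.json
  fi
  "$MILL" selective.run "$selector"
}
```

A post-merge job on the main branch could then call run_tests full __.test, while a pull request job calls run_tests selective __.test after selective.prepare has been run on the target branch. Both jobs share the same selective.run invocation.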
Although selective execution is most commonly used for testing, it is a general-purpose tool that can be used to selectively run any Mill tasks based on the code that changed.
Reproducibility and Determinism
Selective execution relies on the inputs to your project being deterministic and reproducible, except for the code changes between the two versions, so that Mill can compare the state of the build inputs before and after and only run tasks downstream of those that changed. This is usually the case, but there are some subtleties to be aware of:
- Dynamic Task.Input tasks that capture Git metadata, e.g. using mill-vcs-version, must be disabled. The easiest way to do this is to guard such dynamic inputs on an environment variable, such that in most scenarios they return a constant "SNAPSHOT" string, and only when necessary do you pass in the environment variable to compute a real version (e.g. during publishing):
def myProjectVersion: T[String] = Task.Input {
if (Task.env.contains("MY_PROJECT_STABLE_VERSION")) VcsVersion.calcVcsState(Task.log).format()
else "SNAPSHOT"
}
- The filesystem layout and position of the before/after codebases must be exactly the same. This is not an issue when running selective.prepare/selective.run in the same folder on one machine, but if the two calls are run on separate machines you need to make sure the directory path is the same.
- You must use the same operating system and filesystem, as differences there will cause the filesystem signatures to change and thus spuriously trigger downstream tasks. e.g. you cannot run selective.prepare on a Windows machine and selective.run on Linux.
- Filesystem permissions must be preserved before/after. e.g. running selective.{prepare,run} on different GitHub Actions machines sharing artifacts can cause issues, as upload-artifact/download-artifact does not preserve filesystem permissions. If this is an issue, you can run chmod -R 777 . before each of selective.{prepare,run} to ensure they have the exact same filesystem permissions.
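One way to catch violations of the path and operating system requirements above before they cause spurious invalidation is to record both at prepare time and compare them at run time. The sketch below is a hypothetical helper, not a Mill feature: the function names and the fingerprint file are made up for illustration, and the check does not cover filesystem type or permissions.

```shell
#!/bin/sh
# Hypothetical helpers; neither the function names nor the fingerprint
# file are part of Mill.

# Print the facts that must match between selective.prepare and
# selective.run: the absolute checkout path and the operating system name.
env_fingerprint() {
  printf 'path=%s\nos=%s\n' "$(pwd -P)" "$(uname -s)"
}

# save_fingerprint FILE: call right before selective.prepare.
save_fingerprint() {
  env_fingerprint > "$1"
}

# check_fingerprint FILE: call right before selective.run.
# Returns non-zero and prints both fingerprints if they differ.
check_fingerprint() {
  if [ "$(cat "$1")" = "$(env_fingerprint)" ]; then
    return 0
  fi
  echo "environment differs from the one used for selective.prepare:" >&2
  echo "--- recorded" >&2
  cat "$1" >&2
  echo "--- current" >&2
  env_fingerprint >&2
  return 1
}
```

The fingerprint file would need to travel together with the out/ folder when the two steps run on separate machines.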
Debugging Selective Execution
- Use selective.resolve before selective.run: this prints out what would be run, and gives you a chance to eyeball whether the list of tasks to run makes sense.
- Look at out/mill-invalidation-tree.json, whether on disk locally or by printing it out (e.g. via cat) on your CI machines, to diagnose issues there. This gives you a richer view of which source tasks or inputs actually triggered the invalidation, and which tasks were invalidated only because they are downstream of them.
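On CI machines where you cannot inspect the disk directly, printing the invalidation tree into the build log can be done with a small step like the one below. This is only a sketch: the function name dump_invalidation_tree is hypothetical, and the only thing taken from Mill is the out/mill-invalidation-tree.json path mentioned above.

```shell
#!/bin/sh
# Hypothetical CI debugging helper; the function name is illustrative
# and not part of Mill.

# dump_invalidation_tree [OUT_DIR]
# Prints OUT_DIR/mill-invalidation-tree.json (default OUT_DIR: out) to the
# build log, or reports on stderr that the file is missing.
dump_invalidation_tree() {
  tree_file="${1:-out}/mill-invalidation-tree.json"
  if [ -f "$tree_file" ]; then
    echo "== $tree_file =="
    cat "$tree_file"
  else
    echo "no invalidation tree found at $tree_file" >&2
    return 1
  fi
}
```

Calling dump_invalidation_tree after the Mill step you are debugging keeps the file contents next to the rest of that step's log output.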
Limitations
- Selective execution can only work at the Mill task granularity. e.g. when working with Java/Scala/Kotlin modules and test modules, the granularity of selection is entire modules. That means that if your modules are individually large, selective execution may not be able to significantly narrow down the set of tests that need to run.
- Selective execution usually cannot narrow down the set of integration tests to run. Integration tests by their nature depend on the entire application or system, and run test cases that exercise different parts of it. Selective execution works at the task level, so it can only see that every integration test depends on the entire codebase. Any change anywhere in the codebase could therefore affect any integration test, and selective execution will select all of them.
- Selective execution is coarser-grained than runtime task caching. e.g. if you add a newline to a foo/src/Foo.java file and run foo.testCached, selective testing only knows that foo.sources changed and that foo.testCached is downstream of it. It cannot know that when you run foo.compile on the changed sources, the compilation output is unchanged, and so foo.testCached can be skipped. This is inherent in the nature of selective execution, which does its analysis without evaluation-time information and thus will always be more conservative than the task skipping and cache reuse that is done during evaluation.