What does a Build Tool do?

Li Haoyi, 13 February 2025

The most common question I get asked about the Mill build tool is: what does a build tool even do? Even software developers may not be familiar with the idea: they may run pip install and python foo.py, javac Foo.java, go build, or some other language-specific CLI directly. They may have a folder full of custom Bash scripts they use during development. Or they may develop software on a team where someone else has set up the tooling that they use. This blog post explores what build tools are all about, why they are important to most software projects as they scale, and how they work under the hood.

What is a Build Tool?

A build tool is a program that automates selecting which tools or tasks to run - and how to run them - whenever you want to do something in your codebase.

A hello-world program may get by with javac Foo.java to compile your code and java Foo to run it, or even a single command python foo.py. But for most real-world projects a single "user-facing" task may require a lot of work to be done first. The job of the build tool is to help decide what work needs to be done to accomplish what you want, and how best to perform that work with caching and parallelism to complete it as quickly and efficiently as possible.

Build Tools as Task Orchestrators

For example, in a real-world project, before running your code you may first need to:

  • Use the configured direct third-party dependencies to resolve the transitive dependencies

  • Download all transitive third-party library dependencies to files on disk

  • Compile internal library dependencies, i.e. upstream modules in the same codebase

  • Run code generators on IDL files (e.g. protobuf, openapi, thrift, etc.) to generate source files

  • Run the compiler (if your language has one) on source files to generate binary files

  • Package the binary files into an executable

  • Run the static asset pipeline on input files to generate static assets (e.g. assets for video games or websites)

So although the developer may have just asked to run their code, it is only after all these tasks are done that it makes sense to begin running it! However, it’s not as simple as running every one of the tasks listed above one after another. You also need to decide which ones to run and how to run them:

  • You want to skip any tasks that have been run earlier whose outputs you can re-use to save time (e.g. by not generating static assets if they have already been generated)

  • You want to make sure to re-run tasks whose inputs have changed, discarding any out-of-date output (e.g. if a static input file is changed, then the static asset pipeline needs to be re-run)

  • You want to parallelize tasks which are independent of each other (e.g. running the compiler can happen in parallel with running the static asset pipeline)

  • You want to avoid parallelizing tasks which depend on one another and thus cannot happen in parallel (e.g. running the compiler must happen after running code generators, since we need to compile the generated code)

Working on any real-world codebase will involve a large variety of tasks: resolving and downloading dependencies, autoformatting, linting, code-generation, compiling, testing, packaging and publishing, and so on. These tasks depend on each other in various ways, and it is the job of the build tool to take what you (as a developer) want to do, and select which tasks to run and how to run them in order to accomplish what you want.

Build Tools across Languages

Although programming languages are very different, developers in all languages have similar requirements. Thus most languages have one or more build tools that integrate with the equivalent external tools and services specific to that language:

Language     Build Tool            Package Repository   Compiler/Typechecker   …
Java         Maven, Gradle, Mill   Maven Central        javac                  …
Python       Poetry                PyPI                 mypy                   …
Javascript   Webpack, Vite         NPM                  tsc                    …
Ruby         Rake                  RubyGems             Sorbet                 …
Rust         Cargo                 crates.io            rustc                  …

Generally each language’s build tools will come with integrations for the language’s other tooling by default: package repositories, compilers, linters, etc. There are also language-agnostic build tools out there like Make, or multi-language build tools such as Bazel, which is optimized for large monorepos that often contain multiple languages. The Mill build tool also supports multiple languages: Java, Kotlin, Scala, Python, and (experimentally) Javascript, allowing it to be used for Multi-Language Builds.

Build Tools for Small Codebases

Most projects start off small, and while they can use a build tool, they do not need one. A single-file program or script can get by using its language’s built-in CLI to compile, run, and test the code, and for small codebases these workflows are fast enough that parallelization and caching are not needed.

Nevertheless, even in a small project, a simple task like "installing third party dependencies" can be surprisingly challenging. Consider this scenario working with a simple Python script:

  • You start off running pip install something and running your code with python foo.py. Great!

  • The project bumps the version of something it requires, and now everyone working on the project must pip install it again, or else they will get weird errors from having the wrong version installed

  • If you check out an old release branch of the project, you need to remember to pip install again to go back to the old version of something. And then pip install the new version when you come back to the main branch later!

  • What if you end up with two scripts, foo.py and bar.py, that each require different versions of the same something? Now you need to remember to run python -m venv manually to create a virtual environment for each script, and to update each virtualenv whenever that script’s required something version changes.

As you can see, even for the single problem of "installing third-party dependencies", keeping everything in sync and everyone’s laptops up to date can be a real headache!

Developer workflows tend to get more complex and slower over time as projects progress and codebases grow. This can end up being a significant burden on the project’s developers, and slow down the project overall:

  • Each project developer ends up spending mental energy to juggle a dozen different tools to run linters, code-generation, autoformatting, unit-testing, integration-testing, etc., making sure to run them in the right order for things to work.

  • They also end up spending time just waiting, as their growing codebase takes longer and longer to compile, and their growing test suite takes longer and longer to run

Thus, even in small codebases it often makes sense to use a build tool. The manual work necessary to get a small codebase running can be surprisingly non-trivial, and as the codebase grows (as codebases tend to do) the build tool becomes even more necessary over time.

Build Tools vs Build Scripts

One common alternative to build tools is to write scripts to help manage the development of the project: e.g. you may find a dev/ folder full of scripts like run-tests.sh or lint.sh, and so on. These scripts may be written in Bash, Python, Ruby, Javascript, and so on. While this works, the approach of scripting your development workflows directly in Bash runs into trouble as the codebase grows:

  • Initially, these scripts serve to just enumerate the necessary steps to do something in your code, e.g.

    • integration-tests.sh may require download-dependencies.sh, codegen.sh, compile.sh, package.sh to be run in order first

    • unit-tests.sh on the other hand may need download-dependencies.sh, codegen.sh, compile.sh, but not need package.sh

  • Next, these scripts inevitably begin performing rudimentary caching: e.g. if you ran unit-tests.sh earlier and now want to run integration-tests.sh, you can skip download-dependencies.sh, codegen.sh, and compile.sh, but now need to run package.sh

  • Eventually, these scripts inevitably start parallelizing parts of the workflow: e.g. codegen.sh and download-dependencies.sh may be able to run in parallel, while compile.sh can only run after both of those are finished. Your unit-tests and integration-tests may also run in parallel to save time.

At this stage, your scripts have their own ad-hoc dependency-management, caching, and parallelization engine! Because your main focus is on your actual project, this ad-hoc engine will never be in great shape: the performance won’t be optimal, the error messages and usability won’t be great, and bugs and issues will tend to linger. This will be a drag on your productivity, since even if your focus is on your main project, you still need to interact with your build scripts constantly throughout the work day.
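To make this concrete, below is a minimal sketch (in Scala, with hypothetical file names) of the kind of make-style caching logic that build scripts accumulate: skip a step when its output already exists and is newer than all of its inputs:

import java.nio.file.{Files, Path, Paths}

object AdHocCache {
  def mtime(p: Path): Long =
    if (Files.exists(p)) Files.getLastModifiedTime(p).toMillis else Long.MinValue

  /** Run `action` only if `output` is missing or older than any input. */
  def runIfStale(inputs: Seq[Path], output: Path)(action: => Unit): Unit = {
    val upToDate = Files.exists(output) &&
      inputs.forall(in => mtime(in) <= mtime(output))
    if (upToDate) println(s"$output is up to date, skipping")
    else action
  }

  def main(args: Array[String]): Unit = {
    runIfStale(Seq(Paths.get("Foo.java")), Paths.get("Foo.class")) {
      println("compiling...") // e.g. shell out to javac here
    }
  }
}

Timestamp checks like this are fragile (e.g. they mis-fire when you switch branches), which is one reason dedicated build tools tend to hash inputs instead, as we will see below.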

At its core, a build tool automates the things you would have implemented yourself anyway: task ordering, parallelism, and caching, and it probably does them better than your own ad-hoc build scripts would. So even though any developer should be able to wrangle Bash or Python enough to get the ordering/parallelism/caching they need, they probably shouldn’t, and should instead use an off-the-shelf build tool that has already solved these problems.

Build Tools as Custom Task Runtimes

Most codebases have some amount of custom tasks and workflows. While many workflows are standardized - e.g. using the same Java compiler, Python interpreter, etc. - it is almost inevitable that over time the codebase will pick up workflows unique to its place in the business and organization. For example:

  • Custom code generation, to integrate with some internal RPC system no-one else uses

  • Custom linters, to cover common mistakes that your developers tend to make

  • Custom deployment artifacts, to deploy to a new cloud platform that hasn’t become popular yet

The default way of handling this customization is the aforementioned folder-full-of-scripts, where you have a do-custom-thing.sh script to run your custom logic. While this does work, it can be problematic for a number of reasons:

  1. Bash scripts are not an easy programming environment to work in, so custom tasks implemented as scripts tend to be buggy and fragile. Even implementing logic like "if-else" or "for-loops" in Bash can be error-prone and easy to mess up!

  2. Non-Bash scripting languages tend to have portability issues: e.g. Python scripts tend to be difficult to run reliably on different machines which may have different Python versions or dependencies installed, and Ruby scripts may have issues running on Windows.

  3. You usually want caching and parallelism in your custom tasks in order to make your workflows performant, and correctly and efficiently implementing a caching/parallelization engine in Bash (or some other scripting language) can be quite a challenge!

Most build tools thus provide some kind of "plugin system" to let you implement your custom logic in a more comfortable programming environment than Bash: Maven’s MOJO interface lets you write plugins in Java, Webpack allows you to write Webpack Plugins in Javascript, Bazel provides the Starlark Language for writing extensions, and so on. The Mill build tool’s custom logic is written in Scala and runs on the JVM, and thus comes with typechecking, IDE support, and access to the standard JVM libraries and package repositories that people are already used to.

How custom tasks and workflows are written does not matter for small projects where customizations are trivial. But in larger projects with a non-trivial amount of custom logic, it becomes very important. Providing a safe, easy-to-use way to customize your build is thus a big benefit of using a build tool, one that gets increasingly important as the project grows.

Build Tools for Large Codebases and Monorepos

Twice now we’ve mentioned that build tools get more important as projects grow, so it’s worth calling out the need for build tools in very large codebases: those with 100 to 1,000 or even 10,000 developers actively working on them, and perhaps millions or tens of millions of lines of code. Very large codebases have an acute need for something to help manage the development workflows, which "monorepo" build tools like Bazel provide. Some examples of "large codebase" requirements are:

  • Selective Testing, to avoid running the entire test suite (which may take hours) by only running the tests related to a change. e.g. Bazel supports this via the third-party bazel-diff package, while Mill supports this built-in as Selective Test Execution

  • Multi-language support: e.g. a Java server with a Javascript frontend and Python ML workflows. Bazel has rules_{lang} repos for a wide variety of languages (e.g. rules_java, rules_python, rules_go), and Mill has support for Multi-Language Builds

  • Distributed caching and execution: allowing different CI machines or developer laptops to share compiled artifacts so a module compiled on one machine can be re-used on another, or submitting a large workflow to a cluster of machines to parallelize it more than you could on a single laptop. This has traditionally been something only Bazel supports, though over time more build tools are adopting these techniques

In large codebases, using a build tool is no longer an optional nice-to-have, but becomes table-stakes. In large codebases you need the parallelism and caching that a build tool provides, otherwise you may end up waiting hours to compile and test each small change. You need the ability to customize and extend the build logic, in some way that doesn’t become a rat’s nest of shell scripts. You need to build 3-5 different languages in order to generate a single web-service or executable. This is where tools like Bazel, Pants, or Buck really shine, although other tools like Gradle or Mill also support some of the features necessary for working with large codebases.
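To illustrate the selective-testing idea from the list above, here is a minimal sketch (in Scala, over a hypothetical three-module graph, not bazel-diff’s or Mill’s real implementation): invert the dependency edges, then run only the test targets reachable downstream of whatever the change touched:

object SelectiveTests {
  // Hypothetical build graph: task -> direct dependencies
  val deps = Map(
    "app"      -> Seq("libA"),
    "libATest" -> Seq("libA"),
    "appTest"  -> Seq("app")
  )
  val testTargets = Set("libATest", "appTest")

  // Invert the edges: node -> tasks that directly depend on it
  val reverse: Map[String, Seq[String]] =
    deps.toSeq
      .flatMap { case (task, ds) => ds.map(_ -> task) }
      .groupMap(_._1)(_._2)

  /** Every task transitively downstream of `node`. */
  def downstream(node: String): Set[String] = {
    val direct = reverse.getOrElse(node, Nil).toSet
    direct ++ direct.flatMap(downstream)
  }

  def main(args: Array[String]): Unit = {
    println(downstream("libA") & testTargets) // a libA change: Set(libATest, appTest)
    println(downstream("app") & testTargets)  // an app change: Set(appTest)
  }
}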

See the following blog post for a deeper discussion on what features a "monorepo build tool" provides and why they are necessary:

How Build Tools Work: The Build Graph

After all this talk about what a build tool is and why you would need one, it is worth exploring how most modern build tools work. At their core, most modern build tools are some kind of graph evaluation engine.

For example, consider the various tasks we mentioned earlier:

  • Use the configured direct third-party dependencies to resolve the transitive dependencies

  • Download all transitive third-party library dependencies to files on disk

  • Compile upstream internal library dependencies

  • Run code generators on IDL files to generate source files

  • Run the compiler on source files to generate binary files

  • Package the binary files to generate an executable

  • Run the static asset pipeline on input files to generate static assets

We might even include a few more:

  • Run unit tests on binary files to generate a test report

  • Run integration tests on the executable to generate a test report

At their core, most build steps are of the form

  • Run TOOL on INPUT1, INPUT2, …​ to generate OUTPUT

This can be visualized as a node in a graph:

[Diagram: input1 → tool; input2 → tool; tool → output]
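In code, one might model such a node as a name, the inputs it consumes, and the tool invocation that produces its output. This is a hypothetical modeling for illustration, not any particular build tool’s API:

case class TaskNode(
  name: String,                // e.g. "compile"
  inputs: Seq[String],         // e.g. Seq("sources", "resolve_deps")
  tool: Seq[String] => String  // runs TOOL on the inputs to produce OUTPUT
)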

If we consider the tasks we looked at earlier, they might form a graph as shown below, where the boxes are the tasks, the non-boxed text labels are the input files, and the arrows are the dataflow between them:

[Diagram: direct_deps → resolve_deps → compile; code_gen → compile; sources → compile; compile → unit_test; compile → package; static_input_files → asset_pipeline; package → integration_test; asset_pipeline → integration_test]

Ordering on the Build Graph

It is from this graph representation that most build tools are able to work their magic. For example, if you ask to run unit_test (blue), then the build tool can traverse the graph edges to find that compile, code_gen, and resolve_deps also need to be run (red):

[The same build graph, with unit_test highlighted blue and its upstream tasks compile, code_gen, and resolve_deps highlighted red]

Furthermore, the build tool is able to figure out what tasks need to run in what order, simply by doing a topological sort of the subset it needs. In this case, it knows that resolve_deps and code_gen must both finish running before compile can begin, and compile must finish running before unit_test can begin.
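As a minimal sketch of this (assuming the example graph above, not any real tool’s implementation), a depth-first post-order traversal from the requested task visits every transitive dependency and emits it before the tasks that depend on it:

import scala.collection.mutable

object BuildOrder {
  // The example build graph: task -> direct dependencies
  val deps = Map(
    "resolve_deps"     -> Seq("direct_deps"),
    "compile"          -> Seq("resolve_deps", "code_gen", "sources"),
    "unit_test"        -> Seq("compile"),
    "package"          -> Seq("compile"),
    "asset_pipeline"   -> Seq("static_input_files"),
    "integration_test" -> Seq("package", "asset_pipeline")
  )

  /** Depth-first post-order: dependencies always precede dependents. */
  def ordered(target: String, seen: mutable.LinkedHashSet[String]): Unit =
    if (!seen.contains(target)) {
      deps.getOrElse(target, Nil).foreach(ordered(_, seen))
      seen += target
    }

  def main(args: Array[String]): Unit = {
    val seen = mutable.LinkedHashSet.empty[String]
    ordered("unit_test", seen)
    // Prints: direct_deps, resolve_deps, code_gen, sources, compile, unit_test
    println(seen.mkString(", "))
  }
}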

Caching and Invalidation via the Build Graph

If you subsequently ask to run integration_test, the build tool can see that compile, code_gen, and resolve_deps were run earlier and can be re-used (green), while package and asset_pipeline need to be run:

[The same build graph, with resolve_deps, code_gen, and compile shown green (re-used) and package and asset_pipeline shown red (need to run)]

If you then change a source file in sources and ask to run integration_test again, the build tool can again traverse the graph edges and see that:

  • resolve_deps, code_gen, and asset_pipeline are not downstream of sources and can be reused

  • compile and package are downstream of sources and need to be re-run

  • unit_test is not needed for integration_test, and so can be ignored

[The same build graph after a change to sources: compile and package are invalidated, while resolve_deps, code_gen, and asset_pipeline remain re-usable and unit_test is ignored]

Thus, having a model of the build graph is fundamental to how most build tools work their magic. When the user asks to run unit_test or integration_test, the build tool can simply traverse the build graph to automatically determine which tasks need to run in order to do that, and which tasks have earlier output that can be re-used.
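Here is a minimal sketch of that invalidation logic (again assuming the example graph; real build tools hash actual file contents and upstream task outputs, which also lets them skip downstream tasks when a re-run produces identical output):

import scala.collection.mutable

object Invalidation {
  val deps = Map(
    "resolve_deps"     -> Seq("direct_deps"),
    "compile"          -> Seq("resolve_deps", "code_gen", "sources"),
    "unit_test"        -> Seq("compile"),
    "package"          -> Seq("compile"),
    "asset_pipeline"   -> Seq("static_input_files"),
    "integration_test" -> Seq("package", "asset_pipeline")
  )
  // Stand-ins for hashes of the graph's leaf nodes (input files, etc.)
  var leafHashes = Map(
    "direct_deps" -> 1, "code_gen" -> 2,
    "sources" -> 3, "static_input_files" -> 4
  )
  // Input hash recorded the last time each task ran
  val recorded = mutable.Map.empty[String, Int]

  /** A task's input hash, combining everything upstream of it. */
  def inputHash(node: String): Int = deps.get(node) match {
    case None     => leafHashes(node) // a leaf node
    case Some(ds) => ds.map(inputHash).hashCode
  }

  def run(task: String): Unit = {
    deps(task).filter(deps.contains).foreach(run) // upstream tasks first
    val h = inputHash(task)
    if (recorded.get(task).contains(h)) println(s"$task: cached")
    else { println(s"$task: running"); recorded(task) = h }
  }

  def main(args: Array[String]): Unit = {
    run("integration_test")         // first run: every task runs
    leafHashes += ("sources" -> 99) // simulate editing a source file
    run("integration_test")         // only compile, package, integration_test re-run
  }
}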

Parallelism on the Build Graph

Apart from caching, the build graph is also useful for automatically parallelizing your build tasks. For example, consider a case where we want to do a clean build (i.e. no caching) of unit_test and integration_test. From the build graph, Mill is able to determine:

  • resolve_deps, code_gen, and asset_pipeline can immediately start running in parallel (green)

  • compile, package, and integration_test must run sequentially (red), with compile only starting once resolve_deps and code_gen are complete

  • unit_test (blue) can run in parallel with asset_pipeline, package, or integration_test, but can only start once compile is complete

[The same build graph, with resolve_deps, code_gen, and asset_pipeline green (start immediately in parallel), the chain compile → package → integration_test red (sequential), and unit_test blue (runs in parallel with the red chain once compile completes)]

Most modern build tools do this kind of graph-based parallelism automatically. In contrast to most programming languages and application frameworks where you need to set up parallelism yourself, in a build tool you don’t need to fiddle with threads, locks, semaphores, futures, actors, and so on. You just define the shape of the build graph using whatever configuration format or language the build tool provides, and you get parallelism for free.
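As a minimal sketch of how this can work (using plain Scala Futures over the example graph; real build tools use more sophisticated schedulers), each task becomes a Future that waits on the Futures of its dependencies, so independent tasks run concurrently while dependent ones are sequenced automatically:

import scala.collection.mutable
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

object ParallelBuild {
  val deps = Map(
    "resolve_deps"     -> Seq(),
    "code_gen"         -> Seq(),
    "asset_pipeline"   -> Seq(),
    "compile"          -> Seq("resolve_deps", "code_gen"),
    "unit_test"        -> Seq("compile"),
    "package"          -> Seq("compile"),
    "integration_test" -> Seq("package", "asset_pipeline")
  )
  // Memoize each task's Future so shared dependencies only run once
  val started = mutable.Map.empty[String, Future[Unit]]

  def runTask(task: String): Future[Unit] = started.get(task) match {
    case Some(f) => f
    case None =>
      val f = Future.sequence(deps(task).map(runTask)).map { _ =>
        println(s"running $task")
        Thread.sleep(500) // pretend to do some work
      }
      started(task) = f
      f
  }

  def main(args: Array[String]): Unit = {
    // resolve_deps, code_gen, and asset_pipeline start immediately in
    // parallel; compile waits for the first two; unit_test then runs
    // alongside package and integration_test
    val all = Seq("unit_test", "integration_test").map(runTask)
    Await.result(Future.sequence(all), Duration.Inf)
  }
}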

Languages for defining your Build Graph

Every build tool provides some format for defining the build graph data structure. There isn’t any industry-wide standard text format for graph data structures, so each build tool comes up with something on its own. Here we’ll look at how it is done in a few common build tools:

Bazel

Bazel uses the Starlark language, a dialect of Python. Below I show an example BUILD file from their documentation on using Bazel for C/C++. The Python functions like cc_library or cc_binary are called rules, and by calling one you create a target with the given name and with dependencies on upstream targets (deps) and source files (srcs):

cc_library(
    name = "hello-greet",
    srcs = ["hello-greet.cc"],
    hdrs = ["hello-greet.h"],
)

cc_binary(
    name = "hello-world",
    srcs = ["hello-world.cc"],
    deps = [":hello-greet"],
)

[Diagram: main/hello-greet.cc and main/hello-greet.h → //main:hello-greet → //main:hello-world; main/hello-world.cc → //main:hello-world]

In Bazel, the implementation of what rules like cc_library actually do lives in upstream build libraries implemented in Java or Starlark. You can also define your own custom rules in .bzl files, or use other people’s rules from one of the rules_{lang} repos on Bazel-Contrib, if you need functionality that Bazel does not ship with built in.

Gradle

Gradle lets you define custom tasks in either Kotlin or Groovy. Below I show an example custom task from their documentation written in Kotlin, that adds a new packageApp task:

val packageApp = tasks.register<Zip>("packageApp") {
    from(layout.projectDirectory.file("run.sh"))                // input - run.sh file
    from(tasks.jar) {                                           // input - jar task output
        into("libs")
    }
    from(configurations.runtimeClasspath) {                     // input - jar of dependencies
        into("libs")
    }
    destinationDirectory.set(layout.buildDirectory.dir("dist")) // output - location of the zip file
    archiveFileName.set("myApplication.zip")                    // output - name of the zip file
}

[Diagram: run.sh → packageApp; tasks.jar → packageApp; configurations.runtimeClasspath → packageApp]

packageApp depends on the run.sh source file, the output of the tasks.jar task, the jars from configurations.runtimeClasspath, and some other miscellaneous configuration. The Kotlin code is a bit idiosyncratic with the from and into helpers, and in this case needs to integrate with the pre-defined Zip class that represents this task type and comes from upstream in Gradle. But the end result of this syntax is to define a small snippet of the build graph as shown above, and it is this graph that ends up being important in your build system.

Mill

Mill uses Scala for its build file format, and lets you write normal method defs to define custom tasks in your build graph. The different methods can call each other, e.g. def resources below can call lineCount() or super.resources(), and these method calls become the edges in your build graph:

/** Total number of lines in module source files */
def lineCount = Task {
  allSourceFiles().map(f => os.read.lines(f.path).size).sum
}

/** Generate resources using lineCount of sources */
override def resources = Task {
  os.write(Task.dest / "line-count.txt", "" + lineCount())
  super.resources() ++ Seq(PathRef(Task.dest))
}

[Diagram: allSourceFiles → lineCount → resources → run; resources.super → resources]

Perhaps the most interesting thing about Mill is that you can write arbitrary code as part of your defs: above you can see the code for iterating over files, counting their lines, writing out a line-count.txt file, and so on. This is in contrast with Bazel or Gradle, where your Starlark/Kotlin/Groovy code is really just a fancy YAML file that configures build logic defined "elsewhere".

This arbitrary-code approach is similar to the approach taken by the venerable Make tool, but running on the JVM comes with additional advantages:

  • IDE support (e.g. in IntelliJ and VSCode): your IDE is able to provide autocomplete, error-highlighting, and navigation up and down your build graph. This is something impossible in Make (which runs Bash scripts that are too dynamic for your IDE to analyze) or config-based tools like Bazel or Gradle (where the code you write just configures an execution-engine that your IDE doesn’t understand)

  • The compiler can check for mistakes: this is especially important in a build tool as unlike application code, build config tends to be written by non-experts and not covered by unit or integration tests. Thus any assistance in catching bugs is very valuable, and here the compiler provides a lot of help for free.

  • You can use any JVM library. e.g. the os.read.lines or os.write functions are from OS-Lib, but you can download any library you want from Java’s Maven Central repository and use it in your build with full functionality and IDE support. The JVM ecosystem is deep and broad, and you can find libraries to do almost anything imaginable available for free. Mill lets you make full use of this to customize your build to do exactly what you need.

Mill thus provides a safe, easy-to-use programming environment for working with your build pipelines. And although you can write arbitrary code in each task, the Task{ } wrapper automatically provides parallelism, caching, and other things you want in your build tool.

Conclusion

Although modern build tools may look very different on the surface, most of them are surprisingly similar once you peek under the covers. Bazel’s Starlark config, Gradle’s Groovy/Kotlin config, and Mill’s Scala config all end up boiling down to a build graph similar to the one above, with only minor differences. And although the way they execute their tasks using the build graph does differ, at their most fundamental level they use the sort of graph traversals that I discuss above.

At their core, most build tools have the same goal. A build tool takes the wide variety of tasks a developer needs to do during development, and automatically runs them as efficiently and quickly as possible, with caching, parallelism, and whatever other optimizations it can find.

Hopefully this blog post has given you a better appreciation for how build tools do what they do, and perhaps given you some insight next time you need to debug a build tool that is doing something wrong!