What does a Build Tool do?
Li Haoyi, 13 February 2025
The most common question I get asked about the Mill build tool is: what does a build tool even do? Even software developers may not be familiar with the idea: they may run `pip install` and `python foo.py`, `javac Foo.java`, or `go build`, or some other language-specific CLI directly. They may have a folder full of custom Bash scripts they use during development. Or they may develop software on a team where someone else has set up the tooling that they use. This blog post explores what build tools are all about, why they are important to most software projects as they scale, and how they work under the hood.
What is a Build Tool?
A build tool is a program that automates selecting which tools or tasks to run - and how to run them - whenever you want to do something in your codebase.
A hello-world program may get by with `javac Foo.java` to compile your code and `java Foo` to run it, or even a single command `python foo.py`. But for most real-world projects a single "user facing" task may require a lot of work to be done first. The job of the build tool is to help decide what work needs to be done to accomplish what you want, and how best to perform that work with caching and parallelism to complete it as quickly and as efficiently as possible.
Build Tools as Task Orchestrators
For example, in a real-world project, before running your code you may first need to:
- Use the configured direct third-party dependencies to resolve the transitive dependencies
- Download all transitive third-party library dependencies to files on disk
- Compile internal library dependencies, i.e. upstream modules in the same codebase
- Run code generators on IDL files (e.g. protobuf, openapi, thrift, etc.) to generate source files
- Run the compiler (if your language has one) on source files to generate binary files
- Package the binary files into an executable
- Run the static asset pipeline on input files to generate static assets (e.g. assets for video games or websites)
So although the developer may have just asked to run their code, it is only after all these tasks are done that it makes sense to begin running it! However, it’s not as simple as running every one of the tasks listed above one after another. You also need to decide which ones to run and how to run them:
- You want to skip any tasks that have been run earlier whose outputs you can re-use to save time (e.g. by not generating static assets if they have already been generated)
- You want to make sure to re-run tasks whose inputs have changed, discarding any out-of-date output (e.g. if a static input file is changed, then the static asset pipeline needs to be re-run)
- You want to parallelize tasks which are independent of each other (e.g. running the compiler can happen in parallel with running the static asset pipeline)
- You want to not parallelize tasks which depend on one another, and thus cannot happen in parallel (e.g. running the compiler must happen after running code generators, since we need to compile the generated code)
Working on any real-world codebase will involve a large variety of tasks: resolving and downloading dependencies, autoformatting, linting, code-generation, compiling, testing, packaging and publishing, and so on. These tasks depend on each other in various ways, and it is the job of the build tool to take what you (as a developer) want to do, and select which tasks to run and how to run them in order to accomplish what you want.
Build Tools across Languages
Although programming languages are very different, developers in all languages have similar requirements. Thus most languages have one or more build tools that integrate with the equivalent external tools and services specific to that language:
| Language | Build Tool | Package Repository | Compiler | … |
|---|---|---|---|---|
| Java | … | | | |
| Python | … | | | |
| Javascript | … | | | |
| Ruby | … | | | |
| Rust | … | | | |
Generally each language’s build tools will come with integrations for the language’s other tooling by default: package repositories, compilers, linters, etc. There are also language-agnostic build tools out there like Make, as well as multi-language build tools such as Bazel, which is optimized for large monorepos that often contain multiple languages. The Mill build tool also supports multiple languages: Java, Kotlin, Scala, Python, and (experimentally) Javascript, allowing it to be used for Multi-Language Builds.
Build Tools for Small Codebases
Most projects start off small, and while they can use a build tool, they do not need one. A single-file program or script can get by with using its language’s built-in CLI to compile/run/test the code, and for small codebases these workflows are fast enough that parallelization or caching are not needed.
Nevertheless, even in a small project, a simple task like "installing third-party dependencies" can be surprisingly challenging. Consider this scenario when working with a simple Python script:
- You start off running `pip install something`, and run your code with `python foo.py`. Great!
- The project bumps the version of `something` required, and now everyone working on the project must `pip install` it again, or else they will get weird errors from having the wrong version installed
- If you check out an old release branch of the project, you need to remember to `pip install` to go back to the old version of `something`. And then `pip install` the new version when you come back to the `main` branch later!
- What if you end up with two scripts, `foo.py` and `bar.py`, that each require different versions of the same `something`? Now you need to remember to manually `python -m venv` to create a virtual environment for each script, and remember to update each virtualenv whenever that script’s required `something` version changes.
As you can see, even for the single problem of "installing third-party dependencies", keeping everything in sync and everyone’s laptops up to date can be a real headache!
Developer workflows tend to get more complex and slower over time as projects progress and codebases grow. This can end up being a significant burden on the project’s developers, and slow down the project overall:
- Each project developer ends up spending mental energy to juggle a dozen different tools to run linters, code-generation, autoformatting, unit-testing, integration-testing, etc., making sure to run them in the right order for things to work.
- They also end up spending time just waiting, as their growing codebase takes longer and longer to compile, and their growing test suite takes longer and longer to run.
Thus, even in small codebases it often makes sense to use a build tool. The manual work necessary to get a small codebase running can be surprisingly non-trivial, and as the codebase grows (as codebases tend to do) the build tool becomes even more necessary over time.
Build Tools vs Build Scripts
One common alternative to build tools is to write scripts to help manage the development of the project: e.g. you may find a `dev/` folder full of scripts like `run-tests.sh` or `lint.sh`, and so on. These scripts may be written in Bash, Python, Ruby, Javascript, and so on. While this works, the approach of scripting your development workflows directly in Bash runs into trouble as the codebase grows:
- Initially, these scripts serve to just enumerate the necessary steps to do something in your code, e.g.
  - `integration-tests.sh` may require `download-dependencies.sh`, `codegen.sh`, `compile.sh`, `package.sh` to be run in order first
  - `unit-tests.sh` on the other hand may need `download-dependencies.sh`, `codegen.sh`, `compile.sh`, but not need `package.sh`
- Next, these scripts inevitably begin performing rudimentary caching: e.g. if you ran `unit-tests.sh` earlier and now want to run `integration-tests.sh`, we can skip `download-dependencies.sh`, `codegen.sh`, `compile.sh`, but need to now run `package.sh`
- Eventually, these scripts inevitably start parallelizing parts of the workflow: e.g. `codegen.sh` and `download-dependencies.sh` may be able to run in parallel, while `compile.sh` can only run after both of those are finished. Your `unit-tests` and `integration-tests` may also run in parallel to save time.
At this stage, your scripts have their own ad-hoc dependency-management, caching, and parallelization engine! Because your main focus is on your actual project, this ad-hoc engine will never be in great shape: the performance won’t be optimal, the error messages and usability won’t be great, and bugs and issues will tend to linger. This will be a drag on your productivity, since even if your focus is on your main project, you still need to interact with your build scripts constantly throughout the work day.
At its core, a build tool basically automates these things that you would have implemented yourself anyway: it provides the ordering of tasks, parallelism, and caching, and probably does so better than you could in your own ad-hoc build scripts. So even though any developer should be able to wrangle Bash or Python enough to get the ordering/parallelism/caching they need, they probably shouldn’t do so, and should instead use an off-the-shelf build tool that has these problems already solved.
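To make this concrete, below is a minimal sketch (in Scala, purely for illustration and not taken from any real project) of the kind of timestamp-based caching that build scripts tend to grow: re-run a step only if its output is missing or older than its inputs. The file paths and the packaging step are hypothetical:

```scala
import java.nio.file.{Files, Path, Paths}

object AdHocCache {
  // Re-run `action` only if `output` is missing or older than any of its inputs:
  // the same check that hand-written build scripts end up re-implementing.
  def runIfStale(output: Path, inputs: Seq[Path])(action: => Unit): Unit = {
    val stale =
      !Files.exists(output) ||
        inputs.exists(in => Files.getLastModifiedTime(in).compareTo(Files.getLastModifiedTime(output)) > 0)
    if (stale) action else println(s"$output is up to date, skipping")
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical usage: only re-package if the source file changed since the last build
    runIfStale(Paths.get("out/app.jar"), Seq(Paths.get("src/Foo.java"))) {
      println("packaging...") // a real script would shell out to javac/jar here
    }
  }
}
```

This works for one step, but once a dozen steps depend on each other, the ordering, caching, and parallelization logic around them is exactly what an off-the-shelf build tool already provides.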
Build Tools as Custom Task Runtimes
Most codebases have some amount of custom tasks and workflows. While many workflows are standardized - e.g. using the same Java compiler, Python interpreter, etc. - it is almost inevitable that over time the codebase will pick up workflows unique to its place in the business and organization. For example:
- Custom code generation, to integrate with some internal RPC system no-one else uses
- Custom linters, to cover common mistakes that your developers tend to make
- Custom deployment artifacts, to deploy to a new cloud platform that hasn’t become popular yet
The default way of handling this customization is the aforementioned folder-full-of-scripts, where you have a `do-custom-thing.sh` script to run your custom logic. While this does work, it can be problematic for a number of reasons:
- Bash scripts are not an easy programming environment to work in, so custom tasks implemented as scripts tend to be buggy and fragile. Even implementing logic like "if-else" or "for-loops" in Bash can be error-prone and easy to mess up!
- Non-Bash scripting languages tend to have portability issues: e.g. Python scripts tend to be difficult to run reliably on different machines which may have different Python versions or dependencies installed, and Ruby scripts may have issues running on Windows.
- You usually want caching and parallelism in your custom tasks in order to make your workflows performant, and correctly and efficiently implementing a caching/parallelization engine in Bash (or some other scripting language) can be quite a challenge!
Most build tools thus provide some kind of "plugin system" to let you implement your custom logic in a more comfortable programming environment than Bash: Maven’s MOJO interface lets you write plugins in Java, Webpack allows you to write Webpack Plugins in Javascript, Bazel provides the Starlark Language for writing extensions, and so on. The Mill build tool’s custom logic is written in Scala and runs on the JVM, and thus comes with typechecking, IDE support, and access to the standard JVM libraries and package repositories that people are already used to.
How custom tasks and workflows are written does not matter for small projects where customizations are trivial. But in larger projects with a non-trivial amount of custom logic, it becomes very important. Providing a safe, easy-to-use way to customize your build is thus a big benefit of using a build tool, one that gets increasingly important as the project grows.
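As a rough sketch of what such a custom task might look like in Mill (written in the same style as the Mill example later in this post, and assuming it sits inside a module definition in your `build.mill`; the `lintTodos` name and the rule itself are made up for illustration):

```scala
/** Hypothetical custom linter: fail the build if any source file contains a stray TODO */
def lintTodos = Task {
  val offenders = allSourceFiles().filter(f => os.read(f.path).contains("TODO"))
  if (offenders.nonEmpty)
    sys.error("Found TODOs in: " + offenders.map(_.path).mkString(", "))
  offenders.size // number of offending files, cached like any other task output
}
```

Because it is an ordinary task, it gets the same caching, parallelism, and IDE support as the built-in tasks, without any hand-rolled Bash.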
Build Tools for Large Codebases and Monorepos
Twice now we’ve mentioned that build tools get more important as projects grow, so it’s worth calling out the need for build tools on very large codebases: those with 100 to 1,000 to even 10,000 developers actively working on them, and perhaps millions or tens of millions of lines of code. Very large codebases have an acute need for something to help manage the development workflows, which "monorepo" build tools like Bazel provide. Some examples of "large codebase" requirements are:
- Selective Testing, to avoid running the entire test suite (which may take hours) by only running the tests related to a change. e.g. Bazel supports this via the third-party bazel-diff package, while Mill supports this built-in as Selective Test Execution
- Multi-language support: e.g. a Java server with a Javascript frontend and Python ML workflows. Bazel has `rules_{lang}` packages for a wide variety of languages (e.g. rules_java, rules_python, rules_go), and Mill has support for Multi-Language Builds
- Distributed caching and execution: allowing different CI machines or developer laptops to share compiled artifacts, so a module compiled on one machine can be re-used on another, or submitting a large workflow to a cluster of machines to parallelize it more than you could on a single laptop. This has traditionally been something only Bazel supports, though over time more build tools are adopting these techniques
In large codebases, using a build tool is no longer an optional nice-to-have, but becomes table-stakes. In large codebases you need the parallelism and caching that a build tool provides, otherwise you may end up waiting hours to compile and test each small change. You need the ability to customize and extend the build logic, in some way that doesn’t become a rat’s nest of shell scripts. You need to build code in 3-5 different languages in order to generate a single web-service or executable. This is where tools like Bazel, Pants, or Buck really shine, although other tools like Gradle or Mill also support some of the features necessary for working with large codebases.
See the following blog post for a deeper discussion on what features a "monorepo build tool" provides and why they are necessary:
How Build Tools Work: The Build Graph
After all this talk about what a build tool is and why you would need one, it is worth exploring how most modern build tools work. At their core, most modern build tools are some kind of graph evaluation engine.
For example, consider the various tasks we mentioned earlier:
- Use the configured direct third-party dependencies to resolve the transitive dependencies
- Download all transitive third-party library dependencies to files on disk
- Compile upstream internal library dependencies
- Run code generators on IDL files to generate source files
- Run the compiler on source files to generate binary files
- Package the binary files to generate an executable
- Run the static asset pipeline on input files to generate static assets
We might even include a few more:
- Run unit tests on the binary files to generate a test report
- Run integration tests on the executable to generate a test report
At their core, most build steps are of the form:

- Run `TOOL` on `INPUT1`, `INPUT2`, … to generate `OUTPUT`

which can be visualized as a node in a graph.
If we consider the tasks we looked at earlier, they might form a graph as shown below, where the boxes are the tasks, the non-boxed text labels are the input files, and the arrows are the dataflow between them.
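As a rough illustration, such a graph can also be written down as plain data. The sketch below uses the task names from the lists above, but the exact set of dependencies is an assumption made for this example:

```scala
// Each task lists the tasks or input files it depends on.
// "sources", "idl_files", "deps_config" and "static_files" are input files, not tasks.
val graph: Map[String, Seq[String]] = Map(
  "resolve_deps"     -> Seq("deps_config"),
  "code_gen"         -> Seq("idl_files"),
  "compile"          -> Seq("resolve_deps", "code_gen", "sources"),
  "package"          -> Seq("compile"),
  "asset_pipeline"   -> Seq("static_files"),
  "unit_test"        -> Seq("compile"),
  "integration_test" -> Seq("package", "asset_pipeline")
)
```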
Ordering on the Build Graph
It is from this graph representation that most build tools are able to work their magic. For example, if you ask to run `unit_test` (blue), then the build tool can traverse the graph edges (red) to find that `compile`, `code_gen`, and `resolve_deps` also need to be run (red).
Furthermore, the build tool is able to figure out what tasks need to run in what order, simply by doing a topological sort of the subset it needs. In this case, it knows that `resolve_deps` and `code_gen` must both finish running before `compile` can begin, and `compile` must finish running before `unit_test` can begin.
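That selection-and-ordering logic can be sketched as a small function over a dependency map like the one above. This is only an illustration of the idea, not how any particular build tool implements it:

```scala
// Return `goal` plus everything it transitively depends on, ordered so that
// every task appears after its dependencies (a depth-first topological sort).
// Assumes the graph is acyclic, as build graphs must be.
def executionOrder(deps: Map[String, Seq[String]], goal: String): Seq[String] = {
  val ordered = collection.mutable.LinkedHashSet[String]()
  def visit(task: String): Unit = if (!ordered.contains(task)) {
    deps.getOrElse(task, Nil).filter(deps.contains).foreach(visit) // dependencies first
    ordered += task
  }
  visit(goal)
  ordered.toSeq
}

// e.g. executionOrder(graph, "unit_test")
// == Seq("resolve_deps", "code_gen", "compile", "unit_test")
```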
Caching and Invalidation via the Build Graph
If you then subsequently ask to run `integration_test`, the build tool can see that `compile`, `code_gen`, and `resolve_deps` were run earlier and can be re-used (green), while `package` and `asset_pipeline` need to be run.
If you then change a source file in `sources` and ask to run `integration_test` again, the build tool can again traverse the graph edges and see that:

- `resolve_deps`, `code_gen`, and `asset_pipeline` are not downstream of `sources` and can be reused
- `compile` and `package` are downstream of `sources` and need to be re-run
- `unit_test` is not needed for `integration_test`, and so can be ignored
Thus, having a model of the build graph is fundamental to how most build tools work their magic. When the user asks to run `unit_test` or `integration_test`, the build tool can simply traverse the build graph to automatically determine which tasks need to run in order to do that, and which tasks have earlier output that can be re-used.
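The invalidation side can be sketched the same way: given a changed input, find every task downstream of it whose cached output can no longer be trusted. Again, this is just an illustration over the hypothetical dependency map from earlier:

```scala
// A task is stale if it depends, directly or transitively, on the changed input.
def invalidated(deps: Map[String, Seq[String]], changed: String): Set[String] = {
  def isStale(task: String): Boolean =
    deps.getOrElse(task, Nil).exists(d => d == changed || isStale(d))
  deps.keySet.filter(isStale)
}

// invalidated(graph, "sources") contains "compile", "package", "unit_test" and
// "integration_test"; intersecting that with executionOrder(graph, "integration_test")
// tells the build tool to re-run just "compile", "package" and "integration_test",
// while "resolve_deps", "code_gen" and "asset_pipeline" keep their cached results.
```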
Parallelism on the Build Graph
Apart from caching, the build graph is also useful for automatically parallelizing your build tasks.
For example, consider a case where we want to do a clean build (i.e. no caching) of `unit_test` and `integration_test`. From the build graph, Mill is able to determine:

- `resolve_deps`, `code_gen`, and `asset_pipeline` can immediately start running in parallel (green)
- `compile`, `package`, and `integration_test` must run sequentially (red), only starting once `resolve_deps` and `code_gen` are complete
- `unit_test` (blue) can run in parallel with `asset_pipeline`, `package`, or `integration_test`, but can only start once `compile` is complete
Most modern build tools do this kind of graph-based parallelism automatically. In contrast to most programming languages and application frameworks where you need to set up parallelism yourself, in a build tool you don’t need to fiddle with threads, locks, semaphores, futures, actors, and so on. You just define the shape of the build graph using whatever configuration format or language the build tool provides, and you get parallelism for free.
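Here is a minimal sketch of how that parallelism can fall out of the graph for free, using plain Scala Futures over the same hypothetical dependency map. Real build tools use more sophisticated schedulers, but the principle is the same: a task starts as soon as its dependencies finish.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Start every task as soon as all of its dependencies have completed, by giving
// each task a Future that waits on its dependencies' Futures before running.
def runInParallel(deps: Map[String, Seq[String]], run: String => Unit): Unit = {
  val futures = collection.mutable.Map.empty[String, Future[Unit]]
  def futureFor(task: String): Future[Unit] = futures.get(task) match {
    case Some(f) => f
    case None =>
      val depFutures = deps.getOrElse(task, Nil).filter(deps.contains).map(futureFor)
      val f = Future.sequence(depFutures).map(_ => run(task))
      futures(task) = f
      f
  }
  Await.result(Future.sequence(deps.keys.toSeq.map(futureFor)), Duration.Inf)
}

// runInParallel(graph, t => println(s"running $t")) starts resolve_deps, code_gen
// and asset_pipeline immediately, compile once its dependencies finish, and so on.
```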
Languages for defining your Build Graph
Every build tool provides some format for defining the build graph data structure. There isn’t any industry-wide standard text format for graph data structures, so each build tool comes up with something of its own. Here we’ll look at how it is done in a few common build tools:
Bazel
Bazel uses the Starlark language, a dialect of Python. Below I show an example `BUILD` file from their documentation on using Bazel for C/C++. The Python functions like `cc_library` or `cc_binary` are called rules, and by calling the function you create a target with the given `name` and dependencies on upstream targets (`deps`) and source files (`srcs`):
```python
cc_library(
    name = "hello-greet",
    srcs = ["hello-greet.cc"],
    hdrs = ["hello-greet.h"],
)

cc_binary(
    name = "hello-world",
    srcs = ["hello-world.cc"],
    deps = [":hello-greet"],
)
```
In Bazel, the implementation of what rules like `cc_library` actually do is done by upstream build libraries implemented in Java or Starlark. You can also define your own custom rules in `.bzl` files, or use other people’s rules from one of the `rules_{lang}` repos on Bazel-Contrib, if you need functionality that Bazel does not come with built in.
Gradle
Gradle lets you define custom tasks in either Kotlin or Groovy. Below I show an example custom task from their documentation, written in Kotlin, that adds a new `packageApp` task:
```kotlin
val packageApp = tasks.register<Zip>("packageApp") {
    from(layout.projectDirectory.file("run.sh")) // input - run.sh file
    from(tasks.jar) { // input - jar task output
        into("libs")
    }
    from(configurations.runtimeClasspath) { // input - jar of dependencies
        into("libs")
    }
    destinationDirectory.set(layout.buildDirectory.dir("dist")) // output - location of the zip file
    archiveFileName.set("myApplication.zip") // output - name of the zip file
}
```
`packageApp` depends on the `run.sh` source file, the output of the `tasks.jar` task and `configurations.runtimeClasspath`, and some other miscellaneous configuration.
The Kotlin code is a bit idiosyncratic with the `from` and `into` helpers, and in this case needs to integrate with the pre-defined `Zip` class that represents this task type and comes from upstream in Gradle. But the end result of this syntax is to define a small snippet of the build graph as shown above, and it is this graph that ends up being important in your build system.
Mill
Mill uses Scala for its build file format, and lets you write normal method `def`s to define custom tasks in your build graph. The different methods can call each other, e.g. `def resources` below can call `lineCount()` or `super.resources()`, and these method calls become the edges in your build graph:
```scala
/** Total number of lines in module source files */
def lineCount = Task {
  allSourceFiles().map(f => os.read.lines(f.path).size).sum
}

/** Generate resources using lineCount of sources */
override def resources = Task {
  os.write(Task.dest / "line-count.txt", "" + lineCount())
  super.resources() ++ Seq(PathRef(Task.dest))
}
```
Perhaps the most interesting thing about Mill is that you can write arbitrary code as part of your `def`s: above you can see the code for iterating over files, counting their lines, writing out a `line-count.txt` file, and so on. This is in contrast with Bazel or Gradle, where your Starlark/Kotlin/Groovy code is really just a fancy YAML file that configures build logic defined "elsewhere".
This arbitrary-code approach is similar to the approach taken by the venerable Make tool, but running on the JVM comes with additional advantages:
- IDE support (e.g. in IntelliJ and VSCode): your IDE is able to provide autocomplete, error-highlighting, and navigation up and down your build graph. This is something impossible in `make` (which runs Bash scripts too dynamic for your IDE to analyze) or config-based tools like Bazel or Gradle (where the code you write just configures an execution engine that your IDE doesn’t understand)
- The compiler can check for mistakes: this is especially important in a build tool as, unlike application code, build config tends to be written by non-experts and not covered by unit or integration tests. Thus any assistance in catching bugs is very valuable, and here the compiler provides a lot of help for free.
- You can use any JVM library: e.g. the `os.read.lines` or `os.write` functions are from OS-Lib, but you can download any library you want from Java’s Maven Central repository and use it in your build with full functionality and IDE support. The JVM ecosystem is deep and broad, and you can find libraries to do almost anything imaginable available for free. Mill lets you make full use of this to customize your build to do exactly what you need.
Mill thus provides a safe, easy-to-use programming environment for working with your build pipelines. And although you can write arbitrary code in each task, the `Task { }` wrapper automatically provides parallelism, caching, and other things you want in your build tool.
Conclusion
Although modern build tools may look very different on the surface, most of them are surprisingly similar once you peek under the covers. Bazel’s Starlark config, Gradle’s Groovy/Kotlin config, and Mill’s Scala config all end up boiling down to a build graph similar to the one above, with only minor differences. And although the way they execute their tasks using the build graph does differ, at their most fundamental level they use the sort of graph traversals that I discuss above.
At their core, most build tools have the same goal: a build tool takes the wide variety of tasks a developer needs to do during development, and automatically runs them as efficiently and quickly as possible, with caching and parallelism and whatever other optimizations it can find.
Hopefully this blog post has given you a better appreciation for how build tools do what they do, and perhaps given you some insight for the next time you need to debug a build tool that is doing something wrong!