The Mill Build Engineering Blog

Welcome to the Mill build engineering blog! This is the home for articles on technical topics related to JVM platform tooling and language-agnostic build tooling, some specific to the Mill build tool but mostly applicable to anyone working on build tooling for large codebases in JVM and non-JVM languages.

Understanding JVM Garbage Collector Performance

Li Haoyi, 10 January 2025

Garbage collectors are a core part of many programming languages. While they generally work well, on occasion when they go wrong they can fail in very unintuitive ways. This article will discuss the fundamental design of how garbage collectors work, and tie it to real benchmarks of how GCs perform on the Java Virtual Machine. You should come away with a deeper understanding of how the JVM garbage collector works and concrete ways you can work to improve its performance in your own real-world projects.

How JVM Executable Assembly Jars Work

Li Haoyi, 2 January 2025

One feature of the Mill JVM build tool is that the assembly jars it creates are directly executable:

> ./mill show foo.assembly # generate the assembly jar
"ref:v0:bd2c6c70:/Users/lihaoyi/test/out/foo/assembly.dest/out.jar"

> out/foo/assembly.dest/out.jar # run the assembly jar directly
Hello World

Other JVM build tools can also generate assemblies, but most require you to run them via java -jar or java -cp, or to use jlink or jpackage, which are much more heavyweight and troublesome to set up. Mill automates that, and while not groundbreaking, it is a nice convenience that lets JVM code built with Mill fit more naturally into the command-line-centric workflows common in modern software systems.

This article will discuss how Mill’s executable assemblies are implemented, so perhaps other build tools and toolchains will be able to provide the same convenience.
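As a rough illustration of the general idea, and not necessarily how Mill itself does it, one common way to make a jar directly executable is to prepend a small shell launcher to the zip file. Zip readers locate the archive from the end of the file, so the prepended bytes are tolerated. The sketch below builds a stand-in jar in memory and checks that it is still readable as a zip after a stub is prepended. The stub contents, the `foo.Main` class name, and the helper names are all hypothetical.

```python
import io
import zipfile

# Hypothetical launcher stub: a shell script that re-invokes the JVM on this same file.
# On disk, the resulting file would also need its executable bit set (chmod +x).
LAUNCHER_STUB = b'#!/usr/bin/env sh\nexec java -jar "$0" "$@"\n'


def make_jar_bytes() -> bytes:
    """Build a tiny stand-in jar (a zip containing only a manifest) in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(
            "META-INF/MANIFEST.MF",
            "Manifest-Version: 1.0\nMain-Class: foo.Main\n",  # foo.Main is a placeholder
        )
    return buf.getvalue()


def prepend_launcher(jar_bytes: bytes, stub: bytes = LAUNCHER_STUB) -> bytes:
    """Return the stub followed by the unmodified jar bytes."""
    return stub + jar_bytes


executable = prepend_launcher(make_jar_bytes())

# The combined file starts with a shebang, yet still opens as a zip archive,
# because the zip central directory is found by scanning from the end of the file.
with zipfile.ZipFile(io.BytesIO(executable)) as zf:
    names = zf.namelist()

print(names)
```

When such a file is run from a shell, the operating system sees the shebang line and executes the stub, which in turn hands the very same file to java -jar.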

How To Manage Flaky Tests in your CI Workflows

Li Haoyi, 1 January 2025

Many projects suffer from the problem of flaky tests: tests that pass or fail non-deterministically. These cause confusion, slow development cycles, and endless arguments between individuals and teams in an organization.

This article dives deep into working with flaky tests, from the perspective of someone who built the first flaky test management systems at Dropbox and Databricks and maintained the related build and CI workflows over the past decade. The issue of flaky tests can be surprisingly unintuitive, with many "obvious" approaches being ineffective or counterproductive. But it turns out there are right and wrong answers to many of these issues, and we will discuss both so you can better understand what managing flaky tests is all about.

Faster CI with Selective Testing

Li Haoyi, 24 December 2024

Selective testing is a key technique for working with any large codebase or monorepo: picking which tests to run to validate a change or pull request, because running every test every time is costly and slow. This blog post will explore what selective testing is all about and the different approaches you can take to it, based on my experience working on developer tooling and CI for the last decade at Dropbox and Databricks. Lastly, we will discuss how the Mill build tool supports selective testing.
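To make the idea concrete, here is a minimal sketch of one possible approach: selecting tests by walking a module dependency graph downstream from the modules a change touched. The module names and the graph are made up for illustration, and this is not a description of how Mill or any particular CI system implements it.

```python
from collections import deque

# Hypothetical module graph: each module maps to the modules it depends on.
DEPENDS_ON = {
    "core": [],
    "utils": ["core"],
    "api": ["core", "utils"],
    "cli": ["api"],
    "docs": [],
}


def affected_modules(changed, depends_on):
    """Return the changed modules plus everything that transitively depends on them."""
    # Invert the edges so we can walk from a module to its dependents.
    dependents = {module: set() for module in depends_on}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(module)

    # Breadth-first traversal downstream from the changed modules.
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for downstream in dependents[current]:
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return seen


# A change to "utils" means only the tests of utils, api, and cli need to run;
# core and docs cannot be affected and are skipped.
selected = sorted(affected_modules(["utils"], DEPENDS_ON))
print(selected)  # ['api', 'cli', 'utils']
```

The trade-off in a scheme like this is granularity: a coarse module graph is cheap to compute but selects more tests than strictly necessary, while a finer-grained graph selects fewer tests at the cost of more bookkeeping.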

Why Use a Monorepo Build Tool?

Li Haoyi, 17 December 2024

Software build tools mostly fall into two categories:

  1. Single-language build tools, e.g. Maven (Java), Poetry (Python), Cargo (Rust)

  2. Monorepo build tools targeting large codebases, e.g. Bazel, Pants, Buck, and Mill

One question that comes up constantly is why people use monorepo build tools. Tools like Bazel are orders of magnitude more complicated and harder to use than tools like Poetry or Cargo, so why do people use them at all?

How Fast Does Java Compile?

Li Haoyi, 29 November 2024

Java compiles have a reputation for being slow, but that reputation does not match today’s reality. Nowadays the Java compiler can compile "typical" Java code at over 100,000 lines a second on a single core. That means that even a million-line project should take only about 10s to compile in a single-threaded fashion, and should be even faster in the presence of parallelism.