Why does Mill use Scala?

A question that comes up a lot among Mill users is: why use Scala as the language to configure your build? Why not YAML, XML, TOML, Bash, Groovy, Python, Java, or any of the other hundred programming and configuration languages in widespread use today? Scala is definitely a niche language, but it has some unique properties that make it especially well suited to configuring the build system of a project, whether small or large.

For the purposes of this page, we will break this topic down into three top-level questions: why Mill uses a general-purpose programming language, why Mill uses the Scala language, and why Mill wants to run on the Java Virtual Machine.

Why a General Purpose Language?

While Mill uses a general-purpose programming language (Scala), many build tools use restricted config languages instead, or their own custom tool-specific languages. Why that is the case is an interesting discussion.

Why Not Config Languages?

Many build tools use restricted config languages rather than a general-purpose language:

Language    Tool
XML         Maven, MSBuild, Ant
TOML        pyproject.toml, Cargo, Poetry
JSON        NPM
YAML        Bleep

At first glance, using a restricted language is tempting: restricted languages are simpler than general-purpose languages. However, build systems are often fundamentally complex, especially as codebases or organizations grow. Projects often find themselves with custom build system requirements that do not fit nicely into these "simple metadata formats".

While most "common" workflows in "common" build tools will have some built-in support or a plugin, that support may not do exactly what you need, the plugin may be unmaintained, or you may hit some requirement unique to your business. When you configure your build in a restricted config language and hit one of these unusual requirements, there are usually four outcomes:

  1. Find some third-party "plugin" (and these build systems always have plugins!) written in a general-purpose language to use in your XML/TOML/JSON file, or write your own plugin if none of the off-the-shelf plugins exactly matches your requirements (which is very likely!)

  2. Have your build tool delegate the logic to some "run bash script" step, and implement your custom logic in the bash script that the build tool wraps

  3. Take the build tool and wrap it in a bash script that implements the custom logic and configures the build tool dynamically on the fly by generating XML/TOML/JSON files or passing environment variables or properties

  4. Extend your XML/TOML/JSON config language with general-purpose programming language features: conditionals, variables, functions, etc. (see GitHub Config Expressions, AWS CloudFormation Functions, Helm Chart Templating)

Anyone who has been in industry for some time has likely seen all of these outcomes at various companies and in various codebases. The fundamental issue is that build systems are inherently complex, and if your build tool's language does not accommodate that complexity, the complexity ends up leaking elsewhere, outside the purview of the build tool.

Mill’s choice of a general-purpose language does mean more complexity in the core language or build tool, but it also lets you move complexity into the build tool rather than having it live in third-party plugins, ad-hoc bash scripts, or weird pseudo-languages embedded in your config. A general-purpose language likely has better IDE integration, tooling, safety, performance, and ecosystem support than bash scripts or pseudo-languages embedded in config files. If your build system complexity has to live somewhere, it’s better to write it in a general-purpose language where it can be properly managed, as long as a suitable general-purpose language can be found.

Why Not A Custom Language?

Many build tools do use custom languages, Make being a familiar example.

The basic issue with custom languages is that although they may in theory match your build tool requirements perfectly, they lack in all other areas that a widely-used general-purpose language does well:

  • IDE support in all popular IDEs and editors

  • Library ecosystem of publicly available helpers and plugins

  • Package publishing and distribution infrastructure

  • Tooling: debuggers, profilers

  • Quality of the language itself: things like type systems, basic syntax, standard library, error messages, etc.

A custom tool-specific language implemented in a few person-months will definitely be much less polished in all these areas than a widely-used general-purpose language that has been gradually improved upon for a decade or two. While in theory a custom language could catch up given enough staffing to implement all these features, in practice even projects like Make that have been used for decades fall behind niche general-purpose languages when it comes to the supporting ecosystem above. As an example, how do you publish a re-usable Makefile that others can depend on and adjust to fit their requirements? And how is the IDE experience of navigating around large Makefiles?

Using a general-purpose language to configure a build tool provides all these things out of the box. Provided, again, that a suitable language can be found!

Why the Scala Language?

Conciseness

A build language has to be concise. Although Java and C are popular and widely used, you rarely see people writing their build logic in Java or C (with some exceptions), and even XML is pretty rare these days (with Maven being the notable exception). Verbosity in programming and configuration languages is a spectrum, and the languages used to configure builds typically sit at the less verbose end of it:

Language                Tool
Python (or Starlark)    Bazel, Pants, Buck, SCons
Groovy                  Gradle
Ruby                    Rake
JavaScript              Grunt, Gulp, etc.

While some tools go even more concise (e.g. Bash, Make, etc.), typically this Python/Groovy/Ruby/TOML/YAML level of conciseness is where most build tools end up.

Given that, Scala fits right in: it is neither as verbose as Java, C++, or XML, nor as terse as the syntaxes of Bash or Make. Code written against Mill’s bundled libraries, like Requests-Scala or OS-Lib, would not look out of place in any Python or Ruby codebase. Scala’s balance between conciseness and verbosity is more or less what we want when configuring a build.

Static Typing

Scala is a statically typed language. That has many consequences: good performance, powerful linting frameworks (e.g. Scalafix), good toolability, and protection against many classes of "dumb bugs".

For a build tool like Mill, perhaps the two that matter most are toolability and protection against dumb bugs.

Most developers using a build tool are not build tool experts, and have no desire to become build tool experts. They will forever be cargo-culting examples they find online, copy-pasting from other parts of the codebase, or blindly fumbling their customizations. It is in this context that Mill’s static typing really shines: what such "perpetual beginners" need most is help understanding and navigating the build logic, and help checking their proposed changes for dumb mistakes. And there will be dumb mistakes, because most people are not and will never be build-tool experts or enthusiasts.

Mill’s static typing gives it a big advantage here. Its IDE support is much better than that of Maven or Gradle, largely because Scala’s static types give the IDE more to work with than more dynamic languages do. And while Scala’s static typing won’t catch every subtle bug, it does a good job of catching the dumb bugs that non-experts typically make when configuring their build system.

Almost every programming language these days is statically typed to some degree: Python has MyPy, Ruby has Sorbet, JavaScript has TypeScript, and so on. But Scala has static typing built deeply into the core of the language, so it works more smoothly than in languages that had static typing bolted on after the fact: the syntax is slicker, the IDEs work better, and the error reporting is friendlier. That’s why Scala’s static typing really shines when used in Mill builds, even for non-experts with no prior background in Scala.
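
As a rough illustration of the kind of "dumb bug" a compiler can catch, here is a minimal sketch in plain Scala. It does not use Mill's real API: `ModuleConfig`, `StaticTypingSketch`, and the dependency string are hypothetical names and values chosen only for this example.

```scala
// Hypothetical, simplified stand-in for a build module's settings.
// This is NOT Mill's API; it only illustrates compile-time checking.
final case class ModuleConfig(
  name: String,
  scalaVersion: String,
  dependencies: Seq[String]
)

object StaticTypingSketch {
  val core: ModuleConfig = ModuleConfig(
    name = "core",
    scalaVersion = "2.13.12",
    dependencies = Seq("com.lihaoyi::os-lib:0.9.1") // illustrative coordinate
  )

  // Both of these typical copy-paste mistakes are rejected by the compiler,
  // so they are left commented out:
  //   core.copy(scalaVerson = "3.3.1")      // misspelled field name
  //   core.copy(dependencies = "some-lib")  // String where Seq[String] is expected

  // A small helper that only type-checks when given a well-formed ModuleConfig.
  def describe(m: ModuleConfig): String =
    s"${m.name} (Scala ${m.scalaVersion}, ${m.dependencies.size} dependencies)"
}
```

In a dynamically typed or untyped config format, either mistake would typically only show up when the build runs, if it shows up at all.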

Functional and Object-Oriented Features

Scala is perhaps the language that sits most squarely on the fence between functional and object-oriented programming:

  • It provides functional features, from basic first-class functions and immutability all the way to more advanced techniques like typeclasses

  • It also provides object-oriented features, again from basic classes and overrides to more advanced mixin trait composition and implicit conversions

Mill makes heavy use of both the functional and object-oriented features of the Scala language. As discussed in the section on Mill Design Principles, Mill models the build graph using the functional call graph of your methods, while Mill models the module hierarchy using the object graph of your modules. And this is not just a superficial resemblance, but the semantics deeply match what you would expect in a hybrid functional/object-oriented program: Mill supports instantiating modules, subclasses, inheritance via extends, override, super, and so on.

While these are non-trivial semantics, they are semantics that should be immediately familiar to anyone who has ever passed programming 101 in college. You already know how override works or how super works in Mill, even if nobody told you! This approach of "making your build code feel just like your application code" is the key to Mill’s approachability to people from varying backgrounds, and it allows the "perpetual non-experts" who typically modify a build system to do so in a familiar and intuitive manner, even if they know nothing about the Scala language.
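
The override and super semantics described above can be sketched in plain Scala with no Mill dependency. The names `BaseModule`, `GeneratedSources`, `app`, `sources`, and `compile` are hypothetical and only loosely mirror how a build module might look; real Mill modules define tasks rather than plain methods.

```scala
// Plain Scala sketch; all names are hypothetical, not Mill's API.
trait BaseModule {
  // A "task" modeled as an ordinary method.
  def sources: Seq[String] = Seq("src")

  // `compile` depends on `sources` simply by calling it, so the
  // call graph of the methods doubles as the dependency graph.
  def compile: String = sources.mkString("compiled[", ", ", "]")
}

// A mixin that customizes `sources` with the usual override/super semantics.
trait GeneratedSources extends BaseModule {
  override def sources: Seq[String] = super.sources :+ "generated"
}

// A concrete module instantiated as an object in the module hierarchy.
object app extends GeneratedSources
```

Here `app.compile` picks up the overridden `sources` without `compile` itself being touched, which is the behavior anyone familiar with inheritance would expect.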

Why the JVM Runtime?

Dynamic Classloading

One often-under-appreciated facet of the Java Virtual Machine is its ability to do dynamic classloading. This functionality is largely irrelevant in the backend-service space that Java is often used in (where the entire codebase is present during deployment), and has largely failed as a mechanism for running untrusted, potentially malicious code in a safe sandbox (see Applets).

However, in the case of a build system, the need is different: you need to dynamically build, load, and run a wide variety of mostly-trusted code. Most build systems do not provide any hard security boundaries, and assume the code you get from your source control system is not malicious. But build systems need to be pluggable, with the same build system potentially being used to manage a wide variety of different tools and frameworks.

It is in this context that the JVM’s dynamic classloading shines, and Mill goes all in on it. Features like import $ivy, Running Dynamic JVM Code, or the Mill Meta-Build would be difficult-to-impossible in less-dynamic platforms like Go, Swift, Rust, or C++. Mill takes advantage of the Scala language’s static typing while also leaning heavily on the JVM’s dynamic nature: it uses classloader hierarchies, dynamic class loading and unloading, isolated and partially-isolated classloaders, bytecode instrumentation, the whole works. It wouldn’t be a stretch to say that a build tool like Mill could not be written on top of any platform other than the JVM it runs on today.
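
To give a flavor of what dynamic classloading looks like, here is a minimal sketch that uses only the JDK's `java.net.URLClassLoader` and reflection. It is not how Mill itself is implemented: `ClassloadingSketch`, `withIsolatedLoader`, and `invokeStatic` are hypothetical helpers, and the list of URLs is left empty so the example runs without any jar files on disk.

```scala
import java.net.{URL, URLClassLoader}

object ClassloadingSketch {
  // Creates a classloader over the given jars or class directories.
  // Passing `null` as the parent means it delegates only to the JVM's
  // bootstrap loader rather than to the application classloader.
  def withIsolatedLoader[T](urls: Seq[URL])(body: ClassLoader => T): T = {
    val loader = new URLClassLoader(urls.toArray, null)
    try body(loader)
    finally loader.close() // release any jar file handles when done
  }

  // Looks up a class by name at runtime and calls a static method
  // taking a single String argument via reflection.
  def invokeStatic(
    loader: ClassLoader,
    className: String,
    method: String,
    arg: String
  ): AnyRef = {
    val cls = loader.loadClass(className)
    cls.getMethod(method, classOf[String]).invoke(null, arg)
  }

  // With no URLs, only JDK classes are loadable; a real build tool would
  // pass the jars of whatever tool or plugin it needs to run.
  val parsed: AnyRef = withIsolatedLoader(Seq.empty) { loader =>
    invokeStatic(loader, "java.lang.Integer", "valueOf", "42")
  }
}
```

The class name and method name are ordinary strings decided at runtime, which is what lets a single build tool process load and run tools it knew nothing about when it was compiled.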

Huge JVM Tooling Ecosystem

The JVM ecosystem is huge, not just for the Java language but also for things like Kotlin, Scala, Android, and so on. Whether it is IDEs, debuggers, profilers, or heap analyzers, if a software tool exists you can bet there is an equivalent or an integration in the JVM ecosystem.

From the perspective of IDE support, Mill is able to get (almost) full support for understanding and navigating its build.mill files, basically for free: IntelliJ already has deep support for understanding JVM code, classfiles, classpaths, the Scala language itself, and so on. VSCode also works pretty well out-of-the-box with minimal modifications.

Apart from the IDE, the Java ecosystem has perhaps some of the best tooling available of any programming ecosystem, both free and proprietary, and Mill makes heavy use of it. If a build is stuck, you can use jstack to see what it is doing. If a build is slow or running out of memory, you can hook it up to JProfiler or YourKit to see where the time or memory is going.

Lastly, there is the wealth of libraries: if something has a programming language integration, there probably is one for Java, and Mill can make use of any Java library seamlessly as part of the build using import $ivy or dynamic classloading. With Mill, the ability to directly import any JVM artifact on the planet without needing a purpose-built plugin opens up an enormous number of possibilities: anything that can be done in the Java ecosystem can be done as part of your Mill build with a single import $ivy.

Built-in Publishing Infrastructure

The last major benefit Mill gets from running on the JVM is the publishing infrastructure: primarily Sonatype’s Maven Central. Mill has a rich and constantly growing set of Third-Party Plugins that are published on Maven Central for people to use, and anyone can easily write and publish their own. While Maven Central isn’t perfect, it does a solid job as a package repository: hosting an enormous catalog of artifacts for the Java community to build upon, with nice properties such as namespacing, discoverability, immutability, and code signing. Apart from Maven Central itself, there is a wealth of other hosted or self-hosted JVM package repositories available for you to choose.

Mill makes heavy use of Maven Central and the rest of the Java publishing infrastructure: Mill’s own artifacts are all published on Maven Central, Mill builds can resolve any artifact from Maven Central to use in your build, and anyone can publish their own plugins to Maven Central for free. It is also easy to configure alternate repositories, and Mill provides a wealth of tools and techniques for working with JVM dependencies.

Most build tools end up with some half-baked plugin distribution model: downloading source code off of GitHub, ad-hoc package formats or zip files, published artifacts that can be sneakily changed or even deleted after the fact, and so on. Mill instead relies on the widely-used publishing and distribution system that every JVM project already uses, providing a predictable and well-designed publishing and artifact distribution experience far beyond what most other build tools can provide.