How to Compile Java into Native Binaries with Mill and Graal

Li Haoyi, 1 February 2025

One recent development is the ability to compile Java programs into self-contained native binaries. This provides more convenient single-file distributions, faster startup times, and a lower memory footprint, at the cost of slower creation times and limitations around reflection and dynamic classloading. This article explores how you can get started building your Java program into a native binary, using the Mill build tool and the Graal native-image compiler, and how to think about the benefits and challenges of doing so.

An Example Java Program

To get started building a Java native binary, we will use the following example program:

foo/src/foo/Foo.java

package foo;

import net.sourceforge.argparse4j.ArgumentParsers;
import net.sourceforge.argparse4j.inf.ArgumentParser;
import net.sourceforge.argparse4j.inf.Namespace;
import org.thymeleaf.TemplateEngine;
import org.thymeleaf.context.Context;

public class Foo {
  public static String generateHtml(String text) {
    Context context = new Context();
    context.setVariable("text", text);
    return new TemplateEngine().process("<h1 th:text=\"${text}\"></h1>", context);
  }

  public static void main(String[] args) {
    ArgumentParser parser = ArgumentParsers.newFor("template")
        .build()
        .defaultHelp(true)
        .description("Inserts text into a HTML template");

    parser.addArgument("-t", "--text").required(true).help("text to insert");

    Namespace ns = null;
    try {
      ns = parser.parseArgs(args);
    } catch (Exception e) {
      System.out.println(e.getMessage());
      System.exit(1);
    }

    System.out.println(generateHtml(ns.getString("text")));
  }
}

Foo.java is a simple Java program with two dependencies - ArgParse4J and Thymeleaf - that takes input via CLI flags and generates an HTML snippet that it prints to stdout. While a bit contrived, it is intended to be representative of a simple program using common third-party dependencies.

To build Foo.java using Mill, we can use the following build configuration:

.mill-version

0.12.6

build.mill

package build
import mill._, javalib._
import mill.define.ModuleRef

object foo extends JavaModule {
  def ivyDeps = Agg(
    ivy"net.sourceforge.argparse4j:argparse4j:0.9.0",
    ivy"org.thymeleaf:thymeleaf:3.1.1.RELEASE",
    ivy"org.slf4j:slf4j-nop:2.0.7"
  )
}

Now, using a ./mill bootstrap script, we can run this program as follows:

$ ./mill foo.run --text "hello-world"
[55/55] foo.run
[55] <h1>hello-world</h1>

Or turn it into an Executable Assembly Jar that can be run outside of the build tool:

$ ./mill show foo.assembly
".../out/foo/assembly.dest/out.jar"

$ out/foo/assembly.dest/out.jar --text "hello world"
<h1>hello world</h1>

Building a Native Image Binary

To use Mill to build a Graal Native Image out of Foo.java, we need to tweak the config above:

build.mill

package build
import mill._, javalib._
import mill.define.ModuleRef

object foo extends JavaModule with NativeImageModule {
  def ivyDeps = Agg(
    ivy"net.sourceforge.argparse4j:argparse4j:0.9.0",
    ivy"org.thymeleaf:thymeleaf:3.1.1.RELEASE",
    ivy"org.slf4j:slf4j-nop:2.0.7"
  )

  def zincWorker = ModuleRef(ZincWorkerGraalvm)

  def nativeImageOptions = Seq(
    "--no-fallback",
    "-H:IncludeResourceBundles=net.sourceforge.argparse4j.internal.ArgumentParserImpl"
  )
}

object ZincWorkerGraalvm extends ZincWorkerModule {
  def jvmId = "graalvm-community:23.0.1"
}

Notable changes:

  • foo now needs to inherit from NativeImageModule

  • We need to override zincWorker to point at our own custom ZincWorkerGraalvm, specifying the version of Graal we want to use to build our native image. This uses Mill’s support for configuring JVM versions to download the necessary Graal distribution on demand

  • We need to pass in some nativeImageOptions: in this case --no-fallback, which makes the build fail rather than fall back to an image that still requires a JVM, and -H:IncludeResourceBundles, which is necessary to support the ArgParse4J library that the example project depends on

Now, we can build a native image using foo.nativeImage:

$ ./mill show foo.nativeImage
".../out/foo/nativeImage.dest/native-executable"

$ out/foo/nativeImage.dest/native-executable --text "hello world"
<h1>hello world</h1>

You can also access the native-image tool from the Mill download folder, if you want to invoke it directly or view its --help documentation:

$ ~/Github/mill/mill show foo.nativeImageTool
".../graalvm-community-openjdk-17.0.9+9.1/Contents/Home/bin/native-image"

$ .../graalvm-community-openjdk-17.0.9+9.1/Contents/Home/bin/native-image --help
GraalVM Native Image (https://www.graalvm.org/native-image/)

This tool can ahead-of-time compile Java code to native executables.

Usage: native-image [options] class [imagename] [options]
           (to build an image for a class)
   or  native-image [options] -jar jarfile [imagename] [options]
           (to build an image for a jar file)
   or  native-image [options] -m <module>[/<mainclass>] [options]
       native-image [options] --module <module>[/<mainclass>] [options]
           (to build an image for a module)

where options include:

    @argument files       one or more argument files containing options
    -cp <class search path of directories and zip/jar files>\
...

Native Image vs. Executable Assembly

At a glance, the difference between the traditional executable assembly and the Graal native image we built above can be summarized below:

                            Executable Assembly   Native Image
Creation Time               0.8s                  24.7s
Executable Size             2.5mb                 17mb
Startup Time                235ms                 62ms
Steady State Performance    190 iter/s            180 iter/s
Memory Footprint            373mb                 20mb
JVM required to run         Yes                   No
OS/CPU-specific executable  No                    Yes

The remainder of this section dives into how each number was measured, and discusses what these differences really mean.

Creation Time

JVM executable assemblies are generally very cheap to create, whereas Graal native image executables can take a long time to build. For this tiny example project, we can see below that the executable assembly takes ~1s to create, while the native image takes ~25s:

Executable Assembly

$ time ./mill show foo.assembly
[1-41] [info] compiling 1 Java source...
".../out/foo/assembly.dest/out.jar"
./mill show foo.assembly  0.12s user 0.06s system 21% cpu 0.818 total

Native Image

$ time ./mill show foo.nativeImage
[1-50] GraalVM Native Image: Generating 'native-executable' (executable)...
...
[1-50] [2/8] Performing analysis...  [****]                                                                     (7.9s @ 0.77GB)
...
[1-50] Finished generating 'native-executable' in 26.0s.
".../out/foo/nativeImage.dest/native-executable"
./mill show foo.nativeImage  0.70s user 1.11s system 7% cpu 24.762 total

Executable Size

Graal native image binaries are typically larger than the equivalent executable assembly:

$ ls -lh out/foo/assembly.dest/out.jar
-rwxr-xr-x  1 lihaoyi  staff   2.5M Jan 16 15:33 out/foo/assembly.dest/out.jar
$ ls -lh out/foo/nativeImage.dest/native-executable
-rwxr-xr-x  1 lihaoyi  staff    17M Jan 16 15:34 out/foo/nativeImage.dest/native-executable

Here, the assembly out.jar is ~2.5mb, while native-executable is ~17mb, even for a tiny hello-world application using some trivial libraries. In general, native image binaries can be pretty large, which can affect download sizes and deployment times as you distribute these binaries to servers or users.

Startup Time

Executable assembly jars typically take longer to start than Graal native executables. For this small example project, we can see the executable assembly takes ~235ms to run, while the native image takes ~60ms:

Executable Assembly

$ time ./out/foo/assembly.dest/out.jar --text hello-world
<h1>hello-world</h1>
./out/foo/assembly.dest/out.jar --text hello-world
0.35s user 0.04s system 165% cpu 0.235 total

Native Image

$ time ./out/foo/nativeImage.dest/native-executable --text hello-world
<h1>hello-world</h1>
./out/foo/nativeImage.dest/native-executable --text hello-world
0.04s user 0.01s system 87% cpu 0.062 total

The ~175ms speedup shown is for a tiny example program, and can be expected to grow for larger Java applications, which can normally take multiple seconds to start up. Whether this speedup is significant depends on the use case: for long-lived webservers, saving a few seconds on startup may not matter, but for short-lived command-line tools the startup overhead may dominate the actual work the program is doing, and saving hundreds to thousands of milliseconds with a native binary can be worthwhile. The Mill build tool itself is distributed as native binaries: this saves Mill ~100-200ms every time it is run from the command line, which goes a long way towards making it feel snappy and responsive to users.
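
As an aside, if you want to measure this startup overhead from inside a program, one option on the JVM is the RuntimeMXBean, which reports how long the runtime has been up by the time main() is entered. Below is a minimal, hypothetical sketch (the StartupTimer class is not part of the example project, and the value reported inside a Graal native image may be computed differently):

package foo;

import java.lang.management.ManagementFactory;

public class StartupTimer {
  public static void main(String[] args) {
    // getUptime() returns milliseconds since the runtime started, so sampling
    // it at the top of main() roughly measures the startup overhead incurred
    // before any application code ran.
    long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();
    System.out.println("Startup overhead: " + uptimeMs + "ms");
  }
}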

Steady-State Performance

To do a rough benchmark of the steady-state performance of the executable assembly and native executable, we can adjust our Java program to run the same logic in a loop, printing out every ~1s how many iterations of the loop occurred in the last second:

   public static void main(String[] args) {
+    long count = 0;
+    long prevTime = System.currentTimeMillis();
+    String global = null;
+    while(count >= 0){
       ArgumentParser parser = ArgumentParsers.newFor("template")
           .build()
           .defaultHelp(true)
@@ -28,7 +32,15 @@ public class Foo {
         System.out.println(e.getMessage());
         System.exit(1);
       }
+      global = generateHtml(ns.getString("text"));
+      if (System.currentTimeMillis() - prevTime > 1000){
+        prevTime = System.currentTimeMillis();
+        System.out.println(count);
+        count = 0;
+      }
+      count++;
+    }

-    System.out.println(generateHtml(ns.getString("text")));
+    System.out.println(global);
   }
 }

Now, if we re-build our assembly and native image and run them, we can see the number of iterations per second each is able to achieve below:

Executable Assembly

$ ./out/foo/assembly.dest/out.jar --text hello-world
135
170
178
188
191
192
192
189
190
188
195
185
182

Native Image

$ time ./out/foo/nativeImage.dest/native-executable --text hello-world
171
163
180
173
182
182
181
184
181
181
182
183
181

As you can see, the executable assembly and native image have comparable performance. The executable assembly starts off slower (135 vs 171 iterations in the first second) due to JVM warmup, but eventually reaches a higher steady state than the native image (~190 vs ~180 iterations per second).

While again this is for a toy program, for larger applications the same pattern applies: Graal native binaries avoid the slow startup that JVM applications often exhibit, but in exchange may not quite reach the same peak steady-state performance that a long-lived JVM application would typically achieve.

Memory Usage

While our programs are looping, we can also see how much memory they take via top:

Executable Assembly

$ jps
58547 MillMain
86276 MillServerMain
24895 Jps
9263 Foo
1071 Main

$ top | grep 9263
9263   java             0.0  00:20.41 32/1   1   134    373M  0B    0B    9263  42892 running  *0[1]       0.00000 0.00000    501 93089     9569   5005      2470      387381     104652     75938      9       0        0.0   0      0      lihaoyi            N/A    N/A   N/A   N/A   N/A   N/A

Native Image

$ ps aux | grep native-executable
lihaoyi          43880  46.1  0.1 408681792  30176 s000  S+    3:40PM   0:05.84 ./out/foo/nativeImage.dest/native-executable --text hello-world
lihaoyi          86276   0.0  2.1 420349904 720416 s000  S     3:14PM   1:00.88 /Library/Java/JavaVirtualMachines/amazon-corretto-17.jdk/Contents/Home/bin/java -cp /Users/lihaoyi/.cache/mill/download/0.12.5-68-e4bf78 mill.runner.MillServerMain /Users/lihaoyi/Github/mill/blog/modules/ROOT/attachments/7-graal-native-executables/out/mill-server/aa508f0984fd2811f6c6d8fae1362f1774e4f5f7-1
lihaoyi          48496   0.0  0.0 408626896   1376 s002  S+    3:40PM   0:00.00 grep native-executable

$ top | grep 43880
43880  native-executabl 0.0  00:10.19 3/1    0   26     20M   0B    0B    43880 42892 running  *0[1]       0.00000 0.00000    501 695907    44380  8100      4045      153233     8177       24637      313     0        0.0   0      0      lihaoyi            N/A    N/A   N/A   N/A   N/A   N/A

The 373M and 20M entries are the respective memory footprints of the executable assembly and the native image binary. For this small program, the native image uses almost 20x less memory than the JVM executable assembly! That is a very significant reduction in resource footprint.

Portability and Hermeticity

Executable assembly jars require a globally-installed JVM in order to run. In a way they are not hermetic: the globally-installed JVM can differ from machine to machine, which can cause the assembly to behave differently at runtime. On the other hand, this means the executable assembly is typically portable across operating systems and CPU architectures: as long as a JVM is installed, the executable assembly can be run.

Native images are the opposite: they do not depend on a globally installed JVM, and thus can be run even in environments where pre-installing a JVM is inconvenient. On the other hand, the fact that the native executable is OS/CPU-specific means that you need to specifically generate separate native executables for each platform you want to support.

The Mill build tool takes advantage of this hermeticity for easier installation: its Mill native executable can be run on systems without a JVM installed at all. Mill still needs a JVM later on, e.g. to compile and run user code, so the native launcher downloads one on-demand automatically from the Coursier JVM Index. But bootstrapping with a native launcher means there’s one less thing for people to do during setup and installation, and one less thing to go wrong and cause the user to get stuck.

Native Image Limitations

Now that we’ve seen many of the benefits of Graal native image binaries over traditional executable assemblies, it’s worth discussing the limitations:

No Cross Building

Graal can only create native binaries targeting the system on which it is running. That means that if you want to create binaries for {Linux, Windows, Mac} x {Intel, ARM}, you need 6 different machines to build the 6 binaries, and some way to aggregate them for publication or deployment. This is not a blocker, but can definitely be inconvenient compared to some other toolchains that can build native binaries for all targets on a single machine; see Cross-Publishing Graal Native Binaries on Github Actions below for one workaround.

No Windows-ARM support

Graal does not support Windows-Arm64 yet (https://github.com/oracle/graal/issues/9215). While that traditionally would not have been a problem, Windows-ARM is growing more popular over time, with new laptops like my flagship Surface Laptop 7 running on an ARM processor. You simply cannot build Java code into Graal native image binaries that work on Windows-Arm64 at this time, and thus have to fall back to traditional executable assemblies.

Creation Performance

Graal native image binaries are much slower to create than executable assemblies, as we saw above: the example program took ~1s to compile into an executable assembly, but ~25s to compile into a native image! That means you probably do not want to do day-to-day iterative development on native images: instead you may want to iterate using traditional JVM assemblies, and only build native images for integration testing and deployment.

Reflection and Dynamic Classloading

Graal native image binaries do not support Java reflection and dynamic classloading unless specifically configured. Almost every Java program, library, and framework uses some degree of reflection and dynamic classloading, so you do have to spend the effort to configure Graal appropriately. We saw a glimpse of that above in the -H:IncludeResourceBundles flag we needed to pass to make ArgParse4J work in our toy example, and similar configuration will be needed dozens more times for any real-world application making heavy use of Java frameworks and libraries.
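
To make the problem concrete, here is a hypothetical snippet of the kind of code that breaks: it runs fine on the JVM, but inside a --no-fallback native image the reflective lookup fails at runtime unless the target class is registered for reflection, e.g. via a reflect-config.json metadata file, which can be written by hand or generated by running the program on a normal JVM with -agentlib:native-image-agent=config-output-dir=<dir>:

package foo;

import java.lang.reflect.Constructor;

public class ReflectionDemo {
  public static void main(String[] args) throws Exception {
    // Class.forName with a name only known at runtime: Graal's closed-world
    // analysis cannot prove this class is reachable, so it may be absent
    // from the native image unless explicitly registered for reflection.
    Class<?> cls = Class.forName(args[0]);
    Constructor<?> ctor = cls.getDeclaredConstructor();
    System.out.println(ctor.newInstance());
  }
}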

A full discussion of how to handle reflection and dynamic classloading when building Graal native images is beyond the scope of this article, but depending on which framework you use, there may be existing support:

  • Frameworks like Micronaut or Quarkus are designed from scratch to minimize reflection to allow native image generation

  • Older frameworks like Spring Boot have also introduced support, making it easy to configure Graal to handle the pattern of reflection and classloading that the framework performs

Cross-Publishing Graal Native Binaries on Github Actions

Although Graal doesn’t let you cross-build from a single platform, you can still easily publish artifacts for all supported versions by taking advantage of CI systems like Github Actions that provide worker machines on different platforms.

For Mill, which is distributed as native binaries, we maintain a matrix of Github actions jobs running on Mac, Windows, and Linux to create these binaries and upload them to Maven Central for users.

on:
  push:
    tags:
      - '**'
  workflow_dispatch:

jobs:
  publish-sonatype:
    # when in master repo, publish all tags and manual runs on main
    if: github.repository == 'com-lihaoyi/mill'
    runs-on: ${{ matrix.os }}

    # only run one publish job for the same sha at the same time
    # e.g. when a main-branch push is also tagged
    concurrency: publish-sonatype-${{ matrix.os }}-${{ github.sha }}
    strategy:
      matrix:
        include:
        - os: ubuntu-latest
          coursierarchive: ""
          publishartifacts: __.publishArtifacts

        - os: ubuntu-24.04-arm
          coursierarchive: ""
          publishartifacts: dist.native.publishArtifacts

        - os: macos-13
          coursierarchive: ""
          publishartifacts: dist.native.publishArtifacts

        - os: macos-latest
          coursierarchive: ""
          publishartifacts: dist.native.publishArtifacts

        - os: windows-latest
          coursierarchive: C:/coursier-arc
          publishartifacts: dist.native.publishArtifacts

        # No windows-arm support because Graal native image doesn't support it
        # https://github.com/oracle/graal/issues/9215
    env:
      MILL_STABLE_VERSION: 1
      MILL_SONATYPE_USERNAME: ${{ secrets.SONATYPE_USERNAME }}
      MILL_SONATYPE_PASSWORD: ${{ secrets.SONATYPE_PASSWORD }}
      MILL_PGP_SECRET_BASE64: ${{ secrets.SONATYPE_PGP_PRIVATE_KEY }}
      MILL_PGP_PASSPHRASE: ${{ secrets.SONATYPE_PGP_PRIVATE_KEY_PASSWORD }}
      LANG: "en_US.UTF-8"
      LC_MESSAGES: "en_US.UTF-8"
      LC_ALL: "en_US.UTF-8"
      COURSIER_ARCHIVE_CACHE: ${{ matrix.coursierarchive }}
      REPO_ACCESS_TOKEN: ${{ secrets.REPO_ACCESS_TOKEN }}
    steps:
      - uses: actions/setup-java@v4
        with:
          distribution: 'temurin'
          java-version: '11'

      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }

      - run: ./mill -i mill.scalalib.PublishModule/ --publishArtifacts ${{ matrix.publishartifacts }}

Note that the default ubuntu-latest job publishes __.publishArtifacts (all artifacts), while the other platform-specific jobs publish only dist.native.publishArtifacts (the native artifacts in the dist.native folder). This ensures that the portable non-native jars get published only once across all platforms, while the CPU-specific native binaries get published once per platform.

Each job overrides artifactName based on os.name and os.arch such that it publishes a different artifact to Maven Central, and we override def jar to replace the default .jar artifact with our native image:

def artifactOsSuffix = Task {
  val osName = System.getProperty("os.name").toLowerCase
  if (osName.contains("mac")) "mac"
  else if (osName.contains("windows")) "windows"
  else "linux"
}

def artifactCpuSuffix = Task {
  System.getProperty("os.arch") match {
    case "x86_64" => "amd64"
    case s => s
  }
}

override def artifactName = s"${super.artifactName()}-${artifactOsSuffix()}-${artifactCpuSuffix()}"

override def jar = nativeImage()

This results in the following artifacts being published:

# JVM platform-agnostic artifact
com.lihaoyi:mill-dist:0.12.6
# native platform-specific artifacts
com.lihaoyi:mill-dist-native-mac-amd64:0.12.6
com.lihaoyi:mill-dist-native-mac-aarch64:0.12.6
com.lihaoyi:mill-dist-native-linux-amd64:0.12.6
com.lihaoyi:mill-dist-native-linux-aarch64:0.12.6
com.lihaoyi:mill-dist-native-windows-amd64:0.12.6

These artifacts can be browsed on Maven Central, and downloaded via:

curl https://repo1.maven.org/maven2/com/lihaoyi/mill-dist-native-mac-aarch64/0.12.6/mill-dist-native-mac-aarch64-0.12.6.jar -o mill-dist-native
chmod +x mill-dist-native
./mill-dist-native version
0.12.6

Any application using these binaries can similarly inspect the OS and CPU it is running on and resolve the appropriate executable to use.
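
For example, here is a minimal, hypothetical Java sketch of that resolution logic; the OS/CPU normalization mirrors the artifactOsSuffix/artifactCpuSuffix definitions above, and the URL follows the Maven Central path used in the curl command:

package foo;

public class ResolveNativeArtifact {
  public static void main(String[] args) {
    // Normalize os.name/os.arch the same way the build does when publishing.
    String osName = System.getProperty("os.name").toLowerCase();
    String os = osName.contains("mac") ? "mac"
        : osName.contains("windows") ? "windows"
        : "linux";
    String rawArch = System.getProperty("os.arch");
    String cpu = rawArch.equals("x86_64") ? "amd64" : rawArch;

    // Construct the Maven Central download URL for this platform's binary.
    String artifact = "mill-dist-native-" + os + "-" + cpu;
    String version = "0.12.6";
    System.out.println("https://repo1.maven.org/maven2/com/lihaoyi/" + artifact
        + "/" + version + "/" + artifact + "-" + version + ".jar");
  }
}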

Conclusion

Graal native images are a pretty cool technology that gives Java developers a new superpower: the ability to package a Java program into a native binary that can be run without a JVM installed, starts much more quickly, and uses much less memory. There are caveats around creation times, binary sizes, and runtime reflection, so they may not be suitable for all scenarios. But they are a useful tool in the toolbox, helping bridge the gap between the "Java" world and the world of native command-line tools on Linux, Mac, or Windows.