Running Dynamic JVM Code

While import $ivy is convenient, it comes with limitations as the JVM library it imports is global to your build:

  1. The library has to be resolved and downloaded before any part of your build starts. If your codebase is large and most parts of your build don’t use that library, downloading it while working on parts that don’t need it is wasteful.

  2. The library can only have one version across the entire build. This can be an issue if different parts of your build need different versions of the library. For example, different parts of a large Groovy codebase may use different versions of the Groovy interpreter, and so the Groovy interpreter cannot be included via import $ivy because the different versions would collide.

  3. The library cannot be built as part of your main build. While it is possible to build it as part of your Meta-Build, that comes with additional complexity and limitations. In a large codebase, you often end up building modules that are shared between production deployments and local tooling, and in such cases import $ivy is not a good fit.

In scenarios where these limitations cause issues, Mill provides other ways to run arbitrary JVM code apart from import $ivy.

Subprocesses

This example demonstrates how to resolve a third-party library from Maven Central, but instead of using import $ivy (which loads the library as part of the main build) we use:

  • defaultResolver().resolveDeps to resolve the dependencies from Maven Central, converting org:name:version coordinates (and their transitive dependencies) to `PathRef`s referring to files on disk

  • Jvm.runSubprocess, which runs the given classpath files in a subprocess, starting from the specified mainClass

While OS-Lib's os.call and os.spawn APIs can be used to create any process, JVM subprocesses are common enough and have enough idiosyncrasies (e.g. classpaths) that Mill provides helper methods specifically for them.

build.mill
package build
import mill._, javalib._
import mill.util.Jvm

object foo extends JavaModule {
  def groovyClasspath: Task[Agg[PathRef]] = Task {
    defaultResolver().resolveDeps(Agg(ivy"org.codehaus.groovy:groovy:3.0.9"))
  }

  def groovyScript = Task.Source(millSourcePath / "generate.groovy")

  def groovyGeneratedResources = Task {
    Jvm.runSubprocess(
      mainClass = "groovy.ui.GroovyMain",
      classPath = groovyClasspath().map(_.path),
      mainArgs = Seq(
        groovyScript().path.toString,
        "Groovy!",
        (Task.dest / "groovy-generated.html").toString
      ),
      workingDir = Task.dest
    )
    PathRef(Task.dest)
  }

  def resources = super.resources() ++ Seq(groovyGeneratedResources())
}

We use the Groovy interpreter as our example third-party library. While often used as a groovy CLI command, Groovy is also available on Maven Central at the org.codehaus.groovy:groovy:3.0.9 coordinates. This lets us pull it into our build as a classpath comprising PathRefs to files on disk, and then run the Groovy JVM main method (in the class groovy.ui.GroovyMain). We pass it our script file generate.groovy (wired into our build using a Source Task, groovyScript), along with arguments that configure the generated file and tell the script where to write it. generate.groovy generates a file on disk that we then wire into def resources, which is read at runtime by foo.run and printed to the terminal output as shown below:

> ./mill foo.run
Contents of groovy-generated.html is <html><body><h1>Hello!</h1><p>Groovy!</p></body></html>

As mentioned above, defaultResolver().resolveDeps and Jvm.runSubprocess are an alternative to import $ivy. They give you more flexibility to resolve dependencies on demand, as part of your task graph and only when necessary, and they keep the library isolated from the build in a subprocess, which prevents classpath collisions.
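To make the classpath idiosyncrasy of JVM subprocesses more concrete, here is a minimal sketch of the bookkeeping involved in launching one by hand with only JDK APIs. It is not how Jvm.runSubprocess is implemented; the object name JvmCommandSketch and its two methods are hypothetical names chosen for illustration.

```scala
import java.io.File

object JvmCommandSketch {
  // Build the command line for a JVM subprocess: the classpath entries are
  // joined with the platform-specific separator (":" on Unix, ";" on Windows).
  def javaCommand(
      classPath: Seq[String],
      mainClass: String,
      mainArgs: Seq[String]
  ): Seq[String] = {
    // Assumption: re-use the `java` launcher of the currently running JVM.
    val javaExe = Seq(sys.props("java.home"), "bin", "java").mkString(File.separator)
    Seq(javaExe, "-cp", classPath.mkString(File.pathSeparator), mainClass) ++ mainArgs
  }

  // Start the subprocess in the given working directory, forward its output
  // to the current terminal, and return its exit code.
  def run(command: Seq[String], workingDir: File): Int =
    new ProcessBuilder(command: _*)
      .directory(workingDir)
      .inheritIO()
      .start()
      .waitFor()
}
```

A helper such as Jvm.runSubprocess spares you from assembling these pieces yourself each time you need to run a resolved classpath.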

In-process Isolated Classloaders

This example is similar to the earlier example running the Groovy interpreter in a subprocess, but:

  • We use Jvm.runClassloader to load the Groovy interpreter classpath files into an in-memory in-process classloader

  • We use loadClass/getMethod/invoke to call methods on those classes using Java reflection

build.mill
package build
import mill._, javalib._
import mill.util.Jvm

object foo extends JavaModule {
  def groovyClasspath: Task[Agg[PathRef]] = Task {
    defaultResolver().resolveDeps(Agg(ivy"org.codehaus.groovy:groovy:3.0.9"))
  }

  def groovyScript = Task.Source(millSourcePath / "generate.groovy")

  def groovyGeneratedResources = Task {
    Jvm.runClassloader(classPath = groovyClasspath().map(_.path)) { classLoader =>
      classLoader
        .loadClass("groovy.ui.GroovyMain")
        .getMethod("main", classOf[Array[String]])
        .invoke(
          null,
          Array[String](
            groovyScript().path.toString,
            "Groovy!",
            (Task.dest / "groovy-generated.html").toString
          )
        )
    }

    PathRef(Task.dest)
  }

  def resources = super.resources() ++ Seq(groovyGeneratedResources())
}

Note that unlike Jvm.runSubprocess, Jvm.runClassloader does not take a workingDir or mainArgs: it instead provides you with an in-memory classLoader that contains the classpath you gave it. From there, you can use .loadClass and .getMethod to fish out the classes and methods you want, and .invoke to call them.

> ./mill foo.run
Contents of groovy-generated.html is <html><body><h1>Hello!</h1><p>Groovy!</p></body></html>
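The loadClass / getMethod / invoke chain described above is plain Java reflection and can be tried outside of Mill. The sketch below uses only JDK classes: it creates an empty URLClassLoader (so lookups fall through to the JDK's own classes) and calls the static method java.lang.Integer.parseInt reflectively. The object name ReflectionSketch is a hypothetical name for illustration.

```scala
import java.net.{URL, URLClassLoader}

object ReflectionSketch {
  // Look up a class by name, find a static method by name and parameter
  // types, then invoke it. `null` is passed as the receiver because the
  // method is static, just like `main` in the Groovy example.
  def parseViaReflection(text: String): Int = {
    // No URLs and no parent: only classes from the JDK itself are visible.
    val loader = new URLClassLoader(Array.empty[URL], null)
    try {
      loader
        .loadClass("java.lang.Integer")
        .getMethod("parseInt", classOf[String])
        .invoke(null, text)
        .asInstanceOf[Int]
    } finally loader.close()
  }
}
```

In the build above the same three calls are used, except that the classloader is populated with the Groovy jars and the method being invoked is main.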

Jvm.runClassloader has significantly less overhead than Jvm.runSubprocess: both in terms of wall-clock time and in terms of memory footprint. However, it does have somewhat less isolation, as the code is running inside your JVM and cannot be configured to have a separate working directory, environment variables, and other process-global configs. Which one is better to use differs on a case-by-case basis.

Classloader Worker Tasks

Although running JVM bytecode via a one-off isolated classloader has less overhead than running it in a subprocess, the classloader still needs to be created each time, which adds overhead: newly-created classloaders contain code that is not yet optimized by the JVM. When performance matters, you can put the classloader in a Task.Worker to keep it around, allowing the code inside it to be optimized and stay optimized rather than being thrown away each time.

This example is similar to the earlier example running the Groovy interpreter in a subprocess, but instead of using Jvm.runSubprocess we use Jvm.spawnClassloader inside a Task.Worker to load the Groovy interpreter classpath files:

build.mill
package build
import mill._, javalib._
import mill.util.Jvm

object coursierModule extends CoursierModule

def groovyClasspath: Task[Agg[PathRef]] = Task {
  coursierModule.defaultResolver().resolveDeps(Agg(ivy"org.codehaus.groovy:groovy:3.0.9"))
}

def groovyWorker: Worker[java.net.URLClassLoader] = Task.Worker {
  Jvm.spawnClassloader(groovyClasspath().map(_.path).toSeq)
}

trait GroovyGenerateJavaModule extends JavaModule {
  def groovyScript = Task.Source(millSourcePath / "generate.groovy")

  def groovyGeneratedResources = Task {
    val oldCl = Thread.currentThread().getContextClassLoader
    Thread.currentThread().setContextClassLoader(groovyWorker())
    try {
      groovyWorker()
        .loadClass("groovy.ui.GroovyMain")
        .getMethod("main", classOf[Array[String]])
        .invoke(
          null,
          Array[String](
            groovyScript().path.toString,
            groovyGenerateArg(),
            (Task.dest / "groovy-generated.html").toString
          )
        )
    } finally Thread.currentThread().setContextClassLoader(oldCl)
    PathRef(Task.dest)
  }

  def groovyGenerateArg: T[String]
  def resources = super.resources() ++ Seq(groovyGeneratedResources())
}

object foo extends GroovyGenerateJavaModule {
  def groovyGenerateArg = "Foo Groovy!"
}
object bar extends GroovyGenerateJavaModule {
  def groovyGenerateArg = "Bar Groovy!"
}

Here we have two modules, foo and bar, each of which makes use of groovyWorker to evaluate a Groovy script to generate some resources. In this case, we invoke the main method of groovy.ui.GroovyMain, which also happens to require us to set the ContextClassLoader in order to work.

> ./mill foo.run
Contents of groovy-generated.html is <html><body><h1>Hello!</h1><p>Foo Groovy!</p></body></html>

> ./mill bar.run
Contents of groovy-generated.html is <html><body><h1>Hello!</h1><p>Bar Groovy!</p></body></html>
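The set-then-restore handling of the thread's context classloader in groovyGeneratedResources can be factored into a small helper. The following is a minimal sketch using only JDK APIs; withContextClassLoader is a hypothetical helper name, not something provided by Mill.

```scala
object ContextClassLoaderSketch {
  // Run `body` with `cl` installed as the current thread's context
  // classloader, and always restore the previous one afterwards,
  // even if `body` throws.
  def withContextClassLoader[T](cl: ClassLoader)(body: => T): T = {
    val thread = Thread.currentThread()
    val previous = thread.getContextClassLoader
    thread.setContextClassLoader(cl)
    try body
    finally thread.setContextClassLoader(previous)
  }
}
```

Restoring the previous classloader in a finally block matters because the thread goes on to run other code after the task finishes, and that code should not see the Groovy classloader.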

Because the URLClassLoader within groovyWorker is long-lived, the code within the classloader can be optimized by the JVM runtime, and has less overhead than if it were run in separate classloaders via Jvm.runClassloader. And because URLClassLoader already extends AutoCloseable, groovyWorker gets treated as an AutoCloseable Worker automatically.

As mentioned in the documentation for Worker Tasks, the classloader contained within groovyWorker above is initialized on a single thread, but it may be used concurrently in a multi-threaded environment. Practically, that means the classes and methods you invoke within the classloader must not make use of unsynchronized global mutable variables. If the JVM logic within the classloader does rely on mutable state, see Caching and Re-using JVM subprocesses and classloaders below.
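As a rough illustration of what "synchronized global mutable state" means here, the sketch below keeps a counter at object level (a stand-in for static state inside a long-lived classloader) and guards every access with synchronized, so concurrent callers cannot lose updates. All names are hypothetical and only JDK APIs are used.

```scala
import java.util.concurrent.Executors

object SharedStateSketch {
  // Object-level mutable state, shared by every thread that calls in.
  private var counter = 0

  // Both the write and the read are guarded by the same lock.
  def increment(): Unit = synchronized { counter += 1 }
  def current: Int = synchronized { counter }

  // Call `increment` from several threads at once and return the
  // cumulative total once all of them have finished.
  def runConcurrently(threads: Int, callsPerThread: Int): Int = {
    val pool = Executors.newFixedThreadPool(threads)
    try {
      val futures = (1 to threads).map { _ =>
        pool.submit(new Runnable {
          def run(): Unit = (1 to callsPerThread).foreach(_ => increment())
        })
      }
      futures.foreach(_.get()) // wait for every worker thread to complete
    } finally pool.shutdown()
    current
  }
}
```

Without the synchronized guards, concurrent increments could interleave and some updates could be lost, which is the kind of hazard a shared classloader worker exposes you to.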

Running a ScalaModule in a Subprocess

This example demonstrates using Mill ScalaModules as build tasks: rather than pulling the code we need off of Maven Central, we instead build the code within the bar module as bar/src/Bar.scala.

build.mill
package build
import mill._, scalalib._
import mill.util.Jvm

object foo extends ScalaModule {
  def scalaVersion = "2.13.8"
  def moduleDeps = Seq(bar)

  def sources = Task {
    bar.runner().run(args = super.sources())
    Seq(PathRef(Task.dest))
  }
}

object bar extends ScalaModule {
  def scalaVersion = "2.13.8"
  def ivyDeps = Agg(ivy"com.lihaoyi::os-lib:0.10.7")
}

In this example, we use Bar.scala as a source-code pre-processor on the foo module source code: we override foo.sources, passing super.sources() as arguments to bar.runner().run, and returning PathRef(Task.dest) as the new foo.sources. bar also depends on a third-party library, OS-Lib. The runner().run subprocess runs on bar's run classpath and, automatically, inside the Task.dest folder of the enclosing task.

> mill foo.run
...
Foo.value: HELLO

This example does a trivial string-replace of "hello" with "HELLO", but is enough to demonstrate how you can use Mill ScalaModules to implement your own arbitrarily complex transformations. This is useful for build logic that may not fit nicely inside a build.mill file, whether due to the sheer number of lines of code or due to dependencies that may conflict with the Mill classpath present in build.mill.

By default, bar.runner().run inherits the mainClass, forkEnv, and forkArgs from the owning module bar, and the working directory from the calling task’s Task.dest. You can also pass these parameters explicitly to run() as named arguments if you wish to override the defaults.
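The contents of Bar.scala are not shown here. As a purely hypothetical sketch of what such a pre-processor could look like, the code below walks a source folder, replaces "hello" with "HELLO" in every file, and writes the results under a destination folder with the same relative layout. It uses only JDK APIs rather than OS-Lib so that it is self-contained; PreprocessSketch and preprocess are names made up for illustration.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

object PreprocessSketch {
  // Copy every regular file under `sourceRoot` into `dest`, preserving
  // relative paths and upper-casing each occurrence of "hello".
  def preprocess(sourceRoot: Path, dest: Path): Unit = {
    val stream = Files.walk(sourceRoot)
    try {
      stream.iterator().asScala
        .filter(p => Files.isRegularFile(p))
        .foreach { p =>
          val text = new String(Files.readAllBytes(p), UTF_8)
          val out = dest.resolve(sourceRoot.relativize(p))
          Files.createDirectories(out.getParent)
          Files.write(out, text.replace("hello", "HELLO").getBytes(UTF_8))
        }
    } finally stream.close() // Files.walk holds open directory handles
  }
}
```

A main method wrapping this logic would take the source folders from its arguments and, when run through runner().run, could use the working directory as its destination.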

trait Runner {
  def run(args: os.Shellable,
          mainClass: String = null,
          forkArgs: Seq[String] = null,
          forkEnv: Map[String, String] = null,
          workingDir: os.Path = null,
          useCpPassingJar: java.lang.Boolean = null)
         (implicit ctx: Ctx): Unit
}

Running a JavaModule in a Classloader

While the previous example showed how to use the runner().run helpers to run a ScalaModule's code, you can also use JavaModules for this purpose, with a source code generator written in Java. We also run the bar code within an in-memory classloader via Jvm.runClassloader, as we saw earlier:

build.mill
package build
import mill._, scalalib._
import mill.util.Jvm

object foo extends JavaModule {
  def moduleDeps = Seq(bar)

  def sources = Task {
    Jvm.runClassloader(classPath = bar.runClasspath().map(_.path)) { classLoader =>
      classLoader
        .loadClass("bar.Bar")
        .getMethod("main", classOf[Array[String]])
        .invoke(null, Array(Task.dest.toString) ++ super.sources().map(_.path.toString))
    }
    Seq(PathRef(Task.dest))
  }
}

object bar extends JavaModule

As mentioned in the section on In-process Isolated Classloaders, this has less overhead than running bar's classpath in a subprocess, at the expense of the classloader providing weaker isolation than a subprocess. Thus we cannot rely on the working directory inside the bar.Bar code being in the right place, and instead we need to pass in the Task.dest path explicitly.

> mill foo.run
...
Foo.value: HELLO

Caching and Re-using JVM subprocesses and classloaders

Java Virtual Machines are expensive to initialize and have a large memory footprint, so if you perform a lot of classloader or subprocess operations it makes sense to re-use the JVM. You can do this using the CachedFactory helper class, which makes it easy to cache, re-use, and tear down these expensive long-lived components.

The example below is similar to Running a JavaModule in a Classloader above, but in this case the bar.Bar class relies on class-level mutable state in its implementation, and so sharing the same URLClassLoader across different foo* tasks running on different threads is not thread-safe. To resolve this, we make the barWorker contain an instance of mill.api.CachedFactory, which ensures that the classloaders are created when necessary, cached/re-used where possible, and torn down properly when no longer necessary.

build.mill
package build
import mill._, scalalib._
import mill.util.Jvm
import mill.api.CachedFactory
import java.net.{URL, URLClassLoader}

trait FooModule extends JavaModule {
  def moduleDeps = Seq(bar)

  def sources = Task {
    barWorker().withValue(()) { classLoader =>
      classLoader
        .loadClass("bar.Bar")
        .getMethod("main", classOf[Array[String]])
        .invoke(null, Array(Task.dest.toString) ++ super.sources().map(_.path.toString))
    }
    Seq(PathRef(Task.dest))
  }
}

object foo1 extends FooModule
object foo2 extends FooModule
object foo3 extends FooModule

def barWorker: Worker[BarWorker] = Task.Worker {
  new BarWorker(bar.runClasspath().map(_.path).toSeq)
}

class BarWorker(runClasspath: Seq[os.Path]) extends CachedFactory[Unit, URLClassLoader] {
  def setup(key: Unit) = {
    println("Setting up Classloader")
    Jvm.spawnClassloader(runClasspath)
  }
  def teardown(key: Unit, value: URLClassLoader) = {
    println("Tearing down Classloader")
    value.close()
  }
  def maxCacheSize = 2
}

object bar extends JavaModule

> mill '{foo1,foo2,foo3}.run' # 3 classloaders are set up, one is torn down due to maxCacheSize
Setting up Classloader
Setting up Classloader
Setting up Classloader
Tearing down Classloader
Foo.value: HELLO

> mill clean # mill clean tears down the 2 remaining classloaders
Tearing down Classloader
Tearing down Classloader
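To picture the behaviour seen in the output above (values set up on demand, re-used when idle, and torn down once more than maxCacheSize are left over), here is a deliberately simplified stand-in written with the Scala standard library only. It is not Mill's CachedFactory implementation and it omits the key parameter; SimplePool and its members are hypothetical names for illustration.

```scala
import scala.collection.mutable

object PoolSketch {
  abstract class SimplePool[V](maxCacheSize: Int) extends AutoCloseable {
    def setup(): V
    def teardown(value: V): Unit

    // Values that are currently not in use and can be handed out again.
    private val idle = mutable.ArrayBuffer.empty[V]

    // Borrow an idle value (or create one), run `block`, then either
    // return the value to the cache or tear it down if the cache is full.
    def withValue[R](block: V => R): R = {
      val cached: Option[V] = synchronized {
        if (idle.isEmpty) None else Some(idle.remove(idle.length - 1))
      }
      val value = cached.getOrElse(setup())
      try block(value)
      finally {
        val kept = synchronized {
          if (idle.length < maxCacheSize) { idle += value; true } else false
        }
        if (!kept) teardown(value)
      }
    }

    // Tear down everything still cached, e.g. when the worker is discarded.
    def close(): Unit = {
      val remaining = synchronized { val all = idle.toList; idle.clear(); all }
      remaining.foreach(teardown)
    }
  }
}
```

Because each caller gets exclusive use of a value for the duration of withValue, code that relies on class-level mutable state never runs concurrently inside the same classloader.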