Running Dynamic JVM Code
While import $ivy is convenient, it comes with limitations as the JVM library it imports is global to your build:
- The library has to be resolved and downloaded before any part of your build starts. If your codebase is large and most parts of your build don't use that library, downloading it while working on parts that don't need it is wasteful.
- The library can only have one version across the entire build. This can be an issue if different parts of your build need different versions of the library. For example, different parts of a large Groovy codebase may use different versions of the Groovy interpreter, so the interpreter cannot be included via import $ivy because the different versions would collide.
- The library cannot be built as part of your main build. While it is possible to build it as part of your Meta-Build, that comes with additional complexity and limitations. In a large codebase, you often end up building modules that are shared between production deployments and local tooling: in such cases import $ivy is not a good fit.
In scenarios where these limitations cause issues, Mill provides other ways to run arbitrary JVM code apart from import $ivy.
Subprocesses
This example demonstrates how to resolve a third-party library from Maven Central, but instead of using import $ivy (which loads the library as part of the main build) we use:
- defaultResolver().resolveDeps to resolve the dependencies from Maven Central, converting org:name:version coordinates (and their transitive dependencies) to `PathRef`s referring to files on disk
- Jvm.runSubprocess, which runs the given classpath files in a subprocess, starting from the specified mainClass
While OS-Lib's os.call and os.spawn APIs can be used to create any process, JVM subprocesses are common enough and have enough idiosyncrasies (e.g. classpaths) that Mill provides helper methods specifically for them.
package build
import mill._, javalib._
import mill.util.Jvm
object foo extends JavaModule {
  def groovyClasspath: Task[Agg[PathRef]] = Task {
    defaultResolver().resolveDeps(Agg(ivy"org.codehaus.groovy:groovy:3.0.9"))
  }
  def groovyScript = Task.Source(millSourcePath / "generate.groovy")
  def groovyGeneratedResources = Task {
    Jvm.runSubprocess(
      mainClass = "groovy.ui.GroovyMain",
      classPath = groovyClasspath().map(_.path),
      mainArgs = Seq(
        groovyScript().path.toString,
        "Groovy!",
        (Task.dest / "groovy-generated.html").toString
      ),
      workingDir = Task.dest
    )
    PathRef(Task.dest)
  }
  def resources = super.resources() ++ Seq(groovyGeneratedResources())
}
For this example, we use the Groovy interpreter as our third-party library. While often used as a groovy CLI command, Groovy is also available on Maven Central at the org.codehaus.groovy:groovy:3.0.9 coordinates. This lets us pull it into our build as a classpath comprising PathRefs to files on disk, and then run the Groovy JVM main method (in the class groovy.ui.GroovyMain), passing it our script file generate.groovy (wired into our build using a Source Task groovyScript) and arguments used to configure the generated file and tell the script where to write it. generate.groovy generates a file on disk that we then wire into def resources, which is read at runtime by foo.run and printed to the terminal output as shown below:
> ./mill foo.run
Contents of groovy-generated.html is <html><body><h1>Hello!</h1><p>Groovy!</p></body></html>
As mentioned above, defaultResolver().resolveDeps and Jvm.runSubprocess are an alternative to import $ivy, providing you more flexibility to resolve dependencies on-demand as part of your task graph only when necessary, and keeping the library isolated from the build in a subprocess, which prevents classpath collisions.
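To make the classpath idiosyncrasy mentioned above concrete, here is a minimal sketch of what launching a JVM subprocess by hand involves, using only the JDK's ProcessBuilder. It is not how Jvm.runSubprocess is implemented; the names SubprocessSketch, javaCommand and run are hypothetical and exist only for illustration.

```scala
import java.io.File
import scala.jdk.CollectionConverters._

// Hypothetical helper, NOT Mill's implementation: shows the moving parts of a
// JVM subprocess launch (java binary, joined classpath, main class, arguments).
object SubprocessSketch {
  // Builds `<java.home>/bin/java -cp <joined classpath> <mainClass> <mainArgs...>`.
  def javaCommand(
      classPath: Seq[String],
      mainClass: String,
      mainArgs: Seq[String]
  ): Seq[String] = {
    val javaBin =
      new File(new File(System.getProperty("java.home"), "bin"), "java").getPath
    // Classpath entries are joined with the platform separator (":" or ";").
    val joined = classPath.mkString(File.pathSeparator)
    Seq(javaBin, "-cp", joined, mainClass) ++ mainArgs
  }

  // Runs the command in `workingDir`, forwarding output, and returns the exit code.
  def run(command: Seq[String], workingDir: File): Int = {
    val builder = new ProcessBuilder(command.asJava)
    builder.directory(workingDir)
    builder.inheritIO()
    builder.start().waitFor()
  }
}
```

A helper such as Jvm.runSubprocess spares you from assembling this command line yourself in every task.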
In-process Isolated Classloaders
This example is similar to the earlier example running the Groovy interpreter in a subprocess, but:
- We use Jvm.runClassloader to load the Groovy interpreter classpath files into an in-memory in-process classloader
- We use loadClass/getMethod/invoke to call methods on those classes using Java reflection
package build
import mill._, javalib._
import mill.util.Jvm
object foo extends JavaModule {
  def groovyClasspath: Task[Agg[PathRef]] = Task {
    defaultResolver().resolveDeps(Agg(ivy"org.codehaus.groovy:groovy:3.0.9"))
  }
  def groovyScript = Task.Source(millSourcePath / "generate.groovy")
  def groovyGeneratedResources = Task {
    Jvm.runClassloader(classPath = groovyClasspath().map(_.path)) { classLoader =>
      classLoader
        .loadClass("groovy.ui.GroovyMain")
        .getMethod("main", classOf[Array[String]])
        .invoke(
          null,
          Array[String](
            groovyScript().path.toString,
            "Groovy!",
            (Task.dest / "groovy-generated.html").toString
          )
        )
    }
    PathRef(Task.dest)
  }
  def resources = super.resources() ++ Seq(groovyGeneratedResources())
}
Note that unlike Jvm.runSubprocess, Jvm.runClassloader does not take a workingDir or mainArgs: it instead provides you an in-memory classLoader that contains the classpath you gave it. From there, you can use .loadClass and .getMethod to fish out the classes and methods you want, and .invoke to call them.
> ./mill foo.run
Contents of groovy-generated.html is <html><body><h1>Hello!</h1><p>Groovy!</p></body></html>
Jvm.runClassloader has significantly less overhead than Jvm.runSubprocess, both in terms of wall-clock time and in terms of memory footprint. However, it does have somewhat less isolation, as the code is running inside your JVM and cannot be configured to have a separate working directory, environment variables, and other process-global configs. Which one is better to use differs on a case-by-case basis.
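The loadClass/getMethod/invoke chain used in this section is plain Java reflection and can be tried outside of Mill. The sketch below assumes nothing beyond the JDK: it creates an empty URLClassLoader (standing in for the one Mill hands you) and calls the static method java.lang.Integer.parseInt through it. The names ReflectionSketch and parseViaReflection are hypothetical.

```scala
import java.net.{URL, URLClassLoader}

// Illustrative only: the same reflection chain as the build example, applied to a
// JDK class so that it runs without any extra classpath entries.
object ReflectionSketch {
  def parseViaReflection(text: String): Int = {
    // An empty classloader that delegates to its parent; in the build example the
    // classloader would instead contain the resolved Groovy jars.
    val loader = new URLClassLoader(Array.empty[URL], getClass.getClassLoader)
    try {
      val result = loader
        .loadClass("java.lang.Integer") // look the class up by name
        .getMethod("parseInt", classOf[String]) // pick the method by name and parameter types
        .invoke(null, text) // `null` receiver because the method is static
      result.asInstanceOf[Int]
    } finally loader.close()
  }
}
```

Because the call goes through reflection, the compiler cannot check the class name, method name, or argument types; mistakes surface only at runtime as exceptions such as ClassNotFoundException or NoSuchMethodException.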
Classloader Worker Tasks
Although running JVM bytecode via a one-off isolated classloader has less overhead than running it in a subprocess, the fact that the classloader needs to be created each time adds overhead: newly-created classloaders contain code that is not yet optimized by the JVM. When performance matters, you can put the classloader in a Task.Worker to keep it around, allowing the code internally to be optimized and stay optimized without being thrown away each time.
This example is similar to the earlier example running the Groovy interpreter in a subprocess, but instead of using Jvm.runSubprocess we use Jvm.spawnClassloader to load the Groovy interpreter classpath files:
package build
import mill._, javalib._
import mill.util.Jvm
object coursierModule extends CoursierModule
def groovyClasspath: Task[Agg[PathRef]] = Task {
  coursierModule.defaultResolver().resolveDeps(Agg(ivy"org.codehaus.groovy:groovy:3.0.9"))
}
def groovyWorker: Worker[java.net.URLClassLoader] = Task.Worker {
  Jvm.spawnClassloader(groovyClasspath().map(_.path).toSeq)
}
trait GroovyGenerateJavaModule extends JavaModule {
  def groovyScript = Task.Source(millSourcePath / "generate.groovy")
  def groovyGeneratedResources = Task {
    mill.api.ClassLoader.withContextClassLoader(groovyWorker()) {
      groovyWorker()
        .loadClass("groovy.ui.GroovyMain")
        .getMethod("main", classOf[Array[String]])
        .invoke(
          null,
          Array[String](
            groovyScript().path.toString,
            groovyGenerateArg(),
            (Task.dest / "groovy-generated.html").toString
          )
        )
    }
    PathRef(Task.dest)
  }
  def groovyGenerateArg: T[String]
  def resources = super.resources() ++ Seq(groovyGeneratedResources())
}
object foo extends GroovyGenerateJavaModule {
  def groovyGenerateArg = "Foo Groovy!"
}
object bar extends GroovyGenerateJavaModule {
  def groovyGenerateArg = "Bar Groovy!"
}
Here we have two modules, foo and bar, each of which makes use of groovyWorker to evaluate a Groovy script to generate some resources. In this case, we invoke the main method of groovy.ui.GroovyMain, which also happens to require us to set the ContextClassLoader to work.
> ./mill foo.run
Contents of groovy-generated.html is <html><body><h1>Hello!</h1><p>Foo Groovy!</p></body></html>
> ./mill bar.run
Contents of groovy-generated.html is <html><body><h1>Hello!</h1><p>Bar Groovy!</p></body></html>
Because the URLClassLoader within groovyWorker is long-lived, the code within the classloader can be optimized by the JVM runtime, and would have less overhead than if run in separate classloaders via Jvm.runClassloader. And because URLClassLoader already extends AutoCloseable, groovyWorker gets treated as an Autocloseable Worker automatically.
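The example above wraps the reflective call in mill.api.ClassLoader.withContextClassLoader. The general pattern behind such a helper can be sketched with the JDK alone: swap the current thread's context classloader, run the block, and restore the previous value even if the block throws. This is an illustration of the pattern under that assumption, not Mill's actual implementation, and ContextClassLoaderSketch is a hypothetical name.

```scala
// Hypothetical stand-in showing the set-then-restore pattern for the thread's
// context classloader; libraries such as Groovy may consult this classloader.
object ContextClassLoaderSketch {
  def withContextClassLoader[T](loader: ClassLoader)(block: => T): T = {
    val thread = Thread.currentThread()
    val previous = thread.getContextClassLoader
    thread.setContextClassLoader(loader)
    try block
    finally thread.setContextClassLoader(previous) // always restore, even on failure
  }
}
```

Restoring the previous classloader in a finally block matters because threads are commonly re-used, and a leaked context classloader would affect unrelated code that later runs on the same thread.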
As mentioned in the documentation for Worker Tasks, the classloader contained within groovyWorker above is initialized on a single thread, but it may be used concurrently in a multi-threaded environment. Practically, that means that the classes and methods you are invoking within the classloader must not make use of un-synchronized global mutable variables. If the JVM logic within the classloader does rely on mutable state, see Caching and Re-using JVM subprocesses and classloaders below.
Running a ScalaModule in a Subprocess
This example demonstrates using Mill ScalaModules as build tasks: rather than pulling the code we need off of Maven Central, we instead build the code within the bar module as bar/src/Bar.scala.
package build
import mill._, scalalib._
import mill.util.Jvm
object foo extends ScalaModule {
  def scalaVersion = "2.13.8"
  def moduleDeps = Seq(bar)
  def sources = Task {
    bar.runner().run(args = super.sources())
    Seq(PathRef(Task.dest))
  }
}
object bar extends ScalaModule {
  def scalaVersion = "2.13.8"
  def ivyDeps = Agg(ivy"com.lihaoyi::os-lib:0.10.7")
}
In this example, we use Bar.scala as a source-code pre-processor on the foo module source code: we override foo.sources, passing super.sources() to bar.runner().run (which runs the code on bar.runClasspath) and returning a PathRef(Task.dest) as the new foo.sources. bar also depends on a third-party library, OS-Lib. The runner().run subprocess runs inside the Task.dest folder of the enclosing task automatically.
> mill foo.run
...
Foo.value: HELLO
This example does a trivial string-replace of "hello" with "HELLO", but is enough to demonstrate how you can use Mill ScalaModules to implement your own arbitrarily complex transformations. This is useful for build logic that may not fit nicely inside a build.mill file, whether due to the sheer lines of code or due to dependencies that may conflict with the Mill classpath present in build.mill.
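The source of Bar.scala is not shown on this page. As a rough idea of what such a pre-processor could look like, here is a hypothetical sketch that uses only java.nio (rather than OS-Lib) so that it is self-contained. It assumes the program receives the source folders as arguments and writes into its working directory; the names PreprocessorSketch and rewriteSources are illustrative and not taken from the real example.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path, Paths}
import scala.jdk.CollectionConverters._

// Hypothetical pre-processor: copies every file under the given source folders
// into `dest`, replacing "hello" with "HELLO" along the way.
object PreprocessorSketch {
  def rewriteSources(sourceDirs: Seq[Path], dest: Path): Seq[Path] =
    for {
      sourceDir <- sourceDirs
      if Files.isDirectory(sourceDir) // skip source folders that do not exist
      file <- listFiles(sourceDir)
    } yield {
      val text = new String(Files.readAllBytes(file), UTF_8)
      // Preserve the file's path relative to its source folder.
      val target = dest.resolve(sourceDir.relativize(file).toString)
      Files.createDirectories(target.getParent)
      Files.write(target, text.replace("hello", "HELLO").getBytes(UTF_8))
      target
    }

  // Recursively lists regular files, closing the directory stream afterwards.
  private def listFiles(dir: Path): Seq[Path] = {
    val stream = Files.walk(dir)
    try stream.iterator().asScala.filter(p => Files.isRegularFile(p)).toList
    finally stream.close()
  }

  // Assumption: arguments are source folders, and the working directory is the
  // destination (the page states that the subprocess runs inside Task.dest).
  def main(args: Array[String]): Unit =
    rewriteSources(args.toSeq.map(s => Paths.get(s)), Paths.get("").toAbsolutePath)
}
```

Keeping the transformation in a function that takes the destination explicitly, with main as a thin wrapper, also makes the same logic usable from an in-process classloader, where the working directory cannot be relied upon.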
bar.runner().run by default inherits the mainClass, forkEnv, and forkArgs from the owning module bar, and the working directory from the calling task's Task.dest. You can also pass in these parameters explicitly to run() as named arguments if you wish to override the defaults.
trait Runner {
  def run(
      args: os.Shellable,
      mainClass: String = null,
      forkArgs: Seq[String] = null,
      forkEnv: Map[String, String] = null,
      workingDir: os.Path = null,
      useCpPassingJar: java.lang.Boolean = null
  )(implicit ctx: Ctx): Unit
}
Running a JavaModule in a Classloader
While the previous example showed how to use the runner().run helpers to run a ScalaModule's code, you can also use JavaModules for this purpose, with a source code generator written in Java. We also run the bar code within an in-memory classloader via Jvm.runClassloader, as we saw earlier:
package build
import mill._, scalalib._
import mill.util.Jvm
object foo extends JavaModule {
  def moduleDeps = Seq(bar)
  def sources = Task {
    Jvm.runClassloader(classPath = bar.runClasspath().map(_.path)) { classLoader =>
      classLoader
        .loadClass("bar.Bar")
        .getMethod("main", classOf[Array[String]])
        .invoke(null, Array(Task.dest.toString) ++ super.sources().map(_.path.toString))
    }
    Seq(PathRef(Task.dest))
  }
}
object bar extends JavaModule
As mentioned in the section on In-process Isolated Classloaders, this has less overhead than running bar's classpath in a subprocess, at the expense of the classloader providing weaker isolation than a subprocess. Thus we cannot rely on the working directory inside the bar.Bar code to be in the right place, and instead we need to pass in the Task.dest path explicitly.
> mill foo.run
...
Foo.value: HELLO
Caching and Re-using JVM subprocesses and classloaders
Java Virtual Machines are expensive to initialize and have a large memory footprint, so if you perform a lot of classloader or subprocess operations it makes sense to re-use the JVM. You can do this using the CachedFactory helper class, which makes it easy to cache, re-use, and tear down these expensive long-lived components.
The example below is similar to Running a JavaModule in a Classloader above, but in this case the bar.Bar class relies on class-level mutable state in its implementation, and so sharing the same URLClassLoader across different foo* tasks running on different threads is not thread-safe. To resolve this, we make the barWorker contain an instance of mill.api.CachedFactory, which ensures that the classloaders are created when necessary, cached/re-used where possible, and torn down properly when no longer necessary.
package build
import mill._, scalalib._
import mill.util.Jvm
import mill.api.CachedFactory
import java.net.{URL, URLClassLoader}
trait FooModule extends JavaModule {
  def moduleDeps = Seq(bar)
  def sources = Task {
    barWorker().withValue(()) { classLoader =>
      classLoader
        .loadClass("bar.Bar")
        .getMethod("main", classOf[Array[String]])
        .invoke(null, Array(Task.dest.toString) ++ super.sources().map(_.path.toString))
    }
    Seq(PathRef(Task.dest))
  }
}
object foo1 extends FooModule
object foo2 extends FooModule
object foo3 extends FooModule
def barWorker: Worker[BarWorker] = Task.Worker {
  new BarWorker(bar.runClasspath().map(_.path).toSeq)
}
class BarWorker(runClasspath: Seq[os.Path]) extends CachedFactory[Unit, URLClassLoader] {
  def setup(key: Unit) = {
    println("Setting up Classloader")
    Jvm.spawnClassloader(runClasspath)
  }
  def teardown(key: Unit, value: URLClassLoader) = {
    println("Tearing down Classloader")
    value.close()
  }
  def maxCacheSize = 2
}
object bar extends JavaModule
> mill '{foo1,foo2,foo3}.run' # 3 classloaders are setup, one is torn down due to maxCacheSize
Setting up Classloader
Setting up Classloader
Setting up Classloader
Tearing down Classloader
Foo.value: HELLO
> mill clean # mill clean tears down the 2 remaining classloaders
Tearing down Classloader
Tearing down Classloader
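To build intuition for the setup/teardown counts in the output above, here is a simplified model of a caching factory written from scratch. It borrows the method names seen in the example (setup, teardown, maxCacheSize, withValue), but it is not mill.api.CachedFactory: the class name SimpleCachedFactory, the close method, and the policy of evicting the oldest idle value first are assumptions made for this sketch.

```scala
import scala.collection.mutable

// Simplified model, NOT Mill's CachedFactory: values are handed out exclusively,
// returned to an idle pool afterwards, and torn down once the pool is too large.
abstract class SimpleCachedFactory[K, V] {
  def setup(key: K): V
  def teardown(key: K, value: V): Unit
  def maxCacheSize: Int

  private val idle = mutable.ArrayBuffer.empty[(K, V)]

  def withValue[R](key: K)(block: V => R): R = {
    // Take an idle value for this key if one exists, otherwise create a new one.
    val cached: Option[V] = synchronized {
      idle.indexWhere(_._1 == key) match {
        case -1 => None
        case i => Some(idle.remove(i)._2)
      }
    }
    val value = cached.getOrElse(setup(key))
    try block(value)
    finally {
      // Return the value to the pool; evict the oldest idle entry if over the limit.
      val evicted = synchronized {
        idle.append((key, value))
        if (idle.size > maxCacheSize) Some(idle.remove(0)) else None
      }
      evicted.foreach { case (k, v) => teardown(k, v) }
    }
  }

  // Tears down everything still cached, similar in spirit to what `mill clean` triggers.
  def close(): Unit = {
    val remaining = synchronized { val all = idle.toList; idle.clear(); all }
    remaining.foreach { case (k, v) => teardown(k, v) }
  }
}
```

In this model, three overlapping uses with maxCacheSize = 2 lead to three setups and one teardown when the values are returned, and closing the factory tears down the remaining two, which matches the shape of the output shown above. Because each caller gets exclusive use of a value while its block runs, code that relies on mutable state inside that value is not shared between threads.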