Python Module Configuration

This page goes into more detail about the various configuration options for PythonModule.

Common Configuration Overrides

This example shows some of the common tasks you may want to override on a PythonModule: specifying the mainScript, adding additional sources/resources, generating sources, and setting typecheck/run options. There is also support for forkEnv and pythonOptions, which let you set environment variables and Python interpreter options respectively.

build.mill (download, browse)
package build
import mill._, pythonlib._

object foo extends PythonModule {
  // You can have arbitrary numbers of third-party libraries
  def pythonDeps = Seq("MarkupSafe==3.0.2", "Jinja2==3.1.4")

  // choose a main Script to run if there are multiple present
  def mainScript = Task.Source { millSourcePath / "custom-src" / "foo2.py" }

  // Add (or replace) source folders for the module to use
  def sources = Task.Sources {
    super.sources() ++ Seq(PathRef(millSourcePath / "custom-src"))
  }

  // Add (or replace) resource folders for the module to use
  def resources = Task.Sources {
    super.resources() ++ Seq(PathRef(millSourcePath / "custom-resources"))
  }

  // Generate sources at build time
  def generatedSources: T[Seq[PathRef]] = Task {
    val destPath = Task.dest / "generatedSources"
    os.makeDir.all(destPath)
    for (name <- Seq("A", "B", "C")) os.write(
      destPath / s"foo$name.py",
      s"""
class Foo$name:
    value = "hello $name"
      """.stripMargin
    )

    Seq(PathRef(destPath))
  }
  // Pass additional environment variables when `.run` is called.
  def forkEnv: T[Map[String, String]] = Map("MY_CUSTOM_ENV" -> "my-env-value")

  // Additional Python options, e.g. to turn on warnings and ignore import and resource warnings;
  // we could also use -Werror to treat warnings as errors
  def pythonOptions: T[Seq[String]] =
    Seq("-Wall", "-Wignore::ImportWarning", "-Wignore::ResourceWarning")

}

Note the use of millSourcePath, Task.dest, and PathRef when performing various filesystem operations (a short sketch combining them follows the list below):

  1. millSourcePath: Base path of the module. For the root module, it’s the repo root. For inner modules, it’s the module path (e.g., foo/bar/qux for foo.bar.qux). Can be overridden if needed.

  2. Task.dest: Destination folder in the out/ folder for task output. Prevents filesystem conflicts and serves as temporary storage or output for tasks.

  3. PathRef: Represents the contents of a file or folder, not just its path, ensuring downstream tasks properly invalidate when contents change.
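
To make the interplay concrete, here is a minimal sketch of a custom task combining all three; the bar module, its data.txt source file, and the copiedData task are hypothetical names used only for illustration:

object bar extends PythonModule {
  // a hypothetical input file, tracked as a source so edits invalidate the task
  def dataFile = Task.Source { millSourcePath / "data.txt" }

  // copy the file into this task's own Task.dest folder and return it as a
  // PathRef, so downstream tasks see its contents and invalidate when they change
  def copiedData: T[PathRef] = Task {
    val dest = Task.dest / "data.txt"
    os.copy(dataFile().path, dest)
    PathRef(dest)
  }
}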

Typical Usage is given below:

> ./mill foo.run
...
Foo2.value: <h1>hello2</h1>
Foo.value: <h1>hello</h1>
FooA.value: hello A
FooB.value: hello B
FooC.value: hello C
MyResource: My Resource Contents
MyOtherResource: My Other Resource Contents
MY_CUSTOM_ENV: my-env-value
...

> ./mill show foo.bundle
".../out/foo/bundle.dest/bundle.pex"

> out/foo/bundle.dest/bundle.pex
...
Foo2.value: <h1>hello2</h1>
Foo.value: <h1>hello</h1>
FooA.value: hello A
FooB.value: hello B
FooC.value: hello C
MyResource: My Resource Contents
MyOtherResource: My Other Resource Contents
...

> sed -i.bak 's/import os/import os, warnings; warnings.warn("This is a test warning!")/g' foo/custom-src/foo2.py

> ./mill foo.run
...UserWarning: This is a test warning!...

Custom Tasks

This example shows how to define tasks that depend on other tasks:

  1. For generatedSources, we override the task and make it depend directly on pythonDeps to generate its source files; in this example, it includes the list of dependencies as tuples in the value variable.

  2. For lineCount, we define a brand new task that depends on sources. That lets us access the line count at runtime via the MY_LINE_COUNT environment variable defined in forkEnv and print it when the program runs.

build.mill (download, browse)
package build
import mill._, pythonlib._

object foo extends PythonModule {

  def pythonDeps = Seq("argparse==1.4.0", "jinja2==3.1.4")

  def mainScript = Task.Source { millSourcePath / "src" / "foo.py" }

  def generatedSources: T[Seq[PathRef]] = Task {
    val destPath = Task.dest / "generatedSources"
    os.makeDir.all(destPath)

    val prettyPythonDeps = pythonDeps().map { dep =>
      val parts = dep.split("==")
      s"""("${parts(0)}", "${parts(1)}")"""
    }.mkString(", ")

    os.write(
      destPath / s"myDeps.py",
      s"""
class MyDeps:
    value = [${prettyPythonDeps}]
   """.stripMargin
    )

    Seq(PathRef(destPath))
  }

  def lineCount: T[Int] = Task {
    sources()
      .flatMap(pathRef => os.walk(pathRef.path))
      .filter(_.ext == "py")
      .map(os.read.lines(_).size)
      .sum
  }

  def forkEnv: T[Map[String, String]] = Map("MY_LINE_COUNT" -> s"${lineCount()}")

  def printLineCount() = Task.Command { println(lineCount()) }

}

The above build defines the customizations to the Mill task graph shown below, with the boxes representing tasks defined or overridden above and the un-boxed labels representing existing Mill tasks:

pythonDeps -> generatedSources -> ... -> run
sources -> lineCount -> forkEnv -> ...
lineCount -> printLineCount

Mill lets you define new cached Tasks using the Task {…} syntax, depending on existing Tasks (e.g. foo.sources) via the foo.sources() syntax to extract their current value, as shown in lineCount above. The return-type of a Task has to be JSON-serializable (using uPickle, one of Mill's Bundled Libraries) and the Task is cached when first run until its inputs change (in this case, if someone edits the foo.sources files which live in foo/src). Cached Tasks cannot take parameters.

Note that depending on a task requires the use of parentheses after the task name, e.g. pythonDeps(), sources() and lineCount(). This converts the task of type T[V] into a value of type V that you can make use of in your task implementation.
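
Task return values are not limited to primitives like Int: any type with a uPickle ReadWriter can be returned and cached. As a rough sketch (the LineStats class and lineStats task below are hypothetical, and would live inside the foo module so they can call sources()):

// hypothetical result type, made JSON-serializable via a uPickle ReadWriter
case class LineStats(files: Int, lines: Int)
object LineStats {
  implicit val rw: upickle.default.ReadWriter[LineStats] = upickle.default.macroRW
}

def lineStats: T[LineStats] = Task {
  val pyFiles = sources().flatMap(ref => os.walk(ref.path)).filter(_.ext == "py")
  LineStats(files = pyFiles.size, lines = pyFiles.map(os.read.lines(_).size).sum)
}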

This example can be run as follows:

> ./mill foo.run --text hello
text:  hello
MyDeps.value:  [('argparse', '1.4.0'), ('jinja2', '3.1.4')]
My_Line_Count:  23

> ./mill show foo.lineCount
23

> ./mill foo.printLineCount
23

Custom tasks can contain arbitrary code. Whether you want to download files using requests.get, generate sources to feed into a Python Interpreter, or create some custom pex bundle with the files you want, all of these can simply be custom tasks with your code running in the Task {…} block.
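
For example, a hypothetical task that fetches a file at build time with the bundled requests library might look like the sketch below; the task name and URL are placeholders:

def downloadedSpec: T[PathRef] = Task {
  // download once and cache the result until the task's inputs change
  val dest = Task.dest / "spec.json"
  os.write(dest, requests.get("https://example.com/spec.json").text())
  PathRef(dest)
}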

You can create arbitrarily long chains of dependent tasks, and Mill will handle the re-evaluation and caching of the tasks' output for you. Mill also provides you a Task.dest folder for you to use as scratch space or to store files you want to return:

  • Any files a task creates should live within Task.dest

  • Any files a task modifies should be copied into Task.dest before being modified.

  • Any files that a task returns should be returned as a PathRef to a path within Task.dest

That ensures that the files belonging to a particular task all live in one place, avoiding file-name conflicts, preventing race conditions when tasks evaluate in parallel, and letting Mill automatically invalidate the files when the task’s inputs change.
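
A sketch following these rules, assuming it lives inside a PythonModule like foo above (patchedScript is a made-up task name):

def patchedScript: T[PathRef] = Task {
  // copy the original script into Task.dest rather than modifying it in place
  val dest = Task.dest / "main.py"
  os.copy(mainScript().path, dest)
  // modify only the copy, and return it as a PathRef within Task.dest
  os.write.append(dest, "\nprint(\"patched at build time\")\n")
  PathRef(dest)
}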

Overriding Tasks

build.mill (download, browse)
package build
import mill._, pythonlib._

object foo extends PythonModule {
  def sources = Task {
    val destPath = Task.dest / "src"
    os.makeDir.all(destPath)

    os.write(
      destPath / "foo.py",
      s"""
class Foo:
    def main(self) -> None:
        print("Hello World")

if __name__ ==  '__main__':
    Foo().main()
      """.stripMargin
    )
    Seq(PathRef(destPath))
  }

  def mainScript = Task { PathRef(sources().head.path / "foo.py") }

  def typeCheck = Task {
    println("Type Checking...")
    super.typeCheck()
  }

  def run(args: mill.define.Args) = Task.Command {
    typeCheck()
    println("Running..." + args.value.mkString(" "))
    super.run(args)()
  }
}

You can re-define tasks to override them, and use super if you want to refer to the originally defined task. The above example shows how to override typeCheck and run to add additional logging messages, and how we override sources, which was a Task.Sources referencing the src/ folder, with a plain Task {…} that generates the necessary source files on-the-fly.

Note that this example replaces your src/ folder with the generated sources, as we are overriding the def sources task. If you want to add generated sources instead, you can either override generatedSources, or you can override sources and use super to include the original source folders:
object foo2 extends PythonModule {
  def generatedSources = Task {
    val destPath = Task.dest / "src"
    os.makeDir.all(destPath)
    os.write(destPath / "foo.py", """...""")
    Seq(PathRef(destPath))
  }

  def mainScript = Task { PathRef(generatedSources().head.path / "foo.py") }

}

object foo3 extends PythonModule {
  def sources = Task {
    val destPath = Task.dest / "src"
    os.makeDir.all(destPath)
    os.write(destPath / "foo.py", """...""")
    super.sources() ++ Seq(PathRef(destPath))
  }
  def mainScript = Task { PathRef(sources().head.path / "foo.py") }

}

In Mill builds, the override keyword is optional.

> ./mill foo.run
Type Checking...
Success: no issues found in 1 source file
Running...
Hello World

Compilation & Execution Flags

build.mill (download, browse)
package build
import mill._, pythonlib._

object `package` extends RootModule with PythonModule {
  def mainScript = Task.Source { millSourcePath / "src" / "foo.py" }
  def pythonOptions = Seq("-Wall", "-Xdev")
  def forkEnv = Map("MY_ENV_VAR" -> "HELLO MILL!")
}

You can pass flags to the Python Interpreter via pythonOptions.

> ./mill run
HELLO MILL!

By default, run runs the code in a subprocess, and you can pass environment-variables via forkEnv.

PythonPath and Filesystem Resources

build.mill (download, browse)
package build
import mill._, pythonlib._

object foo extends PythonModule {
  def mainScript = Task.Source { millSourcePath / "src" / "foo.py" }

  def resources = Task.Sources { super.resources() ++ Seq(PathRef(millSourcePath / "custom")) }

  object test extends PythonTests with TestModule.Unittest {

    def otherFiles = Task.Source(millSourcePath / "other-files")

    def forkEnv: T[Map[String, String]] =
      super.forkEnv() ++ Map("OTHER_FILES_DIR" -> otherFiles().path.toString)
  }
}
> ./mill foo.run
Hello World Resource File

> ./mill foo.test
...
test_all (test.TestScript...) ... ok
...Ran 1 test...
OK
...

This section discusses how tests can depend on resources stored locally on disk. Mill provides two ways to do this: via the Python PYTHONPATH resources, and via the resource folder which is made available as the environment variable MILL_TEST_RESOURCE_DIR:

  • The PythonPath resources are useful when you want to fetch individual files, and are bundled with the application by the .bundle step when constructing a pex bundle for deployment. But they do not allow you to list folders or perform other filesystem operations.

  • The resource folder, available via MILL_TEST_RESOURCE_DIR, gives you access to the folder path of the resources on disk. This is useful in allowing you to list and otherwise manipulate the filesystem, which you cannot do with pythonPath resources. However, MILL_TEST_RESOURCE_DIR only exists when running tests using Mill, and is not available when executing applications packaged for deployment via .bundle.

  • Apart from resources/, you can provide additional folders to your test suite by defining a Task.Source (otherFiles above) and passing it to forkEnv. This provides the folder path as an environment variable that the test can make use of.

Example application code demonstrating the techniques above can be seen below:

foo/resources/res/file.txt (browse)
Hello World Resource File
foo/test/resources/res/test-file-a.txt (browse)
Test Hello World Resource File A
foo/test/resources/res/test-file-b.txt (browse)
Test Hello World Resource File B
foo/test/other-files/other-file.txt (browse)
Other Hello World File
foo/src/foo.py (browse)
import importlib.resources


class Foo:
    def PythonPathResourceText(self, package: str, resourceName: str) -> str:
        resource_content = (
            importlib.resources.files(package).joinpath(resourceName).read_text()
        )
        return resource_content.strip()


if __name__ == "__main__":
    print(Foo().PythonPathResourceText("res", "file.txt"))
foo/test/src/test.py (browse)
import os
from pathlib import Path
import importlib.resources
import unittest
from foo import Foo  # type: ignore


class TestScript(unittest.TestCase):
    def test_all(self) -> None:
        appPythonPathResourceText = Foo().PythonPathResourceText("res", "file.txt")
        self.assertEqual(appPythonPathResourceText, "Hello World Resource File")

        testPythonPathResourceText = (
            importlib.resources.files("res")
            .joinpath("test-file-a.txt")
            .read_text()
            .strip()
        )
        self.assertEqual(testPythonPathResourceText, "Test Hello World Resource File A")

        testFileResourceFile = next(
            Path(os.getenv("MILL_TEST_RESOURCE_DIR")).rglob("test-file-b.txt"), None
        )
        with open(testFileResourceFile, "r", encoding="utf-8") as file:
            testFileResourceText = file.readline()
        self.assertEqual(testFileResourceText, "Test Hello World Resource File B")

        with open(
            Path(os.getenv("OTHER_FILES_DIR"), "other-file.txt"), "r", encoding="utf-8"
        ) as file:
            otherFileText = file.readline()
        self.assertEqual(otherFileText, "Other Hello World File")


if __name__ == "__main__":
    unittest.main()

Note that tests require that you pass in any files that they depend on explicitly. This is necessary so that Mill knows when a test needs to be re-run and when a previous result can be cached. This also ensures that tests reading and writing to the current working directory do not accidentally interfere with each other's files, especially when running in parallel.

Mill runs test processes in a sandbox/ folder, not in your project root folder, to prevent you from accidentally accessing files without explicitly passing them. Thus you cannot just read resources off disk via with open("foo/resources/test-file-a.txt") as file. If you have legacy tests that need to run in the project root folder to work, you can configure your test suite with def testSandboxWorkingDir = false to disable the sandbox and make the tests run in the project root.
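
For example, a minimal sketch of disabling the sandbox for a legacy test suite (mirroring the foo module from the example above):

object foo extends PythonModule {
  object test extends PythonTests with TestModule.Unittest {
    // run legacy tests from the project root instead of the sandbox/ folder
    def testSandboxWorkingDir = false
  }
}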