Add a Netty example build.sc and write about it #3326

Merged (16 commits), Aug 3, 2024

Changes from 1 commit
docs/modules/ROOT/nav.adoc (1 addition, 1 deletion)
@@ -10,7 +10,7 @@
* xref:Java_Build_Examples.adoc[]
* xref:Java_Module_Config.adoc[]
* xref:Java_Web_Build_Examples.adoc[]
* xref:Java_Case_Study_Netty.adoc[]

.Scala Quick Start
* xref:Scala_Intro_to_Mill.adoc[]
@@ -6,7 +6,7 @@ gtag('config', 'AW-16649289906');
</script>
++++

This page compares using Mill to Maven, using the https://github.com/netty/netty[Netty Network Server]
codebase as the example. Netty is a large, old codebase: 500,000 lines of Java, written by
over 100 contributors across 15 years, split over 47 subprojects, with over 10,000 lines of
Maven `pom.xml` configuration alone. By porting it to Mill, this case study should give you
@@ -16,46 +16,334 @@ To do this, we have written a Mill `build.sc` file for the Netty project. This c
with Mill to build and test the various submodules of the Netty project without needing to
change any other files in the repository:

- https://github.com/com-lihaoyi/mill/blob/main/example/thirdparty/netty/build.sc[Netty `build.sc` file]
Review comment (Member), suggested change replacing the line above with:

- {mill-github-url}/blob/main/example/thirdparty/netty/build.sc[Netty `build.sc` file]


== Completeness

The Mill build for Netty is not 100% complete, but it covers most of the major parts of Netty:
compiling Java, compiling and linking C code via JNI, running JUnit tests and some integration
tests using H2Spec. All 47 Maven subprojects are modelled using Mill, and the entire Netty codebase
is approximately 500,000 lines of Java code:

```bash
$ git ls-files | grep \\.java | xargs wc -l
...
513805 total
```

The goal of this exercise is not to be feature-complete enough to replace the Maven build
today. Instead, it is meant to provide a realistic comparison of how using Mill in a large,
complex project compares to using Maven.

Both Mill and Maven builds end up compiling the same set of files, although the number being
reported by the command line is slightly higher for Mill (2915 files) than Maven (2822) due
to differences in the reporting (e.g. Maven does not report `package-info.java` files as part
of the compiled file count).

== Performance

The Mill build for Netty is much more performant than the default Maven build. This applies to
most workflows.

For the benchmarks below, each number in the table is the median wall time of three consecutive
runs on my M1 MacBook Pro. While ad-hoc, these benchmarks are enough to give you a flavor of how
Mill's performance compares to Maven:

[cols="1,1,1,1"]
|===
| Benchmark | Maven | Mill | Speedup

| Sequential Clean Compile All | 2:31.12 | 0:22.19 | 6.8x

| Parallel Clean Compile All | 1:16.45 | 0:09.95 | 7.7x
| Clean Compile Single-Module | 0:19.62 | 0:02.17 | 9.0x
| Incremental Compile Single-Module | 0:21.10 | 0:00.54 | 39.1x
|===

The column on the right shows how much faster Mill is than the equivalent Maven workflow.
In most cases, Mill is 5-10x faster than Maven. Below, we go into more detail on each
benchmark: how it was run, what it means, and how we can explain the difference between
the two build tools performing the same task.
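The Speedup column is simply the Maven wall time divided by the Mill wall time. As a quick sanity check, the small Java sketch below (the class and method names are our own, not part of Mill or Maven) recomputes that column from the table's `m:ss.xx` timings:

```java
import java.util.Locale;

// Recomputes the Speedup column of the benchmark table as Maven time / Mill time.
// Times use the table's "m:ss.xx" format.
final class SpeedupTable {
    // Converts "m:ss.xx" (e.g. "2:31.12") into seconds (e.g. 151.12).
    static double toSeconds(String time) {
        String[] parts = time.split(":");
        return Integer.parseInt(parts[0]) * 60 + Double.parseDouble(parts[1]);
    }

    static double speedup(String mavenTime, String millTime) {
        return toSeconds(mavenTime) / toSeconds(millTime);
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"Sequential Clean Compile All", "2:31.12", "0:22.19"},
            {"Parallel Clean Compile All", "1:16.45", "0:09.95"},
            {"Clean Compile Single-Module", "0:19.62", "0:02.17"},
            {"Incremental Compile Single-Module", "0:21.10", "0:00.54"},
        };
        for (String[] row : rows) {
            System.out.printf(Locale.ROOT, "%-34s %.1fx%n", row[0], speedup(row[1], row[2]));
        }
    }
}
```

Rounded to one decimal place, this reproduces the 6.8x, 7.7x, 9.0x and 39.1x figures in the table.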

=== Sequential Clean Compile All

```bash
$ time ./mvnw -DskipTests -Dcheckstyle.skip -Denforcer.skip=true clean install
2:42.96
2:27.58
2:31.12

$ ./mill clean; time ./mill __.compile
0:29.14
0:22.19
0:20.79
```

This benchmark exercises the simple "build everything from scratch" workflow, with all remote
artifacts already in the local cache. The actual files
being compiled are the same in either case (as mentioned in the <<Completeness>> section).
I have explicitly disabled the various linters and tests for the Maven build, to just focus
on the compilation of Java source code.

As a point of reference, Java typically compiles at 10,000-50,000 lines per second on a
single thread. Since the Netty codebase is ~500,000 lines of code, we would expect compilation
to take 10-50 seconds without parallelism.
The 20-30s taken by Mill is about what you would expect for a codebase of this size,
whereas the ~150s taken by Maven is far beyond what simple Java compilation should need.
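The estimate above is plain arithmetic: single-threaded compile time is roughly line count divided by compiler throughput. A minimal Java sketch of that calculation, assuming the 10,000-50,000 lines/second range quoted above (an approximation, not a measured javac figure) and taking the median of the three runs shown for each tool:

```java
import java.util.Locale;

// Back-of-envelope estimate: single-threaded compile time = lines / throughput.
final class CompileEstimate {
    static double expectedSeconds(long lines, double linesPerSecond) {
        return lines / linesPerSecond;
    }

    public static void main(String[] args) {
        long nettyLines = 500_000;                          // approximate size of Netty
        double best = expectedSeconds(nettyLines, 50_000);  // optimistic throughput
        double worst = expectedSeconds(nettyLines, 10_000); // pessimistic throughput
        // Median sequential clean-compile wall times from the benchmark above.
        double mill = 22.19;
        double maven = 151.12;
        System.out.printf(Locale.ROOT, "expected %.0f-%.0fs, Mill %.1fs, Maven %.1fs%n",
            best, worst, mill, maven);
        System.out.printf(Locale.ROOT, "Maven exceeds even the pessimistic estimate by ~%.0fs%n",
            maven - worst);
    }
}
```

Mill's time falls inside the 10-50s window, while Maven's is about 100 seconds beyond its upper end.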

==== Where is Maven spending its time?
From eyeballing the logs, the added overhead comes from things like:

*Downloading Metadata from Maven Central*

```text
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/io/netty/netty-transport-native-unix-common/maven-metadata.xml
Downloading from central: https://repo.maven.apache.org/maven2/io/netty/netty-transport-native-unix-common/maven-metadata.xml
Downloaded from central: https://repo.maven.apache.org/maven2/io/netty/netty-transport-native-unix-common/maven-metadata.xml (4.3 kB at 391 kB/s)
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/io/netty/netty-transport-native-unix-common/maven-metadata.xml (2.7 kB at 7.4 kB/s)
```

*Comparing Jars*

```text
Comparing [io.netty:netty-transport-sctp:jar:4.1.112.Final] against [io.netty:netty-transport-sctp:jar:4.1.113.Final-SNAPSHOT] (including their transitive dependencies).
```

In general, Maven spends much of its time working with jar files: packing them, unpacking them,
comparing them, etc. None of this is strictly necessary for compiling Java source files to
classfiles! So why is Maven doing it? The reason comes down to the difference between
`mvn compile` and `mvn install`.

==== Maven Compile vs Install

The reason we have to use `./mvnw install` rather than `./mvnw compile` is that
Maven's main mechanism for managing inter-module dependencies is the local artifact cache
at `~/.m2/repository`. Although many workflows work with `compile`, some don't, and
`./mvnw clean compile` on the Netty repository fails with:

```text
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:2.10:unpack-dependencies (unpack) on project netty-resolver-dns-native-macos: Artifact has not been packaged yet. When used on reactor artifact, unpack should be executed after packaging: see MDEP-98. -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <args> -rf :netty-resolver-dns-native-macos
```

In contrast, Mill builds do not rely on the local artifact cache, even though Mill is able
to publish to it. That means Mill builds are able to work directly with classfiles on disk,
simply referencing them and using them as-is without spending time packing and unpacking them
into `.jar` files. Furthermore, even if we did want Mill to generate the ``.jar``s, the
overhead of doing so is just a few seconds, far less than the two entire minutes that
Maven's overhead adds to the clean build:

```bash
$ time ./mvnw -DskipTests -Dcheckstyle.skip -Denforcer.skip=true clean install
2:42.96
2:27.58
2:31.12

$ ./mill clean; time ./mill __.compile
0:29.14
0:22.19
0:20.79

$ ./mill clean; time ./mill __.jar
0:32.58
0:24.90
0:23.35
```

From this benchmark, we can see that although both Mill and Maven are doing the same work,
Mill takes about as long as it _should_ for this task of compiling 500,000 lines of Java source
code, while Maven takes considerably longer. And much of this overhead comes from Maven
doing unnecessary work packing/unpacking jar files and publishing to a local repository,
whereas Mill directly uses the classfiles generated on disk to bypass all that work.
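To put numbers on "just a few seconds" versus "two entire minutes", the Java sketch below takes the median of each set of three runs shown above and subtracts. It is a rough illustration (class and method names are ours), assuming the median is a fair summary of three noisy runs:

```java
import java.util.Arrays;
import java.util.Locale;

// Compares the extra cost of producing jars in Mill with how much longer Maven's
// install takes than Mill's plain compile, using the timed runs shown above.
final class JarOverhead {
    static double median(double... runs) {
        double[] sorted = runs.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2]; // assumes an odd number of runs
    }

    public static void main(String[] args) {
        double mavenInstall = median(162.96, 147.58, 151.12); // ./mvnw ... clean install
        double millCompile = median(29.14, 22.19, 20.79);     // ./mill __.compile
        double millJar = median(32.58, 24.90, 23.35);         // ./mill __.jar
        System.out.printf(Locale.ROOT, "Mill jar packaging adds ~%.1fs%n",
            millJar - millCompile);
        System.out.printf(Locale.ROOT, "Maven install takes ~%.0fs longer than Mill's compile%n",
            mavenInstall - millCompile);
    }
}
```

By this measure, packaging jars costs Mill under 3 seconds, while Maven's clean install takes a bit over two minutes longer than Mill's compile.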

=== Parallel Clean Compile All

```bash
$ time ./mvnw -T 4 -DskipTests -Dcheckstyle.skip -Denforcer.skip=true clean install
1:19.58
1:16.34
1:16.45

$ ./mill clean; time ./mill -j 4 __.compile
0:14.80
0:09.95
0:08.83
```

This example compares Maven vs. Mill when performing the clean build on 4 threads.
Both build tools support parallelism (`-T 4` in Maven and `-j 4` in Mill), and both
tools see a similar ~2x speedup for building the Netty project using 4 threads. Again,
this tests a clean build using `./mvnw clean` or `./mill clean`.

This comparison shows that much of Mill's speedup over Maven is unrelated to parallelism.
Whether sequential or parallel, Mill has approximately the same ~7x speedup over Maven
when performing a clean build of the Netty repository.
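One way to check that parallelism is not the source of Mill's advantage is to compute two kinds of ratio from the median timings: each tool's own sequential-to-parallel scaling, and Mill's speedup over Maven at each thread count. A small Java sketch of that arithmetic (the names are ours):

```java
import java.util.Locale;

// Separates parallel scaling (within one tool) from Mill's speedup over Maven,
// using the median sequential and 4-thread clean-compile times reported above.
final class ParallelScaling {
    static double ratio(double slower, double faster) {
        return slower / faster;
    }

    public static void main(String[] args) {
        double mavenSeq = 151.12, mavenPar = 76.45; // 2:31.12 and 1:16.45
        double millSeq = 22.19, millPar = 9.95;
        System.out.printf(Locale.ROOT, "Maven -T 4 scaling: %.1fx%n", ratio(mavenSeq, mavenPar));
        System.out.printf(Locale.ROOT, "Mill -j 4 scaling:  %.1fx%n", ratio(millSeq, millPar));
        System.out.printf(Locale.ROOT, "Mill vs Maven: %.1fx sequential, %.1fx parallel%n",
            ratio(mavenSeq, millSeq), ratio(mavenPar, millPar));
    }
}
```

With these inputs both tools scale by roughly 2x on 4 threads, while the Mill-vs-Maven ratio stays near 7x in both settings.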

=== Clean Compile Single-Module

```bash
$ time ./mvnw -pl common -DskipTests -Dcheckstyle.skip -Denforcer.skip=true clean install
0:19.62
0:20.52
0:19.50

$ ./mill clean common; time ./mill common.test.compile
0:04.94
0:02.17
0:01.95
```

This exercise limits the comparison to compiling a single module, in this case `common/`.
`./mvnw -pl common install` compiles both the `main/` and `test/` sources, whereas
`./mill common.compile` would only compile the `main/` sources. We therefore explicitly
request `common.test.compile` to compile both: because `common.test.compile` depends on
`common.compile`, `common.compile` gets run automatically.

Again, a significant speedup of Mill vs. Maven remains even when compiling a
single module: a clean compile of `common/` is about 9x faster with Mill than with Maven!
`common/` is about 40,000 lines of Java source code, so at 10,000-50,000 lines per
second we would expect it to compile in about 1-4s. That puts Mill's compile times right
at what you would expect, whereas Maven's show significant overhead.


=== Incremental Compile Single-Module (Without Clean)

```bash
$ echo "" >> common/src/main/java/io/netty/util/AbstractConstant.java
$ time ./mvnw -pl common -DskipTests -Dcheckstyle.skip -Denforcer.skip=true install
Compiling 174 source files to /Users/lihaoyi/Github/netty/common/target/classes
Compiling 60 source files to /Users/lihaoyi/Github/netty/common/target/test-classes

0:21.10
0:19.64
0:21.29


$ echo "" >> common/src/main/java/io/netty/util/AbstractConstant.java
$ time ./mill common.test.compile
compiling 1 Java source to /Users/lihaoyi/Github/netty/out/common/compile.dest/classes ...

0:00.78
0:00.54
0:00.51
```

This benchmark explores editing a single file and re-compiling `common/`.

By default, Maven takes about as long to re-compile the `main/` and `test/` sources of `common/`
after a single-line edit as it does from scratch: about 20 seconds. Mill, however,
takes only about 0.5s! The logs show why: Mill only compiles the single file we changed,
not the others.

For this incremental compilation, Mill uses the
https://github.com/sbt/zinc[Zinc Incremental Compiler]. Zinc is able to analyze the dependencies
between files to figure out what needs to re-compile: for an internal change that doesn't
affect downstream compilation (e.g. changing a string literal) Zinc only needs to compile
the file that changed, taking barely half a second:

```diff
$ git diff
diff --git a/common/src/main/java/io/netty/util/AbstractConstant.java b/common/src/main/java/io/netty/util/AbstractConstant.java
index de16653cee..9818f6b3ce 100644
--- a/common/src/main/java/io/netty/util/AbstractConstant.java
+++ b/common/src/main/java/io/netty/util/AbstractConstant.java
@@ -83,7 +83,7 @@ public abstract class AbstractConstant<T extends AbstractConstant<T>> implements
return 1;
}

- throw new Error("failed to compare two different constants");
+ throw new Error("failed to compare two different CONSTANTS!!");
}

}
```
```bash
$ time ./mill common.test.compile
[info] compiling 1 Java source to /Users/lihaoyi/Github/netty/out/common/compile.dest/classes ...
0:00.556
```

In contrast, a change to the public signature of a class or method (e.g. adding a method) may
require downstream code to re-compile, as we can see below:

```diff
$ git diff
diff --git a/common/src/main/java/io/netty/util/AbstractConstant.java b/common/src/main/java/io/netty/util/AbstractConstant.java
index de16653cee..f5f5a93e0d 100644
--- a/common/src/main/java/io/netty/util/AbstractConstant.java
+++ b/common/src/main/java/io/netty/util/AbstractConstant.java
@@ -41,6 +41,10 @@ public abstract class AbstractConstant<T extends AbstractConstant<T>> implements
return name;
}

+ public final String name2() {
+ return name;
+ }
+
@Override
public final int id() {
return id;
```
```bash
$ time ./mill common.test.compile
[25/48] common.compile
[info] compiling 1 Java source to /Users/lihaoyi/Github/netty/out/common/compile.dest/classes ...
[info] compiling 2 Java sources to /Users/lihaoyi/Github/netty/out/common/compile.dest/classes ...
[info] compiling 4 Java sources to /Users/lihaoyi/Github/netty/out/common/compile.dest/classes ...
[info] compiling 3 Java sources to /Users/lihaoyi/Github/netty/out/common/test/compile.super/mill/scalalib/JavaModule/compile.dest/classes ...
[info] compiling 1 Java source to /Users/lihaoyi/Github/netty/out/common/test/compile.super/mill/scalalib/JavaModule/compile.dest/classes ...
0:00.812
```

Here, we can see that Zinc ended up re-compiling 7 files in `common/src/main/` and 3 files
in `common/src/test/` as a result of adding a method to `AbstractConstant.java`.

In general, Zinc is conservative, and does not always end up selecting the minimal set of
files that need re-compiling: e.g. in the above example, the new method `name2` does not
interfere with any existing method, and the ~9 downstream files did not actually need to
be re-compiled! However, even conservatively re-compiling 9 files is much faster than
Maven blindly re-compiling all 234 files, and as a result the iteration loop of
editing-compiling-testing your Java projects in Mill can be much faster than doing
the same thing in Maven.
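The invalidation behaviour described above can be illustrated with a toy model. The Java sketch below is a simplified illustration, not Zinc's actual algorithm: Zinc works out for itself whether a file's public API changed, whereas here that is passed in by hand, and the file names and dependency graph are hypothetical. Each round recompiles the files invalidated by the previous round, and only files whose API changed invalidate their dependents:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of round-based incremental recompilation. NOT Zinc's real algorithm:
// whether a file's public API changed is supplied by the caller, and the file
// names and dependency graph used below are hypothetical.
final class ToyIncremental {
    // dependents.get(f): files that reference f, which must be recompiled when
    // f's public API changes. Returns the files recompiled in each round.
    static List<Set<String>> recompileRounds(String edited,
                                             Set<String> apiChanged,
                                             Map<String, List<String>> dependents) {
        List<Set<String>> rounds = new ArrayList<>();
        Set<String> compiled = new HashSet<>();
        Set<String> current = new TreeSet<>(Set.of(edited));
        while (!current.isEmpty()) {
            rounds.add(current);
            compiled.addAll(current);
            Set<String> next = new TreeSet<>();
            for (String file : current) {
                // An internal-only change does not invalidate anything downstream.
                if (!apiChanged.contains(file)) continue;
                for (String dep : dependents.getOrDefault(file, List.of())) {
                    if (!compiled.contains(dep)) next.add(dep);
                }
            }
            current = next;
        }
        return rounds;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dependents = Map.of(
            "Base.java", List.of("Child.java", "BaseTest.java"),
            "Child.java", List.of("ChildTest.java"));
        // Editing a string literal: one round containing only the edited file.
        System.out.println(recompileRounds("Base.java", Set.of(), dependents));
        // Adding a public method to Base.java, which also changes Child.java's API.
        System.out.println(recompileRounds("Base.java", Set.of("Base.java", "Child.java"), dependents));
    }
}
```

The first call yields a single round containing only `Base.java`. The second yields three rounds: `Base.java`, then `BaseTest.java` and `Child.java`, then `ChildTest.java`. This loosely mirrors the successive `compiling N Java sources` lines in the Mill log above.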

=== No-Op Compile Single-Module

```bash
$ time ./mvnw -pl common -DskipTests -Dcheckstyle.skip -Denforcer.skip=true install
0:16.34
0:17.34
0:18.28

$ time ./mill common.test.compile
0:00.49
0:00.47
0:00.45
```

This last benchmark explores the boundaries of Maven and Mill: what happens if
we ask to compile a single module _that has already been compiled_? In this case,
there is literally _nothing to do_. For Maven, "doing nothing" takes ~17 seconds,
whereas for Mill we can see it complete and return in less than 0.5 seconds.

Grepping the logs, we can confirm that both build tools skip re-compilation of the
`common/` source code. In Maven, skipping compilation only saves us ~2 seconds,
bringing down the 19s we saw in <<Clean Compile Single-Module>> to 17s here. This
matches what we expect about Java compilation speed, with the 2s savings on
40,000 lines of code telling us Java compiles at ~20,000 lines per second. However,
we still see Maven taking *17 entire seconds* before it can decide to do nothing!

In contrast, for the same no-op compile with Mill, the time drops from 2.2s
in <<Clean Compile Single-Module>> to 0.5 seconds here. This is roughly the same ~2s reduction
we saw with Maven, but due to Mill's minimal overhead, the command ends up
finishing in less than half a second.
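The reasoning in the last two paragraphs treats each build as fixed overhead plus actual compilation. A rough Java sketch of that split, assuming the no-op time is pure overhead and that `common/` is about 40,000 lines (both approximations taken from the text), using the median of the three runs shown for each command:

```java
import java.util.Locale;

// Splits a single-module build into fixed overhead and compilation time, treating the
// no-op run as pure overhead and the extra time of a clean run as compilation.
final class OverheadSplit {
    static final double COMMON_LINES = 40_000; // approximate size of common/

    static double compileSeconds(double cleanSeconds, double noOpSeconds) {
        return cleanSeconds - noOpSeconds;
    }

    static double linesPerSecond(double lines, double compileSeconds) {
        return lines / compileSeconds;
    }

    static void report(String tool, double cleanSeconds, double noOpSeconds) {
        double compile = compileSeconds(cleanSeconds, noOpSeconds);
        System.out.printf(Locale.ROOT, "%-5s overhead %5.2fs, compile %.2fs, ~%.0f lines/s%n",
            tool, noOpSeconds, compile, linesPerSecond(COMMON_LINES, compile));
    }

    public static void main(String[] args) {
        // Median clean and no-op timings for common/ from the benchmarks above.
        report("Maven", 19.62, 17.34);
        report("Mill", 2.17, 0.47);
    }
}
```

Both tools land in the same ballpark for the compilation itself, roughly 17,000-24,000 lines/s, so nearly all of the difference comes from the overhead column.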

== Conciseness

docs/modules/ROOT/partials/Intro_to_Mill_Header.adoc (2 additions, 1 deletion)
@@ -26,7 +26,8 @@ digraph G {
}
....

{mill-github-url}[Mill] is a fast JVM build tool that supports {language}, speeding
up common development workflows by 5-10x xref:Java_Case_Study_Netty.adoc[compared to Maven] or SBT.
Mill aims to make your JVM project's build process performant, maintainable, and flexible
even as it grows from a small project to a large codebase or monorepo with hundreds of modules:
