Let’s turn packages into a module system!

Many projects are divided into modules/subprojects using the build system (Maven, Gradle, SBT …); and writing modular code is generally a Good Thing. Dividing the code into build modules is mainly used for:

  • isolating parts of code (decreasing coupling)
  • api/impl split
  • adding a third-party dependency only to a specific part of code
  • grouping code with similar functionality
  • statically checking that code in one module only uses code from its dependent modules (inter-module dependencies)

While some may say that it is also useful for separate compilation, I don’t think that matters a lot (when considering one project). Build tools are smart enough nowadays to figure out what needs to be recompiled.

Problems with build modules

I think there are several problems with this approach. First of all, it is pretty hard to decide when a piece of functionality is “big enough” to turn it into a build module. Is a handful of classes enough? Or do you need more? Should it strictly be one functionality per module? But that would cause a module explosion; and so on. At least in the projects I took part in, how coarse-grained the build modules should be was a recurring topic of discussion.

Secondly, build modules are pretty “heavy”. Maven is the worst, I suppose: you need a large piece of XML to create a module, with lots of boilerplate (for example repeated group id, version number, parent definition). SBT and Gradle are much better, but it is still a significant effort. A separate directory needs to be created, along with the whole directory structure (src/main/..., src/test/...), the build config needs to be updated, etc. Overall it is quite a hassle.

And then quite often when we have our beautiful modules separated, it turns out that in order for two of them to cooperate, we need a “common” part. Then we either end up with a bloated foo-common module, which contains loads of unrelated classes, or multiple small foo-foomodule-common modules; the second solution is fine of course, except for the time wasted setting it up.

Finally, a build module is an additional thing you have to name; most probably the package name and the class name already reflect what the code is doing, and now that also needs to be repeated in the build module name (a DRY violation).

All in all, I think creating build modules is much too hard and time-consuming. Programmers are lazy (which, of course, is a good thing), and this leads to designs which are not as clean as they could be. Time to change that :).

(See also my earlier blog on modules.)

Packages

Java, Scala and Groovy already have a system for grouping code: packages. However, currently a package is just a string identifier. Except for some very limited visibility options (package-private in Java, package-scoping in Scala) packages have no semantic meaning. So we have several levels of grouping code:

  1. Project
  2. Build module
  3. Package
  4. Class

What if we merged 2. and 3. together; why shouldn’t packages be used for creating modules?

Packages as modules?

Let’s see what it would take to extend packages to be modules. Obviously the first thing that we’d need is to associate some meta-data with each module. There are already some mechanisms for this (e.g. via annotations on package-info.java), or this could be an extension of package objects in Scala – some traits to mix in, or vals to override.

What kind of meta-data? Of course we don’t want to move the whole build definition to the packages. But let’s separate concerns – the build definition should define how to build the project, not what the module dependencies are. Then the first thing to define in a module’s meta-data would be dependencies on third-party libraries. Such definitions could be only symbols, which would be bound to concrete versions in the build definition.

For example, we would specify that package “foo.bar.dao” depends on the “jpa” libraries. The build definition would then contain a mapping from “jpa” to a list of Maven artifacts (e.g. hibernate-core, hibernate-entitymanager etc.). Moreover, it would probably make most sense if such dependencies were transitive to sub-packages. So defining a global library would mean adding a dependency on the root package.
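
To make the idea a bit more concrete, here is a minimal sketch of how the two halves could fit together, with plain maps standing in for the build definition and the package meta-data. All names here (LibraryResolution, libraryBindings, declaredLibraries and so on) are made up for illustration, versions are left out on purpose, and no real build tool works this way today.

```scala
// Hypothetical sketch: these names do not come from any real build tool.
object LibraryResolution {
  // Build definition side: library symbol -> Maven artifacts (group:artifact).
  // Versions are omitted; the build definition would pin them.
  val libraryBindings: Map[String, List[String]] = Map(
    "jpa"   -> List("org.hibernate:hibernate-core", "org.hibernate:hibernate-entitymanager"),
    "slf4j" -> List("org.slf4j:slf4j-api")
  )

  // Package meta-data side: package -> library symbols declared directly on it.
  val declaredLibraries: Map[String, Set[String]] = Map(
    "foo"         -> Set("slf4j"), // root package: available everywhere
    "foo.bar.dao" -> Set("jpa")    // only the DAO code sees JPA
  )

  // A package followed by all of its parent packages, innermost first.
  def withAncestors(pkg: String): List[String] = {
    val parts = pkg.split('.').toList
    (parts.length to 1 by -1).map(n => parts.take(n).mkString(".")).toList
  }

  // Library symbols visible in a package: its own plus those inherited from parents.
  def librariesFor(pkg: String): Set[String] =
    withAncestors(pkg).flatMap(p => declaredLibraries.getOrElse(p, Set.empty[String])).toSet

  // Artifacts the build tool would put on the classpath for that package.
  def artifactsFor(pkg: String): Set[String] =
    librariesFor(pkg).flatMap(lib => libraryBindings.getOrElse(lib, Nil))
}
```

With these definitions, librariesFor("foo.bar.dao") contains both “jpa” and “slf4j”, while librariesFor("foo.bar") contains only “slf4j”, which is the sub-package transitivity described above.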

As a side note, with an extension of Scala’s package objects, this could even be made type-safe. The package objects could implement a trait, where one of the values to override could be the list of third-party dependency symbols. The symbols themselves could be e.g. contained in an Enumeration, defined in the root package, which could make things like “find all modules dependent on jpa” a simple usage-search in the IDE.

The second step is to define inter-module dependencies using this mechanism as well. It would be possible, in the package’s meta-data, to define a list of other packages from which code is visible. This follows how build modules are currently used: each contains a list of project modules which can be accessed. (Another Scala side-note: as the package objects would implement a trait, this would mean defining a list of objects with a given type.)

Taking this further, we could specify api and impl type-packages. Api-type ones would by default be accessible from other packages. Impl-type packages, on the other hand, couldn’t be accessed without explicitly specifying them as a dependency.

What could it look like in practice? A very rough sketch in Scala:

package foo.user
 
// Even without definition, each package has an implicit package object 
// implementing a PackageModule trait ...
package object dao { 
  // ... which is used here. The type of the val below is 
  // List[PackageModule].
  override val moduleDependsOn = List(foo.security, foo.user.model) 
  override val moduleType = ModuleType.API
  // FooLibs enum is defined in a top-level package or the build system
  override val moduleLibraries = List(FooLibs.JPA) 
}
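
The sketch above leans on a few definitions it doesn’t show. Below is one hypothetical way to model them in today’s Scala, together with the api/impl access rule. Regular objects stand in for package objects to keep the example self-contained, and the Modules object, the canAccess function and the sample module names are assumptions made for illustration only; a real implementation would need compiler or build-tool support.

```scala
// Hypothetical model of the meta-data; nothing here exists in Scala or any build tool.
object ModuleType extends Enumeration {
  val API, IMPL = Value
}

// Library symbols, bound to concrete artifacts elsewhere (in the build definition).
object FooLibs extends Enumeration {
  val JPA, SLF4J = Value
}

trait PackageModule {
  def name: String
  def moduleDependsOn: List[PackageModule] = Nil
  def moduleType: ModuleType.Value = ModuleType.API
  def moduleLibraries: List[FooLibs.Value] = Nil
}

object Modules {
  object Security extends PackageModule { val name = "foo.security" }
  object UserModel extends PackageModule { val name = "foo.user.model" }

  object UserDaoImpl extends PackageModule {
    val name = "foo.user.dao.impl"
    override val moduleType: ModuleType.Value = ModuleType.IMPL
  }

  object UserDao extends PackageModule {
    val name = "foo.user.dao"
    override val moduleDependsOn: List[PackageModule] = List(Security, UserModel)
    override val moduleLibraries: List[FooLibs.Value] = List(FooLibs.JPA)
  }

  // A module that wires things together and therefore names the impl explicitly.
  object Wiring extends PackageModule {
    val name = "foo.wiring"
    override val moduleDependsOn: List[PackageModule] = List(UserDao, UserDaoImpl)
  }

  // One reading of the rule: API modules are accessible by default,
  // IMPL modules only when listed explicitly as a dependency.
  def canAccess(from: PackageModule, to: PackageModule): Boolean =
    from == to || to.moduleType == ModuleType.API || from.moduleDependsOn.contains(to)
}
```

Here canAccess(Modules.UserDao, Modules.UserDaoImpl) is false, while canAccess(Modules.Wiring, Modules.UserDaoImpl) is true. A checker in the compiler or the build could apply a rule like this to every cross-package reference.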

Refactoring

Refactoring is an everyday activity; however, refactoring modules is usually a huge task, approached only once in a while. Should it be so? If packages were extended to modules, refactoring modules would be the same as moving around and renaming packages, with the additional need to update the meta-data. It would be much easier than it is currently, which I think would lead to better overall designs.

Build system

The above would obviously mean more work for the build system – it would have a harder time figuring out the list of modules, the build order, the list of artifacts to create etc. (by the way, whether a separate jar should be created for a package could also be part of the meta-data). Also some validations would be needed – for circular dependencies, or for attempts to constrain the visibility in a wrong way.
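
As a rough illustration of the build-order and circular-dependency parts, here is a small depth-first topological sort over declared package dependencies. It is only a sketch: the BuildOrder object and the idea of feeding it a plain map of package names are assumptions made for the example, not part of any existing tool.

```scala
// Hypothetical sketch of what a build tool could do with package meta-data.
object BuildOrder {
  type Result = Either[List[String], List[String]]

  // Right(order): packages listed so that dependencies come before dependents.
  // Left(cycle): the packages along a circular dependency, e.g. List("a", "b", "a").
  def buildOrder(deps: Map[String, List[String]]): Result = {
    // `path` is the chain of packages currently being visited (most recent first);
    // `done` holds packages already placed in the order (most recent first).
    def visit(pkg: String, path: List[String], done: List[String]): Result =
      if (done.contains(pkg)) Right(done)
      else if (path.contains(pkg))
        Left((pkg :: path.takeWhile(_ != pkg) ::: List(pkg)).reverse)
      else
        deps.getOrElse(pkg, Nil)
          .foldLeft[Result](Right(done)) { (acc, dep) =>
            acc.flatMap(d => visit(dep, pkg :: path, d))
          }
          .map(pkg :: _)

    // Sorting the keys keeps the result deterministic.
    deps.keys.toList.sorted
      .foldLeft[Result](Right(Nil)) { (acc, pkg) =>
        acc.flatMap(d => visit(pkg, Nil, d))
      }
      .map(_.reverse)
  }
}
```

For the dependencies from the earlier sketch (foo.user.dao depending on foo.security and foo.user.model) this returns an order in which foo.user.dao comes last; if two packages depend on each other, it returns a Left with the offending cycle instead.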

But then, people have done more complicated software than that.

Jigsaw?

You would probably say that this overlaps with project Jigsaw, which will come in Java 9 (or not). However, I think Jigsaw aims at a different scale: project-level modules. So one Jigsaw module would be your whole project, while you would have multiple (tens of) package-modules.

The name “module” is overloaded here, maybe the name “mini-modules” would be better, or very modestly “packages done right”.

Bottom line

I think that currently the way to define build modules is way too hard and constraining. On the other hand, lifting packages to modules would be very lightweight. Defining a new module would be the same as creating a new package – couldn’t get much simpler. Third-party libraries could easily be added only where needed. There would be one less thing to name. And there would be one source tree per project.

Also such an approach would be scalable and adjustable to the project’s needs. It would be possible to define fine-grained modules or coarse-grained ones without much effort. Or even better, why not create both – modules could be nested and built one on top of the other.

Now … the only problem is implementing, and adding IDE support ;)

  • http://spring-java-ee.blogspot.com/ Hendy Irawan

    I get reeeeally curious when there’s an article mentioning “module system” multiple times and yet not a single mention of OSGi. (It’s also nice you thought about Jigsaw.)

    I hope that wasn’t intentional.

    Please compare your plans with OSGi… :-)

  • http://www.warski.org Adam Warski

    True, OSGi certainly deserves a mention :). I skipped it as it’s similar to Jigsaw. Maybe OSGi bundles tend to be more per-build module, so smaller, but even then it is a runtime module system, while what I describe in the blog has static/compile-time checking.

    Now you could of course have integration with OSGi, and have the build generate OSGi bundles based on package meta-data. So e.g. in Scala this could be a trait that you have to mix in to the package object (“package object foo extends OSGiBundle”) with some vals to define the OSGi-specific meta-data.

  • http://spring-java-ee.blogspot.com/ Hendy Irawan

    Thanks Adam.

    In Jigsaw this “package object” is called “services”, which is similar to how OSGi uses packages.

    So:

    OSGi packages == Jigsaw services == Adam’s “package objects”

    OSGi bundles == Jigsaw modules

    OSGi in a way extends the package concept similarly to how you do it. In OSGi, packages are versioned and have “uses”. Yes, these can be used at runtime, but they can also be helpful at build time.

    In contrast, this:

    override val moduleDependsOn = List(foo.security, foo.user.model)

    at least for this revision of the syntax, seems less flexible than OSGi: there is no way to specify a version dependency, i.e. depending on com.google.common version 10 and version 14 would be somewhat different.

  • http://www.warski.org Adam Warski

    Still, if I understand you right you are thinking about modules at a higher level. So you would probably have several, not many modules per project (or am I mistaken?).

    Most probably an effect similar to what I’m describing can also be achieved with Jigsaw/OSGi, just as it can be achieved by creating a multitude of Maven modules. However, one of my main points is how *hard* it is to define modules. If it’s not easy, people are going to do it rarely; same for refactoring. So I think we need a really convenient way to add, remove, combine, change modules. The module system should be “light”; it’s “heavy” now in my opinion.

    Btw., is there an OSGi tool for build-time verification of dependencies? So that I can only use code from the OSGi bundles I depend on?

    As for specifying versions – I intentionally separated these concepts. I think that as far as code goes, a package should only specify “I’m using com.google.common”. This could be e.g. through a “Guava” constant which would contain a list of publicly accessible packages. Now which specific version of Guava, is a matter of the build tool – as is resolving conflicts (if it turns out that there are dependencies on two different Guava versions).

    Thanks for the comments! :)

  • http://www.hendyirawan.com/ Hendy Irawan

    Higher level modules are OSGi bundles or Jigsaw modules.

    Fine or low level modules are OSGi packages or Jigsaw services.

    However, rather than the developer declaring package dependencies on other packages, the direction is the other way around: the tooling (i.e. maven-bundle-plugin, the bnd tool or Eclipse PDE) analyzes the Java sources and declares the package dependencies in the manifest.

    Why? Consider a typical package :

    id.co.bippo.sales;
    uses:="com.google.common.base,
    com.google.code.morphia,
    com.mongodb,
    javax.annotation,
    org.bson.types,
    com.google.common.collect,
    org.slf4j,
    id.co.bippo.sales.event,
    id.co.bippo.booking,
    id.co.bippo.person,
    com.google.common.eventbus,
    id.co.bippo.common,
    id.co.bippo.story,
    id.co.bippo.product.util,
    com.rabbitmq.client,
    com.google.code.morphia.annotations,
    org.codehaus.jackson.annotate,
    com.fasterxml.jackson.annotation";
    version=5.0.0.SNAPSHOT,

    of course there’s more of that per bundle (which contains one or more packages), but all that is handled by the tooling. Note that the “uses” is recommended, not required, in OSGi. It assists the runtime to detect conflicts and also can be used for tooling support.

    But if I have to write those manually, umm… I’ll probably be sick :(

  • http://www.warski.org Adam Warski

    Well, for sure you wouldn’t have to declare all the dependencies in every package.

    What you are probably doing right now, is declaring e.g. guava, morphia, mongo-java-driver, slf4j etc. as Maven dependencies, right? So what I propose is to move those declarations to packages. Keep in mind that the packages are hierarchical (at least “would be” in the imaginary situation we are dealing with), so dependencies from the parent are inherited by the child packages. So what you would do, is define slf4j and guava as a dependency of the root package (you probably use them everywhere), and morphia and mongo-java-driver as a dependency of the “dao” package(s).

    What do you gain? Well, I think it would be much more flexible. Adding a new library dependency would be much easier than now – also, to add a library dep only to a specific part of code you wouldn’t have to extract a build module. Plus you could imagine some transitivity options here.

    I’m far from an OSGi expert, so I’ll have to read about the bundle-package distinction – thanks for the pointer.

    So while you are thinking about dependencies for a whole build module “squashed” into one manifest, I’m thinking spreading these where they belong – to the specific places in code which use a given library or which depend on code from another package – would be much easier to comprehend and reason about.

  • Ben

    Google’s Blaze build system works this way [1]. Twitter’s clone [2] could be interesting if it builds a community.

    The benefit of modules is that it forces developers to create boundaries. Ideally a decent package layout convention would be enough, but all too often projects degrade into disorder as internal aspects are leaked. That eventually means product silos in large repositories instead of reusable, well-defined, discoverable components.

    Neither are perfect models and while I’d prefer Blaze I’m pretty happy with Gradle these days.

    [1] http://google-engtools.blogspot.com/2011/08/build-in-cloud-how-build-system-works.html
    [2] https://github.com/twitter/commons/tree/master/src/python/twitter/pants

  • http://www.warski.org Adam Warski

    Thanks for the links!

    Agree with your comments about creating boundaries. The point is now to make it easier for developers to enforce those boundaries, as self-discipline alone is both hard and sometimes simply not possible to maintain on larger projects (as you can’t keep everything in your head).

  • http://www.warski.org Adam Warski

    And does Gradle have any mechanisms like I described or similar to Blaze/Pants?

  • Ben

    I think Gradle could be adapted to work like Blaze/Pants instead of mimicking the Maven module way. Its support for incremental builds, large multi-module projects, and exposing the DAG programmatically makes it very powerful. The IDE integration would need to be reworked to generate a single project model instead of one per module, and that might become painful in a multi-language build. While doable, I don’t think it would be practical without support from the core developers.

  • Pingback: Veripacks 0.1 – Verify Package Specifications @ Blog of Adam Warski

  • Robert Jack Will

    Packages in the Go language work the way that you describe. Go doesn’t need separate build files because the imports in each source file directly mention all used packages (and other files in the same package are automatically picked up by the compiler). Go is also cool because packages are the only boundary where you specify visibility, which greatly simplifies the language and gets rid of constructs like C++ friend classes, Java nested classes, and four-level visibility (private, protected, package (default), and public; or is package stricter than protected? Who can remember all those nitty-gritty details anyway ;-).

    The Dart language seems to go for a similar approach — it’s a sign that this is the future!

  • http://www.warski.org/ Adam Warski

    Thanks, one more reason to explore Go in more detail one day :)