Saturday, 12 April 2014

Maven explanation - part 2


In the earlier article, we discussed the Maven build lifecycle and plugins. In this article, we continue with the Maven repository and dependency management.

Maven Repository & Dependency Management

The Maven repository may be the best-known feature of Maven, and its benefit is obvious. Back when most Java projects were built with Ant, every library the project needed had to be included in the project folder. If the application was a webapp, the libraries could be stored in the WEB-INF/lib folder. Otherwise, developers had to create a libraries folder manually and add it to the project classpath. Developers often also had to split the libraries folder by usage. For example, JUnit and mock libraries should be used only for testing, not for compiling or packaging the project.

Maven brings a much more convenient practice: developers only specify what they need, and Maven downloads the libraries from somewhere and adds them to the project classpath. In Maven terms, this somewhere is called a repository, and each library is specified as a dependency.

There are two kinds of repositories, remote and internal. (Maven's own documentation calls the internal repository the local repository.)


Maven is generous. It gives you a free remote repository that is supposed to host all the libraries you need. The internal repository is simply a place on your computer where Maven stores all the libraries it has downloaded from remote repositories.

Repository and Dependency Declaration

Let us use the same trick of checking the effective POM view again. Here is what it gives us:

<repositories>
    <repository>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
      <id>central</id>
      <name>Central Repository</name>
      <url>http://repo.maven.apache.org/maven2</url>
    </repository>
</repositories>

So we know that the website hosting the Maven central repository is http://repo.maven.apache.org/maven2
Normally, you can open this URL in your browser and browse the repository, but unfortunately this feature has been disabled recently.

Now, let us take a look at the declaration of a dependency:

<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.7</version>
    <scope>test</scope>
</dependency>

This dependency will end up in your project classpath as junit-4.7.jar. Unless a classifier is specified, the jar file name is ${artifactId}-${version}.jar. The groupId helps Maven to further differentiate dependencies with similar names.

Similar to a Java package, the groupId defines the physical folder where the dependency is stored. Therefore, to find the dependency above, Maven will look at

http://repo.maven.apache.org/maven2/junit/junit/4.7/junit-4.7.jar

The naming convention for locating a jar file in a remote repository, with the dots in the groupId replaced by slashes, is

${repositoryUrl}/${groupId}/${artifactId}/${version}/${artifactId}-${version}.jar
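
As a worked example of this layout, the sketch below takes the spring-core coordinates that appear later in this article and spells out, in comments, where the jar would sit in the central repository. The URL is derived by hand from the coordinates, not captured from a build.

```xml
<!-- The groupId org.springframework becomes the folders org/springframework -->
<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-core</artifactId>
    <version>4.0.3.RELEASE</version>
</dependency>
<!-- Expected location of the jar:
     http://repo.maven.apache.org/maven2/org/springframework/spring-core/4.0.3.RELEASE/spring-core-4.0.3.RELEASE.jar -->
```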

In the internal repository, it is stored in the folder

${USER_HOME}/.m2/repository/junit/junit/4.7

The folder ${USER_HOME}/.m2/repository is the default location of the internal repository. If you look inside the junit folder above, you will see at least 3 files: junit-4.7.jar, junit-4.7.jar.sha1 and junit-4.7.pom. This tells us that a dependency is effectively a Maven library. Even if the jar file was not packaged by Maven, by the time it is uploaded to a Maven repository there will be a Maven pom file for it.
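
The internal repository does not have to live in the default folder. A minimal sketch, assuming a user settings file at ${USER_HOME}/.m2/settings.xml and a made-up target path, might look like this:

```xml
<!-- ${USER_HOME}/.m2/settings.xml -->
<settings>
    <!-- Hypothetical path: replace it with a folder on your own machine -->
    <localRepository>/data/maven/repository</localRepository>
</settings>
```

This can be useful when the home folder sits on a small or slow disk.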

Sometimes, developers also upload the source code together with the compiled package. By Maven convention, the source file name is ${artifactId}-${version}-sources.jar
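
The extra suffix in the file name is what Maven calls a classifier. As a sketch, a dependency on the source package of junit could be declared as follows; whether a sources jar was actually published for a given library is up to its authors.

```xml
<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.7</version>
    <!-- The classifier is appended to the file name: junit-4.7-sources.jar -->
    <classifier>sources</classifier>
    <scope>test</scope>
</dependency>
```

In practice, IDEs usually fetch the sources for you, so this declaration is rarely written by hand.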

Dependency Download and Upload

If you take a look at the diagram above, you can see that I drew arrows in both directions. Whenever we build a Maven project on a local box with the command

mvn install

Maven copies the packaged file to the internal repository. If we want the packaged file to be synced back to a remote repository, the command is

mvn deploy

Install and deploy are two consecutive phases in the default lifecycle, and deploy comes after install; therefore you cannot bypass the internal repository while uploading. The same goes for downloading.

It is worth noting that you can declare multiple remote repositories but only one internal repository. When searching for a dependency, Maven scans through the remote repositories in the order they are declared in your pom file.
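
A declaration of several remote repositories could be sketched as below. The ids and URLs are hypothetical placeholders, not real servers. The central repository inherited through the effective pom remains available in addition to these.

```xml
<repositories>
    <!-- Declared first: a hypothetical company repository -->
    <repository>
        <id>company-releases</id>
        <url>http://nexus.example.com/content/repositories/releases</url>
    </repository>
    <!-- Declared second: a hypothetical third-party repository -->
    <repository>
        <id>thirdparty</id>
        <url>http://repo.example.org/maven2</url>
    </repository>
</repositories>
```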

In the diagram above, I drew an arrow from the internal repository back to the remote repository, but it is actually not that straightforward. Because there may be more than one remote repository, mvn deploy is a complex command, and you may need to provide authentication to upload. The Maven documentation provides instructions for the deploy command. However, we rarely need to use it. In a local environment, developers should not need to upload to a remote repository. If anything uploads, it should be the Continuous Integration server, after all the tests have passed. At least for Jenkins, there is a plugin that deploys to a remote repository automatically after a successful build.
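
To give an idea of what the deploy command needs, here is a minimal sketch of the upload targets in a pom file. The ids and URLs are hypothetical and would come from your own repository server.

```xml
<!-- pom.xml: where mvn deploy uploads to (ids and URLs are hypothetical) -->
<distributionManagement>
    <repository>
        <id>company-releases</id>
        <url>http://nexus.example.com/content/repositories/releases</url>
    </repository>
    <snapshotRepository>
        <id>company-snapshots</id>
        <url>http://nexus.example.com/content/repositories/snapshots</url>
    </snapshotRepository>
</distributionManagement>
```

The credentials do not belong in the pom file. They normally go into a server entry in settings.xml whose id matches the repository id used here.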

Proxy

Every corporation I know hosts at least one remote repository of its own. The reason is simple: you cannot upload your project to the Maven central repository. Most of the time, this remote repository also serves as a proxy to Maven Central or any other external repository. The internal repository improves performance only if you have downloaded the dependency on the same box before. A proxy improves performance for all the boxes in the office. Currently, we use Nexus as the remote repository in our work environment.
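
One common way to send all traffic through such a proxy is a mirror entry in settings.xml. The sketch below assumes a hypothetical Nexus URL; the id is also made up.

```xml
<!-- ${USER_HOME}/.m2/settings.xml -->
<settings>
    <mirrors>
        <mirror>
            <id>company-proxy</id>
            <!-- Route requests for the central repository through the proxy -->
            <mirrorOf>central</mirrorOf>
            <url>http://nexus.example.com/content/groups/public</url>
        </mirror>
    </mirrors>
</settings>
```

With this in place, individual pom files do not need to know that the proxy exists.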

Snapshot

A snapshot dependency is a non-final dependency. This means the remote repository may later contain the same version of the dependency with newer content. Hence, Maven is supposed to check the timestamp to see whether it needs to download new content on each build. This check may be slow, so by default Maven only does it once a day.

For an actively developed project, once a day is definitely not enough, and we should add the parameter -U (--update-snapshots) to any Maven command to force it to check for updated snapshots.
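
The checking frequency can also be set per repository instead of being forced on the command line. A sketch, assuming a hypothetical snapshot repository id and URL:

```xml
<repository>
    <id>company-snapshots</id>
    <url>http://nexus.example.com/content/repositories/snapshots</url>
    <snapshots>
        <enabled>true</enabled>
        <!-- Other accepted values: daily (the default), interval:X (minutes), never -->
        <updatePolicy>always</updatePolicy>
    </snapshots>
</repository>
```

The trade-off is that every build now pays for the remote check.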

Snapshot is a handy feature. Most of the time, we develop projects with multiple sub-modules. It is then crucial to declare each sub-module as a snapshot, so that Maven keeps downloading the latest update while building the parent project. Whenever we have a release, we can finalise the version and move to the next snapshot.

To illustrate, take a look at the example below:

Start project: 0.1-SNAPSHOT
First release: 0.1
Continue developing for next milestone: 0.2-SNAPSHOT
Next release: 0.2
Continue developing for next milestone: 0.3-SNAPSHOT
...

As you can see, Maven uses the suffix "SNAPSHOT" in the version to tell whether a dependency is final or a snapshot.
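
In a pom file, depending on a sub-module that is still under development might look like the sketch below. The groupId and artifactId are hypothetical names invented for illustration.

```xml
<dependency>
    <!-- Hypothetical sub-module of the same project -->
    <groupId>com.example.shop</groupId>
    <artifactId>shop-core</artifactId>
    <!-- The SNAPSHOT suffix marks this version as non-final -->
    <version>0.2-SNAPSHOT</version>
</dependency>
```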

Dependency Management

In the example above, when we included the junit dependency, Maven gave us one junit jar file. This is simple and straightforward. However, Maven can offer us more than that. Let us include another dependency:

<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-core</artifactId>
    <version>4.0.3.RELEASE</version>
</dependency>

If you capture the Maven console output on the first run, you may find that it prints

[INFO] Downloaded http://repo.maven.apache.org/maven2/org/springframework/spring-core/4.0.3.RELEASE/spring-core-4.0.3.RELEASE.pom
[INFO] Downloaded http://repo.maven.apache.org/maven2/commons-logging/commons-logging/1.1.3/commons-logging-1.1.3.pom 

This may be surprising at first if you have not used Maven before. Remember, though, that every Maven dependency must have its own pom file. The spring-core pom file specifies commons-logging as its dependency:

<dependency>
     <groupId>commons-logging</groupId>
     <artifactId>commons-logging</artifactId>
     <version>1.1.3</version>
     <scope>compile</scope>
</dependency>

Maven recursively builds the whole dependency hierarchy before it starts downloading content. A dependency downloaded automatically by Maven is called a transitive dependency. This feature explains why Maven was adopted so widely and quickly over Ant. Dependency management saves us the effort of figuring out which library uses which library. Even if we manage to do it once, the pain comes back when we need to update the version of a major library.

In the diagram above, I tried to describe this behaviour by drawing a dependency that contains other dependencies. When copied over to the project, they all end up as jar files in the classpath.
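
If a transitive dependency is not wanted, Maven lets us cut it out of the hierarchy with an exclusion. The sketch below removes commons-logging from spring-core purely as an illustration; spring-core still needs some logging bridge at runtime if you do this.

```xml
<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-core</artifactId>
    <version>4.0.3.RELEASE</version>
    <exclusions>
        <!-- An exclusion names a groupId and artifactId, never a version -->
        <exclusion>
            <groupId>commons-logging</groupId>
            <artifactId>commons-logging</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```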

Dependency Scope

If you look carefully at some of the dependency declarations above, you may find a scope element that I have not mentioned. To understand dependency scope, we need to equip ourselves with some basic concepts first.

Maven supports 4 kinds of classpaths:
  • Compile classpath
  • Runtime classpath
  • Test classpath
  • Plugin classpath
Do not worry about this; I do not clearly remember the definition of each classpath myself. We only need to know that the compile classpath is used when compiling source code, the test classpath is available only for compiling and running tests, and the runtime classpath is used when the project is deployed and run. When Maven packages a project that bundles its dependencies, such as a webapp, it includes only the dependencies with compile or runtime scope, not those with test or provided scope.

Dependency scope defines which classpaths a dependency appears in. Any dependency that appears in the compile classpath appears in the test classpath as well. Maven provides 6 dependency scopes:
  • Compile: This is the default scope, used if none is specified. Compile dependencies are available in all classpaths of a project. Furthermore, those dependencies are propagated to dependent projects.
  • Provided: This is much like compile, but indicates you expect the JDK or a container to provide the dependency at runtime.
  • Runtime: This scope indicates that the dependency is not required for compilation, but is for execution. It is in the runtime and test classpaths, but not the compile classpath.
  • Test: This scope indicates that the dependency is not required for normal use of the application, and is only available for the test compilation and execution phases.
  • System: This scope is similar to Provided except that you have to provide the JAR which contains it explicitly. The artifact is always available and is not looked up in a repository.
  • Import (only available in Maven 2.0.9 or later): Too minor to mention.
We are not done yet. Let us torture your mind with this matrix provided by Maven:

            compile      provided   runtime    test
compile     compile(*)   -          runtime    -
provided    provided     -          provided   -
runtime     runtime      -          runtime    -
test        test         -          test       -

In the matrix above, the leftmost column is the scope of a dependency and the top row is the scope of its own dependencies. Each cell gives the final scope of the transitive dependency, and "-" means it is omitted. System and import scopes are not included, which means Maven does not resolve transitive dependencies for either of them.

Going back to the earlier example, we did not specify a scope when declaring the spring-core dependency, so it has compile scope. commons-logging has compile scope inside the spring-core pom file. Therefore, its final scope is compile. If spring-core had any dependency with test scope, it would be omitted.

This looks tough, but in real life you only need to remember some guidelines:

  • For any test library, use test scope
  • For any library needed only at execution time, such as an environment-specific implementation, use runtime
  • For any API that the JDK or the container supplies, use provided
  • For the rest, use compile
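
Put together, the guidelines could translate into a dependencies section like the sketch below. The servlet API and MySQL driver entries, including their versions, are illustrative assumptions and not taken from my project.

```xml
<dependencies>
    <!-- Test library: test scope -->
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.7</version>
        <scope>test</scope>
    </dependency>
    <!-- An API that the container supplies: provided scope -->
    <dependency>
        <groupId>javax.servlet</groupId>
        <artifactId>javax.servlet-api</artifactId>
        <version>3.1.0</version>
        <scope>provided</scope>
    </dependency>
    <!-- Environment-specific library needed only when running: runtime scope -->
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>5.1.30</version>
        <scope>runtime</scope>
    </dependency>
    <!-- Everything else: compile scope, which is the default -->
    <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-core</artifactId>
        <version>4.0.3.RELEASE</version>
    </dependency>
</dependencies>
```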

Resolving dependency version conflict

Resolving version conflicts is the main source of confusion in dependency management. I personally feel that Maven has not done very well in this part.

To summarize, when Maven finds a dependency identical to one it has already found while resolving, it omits the new one but may update the scope of the existing one. In the end, there is only one dependency for each unique groupId and artifactId in the dependency hierarchy.

To illustrate how Maven works, let us look at the dependency hierarchy generated by Eclipse for my project:


In this project, I include spring-core with compile scope and html-unit with test scope. html-unit has the transitive dependency commons-logging version 1.0.4, while spring-core has the same dependency with version 1.1.3.

When Maven resolves the dependencies, it notes that commons-logging is already among the resolved dependencies and chooses to omit version 1.1.3 even though it is the later version. Still, it updates the commons-logging dependency of html-unit to compile scope because this is the widest scope of the dependency.

As a result, I package an older version of commons-logging. If I swap the order of declaration between spring-core and html-unit, I have



So this time, Maven gives me commons-logging version 1.1.3 rather than 1.0.4.

If you do not know how Maven resolves dependencies, this will be an endless source of confusion. If you know it well, it is fairly easy. Please remember to use a tool to generate the dependency hierarchy and keep track of it.

To avoid this problem, clearly specify the version of the dependency in your pom file. Maven gives a directly declared dependency higher priority than a transitive one.


In this last example, I manually declare commons-logging with version 1.0.2, and it overrides the versions of all 3 occurrences of the transitive dependency.
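
A declaration along these lines, placed directly in the project's pom file, would produce the override described above. The version is the one used in my example; in a real project you would normally pick the newest version that all your libraries can work with.

```xml
<dependency>
    <groupId>commons-logging</groupId>
    <artifactId>commons-logging</artifactId>
    <!-- Declared directly, so this version wins over the transitive ones -->
    <version>1.0.2</version>
</dependency>
```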