Building Hadoop and HBase for HBase Maven application development
1 Introduction
HBase 0.90.3 requires Hadoop common from the 0.20-append branch in order to avoid data loss. More information about this can be found in the "Getting Started" section of the HBase guide. However, there is no official binary release of Hadoop common 0.20-append. In order to have consistent and correct bits on both your cluster and your development platform, you need to compile your own binary version of Hadoop common from the 0.20-append branch source, and then build your own version of HBase 0.90.3 against that Hadoop common binary.
This article provides an overview of building Hadoop and HBase for developing HBase applications that are managed using Maven.
2 Interpreting Maven terminology
A brief description of a few ambiguous terms is provided in this section to avoid potential confusion.
2.1 Maven repository vs repository manager
Maven repository refers to the local repository at ~/.m2/repository, whereas Maven repository manager refers to an artifact repository manager such as Apache Archiva or Artifactory.
2.2 Installing vs deploying artifacts
Installing an artifact means placing it in the local Maven repository, whereas deploying an artifact means publishing it to a Maven repository manager. For more information, please refer to the Maven reference documentation.
2.3 Installing artifacts vs binaries
Installing artifacts refers to installing them in the Maven repository, whereas installing binaries refers to installing the entire binary distribution on the cluster.
3 Prerequisites
You need the following components for this process.
- Oracle Java SE 6
- Oracle Java SE 5
- Apache Subversion
- Apache Ant
- Apache Forrest 0.8
- Apache Maven
- A Maven repository manager (Optional)
If you are using a Maven repository manager, then make sure that you configure the authentication settings for the repository manager in the ~/.m2/settings.xml file.
<settings>
  ...
  <servers>
    <server>
      <id>yourrepo.internal</id>
      <username>USER</username>
      <password>PASSWORD</password>
    </server>
  </servers>
  ...
</settings>
yourrepo.internal is the ID that you will refer to later from the Ant and Maven build configurations.
USER and PASSWORD are the username and password of an account with the deployment role in your Maven repository manager.
4 Building Hadoop common
4.1 Checkout Hadoop common
Check out Hadoop common from the 0.20-append branch.
$ svn co http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-append/ hadoop-common-0.20-append
4.2 Create build.properties
Hadoop uses Apache Ant as its build tool. In order to build Hadoop common, you need to create a hadoop-common-0.20-append/build.properties file that looks something like this.
resolvers=internal
version=0.20-append-r1057313-yourversion
project.version=${version}
hadoop.version=${version}
hadoop-core.version=${version}
hadoop-hdfs.version=${version}
hadoop-mapred.version=${version}
Note that at the time of writing, the latest revision available in the 0.20-append branch was r1057313.
Also, try to assign a meaningful suffix in place of yourversion so that you can distinguish between the official artifacts and the artifacts that you deploy.
4.3 OPTIONAL: Configure your repository manager
Please follow this step only if you are running a Maven repository manager for team collaboration and you want to deploy the Hadoop common artifacts to that repository manager.
Edit hadoop-common-0.20-append/build.xml and add the following two new targets.
<target name="mvn-deploy-internal" depends="mvn-taskdef, bin-package, set-version, simpledeploy-internal"
        description="To deploy hadoop core and test jars to an internal Maven repository"/>

<target name="simpledeploy-internal" unless="staging">
  <artifact:pom file="${hadoop-core.pom}" id="hadoop.core"/>
  <artifact:pom file="${hadoop-test.pom}" id="hadoop.test"/>
  <artifact:pom file="${hadoop-examples.pom}" id="hadoop.examples"/>
  <artifact:pom file="${hadoop-tools.pom}" id="hadoop.tools"/>
  <artifact:pom file="${hadoop-streaming.pom}" id="hadoop.streaming"/>
  <artifact:install-provider artifactId="wagon-http" version="${wagon-http.version}"/>
  <artifact:deploy file="${hadoop-core.jar}">
    <remoteRepository id="yourrepo.internal" url="http://yourreposerver.com:port/path"/>
    <pom refid="hadoop.core"/>
  </artifact:deploy>
  <artifact:deploy file="${hadoop-test.jar}">
    <remoteRepository id="yourrepo.internal" url="http://yourreposerver.com:port/path"/>
    <pom refid="hadoop.test"/>
  </artifact:deploy>
  <artifact:deploy file="${hadoop-examples.jar}">
    <remoteRepository id="yourrepo.internal" url="http://yourreposerver.com:port/path"/>
    <pom refid="hadoop.examples"/>
  </artifact:deploy>
  <artifact:deploy file="${hadoop-tools.jar}">
    <remoteRepository id="yourrepo.internal" url="http://yourreposerver.com:port/path"/>
    <pom refid="hadoop.tools"/>
  </artifact:deploy>
  <artifact:deploy file="${hadoop-streaming.jar}">
    <remoteRepository id="yourrepo.internal" url="http://yourreposerver.com:port/path"/>
    <pom refid="hadoop.streaming"/>
  </artifact:deploy>
</target>
Note that yourrepo.internal is the same ID that you configured authentication for in the ~/.m2/settings.xml file earlier.
4.4 Build and install/deploy Hadoop common artifacts
Now, build and install/deploy the Hadoop common artifacts using the Maven Ant tasks.
4.4.1 Install artifacts
If you do not have a repository manager and skipped the previous step, then use the mvn-install target and skip the "Deploy artifacts" section. Otherwise, jump directly to the "Deploy artifacts" section.
$ ant mvn-install
This target will generate Hadoop common artifacts and Maven POM files, and install them in your local Maven repository (~/.m2/repository).
4.4.2 Deploy artifacts
If you have an internal repository manager, you should deploy the artifacts to the repository manager that you specified in build.xml in the previous step. To achieve this, run the mvn-deploy-internal target.
$ ant mvn-deploy-internal
This target will generate the artifacts and Maven POM files, and publish them to the repository manager that you specified in build.xml.
4.5 Generate the binary tarball to install on the cluster
You need to generate a binary tarball to install on the cluster. This is achieved by running the tar target.
$ ant tar -Djava5.home=<Java 5 SE Home> -Dforrest.home=<Forrest 0.8 Home>
Please note that you need Java SE 5 and Apache Forrest 0.8 for this step. Substituting Java SE 6 or Apache Forrest 0.9 will result in a build failure.
This will generate the hadoop-common-0.20-append-r1057313-yourversion.tar.gz tarball in hadoop-common-0.20-append/build/.
4.6 Install Hadoop binaries on the cluster
Copy the tarball that was generated in the previous step to your cluster, and unpack it in the desired location.
This ensures that you have a consistent Hadoop installation because you are not mixing and matching artifacts from an official Hadoop common release with artifacts that you built.
5 Building HBase
5.1 Checkout HBase
Check out HBase from the 0.90.3 tag.
$ svn co http://svn.apache.org/repos/asf/hbase/tags/0.90.3 hbase-0.90.3
5.2 Modify HBase and Hadoop versions
Now, edit hbase-0.90.3/pom.xml and modify the HBase and Hadoop versions.
...
<groupId>org.apache.hbase</groupId>
<artifactId>hbase</artifactId>
<packaging>jar</packaging>
<version>0.90.3-yourversion</version>
...
<hadoop.version>0.20-append-r1057313-yourversion</hadoop.version>
...
Note that you should use the same Hadoop version string (including the revision number) that you assigned while building Hadoop common.
Also, try to assign a meaningful suffix in place of yourversion so that you can distinguish between the official artifacts and the artifacts that you deploy.
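To see why the version strings must match, it helps to look at how hbase-0.90.3/pom.xml consumes the hadoop.version property. The following is an illustrative sketch; the exact dependency declaration in your checkout may differ slightly.

```xml
<!-- Illustrative sketch: HBase declares its Hadoop dependency
     via the hadoop.version property set above -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>${hadoop.version}</version>
</dependency>
```

With hadoop.version set to 0.20-append-r1057313-yourversion, Maven resolves this dependency to the Hadoop common artifacts that you installed or deployed earlier, rather than to an official release.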
5.3 OPTIONAL: Specify the URL of your repository manager
If you are running an internal repository manager for team collaboration, it is time to specify its URL in hbase-0.90.3/pom.xml. Add the following section to it.
<project>
  ...
  <distributionManagement>
    <repository>
      <id>yourrepo.internal</id>
      <name>Your internal repository</name>
      <url>http://yourreposerver.com:port/path</url>
    </repository>
  </distributionManagement>
  ...
</project>
Note that yourrepo.internal is the same ID that you configured authentication for in the ~/.m2/settings.xml file earlier.
5.4 Build and install/deploy HBase artifacts
Now, build and install/deploy HBase artifacts using the Maven goals.
5.4.1 Install artifacts
If you do not have a repository manager and skipped the previous step, then use the install goal and skip the "Deploy artifacts" section. Otherwise, jump directly to the "Deploy artifacts" section.
$ mvn install
This goal will generate the HBase artifacts and Maven POM files, and install them in your local Maven repository (~/.m2/repository).
5.4.2 Deploy artifacts
If you have a repository manager, you should deploy the artifacts to the internal repository manager that you specified in pom.xml in the previous step. To achieve this, invoke the deploy goal.
$ mvn deploy
This goal will generate the artifacts and Maven POM files, and publish them to the internal repository manager that you specified in pom.xml.
5.5 Generate the binary tarball to install on the cluster
You need to generate a binary tarball to install on the cluster. This is achieved by invoking the assembly:single goal.
$ mvn assembly:single
This will generate the hbase-0.90.3/target/hbase-0.90.3-yourversion.tar.gz tarball.
5.6 Install HBase binaries on the cluster
Copy the tarball that was generated in the last step to your cluster and unpack it in the desired location.
This ensures that you have a consistent HBase installation with the right version of the Hadoop artifacts that you built. There is no need to replace any artifacts by hand because Maven automatically pulled in the right version of the artifacts that you built.
6 OPTIONAL: Using the HBase artifact in your HBase application
Now you can edit the pom.xml of your HBase application to use the version of HBase that you have built (0.90.3-yourversion).
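For example, assuming your application declares HBase as a regular Maven dependency, the relevant section of its pom.xml would look something like this (groupId and artifactId as published by the HBase build above):

```xml
<!-- Depend on the HBase artifact you built and installed/deployed -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase</artifactId>
  <version>0.90.3-yourversion</version>
</dependency>
```

If the artifact was deployed to an internal repository manager, your teammates can consume this dependency directly without having to build HBase themselves.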
7 Feedback
I have tried to provide as much information in this article as possible without overloading its scope. I have also taken basic care to ensure that the commands above are accurate. However, there might be some typos or copy-paste errors. If you find something that doesn't work for you, please let me know and I'll fix it.
8 Credits
- Thanks to Michael G. Noll for his blog post on building Hadoop.
- Thanks to Joe Pallas for his suggestions on this process and review of this article.
9 Disclaimer
This article is provided for informational purposes only, and I will not be liable for any errors, omissions, or delays in this information, or any losses, injuries, or damages arising from its use.