
Building Hadoop and HBase for HBase Maven application development

1   Introduction

HBase 0.90.3 needs Hadoop common from the 0.20-append branch in order to not lose data. More information about this can be found in the "Getting Started" section of the HBase guide. However, there is no official binary release of Hadoop common 0.20-append. In order to have consistent and correct bits on your cluster and your development platform, you need to compile your own binary version of Hadoop common from the 0.20-append branch source, and then build your own version of HBase 0.90.3 against that Hadoop common binary.

This article provides an overview of building Hadoop and HBase for developing HBase applications that are managed using Maven.

2   Interpreting Maven terminology

This section briefly describes a few ambiguous terms to avoid potential confusion.

2.1   Maven repository vs repository manager

The Maven repository refers to your local repository at ~/.m2/repository, whereas a Maven repository manager refers to an artifact repository manager such as Apache Archiva or Artifactory.

2.2   Installing vs deploying artifacts

Installing an artifact means copying it into your local Maven repository, whereas deploying an artifact means publishing it to a Maven repository manager. For more information, please refer to the Maven reference.
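
The distinction corresponds to two standard Maven commands, both of which appear later in this article:

$ mvn install   # copies the built artifacts into the local Maven repository (~/.m2/repository)
$ mvn deploy    # publishes the built artifacts to a Maven repository manager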

2.3   Installing artifacts vs binaries

Installing artifacts refers to installing them in the Maven repository, whereas installing binaries refers to installing the entire binary distribution on the cluster.

3   Prerequisites

You need the following components for this process: Subversion (to check out the source), Apache Ant (to build Hadoop common), Apache Maven (to build HBase), and Java SE 5 together with Apache Forrest 0.8 (to generate the Hadoop binary tarball).

If you are using a Maven repository manager, make sure that you configure the authentication settings for the repository manager in the ~/.m2/settings.xml file.

<settings>
  ...
  <servers>
    <server>
      <id>yourrepo.internal</id>
      <username>USER</username>
      <password>PASSWORD</password>
    </server>
  </servers>
  ...
</settings>

yourrepo.internal is the ID that you will refer to later from the Ant and Maven build configurations.

USER and PASSWORD are the username and password of an account with the deployment role in your Maven repository manager.

4   Building Hadoop common

4.1   Checkout Hadoop common

Check out Hadoop common from the 0.20-append branch.

$ svn co http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-append/ hadoop-common-0.20-append

4.2   Create build.properties

Hadoop uses Apache Ant as its build tool. In order to build Hadoop common, you need to create a hadoop-common-0.20-append/build.properties file that looks something like this.

resolvers=internal
version=0.20-append-r1057313-yourversion
project.version=${version}
hadoop.version=${version}
hadoop-core.version=${version}
hadoop-hdfs.version=${version}
hadoop-mapred.version=${version}

Note that at the time of writing, the latest revision available in the 0.20-append branch was r1057313.

Also, assign a meaningful suffix in place of yourversion so that you can distinguish between the official artifacts and the artifacts that you deploy.

4.3   OPTIONAL: Configure your repository manager

Please follow this step only if you are running a Maven repository manager for team collaboration and you want to deploy the Hadoop common artifacts to that repository manager.

Edit hadoop-common-0.20-append/build.xml and add two new targets.

<target name="mvn-deploy-internal" depends="mvn-taskdef, bin-package, set-version, simpledeploy-internal"
   description="To deploy hadoop core and test jar's to apache maven repository"/>

<target name="simpledeploy-internal" unless="staging">
   <artifact:pom file="${hadoop-core.pom}" id="hadoop.core"/>
   <artifact:pom file="${hadoop-test.pom}" id="hadoop.test"/>
   <artifact:pom file="${hadoop-examples.pom}" id="hadoop.examples"/>
   <artifact:pom file="${hadoop-tools.pom}" id="hadoop.tools"/>
   <artifact:pom file="${hadoop-streaming.pom}" id="hadoop.streaming"/>

   <artifact:install-provider artifactId="wagon-http" version="${wagon-http.version}"/>
   <artifact:deploy file="${hadoop-core.jar}">
       <remoteRepository id="yourrepo.internal" url="http://yourreposerver.com:port/path"/>
       <pom refid="hadoop.core"/>
   </artifact:deploy>
   <artifact:deploy file="${hadoop-test.jar}">
       <remoteRepository id="yourrepo.internal" url="http://yourreposerver.com:port/path"/>
       <pom refid="hadoop.test"/>
   </artifact:deploy>
   <artifact:deploy file="${hadoop-examples.jar}">
       <remoteRepository id="yourrepo.internal" url="http://yourreposerver.com:port/path"/>
       <pom refid="hadoop.examples"/>
   </artifact:deploy>
   <artifact:deploy file="${hadoop-tools.jar}">
       <remoteRepository id="yourrepo.internal" url="http://yourreposerver.com:port/path"/>
       <pom refid="hadoop.tools"/>
   </artifact:deploy>
   <artifact:deploy file="${hadoop-streaming.jar}">
       <remoteRepository id="yourrepo.internal" url="http://yourreposerver.com:port/path"/>
       <pom refid="hadoop.streaming"/>
   </artifact:deploy>
</target>

Note that yourrepo.internal is the same ID that you configured authentication for in the ~/.m2/settings.xml file earlier.

4.4   Build and install/deploy Hadoop common artifacts

Now, build and install/deploy the Hadoop common artifacts using the Maven Ant Tasks.

4.4.1   Install artifacts

If you do not have a repository manager and skipped the previous step, use the mvn-install target and skip the "Deploy artifacts" section. Otherwise, jump directly to the "Deploy artifacts" section.

$ ant mvn-install

This target will generate Hadoop common artifacts and Maven POM files, and install them in your local Maven repository (~/.m2/repository).
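
To verify the install, you can list the hadoop-core directory in your local Maven repository; the org/apache/hadoop/hadoop-core path below assumes the standard coordinates of the Hadoop 0.20-era artifacts, and the listing should contain something like the following.

$ ls ~/.m2/repository/org/apache/hadoop/hadoop-core/0.20-append-r1057313-yourversion/
hadoop-core-0.20-append-r1057313-yourversion.jar
hadoop-core-0.20-append-r1057313-yourversion.pom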

4.4.2   Deploy artifacts

If you have an internal repository manager, you should deploy the artifacts to the repository manager that you specified in build.xml in the previous step. To achieve this, run the mvn-deploy-internal target.

$ ant mvn-deploy-internal

This target will generate the artifacts and Maven POM files, and publish them to the repository manager that you specified in build.xml.

4.5   Generate the binary tarball to install on the cluster

You need to generate a binary tarball to install on the cluster. This is achieved by running the tar target.

$ ant tar -Djava5.home=<Java 5 SE Home> -Dforrest.home=<Forrest 0.8 Home>

Please note that you need Java SE 5 and Apache Forrest 0.8 for this step. Substituting Java SE 6 or Apache Forrest 0.9 will result in a build failure.
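
For example, with hypothetical install locations for Java 5 and Forrest (adjust the paths for your machine):

$ ant tar -Djava5.home=/usr/lib/jvm/java-1.5.0 -Dforrest.home=/opt/apache-forrest-0.8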

This will generate the hadoop-common-0.20-append-r1057313-yourversion.tar.gz tarball in hadoop-common-0.20-append/build/.

4.6   Install Hadoop binaries on the cluster

Copy the tarball that was generated in the previous step to your cluster, and unpack it in the desired location.
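
For example, a minimal sketch assuming a hypothetical node name (cluster-node) and install prefix (/opt); repeat for every node in the cluster:

$ scp build/hadoop-common-0.20-append-r1057313-yourversion.tar.gz user@cluster-node:/opt/
$ ssh user@cluster-node 'tar xzf /opt/hadoop-common-0.20-append-r1057313-yourversion.tar.gz -C /opt/'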

This ensures that you have a consistent Hadoop installation because you are not mixing and matching artifacts from an official Hadoop common release with artifacts that you built.

5   Building HBase

5.1   Checkout HBase

Check out HBase from the 0.90.3 tag.

$ svn co http://svn.apache.org/repos/asf/hbase/tags/0.90.3 hbase-0.90.3

5.2   Modify HBase and Hadoop versions

Now, edit hbase-0.90.3/pom.xml and modify the HBase and Hadoop versions.

...
<groupId>org.apache.hbase</groupId>
<artifactId>hbase</artifactId>
<packaging>jar</packaging>
<version>0.90.3-yourversion</version>
...
 <hadoop.version>0.20-append-r1057313-yourversion</hadoop.version>
...

Note that you should use the same version string for Hadoop that you assigned while building Hadoop.

As before, assign a meaningful suffix in place of yourversion so that you can distinguish between the official artifacts and the artifacts that you deploy.

5.3   OPTIONAL: Specify the URL of your repository manager

If you are running an internal repository manager for team collaboration, now is the time to specify its URL in hbase-0.90.3/pom.xml. Add the following section to it.

<project>
  ...
  <distributionManagement>
    <repository>
      <id>yourrepo.internal</id>
      <name>Your internal repository</name>
      <url>http://yourreposerver.com:port/path</url>
    </repository>
  </distributionManagement>
  ...
</project>

Note that yourrepo.internal is the same ID that you configured authentication for in the ~/.m2/settings.xml file earlier.

5.4   Build and install/deploy HBase artifacts

Now, build and install/deploy HBase artifacts using the Maven goals.

5.4.1   Install artifacts

If you do not have a repository manager and skipped the previous step, use the install goal and skip the "Deploy artifacts" section. Otherwise, jump directly to the "Deploy artifacts" section.

$ mvn install

This goal will generate the HBase artifacts and Maven POM files, and install them in your local Maven repository (~/.m2/repository).

5.4.2   Deploy artifacts

If you have a repository manager, you should deploy the artifacts to the internal server that you specified in pom.xml in the previous step. To achieve this, invoke the deploy goal.

$ mvn deploy

This goal will generate the artifacts and Maven POM files, and publish them to the internal repository manager that you specified in pom.xml.

5.5   Generate the binary tarball to install on the cluster

You need to generate a binary tarball to install on the cluster. This is achieved by invoking the assembly:single goal.

$ mvn assembly:single

This will generate the hbase-0.90.3/target/hbase-0.90.3-yourversion.tar.gz tarball.

5.6   Install HBase binaries on the cluster

Copy the tarball that was generated in the last step to your cluster and unpack it in the desired location.

This ensures that you have a consistent HBase installation with the right version of the Hadoop artifacts that you built. There is no need to replace any artifact by hand because Maven automatically pulled in the right version of the artifacts that you built.

6   OPTIONAL: Using the HBase artifact in your HBase application

Now you can edit the pom.xml of your HBase application to use the version of HBase that you have built (0.90.3-yourversion).
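
For example, the dependency entry would look something like this (the group and artifact IDs come from the HBase pom.xml edited earlier):

<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase</artifactId>
  <version>0.90.3-yourversion</version>
</dependency>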

7   Feedback

I have tried to provide as much information as possible without going beyond the scope of the article. I have also taken basic care to ensure that the commands mentioned above are accurate. However, there might be some typos or copy-paste errors. If you find something that doesn't work for you, please let me know and I'll fix it.

8   Credits

9   Disclaimer

This article is provided for informational purposes only, and I will not be liable for any errors, omissions, or delays in this information or any losses, injuries, or damages arising from its use.

