Today I would like to explain how I managed to compile and install Apache Oozie 4.0.1 against the latest stable Hadoop version, 2.4.1.

Prerequisites:

  • Hadoop 2.4.1 : installation explained in another post
  • Maven
  • Java 1.6+
  • Unix/Mac machine

Download Oozie

wget http://apache.hippo.nl/oozie/4.0.1/oozie-4.0.1.tar.gz
tar xzvf oozie-4.0.1.tar.gz
cd oozie-4.0.1

Building against Hadoop 2.4.1

By default Oozie builds against Hadoop 1.1.1, so to build against Hadoop 2.4.1 we have to change the Maven dependencies in pom.xml.

Change hadoop-2 maven profile

In the downloaded Oozie source code (pom.xml), the hadoop-2 Maven profile sets hadoop.version and hadoop.auth.version to 2.3.0. We change both to 2.4.1:

        <profile>
            <id>hadoop-2</id>
            <activation>
                <activeByDefault>false</activeByDefault>
            </activation>
            <properties>
               <hadoop.version>2.4.1</hadoop.version>
               <hadoop.auth.version>2.4.1</hadoop.auth.version>
               <pig.classifier>h2</pig.classifier>
               <sqoop.classifier>hadoop200</sqoop.classifier>
            </properties>
        </profile>

Change Hadooplibs maven module

The next step is to configure the hadooplibs Maven module to build the libs for version 2.4.1. To do that, we change the pom.xml in the hadoop-2, hadoop-distcp-2 and hadoop-test-2 Maven modules inside the hadooplibs module.

cd hadooplibs
File hadoop-2/pom.xml : change hadoop-client & hadoop-auth dependency version to 2.4.1
File hadoop-distcp-2/pom.xml: change hadoop-distcp version to 2.4.1
File hadoop-test-2/pom.xml: change hadoop-minicluster version to 2.4.1

Build Oozie distro

Use the hadoop-2 Maven profile to compile Oozie 4.0.1 against Hadoop 2.4.1:

cd ..
bin/mkdistro.sh -P hadoop-2 -DskipTests 
or 
mvn clean package assembly:single -P hadoop-2 -DskipTests 

Setup Oozie server

Copy the Oozie distro to a new directory:

cd ..
mkdir Oozie
cp -R oozie-4.0.1/distro/target/oozie-4.0.1-distro/oozie-4.0.1/. Oozie
cd Oozie
mkdir libext
cp -R ../oozie-4.0.1/hadooplibs/hadoop-2/target/hadooplibs/hadooplib-2.4.1.oozie-4.0.1/* libext
wget -P libext http://extjs.com/deploy/ext-2.2.zip

Prepare the Oozie war

./bin/oozie-setup.sh prepare-war

Create Sharelib Directory on HDFS

The following command issues an HDFS create-directory request to the NameNode running at hdfs://localhost:9000 and then copies the shared library into that directory.

./bin/oozie-setup.sh sharelib create -fs hdfs://localhost:9000 

* Make sure you select the right port number; otherwise you might get an error like "Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag". That happens when Oozie ends up talking to some other web service instead of HDFS.
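
One way to avoid guessing the port is to read the filesystem URI from the Hadoop configuration. On a working Hadoop 2 install, hdfs getconf -confKey fs.defaultFS prints it. The sketch below does roughly the same by reading core-site.xml directly. The function name namenode_uri is made up for this post, and it is not a real XML parser: it assumes each <name> and <value> element sits on a single line, as in the usual hand-written config.

```shell
# namenode_uri CORE_SITE_XML
# Hypothetical helper: prints the value of fs.defaultFS (or the older
# fs.default.name key) from a simply formatted core-site.xml.
namenode_uri() {
  awk '
    /<name>[ \t]*fs\.(defaultFS|default\.name)[ \t]*<\/name>/ { found = 1 }
    found && /<value>/ {
      # Strip everything around the text between <value> and </value>
      sub(/.*<value>[ \t]*/, "")
      sub(/[ \t]*<\/value>.*/, "")
      print
      exit
    }
  ' "$1"
}

# Example, run from the Hadoop directory:
# ./bin/oozie-setup.sh sharelib create -fs "$(namenode_uri etc/hadoop/core-site.xml)"
```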

Oozie Database

./bin/ooziedb.sh create -sqlfile oozie.sql -run

Configure Hadoop

Configure the Hadoop cluster with a proxyuser for the Oozie process. The following two properties are required in Hadoop's etc/hadoop/core-site.xml. If you are using a Hadoop version higher than 1.1.0, you can use wildcards as the property values. Replace “gaurav” with the username that will run Oozie.

<property>
<name>hadoop.proxyuser.gaurav.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.gaurav.groups</name>
<value>*</value>
</property>
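
Because the username is part of the property names, it is easy to mistype one of them. As a small convenience, the sketch below prints both elements for a given user so they can be pasted into core-site.xml. The function name proxyuser_properties is made up for this post, and it always uses the wildcard value shown above.

```shell
# proxyuser_properties USER
# Hypothetical helper: prints the hosts and groups proxyuser
# <property> elements for USER, each with a wildcard value.
proxyuser_properties() {
  user=$1
  for suffix in hosts groups; do
    printf '<property>\n<name>hadoop.proxyuser.%s.%s</name>\n<value>*</value>\n</property>\n' \
      "$user" "$suffix"
  done
}

# Example: print the properties for the user currently logged in
# proxyuser_properties "$(whoami)"
```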

Start Oozie

 ./bin/oozied.sh start

Oozie should now be accessible at http://localhost:11000/oozie
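
The server can take a little while to come up, so a check run immediately after the start command may fail. The Oozie client has an admin -status subcommand for checking the server, and the sketch below wraps any such check in a simple retry loop. The retry function is a generic helper written for this post, not part of Oozie, and the example assumes the client exits with a non-zero status while the server is unreachable.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or ATTEMPTS is
# used up, sleeping DELAY_SECONDS between tries. Returns 0 on success.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    # Only sleep if another attempt is coming
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Example, run from the Oozie directory:
# retry 10 3 bin/oozie admin -oozie http://localhost:11000/oozie -status
```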

Submit a Test Workflow

Now we will submit one of the example workflows that ship with Oozie: map-reduce. First we copy the examples directory from Oozie to your home directory on HDFS, and then we submit the Oozie job.

From Hadoop Directory: bin/hdfs dfs -put path-to-oozie-directory/examples examples 
From Oozie Directory: bin/oozie job -oozie http://localhost:11000/oozie/ -config examples/apps/map-reduce/job.properties  -run

You might need to change job.properties before you submit the workflow so that it uses the correct NameNode and JobTracker ports. If you are running YARN (MapReduce 2), jobTracker should point to the ResourceManager port instead.

nameNode=hdfs://localhost:9000
jobTracker=localhost:8032
queueName=default
examplesRoot=examples

oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce
outputDir=map-reduce
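
If you prefer not to edit the file by hand, the two values can be set from the shell. The sketch below is one way to do it; set_prop is a helper written for this post, and it only handles plain key=value lines like the ones above (no line continuations or spaces around the equals sign).

```shell
# set_prop FILE KEY VALUE
# Hypothetical helper: sets KEY=VALUE in a properties file, replacing
# an existing "KEY=..." line or appending one if the key is absent.
set_prop() {
  file=$1
  key=$2
  value=$3
  tmp=$(mktemp)
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { print k "=" v; seen = 1; next }
    { print }
    END { if (!seen) print k "=" v }
  ' "$file" > "$tmp"
  # Write back in place so the file keeps its original permissions
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example, run from the Oozie directory:
# set_prop examples/apps/map-reduce/job.properties nameNode hdfs://localhost:9000
# set_prop examples/apps/map-reduce/job.properties jobTracker localhost:8032
```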

Status and output of Workflow

Map-Reduce submitted in Oozie: http://localhost:11000/oozie/
[Screenshot: Map-reduce-running]

Status of Map-Reduce in Hadoop cluster
[Screenshot: Job-status-Hadoop-cluster]

Map-Reduce finished status in Oozie
[Screenshot: Map-reduce-succeeded]
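
The same status is available from the command line with the client's job -info subcommand, which needs the job ID. The sketch below pulls the ID out of the submit output, assuming the client reports it on a line of the form "job: <id>". The function name job_id_from_output is made up for this post.

```shell
# job_id_from_output
# Hypothetical helper: reads the output of "oozie job ... -run" on
# stdin and prints the ID from the first line shaped like "job: <id>".
job_id_from_output() {
  sed -n 's/^job:[ ]*//p' | head -n 1
}

# Example, run from the Oozie directory:
# id=$(bin/oozie job -oozie http://localhost:11000/oozie/ \
#        -config examples/apps/map-reduce/job.properties -run | job_id_from_output)
# bin/oozie job -oozie http://localhost:11000/oozie/ -info "$id"
```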

That’s It

So we have successfully configured Oozie 4.0.1 with Hadoop 2.4.1 and were also able to submit a job. In the next post we will talk about other aspects of Oozie, such as sub-workflows and how to link workflows or make them depend on each other.

Possible Issues

Java heap space or PermGen space

While running the Maven command to compile, you might face either a PermGen space error or an OutOfMemoryError: Java heap space. In that case you need to increase the memory allocated to the Maven process:

export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=128m"

Hadoop History server

The Oozie server needs to talk to the Hadoop History Server to learn the previous state of jobs, so the history server must be kept running while Oozie is in use. If it is not running, you get an error when you try to run a workflow. Start it with:

sbin/mr-jobhistory-daemon.sh start historyserver

Error related to impersonation

RemoteException: User: oozie is not allowed to impersonate oozie. This happens when the hadoop.proxyuser.oozie.hosts and hadoop.proxyuser.oozie.groups properties are not configured properly in Hadoop. Make sure you use wildcards only if your Hadoop version is 1.1.0 or higher.

InvalidProtocolBufferException

Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.;

This happens when you have compiled Oozie with a Protobuf library that is incompatible with the one used by Hadoop. In my case I compiled Oozie 4.0.1 with Protobuf 2.5.0 to work with Hadoop 2.4.1.
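
A quick way to spot a mismatch is to compare the Protobuf jars that each side ships. The sketch below lists them by file name. The function name protobuf_jars is made up for this post, and it assumes the jars follow the usual protobuf-java-<version>.jar naming. The HADOOP_HOME variable in the example is also an assumption; point it at your own Hadoop directory.

```shell
# protobuf_jars DIR
# Hypothetical helper: prints the distinct protobuf-java jar file
# names found anywhere under DIR, one per line.
protobuf_jars() {
  find "$1" -name 'protobuf-java-*.jar' -exec basename {} \; | sort -u
}

# Example, run from the Oozie directory; the two lists should agree:
# protobuf_jars libext
# protobuf_jars "$HADOOP_HOME/share/hadoop"
```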