Today I would like to explain how I managed to compile and install Apache Oozie 4.0.1 against the latest stable Hadoop version, 2.4.1.
Prerequisites:
- Hadoop 2.4.1 : installation explained in another post
- Maven
- Java 1.6+
- Unix/Mac machine
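Before starting, it can help to confirm that the tools are actually on the PATH. The following is a minimal sketch; the helper name missing_tools and the exact tool list are my own assumptions, not part of Oozie or Hadoop.

```shell
# Hypothetical helper: print each named tool that is NOT found on the PATH.
# Prints nothing when every tool is available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example call (tool list is an assumption based on the prerequisites above):
#   missing_tools java mvn hadoop wget tar
```

An empty result means every listed tool was found; anything printed needs to be installed first.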
Download Oozie
wget http://apache.hippo.nl/oozie/4.0.1/oozie-4.0.1.tar.gz
tar xzvf oozie-4.0.1.tar.gz
cd oozie-4.0.1
Building against Hadoop 2.4.1
By default Oozie builds against Hadoop 1.1.1, so to build against Hadoop 2.4.1 we have to adjust the Maven dependencies in pom.xml.
Change hadoop-2 maven profile
In the downloaded Oozie source code (pom.xml), the hadoop-2 Maven profile sets hadoop.version and hadoop.auth.version to 2.3.0, so we change both to 2.4.1:
<profile>
<id>hadoop-2</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<properties>
<hadoop.version>2.4.1</hadoop.version>
<hadoop.auth.version>2.4.1</hadoop.auth.version>
<pig.classifier>h2</pig.classifier>
<sqoop.classifier>hadoop200</sqoop.classifier>
</properties>
</profile>
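If you prefer to script this edit instead of doing it by hand, something like the following sketch could work. The function name set_hadoop2_profile_version is hypothetical, and the sketch assumes each version element sits on its own line inside the hadoop-2 profile, as in the snippet above. It only touches lines between the hadoop-2 profile id and the next closing profile tag, so the default Hadoop 1.1.1 setting is left alone.

```shell
# Hypothetical helper: set hadoop.version and hadoop.auth.version, but only
# between <id>hadoop-2</id> and the next </profile>; other profiles are untouched.
set_hadoop2_profile_version() {
  pom="$1"
  version="$2"
  tmp="$(mktemp)" || return 1
  sed \
    -e '/<id>hadoop-2<\/id>/,/<\/profile>/ s|<hadoop\.version>[^<]*</hadoop\.version>|<hadoop.version>'"$version"'</hadoop.version>|' \
    -e '/<id>hadoop-2<\/id>/,/<\/profile>/ s|<hadoop\.auth\.version>[^<]*</hadoop\.auth\.version>|<hadoop.auth.version>'"$version"'</hadoop.auth.version>|' \
    "$pom" > "$tmp" && cat "$tmp" > "$pom"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Example call, from the oozie-4.0.1 directory:
#   set_hadoop2_profile_version pom.xml 2.4.1
```

Writing to a temporary file and copying it back avoids sed -i, whose syntax differs between GNU and BSD (Mac) sed.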
Change Hadooplibs maven module
The next step is to configure the hadooplibs Maven module to build libraries for version 2.4.1. To do that, change the pom.xml in the hadoop-2, hadoop-distcp-2 and hadoop-test-2 modules inside hadooplibs:
cd hadooplibs
File hadoop-2/pom.xml : change hadoop-client & hadoop-auth dependency version to 2.4.1
File hadoop-distcp-2/pom.xml: change hadoop-distcp version to 2.4.1
File hadoop-test-2/pom.xml: change hadoop-minicluster version to 2.4.1
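After editing the three files, a quick search can confirm that no old version string is left behind. This is only a sketch: the function name stale_hadooplibs_versions is hypothetical, and it assumes the old version in these files is 2.3.0, the same value the hadoop-2 profile had.

```shell
# Hypothetical helper: list every line in the three hadooplibs module POMs
# that still mentions the old Hadoop version. Prints nothing once all of
# them have been updated.
stale_hadooplibs_versions() {
  hadooplibs_dir="$1"
  old_version="$2"
  for module in hadoop-2 hadoop-distcp-2 hadoop-test-2; do
    pom="$hadooplibs_dir/$module/pom.xml"
    # -F treats the version as a fixed string, so the dots are literal.
    [ -f "$pom" ] && grep -nF "$old_version" "$pom" | sed "s|^|$module/pom.xml:|"
  done
  return 0
}

# Example call, from inside the hadooplibs directory:
#   stale_hadooplibs_versions . 2.3.0
```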
Build Oozie distro
Use Maven profile hadoop-2 to compile Oozie 4.0.1 against Hadoop 2.4.1
cd ..
bin/mkdistro.sh -P hadoop-2 -DskipTests
or
mvn clean package assembly:single -P hadoop-2 -DskipTests
Setup Oozie server
Copy the Oozie distro to a new directory
cd ..
mkdir Oozie
cp -R oozie-4.0.1/distro/target/oozie-4.0.1-distro/oozie-4.0.1/. Oozie
cd Oozie
mkdir libext
cp -R ../oozie-4.0.1/hadooplibs/hadoop-2/target/hadooplibs/hadooplib-2.4.1.oozie-4.0.1/* libext
wget -P libext http://extjs.com/deploy/ext-2.2.zip
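Before preparing the war, it may be worth checking that libext really contains what the setup script needs. The sketch below uses a hypothetical function name, check_libext, and assumes the copied Hadoop jars follow the hadoop-*.jar naming pattern.

```shell
# Hypothetical helper: verify that libext holds at least one Hadoop jar
# (assumed to be named hadoop-*.jar) and the ExtJS archive.
check_libext() {
  libext_dir="$1"
  status=0
  ls "$libext_dir"/hadoop-*.jar >/dev/null 2>&1 || {
    echo "no hadoop-*.jar found in $libext_dir"
    status=1
  }
  [ -f "$libext_dir/ext-2.2.zip" ] || {
    echo "ext-2.2.zip missing from $libext_dir"
    status=1
  }
  return "$status"
}

# Example call, from the Oozie directory:
#   check_libext libext
```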
Prepare the Oozie war
./bin/oozie-setup.sh prepare-war
Create Sharelib Directory on HDFS
The following command issues an HDFS create-directory command to the NameNode running at hdfs://localhost:9000 and then copies the shared library to that directory.
./bin/oozie-setup.sh sharelib create -fs hdfs://localhost:9000
Make sure you select the right port number, otherwise you might get an error like: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag. This happens when Oozie tries to talk to some other web service instead of HDFS.
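One way to find the right address is to read it from Hadoop's own configuration. In Hadoop 2 the NameNode URI is stored under fs.defaultFS in core-site.xml. The sketch below is a rough text-based lookup, not a real XML parser; the function name get_hadoop_property is hypothetical, and it assumes each name and value element fits on a single line.

```shell
# Hypothetical helper: print the <value> that follows a given <name> in a
# Hadoop *-site.xml file. Assumes each <name> and <value> element fits on
# one line; this is a simple text scan, not an XML parser.
get_hadoop_property() {
  file="$1"
  key="$2"
  awk -v key="$key" '
    index($0, "<name>" key "</name>") { found = 1 }
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
  ' "$file"
}

# Example call, from the Hadoop directory:
#   get_hadoop_property etc/hadoop/core-site.xml fs.defaultFS
# On a working installation, "hdfs getconf -confKey fs.defaultFS" reports the same setting.
```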
Oozie Database
./bin/ooziedb.sh create -sqlfile oozie.sql -run
Configure Hadoop
Configure the Hadoop cluster with a proxyuser for the Oozie process. The following two properties are required in Hadoop's etc/hadoop/core-site.xml. If you are using a Hadoop version higher than 1.1.0, you can use wildcards for the values. Replace “gaurav” with the username that will run Oozie.
<property>
<name>hadoop.proxyuser.gaurav.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.gaurav.groups</name>
<value>*</value>
</property>
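To avoid typos in the property names, the two entries can be generated for whichever user runs Oozie. This is just a sketch with a hypothetical function name, print_proxyuser_properties; it prints the snippet, and you still paste it into core-site.xml yourself.

```shell
# Hypothetical helper: print the two proxyuser properties for a given user.
# Defaults to the current user when no name is passed.
print_proxyuser_properties() {
  user="${1:-$(whoami)}"
  cat <<EOF
<property>
<name>hadoop.proxyuser.${user}.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.${user}.groups</name>
<value>*</value>
</property>
EOF
}

# Example call:
#   print_proxyuser_properties gaurav
```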
Start Oozie
./bin/oozied.sh start
Oozie should now be accessible at http://localhost:11000/oozie
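Besides opening the URL in a browser, the server can be checked from the command line with the Oozie client's admin -status option, which reports the system mode. The wrapper below is a sketch: the function name oozie_is_normal and the OOZIE_BIN variable are my own additions, and it assumes a healthy server reports "System mode: NORMAL".

```shell
# Path to the Oozie client; override OOZIE_BIN if it lives elsewhere.
OOZIE_BIN="${OOZIE_BIN:-./bin/oozie}"

# Hypothetical helper: succeed only if the server reports NORMAL mode.
oozie_is_normal() {
  oozie_url="${1:-http://localhost:11000/oozie}"
  "$OOZIE_BIN" admin -oozie "$oozie_url" -status 2>/dev/null \
    | grep -q 'System mode: NORMAL'
}

# Example call, from the Oozie directory:
#   oozie_is_normal && echo "Oozie is up"
```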
Submit a Test Workflow
Now we will submit one of the example workflows shipped with Oozie: map-reduce. First copy the examples directory from Oozie to your home directory on HDFS, then submit the Oozie job.
From Hadoop Directory: bin/hdfs dfs -put path-to-oozie-directory/examples examples
From Oozie Directory: bin/oozie job -oozie http://localhost:11000/oozie/ -config examples/apps/map-reduce/job.properties -run
You might need to edit job.properties before you submit the workflow so that it uses the correct NameNode and JobTracker ports. If you are running YARN (MapReduce 2), jobTracker must point to the ResourceManager port.
nameNode=hdfs://localhost:9000
jobTracker=localhost:8032
queueName=default
examplesRoot=examples
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce
outputDir=map-reduce
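With job.properties in place, submission and a follow-up status check can be wrapped in two small functions. This is a sketch under a few assumptions: the function names and the OOZIE_BIN and OOZIE_URL variables are hypothetical, and the client is assumed to print the new job's ID on a line of the form "job: <id>" after -run.

```shell
# Client path and server URL; override either variable if yours differ.
OOZIE_BIN="${OOZIE_BIN:-./bin/oozie}"
OOZIE_URL="${OOZIE_URL:-http://localhost:11000/oozie}"

# Hypothetical helper: submit a workflow and print only the job ID taken
# from the "job: <id>" line of the client's output.
submit_workflow() {
  "$OOZIE_BIN" job -oozie "$OOZIE_URL" -config "$1" -run | sed -n 's/^job: //p'
}

# Hypothetical helper: print the job's details, including its status.
workflow_info() {
  "$OOZIE_BIN" job -oozie "$OOZIE_URL" -info "$1"
}

# Example calls, from the Oozie directory:
#   job_id="$(submit_workflow examples/apps/map-reduce/job.properties)"
#   workflow_info "$job_id"
```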
Status and output of Workflow
[Screenshot: Map-Reduce submitted in Oozie, http://localhost:11000/oozie/]
[Screenshot: Status of Map-Reduce in Hadoop cluster]
[Screenshot: Map-Reduce finished status in Oozie]
That’s It
So we have successfully configured Oozie 4.0.1 with Hadoop 2.4.1 and were also able to submit a job. In the next post we will talk about other aspects of Oozie, like sub-workflows and how we can link workflows or make them depend on each other.
Possible Issues
Java heap space or PermGen space
While running the Maven command to compile, you might hit either a PermGen space error or an OutOfMemoryError: Java heap space. In that case you need to increase the memory allocated to the Maven process:
export MAVEN_OPTS='-Xmx1024m -XX:MaxPermSize=128m'
Hadoop History server
The Oozie server needs to talk to the Hadoop History Server to learn the previous state of jobs, so the History Server must be running while Oozie runs. If it is not, you get an error when you try to run a workflow. Start it from the Hadoop directory:
sbin/mr-jobhistory-daemon.sh start historyserver
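To see whether the History Server is already up, you can look for its process in the output of the JDK's jps tool, where it is listed as JobHistoryServer. The function name history_server_running and the JPS_BIN variable below are hypothetical.

```shell
# jps ships with the JDK; override JPS_BIN if it is not on the PATH.
JPS_BIN="${JPS_BIN:-jps}"

# Hypothetical helper: succeed if a JobHistoryServer process is listed.
history_server_running() {
  "$JPS_BIN" 2>/dev/null | grep -q 'JobHistoryServer'
}

# Example call, from the Hadoop directory:
#   history_server_running || sbin/mr-jobhistory-daemon.sh start historyserver
```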
Error related to impersonation
RemoteException: User: oozie is not allowed to impersonate oozie. This happens when the hadoop.proxyuser.oozie.hosts and hadoop.proxyuser.oozie.groups properties are not configured properly in Hadoop (replace oozie with the user that runs Oozie). Use wildcards only if Hadoop is version 1.1.0 or later.
InvalidProtocolBufferException
Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.;
This happens when Oozie has been compiled with a Protobuf library that is incompatible with the one used by Hadoop. In my case I compiled Oozie 4.0.1 with Protobuf 2.5.0 to work with Hadoop 2.4.1. Connecting to the wrong port, as noted in the sharelib step, produces the same message.
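To compare the two sides, you can list the Protobuf jars bundled with each installation. The sketch below assumes the jars follow the usual protobuf-java-<version>.jar naming; the function name protobuf_jars is hypothetical.

```shell
# Hypothetical helper: print the distinct protobuf-java jar file names
# found anywhere under the given directory.
protobuf_jars() {
  find "$1" -name 'protobuf-java-*.jar' -exec basename {} \; | sort -u
}

# Example calls (both paths are placeholders for your own installations):
#   protobuf_jars path-to-hadoop-directory
#   protobuf_jars path-to-oozie-directory
```

If the two listings show different versions, that mismatch is a likely cause of the error.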