Gaurav Kohli

Category Archives: Hadoop

Building Native Hadoop (v 2.5.1) libraries for OS X

Monday, 15 Dec 2014

Posted by Gaurav in Big Data, Hadoop

≈ 3 Comments

Tags

hadoop 2.5.1, native libraries

In one of my earlier blog posts I explained how to build the native Hadoop (2.4.1) libraries on OS X. In the meantime Hadoop 2.5.1 was released, so I was curious whether the source code had been patched so that building the libraries on OS X works out of the box. To my surprise, it still doesn't.

So in this post I won't go into much detail; for that, you can check the earlier post.

Issues faced on Building Native libraries On Mac OS X

1. Problem in the hadoop-hdfs Maven module

error:

 [exec] /Users/gaurav/GitHub/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test/vecsum.c:61:23: error: use of undeclared identifier 'CLOCK_MONOTONIC'
 [exec]     if (clock_gettime(CLOCK_MONOTONIC, &watch->start)) {
 [exec]                       ^
 [exec] /Users/gaurav/GitHub/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test/vecsum.c:79:23: error: use of undeclared identifier 'CLOCK_MONOTONIC'
 [exec]     if (clock_gettime(CLOCK_MONOTONIC, &watch->stop)) {
 [exec]                       ^ 

Solution: Download the patch HDFS-6534.v2.patch from Jira issue HDFS-6534, then apply it and rebuild:

  • git apply HDFS-6534.v2.patch
  • mvn package -Pdist,native -DskipTests -Dtar

2. Problems in the hadoop-yarn-server-nodemanager Maven module

error:

 [exec] /Users/gaurav/GitHub/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:501:33: error: use of undeclared identifier 'LOGIN_NAME_MAX'
 [exec]       if (strncmp(*users, user, LOGIN_NAME_MAX) == 0) {
 [exec]                                 ^   
 [exec] /Users/gaurav/GitHub/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c:1266:48: error: too many arguments to function call, expected 4, have 5
 [exec]     if (mount("none", mount_path, "cgroup", 0, controller) == 0) {
 [exec]         ~~~~~                                  ^~~~~~~~~~
 [exec] /usr/include/sys/mount.h:384:1: note: 'mount' declared here
 [exec] int     mount(const char *, const char *, int, void *);
 [exec] ^

Solution: Download the patch YARN-2161.v1.patch from Jira issue YARN-2161, then apply it and rebuild:

  • git apply YARN-2161.v1.patch
  • mvn package -Pdist,native -DskipTests -Dtar
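
Putting the two fixes together, the whole 2.5.1 build condenses to the sketch below. It assumes the same setup as in my 2.4.1 post (cmake and zlib installed via Homebrew) and that both patch files have already been downloaded from their Jira issues into the source directory:

 git clone https://github.com/apache/hadoop.git
 cd hadoop
 git checkout branch-2.5.1
 git apply HDFS-6534.v2.patch
 git apply YARN-2161.v1.patch
 mvn package -Pdist,native -DskipTests -Dtar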

Result

The hadoop-dist/target/hadoop-2.5.1/lib/native folder should now contain the native libraries. Copy them to the hadoop-2.5.1/lib/native folder of your installation and restart the Hadoop cluster.
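
To verify the libraries are actually picked up, Hadoop ships a checknative command that reports which native components were loaded; a quick check, assuming HADOOP_HOME points at your 2.5.1 installation:

 $HADOOP_HOME/bin/hadoop checknative -a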


Building Native Hadoop (v 2.4.1) libraries for OS X

Sunday, 28 Sep 2014

Posted by Gaurav in Big Data, Hadoop

≈ 6 Comments

Tags

hadoop 2.4.1, java 1.7, mac os, native libraries

If you are reading this post, I assume that you already have Hadoop (v2.4.1) installed on your OS X machine and that you are a bit annoyed by the following error message:

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable

If you only plan to use Hadoop on OS X for development, this warning should not bother you. That was my case too, but the message annoyed me enough that I wanted to try building the native libraries from the source code.

Steps to build Native Hadoop libraries

  1. Download the source from GitHub
  • git clone git@github.com:apache/hadoop.git
  • git checkout branch-2.4.1
  2. Install dependencies: cmake and zlib, using the Homebrew package manager (see the toolchain check after this list)
  • brew install cmake
  • brew install zlib
  3. Run the Maven command
  • mvn package -Pdist,native -DskipTests -Dtar
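
Before kicking off the build, it is worth checking the toolchain versions; besides cmake and zlib, Hadoop 2.x's native build also expects the Protobuf 2.5.0 compiler (assumed here to be installed separately, e.g. via Homebrew):

 cmake --version
 protoc --version   # should report libprotoc 2.5.0
 mvn -version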

On Linux machines the above procedure should be enough, but not on Mac OS X with Java 1.7. There you have to make a few more changes.

Issues faced on Building Native libraries On Mac OS X

1. Missing tools.jar

If you are building with Java 1.7, you will see an error about a missing tools.jar, which is caused by a bug in the Maven JSPC plugin; the related Jira issue is HADOOP-9350. The plugin expects classes.jar in the ../Classes folder, so we create a symlink.

error:

Exception in thread "main" java.lang.AssertionError: Missing tools.jar at: /Library/Java/JavaVirtualMachines/jdk1.7.0_17.jdk/Contents/Home/Classes/classes.jar. Expression: file.exists()

Solution: Create a symbolic link to trick Java into believing that classes.jar is the same as tools.jar

  • sudo mkdir $(/usr/libexec/java_home)/Classes
  • sudo ln -s $(/usr/libexec/java_home)/lib/tools.jar $(/usr/libexec/java_home)/Classes/classes.jar
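
A quick way to confirm the link resolves before re-running the build (note that /usr/libexec/java_home is a program that prints the active JDK home, which is why the commands above wrap it in $(...) command substitution):

 ls -l $(/usr/libexec/java_home)/Classes/classes.jar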

2. Incompatible source code

Some code in Hadoop v2.4.1 is not compatible with Mac systems, so we need to apply the patch HADOOP-9648.v2.patch; the related Jira issue is HADOOP-10699.

error:

     [exec] /Users/gaurav/GitHub/hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c:77:26: error: invalid operands to binary expression ('void' and 'int')
     [exec]   if(setnetgrent(cgroup) == 1) {
     [exec]      ~~~~~~~~~~~~~~~~~~~ ^  ~
     [exec] 1 error generated.

Solution: Download the patch HADOOP-9648.v2.patch from Jira issue HADOOP-10699, then apply it and rebuild:

  • git apply HADOOP-9648.v2.patch
  • mvn package -Pdist,native -DskipTests -Dtar

Result

The hadoop-dist/target/hadoop-2.4.1/lib/native folder should now contain the native libraries. Copy them to the hadoop-2.4.1/lib/native folder of your installation and restart the Hadoop cluster.

References

  1. Native Libraries Guide documentation page.
  2. Hadoop Git repo
  3. HADOOP-10699 V2 Patch
  4. Details about Maven JSPC Issue

Apache Oozie Installation on Hadoop 2.4.1

Tuesday, 26 Aug 2014

Posted by Gaurav in Hadoop, Oozie

≈ 33 Comments

Tags

Apache Oozie, Hadoop

Today I would like to explain how I managed to compile and install Apache Oozie 4.0.1 against the latest stable Hadoop version, 2.4.1.

Prerequisites:

  • Hadoop 2.4.1: installation explained in another post
  • Maven
  • Java 1.6+
  • Unix/Mac machine

Download Oozie

wget http://apache.hippo.nl/oozie/4.0.1/oozie-4.0.1.tar.gz
tar xzvf oozie-4.0.1.tar.gz
cd oozie-4.0.1

Building against Hadoop 2.4.1

By default Oozie builds against Hadoop 1.1.1, so to build against Hadoop 2.4.1 we have to adjust the Maven dependencies in pom.xml.

Change hadoop-2 maven profile

In the downloaded Oozie source code (pom.xml), the hadoop-2 Maven profile specifies hadoop.version and hadoop.auth.version as 2.3.0, so we change them to 2.4.1:

        <profile>
            <id>hadoop-2</id>
            <activation>
                <activeByDefault>false</activeByDefault>
            </activation>
            <properties>
               <hadoop.version>2.4.1</hadoop.version>
               <hadoop.auth.version>2.4.1</hadoop.auth.version>
               <pig.classifier>h2</pig.classifier>
               <sqoop.classifier>hadoop200</sqoop.classifier>
            </properties>
        </profile>
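
Since activeByDefault is false, the profile only takes effect when selected with -P hadoop-2, as in the build commands further down; if in doubt, the standard maven-help-plugin can show which profiles are active:

 mvn help:active-profiles -P hadoop-2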

Change Hadooplibs maven module

The next step is to configure the hadooplibs Maven module to build the libraries for version 2.4.1, so we change the pom.xml of the hadoop-2, hadoop-distcp-2, and hadoop-test-2 modules within it:

cd hadooplibs

  • hadoop-2/pom.xml: change the hadoop-client and hadoop-auth dependency versions to 2.4.1
  • hadoop-distcp-2/pom.xml: change the hadoop-distcp version to 2.4.1
  • hadoop-test-2/pom.xml: change the hadoop-minicluster version to 2.4.1
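
A hypothetical sanity check that the edits took, run from the hadooplibs directory; no output means none of the three poms still pins the old version:

 grep -n "2.3.0" hadoop-2/pom.xml hadoop-distcp-2/pom.xml hadoop-test-2/pom.xml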

Build Oozie distro

Use the Maven profile hadoop-2 to compile Oozie 4.0.1 against Hadoop 2.4.1:

cd ..
bin/mkdistro.sh -P hadoop-2 -DskipTests 
or 
mvn clean package assembly:single -P hadoop-2 -DskipTests 

Setup Oozie server

Copy the Oozie distro to a new directory

cd ..
mkdir Oozie
cp -R oozie-4.0.1/distro/target/oozie-4.0.1-distro/oozie-4.0.1/ Oozie
cd Oozie
mkdir libext
cp -R ../oozie-4.0.1/hadooplibs/hadoop-2/target/hadooplibs/hadooplib-2.4.1.oozie-4.0.1/* libext
wget -P libext http://extjs.com/deploy/ext-2.2.zip
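
Before preparing the war, it is worth confirming that libext now holds the Hadoop 2.4.1 jars plus the ExtJS archive (a rough check):

 ls libext    # should list the hadooplib-2.4.1 jars and ext-2.2.zip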

Prepare the Oozie war

./bin/oozie-setup.sh prepare-war

Create Sharelib Directory on HDFS

The following command will internally issue an HDFS create-directory request to the NameNode running at hdfs://localhost:9000 and then copy the shared libraries to that directory.

./bin/oozie-setup.sh sharelib create -fs hdfs://localhost:9000 

*Make sure you select the right port number; otherwise you might get an error like "Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag." That happens when Oozie tries to talk to some other web service instead of the HDFS NameNode.
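
A hedged way to check that the sharelib actually landed on HDFS (run from the Hadoop directory; replace gaurav with the user that ran the setup script):

 bin/hdfs dfs -ls /user/gaurav/share/lib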

Oozie Database

./bin/ooziedb.sh create -sqlfile oozie.sql -run
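
If the command succeeds, the schema is in place; ooziedb.sh also offers a version sub-command for a quick sanity check:

 ./bin/ooziedb.sh version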

Configure Hadoop

Configure the Hadoop cluster with a proxyuser for the Oozie process. The following two properties are required in Hadoop's etc/hadoop/core-site.xml. If you are using a Hadoop version higher than 1.1.0, you can use wildcards as the property values. Replace "gaurav" with the username you will be running Oozie as.

<property>
    <name>hadoop.proxyuser.gaurav.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.gaurav.groups</name>
    <value>*</value>
</property>
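
For the new proxyuser settings to take effect, either restart HDFS or refresh the configuration in place; both sketches assume a standard Hadoop 2.4.1 layout and are run from the Hadoop directory:

 sbin/stop-dfs.sh && sbin/start-dfs.sh
 # or, without a restart:
 bin/hdfs dfsadmin -refreshSuperUserGroupsConfiguration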

Start Oozie

 ./bin/oozied.sh start

Oozie should now be accessible at http://localhost:11000/oozie
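
Besides opening the web console, the Oozie CLI can confirm the server is healthy; a normally running setup reports "System mode: NORMAL":

 ./bin/oozie admin -oozie http://localhost:11000/oozie -status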

Submit a Test Workflow

Now we will try to submit one of the workflows shipped with the Oozie examples: map-reduce. First we copy the examples directory from Oozie to our home directory on HDFS, and then we submit the Oozie job.

From Hadoop Directory: bin/hdfs dfs -put path-to-oozie-directory/examples examples 
From Oozie Directory: bin/oozie job -oozie http://localhost:11000/oozie/ -config examples/apps/map-reduce/job.properties  -run

You might need to change job.properties before you submit the workflow, so that it uses the correct NameNode and JobTracker ports. If you are running YARN (MapReduce 2), the jobTracker property should reference the ResourceManager port.

nameNode=hdfs://localhost:9000
jobTracker=localhost:8032
queueName=default
examplesRoot=examples

oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce
outputDir=map-reduce
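
Once submitted, the job can also be followed from the command line instead of the web console; <job-id> below is a placeholder for the ID that the -run command prints:

 bin/oozie job -oozie http://localhost:11000/oozie -info <job-id>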

Status and output of Workflow

Map-Reduce job submitted in Oozie (http://localhost:11000/oozie/):
[Screenshot: the map-reduce workflow in RUNNING state in the Oozie console]

Job status in the Hadoop cluster:
[Screenshot: the job status page in the Hadoop cluster UI]

Map-Reduce finished status in Oozie:
[Screenshot: the map-reduce workflow in SUCCEEDED state]

That’s It

So we have successfully configured Oozie 4.0.1 with Hadoop 2.4.1 and were also able to submit a job. In the next post we will talk about other aspects of Oozie, such as sub-workflows and how to link workflows or make them depend on each other.

Possible Issues

Java heap space or PermGen space

While running the Maven command to compile, you might face either a PermGen space or an OutOfMemory (Java heap space) error. In that case you need to increase the memory allocated to the Maven process:

export 'MAVEN_OPTS=-Xmx1024m -XX:MaxPermSize=128m'

Hadoop History server

The Oozie server needs to talk to the Hadoop history server to learn the previous state of jobs, so we need to keep the history server running while Oozie is in use; the corresponding error only surfaces when you try to run a workflow. Start it from the Hadoop directory:

sbin/mr-jobhistory-daemon.sh start historyserver

Error related to impersonation

RemoteException: User: oozie is not allowed to impersonate oozie. This is caused by failing to configure the hadoop.proxyuser.oozie.hosts and hadoop.proxyuser.oozie.groups properties in Hadoop properly; make sure you use wildcards only if your Hadoop version is 1.1.0 or higher.

InvalidProtocolBufferException

Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.;

This happens when you have compiled Oozie with a Protobuf library that is incompatible with the one used by Hadoop. In my case I compiled Oozie 4.0.1 with Protobuf 2.5.0 to work with Hadoop 2.4.1.
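
A quick way to check which Protobuf compiler a build would pick up before compiling (Hadoop 2.4.1 itself is built against 2.5.0):

 protoc --version    # should report libprotoc 2.5.0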
