Just 15 steps are required to set up Hadoop locally and run MapReduce jobs. Follow the steps below.
-
Check the JDK by typing java -version in the terminal:
>> java -version
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
-
Set up passwordless SSH:
>> ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
>> cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
-
Download the Hadoop 2.7.1 binary from the Apache Hadoop releases page.
-
Untar the binary (tar zxvf hadoop-2.7.1.tar.gz) and place it inside a folder.
-
Edit your ~/.bash_profile (vi ~/.bash_profile) and add the lines below.
export JAVA_HOME=$(/usr/libexec/java_home)
export HADOOP_PREFIX=/Users/<root>/<your_folder>/hadoop-2.7.1
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
-
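After editing, reload the profile with source ~/.bash_profile. As a quick sanity check, here is a small sketch (check_hadoop_env is a hypothetical helper, not part of Hadoop) that reports any of the variables above that are still unset:

```shell
# Hypothetical helper: report which of the Hadoop variables are still unset
# in the current shell. Prints "missing: none" when everything is in place.
check_hadoop_env() {
  local v missing=""
  for v in JAVA_HOME HADOOP_PREFIX HADOOP_CONF_DIR HADOOP_HOME; do
    eval "[ -n \"\$$v\" ]" || missing="$missing $v"
  done
  echo "missing:${missing:- none}"
}
check_hadoop_env
```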
Check whether Hadoop is installed by typing the commands below in the terminal. They should display version information for Hadoop.
>> cd $HADOOP_PREFIX
>> bin/hadoop version
-
Open core-site.xml, present inside hadoop-2.7.1/etc/hadoop/, and add the property below.
-
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
    <description>NameNode URI</description>
  </property>
</configuration>
-
Open hdfs-site.xml, present inside hadoop-2.7.1/etc/hadoop/, and add the properties below.
<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///Users/<root>/<your_folder>/hadoop-2.7.1/hdfs/datanode</value>
    <description>Paths on the local filesystem for DataNode blocks.</description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///Users/<root>/<your_folder>/hadoop-2.7.1/hdfs/namenode</value>
    <description>Path on the local filesystem for the NameNode namespace and transaction logs.</description>
  </property>
</configuration>
-
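Since this is a single-node setup, you may also want to lower the HDFS replication factor from its default of 3, so the NameNode does not flag every block as under-replicated. This extra property is an optional addition, not part of the original 15 steps:

```xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Single-node setup: keep one copy of each block.</description>
</property>
```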
Open mapred-site.xml, present inside hadoop-2.7.1/etc/hadoop/, and add the properties below.
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.map.cpu.vcores</name>
    <value>1</value>
  </property>
  <property>
    <name>mapreduce.reduce.cpu.vcores</name>
    <value>1</value>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>512</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx618m</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx384m</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>local</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.command-opts</name>
    <value>-Xmx618m</value>
  </property>
</configuration>
-
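Note that the -Xmx values above are deliberately smaller than their containers: the JVM heap must fit inside the YARN container along with non-heap overhead. A common rule of thumb is roughly 75-80% of the container size; heap_opt below is a hypothetical helper sketching that arithmetic, not a Hadoop command:

```shell
# Hypothetical helper: derive a -Xmx flag as ~75% of a container's size (MB),
# leaving the rest for JVM overhead (stack, metaspace, native buffers).
heap_opt() {
  local container_mb=$1
  echo "-Xmx$(( container_mb * 3 / 4 ))m"
}
heap_opt 512    # the 512 MB map container: -Xmx384m, matching the config above
heap_opt 1024   # the 1024 MB reduce container; the config above picks a more conservative 618m
```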
Open yarn-site.xml, present inside hadoop-2.7.1/etc/hadoop/, and add the properties below.
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
    <description>Total RAM available to all containers on a node</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
    <description>Total # of CPUs available to all containers on a node</description>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>128</value>
    <description>Minimum RAM per container</description>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
    <description>Max RAM allocated to a container</description>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
    <description>Min cores allocated to a container</description>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>2</value>
    <description>Max cores allocated to a container</description>
  </property>
</configuration>
-
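As a back-of-envelope check (plain shell arithmetic, assuming the values above), the node can run at most min(node memory / min allocation, node vcores / min vcores) minimum-sized containers at once:

```shell
# Sketch: upper bound on concurrent minimum-sized containers for the node above.
node_mem_mb=4096; node_vcores=4
min_alloc_mb=128; min_alloc_vcores=1
by_mem=$(( node_mem_mb / min_alloc_mb ))      # 4096 / 128 = 32
by_cpu=$(( node_vcores / min_alloc_vcores ))  # 4 / 1 = 4
max=$(( by_mem < by_cpu ? by_mem : by_cpu ))
echo "at most $max minimum-sized containers"  # CPU-bound here: 4
```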
Format the NameNode by typing in the terminal as below.
>> $HADOOP_HOME/bin/hdfs namenode -format
-
Start Hadoop: go to hadoop-2.7.1/ and type
>> sbin/start-all.sh
-
You should see that the services below have been started (check with jps).
14340 DataNode
14452 SecondaryNameNode
14660 NodeManager
14712 RunJar
14569 ResourceManager
14251 NameNode
14765 Jps
-
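The PIDs will differ on your machine; what matters is the daemon names. Here is a small sketch (check_daemons is a hypothetical helper, not a Hadoop tool) that scans jps output for the expected daemons:

```shell
# Hypothetical helper: read `jps` output on stdin and verify the five
# Hadoop daemons are present; prints the first missing one, if any.
check_daemons() {
  local out d
  out="$(cat)"
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    echo "$out" | grep -qw "$d" || { echo "missing: $d"; return 1; }
  done
  echo "all daemons running"
}
# Usage: jps | check_daemons
```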
Check the NameNode web UI: http://localhost:50070
-
Check the YARN web UI: http://localhost:8088
-
Now run a MapReduce example job from inside hadoop-2.7.1/:
>> bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar randomwriter out
-
To stop Hadoop, type:
>> sbin/stop-all.sh
