Just a handful of steps are required to set up Hadoop locally and run MapReduce jobs. Follow along below.

  1. Check the JDK by typing java -version in the terminal:
    >> java -version
    java version "1.8.0_65"
    Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
    Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
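If you want to gate the rest of the setup on the JDK version, a small script can pull the major version out of the `java -version` banner. This is a sketch: the banner string is hard-coded for illustration, whereas a real check would capture `java -version 2>&1 | head -n 1`.

```shell
# Extract the Java major version from a "java -version" banner.
# Hard-coded sample for illustration; in practice:
#   banner=$(java -version 2>&1 | head -n 1)
banner='java version "1.8.0_65"'

# Pull out the quoted version string, then take the second dotted
# field (for 1.x releases, the major version is the "8" in 1.8.0_65).
version=$(echo "$banner" | sed 's/.*"\(.*\)".*/\1/')
major=$(echo "$version" | cut -d. -f2)

echo "major=$major"
if [ "$major" -ge 7 ]; then
    echo "JDK is new enough for Hadoop 2.7.x"
fi
```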

  2. Set up passwordless SSH:
    >> ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    >> cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
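Passwordless SSH only works if the public key actually lands in authorized_keys with strict permissions (sshd silently refuses keys when the file is group- or world-writable). A sketch of the append step in a throwaway directory, with a dummy key string standing in for the real id_dsa.pub:

```shell
# Demonstrate the authorized_keys append from step 2 in a throwaway
# directory; a dummy public-key string stands in for id_dsa.pub.
sshdir=$(mktemp -d)
pubkey="ssh-dss AAAAB3...dummy... user@localhost"   # placeholder key material

echo "$pubkey" > "$sshdir/id_dsa.pub"

# Append the public key and lock down permissions -- sshd rejects
# passwordless logins when authorized_keys is too permissive.
cat "$sshdir/id_dsa.pub" >> "$sshdir/authorized_keys"
chmod 600 "$sshdir/authorized_keys"

grep -c "ssh-dss" "$sshdir/authorized_keys"
```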

  3. Download the Hadoop 2.7.1 binary tarball (hadoop-2.7.1.tar.gz) from the Apache Hadoop releases page.

  4. Untar the binary (tar zxvf hadoop-2.7.1.tar.gz) and place it inside a folder.

  5. Edit your .bash_profile (vi ~/.bash_profile) and add the exports below.
    export JAVA_HOME=$(/usr/libexec/java_home)
    export HADOOP_PREFIX=/Users/<root>/<your_folder>/hadoop-2.7.1
    export HADOOP_HOME=$HADOOP_PREFIX
    export HADOOP_COMMON_HOME=$HADOOP_PREFIX
    export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
    export HADOOP_HDFS_HOME=$HADOOP_PREFIX
    export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
    export HADOOP_YARN_HOME=$HADOOP_PREFIX
    export PATH=$PATH:$HADOOP_PREFIX/bin
    export PATH=$PATH:$HADOOP_PREFIX/sbin
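After reloading the profile (source ~/.bash_profile), it is worth sanity-checking that the variables resolved and that bin/ made it onto PATH. A sketch, with a hypothetical install path standing in for /Users/<root>/<your_folder>:

```shell
# Stand-in for the exports in step 5, using a hypothetical path.
export HADOOP_PREFIX=/Users/demo/hadoop-2.7.1
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin

# Verify the config dir is derived correctly and bin/ is on PATH.
echo "$HADOOP_CONF_DIR"
case ":$PATH:" in
  *":$HADOOP_PREFIX/bin:"*) echo "bin on PATH" ;;
  *) echo "bin missing from PATH" ;;
esac
```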

  6. Check whether Hadoop is installed by typing the following in the terminal; it should display Hadoop version information.
    >> cd $HADOOP_PREFIX
    >> bin/hadoop version

  7. Open your core-site.xml present inside hadoop-2.7.1/etc/hadoop/ and add the property as below.

  8. <configuration>
    <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
    <description>NameNode URI</description>
    </property>
    </configuration>

  9. Open your hdfs-site.xml present inside hadoop-2.7.1/etc/hadoop/ and add the properties as
    <configuration>
    <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///Users/<root>/<your_folder>/hadoop-2.7.1/hdfs/datanode</value>
    <description>Paths on the local filesystem for DataNode blocks.</description>
    </property>
    <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///Users/<root>/<your_folder>/hadoop-2.7.1/hdfs/namenode</value>
    <description>Path on the local filesystem for the NameNode namespace and transaction logs.</description>
    </property>
    </configuration>
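Hadoop can usually create these storage directories itself, but pre-creating them surfaces permission problems before the daemons start. A sketch using a temporary root in place of the real install path:

```shell
# Pre-create the DataNode/NameNode storage dirs from step 9.
# A temp dir stands in for /Users/<root>/<your_folder>/hadoop-2.7.1.
root=$(mktemp -d)

mkdir -p "$root/hdfs/datanode" "$root/hdfs/namenode"
chmod 755 "$root/hdfs/datanode" "$root/hdfs/namenode"

ls "$root/hdfs"
```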

  10. Open your mapred-site.xml present inside hadoop-2.7.1/etc/hadoop/ and add the properties as
    <configuration>
    <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    </property>
    <property>
    <name>mapreduce.map.cpu.vcores</name>
    <value>1</value>
    </property>
    <property>
    <name>mapreduce.reduce.cpu.vcores</name>
    <value>1</value>
    </property>
    <property>
    <name>mapreduce.map.memory.mb</name>
    <value>512</value>
    </property>
    <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>1024</value>
    </property>
    <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx618m</value>
    </property>
    <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx384m</value>
    </property>
    <property>
    <name>mapreduce.jobtracker.address</name>
    <value>local</value>
    </property>
    <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>1024</value>
    </property>
    <property>
    <name>yarn.app.mapreduce.am.command-opts</name>
    <value>-Xmx618m</value>
    </property>
    </configuration>
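Notice that the -Xmx values are deliberately smaller than the container sizes: the JVM heap has to fit inside the YARN container along with non-heap overhead. A common rule of thumb (a guideline assumed here, not something Hadoop mandates) is heap ≈ 75% of the container size:

```shell
# Rule-of-thumb check: JVM heap (-Xmx) should be ~75% of the YARN
# container size, leaving room for non-heap JVM overhead.
# (The 75% figure is a common guideline, not mandated by Hadoop.)
map_container_mb=512
reduce_container_mb=1024

map_heap_mb=$(( map_container_mb * 75 / 100 ))
reduce_heap_mb=$(( reduce_container_mb * 75 / 100 ))

echo "map:    container=${map_container_mb}mb heap=${map_heap_mb}m"
echo "reduce: container=${reduce_container_mb}mb heap=${reduce_heap_mb}m"
```

This reproduces the -Xmx384m used for mappers in the config above; the reducer heap there (-Xmx618m) is even more conservative than the 768m the guideline would allow.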

  11. Open your yarn-site.xml present inside hadoop-2.7.1/etc/hadoop/ and add the properties as

    <configuration>
    <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    </property>
    <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
    <description>Total RAM available to all containers on a node</description>
    </property>
    <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
    <description>Total # of CPU available to all containers on a node</description>
    </property>
    <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>128</value>
    <description>Minimum RAM per container</description>
    </property>
    <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
    <description>Max RAM allocated to a container</description>
    </property>
    <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
    <description>Min core allocated to a container</description>
    </property>
    <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>2</value>
    <description>Max core allocated to a container</description>
    </property>
    </configuration>
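With the node sized at 4096 MB and 4 vcores, and minimum allocations of 128 MB and 1 vcore, you can estimate how many containers the node can run at once; here vcores, not memory, are the binding limit. A quick arithmetic sketch:

```shell
# Estimate concurrent containers per node from the yarn-site values:
# the scheduler is limited by whichever resource runs out first.
node_mem_mb=4096
node_vcores=4
min_alloc_mb=128
min_alloc_vcores=1

by_mem=$(( node_mem_mb / min_alloc_mb ))       # 32 containers by memory
by_cpu=$(( node_vcores / min_alloc_vcores ))   # 4 containers by CPU

if [ "$by_mem" -lt "$by_cpu" ]; then max=$by_mem; else max=$by_cpu; fi
echo "max concurrent minimum-size containers: $max"
```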

  12. Format the namenode by typing in the terminal as below
    >> $HADOOP_HOME/bin/hdfs namenode -format

  13. Start Hadoop: go to hadoop-2.7.1/ and type
    sbin/start-all.sh

  14. You should see that the services below have started (check with jps).
    14340 DataNode
    14452 SecondaryNameNode
    14660 NodeManager
    14712 RunJar
    14569 ResourceManager
    14251 NameNode
    14765 Jps
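Rather than eyeballing the jps output, you can grep it for the daemons that must be up. The sample output below is hard-coded from step 14 for illustration; in practice you would use `out=$(jps)`:

```shell
# Check that the core Hadoop daemons appear in jps output.
# Sample output hard-coded for illustration; really: out=$(jps)
out="14340 DataNode
14452 SecondaryNameNode
14660 NodeManager
14569 ResourceManager
14251 NameNode
14765 Jps"

missing=0
for daemon in NameNode DataNode ResourceManager NodeManager; do
    if ! echo "$out" | grep -q "$daemon"; then
        echo "missing: $daemon"
        missing=1
    fi
done
[ "$missing" -eq 0 ] && echo "all core daemons running"
```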

  15. Check the NameNode web UI: http://localhost:50070

  16. Check the YARN web UI: http://localhost:8088

  17. Now run an example MapReduce job. From inside hadoop-2.7.1/, type:
    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar randomwriter out

  18. To stop Hadoop, type sbin/stop-all.sh