Configure and use Map-Reduce and HDFS in pseudo-distributed mode

December 18, 2014

This note describes how to set up and configure Hadoop to run as a pseudo-distributed, single-node cluster.
Supported Platforms:
1. Linux is supported as a development and production platform.
2. Win32 is supported as a development platform.
I use Linux for this demonstration.

Required Software:
1. Java 1.6.x, preferably from Sun, must be installed.
2. ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.
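
Before going further, it can help to confirm that both prerequisites are on the PATH. The following is only a minimal sketch: the helper name check_cmd is made up for this note and is not part of Hadoop, and it only tests that the commands exist, not that sshd is actually running.

```shell
# Hypothetical helper: report whether a command is available on the PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check the two prerequisites named above; neither call changes anything.
check_cmd java || echo "Install a JDK before continuing."
check_cmd ssh  || echo "Install an ssh client and server before continuing."
```

Running java -version afterwards shows which Java release was found.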

Download
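To get a Hadoop distribution, download a stable release from one of the Apache Download Mirrors. The file names and scripts used below (conf/, bin/start-all.sh, mapred.job.tracker) are those of the Hadoop 0.20.x and 1.x releases; Hadoop 2.x and later lay out their configuration and scripts differently.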

Prepare to Start the Hadoop Cluster
Unpack the downloaded Hadoop distribution. In the distribution, edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation.
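
If you are not sure where your Java installation lives, the sketch below may help you find a value for JAVA_HOME. It assumes that java is on the PATH and that readlink -f is available (as on typical Linux systems); the helper name java_home_from_path is invented for this note. It only prints a suggested line and does not edit conf/hadoop-env.sh.

```shell
# Hypothetical helper: derive a Java root from the full path of the java
# binary by dropping the trailing /bin/java.
java_home_from_path() {
    printf '%s\n' "${1%/bin/java}"
}

# Resolve symlinks such as /usr/bin/java; fall back to an empty value on failure.
java_bin=$(readlink -f "$(command -v java)" 2>/dev/null) || java_bin=""

if [ -n "$java_bin" ]; then
    # Print the line to paste into conf/hadoop-env.sh; nothing is edited here.
    echo "export JAVA_HOME=$(java_home_from_path "$java_bin")"
else
    echo "java not found on PATH; set JAVA_HOME by hand"
fi
```

Check the printed path before using it; on some systems the resolved binary sits inside a jre/ subdirectory, and you may prefer the JDK root instead.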

Try the following command:
$ bin/hadoop
This will display the usage documentation for the hadoop script.

Pseudo-Distributed Operation
Hadoop can be run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.
For this, a few files have to be configured:
1. conf/core-site.xml
2. conf/hdfs-site.xml
3. conf/mapred-site.xml

In the file conf/core-site.xml add:

<configuration>
     <property>
         <name>fs.default.name</name>
         <value>hdfs://localhost:9000</value>
     </property>
</configuration>

In the file conf/hdfs-site.xml add:

<configuration>
     <property>
         <name>dfs.replication</name>
         <value>1</value>
     </property>
</configuration>

In the file conf/mapred-site.xml add:

<configuration>
     <property>
         <name>mapred.job.tracker</name>
         <value>localhost:9001</value>
     </property>
</configuration>
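
If you would rather generate the three files than edit them by hand, something like the following could work. It is a sketch under assumptions: the helper write_site_xml is made up for this note, each file gets exactly one property, and the files are written to a temporary directory so that an existing conf/ directory is not overwritten.

```shell
# Hypothetical helper: write a Hadoop site file holding a single property.
# Arguments: output file, property name, property value.
write_site_xml() {
    cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
     <property>
         <name>$2</name>
         <value>$3</value>
     </property>
</configuration>
EOF
}

# Write into a scratch directory; copy the files into conf/ once checked.
out_dir=$(mktemp -d)
write_site_xml "$out_dir/core-site.xml"   fs.default.name    hdfs://localhost:9000
write_site_xml "$out_dir/hdfs-site.xml"   dfs.replication    1
write_site_xml "$out_dir/mapred-site.xml" mapred.job.tracker localhost:9001
echo "Generated files are in $out_dir"
```

Review the generated files and then copy them into the conf/ directory of the unpacked distribution.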

Now check that you can ssh to localhost without a passphrase.
$ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
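
Running the cat command more than once adds the same key to authorized_keys again each time, and sshd may ignore the file if its permissions are too open. The sketch below shows one way to guard against both; the helper name add_key_once is made up for this note, and it assumes the public key file holds a single line.

```shell
# Hypothetical helper: append a public key to an authorized_keys file only if
# that exact line is not already present, then restrict the file's permissions.
# Arguments: public key file, authorized_keys file.
add_key_once() {
    pub_file=$1
    auth_file=$2
    touch "$auth_file"
    # -x: whole-line match, -F: fixed string, -f: read the pattern from a file
    if ! grep -qxF -f "$pub_file" "$auth_file"; then
        cat "$pub_file" >> "$auth_file"
    fi
    chmod 600 "$auth_file"
}

# Example call (left commented out so that nothing under ~/.ssh is touched):
# add_key_once ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys
```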

Now format the distributed filesystem:
$ bin/hadoop namenode -format

Now start the Hadoop daemons:
$ bin/start-all.sh
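
To check that everything came up, you can list the running Java processes with jps (shipped with the JDK) and look for the five daemons of a Hadoop 0.20.x/1.x node. The helper below is a sketch, not part of Hadoop: missing_daemons is a made-up name, and it assumes the usual jps output of one "pid name" pair per line.

```shell
# Hypothetical helper: read jps output on stdin and print each expected
# daemon whose name does not appear as the second field of any line.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
            echo "$daemon"
    done
}

# Example call (commented out because it needs a JDK and running daemons):
# jps | missing_daemons
```

No output means all five daemons were found; any name printed is a daemon worth looking up in the logs directory.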

To see MapReduce at work, run the bundled pi estimator; the two arguments are the number of map tasks (10) and the number of samples per map (100000):
$ bin/hadoop jar hadoop-examples-*.jar pi 10 100000
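
To get a feel for what this job estimates, here is a tiny single-process sketch of the same idea. Note that it is not how Hadoop does it: the Hadoop example uses a quasi-Monte Carlo method spread over the map tasks, whereas this sketch swaps in plain pseudo-random sampling inside one awk process. The function name estimate_pi and the fixed seed are my own choices.

```shell
# Hypothetical helper: estimate pi from n pseudo-random points in the unit
# square. The fraction landing inside the quarter circle x^2 + y^2 <= 1
# approaches pi/4 as n grows.
estimate_pi() {
    awk -v n="$1" 'BEGIN {
        srand(42)                       # fixed seed so runs are repeatable
        for (i = 0; i < n; i++) {
            x = rand(); y = rand()
            if (x * x + y * y <= 1) inside++
        }
        printf "%.4f\n", 4 * inside / n
    }'
}

# 10 x 100000 samples in total, mirroring the arguments of the Hadoop job.
estimate_pi 1000000
```

More samples give a closer estimate, which is also why the Hadoop job takes the sample count as an argument.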

Post your comments and recommendations.
