Configure and use Map-Reduce and HDFS in pseudo-distributed mode
Here is a note that describes how to set up and configure Hadoop to run as a pseudo-distributed node.
Supported Platforms:
1. Linux is supported as a development and production platform.
2. Win32 is supported as a development platform.
Here, I am using Linux for the demonstration.
Required Software:
1. Java 1.6.x, preferably from Sun, must be installed.
2. ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.
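As a quick sanity check, you can verify both prerequisites from a terminal (pgrep is one way to confirm sshd is running; your distribution may offer a service command instead):
$ java -version
$ ssh -V
$ pgrep sshd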
Download
To get a Hadoop distribution, download a recent stable release from one of the Apache Download Mirrors.
Prepare to Start the Hadoop Cluster
Unpack the downloaded Hadoop distribution. In the distribution, edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation.
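For example, if your JDK lives under /usr/lib/jvm/java-6-sun (an illustrative path; use whatever matches your installation), the line in conf/hadoop-env.sh would be:
# root of the Java installation (adjust the path for your machine)
export JAVA_HOME=/usr/lib/jvm/java-6-sun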
Try the following command:
$ bin/hadoop
This will display the usage documentation for the hadoop script.
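As a further sanity check, you can print the version of the release you unpacked:
$ bin/hadoop version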
Pseudo-Distributed Operation
Hadoop can be run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.
For this, we have to configure a few files:
1. conf/core-site.xml
2. conf/hdfs-site.xml
3. conf/mapred-site.xml
In the file conf/core-site.xml add:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
In the file conf/hdfs-site.xml add:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
In the file conf/mapred-site.xml add:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
Now check that you can ssh to localhost without a passphrase.
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
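If ssh still asks for a passphrase after this, the usual cause is overly permissive file modes, since sshd refuses keys it considers insecure; tightening them is a common fix (it may not be needed on every system):
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys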
Now format the distributed filesystem:
$ bin/hadoop namenode -format
Now start the Hadoop daemons:
$ bin/start-all.sh
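To confirm the daemons came up, run jps, which ships with the Sun JDK and lists running Java processes; in this pseudo-distributed setup you should see NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker (plus Jps itself):
$ jps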
To see MapReduce in action, run the following example, which estimates pi using 10 map tasks of 100,000 samples each:
$ bin/hadoop jar hadoop-examples-*.jar pi 10 100000
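To exercise HDFS together with MapReduce, you can also run the bundled wordcount example on a file copied into the distributed filesystem; the input and output directory names below are arbitrary choices for illustration:
$ bin/hadoop fs -mkdir input
$ bin/hadoop fs -put conf/core-site.xml input
$ bin/hadoop jar hadoop-examples-*.jar wordcount input output
$ bin/hadoop fs -cat output/part-*
When you are done, stop the daemons with:
$ bin/stop-all.sh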
Post your comments and recommendations.