Configure Hadoop for multi-node clusters
In my previous post, I described how to configure and run Hadoop in pseudo-distributed mode.
I hope that helped you test how Hadoop works on a single node.
Hadoop's main advantage, however, lies in its ability to distribute datasets across a cluster via HDFS (Hadoop Distributed File System), so in this post we'll configure and use a multi-node Hadoop cluster.
Pre-requisites:
1. Linux machines (three in our example: 1 master and 2 slaves; the count can be increased according to your requirements).
2. A Hadoop instance (stable release, version 1.2.1).
3. JDK version 1.6 or higher installed on each Linux machine, with JAVA_HOME set in .bash_profile (we'll cover this one by one).
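Before going further, the prerequisites can be sanity-checked with a short shell loop; java, javac, and ssh are the commands the rest of this guide relies on (a minimal sketch; it checks presence only, not versions):

```shell
# Check that the tools this guide assumes are on the PATH of each machine.
# Presence only; the JDK version (1.6+) still needs a manual `java -version`.
for tool in java javac ssh; do
  if command -v "$tool" > /dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done
```

Run this on the master and on every slave; anything reported missing is installed in the steps below.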
Now, let's get practical.
If you don't have multiple Linux machines, don't worry: you can create your own virtual ones.
For this, you need to download VMware Player and an Ubuntu 13.04 image file.
Now, create 3 virtual machines using VMware Player and the Ubuntu 13.04 image file. To identify each machine easily, name the virtual machines Master, Slave01, and Slave02.
Once the installation is completed, restart the virtual machines.
Now, you are ready to configure.
Check out these steps:
Open a terminal on each machine and check whether SSH is installed:

xyz@ubuntu:~$ ssh localhost
If it is not installed, install it by running:

xyz@ubuntu:~$ sudo apt-get install ssh
Now you can ssh to localhost (though not yet password-less; we'll see later how to make it password-less).
Next, check whether the JDK is installed by running:

xyz@ubuntu:~$ javac
If installed, you'll see the compiler's usage message:

Usage: javac <options> <source files>
where possible options include:
  -g                         Generate all debugging info
  -g:none                    Generate no debugging info
  -g:{lines,vars,source}     Generate only some debugging info
  -nowarn                    Generate no warnings
  -verbose                   Output messages about what the compiler is doing
  -deprecation               Output source locations where deprecated APIs are used
If not installed:

The program 'javac' can be found in the following packages:
 * default-jdk
 * ecj
 * gcj-4.6-jdk
 * gcj-4.7-jdk
 * openjdk-7-jdk
 * openjdk-6-jdk
Try: sudo apt-get install <selected package>
To install the JDK, run:

xyz@ubuntu:~$ sudo apt-get install openjdk-7-jdk

After this, OpenJDK 7 will be installed on your system.
Open .bash_profile:

xyz@ubuntu:~$ vi ~/.bash_profile
Now, add these two lines (assuming you installed openjdk-7-jdk; if your Java lives at a different path, use that path instead):

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386/jre
export PATH=$PATH:/usr/lib/jvm/java-7-openjdk-i386/jre/bin
On the master machine:

xyz@ubuntu:~$ sudo vi /etc/hostname

Change ubuntu to master.

On the 1st slave machine:

xyz@ubuntu:~$ sudo vi /etc/hostname

Change ubuntu to slave01.

On the 2nd slave machine:

xyz@ubuntu:~$ sudo vi /etc/hostname

Change ubuntu to slave02.
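After editing /etc/hostname, reboot each machine (or run `sudo hostname <name>` for a temporary change) and confirm the change took effect. A quick check:

```shell
# Print the current hostname; it should now read master, slave01, or
# slave02 depending on which machine you run it on.
hostname
```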
Next, find each machine's IP address:

xyz@ubuntu:~$ ifconfig

Add these IP addresses to /etc/hosts on each machine, so that the machines can recognize each other over the network.
for example:
xxx.xxx.xxx.xxx master
xxx.xxx.xxx.xxx slave01
xxx.xxx.xxx.xxx slave02
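As a concrete sketch (the 192.168.1.x addresses below are placeholders chosen for illustration; use the addresses ifconfig actually reported), the entries can be staged in a working copy of /etc/hosts before applying them with sudo:

```shell
# Append cluster entries to a copy of /etc/hosts; once they look right,
# copy hosts.new over the real /etc/hosts with sudo on each machine.
cp /etc/hosts hosts.new
cat >> hosts.new <<'EOF'
192.168.1.10 master
192.168.1.11 slave01
192.168.1.12 slave02
EOF
grep -E 'master|slave' hosts.new
```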
Now verify connectivity (replace slave0x with slave01 or slave02):

xyz@master:~$ ssh slave0x

The master should be able to ssh to slave01 and slave02.
To make it password-less ssh, use the following commands:
xyz@master:~$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Your identification has been saved in /home/xyz/.ssh/id_rsa.
Your public key has been saved in /home/xyz/.ssh/id_rsa.pub.
The key fingerprint is:
23:b2:09:4c:7e:7d:2b:b0:cc:a4:0c:1c:ef:77:76:d9 xyz@master
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|                 |
| ..              |
|.+o .            |
|..+.= o S        |
| o.B * o oo      |
| o.*..o.o E      |
|  . o..          |
|                 |
+-----------------+
After generating the public and private keys, run:

xyz@master:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Now, run the following command for each slave from the master (again substituting slave01 and slave02 for slave0x):

xyz@master:~$ cat ~/.ssh/id_rsa.pub | ssh xyz@slave0x "cat >> .ssh/authorized_keys"
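If you want to see what these key-setup commands do without touching real machines, here is a local dry run in a scratch directory (throwaway paths, not your real ~/.ssh):

```shell
# Generate a password-less RSA key pair and append the public key to an
# authorized_keys file, exactly as the master/slave commands above do.
scratch=$(mktemp -d)
ssh-keygen -t rsa -P '' -f "$scratch/id_rsa" -q
cat "$scratch/id_rsa.pub" >> "$scratch/authorized_keys"
ls "$scratch"
```

On the real cluster, the append lands in each slave's ~/.ssh/authorized_keys, which is what lets the master's ssh log in without a password.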
Now you are done with all the system configuration.
Let's configure a few Hadoop-1.2.1 files and you are done.
You can edit these files with vi, or open and edit each file with WinSCP. Your choice!
- Open hadoop-1.2.1/conf/ on each machine.
In conf/, you need to edit the following files:
In core-site.xml, set fs.default.name to the NameNode's URI (port 9000 is the conventional choice):

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
In hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
In mapred-site.xml, set mapred.job.tracker to the JobTracker's host and port (9001 is the conventional choice):

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>
In hadoop-env.sh set JAVA_HOME:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386/jre

In masters, add:

xyz@master

In slaves, add:

xyz@slave01
xyz@slave02
Now you are done with the configuration of the multi-node Hadoop cluster.
To start, you need to format the namenode of the HDFS location.
Run the following commands to format and start the namenode, jobtracker, and secondary namenode on the master, and the datanode and tasktracker on the slaves.
xyz@master:~$ cd hadoop-1.2.1
xyz@master:~/hadoop-1.2.1$ bin/hadoop namenode -format
If permission is denied, then run:

xyz@master:~/hadoop-1.2.1$ chmod 755 bin/*
and then run the above command.
Now start the hadoop daemons:

xyz@master:~/hadoop-1.2.1$ bin/start-all.sh
Check the running processes with the jps command on each machine: the master should show NameNode, SecondaryNameNode, and JobTracker; each slave should show DataNode and TaskTracker.
Your multi-node cluster configuration is now complete. You can run your examples!
For any queries, please post a comment; I would love to reply.