Hi All,
Hadoop installation.
HADOOP INSTALLATION AND CONFIGURATION :-
Step 1 :- Go to Cloudera.oracle.com and download the hadoop-0.20.2-cdh3u3.tar file, along with other components such as PIG and HIVE.
Step 2 :- Untar the Hadoop tarball and name the installation directory HADOOP_HOME.
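A minimal sketch of this step, assuming the tarball sits in the current directory and /home/hadoop-user is the install location (both paths are assumptions; adjust for your system):
[hadoop-user@master]$ tar -xvf hadoop-0.20.2-cdh3u3.tar -C /home/hadoop-user
[hadoop-user@master]$ export HADOOP_HOME=/home/hadoop-user/hadoop-0.20.2-cdh3u3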
Step 3 :- Now here we can configure 3 different types of environments:
1) Single
2) Pseudo
3) Clustered or Distributed Environment
Here we configure a clustered environment. How we deal with the cluster set-up depends on the number of nodes.
Concepts to know before this cluster set-up :-
Terminology :-
1) Master Node
2) Secondary Name-Node
3) Job-Tracker
4) Task-Tracker
5) Data-Node
We are configuring only two nodes here. So always remember: the MASTER (Name-Node) and JOB-TRACKER will be on one node, and the TASK-TRACKER and DATA-NODE will be on the other node. The Secondary Name-Node keeps a replica of the Name-Node's metadata; this does not mean it is a back-up for the Name-Node, but it stores the check-point from which we can recover if something happens to the Name-Node.
Note :-
========
The secondary name-node can be run on the same machine as the name-node, but for reasons of memory usage (the secondary has the same memory requirements as the primary), it is best to run it on a separate piece of hardware, especially for larger clusters. Machines running the name-nodes should typically run on 64-bit hardware to avoid the 3 GB limit on Java heap size in 32-bit architectures.
Steps to configure the HADOOP FILE SYSTEM (CDH3) :-
First Step :-
======
1) SSH CONFIGURATION FROM THE MASTER NODE TO SLAVE NODES.
[hadoop-user@master]$ which ssh
/usr/bin/ssh
[hadoop-user@master]$ which sshd
/usr/bin/sshd
[hadoop-user@master]$ which ssh-keygen
/usr/bin/ssh-keygen
[hadoop-user@master]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop-user/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop-user/.ssh/id_rsa.
Your public key has been saved in /home/hadoop-user/.ssh/id_rsa.pub.
After creating your key pair, your public key will be of the form:
[hadoop-user@master]$ more /home/hadoop-user/.ssh/id_rsa.pub
ssh-rsa
AAAAB3NzaC1yc2EAAAABIwAAAQEA1WS3RG8LrZH4zL2/1oYgkV1OmVclQ2OO5vRi0Nd
K51Sy3wWpBVHx82F3x3ddoZQjBK3uvLMaDhXvncJG31JPfU7CTAfmtgINYv0kdUbDJq4TKG/fuO5q
J9CqHV71thN2M310gcJ0Y9YCN6grmsiWb2iMcXpy2pqg8UM3ZKApyIPx99O1vREWm+4moFTg
YwIl5be23ZCyxNjgZFWk5MRlT1p1TxB68jqNbPQtU7fIafS7Sasy7h4eyIy7cbLh8x0/V4/mcQsY
5dvReitNvFVte6onl8YdmnMpAh6nwCvog3UeWWJjVZTEBFkTZuV1i9HeYHxpm1wAzcnf7az78jT
IRQ== hadoop-user@master
Distribute public key and validate logins :-
[hadoop-user@master]$ scp ~/.ssh/id_rsa.pub hadoop-user@target:~/master_key
[hadoop-user@target]$ mkdir ~/.ssh
[hadoop-user@target]$ chmod 700 ~/.ssh
[hadoop-user@target]$ mv ~/master_key ~/.ssh/authorized_keys
[hadoop-user@target]$ chmod 600 ~/.ssh/authorized_keys
After generating the key, you can verify
it’s correctly defined by attempting to log in to
the target node from the master:
[hadoop-user@master]$ ssh target
The authenticity of host 'target
(xxx.xxx.xxx.xxx)' can’t be established.
RSA key fingerprint is
72:31:d8:1b:11:36:43:52:56:11:77:a4:ec:82:03:1d.
Are you sure you want to continue
connecting (yes/no)? yes
Warning: Permanently added 'target' (RSA)
to the list of known hosts.
Last login: Sun Jan 4 15:32:22 2009 from
master
After confirming the authenticity of a
target node to the master node, you won’t be
prompted upon subsequent login attempts.
[hadoop-user@master]$ ssh target
Last login: Sun Jan 4 15:32:49 2009 from master
Configuring the Configuration Files for the Cluster set-up
Note :-
When we un-tar the tar file, under the CONF directory we can only see hadoop-default.xml, hdfs-default.xml and mapred-default.xml.
To make our changes, we create the site-specific counterparts of these main configuration files: hadoop-site.xml, mapred-site.xml and hdfs-site.xml.
We need to configure a few things before
running Hadoop. Let’s take a closer look at
the Hadoop configuration directory :
[hadoop-user@master]$ cd $HADOOP_HOME
[hadoop-user@master]$ ls -l conf/
-rw-rw-r-- 1 hadoop-user hadoop 2065 Dec 1 10:07
capacity-scheduler.xml
-rw-rw-r-- 1 hadoop-user hadoop 535 Dec 1 10:07 configuration.xsl
-rw-rw-r-- 1 hadoop-user hadoop 49456 Dec 1 10:07 hadoop-default.xml
-rwxrwxr-x 1 hadoop-user hadoop 2314 Jan 8 17:01 hadoop-env.sh
-rw-rw-r-- 1 hadoop-user hadoop 2234 Jan 2 15:29 hadoop-site.xml
-rw-rw-r-- 1 hadoop-user hadoop 2815 Dec 1 10:07 log4j.properties
-rw-rw-r-- 1 hadoop-user hadoop 28 Jan 2 15:29 masters
-rw-rw-r-- 1 hadoop-user hadoop 84 Jan 2 15:29 slaves
-rw-rw-r-- 1 hadoop-user hadoop 401 Dec 1 10:07 sslinfo.xml.example
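Also note hadoop-env.sh in this listing: it must point at the local JVM before any of the daemons will start. A minimal sketch, assuming the JDK lives at /usr/java/jdk1.6.0 (a hypothetical path; adjust for your machines):
# in $HADOOP_HOME/conf/hadoop-env.sh, on every node
export JAVA_HOME=/usr/java/jdk1.6.0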
Below is the main set-up that is supposed to go into the *.xml configuration files :-
core-site.xml
The key differences are:
■ We explicitly state the hostname for the location of the NameNode and JobTracker daemons.
■ We increase the HDFS replication factor to take advantage of distributed storage. Recall that data is replicated across HDFS to increase availability and reliability.
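As a minimal sketch of both points, assuming the NameNode and JobTracker live on a host named master (the hostname, the port numbers and the replication value are assumptions; adjust them for your cluster):
core-site.xml :-
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>   <!-- NameNode location -->
  </property>
</configuration>
mapred-site.xml :-
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>   <!-- JobTracker location -->
  </property>
</configuration>
hdfs-site.xml :-
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>   <!-- HDFS replication factor -->
  </property>
</configuration>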
We also need to update the
masters and slaves files to reflect the locations of the other
daemons.
[hadoop-user@master]$ cat masters
backup
[hadoop-user@master]$ cat slaves
hadoop1 (change these according to your hostnames)
hadoop2
hadoop3
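For the names in the masters and slaves files to work, every node must be able to resolve them. A sketch of the /etc/hosts entries on each node, with made-up example IP addresses:
192.168.1.10   master
192.168.1.11   backup
192.168.1.21   hadoop1
192.168.1.22   hadoop2
192.168.1.23   hadoop3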
After this, run $HADOOP_HOME/bin/start-all.sh.
It will start all the daemons on the master and slave nodes.
Note :- By default the configuration file name is hadoop-default.xml. If you want to make changes, you can copy the existing hadoop-default.xml to hadoop-site.xml and put your overrides there.
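A sketch of that copy step (after copying, edit only the properties you want to override):
[hadoop-user@master]$ cd $HADOOP_HOME/conf
[hadoop-user@master]$ cp hadoop-default.xml hadoop-site.xml   # then edit hadoop-site.xml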
OK. So after configuring all of the above, we are ready to bring up the services.
Now export HADOOP_HOME/bin to the PATH:
export PATH=$HADOOP_HOME/bin:$PATH
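To make this survive new shells, the same lines can go in the hadoop user's ~/.bashrc (a sketch, assuming the install path from Step 2):
# in ~/.bashrc of hadoop-user, on every node
export HADOOP_HOME=/home/hadoop-user/hadoop-0.20.2-cdh3u3
export PATH=$HADOOP_HOME/bin:$PATH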
Bringing up the services :-
$HADOOP_HOME/bin/start-all.sh
Once we issue this command, it will bring up the services on the nodes mentioned in the masters and slaves files.
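One caveat worth noting: on a brand-new cluster, HDFS must be formatted once on the master before the very first start (a one-time step; re-running it wipes HDFS):
[hadoop-user@master]$ hadoop namenode -format   # first-ever start only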
How can we validate :-
Once all the services are up, we get the web interface URLs :-
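On a stock 0.20/CDH3 install, the defaults are the NameNode UI at http://master:50070/ and the JobTracker UI at http://master:50030/ (replace master with your Name-Node's hostname). We can also check on each node that the expected daemons are running, using the JDK's jps tool:
[hadoop-user@master]$ jps   # NameNode and JobTracker should appear here
[hadoop-user@target]$ jps   # DataNode and TaskTracker should appear on the slaves
The backup node from the masters file should likewise show the SecondaryNameNode.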
Regards,
Naga.

