Hi All,
Hadoop installation.
HADOOP INSTALLATION AND CONFIGURATION :-
Step 1 :- Go to Cloudera.oracle.com and download the hadoop-0.20.2-cdh3u3.tar file, along with other components such as PIG and HIVE.
Step 2 :- Untar the Hadoop tarball and name the installation directory HADOOP_HOME.
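A minimal sketch of this step, assuming the tarball sits in the current directory and /home/hadoop-user is the install location (both paths are assumptions; adjust for your system):
[hadoop-user@master]$ tar -xvf hadoop-0.20.2-cdh3u3.tar -C /home/hadoop-user
[hadoop-user@master]$ export HADOOP_HOME=/home/hadoop-user/hadoop-0.20.2-cdh3u3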
Step 3 :- Now here we can configure 3 different types of environments:
1) Single
2) Pseudo
3) Clustered or Distributed Environment
Here we configure a clustered environment. How we deal with the cluster set-up depends on the number of nodes.
Concepts to know before this cluster set-up :-
Terminology :-
1) Master Node
2) Secondary Name-Node
3) Job-Tracker
4) Task-Tracker
5) Data-Node
We are configuring only two nodes here. So always remember: the MASTER (Name-Node) and JOB-TRACKER will be on one node, and the TASK-TRACKER and DATA-NODE will be on the other node. The Secondary Name-Node keeps a replica of the Name-Node's metadata; this does not mean it is a back-up for the Name-Node, but it stores the check-point from which we can recover if something happens to the Name-Node.
Note :-
========
The secondary name-node can be run on the same machine as the name-node, but for reasons of memory usage (the secondary has the same memory requirements as the primary), it is best to run it on a separate piece of hardware, especially for larger clusters. Machines running the name-nodes should typically run on 64-bit hardware to avoid the 3 GB limit on Java heap size in 32-bit architectures.
Steps to configure the HADOOP FILE SYSTEM (CDH3) :-
First Step :-
======
1) SSH CONFIGURATION FROM THE MASTER NODE TO SLAVE NODES.
[hadoop-user@master]$ which ssh
/usr/bin/ssh
[hadoop-user@master]$ which sshd
/usr/bin/sshd
[hadoop-user@master]$ which ssh-keygen
/usr/bin/ssh-keygen
[hadoop-user@master]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop-user/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop-user/.ssh/id_rsa.
Your public key has been saved in /home/hadoop-user/.ssh/id_rsa.pub.
After creating your key pair, your public key will be of the form:
[hadoop-user@master]$ more /home/hadoop-user/.ssh/id_rsa.pub
ssh-rsa
AAAAB3NzaC1yc2EAAAABIwAAAQEA1WS3RG8LrZH4zL2/1oYgkV1OmVclQ2OO5vRi0Nd
K51Sy3wWpBVHx82F3x3ddoZQjBK3uvLMaDhXvncJG31JPfU7CTAfmtgINYv0kdUbDJq4TKG/fuO5q
J9CqHV71thN2M310gcJ0Y9YCN6grmsiWb2iMcXpy2pqg8UM3ZKApyIPx99O1vREWm+4moFTg
YwIl5be23ZCyxNjgZFWk5MRlT1p1TxB68jqNbPQtU7fIafS7Sasy7h4eyIy7cbLh8x0/V4/mcQsY
5dvReitNvFVte6onl8YdmnMpAh6nwCvog3UeWWJjVZTEBFkTZuV1i9HeYHxpm1wAzcnf7az78jT
IRQ== hadoop-user@master
Distribute public key and validate logins :-
[hadoop-user@master]$ scp ~/.ssh/id_rsa.pub hadoop-user@target:~/master_key
[hadoop-user@target]$ mkdir ~/.ssh
[hadoop-user@target]$ chmod 700 ~/.ssh
[hadoop-user@target]$ mv ~/master_key ~/.ssh/authorized_keys
[hadoop-user@target]$ chmod 600 ~/.ssh/authorized_keys
After generating the key, you can verify
it’s correctly defined by attempting to log in to
the target node from the master:
[hadoop-user@master]$ ssh target
The authenticity of host 'target
(xxx.xxx.xxx.xxx)' can’t be established.
RSA key fingerprint is
72:31:d8:1b:11:36:43:52:56:11:77:a4:ec:82:03:1d.
Are you sure you want to continue
connecting (yes/no)? yes
Warning: Permanently added 'target' (RSA)
to the list of known hosts.
Last login: Sun Jan 4 15:32:22 2009 from
master
After confirming the authenticity of a
target node to the master node, you won’t be
prompted upon subsequent login attempts.
[hadoop-user@master]$ ssh target
Last login: Sun Jan 4 15:32:49 2009 from master
Configuring the Configuration Files for the Cluster set-up
Note :-
When we un-tar the tar file, under the CONF directory we can only see hadoop-default.xml, hdfs-default.xml and mapred-default.xml.
To make our changes, we create the site-specific counterparts of these main configuration files: hadoop-site.xml, mapred-site.xml and hdfs-site.xml.
We need to configure a few things before
running Hadoop. Let’s take a closer look at
the Hadoop configuration directory :
[hadoop-user@master]$ cd $HADOOP_HOME
[hadoop-user@master]$ ls -l conf/
-rw-rw-r-- 1 hadoop-user hadoop 2065 Dec 1 10:07
capacity-scheduler.xml
-rw-rw-r-- 1 hadoop-user hadoop 535 Dec 1 10:07 configuration.xsl
-rw-rw-r-- 1 hadoop-user hadoop 49456 Dec 1 10:07 hadoop-default.xml
-rwxrwxr-x 1 hadoop-user hadoop 2314 Jan 8 17:01 hadoop-env.sh
-rw-rw-r-- 1 hadoop-user hadoop 2234 Jan 2 15:29 hadoop-site.xml
-rw-rw-r-- 1 hadoop-user hadoop 2815 Dec 1 10:07 log4j.properties
-rw-rw-r-- 1 hadoop-user hadoop 28 Jan 2 15:29 masters
-rw-rw-r-- 1 hadoop-user hadoop 84 Jan 2 15:29 slaves
-rw-rw-r-- 1 hadoop-user hadoop 401 Dec 1 10:07 sslinfo.xml.example
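Also note hadoop-env.sh in this listing: it must point at the local JVM before any of the daemons will start. A minimal sketch, assuming the JDK lives at /usr/java/jdk1.6.0 (a hypothetical path; adjust for your machines):
# in $HADOOP_HOME/conf/hadoop-env.sh, on every node
export JAVA_HOME=/usr/java/jdk1.6.0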
Below is the main set-up that is supposed to go into the *.xml configuration files :-
core-site.xml
The key differences are:
■ We explicitly state the hostname for the location of the NameNode and JobTracker daemons.
■ We increase the HDFS replication factor to take advantage of distributed storage. Recall that data is replicated across HDFS to increase availability and reliability.
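As a minimal sketch of both points, assuming the NameNode and JobTracker live on a host named master (the hostname, the port numbers and the replication value are assumptions; adjust them for your cluster):
core-site.xml :-
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>   <!-- NameNode location -->
  </property>
</configuration>
mapred-site.xml :-
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>   <!-- JobTracker location -->
  </property>
</configuration>
hdfs-site.xml :-
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>   <!-- HDFS replication factor -->
  </property>
</configuration>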
We also need to update the
masters and slaves files to reflect the locations of the other
daemons.
[hadoop-user@master]$ cat masters
backup
[hadoop-user@master]$ cat slaves
hadoop1 (change these according to your hostnames)
hadoop2
hadoop3
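For the names in the masters and slaves files to work, every node must be able to resolve them. A sketch of the /etc/hosts entries on each node, with made-up example IP addresses:
192.168.1.10   master
192.168.1.11   backup
192.168.1.21   hadoop1
192.168.1.22   hadoop2
192.168.1.23   hadoop3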
After this, run $HADOOP_HOME/bin/start-all.sh.
It will start all the daemons on the master and slave nodes.
Note :- By default the configuration file name is hadoop-default.xml. If you want to make changes, you can copy the existing hadoop-default.xml to hadoop-site.xml and put your overrides there.
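A sketch of that copy step (after copying, edit only the properties you want to override):
[hadoop-user@master]$ cd $HADOOP_HOME/conf
[hadoop-user@master]$ cp hadoop-default.xml hadoop-site.xml   # then edit hadoop-site.xml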
OK. So after configuring all of the above, we are ready to bring up the services.
Now export HADOOP_HOME/bin to the PATH:
export PATH=$HADOOP_HOME/bin:$PATH
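To make this survive new shells, the same lines can go in the hadoop user's ~/.bashrc (a sketch, assuming the install path from Step 2):
# in ~/.bashrc of hadoop-user, on every node
export HADOOP_HOME=/home/hadoop-user/hadoop-0.20.2-cdh3u3
export PATH=$HADOOP_HOME/bin:$PATH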
Bringing up the services :-
$HADOOP_HOME/bin/start-all.sh
Once we issue this command, it will bring up the services on the nodes mentioned in the masters and slaves files.
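One caveat worth noting: on a brand-new cluster, HDFS must be formatted once on the master before the very first start (a one-time step; re-running it wipes HDFS):
[hadoop-user@master]$ hadoop namenode -format   # first-ever start only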
How can we validate :-
Once all the services are up, we get the web interface URLs :-
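On a stock 0.20/CDH3 install, the defaults are the NameNode UI at http://master:50070/ and the JobTracker UI at http://master:50030/ (replace master with your Name-Node's hostname). We can also check on each node that the expected daemons are running, using the JDK's jps tool:
[hadoop-user@master]$ jps   # NameNode and JobTracker should appear here
[hadoop-user@target]$ jps   # DataNode and TaskTracker should appear on the slaves
The backup node from the masters file should likewise show the SecondaryNameNode.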
Regards,
Naga.

