<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-4951644657805240812</id><updated>2012-02-16T14:34:40.300-08:00</updated><category term='Hadoop Installation Apache Setup Pseudo Single Cluster  MapReduce Pseudo-Distributed Cloudera Hive HBase'/><title type='text'>Computing Junkie</title><subtitle type='html'>Basics of Java, Distributed Computing and J2EE</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://akkinenivijay.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4951644657805240812/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://akkinenivijay.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Vijay Akkineni</name><uri>http://www.blogger.com/profile/15000845775448444707</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>1</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-4951644657805240812.post-4338632642076531331</id><published>2011-07-10T13:32:00.001-07:00</published><updated>2011-07-11T13:39:27.886-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Hadoop Installation Apache Setup Pseudo Single Cluster  MapReduce Pseudo-Distributed Cloudera Hive HBase'/><title type='text'>Hadoop Setup 
- Pseudo Cluster Installation Setup</title><content type='html'>Hello Everybody, &lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;In this post I will go over installing Hadoop on a single machine for development purposes.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;       Start by creating an admin user to run Hadoop. Although it is easy to run Hadoop under your own account, I prefer a dedicated user, so a user named "hadoop" is created for this purpose.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;     Download and extract the latest Hadoop release tarball from the Apache site &lt;a href="http://hadoop.apache.org/mapreduce/releases.html"&gt;http://hadoop.apache.org/mapreduce/releases.html&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;&lt;i&gt;&lt;b&gt;tar xzf hadoop-0.21.0.tar.gz&lt;/b&gt;&lt;/i&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div style="text-align: justify;"&gt;&lt;i&gt;    &lt;/i&gt;Let's create an environment variable HADOOP_INSTALL that points to the Hadoop installation, and update the PATH so that we can run hadoop commands from any location.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;&lt;b&gt;export HADOOP_INSTALL=/Users/hadoop/hadoop-0.21.0&lt;/b&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt;&lt;b&gt; &lt;/b&gt;&lt;/span&gt;&lt;b&gt;export PATH=$PATH:$HADOOP_INSTALL/bin&lt;/b&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div style="text-align: justify;"&gt;   By default, Hadoop is configured for "&lt;b&gt;Standalone Mode&lt;/b&gt;". In standalone mode the default filesystem is the local file system native to the operating system. 
Hadoop configuration is mainly controlled by three files: &lt;i&gt;&lt;b&gt;core-site.xml, hdfs-site.xml and mapred-site.xml&lt;/b&gt;.&lt;/i&gt;&lt;/div&gt;&lt;div&gt;&lt;i&gt;&lt;br /&gt;&lt;/i&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;i&gt;    &lt;/i&gt;The core-site.xml file holds the Hadoop core settings; the available parameters are documented at &lt;b&gt;$HADOOP_INSTALL/common/docs/core-default.html. &lt;/b&gt;For a pseudo cluster configuration, add the following property to core-site.xml:&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;fs.default.name = hdfs://localhost:9000&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size:100%;color:#FFFFFF;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px; -webkit-border-horizontal-spacing: 1px; -webkit-border-vertical-spacing: 1px;"&gt;&lt;span class="Apple-style-span" style="font-size:130%;color:#000000;"&gt;&lt;span class="Apple-style-span" style="font-size: 16px; white-space: pre; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px;"&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;b&gt;  &lt;/b&gt;The hdfs-site.xml file controls the Hadoop Distributed File System settings; its parameters are documented at &lt;b&gt;$HADOOP_INSTALL/hdfs/docs/hdfs-default.html&lt;/b&gt;. 
Pseudo Cluster setup properties are mentioned below.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;dfs.replication = 1&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;   Mapred-site.xml is the configuration file that controls the properties for Map Reduce Job Tracker and Task tracker settings and the properties can be found at &lt;b&gt;$HADOOP_INSTALL/docs/mapred-default.html. &lt;/b&gt;Pseudo Cluster setup properties are given below.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;mapred.job.tracker = localhost:9001&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: georgia; "&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span class="Apple-style-span" style="font-family: georgia; "&gt;Before adding any files to hdfs you can format the Hadoop file system with the command "&lt;i&gt;&lt;b&gt;hadoop namenode -format&lt;/b&gt;"&lt;/i&gt; . &lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;span class="Apple-style-span" style="font-family:georgia;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;span class="Apple-style-span" style="font-family:georgia;"&gt;Hadoop requires SSH access to the local node as well as remote nodes, for our pseudo cluster setup we need to configure ssh access to localhost for the user "hadoop". 
Follow the steps below to create a DSA key pair and add the public key to the authorized keys.&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;span class="Apple-style-span" style="font-family:georgia;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;ul&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="line-height: 15px; background-color: rgb(255, 255, 255); "&gt;&lt;span class="codefrag"&gt;&lt;span class="Apple-style-span"&gt;&lt;i&gt;&lt;b&gt;ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa&lt;/b&gt;&lt;/i&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span" style="line-height: 15px; background-color: rgb(255, 255, 255); "&gt;&lt;span class="codefrag"&gt;&lt;i&gt;&lt;b&gt;cat ~/.ssh/id_dsa.pub &amp;gt;&amp;gt; ~/.ssh/authorized_keys&lt;/b&gt;&lt;/i&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span" style="line-height: 15px;"&gt;Running "&lt;i&gt;&lt;b&gt;ssh localhost&lt;/b&gt;&lt;/i&gt;" confirms whether the above steps worked. To debug SSH you can run "&lt;i&gt;ssh -vvv localhost&lt;/i&gt;" and investigate the error in detail.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span" style="line-height: 15px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: georgia; "&gt;Hadoop is now ready to run in a pseudo cluster configuration. &lt;/span&gt;It can be started with "start-all.sh", which starts all the required daemons. 
If you want finer control over which components are started, you can start HDFS with "start-dfs.sh" and the MapReduce trackers with "start-mapred.sh". If your configuration files lie outside the conf directory, their location can be passed to these scripts with "&lt;i&gt;--config path-to-conf directory&lt;/i&gt;".&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Run "start-all.sh", then use the JVM process tool "jps" to check whether the Hadoop Java processes have started as expected. The five components that get started are NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Hadoop comes with several web interfaces, which are available at the locations below.&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="background-color: rgb(255, 255, 255); "&gt;&lt;span class="Apple-style-span" style="border-style: initial; border-color: initial; "&gt;&lt;a href="http://localhost:50030/" style="font-family: sans-serif; "&gt;&lt;span class="Apple-style-span"&gt;http://localhost:50030/&lt;/span&gt;&lt;/a&gt;&lt;span class="Apple-style-span" style="font-family: sans-serif; "&gt; - &lt;/span&gt;&lt;span class="Apple-style-span"&gt;Web UI for MapReduce Job Trackers&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="background-color: rgb(255, 255, 255); "&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span" style="border-style: initial; border-color: initial; "&gt;&lt;a href="http://localhost:50060/" style="font-family: sans-serif; "&gt;&lt;span class="Apple-style-span"&gt;http://localhost:50060/&lt;/span&gt;&lt;/a&gt;&lt;span class="Apple-style-span"&gt; - &lt;/span&gt;&lt;span class="Apple-style-span"&gt;Web UI for Task Trackers&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="background-color: 
rgb(255, 255, 255); "&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span" style="border-style: initial; border-color: initial; "&gt;&lt;a href="http://localhost:50070/" style="font-family: sans-serif; "&gt;&lt;span class="Apple-style-span"&gt;http://localhost:50070/&lt;/span&gt;&lt;/a&gt;&lt;span class="Apple-style-span"&gt; - &lt;/span&gt;&lt;span class="Apple-style-span"&gt;Web UI for the HDFS NameNode&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family:georgia;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;span class="Apple-style-span" style="font-family:georgia;"&gt;In the next part of the series I will post on how to set up Cloudera's Distribution of Hadoop and HBase.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family:georgia;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family:georgia;"&gt;Cheers,&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family:georgia;"&gt;Vijay Akkineni.&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4951644657805240812-4338632642076531331?l=akkinenivijay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://akkinenivijay.blogspot.com/feeds/4338632642076531331/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4951644657805240812&amp;postID=4338632642076531331' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4951644657805240812/posts/default/4338632642076531331'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/4951644657805240812/posts/default/4338632642076531331'/><link rel='alternate' type='text/html' href='http://akkinenivijay.blogspot.com/2011/07/hadoop-setup-pseudo-cluster.html' title='Hadoop Setup - Pseudo Cluster Installation Setup'/><author><name>Vijay Akkineni</name><uri>http://www.blogger.com/profile/15000845775448444707</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
