Set up a Hadoop cluster with HDFS HA and YARN HA, based on Hadoop 2.7.1.


Installation plan
| Role | IP/hostname | Installed software | Running processes |
|---|---|---|---|
| namenode1 | zdh-240 | hadoop | NameNode, DFSZKFailoverController, ResourceManager |
| namenode2 | zdh-245 | hadoop | NameNode, DFSZKFailoverController, ResourceManager |
| datanode1 | zdh-237 | hadoop, zookeeper | DataNode, QuorumPeerMain, JournalNode, NodeManager |
| datanode2 | zdh-238 | hadoop, zookeeper | DataNode, QuorumPeerMain, JournalNode, NodeManager |
| datanode3 | zdh-239 | hadoop, zookeeper | DataNode, QuorumPeerMain, JournalNode, NodeManager |
Installation user
garrison/zdh1234
Map IP addresses to node names
cat /etc/hosts
vi /etc/hosts
10.43.159.237 zdh-237
10.43.159.238 zdh-238
10.43.159.239 zdh-239
10.43.159.240 zdh-240
10.43.159.245 zdh-245
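Every node needs all five entries, so it can help to check a hosts file before going further. The following is a minimal sketch, not part of the original procedure; the function name missing_hosts is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print each NAME that has no entry in HOSTS_FILE.
# Usage: missing_hosts HOSTS_FILE NAME...
missing_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        # Ignore comment lines, then look for the name in any column after the IP.
        if ! awk -v n="$name" '
            /^[[:space:]]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts_file"; then
            echo "$name"
        fi
    done
}
```

For example, `missing_hosts /etc/hosts zdh-237 zdh-238 zdh-239 zdh-240 zdh-245` prints nothing when all five names resolve from the file.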
1. Create the user (the user name must be the same on every node)
groupadd hadoop
useradd -g hadoop -s /bin/bash -md /home/garrison garrison
passwd garrison
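2. Set up passwordless local login
The private key created here is the one that dfs.ha.fencing.ssh.private-key-files in hdfs-site.xml must point to. The reference configuration in 6.2 uses ~/.ssh/id_rsa, so either generate an RSA key or change that property to id_dsa.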
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Verify passwordless login
ssh localhost
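Set up passwordless remote login: the local machine's public key must be added to the remote machine's authorized_keys before that machine can be reached without a password.
Log in to zdh-238 as garrison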
scp ~/.ssh/authorized_keys garrison@zdh-237:~/.ssh/authorized_keys_from_zdh-238
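On zdh-237, go to garrison's .ssh directory. Back up authorized_keys first, otherwise the following steps can leave duplicate ywmaster public keys in it.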
cat authorized_keys_from_zdh-238 >> authorized_keys
Back on zdh-238 as garrison, zdh-237 can now be reached without a password
ssh zdh-237
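Copy the keys of zdh-239 and the other nodes to zdh-237 in the same way, so that every machine can log in to zdh-237 without a password
Then distribute authorized_keys from zdh-237 to zdh-238 and the other nodes, so that all machines can log in to each other without a password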
scp ~/.ssh/authorized_keys garrison@zdh-238:~/.ssh/authorized_keys
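Appending with `>>` repeatedly is what produces the duplicate keys mentioned above. One way to avoid that is to append only lines that are not already present. This is a sketch under that assumption; merge_keys is a hypothetical helper name, not something the original procedure defines.

```shell
#!/bin/sh
# Hypothetical helper: append the keys in SRC to DEST, skipping lines DEST already has.
# Usage: merge_keys DEST SRC
merge_keys() {
    dest=$1
    src=$2
    touch "$dest"
    tmp=$(mktemp)
    # Remember every line of DEST, then keep only new, non-empty lines from SRC.
    # FILENAME == ARGV[1] is used instead of NR == FNR so an empty DEST still works.
    awk 'FILENAME == ARGV[1] { seen[$0] = 1; next } NF && !seen[$0]++' "$dest" "$src" > "$tmp"
    cat "$tmp" >> "$dest"
    rm -f "$tmp"
    # sshd ignores authorized_keys files that are writable by others.
    chmod 600 "$dest"
}
```

On zdh-237 this could replace the `cat ... >> authorized_keys` step, for example `merge_keys ~/.ssh/authorized_keys ~/.ssh/authorized_keys_from_zdh-238`; running it twice leaves the file unchanged.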
3. Copy the installation packages
scp root@10.43.159.41:/home/xiehh/.tar.gz .
scp root@10.43.159.41:/home/ling/java/jdk-7u80-linux-x64.tar.gz .
scp garrison@zdh-237:/home/garrison/.tar.gz .
4. Install the JDK
tar -zxvf jdk-7u80-linux-x64.tar.gz
mv jdk-7u80-linux-x64.tar.gz backup/
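The JDK is extracted to /home/garrison/jdk1.7.0_80
Configure the environment variables. They must go in .bashrc; otherwise commands run in the background (non-interactively) will not find them.
Create .bash_profile and have it load .bashrc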
vi .bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
vi .bashrc
export JAVA_HOME=~/jdk1.7.0_80
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
source .bashrc
Verify the JDK
java -version
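Because the same export lines have to be added to .bashrc on every node, and the procedure may be repeated, an idempotent append avoids piling up duplicates. A minimal sketch; append_once is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: append LINE to FILE unless an identical line is already there.
# Usage: append_once FILE LINE
append_once() {
    file=$1
    line=$2
    touch "$file"
    # -x matches whole lines, -F treats LINE literally, -e protects a leading dash.
    grep -q -x -F -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example, `append_once ~/.bashrc 'export JAVA_HOME=~/jdk1.7.0_80'` can be run any number of times and adds the line only once.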
5. Install ZooKeeper
Extract the ZooKeeper package
tar -zxvf zookeeper-3.5.1-alpha.tar.gz
mv zookeeper-3.5.1-alpha.tar.gz backup/
In the zookeeper-3.5.1-alpha/conf/ directory, run
mv zoo_sample.cfg zoo.cfg
Edit zoo.cfg (the default client port is 2181; dataDir is where ZooKeeper keeps its data, including the myid file):
dataDir=/home/garrison/zookeeper-3.5.1-alpha/tmp
Append the following at the end of the file to configure the ports the ZooKeeper ensemble members use to talk to each other:
server.1=zdh-237:2888:3888
server.2=zdh-238:2888:3888
server.3=zdh-239:2888:3888
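Then create a tmp directory on each of the three ZooKeeper nodes (zdh-237 and the others):
mkdir ~/zookeeper-3.5.1-alpha/tmp
Then create an empty myid file inside it (ZooKeeper reads myid from dataDir, not from the system /tmp):
touch ~/zookeeper-3.5.1-alpha/tmp/myid
Copy ZooKeeper from zdh-237 to zdh-238 and the other nodes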
scp -r garrison@zdh-237:/home/garrison/zookeeper-3.5.1-alpha .
Finally, write the node ID to that file:
On zdh-237: echo 1 > ~/zookeeper-3.5.1-alpha/tmp/myid
On zdh-238: echo 2 > ~/zookeeper-3.5.1-alpha/tmp/myid
On zdh-239: echo 3 > ~/zookeeper-3.5.1-alpha/tmp/myid
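The ID written to myid has to match the N of that host's server.N line in zoo.cfg. Rather than typing it by hand on each node, it can be derived from the file. This is a sketch, assuming the server.N=host:port:port layout shown above; myid_for is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the N of the "server.N=HOST:..." line in ZOO_CFG
# whose HOST equals the given name; exit non-zero if there is none.
# Usage: myid_for ZOO_CFG HOST
myid_for() {
    zoo_cfg=$1
    host=$2
    awk -v h="$host" '
        /^server\.[0-9]+=/ {
            split($0, kv, "=")          # kv[1] = "server.N", kv[2] = "host:port:port"
            id = substr(kv[1], 8)       # drop the 7-character "server." prefix
            split(kv[2], addr, ":")
            if (addr[1] == h) { print id; found = 1 }
        }
        END { exit !found }' "$zoo_cfg"
}
```

On each ZooKeeper node this could be used as `myid_for ~/zookeeper-3.5.1-alpha/conf/zoo.cfg "$(hostname)" > ~/zookeeper-3.5.1-alpha/tmp/myid`, provided the output of `hostname` is the same name used in zoo.cfg.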
Configure environment variables for later convenience:
export ZOOKEEPER_HOME=~/zookeeper-3.5.1-alpha
export PATH=$PATH:$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/conf
Start ZooKeeper from the zookeeper-3.5.1-alpha/bin/ directory
./zkServer.sh start
Check the status:
./zkServer.sh status
Stop ZooKeeper:
./zkServer.sh stop
Verify by connecting with the client:
./zkCli.sh -server zdh-237:2181
List the root znode:
ls /
6. Install Hadoop
tar -zxvf hadoop-2.7.1.tar.gz
mv hadoop-2.7.1.tar.gz backup/
Configure the environment variables
export HADOOP_HOME=~/hadoop-2.7.1
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
An alias for jumping to the configuration directory
alias conf='cd /home/garrison/hadoop-2.7.1/etc/hadoop'
Files to modify:
- 6.1 core-site.xml
- 6.2 hdfs-site.xml
- 6.3 yarn-site.xml
- 6.4 mapred-site.xml
- 6.5 slaves
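ZooKeeper quorum used in these files: zdh-237:2181,zdh-238:2181,zdh-239:2181
If directories such as /data/hadoop/dfs/name/current do not exist, change the configured paths, typically by adding the prefix /home/garrison/hadoop-2.7.1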
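The local directories named in the reference configuration can be created up front so that no daemon fails on a missing path. A sketch only: make_data_dirs is a hypothetical helper, and the directory list is taken from the property values in 6.1 to 6.3. The NameNode directory is left out on the assumption that the format step creates it.

```shell
#!/bin/sh
# Hypothetical helper: create the data directories used by the reference
# configuration under BASE (for example /home/garrison/hadoop-2.7.1).
# Usage: make_data_dirs BASE
make_data_dirs() {
    base=$1
    for d in \
        data/hadoop/tmp/journal \
        data/hadoop/dfs/data \
        data/hadoop/yarn/local \
        data/log/hadoop
    do
        # -p creates parents as needed and is a no-op if the directory exists.
        mkdir -p "$base/$d" || return 1
    done
}
```

Running `make_data_dirs /home/garrison/hadoop-2.7.1` on every node is harmless to repeat.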
For the configuration in 6.1-6.4, see this document:
Setting up Hadoop 2.6.0 HA and YARN HA
http://www.aboutyun.com/thread-10572-1-1.html
6.4
Create mapred-site.xml
6.5
Edit slaves (list every slave node)
zdh-237
zdh-238
zdh-239
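Copy Hadoop from zdh-240 to the other nodes (run on each target node). The first command copies the whole installation; the last two replace only the configuration directory when it changes later.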
scp -r garrison@zdh-240:/home/garrison/hadoop-2.7.1 .
rm -r /home/garrison/hadoop-2.7.1/etc/hadoop
scp -r garrison@zdh-240:~/hadoop-2.7.1/etc/hadoop ~/hadoop-2.7.1/etc/hadoop
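Configure the Hadoop environment variables on every machine in the cluster
On zdh-245, change rm1 to rm2 in yarn-site.xml
<!-- Configure rm1 on namenode1 and rm2 on namenode2.
Note: configured files are usually copied to the other machines as they are,
but this value must be changed on the other YARN machine. -->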
<property>
<name>yarn.resourcemanager.ha.id</name>
<value>rm1</value>
<description>If we want to launch more than one RM in single node, we need this configuration</description>
</property>
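Since the configuration directory is copied from zdh-240, this one value is easy to forget on zdh-245. The edit can be scripted. The sketch below assumes the `<value>` sits on the line directly after the `<name>` line, as in the snippet above; set_rm_id is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: set the <value> that follows
# <name>yarn.resourcemanager.ha.id</name> in FILE to NEW_ID.
# Usage: set_rm_id FILE NEW_ID
set_rm_id() {
    file=$1
    new_id=$2
    tmp=$(mktemp)
    awk -v id="$new_id" '
        # The previous line was the ha.id name: rewrite the value on this line.
        prev_is_name { sub(/<value>[^<]*<\/value>/, "<value>" id "</value>"); prev_is_name = 0 }
        # Dots are escaped so ha.rm-ids and other properties are not matched.
        /<name>yarn\.resourcemanager\.ha\.id<\/name>/ { prev_is_name = 1 }
        { print }' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

On zdh-245 this would be `set_rm_id ~/hadoop-2.7.1/etc/hadoop/yarn-site.xml rm2`. Other properties that mention rm1, such as yarn.resourcemanager.ha.rm-ids, are left untouched.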
Start the JournalNodes (all of them can be started from namenode1)
Go to the hadoop-2.7.1 directory
sbin/hadoop-daemons.sh start journalnode
Or log in to datanode1, datanode2 and datanode3 individually and run
sbin/hadoop-daemon.sh start journalnode
Stop a JournalNode:
hadoop-daemon.sh stop journalnode
(Run jps to verify: a JournalNode process should now be listed)
7. Start Hadoop and YARN
Format HDFS by running this command on namenode1:
hdfs namenode -format
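Formatting creates the NameNode metadata files under dfs.namenode.name.dir (which defaults to a path below hadoop.tmp.dir from core-site.xml)
Start the NameNode process on namenode1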
sbin/hadoop-daemon.sh start namenode
On namenode2, run the following to copy the metadata from the active node to the standby
hdfs namenode -bootstrapStandby
Format ZK (running it on namenode1 is enough). This creates the hadoop-ha znode in the ZooKeeper ensemble,
which is used to manage switching between the active and standby NameNode
hdfs zkfc -formatZK
Start HDFS (on namenode1)
sbin/start-dfs.sh
Start YARN (on namenode1 and namenode2)
sbin/start-yarn.sh
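When this command is run on namenode2 it reports that the NodeManagers are already running; those messages can be ignored.
The point is to start the ResourceManager on namenode2 so that it backs up the one on namenode1. The ResourceManager can also be started on its own with sbin/yarn-daemon.sh start resourcemanager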
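The order of the first start-up matters, and each step belongs to a specific host. The sketch below only prints the sequence from this section as "host: command" lines for review; it executes nothing, since formatting is destructive. first_start_plan is a hypothetical name, and it uses `hdfs namenode -format`, the non-deprecated spelling of the format command in Hadoop 2.x.

```shell
#!/bin/sh
# Hypothetical helper: print the first-time start-up steps in order.
# Assumes the JournalNodes and ZooKeeper are already running.
# Usage: first_start_plan NAMENODE1 NAMENODE2
first_start_plan() {
    nn1=$1
    nn2=$2
    # One "host: command" line per step; commands are relative to the Hadoop directory.
    cat <<EOF
$nn1: hdfs namenode -format
$nn1: sbin/hadoop-daemon.sh start namenode
$nn2: hdfs namenode -bootstrapStandby
$nn1: hdfs zkfc -formatZK
$nn1: sbin/start-dfs.sh
$nn1: sbin/start-yarn.sh
$nn2: sbin/start-yarn.sh
EOF
}
```

`first_start_plan zdh-240 zdh-245` gives a checklist for this cluster. The format, bootstrapStandby and formatZK steps are needed only the first time.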
8. Check node status
After start-up, check that one NameNode is Active and the other is Standby
http://10.43.159.240:50070
http://10.43.159.245:50070
On namenode1, check the state of nn1 and nn2:
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
On namenode1, check that rm1 and rm2 are active and standby respectively
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
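The four state queries above each print a single word, so the "one active, one standby" condition can be checked mechanically. A minimal sketch; one_active is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if exactly one of the given states is
# "active" and every other one is "standby".
# Usage: one_active STATE...
one_active() {
    active=0
    for state in "$@"; do
        case $state in
            active) active=$((active + 1)) ;;
            standby) ;;
            *) return 1 ;;   # anything else (empty output, error text) counts as failure
        esac
    done
    [ "$active" -eq 1 ]
}
```

For HDFS this could be called as `one_active "$(hdfs haadmin -getServiceState nn1)" "$(hdfs haadmin -getServiceState nn2)"`, and likewise with `yarn rmadmin -getServiceState` for rm1 and rm2.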
Or check the state in the web UI:
http://10.43.159.240:8188
http://10.43.159.245:8188 (redirects to zdh-240)
hadoop
zdh-240 active
zdh-245 standby
yarn
zdh-240 rm1 active
zdh-245 rm2 standby
A quick Hadoop check
hadoop jar ~/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount hdfs://gagcluster/usr/wordcout.txt /usr/wordresult_001
hadoop fs -text /usr/wordresult_001/part-r-00000
9. Verify HA
YARN: kill the ResourceManager on zdh-240
http://10.43.159.240:8188/cluster/apps is unreachable; the connection to rm1 fails
http://10.43.159.245:8188/cluster/apps is reachable; rm2 is active
HDFS: kill the NameNode on zdh-240
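For a manual switch, identify the ID of the NameNode that should become active; here namenode1 (nn1) is made active. With automatic failover enabled, as in this configuration, haadmin rejects the --forcefence and --forceactive flags, so they are omitted:
hdfs haadmin -failover nn2 nn1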
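With automatic failover, the standby node became active only after the killed active node had been restarted,
and the killed node then came back as standby. Normally the standby takes over without that restart; needing it may indicate that fencing did not succeed, for example because the private key file configured for sshfence does not exist.
Another approach is to stop the DFSZKFailoverController process on the NameNode that is currently active.
Run this on the NameNode that should become standby: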
hadoop-daemon.sh stop zkfc
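The kill steps in this section need the process ID of the daemon, which jps lists as "PID ClassName" lines. A small filter can pick it out. This is a sketch; pids_of is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read jps output ("PID ClassName" per line) on stdin
# and print the PID of every process whose class name equals CLASS.
# Usage: jps | pids_of CLASS
pids_of() {
    awk -v c="$1" '$2 == c { print $1 }'
}
```

On zdh-240, `jps | pids_of NameNode` prints the PID to pass to kill; the same works for ResourceManager and DFSZKFailoverController.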
10. Another way to configure passwordless login
10.1. Generate the SSH key pair
Machines:
On zte-1, zte-2 and zte-3, as the hdfs user, in the home directory
Command:
ssh-keygen
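Notes:
Press Enter three times after running the command, leaving all three prompts empty
10.2. Configure passwordless SSH login for the hdfs user
Machines:
On zte-1, zte-2 and zte-3, as the hdfs user
Commands: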
ssh-copy-id hdfs@zdh-7
ssh-copy-id hdfs@zdh-9
ssh-copy-id hdfs@zdh-11
Notes:
Answer yes at the (yes/no) prompt, and enter the password when asked for it.
When output similar to the following appears:
Now try logging into the machine, with "ssh 'hdfs@zdh-1'",
and check in:.ssh/authorized_keys
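the key was installed successfully. If the output shows ….can’t established, an error occurred.
10.3. The steps above can be streamlined
On each of zdh-7, zdh-9 and zdh-11, run
ssh-copy-id -i hdfs@zdh-7
then copy .ssh/authorized_keys and .ssh/known_hosts from zdh-7 to the other machines.
Reference: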
6.1 Add the following properties to core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://gagcluster</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/garrison/hadoop-2.7.1/data/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>hadoop.proxyuser.hduser.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hduser.groups</name>
<value>*</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>zdh-237:2181,zdh-238:2181,zdh-239:2181</value>
</property>
</configuration>
6.2 Add the following properties to hdfs-site.xml
<configuration>
<property>
<name>dfs.nameservices</name>
<value>gagcluster</value>
</property>
<property>
<name>dfs.ha.namenodes.gagcluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.gagcluster.nn1</name>
<value>zdh-240:9000</value>
</property>
<property>
<name>dfs.namenode.rpc-address.gagcluster.nn2</name>
<value>zdh-245:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.gagcluster.nn1</name>
<value>zdh-240:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.gagcluster.nn2</name>
<value>zdh-245:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://zdh-237:8485;zdh-238:8485;zdh-239:8485/gagcluster</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.gagcluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/garrison/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/garrison/hadoop-2.7.1/data/hadoop/tmp/journal</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/garrison/hadoop-2.7.1/data/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/garrison/hadoop-2.7.1/data/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.journalnode.http-address</name>
<value>0.0.0.0:8480</value>
</property>
<property>
<name>dfs.journalnode.rpc-address</name>
<value>0.0.0.0:8485</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>zdh-237:2181,zdh-238:2181,zdh-239:2181</value>
</property>
</configuration>
6.3 Add the following properties to yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.connect.retry-interval.ms</name>
<value>2000</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>zdh-237:2181,zdh-238:2181,zdh-239:2181</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>zdh-240</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>zdh-245</value>
</property>
<property>
<name>yarn.resourcemanager.ha.id</name>
<value>rm1</value>
<description>If we want to launch more than one RM in single node, we need this configuration</description>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.zk-state-store.address</name>
<value>zdh-237:2181,zdh-238:2181,zdh-239:2181</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>zdh-237:2181,zdh-238:2181,zdh-239:2181</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>gagcluster-yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
<value>5000</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>zdh-240:8132</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>zdh-240:8130</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>zdh-240:8188</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>zdh-240:8131</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm1</name>
<value>zdh-240:8033</value>
</property>
<property>
<name>yarn.resourcemanager.ha.admin.address.rm1</name>
<value>zdh-240:23142</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>zdh-245:8132</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>zdh-245:8130</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>zdh-245:8188</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>zdh-245:8131</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm2</name>
<value>zdh-245:8033</value>
</property>
<property>
<name>yarn.resourcemanager.ha.admin.address.rm2</name>
<value>zdh-245:23142</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/home/garrison/hadoop-2.7.1/data/hadoop/yarn/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/home/garrison/hadoop-2.7.1/data/log/hadoop</value>
</property>
<property>
<name>mapreduce.shuffle.port</name>
<value>23080</value>
</property>
<property>
<name>yarn.client.failover-proxy-provider</name>
<value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
<value>/yarn-leader-election</value>
<description>Optional setting. The default value is /yarn-leader-election</description>
</property>
</configuration>
6.4 Add the following properties to mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>0.0.0.0:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>0.0.0.0:19888</value>
</property>
</configuration>
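The same ZooKeeper quorum string appears in several of these files, and a typo in one of them is hard to spot by eye. A small lookup makes it possible to compare values across files. The sketch assumes the layout used above, with `<value>` on the line directly after `<name>`; get_prop is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the <value> found on the line that follows
# <name>NAME</name> in a Hadoop *-site.xml file.
# Usage: get_prop FILE NAME
get_prop() {
    file=$1
    name=$2
    awk -v n="<name>$name</name>" '
        want {
            # "<value>" is 7 characters and "</value>" is 8, so strip 15 in total.
            if (match($0, /<value>[^<]*<\/value>/))
                print substr($0, RSTART + 7, RLENGTH - 15)
            want = 0
        }
        index($0, n) { want = 1 }' "$file"
}
```

For example, from the configuration directory, `[ "$(get_prop core-site.xml ha.zookeeper.quorum)" = "$(get_prop yarn-site.xml yarn.resourcemanager.zk-address)" ]` succeeds when the two files agree on the quorum.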