This example uses a cluster built from three machines.
1. Check the processes on all three machines
[hadoop@ruozedata001 hadoop]$ vi jps.sh
#!/bin/bash
echo "-----------------ruozedata001 process---------------"
ssh ruozedata001 "$JAVA_HOME/bin/jps"
echo " "
echo "-----------------ruozedata002 process---------------"
ssh ruozedata002 "$JAVA_HOME/bin/jps"
echo " "
echo "-----------------ruozedata003 process---------------"
ssh ruozedata003 "$JAVA_HOME/bin/jps"
echo " "
[hadoop@ruozedata001 hadoop]$ ./jps.sh
----------ruozedata001 process------------
3385 Jps
----------ruozedata002 process------------
2903 Jps
----------ruozedata003 process------------
2611 Jps
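With more machines, the three per-host blocks in jps.sh can be collapsed into a loop. A minimal sketch using the hostnames above; the `REMOTE` dry-run switch is an addition for illustration, not part of the original script:

```shell
#!/bin/bash
# Loop version of jps.sh. REMOTE defaults to ssh; set REMOTE=echo
# to print the commands instead of executing them (dry run).
HOSTS="ruozedata001 ruozedata002 ruozedata003"
REMOTE=${REMOTE:-ssh}

jps_all() {
  local host
  for host in $HOSTS; do
    echo "-----------------$host process---------------"
    $REMOTE "$host" "$JAVA_HOME/bin/jps"
    echo " "
  done
}
```

Calling `jps_all` then produces the same per-host sections as the script above.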
2. Script to sync config files
[hadoop@ruozedata001 hadoop]$ vi sync_hadoop.sh
#!/bin/bash -x
HADOOP_CONF=/home/hadoop/app/hadoop/etc/hadoop/
cd $HADOOP_CONF
# With only a few machines, listing them one by one is fine; with many, use a for loop
scp hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml ruozedata002:$HADOOP_CONF
scp hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml ruozedata003:$HADOOP_CONF
exit 0
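As the comment above notes, a for loop scales better than listing hosts by hand. A sketch of that variant; the `COPY` dry-run switch is an addition for illustration (set it to `scp` — the default — for real use):

```shell
#!/bin/bash
# Loop version of sync_hadoop.sh: push the same config files to every
# worker host. COPY defaults to scp; set COPY=echo for a dry run.
HADOOP_CONF=/home/hadoop/app/hadoop/etc/hadoop
FILES="hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml"
COPY=${COPY:-scp}

sync_conf() {
  local host f
  for host in ruozedata002 ruozedata003; do
    for f in $FILES; do
      $COPY "$HADOOP_CONF/$f" "$host:$HADOOP_CONF/"
    done
  done
}
```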
3. Cluster startup shell script
3.1 Write the startup script
[hadoop@ruozedata001 hadoop]$ vi start_cluster.sh
#!/bin/bash -x
# Start ZooKeeper on all three nodes
ssh ruozedata001 "$ZOOKEEPER_HOME/bin/zkServer.sh start"
ssh ruozedata002 "$ZOOKEEPER_HOME/bin/zkServer.sh start"
ssh ruozedata003 "$ZOOKEEPER_HOME/bin/zkServer.sh start"
# Sleep 5s: give ZooKeeper time to come up before starting Hadoop
sleep 5
#start hdfs+yarn+jobhistory
/home/hadoop/app/hadoop/sbin/start-all.sh
sleep 5s
# Start the second ResourceManager on ruozedata002, then the JobHistoryServer on this machine
ssh ruozedata002 "/home/hadoop/app/hadoop/sbin/yarn-daemon.sh start resourcemanager"
/home/hadoop/app/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
# exit
exit 0
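The fixed `sleep 5` in the script above is a guess: if ZooKeeper is slow to start, `start-all.sh` may still run too early. A retry loop is more robust. In this sketch, `CHECK` is a stub hook invented for illustration; on the real cluster it might be something like `$ZOOKEEPER_HOME/bin/zkServer.sh status` (an assumption, not taken from the script above):

```shell
#!/bin/bash
# Poll until the CHECK command succeeds, up to a maximum number of tries.
# CHECK defaults to false, so the caller must supply a real probe.
CHECK=${CHECK:-false}

wait_for_zk() {
  local max=${1:-10} tries=0
  until $CHECK; do
    tries=$((tries + 1))
    if [ "$tries" -ge "$max" ]; then
      return 1                      # give up after max attempts
    fi
    sleep 1
  done
  return 0
}
```

start_cluster.sh could then replace `sleep 5` with `wait_for_zk 10 || exit 1`.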
3.2 Run the script and check the processes
# Run the script
[hadoop@ruozedata001 hadoop]$ ./start_cluster.sh
# Startup log:
+ ssh ruozedata001 '/home/hadoop/app/zookeeper/bin/zkServer.sh start'
JMX enabled by default
Using config: /home/hadoop/app/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
+ ssh ruozedata002 '/home/hadoop/app/zookeeper/bin/zkServer.sh start'
JMX enabled by default
Using config: /home/hadoop/app/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
+ ssh ruozedata003 '/home/hadoop/app/zookeeper/bin/zkServer.sh start'
JMX enabled by default
Using config: /home/hadoop/app/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
+ sleep 5
+ /home/hadoop/app/hadoop/sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [ruozedata001 ruozedata002]
ruozedata001: starting namenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/logs/hadoop-hadoop-namenode-ruozedata001.out
ruozedata002: starting namenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/logs/hadoop-hadoop-namenode-ruozedata002.out
ruozedata001: starting datanode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/logs/hadoop-hadoop-datanode-ruozedata001.out
ruozedata002: starting datanode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/logs/hadoop-hadoop-datanode-ruozedata002.out
ruozedata003: starting datanode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/logs/hadoop-hadoop-datanode-ruozedata003.out
Starting journal nodes [ruozedata001 ruozedata002 ruozedata003]
ruozedata001: starting journalnode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/logs/hadoop-hadoop-journalnode-ruozedata001.out
ruozedata003: starting journalnode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/logs/hadoop-hadoop-journalnode-ruozedata003.out
ruozedata002: starting journalnode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/logs/hadoop-hadoop-journalnode-ruozedata002.out
Starting ZK Failover Controllers on NN hosts [ruozedata001 ruozedata002]
ruozedata001: starting zkfc, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/logs/hadoop-hadoop-zkfc-ruozedata001.out
ruozedata002: starting zkfc, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/logs/hadoop-hadoop-zkfc-ruozedata002.out
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/logs/yarn-hadoop-resourcemanager-ruozedata001.out
ruozedata003: starting nodemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/logs/yarn-hadoop-nodemanager-ruozedata003.out
ruozedata002: starting nodemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/logs/yarn-hadoop-nodemanager-ruozedata002.out
ruozedata001: starting nodemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/logs/yarn-hadoop-nodemanager-ruozedata001.out
+ sleep 5s
+ ssh ruozedata002 '/home/hadoop/app/hadoop/sbin/yarn-daemon.sh start resourcemanager'
starting resourcemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/logs/yarn-hadoop-resourcemanager-ruozedata002.out
+ /home/hadoop/app/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/logs/mapred-hadoop-historyserver-ruozedata001.out
+ exit 0
# Run jps.sh to check the processes on each machine
[hadoop@ruozedata001 hadoop]$ ./jps.sh
----------ruozedata001 process------------
3265 Jps
2401 NameNode
3219 JobHistoryServer
2694 JournalNode
2504 DataNode
3066 NodeManager
----------ruozedata002 process------------
2787 ResourceManager
2628 NodeManager
2260 NameNode
2324 DataNode
2427 JournalNode
2829 Jps
----------ruozedata003 process------------
2357 JournalNode
2455 NodeManager
2263 DataNode
2585 Jps
# The process listing shows a problem: the zookeeper and zkfc processes are missing
3.2.1 Troubleshoot why the cluster did not fully start
3.2.1.1 Check the zookeeper process
# Check the process with ps -ef | grep zookeeper
[hadoop@ruozedata001 hadoop]$ ps -ef | grep zookeeper
hadoop 3527 2111 0 15:08 pts/0 00:00:00 grep --color=auto zookeeper
# No zookeeper process found
Because ZooKeeper did not start successfully, zkfc failed to start as well.
3.2.1.2 Check the ZooKeeper startup log
# ① Check the zk config file: no log directory is configured
[hadoop@ruozedata001 conf]$ cat zoo.cfg
# ② Check log4j.properties in the conf directory
[hadoop@ruozedata001 conf]$ cat log4j.properties
......
zookeeper.log.dir=.
zookeeper.log.file=zookeeper.log
......
# But no zookeeper.log file exists at that path
# ③ Search /home/hadoop for zookeeper.log
[hadoop@ruozedata001 conf]$ find /home/hadoop -name 'zookeeper.log'
# Nothing printed, so the file does not exist under the hadoop user's home
# Search from the root directory as root
[root@ruozedata001 ~]# find / -name 'zookeeper.log'
# ④ Inspect the zk startup script zkServer.sh
[hadoop@ruozedata001 bin]$ vi zkServer.sh
# Since we are starting the service, only the start branch matters
......
_ZOO_DAEMON_OUT="$ZOO_LOG_DIR/zookeeper.out"
case $1 in
start)
echo -n "Starting zookeeper ... "
if [ -f "$ZOOPIDFILE" ]; then
if kill -0 `cat "$ZOOPIDFILE"` > /dev/null 2>&1; then
echo $command already running as process `cat "$ZOOPIDFILE"`.
exit 0
fi
fi
nohup "$JAVA" "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" \
-cp "$CLASSPATH" $JVMFLAGS $ZOOMAIN "$ZOOCFG" > "$_ZOO_DAEMON_OUT" 2>&1 < /dev/null &
......
# The startup script shows that output is redirected to zookeeper.out, not zookeeper.log
# ⑤ Search globally for zookeeper.out
[root@ruozedata001 ~]# find / -name 'zookeeper.out'
/home/hadoop/app/hadoop-2.6.0-cdh5.15.1/etc/hadoop/zookeeper.out
/home/hadoop/app/zookeeper-3.4.6/conf/zookeeper.out
/home/hadoop/zookeeper.out # this is the log file
[root@ruozedata001 ~]# cat /home/hadoop/zookeeper.out
nohup: failed to run command ‘java’: No such file or directory
# Root cause found
# ⑥ Analysis: `which java` over ssh finds nothing, yet `echo $JAVA_HOME` over ssh prints a value, and a plain local `which java` also works
[hadoop@ruozedata001 ~]$ ssh ruozedata001 "which java"
which: no java in (/usr/local/bin:/usr/bin)
[hadoop@ruozedata001 ~]$ ssh ruozedata001 "echo $JAVA_HOME"
/usr/java/jdk1.8.0_40
[hadoop@ruozedata001 ~]$ which java
/usr/java/jdk1.8.0_40/bin/java
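The apparent contradiction above has a quoting explanation: in `ssh host "echo $JAVA_HOME"`, the double quotes let the local shell expand `$JAVA_HOME` before ssh even runs, so the value printed is the local one and says nothing about the remote environment; single quotes defer expansion to the remote shell. A local demonstration (no ssh needed; `probe` and `DEMO_HOME` are stand-ins invented for this sketch):

```shell
#!/bin/bash
# probe plays the role of "ssh host": it runs the command string in a
# fresh shell where DEMO_HOME is deliberately unset (like the remote side).
probe() { env -u DEMO_HOME bash --norc -c "$1"; }

export DEMO_HOME=/usr/java/jdk1.8.0_40   # set only in the "local" shell

double=$(probe "echo $DEMO_HOME")   # expanded locally, before probe runs
single=$(probe 'echo $DEMO_HOME')   # expanded by the "remote" shell: empty
```

So `ssh ruozedata001 'echo $JAVA_HOME'` (single quotes) is the reliable way to test what the remote shell actually sees.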
# ⑦ Read zkServer.sh again from the top to find where $JAVA is assigned
# The relevant part:
[hadoop@ruozedata001 bin]$ vi zkServer.sh
......
if [ -e "$ZOOBIN/../libexec/zkEnv.sh" ]; then
. "$ZOOBINDIR/../libexec/zkEnv.sh"
else
. "$ZOOBINDIR/zkEnv.sh"
fi
......
# zkEnv.sh contains the assignment:
[hadoop@ruozedata001 bin]$ vi zkEnv.sh
......
# Manually add a debug line to check whether the variable is visible
echo "---------------java: $JAVA_HOME-------------"
if [ "$JAVA_HOME" != "" ]; then
JAVA="$JAVA_HOME/bin/java"
else
JAVA=java
fi
......
# Try starting zk over ssh
[hadoop@ruozedata001 bin]$ ssh ruozedata001 "$ZOOKEEPER_HOME/bin/zkServer.sh start"
JMX enabled by default
---------------java: -------------
Using config: /home/hadoop/app/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
# The output shows JAVA_HOME was empty in the remote shell, and no zk process is running:
[hadoop@ruozedata001 bin]$ ps -ef | grep zookeeper
hadoop 4284 3989 0 15:58 pts/2 00:00:00 grep --color=auto zookeeper
# ⑧ Fixes:
# Option 1: hard-code the java path in zkEnv.sh
[hadoop@ruozedata001 bin]$ vi zkEnv.sh
if [ "$JAVA_HOME" != "" ]; then
JAVA="$JAVA_HOME/bin/java"
else
JAVA=/usr/java/jdk1.8.0_40/bin/java
fi
# Option 2: add the java environment variables to the hadoop user's ~/.bashrc
[hadoop@ruozedata001 bin]$ vi ~/.bashrc
export JAVA_HOME=/usr/java/jdk1.8.0_40
export PATH=$JAVA_HOME/bin:$PATH
# Verify that the remote shell can now see them
[hadoop@ruozedata001 bin]$ ssh ruozedata001 "which java"
/usr/java/jdk1.8.0_40/bin/java
[hadoop@ruozedata001 bin]$ ssh ruozedata001 "$ZOOKEEPER_HOME/bin/zkServer.sh start"
JMX enabled by default
---------------java: /usr/java/jdk1.8.0_40------------- # now visible
Using config: /home/hadoop/app/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@ruozedata001 bin]$ jps
4507 Jps
4478 QuorumPeerMain # zk process is running
# Sync ~/.bashrc to the other two machines
[hadoop@ruozedata001 bin]$ scp ~/.bashrc ruozedata002:/home/hadoop/
.bashrc 100% 306 0.3KB/s 00:00
[hadoop@ruozedata001 bin]$ scp ~/.bashrc ruozedata003:/home/hadoop/
.bashrc 100% 306 0.3KB/s 00:00
3.2.2 Background: ssh and environment variables
3.2.2.1 Running remote commands and scripts over ssh
Environment variables can be configured in three places:
System-wide: /etc/profile
Per-user: ~/.bash_profile or ~/.bashrc
When ssh executes a remote command, bash runs as a non-interactive shell and loads only the per-user ~/.bashrc, which is why the fix above adds JAVA_HOME there.
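A common defensive pattern (a sketch, not part of the scripts above) is to source ~/.bashrc explicitly on the remote side, so the command sees PATH and JAVA_HOME regardless of which startup files the remote shell reads. The `REMOTE` dry-run switch is invented for illustration:

```shell
#!/bin/bash
# Run a command on a remote host with the user's ~/.bashrc loaded first.
# REMOTE defaults to ssh; set REMOTE=echo to print instead of executing.
REMOTE=${REMOTE:-ssh}

run_remote() {
  local host=$1; shift
  $REMOTE "$host" "source ~/.bashrc >/dev/null 2>&1; $*"
}
```

Usage: `run_remote ruozedata001 jps`.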
3.3 Restart the cluster via the shell script
# Stop the cluster first
[hadoop@ruozedata001 hadoop]$ ./stop_cluster.sh
# Run the startup script
[hadoop@ruozedata001 hadoop]$ ./start_cluster.sh
# Run jps.sh to verify that zk and zkfc are now running
[hadoop@ruozedata001 hadoop]$ ./jps.sh
----------ruozedata001 process------------
5186 QuorumPeerMain
5330 NameNode
6373 JobHistoryServer
5625 JournalNode
6025 NodeManager
5434 DataNode
5916 ResourceManager
6445 Jps
5805 DFSZKFailoverController
----------ruozedata002 process------------
4496 NodeManager
4659 ResourceManager
4292 JournalNode
4197 DataNode
4123 NameNode
4060 QuorumPeerMain
4413 DFSZKFailoverController
4717 Jps
----------ruozedata003 process------------
3090 JournalNode
3186 NodeManager
2995 DataNode
3319 Jps
2938 QuorumPeerMain
4. Stop the cluster
[hadoop@ruozedata001 hadoop]$ vi stop_cluster.sh
#!/bin/bash -x
#stop history+yarn+hdfs
/home/hadoop/app/hadoop/sbin/mr-jobhistory-daemon.sh stop historyserver
ssh ruozedata002 "/home/hadoop/app/hadoop/sbin/yarn-daemon.sh stop resourcemanager"
/home/hadoop/app/hadoop/sbin/stop-all.sh
#stop zk
ssh ruozedata001 "$ZOOKEEPER_HOME/bin/zkServer.sh stop"
ssh ruozedata002 "$ZOOKEEPER_HOME/bin/zkServer.sh stop"
ssh ruozedata003 "$ZOOKEEPER_HOME/bin/zkServer.sh stop"
# Call jps.sh to check the processes
./jps.sh
exit 0
# Run the script
[hadoop@ruozedata001 hadoop]$ ./stop_cluster.sh
+ /home/hadoop/app/hadoop/sbin/mr-jobhistory-daemon.sh stop historyserver
stopping historyserver
+ ssh ruozedata002 '/home/hadoop/app/hadoop/sbin/yarn-daemon.sh stop resourcemanager'
stopping resourcemanager
+ /home/hadoop/app/hadoop/sbin/stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [ruozedata001 ruozedata002]
ruozedata001: stopping namenode
ruozedata002: stopping namenode
ruozedata001: stopping datanode
ruozedata003: stopping datanode
ruozedata002: stopping datanode
Stopping journal nodes [ruozedata001 ruozedata002 ruozedata003]
ruozedata001: stopping journalnode
ruozedata002: stopping journalnode
ruozedata003: stopping journalnode
Stopping ZK Failover Controllers on NN hosts [ruozedata001 ruozedata002]
ruozedata002: stopping zkfc
ruozedata001: stopping zkfc
stopping yarn daemons
stopping resourcemanager
ruozedata002: stopping nodemanager
ruozedata003: stopping nodemanager
ruozedata001: stopping nodemanager
no proxyserver to stop
+ ssh ruozedata001 '/home/hadoop/app/zookeeper/bin/zkServer.sh stop'
JMX enabled by default
---------------java: /usr/java/jdk1.8.0_40-------------
Using config: /home/hadoop/app/zookeeper/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
+ ssh ruozedata002 '/home/hadoop/app/zookeeper/bin/zkServer.sh stop'
JMX enabled by default
Using config: /home/hadoop/app/zookeeper/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
+ ssh ruozedata003 '/home/hadoop/app/zookeeper/bin/zkServer.sh stop'
JMX enabled by default
Using config: /home/hadoop/app/zookeeper/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
+ ./jps.sh
----------ruozedata001 process------------
7132 Jps
----------ruozedata002 process------------
5142 Jps
----------ruozedata003 process------------
3483 Jps
+ exit 0