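I. System Parameter Tuning
1. Kernel parameter tuning
Edit /etc/sysctl.conf, add the following settings, then run sysctl -p to apply them. Note that net.ipv4.tcp_tw_recycle was removed in Linux 4.12, so omit that line on newer kernels.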
net.ipv4.conf.all.arp_notify = 1
kernel.shmmax = 500000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 250 512000 100 2048
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.arp_filter = 1
net.ipv4.ip_local_port_range = 1025 65535
net.core.netdev_max_backlog = 10000
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
vm.overcommit_memory = 2
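A quick way to confirm that a setting actually landed in the file is to read it back. The sketch below is only an illustration: sysctl_conf_value is a hypothetical helper name, and it assumes simple "key = value" lines like the ones listed above.

```shell
# Hypothetical helper (not part of any package): print the value configured for
# a key in a sysctl-style file. Comment lines are ignored; the last definition wins.
# Usage: sysctl_conf_value FILE KEY
sysctl_conf_value() {
    awk -F= -v k="$2" '
        /^[ \t]*#/ { next }
        {
            name = $1
            gsub(/[ \t]/, "", name)
            if (name == k) {
                val = $2
                sub(/^[ \t]+/, "", val)
                sub(/[ \t]+$/, "", val)
                found = 1
            }
        }
        END { if (found) print val; else exit 1 }
    ' "$1"
}
```

For example, sysctl_conf_value /etc/sysctl.conf vm.overcommit_memory should print 2 after the edit, and the result can be compared with the live value reported by sysctl -n vm.overcommit_memory.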
2. Raise Linux resource limits
Append the following to /etc/security/limits.conf:
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
3. Disk I/O scheduler tuning
The Linux disk I/O scheduler supports several policies for disk access. The default is CFQ; deadline is recommended.
My disk is sda; set the I/O scheduling policy for your own disk accordingly, as follows:
#echo deadline > /sys/block/sda/queue/scheduler
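To make the setting permanent, add the command to /etc/rc.local.
Once all three steps above are done, reboot the operating system for them to take effect.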
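To check which scheduler is active after the change, note that the kernel marks the active one with square brackets in /sys/block/sda/queue/scheduler. A minimal sketch follows; active_scheduler is a hypothetical helper name, and the sample line is illustrative rather than captured from a real host.

```shell
# Hypothetical helper: extract the bracketed (active) scheduler from a line
# such as "noop [deadline] cfq", the format used by /sys/block/<disk>/queue/scheduler.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Example with a literal line; on a real host pass "$(cat /sys/block/sda/queue/scheduler)".
active_scheduler "noop [deadline] cfq"
```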
II. Pre-installation Environment Setup
1. Deployment environment inventory

2. Set the hostnames
192.168.10.91:
#hostname hadoop-nn
#echo "hostname hadoop-nn" >> /etc/rc.local
192.168.10.92:
#hostname hadoop-snn
#echo "hostname hadoop-snn" >> /etc/rc.local
192.168.10.93:
#hostname hadoop-dn-01
#echo "hostname hadoop-dn-01" >> /etc/rc.local
3. Disable the firewall (all nodes)
#systemctl stop firewalld
#systemctl disable firewalld
4. Disable SELinux (all nodes)
#setenforce 0
#sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
5. Add the following entries to /etc/hosts (all nodes)
192.168.10.91 hadoop-nn master
192.168.10.92 hadoop-snn slave01
192.168.10.93 hadoop-dn-01 slave02
6. NTP time synchronization
Install an NTP server on the Hadoop NameNode node, then have all other nodes synchronize their clocks with the NameNode.
192.168.10.91:
#yum -y install ntp
#systemctl start ntpd
#systemctl enable ntpd
Then synchronize the time on the other nodes.
192.168.10.92 and 192.168.10.93:
#yum -y install ntp
#ntpdate hadoop-nn
Add a scheduled task (cron job) that repeats this sync: Hadoop requires the clocks on all nodes to agree, so do not skip this step.
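The original does not give the schedule, so the following is only one possible cron entry. The 10-minute interval, the /usr/sbin/ntpdate path and the helper name ntp_cron_line are all assumptions made for illustration.

```shell
# Hypothetical helper: print a crontab line that syncs against the given NTP
# host every N minutes (default 10). Interval and ntpdate path are assumptions.
ntp_cron_line() {
    host=$1
    minutes=${2:-10}
    printf '*/%s * * * * /usr/sbin/ntpdate %s >/dev/null 2>&1\n' "$minutes" "$host"
}

ntp_cron_line hadoop-nn
# To install the entry for the current user (review it before running):
#   (crontab -l 2>/dev/null; ntp_cron_line hadoop-nn) | crontab -
```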
III. Deploying Hadoop
1. Install Java (all nodes)
#yum -y install java java-devel
Check the Java version to confirm the command works:
#java -version
openjdk version "1.8.0_161"
OpenJDK Runtime Environment (build 1.8.0_161-b14)
OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)
Note that installing OpenJDK does not set the JAVA_HOME environment variable by default. To find the installation directory, run:
#update-alternatives --config jre_openjdk
There is 1 program that provides 'jre_openjdk'.
  Selection    Command
-----------------------------------------------
*+ 1           java-1.8.0-openjdk.x86_64 (/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64/jre)
The default JRE directory is /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64/jre.
To set the environment variables, edit /etc/profile.d/java.sh:
#!/bin/bash
#
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64
export CLASSPATH=$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
After this, log in again or source the profile file so that the environment variables take effect. You can also source the script by hand to apply it immediately:
#source /etc/profile.d/java.sh
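Instead of hard-coding the versioned path, JAVA_HOME can be derived from the JRE directory that update-alternatives reports by dropping the trailing /jre. A small sketch, where jdk_home_from_jre is a hypothetical helper name:

```shell
# Hypothetical helper: strip a trailing /jre from the JRE path to obtain the
# JDK root directory, which is what JAVA_HOME is set to above.
jdk_home_from_jre() {
    printf '%s\n' "${1%/jre}"
}

jdk_home_from_jre /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64/jre
```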
2. Create the hadoop user (all nodes)
#useradd hadoop
#passwd hadoop
Set a password. For simplicity, use the same hadoop password on all three machines, for example 123456. For convenience, it is suggested to make root the primary group of hadoop:
#usermod -g root hadoop
Once this is done, hadoop belongs to the root group. Run id hadoop to verify; the output should look like this:
#id hadoop
uid=1002(hadoop) gid=0(root) groups=0(root)
3. Create SSH keys on the NameNode node
192.168.10.91:
Create an RSA key pair:
#su - hadoop
$ssh-keygen
On the NameNode node, copy the public key into the hadoop user's home directory on every node, including the NameNode itself:
$ssh-copy-id hadoop@192.168.10.91
$ssh-copy-id hadoop@192.168.10.92
$ssh-copy-id hadoop@192.168.10.93
4. Unpack the Hadoop binary package and set environment variables (all nodes)
#wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.8.4/hadoop-2.8.4.tar.gz
#tar xf hadoop-2.8.4.tar.gz -C /usr/local/
#ln -sv /usr/local/hadoop-2.8.4/ /usr/local/hadoop
Edit /etc/profile.d/hadoop.sh and define environment variables like the following to set up Hadoop's runtime environment.
#!/bin/bash
#
export HADOOP_PREFIX="/usr/local/hadoop"
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
Create the data and log directories:
#mkdir -pv /data/hadoop/hdfs/{nn,snn,dn}
#chown -R hadoop:hadoop /data/hadoop/hdfs
#mkdir -pv /var/log/hadoop/yarn
#chown -R hadoop:hadoop /var/log/hadoop/yarn
Then create a logs directory under the Hadoop installation directory and change the owner and group of all Hadoop files.
#cd /usr/local/hadoop
#mkdir logs
#chmod g+w logs
#chown -R hadoop:hadoop ./*
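IV. Configuring All Hadoop Nodes
1. The hadoop-nn node
The following files need to be configured.
core-site.xml
The core-site.xml file holds information such as the NameNode host address and the RPC port it listens on (the NameNode's default RPC port is 8020). In a distributed environment, every node must be configured with the NameNode host address. A minimal configuration looks like this: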
#su - hadoop
$vim /usr/local/hadoop/etc/hadoop/core-site.xml
Add the following configuration:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:8020</value>
        <final>true</final>
    </property>
</configuration>
hdfs-site.xml
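hdfs-site.xml mainly configures HDFS-related properties, such as the replication factor (the number of replicas of each data block) and the directories the NN and DN use to store data. For a distributed Hadoop cluster the replication factor should normally be 3; I set it to 2 here to reduce disk usage. The NN and DN storage directories are the paths created for them in the earlier steps. The directory created earlier for the SNN is configured here as well.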
#su - hadoop
$vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///data/hadoop/hdfs/nn</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///data/hadoop/hdfs/dn</value>
    </property>
    <property>
        <name>fs.checkpoint.dir</name>
        <value>file:///data/hadoop/hdfs/snn</value>
    </property>
    <property>
        <name>fs.checkpoint.edits.dir</name>
        <value>file:///data/hadoop/hdfs/snn</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>
mapred-site.xml
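mapred-site.xml configures the cluster's MapReduce framework, which should be set to yarn here; the other valid values are local and classic. mapred-site.xml does not exist by default, but a template file, mapred-site.xml.template, is provided; simply copy it to mapred-site.xml.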
#su - hadoop
$cp -fr /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
$vim /usr/local/hadoop/etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
yarn-site.xml
yarn-site.xml configures the YARN daemons and their related properties. First, specify the host and ports the ResourceManager daemon listens on (here the ResourceManager runs on the NameNode node); second, specify the scheduler the ResourceManager uses and the NodeManager's auxiliary services. A minimal example follows:
$vim /usr/local/hadoop/etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>
</configuration>
hadoop-env.sh and yarn-env.sh
Hadoop's daemons depend on the JAVA_HOME environment variable. If JAVA_HOME is defined globally, as done earlier through /etc/profile.d/java.sh, it is used as-is. To pin Hadoop to a specific Java environment instead, edit these two scripts, uncomment JAVA_HOME and set a suitable value. In addition, most Hadoop daemons default to a heap size of 1 GB; in real deployments the heap size of individual daemons may need adjusting, which only requires editing the relevant variables in the environment scripts, for example HADOOP_HEAPSIZE, HADOOP_JOB_HISTORYSERVER_HEAPSIZE, JAVA_HEAP_MAX and YARN_HEAPSIZE.
The slaves file
The slaves file lists all slave nodes of the cluster; its default content is localhost. I plan to run a DataNode on all three nodes, so all three are added.
#su - hadoop
$vim /usr/local/hadoop/etc/hadoop/slaves
hadoop-nn
hadoop-snn
hadoop-dn-01
The first node (Master) is now configured. In a Hadoop cluster all nodes should share the same configuration; the hadoop user, data directories and log directories were already created on the slave nodes earlier.
What remains is to copy the Master node's configuration files to all slaves.
#su - hadoop
$scp /usr/local/hadoop/etc/hadoop/* hadoop@hadoop-snn:/usr/local/hadoop/etc/hadoop/
$scp /usr/local/hadoop/etc/hadoop/* hadoop@hadoop-dn-01:/usr/local/hadoop/etc/hadoop/
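With more slaves, the copy targets can be read from the slaves file instead of being typed by hand. This is only a sketch: sync_targets is a hypothetical helper name, and it assumes the slaves file holds one hostname per line.

```shell
# Hypothetical helper: list the hosts in a slaves file except the local one,
# skipping blank lines and comment lines.
# Usage: sync_targets SLAVES_FILE LOCAL_HOSTNAME
sync_targets() {
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" | grep -v -x -F "$2"
}

# Possible use as the hadoop user on the Master node:
#   for h in $(sync_targets /usr/local/hadoop/etc/hadoop/slaves hadoop-nn); do
#       scp /usr/local/hadoop/etc/hadoop/* "hadoop@$h:/usr/local/hadoop/etc/hadoop/"
#   done
```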
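V. Formatting HDFS
Before the HDFS NameNode is started for the first time, the directory it uses for storage must be initialized. If the directory named by the dfs.namenode.name.dir property in hdfs-site.xml does not exist, the format command creates it. If it already exists, make sure its permissions are correct, and be aware that formatting erases everything inside it and builds a new file system. Run the following command as the user that owns HDFS (hadoop in this setup).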
[hadoop@hadoop-nn ~]$hdfs namenode -format
The command prints a lot of output. A line like "INFO common.Storage: Storage directory /data/hadoop/hdfs/nn has been successfully formatted." indicates that formatting has completed.
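Because the success line is easy to miss in the output, it can be searched for in a saved copy. A minimal sketch, assuming the output was captured to a file; format_succeeded and the /tmp/nn-format.log path are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: succeed if the captured format output contains the
# success message for the given storage directory.
# Usage: format_succeeded LOG_FILE STORAGE_DIR
format_succeeded() {
    grep -q -F "Storage directory $2 has been successfully formatted." "$1"
}

# Possible use:
#   hdfs namenode -format 2>&1 | tee /tmp/nn-format.log
#   format_succeeded /tmp/nn-format.log /data/hadoop/hdfs/nn && echo "format OK"
```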
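VI. Starting the Hadoop Cluster
There are two ways to start a Hadoop cluster: start the required services on each node individually, or start the whole cluster from the NameNode node (the recommended way).
1. Starting services individually
The Master node runs the HDFS NameNode, SecondaryNameNode and DataNode services, plus the YARN ResourceManager service.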
[hadoop@hadoop-nn ~]$hadoop-daemon.sh start namenode
[hadoop@hadoop-nn ~]$hadoop-daemon.sh start secondarynamenode
[hadoop@hadoop-nn ~]$hadoop-daemon.sh start datanode
[hadoop@hadoop-nn ~]$yarn-daemon.sh start resourcemanager
Each Slave node runs the HDFS DataNode service and the YARN NodeManager service (shown here on hadoop-snn).
[hadoop@hadoop-snn ~]$hadoop-daemon.sh start datanode
[hadoop@hadoop-snn ~]$yarn-daemon.sh start nodemanager
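2. Starting the whole cluster
On larger clusters, starting each service on each node by hand is tedious and inefficient. Hadoop therefore provides start-dfs.sh and stop-dfs.sh to start and stop the whole HDFS cluster, and start-yarn.sh and stop-yarn.sh to start and stop the whole YARN cluster.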
[hadoop@hadoop-nn ~]$start-dfs.sh
[hadoop@hadoop-nn ~]$start-yarn.sh
Earlier Hadoop versions provided start-all.sh and stop-all.sh to control HDFS and MapReduce together, but this approach is discouraged from Hadoop 2.0 onward.
2.1. Start the HDFS cluster
[hadoop@hadoop-nn ~]$start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-hadoop-nn.out
hadoop-dn-01: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-hadoop-dn-01.out
hadoop-nn: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-hadoop-nn.out
hadoop-snn: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-hadoop-snn.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-secondarynamenode-hadoop-nn.out
Once the HDFS cluster is up, run jps on each node to verify that the processes are running; you can also check the cluster status through the Web UI.
Processes started on the NameNode node:
hadoop-nn:
[hadoop@hadoop-nn ~]$ jps
14576 NameNode
14887 SecondaryNameNode
14714 DataNode
15018 Jps
[hadoop@hadoop-nn ~]$ netstat -anplt | grep java
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp        0      0 0.0.0.0:50090           0.0.0.0:*               LISTEN      16468/java
tcp        0      0 127.0.0.1:58545         0.0.0.0:*               LISTEN      16290/java
tcp        0      0 10.10.0.186:8020        0.0.0.0:*               LISTEN      16146/java
tcp        0      0 0.0.0.0:50070           0.0.0.0:*               LISTEN      16146/java
tcp        0      0 0.0.0.0:50010           0.0.0.0:*               LISTEN      16290/java
tcp        0      0 0.0.0.0:50075           0.0.0.0:*               LISTEN      16290/java
tcp        0      0 0.0.0.0:50020           0.0.0.0:*               LISTEN      16290/java
tcp        0      0 10.10.0.186:32565       10.10.0.186:8020        ESTABLISHED 16290/java
tcp        0      0 10.10.0.186:8020        10.10.0.186:32565       ESTABLISHED 16146/java
tcp        0      0 10.10.0.186:8020        10.10.0.188:11681       ESTABLISHED 16146/java
tcp        0      0 10.10.0.186:8020        10.10.0.187:57112       ESTABLISHED 16146/java
Processes started on the DataNode nodes:
hadoop-snn:
[hadoop@hadoop-snn ~]$ jps
741 DataNode
862 Jps
[hadoop@hadoop-snn ~]$ netstat -anplt | grep java
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp        0      0 0.0.0.0:50010           0.0.0.0:*               LISTEN      1042/java
tcp        0      0 0.0.0.0:50075           0.0.0.0:*               LISTEN      1042/java
tcp        0      0 127.0.0.1:18975         0.0.0.0:*               LISTEN      1042/java
tcp        0      0 0.0.0.0:50020           0.0.0.0:*               LISTEN      1042/java
tcp        0      0 10.10.0.187:57112       10.10.0.186:8020        ESTABLISHED 1042/java
hadoop-dn-01:
[hadoop@hadoop-dn-01 ~]$ jps
410 DataNode
539 Jps
The jps output and the open ports show which ports the NameNode, SecondaryNameNode and DataNode processes each listen on, and that every DataNode has connected to port 8020 on the NameNode. If a node fails to start, the cause may be wrong permissions or a missing directory; check that node's logs under /usr/local/hadoop/logs/*.log.
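The manual jps check can also be scripted. The sketch below reads jps output on standard input and lists any expected daemon that is absent; missing_daemons is a hypothetical helper name, and it assumes the "PID Name" format shown in the output above.

```shell
# Hypothetical helper: read jps output on stdin and print each expected daemon
# that is not listed. The exit status is non-zero if anything is missing.
# Usage: jps | missing_daemons NameNode SecondaryNameNode DataNode
missing_daemons() {
    jps_output=$(cat)
    missing=0
    for daemon in "$@"; do
        # Compare the second column exactly, so NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
            missing=1
        fi
    done
    return $missing
}
```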
Access the Web UI through the NameNode at http://hadoop-nn:50070:

All three DataNode nodes are running normally.
At this point the HDFS cluster is ready and can store data. A brief demonstration with HDFS commands follows.
Create a directory in the HDFS cluster:
[hadoop@hadoop-nn ~]$ hdfs dfs -mkdir /test
If this fails with the error "mkdir: Cannot create directory /test. Name node is in safe mode.",
leave safe mode with:
[hadoop@hadoop-nn ~]$ hdfs dfsadmin -safemode leave
Upload files to the HDFS cluster:
[hadoop@hadoop-nn ~]$ hdfs dfs -put /etc/fstab /test/fstab
[hadoop@hadoop-nn ~]$ hdfs dfs -put /etc/init.d/functions /test/functions
List the files in the HDFS cluster:
[hadoop@hadoop-nn ~]$ hdfs dfs -ls /test/
Found 2 items
-rw-r--r--   2 hadoop supergroup        524 2017-06-14 01:49 /test/fstab
-rw-r--r--   2 hadoop supergroup      13948 2017-06-14 01:50 /test/functions
Now look at the Hadoop Web UI again:

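The Blocks field shows that 4 blocks are in use across all nodes. The default HDFS block size in Hadoop 2.x is 128 MB. Our uploaded files are far smaller than that, so neither was split; and because the cluster is configured with 2 replicas, each file's single block is stored twice, giving 2 files x 2 replicas = 4 blocks.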
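The block count can be reproduced with a little arithmetic. The sketch below assumes the Hadoop 2.x default block size of 128 MB (134217728 bytes) and the replication factor of 2 configured earlier; block_replicas is a hypothetical helper name.

```shell
# Hypothetical helper: number of block replicas a file occupies, given its size
# in bytes, the block size in bytes, and the replication factor.
block_replicas() {
    size=$1
    block=$2
    repl=$3
    blocks=$(( (size + block - 1) / block ))   # ceiling division
    echo $(( blocks * repl ))
}

fstab=$(block_replicas 524 134217728 2)        # /test/fstab, 524 bytes
functions=$(block_replicas 13948 134217728 2)  # /test/functions, 13948 bytes
echo $(( fstab + functions ))                  # prints 4
```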
HDFS cluster management commands

2.2. Start the YARN cluster
[hadoop@hadoop-nn ~]$start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-resourcemanager-hadoop-nn.out
hadoop-nn: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-hadoop-nn.out
hadoop-dn-01: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-hadoop-dn-01.out
hadoop-snn: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-hadoop-snn.out
Once the YARN cluster is up, run jps on each node to verify that the processes are running.
hadoop-nn:
[hadoop@c0a80a5b logs]$jps
4120 DataNode
1898 NameNode
2474 SecondaryNameNode
2922 NodeManager
2701 ResourceManager
5646 Jps
hadoop-snn:
[hadoop@hadoop-snn ~]$ jps
10415 NodeManager
11251 Jps
9984 DataNode
hadoop-dn-01:
[hadoop@hadoop-dn-01 ~]$ jps
10626 NodeManager
10020 DataNode
11423 Jps
The jps output shows that the ResourceManager and NodeManager processes have started. A NodeManager is started on every node that runs a DataNode.
Access the Web UI through the ResourceManager at http://hadoop-nn:8088:

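YARN cluster management commands
The yarn command has many subcommands, broadly divided into user commands and administration commands. Running yarn with no arguments prints its usage and a short description of each subcommand: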

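Of these, jar, application, node, logs, classpath and version are the commonly used user commands, while resourcemanager, nodemanager, proxyserver, rmadmin and daemonlog are the more commonly used administration commands.
VII. Running a YARN Application
A YARN application can be a simple shell script, a MapReduce job, or any other kind of job. To run an application, the client first prepares an ApplicationMaster and then submits the application context to the ResourceManager, after which the RM allocates memory and the containers that run the application to the AM. Broadly, the process has six phases.
1. Application initialization and submission;
2. Memory allocation and AM startup;
3. AM registration and resource allocation;
4. Container launch and monitoring;
5. Application progress reporting;
6. Application completion.
Next, let's use the Hadoop platform we built to process a job and see how this flow plays out. The Hadoop distribution ships with the following example programs: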
[hadoop@c0a80a5b logs]$yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.4.jar
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
We'll test with wordcount, which is easy to understand. Recall that we uploaded the fstab and functions files to the HDFS cluster earlier; here we run a word count over both files:
[hadoop@c0a80a5b logs]$yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.4.jar wordcount /test/fstab /test/functions /test/wc
18/04/26 20:11:52 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.10.91:8032
18/04/26 20:11:53 INFO input.FileInputFormat: Total input files to process : 2
18/04/26 20:11:53 INFO mapreduce.JobSubmitter: number of splits:2
18/04/26 20:11:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1524741305301_0001
18/04/26 20:11:54 INFO impl.YarnClientImpl: Submitted application application_1524741305301_0001
18/04/26 20:11:54 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1524741305301_0001/
18/04/26 20:11:54 INFO mapreduce.Job: Running job: job_1524741305301_0001
18/04/26 20:12:04 INFO mapreduce.Job: Job job_1524741305301_0001 running in uber mode : false
18/04/26 20:12:04 INFO mapreduce.Job:  map 0% reduce 0%
18/04/26 20:12:11 INFO mapreduce.Job:  map 50% reduce 0%
18/04/26 20:12:12 INFO mapreduce.Job:  map 100% reduce 0%
18/04/26 20:12:18 INFO mapreduce.Job:  map 100% reduce 100%
18/04/26 20:12:18 INFO mapreduce.Job: Job job_1524741305301_0001 completed successfully
18/04/26 20:12:18 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=10779
                FILE: Number of bytes written=494063
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=15880
                HDFS: Number of bytes written=8015
                HDFS: Number of read operations=9
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=2
                Launched reduce tasks=1
                Data-local map tasks=2
                Total time spent by all maps in occupied slots (ms)=8926
                Total time spent by all reduces in occupied slots (ms)=5022
                Total time spent by all map tasks (ms)=8926
                Total time spent by all reduce tasks (ms)=5022
                Total vcore-milliseconds taken by all map tasks=8926
                Total vcore-milliseconds taken by all reduce tasks=5022
                Total megabyte-milliseconds taken by all map tasks=9140224
                Total megabyte-milliseconds taken by all reduce tasks=5142528
        Map-Reduce Framework
                Map input records=666
                Map output records=2210
                Map output bytes=22507
                Map output materialized bytes=10785
                Input split bytes=192
                Combine input records=2210
                Combine output records=692
                Reduce input groups=686
                Reduce shuffle bytes=10785
                Reduce input records=692
                Reduce output records=686
                Spilled Records=1384
                Shuffled Maps =2
                Failed Shuffles=0
                Merged Map outputs=2
                GC time elapsed (ms)=201
                CPU time spent (ms)=2080
                Physical memory (bytes) snapshot=711757824
                Virtual memory (bytes) snapshot=6615576576
                Total committed heap usage (bytes)=492306432
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=15688
        File Output Format Counters
                Bytes Written=8015
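The word count results are written to the /test/wc directory in HDFS. Note that the job fails if the output directory already exists.
While the job runs, the Hadoop management UI (port 8088) shows information like the following, including the application name, the submitting user, the job name, the application type, the run time, the state and the progress.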

Then we can see what is under /test/wc:
[hadoop@c0a80a5b logs]$hdfs dfs -ls /test/wc
Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2018-04-26 20:12 /test/wc/_SUCCESS
-rw-r--r--   2 hadoop supergroup       8015 2018-04-26 20:12 /test/wc/part-r-00000
Look at the word count results:
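The result file holds one "word<TAB>count" pair per line, so it can be piped through standard tools. A small sketch that shows the most frequent words; top_words is a hypothetical helper name.

```shell
# Hypothetical helper: read wordcount output ("word<TAB>count" per line) on
# stdin, sort by count with the highest first, and keep the top N (default 10).
top_words() {
    sort -t "$(printf '\t')" -k2,2nr | head -n "${1:-10}"
}

# Possible use on the cluster:
#   hdfs dfs -cat /test/wc/part-r-00000 | top_words 5
```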

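VIII. Enabling the History Service
After a YARN job has run, its status can be viewed in the Web UI. Once the ResourceManager restarts, however, those jobs are no longer visible. Enabling the Hadoop history service makes historical job information available.
With the history service enabled, the web pages show detailed information about jobs executed on YARN. The history server lists completed MapReduce jobs, including how many map and reduce tasks were used and when the job was submitted, started and finished.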
[hadoop@c0a80a5b logs]$mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/hadoop/logs/mapred-hadoop-historyserver-c0a80a5b.nykjsrv.cn.out
[hadoop@c0a80a5b logs]$jps
9318 JobHistoryServer
4120 DataNode
1898 NameNode
2474 SecondaryNameNode
2922 NodeManager
2701 ResourceManager
9390 Jps
Once the JobHistoryServer is running, the history server can be viewed through its web page.
Its Web port defaults to 19888. Run a few more YARN jobs, then click History to jump to the history page and inspect each job's details.

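However, at the bottom of an individual job's page shown above, following the Map and Reduce count links into the task detail page and then opening the detailed log of a map or reduce task shows nothing, because log aggregation has not been enabled.
IX. Enabling Log Aggregation
MapReduce tasks run on many machines, and the logs they produce stay on those machines. To view the logs from all machines in one place, they are collected and stored in HDFS; this process is log aggregation.
Hadoop does not enable log aggregation by default; enable it by adding the following to yarn-site.xml.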
$vim /usr/local/hadoop/etc/hadoop/yarn-site.xml
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>106800</value>
</property>
yarn.log-aggregation-enable: whether log aggregation is enabled.
yarn.log-aggregation.retain-seconds: how long aggregated logs are kept, in seconds.
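Since the retention period is given in seconds, it helps to compute it from a more readable unit. The value 106800 used above is roughly 29.7 hours. The sketch below converts days to seconds; days_to_seconds is a hypothetical helper name and the 7-day figure is only an example, not a recommendation from the original.

```shell
# Hypothetical helper: convert a retention period in days to seconds.
days_to_seconds() {
    echo $(( $1 * 24 * 60 * 60 ))
}

days_to_seconds 7            # one week: prints 604800
echo $(( 106800 / 3600 ))    # whole hours in the value used above: prints 29
```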
Distribute the configuration files to the other nodes:
[hadoop@hadoop-nn ~]$ su - hadoop
[hadoop@hadoop-nn ~]$ scp /usr/local/hadoop/etc/hadoop/* hadoop@hadoop-snn:/usr/local/hadoop/etc/hadoop/
[hadoop@hadoop-nn ~]$ scp /usr/local/hadoop/etc/hadoop/* hadoop@hadoop-dn-01:/usr/local/hadoop/etc/hadoop/
Restart the YARN processes:
[hadoop@hadoop-nn ~]$ stop-yarn.sh
[hadoop@hadoop-nn ~]$ start-yarn.sh
Restart the HistoryServer process:
[hadoop@hadoop-nn ~]$ mr-jobhistory-daemon.sh stop historyserver
[hadoop@hadoop-nn ~]$ mr-jobhistory-daemon.sh start historyserver
To test log aggregation, run a demo MapReduce job so that it generates logs:
[hadoop@hadoop-nn ~]$ yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.4.jar wordcount /test/fstab /test/wc1
After the job has run, the logs of each map and reduce task can be viewed on the history server web page.
X. Memory Tuning
Raise the Hadoop heap from 512M to 4096M:
[hadoop@hadoop-nn ~]$ vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export HADOOP_PORTMAP_OPTS="-Xmx4096m $HADOOP_PORTMAP_OPTS"
export HADOOP_CLIENT_OPTS="-Xmx4096m $HADOOP_CLIENT_OPTS"
Raise the YARN heap from 2048M to 4096M:
[hadoop@hadoop-nn ~]$ vim /usr/local/hadoop/etc/hadoop/yarn-env.sh
JAVA_HEAP_MAX=-Xmx4096m
Reference: http://www.ywnds.com/?p=10138