This article walks through setting up a distributed Hadoop environment using hadoop-2.6.0-cdh5.7.0. I have three machines, hadoop000/hadoop001/hadoop002. One of them gets the NameNode and ResourceManager roles and also serves as a data storage node with the DataNode and NodeManager roles; the other two machines act only as data storage nodes with the DataNode and NodeManager roles.
- hadoop000:NameNode/DataNode ResourceManager/NodeManager
- hadoop001:DataNode NodeManager
- hadoop002:DataNode NodeManager
Preparation
- Hostname setup
On each of the three machines, set the hostname with sudo vi /etc/sysconfig/network. For example, on the first machine use the following (the other two are analogous):
NETWORKING=yes
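HOSTNAME=hadoop000
(This file is read at boot, so the new hostname takes effect after a reboot.)
- Hostname-to-IP mapping
On all three machines, add the following lines with sudo vi /etc/hosts: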
192.168.199.102 hadoop000
192.168.199.247 hadoop001
192.168.199.138 hadoop002
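Editing /etc/hosts by hand on three machines is easy to get slightly wrong. Below is a minimal POSIX sh sketch that appends the three mappings above only when a hostname is not already listed. The script, the has_host helper, the HOSTS_FILE variable and the default scratch file ./hosts.cluster are assumptions for illustration, not part of the original setup.

```sh
#!/bin/sh
# Sketch: append the cluster's IP/hostname pairs to a hosts file, skipping any
# hostname that is already mapped. Defaults to a scratch file for review;
# run as root with HOSTS_FILE=/etc/hosts to edit the real file (assumption).
set -eu
HOSTS_FILE="${HOSTS_FILE:-./hosts.cluster}"
touch "$HOSTS_FILE"

# has_host NAME: succeed if a non-comment line already lists NAME as a hostname
has_host() {
  awk -v n="$1" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                 END { exit !found }' "$HOSTS_FILE"
}

# "IP hostname" pairs from the article, written here as IP:hostname
for pair in 192.168.199.102:hadoop000 192.168.199.247:hadoop001 192.168.199.138:hadoop002; do
  ip=${pair%%:*}
  name=${pair#*:}
  if ! has_host "$name"; then
    printf '%s %s\n' "$ip" "$name" >> "$HOSTS_FILE"
  fi
done
```

Running it a second time adds nothing, so it is safe to rerun on each of the three machines.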
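Prerequisites
- Passwordless SSH login
On every machine, run: ssh-keygen -t rsa
Then, from hadoop000 (the master), copy its public key to all three machines, including itself: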
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop000
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop001
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop002
- JDK installation
On hadoop000, extract the JDK tarball and add JAVA_HOME to the environment:
tar -zxvf jdk-8u131-linux-x64.tar.gz -C ~/app/
Set the environment variables:
vi ~/.bash_profile
export JAVA_HOME=/home/hadoop/app/jdk1.8.0_131
export PATH=$JAVA_HOME/bin:$PATH
Apply the changes with source ~/.bash_profile
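When the setup is repeated, editing ~/.bash_profile with vi can leave duplicate export lines. A minimal sketch that appends the two lines only when they are missing; the add_line helper and the PROFILE override are hypothetical.

```sh
#!/bin/sh
# Sketch: add the JDK variables to ~/.bash_profile only if they are missing,
# so re-running the setup does not append duplicate lines.
set -eu
PROFILE="${PROFILE:-$HOME/.bash_profile}"
touch "$PROFILE"

# add_line LINE: append LINE to $PROFILE unless an identical line already exists
add_line() {
  grep -qxF -- "$1" "$PROFILE" || printf '%s\n' "$1" >> "$PROFILE"
}

# single quotes keep $JAVA_HOME and $PATH unexpanded in the profile
add_line 'export JAVA_HOME=/home/hadoop/app/jdk1.8.0_131'
add_line 'export PATH=$JAVA_HOME/bin:$PATH'
```

After sourcing the profile, java -version should report version 1.8.0_131.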
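Cluster installation
- Hadoop installation
On hadoop000, extract the Hadoop tarball into ~/app and add HADOOP_HOME to ~/.bash_profile in the same way:
export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0
export PATH=$HADOOP_HOME/bin:$PATH
Then edit the following files in $HADOOP_HOME/etc/hadoop; each XML snippet goes inside the file's <configuration> element.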
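hadoop-env.sh (set JAVA_HOME explicitly to the JDK installed above rather than relying on ~/.bash_profile, since the start scripts launch daemons over SSH)
export JAVA_HOME=/home/hadoop/app/jdk1.8.0_131
core-site.xml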
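<property>
<name>fs.default.name</name>
<value>hdfs://hadoop000:8020</value>
</property>
(fs.default.name is the older name of this key; Hadoop 2.x still accepts it but marks it deprecated in favor of fs.defaultFS.)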
hdfs-site.xml
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/app/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/app/tmp/dfs/data</value>
</property>
yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop000</value>
</property>
mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
slaves
hadoop000
hadoop001
hadoop002
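The four XML files and slaves can also be written by a script instead of vi. This sketch assumes the tarball sits in ~/app/hadoop-2.6.0-cdh5.7.0 (override with HADOOP_HOME or CONF_DIR). The write_conf helper is hypothetical, it overwrites any existing file of the same name, and it leaves hadoop-env.sh to be edited by hand.

```sh
#!/bin/sh
# Sketch: write the *-site.xml files and slaves with the properties above.
# WARNING: overwrites existing files of the same name in CONF_DIR.
set -eu
CONF_DIR="${CONF_DIR:-${HADOOP_HOME:-$HOME/app/hadoop-2.6.0-cdh5.7.0}/etc/hadoop}"
mkdir -p "$CONF_DIR"

# write_conf FILE NAME VALUE [NAME VALUE ...]: wrap properties in <configuration>
write_conf() {
  file="$CONF_DIR/$1"; shift
  {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    while [ "$#" -ge 2 ]; do
      printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
      shift 2
    done
    echo '</configuration>'
  } > "$file"
}

# same key as the article; fs.defaultFS is the non-deprecated name
write_conf core-site.xml fs.default.name hdfs://hadoop000:8020
write_conf hdfs-site.xml \
  dfs.namenode.name.dir /home/hadoop/app/tmp/dfs/name \
  dfs.datanode.data.dir /home/hadoop/app/tmp/dfs/data
write_conf yarn-site.xml \
  yarn.nodemanager.aux-services mapreduce_shuffle \
  yarn.resourcemanager.hostname hadoop000
write_conf mapred-site.xml mapreduce.framework.name yarn

# one worker hostname per line; hadoop000 also runs DataNode/NodeManager
printf '%s\n' hadoop000 hadoop001 hadoop002 > "$CONF_DIR/slaves"
```

Keeping every property in one script makes it easy to regenerate the configuration before copying ~/app to the other nodes.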
Copy the installation packages and configuration files to hadoop001 and hadoop002:
scp -r ~/app hadoop@hadoop001:~/
scp -r ~/app hadoop@hadoop002:~/
scp ~/.bash_profile hadoop@hadoop001:~/
scp ~/.bash_profile hadoop@hadoop002:~/
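On hadoop001 and hadoop002, apply the copied .bash_profile (source ~/.bash_profile, or simply log in again).
Format the NameNode, on hadoop000 only and only once (re-formatting wipes the existing HDFS metadata):
bin/hdfs namenode -format
Start the cluster, on hadoop000 only:
sbin/start-all.sh
(Both commands are run from $HADOOP_HOME. In Hadoop 2.x start-all.sh is deprecated but still works; it calls start-dfs.sh and start-yarn.sh.)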
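The distribution, formatting and start-up steps above can be combined into one script on hadoop000. This is a sketch under assumptions: the run helper, the DRY_RUN switch (on by default, so it only prints commands), the BatchMode SSH probe and the check of $NAME_DIR/current/VERSION to skip re-formatting are additions, not part of the original walkthrough.

```sh
#!/bin/sh
# Sketch: distribute, format once, and start the cluster from hadoop000.
# DRY_RUN=1 (default) only prints the commands; run with DRY_RUN=0 to execute.
set -eu
WORKERS="${WORKERS:-hadoop001 hadoop002}"
DRY_RUN="${DRY_RUN:-1}"
HADOOP_HOME="${HADOOP_HOME:-$HOME/app/hadoop-2.6.0-cdh5.7.0}"
NAME_DIR="${NAME_DIR:-/home/hadoop/app/tmp/dfs/name}"   # dfs.namenode.name.dir

run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

for w in $WORKERS; do
  # BatchMode=yes fails instead of prompting, so this checks key-based SSH first
  run ssh -o BatchMode=yes "hadoop@$w" true
  # an empty remote path ("host:") means the remote user's home directory
  run scp -r "$HOME/app" "hadoop@$w:"
  run scp "$HOME/.bash_profile" "hadoop@$w:"
done

# Format only if the NameNode directory has never been formatted;
# formatting again would wipe the existing HDFS metadata.
if [ ! -f "$NAME_DIR/current/VERSION" ]; then
  run "$HADOOP_HOME/bin/hdfs" namenode -format
fi
run "$HADOOP_HOME/sbin/start-all.sh"
```

Reviewing the printed commands first and only then rerunning with DRY_RUN=0 keeps an accidental second format from happening.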
- Verification
Check the running processes with jps:
- hadoop000:
SecondaryNameNode
DataNode
NodeManager
NameNode
ResourceManager
- hadoop001:
NodeManager
DataNode
- hadoop002:
NodeManager
DataNode
Web UIs: hadoop000:50070 (HDFS NameNode) and hadoop000:8088 (YARN ResourceManager)
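Comparing jps output against the lists above by eye works for three nodes; this sketch automates it. check_daemons is a hypothetical helper that reads jps output on stdin and reports any expected daemon that is missing.

```sh
#!/bin/sh
# Sketch: check that jps output lists the daemons expected on a node.
# Reads jps output on stdin; expected main-class names are the arguments.
set -eu

# check_daemons NAME...: print "MISSING: NAME" for each absent daemon, return 1 if any
check_daemons() {
  jps_out=$(cat)
  status=0
  for d in "$@"; do
    # jps prints "<pid> <MainClass>"; compare the last field exactly so that
    # SecondaryNameNode does not count as NameNode
    if ! printf '%s\n' "$jps_out" | awk -v d="$d" '$NF == d { found = 1 } END { exit !found }'; then
      echo "MISSING: $d"
      status=1
    fi
  done
  return "$status"
}

# Usage on a running cluster:
#   jps | check_daemons NameNode SecondaryNameNode DataNode ResourceManager NodeManager
#   ssh hadoop001 '. ~/.bash_profile; jps' | check_daemons DataNode NodeManager
```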
- Stopping the cluster (on hadoop000): sbin/stop-all.sh
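Running a Hadoop project on the cluster
1) Upload the data to the data directory on hadoop000
2) Upload the developed jar to the lib directory on hadoop000
3) Upload the data to HDFS
4) Run the program on the distributed cluster
For example, here I run the official example that estimates Pi (2 map tasks, 3 samples per map):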
hadoop jar /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar pi 2 3
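Steps 3) and 4) map onto the hdfs dfs and hadoop jar commands. In this sketch the input file ~/data/input.txt, the jar ~/lib/my-job.jar, the main class com.example.MyJob and the HDFS paths /input and /output are all hypothetical placeholders; as in a dry run, DRY_RUN=1 only prints the commands.

```sh
#!/bin/sh
# Sketch of steps 3) and 4) on hadoop000. DRY_RUN=1 (default) only prints the commands.
# Hypothetical names: ~/data/input.txt, ~/lib/my-job.jar, com.example.MyJob, /input, /output.
set -eu
DRY_RUN="${DRY_RUN:-1}"
HADOOP_HOME="${HADOOP_HOME:-/home/hadoop/app/hadoop-2.6.0-cdh5.7.0}"

run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# 3) copy the local data into HDFS (-f overwrites an existing file)
run "$HADOOP_HOME/bin/hdfs" dfs -mkdir -p /input
run "$HADOOP_HOME/bin/hdfs" dfs -put -f "$HOME/data/input.txt" /input/
# 4) submit the job to YARN; for FileOutputFormat jobs /output must not exist yet
run "$HADOOP_HOME/bin/hadoop" jar "$HOME/lib/my-job.jar" com.example.MyJob /input /output
# inspect what the job wrote
run "$HADOOP_HOME/bin/hdfs" dfs -ls /output
```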