Spark on YARN cluster setup (Part 1: preparation)
Master node configuration:
- Go into the /datamgt directory, download the binary package hadoop-2.7.6.tar.gz, then extract and rename it:
tar -zxvf hadoop-2.7.6.tar.gz && mv hadoop-2.7.6 hadoop
- Edit the global variables in /etc/profile and add the following:
export HADOOP_HOME=/datamgt/hadoop/
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
- After making the change, apply it with:
source /etc/profile
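A quick sanity check (my own addition, not part of the original steps) to confirm the variables are visible in the current shell:
echo $HADOOP_HOME    # should print /datamgt/hadoop/
hadoop version       # should report Hadoop 2.7.6 once bin/ is on the PATH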
- Edit the Hadoop configuration files:
- Set JAVA_HOME
vim $HADOOP_HOME/etc/hadoop/hadoop-env.sh
# change "export JAVA_HOME=${JAVA_HOME}" to:
export JAVA_HOME=/usr/java/jdk1.8.0_65
- Edit slaves
vim $HADOOP_HOME/etc/hadoop/slaves
# delete the existing localhost entry and replace it with:
slave1
slave2
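All of the files below refer to the nodes by hostname, so master, slave1 and slave2 must resolve on every machine. If DNS is not available, /etc/hosts on each node needs entries along these lines (the IP addresses are placeholders for illustration only):
# /etc/hosts on master, slave1 and slave2 (example addresses, replace with your own)
192.168.1.10  master
192.168.1.11  slave1
192.168.1.12  slave2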
- Edit $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/datamgt/hadoop/tmp</value>
  </property>
</configuration>
- Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:50090</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/datamgt/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/datamgt/hadoop/hdfs/data</value>
  </property>
</configuration>
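The original notes don't create these paths explicitly; formatting the namenode and starting the datanodes will usually create them, but I tend to pre-create the directories referenced in core-site.xml and hdfs-site.xml to rule out permission problems:
mkdir -p /datamgt/hadoop/tmp
mkdir -p /datamgt/hadoop/hdfs/name /datamgt/hadoop/hdfs/data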
- Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>
- Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml
# first copy mapred-site.xml.template to create mapred-site.xml, then edit it
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>
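For completeness, the copy mentioned in the comment above looks like this with the stock Hadoop 2.7 layout:
cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml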
Slave node configuration:
- Copy the hadoop folder from the master node to slave1 and slave2:
scp -r /datamgt/hadoop root@slave1:/datamgt && scp -r /datamgt/hadoop root@slave2:/datamgt
- Edit /etc/profile on slave1 and slave2 in the same way as on the master
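One thing the original notes skip: both the scp above and start-all.sh below assume the master can SSH to the slaves without a password. A typical setup looks roughly like this (sketch only, run on the master as root, assuming ssh-copy-id is available):
ssh-keygen -t rsa              # accept the defaults, empty passphrase
ssh-copy-id root@slave1        # push the public key to slave1
ssh-copy-id root@slave2        # push the public key to slave2
ssh root@slave1 hostname       # should print slave1 without asking for a password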
Start the cluster:
- Format the namenode on the master node before the first start:
hadoop namenode -format
- Then run on the master node:
/datamgt/hadoop/sbin/start-all.sh
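Note that start-all.sh prints a deprecation warning on Hadoop 2.x; starting HDFS and YARN separately is equivalent and gives clearer output:
/datamgt/hadoop/sbin/start-dfs.sh
/datamgt/hadoop/sbin/start-yarn.sh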
Check whether the cluster started successfully:
- Run jps to check the Java processes
- The master should show:
SecondaryNameNode
ResourceManager
NameNode
- The slaves should show:
NodeManager
DataNode
- Check in a browser that the corresponding web pages are reachable:
master:50070   # HDFS NameNode web UI
master:8088    # YARN ResourceManager web UI
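Besides jps and the web pages, a command-line check I find handy (not in the original steps) is to ask HDFS and YARN directly how many workers they see:
hdfs dfsadmin -report    # "Live datanodes (2)" should list slave1 and slave2
yarn node -list          # should show two NodeManagers in RUNNING state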
Problems encountered:
- Port 50070 not reachable:
At first I thought it was a port-listening problem, along the lines of the usual "Hadoop HDFS namenode web UI on port 50070 won't open" fixes.
Later, checking the logs (the namenode log under hadoop/logs/) showed that port 9000 on this machine was already in use, which caused the Hadoop namenode service to fail to start.