Setting Up a Hadoop Pseudo-Distributed Cluster

Preface

Prepare the following before you start (a command sketch follows this list):
- Turn off the firewall
- Set the hostname
- Bind the IP address to the hostname
- Install a JDK (Hadoop runs on the Java platform)
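As a rough sketch of those four steps (assuming CentOS 7 with systemd; the hostname hadoop01, the IP 192.168.1.100, and the JDK tarball name are placeholders, not from the original):

```bash
# Turn off the firewall (systemd-based systems)
systemctl stop firewalld
systemctl disable firewalld

# Set the hostname (placeholder name)
hostnamectl set-hostname hadoop01

# Bind the IP address to the hostname (placeholder address)
echo "192.168.1.100 hadoop01" >> /etc/hosts

# Install the JDK and put it on the PATH
tar -zxvf jdk-8u152-linux-x64.tar.gz -C /opt/app
echo 'export JAVA_HOME=/opt/app/jdk1.8.0_152' >> /etc/profile
echo 'export PATH=$PATH:$JAVA_HOME/bin' >> /etc/profile
source /etc/profile
java -version
```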
1. Configuring the Hadoop Environment
Extract the Hadoop tarball to /opt/app:

```bash
tar -zxvf hadoop-2.7.1.tar.gz -C /opt/app
```

Next, set the folder permissions and configure the files under etc/hadoop.
Set ownership of the installation folder:

```bash
sudo chown -R root:root /opt/app/hadoop-2.7.1
```
Configure hadoop-env.sh and fill in the JDK path:

```bash
# The java implementation to use.
export JAVA_HOME=/opt/app/jdk1.8.0_152
```
Return to the Hadoop root directory and check that the environment is configured correctly:
```bash
bin/hadoop
```

```
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries
                       availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings

Most commands print help when invoked w/o parameters.
```
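A quick extra check (not in the original steps) is to print the version:

```bash
bin/hadoop version
```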
Configure core-site.xml:

```xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
```
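An optional addition (my suggestion, not part of the original setup): by default Hadoop keeps its working data under /tmp, which is often wiped on reboot, so it is common to pin hadoop.tmp.dir to a persistent directory. The path below is a placeholder:

```xml
<property>
    <!-- base for Hadoop's working/data directories; placeholder path -->
    <name>hadoop.tmp.dir</name>
    <value>/opt/app/hadoop-2.7.1/data/tmp</value>
</property>
```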
Configure hdfs-site.xml (a replication factor of 1, since this pseudo-cluster has only one DataNode):

```xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
```
Configure mapred-site.xml (first rename mapred-site.xml.template to mapred-site.xml):

```xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>
```
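The rename mentioned above is a one-liner from the Hadoop root directory:

```bash
cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
```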
Configure yarn-site.xml:

```xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
```
2. Starting the Hadoop Services

Return to the Hadoop root directory.
Format the filesystem (do this only once; reformatting destroys existing HDFS metadata):

```bash
bin/hdfs namenode -format
```
Start the NameNode and DataNode daemons:

```bash
sbin/hadoop-daemon.sh start namenode
sbin/hadoop-daemon.sh start datanode
```
Start the YARN ResourceManager and NodeManager daemons:

```bash
sbin/yarn-daemon.sh start resourcemanager
sbin/yarn-daemon.sh start nodemanager
```
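Alternatively (a sketch, not part of the original steps), the helper scripts bundled under sbin can start everything at once; they SSH into localhost, so passwordless SSH must be set up first:

```bash
# One-time setup: passwordless SSH to localhost
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys

# Start HDFS (NameNode + DataNode) and YARN (ResourceManager + NodeManager)
sbin/start-dfs.sh
sbin/start-yarn.sh
```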
Check the Java processes with jps:

```bash
jps
```

```
126098 Jps
14532 DataNode
17284 ResourceManager
14235 NameNode
17679 NodeManager
```

The services also expose web UIs: the NameNode at http://localhost:50070 and the ResourceManager at http://localhost:8088.
3. Running a Simple Example: Word Count
Write a file a.txt:

```
hadoop java hbase hello hadoop java zookeeper hello sqoop hbase flume spark
```
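One way to create it (any editor works; this heredoc is just a sketch):

```bash
cat > a.txt << 'EOF'
hadoop java hbase hello hadoop java zookeeper hello sqoop hbase flume spark
EOF
```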
Upload it to the root of the HDFS cluster (you can browse it in the web UI at http://localhost:50070):

```bash
bin/hdfs dfs -put a.txt /a.txt
```
Run the wordcount example that ships with MapReduce (the output path is /output, matching the listing below; a relative path would land in your HDFS home directory instead):

```bash
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /a.txt /output
```
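Note that the job will fail if the output directory already exists on HDFS; to rerun it, delete the directory first:

```bash
bin/hdfs dfs -rm -r /output
```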
List the output directory on HDFS:

```bash
bin/hdfs dfs -ls /output
```

```
Found 2 items
-rw-r--r--   1 hadoop supergroup          0 2020-02-24 23:39 /output/_SUCCESS
-rw-r--r--   1 hadoop supergroup         68 2020-02-24 23:39 /output/part-r-00000
```
View the contents of the result file:

```bash
bin/hdfs dfs -text /output/part*
```

```
flume      1
hadoop     2
hbase      2
hello      2
java       2
spark      1
sqoop      1
zookeeper  1
```
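If you want the results on the local filesystem (the local file name here is arbitrary):

```bash
bin/hdfs dfs -get /output/part-r-00000 ./wordcount-result.txt
```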
Done!
Folders inside the Hadoop directory:

- bin: basic management scripts
- sbin: scripts for starting and stopping services
- share: jar files
- etc: configuration files