Version configuration:
Hadoop 3.2.1 + apache-hive-3.1.2 + hbase-2.2.6 + spark3.0.1 + mysql:8.0.22
This walkthrough installs everything with Docker on a Mac; routine Docker operations are not covered here.
Because Hadoop, Hive, and the other components have version-compatibility constraints, check the compatibility information on the official site before installing:
http://hive.apache.org/downloads.html
The whole stack could also be pulled and configured through a docker-compose.yml file, but my Docker knowledge is still limited, so I use the approach I know better and set up each container by hand.
Hadoop
1. Pull the Hadoop image
docker pull registry.cn-hangzhou.aliyuncs.com/hadoop_test/hadoop_base

2. Run the containers
As for the workers file path: the Hadoop installation directory can be read from the environment variables configured in /etc/profile.
// view the environment variable configuration
cat /etc/profile
Check the workers file:
-
Create the internal network for Hadoop
# pin a fixed IP range
docker network create --driver=bridge --subnet=172.19.0.0/16 hadoop
-
Create the Master container, mapping ports (10000 is the hiveserver2 port)
docker run -it --network hadoop -h Master --name Master -p 9870:9870 -p 8088:8088 -p 10000:10000 registry.cn-hangzhou.aliyuncs.com/hadoop_test/hadoop_base bash
-
Create the Slave1 container
docker run -it --network hadoop -h Slave1 --name Slave1 registry.cn-hangzhou.aliyuncs.com/hadoop_test/hadoop_base bash
-
Create the Slave2 container
docker run -it --network hadoop -h Slave2 --name Slave2 registry.cn-hangzhou.aliyuncs.com/hadoop_test/hadoop_base bash
-
Configure the hosts file in each container
172.19.0.4 Master
172.19.0.3 Slave1
172.19.0.2 Slave2
3. Start Hadoop
Although the container already has the Hadoop paths configured in its system variables, you must run source /etc/profile each time you enter before they take effect.
View the running containers:
docker ps
Enter the container with docker:
# enter the Master container
docker exec -it Master /bin/bash
# once inside, format HDFS
hadoop namenode -format
Start all services
root@Master:/usr/local/hadoop/sbin# ./start-all.sh
Starting namenodes on [Master]
Master: Warning: Permanently added 'master,172.19.0.4' (ECDSA) to the list of known hosts.
Starting datanodes
Slave1: Warning: Permanently added 'slave1,172.19.0.3' (ECDSA) to the list of known hosts.
Slave2: Warning: Permanently added 'slave2,172.19.0.2' (ECDSA) to the list of known hosts.
Slave1: WARNING: /usr/local/hadoop/logs does not exist. Creating.
Slave2: WARNING: /usr/local/hadoop/logs does not exist. Creating.
Starting secondary namenodes [Master]
Starting resourcemanager
Starting nodemanagers
View the state of the distributed file system:
root@Master:/usr/local/hadoop/sbin# hdfs dfsadmin -report
bash: hdfs: command not found
root@Master:/usr/local/hadoop/sbin# source /etc/profile
root@Master:/usr/local/hadoop/sbin# hdfs dfsadmin -report
Configured Capacity: 188176871424 (175.25 GB)
Present Capacity: 152964861952 (142.46 GB)
DFS Remaining: 152510214144 (142.04 GB)
DFS Used: 454647808 (433.59 MB)
DFS Used%: 0.30%
Replicated Blocks:
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
Erasure Coded Block Groups:
Low redundancy block groups: 0
Block groups with corrupt internal blocks: 0
Missing block groups: 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
Note the hdfs: command not found error above: it happens because /etc/profile was not sourced when the container was entered, so the Hadoop settings in that file, although present, were not yet in effect.
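To avoid running source /etc/profile by hand in every new shell, the line can be appended to root's ~/.bashrc inside each container. A minimal, side-effect-free sketch (a temp file stands in for ~/.bashrc here):

```shell
# Append "source /etc/profile" idempotently; RC stands in for ~/.bashrc.
RC=$(mktemp)
grep -qxF 'source /etc/profile' "$RC" || echo 'source /etc/profile' >> "$RC"
cat "$RC"   # -> source /etc/profile
```

The grep guard keeps the line from being appended twice if the command is re-run.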
4. WordCount example
// copy file contents into file1.txt
root@Master:/usr/local/hadoop# cp LICENSE.txt file1.txt
// create the upload directory
root@Master:/usr/local/hadoop# hadoop fs -mkdir /input
// upload file1.txt into Hadoop
root@Master:/usr/local/hadoop# hadoop fs -put file1.txt /input
2020-11-23 02:15:37,958 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
// list the contents of the HDFS /input directory
root@Master:/usr/local/hadoop# hadoop fs -ls /input
Found 1 items
-rw-r--r-- 2 root supergroup 150569 2020-11-23 02:15 /input/file1.txt
// view the job's output directory
root@Master:/usr/local/hadoop# hadoop fs -ls /output
Found 2 items
-rw-r--r-- 2 root supergroup 0 2020-11-23 02:22 /output/_SUCCESS
-rw-r--r-- 2 root supergroup 35324 2020-11-23 02:22 /output/part-r-00000
// view the actual result contents
hadoop fs -cat /output/part-r-00000
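The step that actually produced /output is not shown above; with a stock distribution the job is typically launched with the bundled examples jar, e.g. hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount /input /output (jar name assumed from the Hadoop 3.2.1 layout). What WordCount computes can be reproduced in miniature with standard tools:

```shell
# WordCount locally: one word per line, sort, then count duplicates.
printf 'hello world\nhello hadoop\n' | tr -s ' ' '\n' | sort | uniq -c
# counts: hadoop=1, hello=2, world=1
```

Each line of part-r-00000 is the same idea at cluster scale: a word followed by its total count.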
Hive
The Hive build used is apache-hive-3.1.2.
Download: https://mirror.bit.edu.cn/apache/hive/hive-3.1.2/
1. Upload the Hive package
docker cp apache-hive-3.1.2-bin.tar.gz Master:/usr/local
// enter the directory, then extract
cd /usr/local/
# extract the package
tar xvf apache-hive-3.1.2-bin.tar.gz
2. Edit the configuration files
root@Master:/usr/local/apache-hive-3.1.2-bin/conf# cp hive-default.xml.template hive-site.xml
root@Master:/usr/local/apache-hive-3.1.2-bin/conf# vim hive-site.xml
Delete the special character at line 3215, column 96 of hive-site.xml.
Then add the following to hive-site.xml:
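That stray character is widely reported to be the invalid XML character reference &#8; shipped in the template's hive.txn.xlock.iow description. Rather than trusting the line/column offset, you can locate and strip it with grep and sed; demonstrated on a throwaway file standing in for hive-site.xml:

```shell
# Locate the invalid XML character reference, then delete it in place.
f=$(mktemp)
printf '<description>for&#8;transactional tables</description>\n' > "$f"
grep -n '&#8;' "$f"          # shows which line holds the bad reference
sed -i 's/&#8;//g' "$f"      # strip it
```

In the container, point the same grep/sed at /usr/local/apache-hive-3.1.2-bin/conf/hive-site.xml.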
<property>
<name>system:java.io.tmpdir</name>
<value>/tmp/hive/java</value>
</property>
<property>
<name>system:user.name</name>
<value>${user.name}</value>
</property>
3. Configure the Hive environment variables
root@Master:/usr/local/apache-hive-3.1.2-bin/conf# vi /etc/profile
#hive
export HIVE_HOME="/usr/local/apache-hive-3.1.2-bin"
export PATH=$PATH:$HIVE_HOME/bin
// source the profile so the changes take effect
root@Master:/usr/local/apache-hive-3.1.2-bin/conf# source /etc/profile
4. Configure MySQL as the metastore database
Pull the mysql image
docker pull mysql:8.0.22
# create the container
docker run --name mysql_hive -p 4306:3306 --net hadoop --ip 172.19.0.5 -v /root/mysql:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=abc123456 -d mysql:8.0.22
# enter the container
docker exec -it mysql_hive bash
# enter mysql
mysql -uroot -p
# the password was set to abc123456 when the container was created
# create the hive database
create database hive;
# grant remote connection access
ALTER USER 'root'@'%' IDENTIFIED WITH mysql_native_password BY 'abc123456';
Go back to the Master container and update the metastore database connection settings
docker exec -it Master bash
vi /usr/local/apache-hive-3.1.2-bin/conf/hive-site.xml
# Note: hive's XML config requires &amp; as the URL parameter separator, and newer MySQL versions require SSL by default, which is disabled here
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://172.19.0.5:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<!--use com.mysql.cj.jdbc.Driver with the MySQL 8 connector jar; com.mysql.jdbc.Driver with MySQL 5-->
<description>MySQL JDBC driver</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>MySQL user</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>abc123456</value>
<description>MySQL password</description>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
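A reminder on the ConnectionURL above: a bare & is illegal inside XML text, so the JDBC URL's parameter separator must be written as &amp;. The escaping can be done with a quick sed substitution:

```shell
# Escape bare ampersands so the JDBC URL is legal inside an XML <value>.
echo 'jdbc:mysql://172.19.0.5:3306/hive?createDatabaseIfNotExist=true&useSSL=false' \
  | sed 's/&/\&amp;/g'
```

The output, with &amp; in place of &, is the form that belongs in hive-site.xml.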
Copy the MySQL driver into Hive's lib directory
root@Master:/usr/local# cp mysql-connector-java-5.1.49.jar /usr/local/apache-hive-3.1.2-bin/lib
Modify a few files under Hive's lib folder, otherwise initializing the metastore database will fail:
# only one slf4j binding may exist between hadoop and hive; delete hive's copy
root@Master:/usr/local/apache-hive-3.1.2-bin/lib# rm log4j-slf4j-impl-2.10.0.jar
# of the two guava jars, delete the lower-versioned one and copy the higher-versioned one over; here hive's is deleted and hadoop's is copied in
root@Master:/usr/local/hadoop/share/hadoop/common/lib# cp guava-27.0-jre.jar /usr/local/apache-hive-3.1.2-bin/lib
root@Master:/usr/local/hadoop/share/hadoop/common/lib# rm /usr/local/apache-hive-3.1.2-bin/lib/guava-19.0.jar
# delete the special character on line 3225 of hive-site.xml
root@Master: vim /usr/local/apache-hive-3.1.2-bin/conf/hive-site.xml
5. Initialize the metastore
root@Master:/usr/local/apache-hive-3.1.2-bin/bin# schematool -initSchema -dbType mysql
Hbase
The HBase build used is hbase-2.2.6; download: https://mirror.bit.edu.cn/apache/hbase/2.2.6/
1. Upload the HBase package
// copy the package to the Master container
docker cp hbase-2.2.6-bin.tar.gz Master:/usr/local
// enter the directory, then extract
root@Master:/# cd /usr/local/
root@Master:/usr/local# tar -zxvf hbase-2.2.6-bin.tar.gz
2. Configure the HBase environment variables
#hbase
export HBASE_HOME=/usr/local/hbase-2.2.6
export PATH=$HBASE_HOME/bin:$PATH
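Depending on the image, hbase-env.sh (in hbase/conf) may also need a JDK path and the ZooKeeper-management flag. The JAVA_HOME below matches the path this image uses for Spark later on, and letting HBase manage its own ZooKeeper fits the single-host quorum configured below; both lines are assumptions to verify inside the container:

```shell
# hbase-env.sh additions (assumed values; check the container's JDK path)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HBASE_MANAGES_ZK=true
```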
Copy core-site.xml and hdfs-site.xml from hadoop/etc/hadoop into hbase/conf/.
Configure hbase-site.xml:
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.master</name>
<value>localhost:60000</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/yourname/zoodata</value>
</property>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
# under hbase/lib/client-facing-thirdparty:
mv ./slf4j-log4j12-1.7.30.jar ./slf4j-log4j12-1.7.30.jar.bak
# rename hbase's slf4j jar instead of deleting it, keeping a backup while avoiding a clash with hadoop's logging
Spark
The Spark build used is spark3.0.1; download: https://mirror.bit.edu.cn/apache/spark/spark-3.0.1
1. Upload the Spark package
docker cp spark-3.0.1-bin-hadoop3.2.tgz Master:/usr/local
// enter the directory, then extract
root@Master:/# cd /usr/local/
root@Master:/usr/local# tar -zxvf spark-3.0.1-bin-hadoop3.2.tgz
2. Configure the Spark environment
root@Master:/usr/local/spark-3.0.1/conf# vi spark-env.sh
#spark
export SPARK_MASTER_HOST=Master
export SPARK_MEM=1G
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_MEMORY=1G
#java
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
#export SCALA_HOME=/usr/local/scala-2.12.12
#hadoop
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
root@Master:/usr/local/spark-3.0.1/conf# vi spark-defaults.conf
spark.master yarn
3. Start Spark
// start spark
root@Master:/usr/local/spark-3.0.1/sbin# ./start-all.sh
// check the running JVM processes
root@Master:/usr/local/hbase-2.2.6/conf# jps
8449 Master
8532 Worker
8582 Jps
7144 ResourceManager
6810 SecondaryNameNode
7275 NodeManager
6621 DataNode
6493 NameNode