Author's environment:
- CPU: E5-2678 v3; RAM: 32 GB DDR4
- CentOS 7 (2003)
- java 1.8
- hadoop 2.10.1
- hive 2.3.7
- scala 2.11.8
- spark 2.4.7
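Because these packages are updated frequently, I have not included download links for them. My contact details are at the end of the article; get in touch if you need the files.
Change the hostname
Edit /etc/hostname and set the hostname to master:
nano /etc/hostname
Reboot: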
reboot
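The Hadoop, YARN and Spark settings later in this article refer to the machine by the name master, so that name has to resolve to an address. The steps above do not show how it resolves. As a hedged sketch, the helper below checks whether a hosts-format file contains an entry for a given name. The function name host_in_file is made up for illustration, and using /etc/hosts for resolution is an assumption about your setup.

```shell
# Hypothetical helper: succeed if a hosts-format file maps some address to NAME.
# Usage: host_in_file NAME [FILE]   (FILE defaults to /etc/hosts)
host_in_file() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        /^[[:space:]]*#/ { next }            # skip whole-line comments
        {
            for (i = 2; i <= NF; i++) {      # field 1 is the address
                if ($i ~ /^#/) break         # stop at a trailing comment
                if ($i == n) found = 1
            }
        }
        END { exit found ? 0 : 1 }
    ' "$file"
}

# Example: report whether "master" has an entry in /etc/hosts.
if host_in_file master; then
    echo "master is listed in /etc/hosts"
else
    echo "master is not listed in /etc/hosts"
fi
```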
Install Java
Copy jdk-8u261-linux-x64.tar.gz to /home and extract it:
cd /home
tar -xvf jdk-8u261-linux-x64.tar.gz
Configure the environment variables:
nano ~/.bashrc
Append the following:
export JAVA_HOME=/home/jdk1.8.0_261
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
Reload the environment variables:
source ~/.bashrc
Check the Java version:
java -version
Output:
java version "1.8.0_261"
Java(TM) SE Runtime Environment (build 1.8.0_261-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.261-b12, mixed mode)
Install Hadoop
Copy hadoop-2.10.1.tar.gz to /home and extract it:
cd /home
tar -xvf hadoop-2.10.1.tar.gz
Configure hadoop-env.sh
nano /home/hadoop-2.10.1/etc/hadoop/hadoop-env.sh
Find the JAVA_HOME setting and change it to:
export JAVA_HOME=/home/jdk1.8.0_261
Configure core-site.xml
nano /home/hadoop-2.10.1/etc/hadoop/core-site.xml
Replace the file contents with:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoopdata</value>
</property>
</configuration>
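To confirm which value ended up in a site file, Hadoop's own `hdfs getconf -confKey fs.defaultFS` prints the effective setting once the environment is configured. As a rough alternative that does not need Hadoop on the PATH, the sketch below reads a property straight from the XML file. It assumes the simple layout used in this article, with each <name> and <value> on its own line; the function name get_hadoop_prop is made up for illustration.

```shell
# Hypothetical helper: print the <value> that follows a given <name>
# in a Hadoop XML config file.
# Assumes <name> and <value> sit on separate lines, as in this article.
get_hadoop_prop() {
    key=$1
    file=$2
    awk -v k="$key" '
        index($0, "<name>" k "</name>") { want = 1; next }
        want && /<value>/ {
            sub(/.*<value>/, "")      # strip everything up to the opening tag
            sub(/<\/value>.*/, "")    # strip the closing tag and anything after
            print
            exit
        }
    ' "$file"
}

# Example with the path used in this article (runs only if the file exists).
conf=/home/hadoop-2.10.1/etc/hadoop/core-site.xml
if [ -f "$conf" ]; then
    get_hadoop_prop fs.defaultFS "$conf"
fi
```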
Configure hdfs-site.xml
nano /home/hadoop-2.10.1/etc/hadoop/hdfs-site.xml
Replace the file contents with:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Configure yarn-site.xml
nano /home/hadoop-2.10.1/etc/hadoop/yarn-site.xml
Replace the file contents with:
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:18040</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:18030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:18025</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:18141</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:18088</value>
</property>
</configuration>
Configure mapred-site.xml
cp /home/hadoop-2.10.1/etc/hadoop/mapred-site.xml.template /home/hadoop-2.10.1/etc/hadoop/mapred-site.xml
nano /home/hadoop-2.10.1/etc/hadoop/mapred-site.xml
Replace the file contents with:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Configure the Hadoop environment variables
nano ~/.bashrc
Append the following:
export HADOOP_HOME=/home/hadoop-2.10.1
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Reload the environment variables:
source ~/.bashrc
Create the data directory
cd /home
mkdir hadoopdata
Format the HDFS file system:
hdfs namenode -format
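Formatting is meant to be done once. Running the command again on a NameNode that already holds data asks for confirmation and, if confirmed, erases the existing HDFS metadata. A minimal guard is sketched below. It assumes the default layout, where the NameNode directory is dfs/name under hadoop.tmp.dir (here /home/hadoopdata) and a formatted NameNode contains current/VERSION. The function name namenode_formatted is made up for illustration.

```shell
# Hypothetical helper: succeed if the NameNode under the given
# hadoop.tmp.dir already looks formatted.
# Assumes the default dfs.namenode.name.dir of <hadoop.tmp.dir>/dfs/name.
namenode_formatted() {
    [ -f "$1/dfs/name/current/VERSION" ]
}

# Example: decide whether formatting is still needed (only prints advice).
if namenode_formatted /home/hadoopdata; then
    echo "NameNode already formatted; skip hdfs namenode -format"
else
    echo "NameNode not formatted yet; run: hdfs namenode -format"
fi
```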
Check the version:
hadoop version
Output:
Hadoop 2.10.1
Subversion https://github.com/apache/hadoop -r 1827467c9a56f133025f28557bfc2c562d78e816
Compiled by centos on 2020-09-14T13:17Z
Compiled with protoc 2.5.0
From source with checksum 3114edef868f1f3824e7d0f68be03650
This command was run using /home/hadoop-2.10.1/share/hadoop/common/hadoop-common-2.10.1.jar
Start Hadoop
cd /home/hadoop-2.10.1/sbin
./start-all.sh
Run jps:
jps
Output:
2323 NameNode
2979 ResourceManager
3510 Jps
2505 DataNode
3323 NodeManager
2748 SecondaryNameNode
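The five daemons in the listing above (NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager) should all be present; the process IDs will differ on your machine. As a small sketch, the helper below reads jps output on standard input and prints any of those five that is missing. The function name missing_daemons is made up for illustration.

```shell
# Hypothetical helper: read jps output on stdin and print each expected
# Hadoop daemon that does not appear in it.
missing_daemons() {
    out=$(cat)
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <name>"; compare the whole second field so that
        # SecondaryNameNode is not mistaken for NameNode.
        printf '%s\n' "$out" \
            | awk -v n="$d" '$2 == n { ok = 1 } END { exit ok ? 0 : 1 }' \
            || echo "$d"
    done
}

# Example: prints nothing when all five daemons are running.
if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons
fi
```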
Web UI (replace 192.168.31.66 with your own host's IP address)
- NameNode and DataNode: http://192.168.31.66:50070/
- YARN: http://192.168.31.66:18088/
Install Hive
Install MariaDB
yum install mariadb-server -y
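Start MariaDB and enable it at boot:
systemctl start mariadb
systemctl enable mariadb
Set the MariaDB root password:
mysql_secure_installation
Log in to the database:
mysql -uroot -p
Create the hadoop user, grant it privileges, and create the hivedata database: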
grant all on *.* to hadoop@'%' identified by '123456';
grant all on *.* to hadoop@'localhost' identified by '123456';
grant all on *.* to hadoop@'master' identified by '123456';
flush privileges;
create database hivedata;
quit;
Install Hive
Copy apache-hive-2.3.7-bin.tar.gz to /home and extract it:
tar -xvf apache-hive-2.3.7-bin.tar.gz
Configure hive-site.xml. The file does not exist by default, so create it:
nano /home/apache-hive-2.3.7-bin/conf/hive-site.xml
Add the following:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.local</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://localhost:9083</value>
<description>Thrift uri for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>localhost</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hivedata?characterEncoding=UTF-8</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hadoop</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<name>hive.server2.thrift.client.user</name>
<value>hive</value>
</property>
<property>
<name>hive.server2.thrift.client.password</name>
<value>hive123456</value>
</property>
</configuration>
- javax.jdo.option.ConnectionURL: the hivedata in this URL must match the name of the database created in MySQL above;
- javax.jdo.option.ConnectionUserName: the MySQL user configured above;
- javax.jdo.option.ConnectionPassword: the MySQL password configured above;
- hive.server2.thrift.client.user: the user for logging in to hiveserver2;
- hive.server2.thrift.client.password: the password for logging in to hiveserver2.
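Since the database name in the connection URL has to match the one created in MySQL, it can help to pull the name out of the URL and compare it. A minimal sketch using only shell parameter expansion follows; the function name jdbc_db_name is made up for illustration, and it assumes a URL of the form jdbc:mysql://host:port/dbname with optional ?parameters.

```shell
# Hypothetical helper: extract the database name from a jdbc:mysql URL
# of the form jdbc:mysql://host:port/dbname[?params].
jdbc_db_name() {
    url=${1%%\?*}              # drop "?param=..." if present
    printf '%s\n' "${url##*/}" # keep what follows the last "/"
}

# Example with the URL from hive-site.xml above.
db=$(jdbc_db_name "jdbc:mysql://localhost:3306/hivedata?characterEncoding=UTF-8")
if [ "$db" = "hivedata" ]; then
    echo "URL points at database: $db"
fi
```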
Copy the MySQL Java connector into Hive's lib directory:
cp mysql-connector-java-5.1.36-bin.jar /home/apache-hive-2.3.7-bin/lib/
Configure the Hive environment variables:
nano ~/.bashrc
Add the following:
export HIVE_HOME=/home/apache-hive-2.3.7-bin
export PATH=$PATH:$HIVE_HOME/bin
Reload the environment variables:
source ~/.bashrc
Initialize the Hive metastore schema:
schematool -dbType mysql -initSchema
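Start the metastore first. hive-site.xml sets hive.metastore.uris to thrift://localhost:9083, so both the Hive CLI and hiveserver2 need the metastore service to be running. The command stays in the foreground, so run it in its own terminal:
hive --service metastore
Start Hive:
hive
Create a database:
show databases;
create database hive_data;
show databases;
quit;
Start hiveserver2 (also in its own terminal):
hive --service hiveserver2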
Install Spark
Copy scala-2.11.8.tgz and spark-2.4.7-bin-hadoop2.7.tgz to /home and extract them:
tar -xvf scala-2.11.8.tgz
tar -xvf spark-2.4.7-bin-hadoop2.7.tgz
Configure the environment variables:
nano ~/.bashrc
Add the following:
export SCALA_HOME=/home/scala-2.11.8
export SPARK_HOME=/home/spark-2.4.7-bin-hadoop2.7
Reload the environment variables:
source ~/.bashrc
Configure spark-env.sh:
cd /home/spark-2.4.7-bin-hadoop2.7/conf
cp spark-env.sh.template spark-env.sh
nano spark-env.sh
Add the following:
export JAVA_HOME=/home/jdk1.8.0_261
export HADOOP_HOME=/home/hadoop-2.10.1
export HIVE_HOME=/home/apache-hive-2.3.7-bin
export SCALA_HOME=/home/scala-2.11.8
export HIVE_CONF_DIR=$HIVE_HOME/conf
export SPARK_MASTER_HOST=master
export SPARK_WORKER_MEMORY=24G
Configure slaves:
cp slaves.template slaves
nano slaves
Change localhost to master.
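The same change can be made without opening an editor. The sketch below rewrites a slaves file so that a line consisting only of localhost becomes the given host name, leaving comment lines untouched. The function name set_slave_host is made up for illustration, and it assumes the host name is a plain name without characters that are special to sed, such as / or &.

```shell
# Hypothetical helper: replace a bare "localhost" line in a Spark slaves
# file with the given host name.
# Assumes HOST contains no characters special to sed (such as "/" or "&").
set_slave_host() {
    file=$1
    host=$2
    tmp=$(mktemp) || return 1
    # Only lines made up of "localhost" plus optional whitespace are changed.
    if sed "s/^[[:space:]]*localhost[[:space:]]*\$/$host/" "$file" > "$tmp"; then
        cat "$tmp" > "$file"   # write back in place, keeping file permissions
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Example with the path used in this article:
# set_slave_host /home/spark-2.4.7-bin-hadoop2.7/conf/slaves master
```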
Start Spark:
cd /home/spark-2.4.7-bin-hadoop2.7/sbin
./start-all.sh
- Web UI: http://192.168.31.66:8080/
Configure pyspark:
Copy the pyspark directory shipped with Spark into Python's site-packages:
cd /home/spark-2.4.7-bin-hadoop2.7/python/
cp -rf pyspark /home/anaconda3/envs/tf12/lib/python3.6/site-packages/
Install the required third-party package:
pip install py4j
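A quick way to check the result is to ask the interpreter whether it can import both modules. The sketch below is a minimal check. The function name py_can_import and the PYTHON variable are made up for illustration; set PYTHON to the interpreter of the environment you copied pyspark into, otherwise python3 from the PATH is assumed.

```shell
# Hypothetical helper: succeed if the chosen Python interpreter can import
# the named module. PYTHON is an illustrative variable; it defaults to python3.
py_can_import() {
    "${PYTHON:-python3}" -c "import $1" >/dev/null 2>&1
}

# Example: check the two modules this section relies on.
for m in pyspark py4j; do
    if py_can_import "$m"; then
        echo "$m: importable"
    else
        echo "$m: not importable"
    fi
done
```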
Contact (QQ): 1982248707. That completes the setup.