Installing a Hadoop and Spark Environment on a Single Machine

Author's environment:

  • CPU: E5-2678 v3, 32 GB DDR4 RAM
  • CentOS 7 (2003)
  • Java 1.8
  • Hadoop 2.10.1
  • Hive 2.3.7
  • Scala 2.11.8
  • Spark 2.4.7

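Because versions change quickly, I have not included download links for the software above. My contact details are at the end of the article; get in touch if you need the packages.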

Change the hostname

Edit /etc/hostname and change its contents to master:

nano /etc/hostname

Reboot so the new hostname takes effect:

reboot

Install Java

Copy jdk-8u261-linux-x64.tar.gz to /home and extract it:

cd /home
tar -xvf jdk-8u261-linux-x64.tar.gz

Configure the environment variables:

nano ~/.bashrc

Append the following:

export JAVA_HOME=/home/jdk1.8.0_261
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

Reload the environment variables:

source ~/.bashrc

Check the Java version:

java -version

Output:

java version "1.8.0_261"
Java(TM) SE Runtime Environment (build 1.8.0_261-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.261-b12, mixed mode)
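
If you want to check the installed version from a script rather than by eye, the banner above can be parsed. The following is only a minimal sketch: the helper names parse_java_version and installed_java_version are made up for this illustration, and it assumes the banner keeps the `version "..."` form shown above.

```python
import re
import subprocess


def parse_java_version(text):
    """Return the quoted version string from a `java -version` banner, or None."""
    match = re.search(r'version "([^"]+)"', text)
    return match.group(1) if match else None


def installed_java_version():
    """Run `java -version`; the JVM prints its banner to stderr, not stdout."""
    result = subprocess.run(
        ["java", "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return parse_java_version(result.stderr)


# Sample banner copied from the output above.
sample = (
    'java version "1.8.0_261"\n'
    "Java(TM) SE Runtime Environment (build 1.8.0_261-b12)\n"
)
print(parse_java_version(sample))  # prints 1.8.0_261
```

Calling installed_java_version() on the machine itself should return the same string once JAVA_HOME/bin is on the PATH.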

Install Hadoop

Copy hadoop-2.10.1.tar.gz to /home and extract it:

cd /home
tar -xvf hadoop-2.10.1.tar.gz

Configure hadoop-env.sh

nano /home/hadoop-2.10.1/etc/hadoop/hadoop-env.sh

Find the JAVA_HOME setting and change it to:

export JAVA_HOME=/home/jdk1.8.0_261

Configure core-site.xml

nano /home/hadoop-2.10.1/etc/hadoop/core-site.xml

Replace the file's contents with the following:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoopdata</value>
    </property>
</configuration>
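
A typo in a property name is easy to miss in these files, because Hadoop simply ignores names it does not know. As a rough sanity check, the name/value pairs can be read back with Python's standard library. This is a sketch only; read_site_properties is a hypothetical helper, and the XML below is a trimmed copy of the file above without its header comments.

```python
import xml.etree.ElementTree as ET


def read_site_properties(xml_text):
    """Map each <property> name to its value in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.iter("property")
    }


# Trimmed copy of the core-site.xml shown above.
CORE_SITE = """<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoopdata</value>
    </property>
</configuration>"""

props = read_site_properties(CORE_SITE)
print(props["fs.defaultFS"])    # prints hdfs://master:9000
print(props["hadoop.tmp.dir"])  # prints /home/hadoopdata
```

To check the real file instead of a string, ET.parse(path).getroot() can take the place of ET.fromstring.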

Configure hdfs-site.xml

nano /home/hadoop-2.10.1/etc/hadoop/hdfs-site.xml

Replace the file's contents with the following:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

Configure yarn-site.xml

nano /home/hadoop-2.10.1/etc/hadoop/yarn-site.xml

Replace the file's contents with the following:

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:18040</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:18030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:18025</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:18141</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:18088</value>
    </property>
</configuration>
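
The five ResourceManager addresses above differ only in their property suffix and port, so the block can also be generated instead of typed by hand. The sketch below is one possible way to do that and is not part of the original setup; build_site_xml and yarn_properties are hypothetical helpers, and the ports are the ones used in this article.

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties):
    """Render a dict of name -> value as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


def yarn_properties(host):
    """Build the yarn-site.xml properties from this article for one host."""
    ports = {
        "address": 18040,
        "scheduler.address": 18030,
        "resource-tracker.address": 18025,
        "admin.address": 18141,
        "webapp.address": 18088,
    }
    props = {"yarn.nodemanager.aux-services": "mapreduce_shuffle"}
    for suffix, port in ports.items():
        props["yarn.resourcemanager." + suffix] = "{}:{}".format(host, port)
    return props


xml_text = build_site_xml(yarn_properties("master"))
print(xml_text)
```

The generated text has no XML declaration or license comment; those are optional for Hadoop's parser, but you may want to keep them when writing the result to yarn-site.xml.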

Configure mapred-site.xml

cp /home/hadoop-2.10.1/etc/hadoop/mapred-site.xml.template /home/hadoop-2.10.1/etc/hadoop/mapred-site.xml
nano /home/hadoop-2.10.1/etc/hadoop/mapred-site.xml

Replace the file's contents with the following:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

Configure the Hadoop environment variables

nano ~/.bashrc

Append the following:

export HADOOP_HOME=/home/hadoop-2.10.1
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

Reload the environment variables:

source ~/.bashrc

Create the data directory (the path set as hadoop.tmp.dir in core-site.xml)

cd /home
mkdir hadoopdata

Format the HDFS file system:

hdfs namenode -format

Check the version:

hadoop version

Output:

Hadoop 2.10.1
Subversion https://github.com/apache/hadoop -r 1827467c9a56f133025f28557bfc2c562d78e816
Compiled by centos on 2020-09-14T13:17Z
Compiled with protoc 2.5.0
From source with checksum 3114edef868f1f3824e7d0f68be03650
This command was run using /home/hadoop-2.10.1/share/hadoop/common/hadoop-common-2.10.1.jar

Start Hadoop

cd /home/hadoop-2.10.1/sbin
./start-all.sh

Run jps to list the running Java processes:

jps

Output:

2323 NameNode
2979 ResourceManager
3510 Jps
2505 DataNode
3323 NodeManager
2748 SecondaryNameNode
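
All five Hadoop daemons should appear in that list; if one is missing, its log under the Hadoop logs directory is the place to look. The check can be automated roughly as follows. This is a sketch under the assumption that jps prints one "pid name" pair per line, as above; missing_daemons is a made-up helper name.

```python
EXPECTED_DAEMONS = {
    "NameNode",
    "DataNode",
    "SecondaryNameNode",
    "ResourceManager",
    "NodeManager",
}


def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """Return the expected daemon names that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2:  # "<pid> <main class name>"
            running.add(parts[1])
    return sorted(set(expected) - running)


# The jps output shown above.
sample = """2323 NameNode
2979 ResourceManager
3510 Jps
2505 DataNode
3323 NodeManager
2748 SecondaryNameNode"""

print(missing_daemons(sample))  # prints []
```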

Web UI (192.168.31.66 is the author's machine; substitute your own host's IP address)

  • NameNode and DataNode: http://192.168.31.66:50070/
  • YARN: http://192.168.31.66:18088/
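
If a page does not load, it helps to know whether the port is listening at all or whether a firewall sits in between. A plain TCP connection test is enough for that. The following sketch uses only the standard library; port_open and ui_status are hypothetical helper names, and the port numbers are the ones configured in this article.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host name not resolvable
        return False


def ui_status(host, ports):
    """Check each named port on host and return a name -> bool mapping."""
    return {name: port_open(host, port) for name, port in ports.items()}


# Ports used by the web UIs in this article.
WEB_UI_PORTS = {"NameNode": 50070, "YARN ResourceManager": 18088}
```

For example, ui_status("master", WEB_UI_PORTS) run on the server shows whether the daemons are listening; running it from another machine with the server's IP additionally exercises the network path.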

Install Hive

Install MariaDB

yum install mariadb-server -y

Start MariaDB and enable it at boot:

systemctl start mariadb
systemctl enable mariadb

Set the MariaDB root password:

mysql_secure_installation

Log in to the database:

mysql -uroot -p

Create the hadoop user and the metastore database:

grant all on *.* to hadoop@'%' identified by '123456';
grant all on *.* to hadoop@'localhost' identified by '123456';
grant all on *.* to hadoop@'master' identified by '123456';
flush privileges;
create database hivedata;
quit;

Install Hive

Copy apache-hive-2.3.7-bin.tar.gz to /home and extract it:

cd /home
tar -xvf apache-hive-2.3.7-bin.tar.gz

Configure hive-site.xml. The file does not exist by default and must be created manually:

nano /home/apache-hive-2.3.7-bin/conf/hive-site.xml

Add the following:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>hive.metastore.local</name>
        <value>false</value>
    </property>
    <property>  
        <name>hive.metastore.uris</name>  
        <value>thrift://localhost:9083</value>  
        <description>Thrift uri for the remote metastore. Used by metastore client to connect to remote metastore.</description>  
    </property>
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>localhost</value>
    </property>
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/hivedata?characterEncoding=UTF-8</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>hive.server2.enable.doAs</name>
        <value>false</value> 
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hadoop</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.server2.thrift.client.user</name>
        <value>hive</value>
    </property>
    <property>
        <name>hive.server2.thrift.client.password</name>
        <value>hive123456</value>
    </property>
</configuration>

  • javax.jdo.option.ConnectionURL: the hivedata in this URL must match the name of the database created in MariaDB above;
  • javax.jdo.option.ConnectionUserName: the MariaDB user created above (hadoop in the grant statements);
  • javax.jdo.option.ConnectionPassword: that user's MariaDB password;
  • hive.server2.thrift.client.user: the user for logging in to hiveserver2;
  • hive.server2.thrift.client.password: the password for logging in to hiveserver2.
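
A mismatch between the JDBC URL and the database actually created in MariaDB only surfaces later, when the schema is initialized. The URL can be taken apart beforehand to compare its pieces. This is an illustrative sketch only; jdbc_mysql_parts is a hypothetical helper and handles just the jdbc:mysql://host:port/database form used here.

```python
from urllib.parse import urlparse


def jdbc_mysql_parts(jdbc_url):
    """Split a jdbc:mysql:// URL into (host, port, database name)."""
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix + "mysql://"):
        raise ValueError("not a jdbc:mysql:// URL: " + jdbc_url)
    # Drop the "jdbc:" prefix so the rest parses like an ordinary URL.
    parsed = urlparse(jdbc_url[len(prefix):])
    return parsed.hostname, parsed.port, parsed.path.lstrip("/")


url = "jdbc:mysql://localhost:3306/hivedata?characterEncoding=UTF-8"
host, port, database = jdbc_mysql_parts(url)
print(host, port, database)  # prints localhost 3306 hivedata
```

The database value should equal the name passed to create database earlier, which is hivedata in this article.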

Copy the MySQL Java connector into Hive's lib directory:

cp mysql-connector-java-5.1.36-bin.jar /home/apache-hive-2.3.7-bin/lib/

Configure the Hive environment variables:

nano ~/.bashrc

Add the following:

export HIVE_HOME=/home/apache-hive-2.3.7-bin
export PATH=$PATH:$HIVE_HOME/bin

Reload the environment variables:

source ~/.bashrc

Initialize the Hive metastore schema:

schematool -dbType mysql -initSchema

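Start the metastore. hive-site.xml points hive.metastore.uris at thrift://localhost:9083, so this service must be running before the Hive CLI or hiveserver2 can connect. The command runs in the foreground, so leave it running and continue in another terminal:

hive --service metastore

Start Hive:

hive

Create a database:

show databases;
create database hive_data;
show databases;
quit;

Start hiveserver2:

hive --service hiveserver2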

Install Spark

Copy scala-2.11.8.tgz and spark-2.4.7-bin-hadoop2.7.tgz to /home and extract them:

cd /home
tar -xvf scala-2.11.8.tgz
tar -xvf spark-2.4.7-bin-hadoop2.7.tgz

Configure the environment variables:

nano ~/.bashrc

Add the following:

export SCALA_HOME=/home/scala-2.11.8
export SPARK_HOME=/home/spark-2.4.7-bin-hadoop2.7

Reload the environment variables:

source ~/.bashrc

Configure spark-env.sh:

cd /home/spark-2.4.7-bin-hadoop2.7/conf
cp spark-env.sh.template spark-env.sh
nano spark-env.sh

Add the following:

export JAVA_HOME=/home/jdk1.8.0_261
export HADOOP_HOME=/home/hadoop-2.10.1
export HIVE_HOME=/home/apache-hive-2.3.7-bin
export SCALA_HOME=/home/scala-2.11.8
export HIVE_CONF_DIR=$HIVE_HOME/conf
export SPARK_MASTER_HOST=master
export SPARK_WORKER_MEMORY=24G

Configure the slaves file:

cp slaves.template slaves
nano slaves

Change localhost to master.

Start Spark:

cd /home/spark-2.4.7-bin-hadoop2.7/sbin
./start-all.sh
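
  • Web UI: http://192.168.31.66:8080/

Configure PySpark:
Copy the pyspark package bundled with Spark into your Python environment's site-packages directory (the path below is the author's conda environment; adjust it to your own):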

cd /home/spark-2.4.7-bin-hadoop2.7/python/
cp -rf pyspark /home/anaconda3/envs/tf12/lib/python3.6/site-packages/

Install the required third-party package. Spark 2.4.7 bundles py4j 0.10.7, so pin the same version:

pip install py4j==0.10.7
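
To confirm that the copy and the pip install landed in the interpreter you actually use, you can ask Python whether it can locate both packages without importing them. A small sketch, with missing_modules as a made-up helper name:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means both packages are visible to this interpreter.
print(missing_modules(["pyspark", "py4j"]))
```

Run it with the same Python that belongs to the target environment; a non-empty list usually means the files went into a different environment's site-packages.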

Done. Contact (QQ): 1982248707.
