With the preparation from the previous sections complete, we can now set up a fully distributed Hadoop cluster with Kerberos and SASL.
1. Cluster Environment Preparation
We have three servers, listed below:
| hostname | ip | roles |
|---|---|---|
| master | 10.16.195.254 | NameNode, SecondaryNameNode, DataNode, ResourceManager, NodeManager |
| slave1 | 10.16.196.1 | DataNode, NodeManager |
| slave2 | 10.16.196.5 | DataNode, NodeManager |
1.1 Edit the hosts file
Run the following command on each machine to get its hostname:
$ hostname
Add each machine's hostname and IP address to the /etc/hosts file on every machine. The final /etc/hosts file on all machines should look like this:
127.0.0.1 localhost.localdomain localhost
127.0.0.1 localhost4.localdomain4 localhost4
::1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
10.16.195.254 master
10.16.196.1 slave1
10.16.196.5 slave2
If, from any machine, you can ping the other hosts by hostname and reach the corresponding IP address, the configuration is correct.
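For example, a quick check from any node:
$ ping -c 3 master
$ ping -c 3 slave1
$ ping -c 3 slave2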
1.2 Configure the JDK environment
- Install JDK 1.8 via yum with the following command:
$ yum install java-1.8.0-openjdk*
- Locate the Java installation directory:
$ whereis java
java: /usr/bin/java /usr/lib/java /etc/java /usr/share/java /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-1.b15.el7_2.x86_64/bin/java /usr/share/man/man1/java.1.gz
$ ls -l /usr/lib/jvm/
total 0
lrwxrwxrwx 1 root root 26 Dec 20 2018 java -> /etc/alternatives/java_sdk
lrwxrwxrwx 1 root root 32 Dec 20 2018 java-1.8.0 -> /etc/alternatives/java_sdk_1.8.0
lrwxrwxrwx 1 root root 40 Dec 20 2018 java-1.8.0-openjdk -> /etc/alternatives/java_sdk_1.8.0_openjdk
drwxr-xr-x 9 root root 101 Dec 20 2018 java-1.8.0-openjdk-1.8.0.111-1.b15.el7_2.x86_64
drwxr-xr-x 9 root root 101 Dec 20 2018 java-1.8.0-openjdk-1.8.0.111-1.b15.el7_2.x86_64-debug
lrwxrwxrwx 1 root root 34 Dec 20 2018 java-openjdk -> /etc/alternatives/java_sdk_openjdk
lrwxrwxrwx 1 root root 21 Dec 20 2018 jre -> /etc/alternatives/jre
lrwxrwxrwx 1 root root 27 Dec 20 2018 jre-1.8.0 -> /etc/alternatives/jre_1.8.0
lrwxrwxrwx 1 root root 35 Dec 20 2018 jre-1.8.0-openjdk -> /etc/alternatives/jre_1.8.0_openjdk
lrwxrwxrwx 1 root root 51 Dec 20 2018 jre-1.8.0-openjdk-1.8.0.111-1.b15.el7_2.x86_64 -> java-1.8.0-openjdk-1.8.0.111-1.b15.el7_2.x86_64/jre
lrwxrwxrwx 1 root root 57 Dec 20 2018 jre-1.8.0-openjdk-1.8.0.111-1.b15.el7_2.x86_64-debug -> java-1.8.0-openjdk-1.8.0.111-1.b15.el7_2.x86_64-debug/jre
lrwxrwxrwx 1 root root 29 Dec 20 2018 jre-openjdk -> /etc/alternatives/jre_openjdk
- Configure JAVA_HOME and related environment variables
Open /etc/profile and add the following lines:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
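Reload the profile and confirm that the JDK is picked up, for example:
$ source /etc/profile
$ java -version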
1.3 Configure passwordless SSH login
Generate an SSH key pair on each machine with the following command:
$ ssh-keygen -t rsa
The generated public key is in ~/.ssh/id_rsa.pub. Append the public keys of all machines to the ~/.ssh/authorized_keys file on every machine, and set the permissions of authorized_keys to 600. An example authorized_keys file looks like this:
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQD2pELeQD25P/Mu+eKmfwAN7hyrixm243YYiPLn4goFe8q/uI9cUKivYNg14bGCavta8fVE90x4WJysXEjMA7SWk5Ic3jS6gEoFhXQ1F0FISpv0eAamikWHASgQNrqY3KGaEm1dxR8lV3/lc0TWjv9QEO3wCw8zj7l4r8LQL0wIaEZ8NB8ElSRx3yFHl6FZE2XEiu/+j61q9U612WMNXqgvTMS8Z5zDujuSgO4mVSOVTyfkE5baIbeZGGKjdNT/4400KBa5k0Qs+VGBaEZs5FxtsmXqBdG/r6Aef7yZivFPNz0mXqFknp5OAafpe/cfPr3weqmCePbUBVOnDIAQzEfj master
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC5kUfv1h9fuWp/3xqEqlDcmrz0Bk2n0+/LLBeShtLpFn+/krF4az6BN5CAFCY5NBgebhfw/9AQSUmyrr9aUXkpi7664QweJsJAne4mxi9/lKkQi+2liV2mBVNly1ax8+tf6P3OKgSSiD+XSVzlr5StIQE9M/Cr67lELHjhV/rvY2ALEQXbZH666SWLL+KPkshLvtpRVqFQKUFPvn2cXBr+YShCBm7DasZcDAGg4XqlxCLaeyI4N+zsrrr/52cGHT/0yJKK42zJyZ2pyVN51rGDwQh0T+6AMEp2YJUo/o+2P9hD/HZTepmnCBef/UyUR6u0xgvBPK/QYvcgziFr/85P slave1
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCvUd0rjjGVz2umcWRMt3YHzxQBwIGdNo7QdXZcnILuTPqQ4PsIUTe+ULYrHcHlj+l6Z7XBO5ABd2BKks0Z8PR1eQyjY8yKv+P0LCe/fGKppsXzHvluexEe14aE95yI1aPguxAAqrLZ/NLhoQjoal2RvrGv6d/wLBPOdWx8DO2s2zbI5AuTawOyolSyOcSE5Mrgg3ahiYSs1OcopU8/pex3rOolfZVNbyyOjipL/QXdkcLLXQ0rpD41DzJzzgkNPmaG41rdcqjzFqLpE5O1qdFetfwcg1ZBniR3EdajGyd7jcccqXg2fWC/7+UarC4Dd7Yl9sup7zkExw/QhPiMY8fh slave2
Once every machine is configured, you should be able to ssh into any other machine without a password.
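One way to distribute the keys and confirm passwordless login from the master node (ssh-copy-id appends the local public key to the remote authorized_keys file and fixes its permissions; assuming root is the login user on all three machines):
$ ssh-copy-id root@slave1
$ ssh-copy-id root@slave2
$ ssh slave1 hostname
$ ssh slave2 hostname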
2. Configure Hadoop
2.1 Download the Hadoop distribution
On the Hadoop download page (http://hadoop.apache.org/releases.html), choose the 2.8.5 binary release and download it to the /data directory on the master node.
After the download completes, scp the Hadoop tar.gz package to the same directory on the slave nodes, and extract it on all machines.
$ wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz
$ scp hadoop-2.8.5.tar.gz root@slave1:/data
$ scp hadoop-2.8.5.tar.gz root@slave2:/data
$ tar -xvf hadoop-2.8.5.tar.gz
2.2 Configure the Hadoop environment variables
Add the following lines to /etc/profile on all machines:
export HADOOP_HOME=/data/hadoop-2.8.5
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
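As before, reload the profile and check that the Hadoop commands are on the PATH:
$ source /etc/profile
$ hadoop version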
2.3 Configure the Kerberos principals
Hadoop typically uses three Kerberos principals: hdfs, yarn, and HTTP. Add them with the following commands:
$ kadmin.local -q "addprinc -randkey hdfs/master@HADOOP.COM"
$ kadmin.local -q "addprinc -randkey yarn/master@HADOOP.COM"
$ kadmin.local -q "addprinc -randkey HTTP/master@HADOOP.COM"
Generate a keytab file for each principal:
$ kadmin.local -q "xst -k hdfs.keytab hdfs/master@HADOOP.COM"
$ kadmin.local -q "xst -k yarn.keytab yarn/master@HADOOP.COM"
$ kadmin.local -q "xst -k HTTP.keytab HTTP/master@HADOOP.COM"
Merge the three keytab files into one:
$ ktutil
ktutil: rkt hdfs.keytab
ktutil: rkt yarn.keytab
ktutil: rkt HTTP.keytab
ktutil: wkt hadoop.keytab
ktutil: q
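You can check that the merged keytab contains all three principals with klist:
$ klist -k hadoop.keytab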
2.4 Distribute the keytab file and log in
Move this file into the etc/hadoop directory under the Hadoop installation, then scp it to the same directory on the slave machines:
$ mv hadoop.keytab /data/hadoop-2.8.5/etc/hadoop/
$ scp /data/hadoop-2.8.5/etc/hadoop/hadoop.keytab root@slave1:/data/hadoop-2.8.5/etc/hadoop
$ scp /data/hadoop-2.8.5/etc/hadoop/hadoop.keytab root@slave2:/data/hadoop-2.8.5/etc/hadoop
Configure crontab to run kinit once a day:
$ crontab -l
0 0 * * * kinit -k -t /data/hadoop-2.8.5/etc/hadoop/hadoop.keytab hdfs/master@HADOOP.COM
0 0 * * * kinit -k -t /data/hadoop-2.8.5/etc/hadoop/hadoop.keytab yarn/master@HADOOP.COM
0 0 * * * kinit -k -t /data/hadoop-2.8.5/etc/hadoop/hadoop.keytab HTTP/master@HADOOP.COM
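The cron entries only take effect at midnight, so obtain an initial ticket manually and verify it, for example:
$ kinit -k -t /data/hadoop-2.8.5/etc/hadoop/hadoop.keytab hdfs/master@HADOOP.COM
$ klist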
2.5 Edit the Hadoop configuration files
The configuration files are located in /data/hadoop-2.8.5/etc/hadoop.
- slaves
master
slave1
slave2
- core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value>
</property>
<property>
<name>hadoop.security.authorization</name>
<value>true</value>
</property>
<property>
<name>fs.permissions.umask-mode</name>
<value>027</value>
</property>
</configuration>
- mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
- yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.keytab</name>
<value>/data/hadoop-2.8.5/etc/hadoop/hadoop.keytab</value>
</property>
<property>
<name>yarn.resourcemanager.principal</name>
<value>yarn/master@HADOOP.COM</value>
</property>
<property>
<name>yarn.nodemanager.keytab</name>
<value>/data/hadoop-2.8.5/etc/hadoop/hadoop.keytab</value>
</property>
<property>
<name>yarn.nodemanager.principal</name>
<value>yarn/master@HADOOP.COM</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>16384</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>16384</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
</configuration>
- hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>10.16.195.254:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/data/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/data/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.datanode.max.xcievers</name>
<value>4096</value>
<description>Maximum number of files a DataNode can serve concurrently.</description>
</property>
<property>
<name>dfs.block.access.token.enable</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.keytab.file</name>
<value>/data/hadoop-2.8.5/etc/hadoop/hadoop.keytab</value>
</property>
<property>
<name>dfs.namenode.kerberos.principal</name>
<value>hdfs/master@HADOOP.COM</value>
</property>
<property>
<name>dfs.namenode.kerberos.https.principal</name>
<value>HTTP/master@HADOOP.COM</value>
</property>
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:1034</value>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:1036</value>
</property>
<property>
<name>dfs.datanode.keytab.file</name>
<value>/data/hadoop-2.8.5/etc/hadoop/hadoop.keytab</value>
</property>
<property>
<name>dfs.datanode.kerberos.principal</name>
<value>hdfs/master@HADOOP.COM</value>
</property>
<property>
<name>dfs.datanode.kerberos.https.principal</name>
<value>HTTP/master@HADOOP.COM</value>
</property>
<!-- DataNode SASL configuration -->
<property>
<name>dfs.http.policy</name>
<value>HTTPS_ONLY</value>
</property>
<property>
<name>dfs.data.transfer.protection</name>
<value>integrity</value>
</property>
<!-- JournalNode configuration -->
<property>
<name>dfs.journalnode.keytab.file</name>
<value>/data/hadoop-2.8.5/etc/hadoop/hadoop.keytab</value>
</property>
<property>
<name>dfs.journalnode.kerberos.principal</name>
<value>hdfs/master@HADOOP.COM</value>
</property>
<property>
<name>dfs.journalnode.kerberos.internal.spnego.principal</name>
<value>HTTP/master@HADOOP.COM</value>
</property>
<!-- WebHDFS -->
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.web.authentication.kerberos.principal</name>
<value>HTTP/master@HADOOP.COM</value>
</property>
<property>
<name>dfs.web.authentication.kerberos.keytab</name>
<value>/data/hadoop-2.8.5/etc/hadoop/hadoop.keytab</value>
</property>
<property>
<name>dfs.datanode.data.dir.perm</name>
<value>700</value>
</property>
<property>
<name>dfs.nfs.kerberos.principal</name>
<value>hdfs/master@HADOOP.COM</value>
</property>
<property>
<name>dfs.nfs.keytab.file</name>
<value>/data/hadoop-2.8.5/etc/hadoop/hadoop.keytab</value>
</property>
<property>
<name>dfs.secondary.https.address</name>
<value>10.16.195.254:50495</value>
</property>
<property>
<name>dfs.secondary.https.port</name>
<value>50495</value>
</property>
<property>
<name>dfs.secondary.namenode.keytab.file</name>
<value>/data/hadoop-2.8.5/etc/hadoop/hadoop.keytab</value>
</property>
<property>
<name>dfs.secondary.namenode.kerberos.principal</name>
<value>hdfs/master@HADOOP.COM</value>
</property>
<property>
<name>dfs.secondary.namenode.kerberos.https.principal</name>
<value>HTTP/master@HADOOP.COM</value>
</property>
</configuration>
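Note that dfs.http.policy=HTTPS_ONLY also requires a keystore for the HTTPS endpoints, configured in etc/hadoop/ssl-server.xml. The following is only a minimal sketch, assuming a self-signed keystore created with keytool at /data/hadoop/keystore.jks and the illustrative password 123456; adjust paths and passwords for your environment:
$ keytool -genkeypair -alias hadoop -keyalg RSA -keysize 2048 -validity 3650 -keystore /data/hadoop/keystore.jks
- ssl-server.xml (illustrative)
<configuration>
<property>
<name>ssl.server.keystore.location</name>
<value>/data/hadoop/keystore.jks</value>
</property>
<property>
<name>ssl.server.keystore.password</name>
<value>123456</value>
</property>
<property>
<name>ssl.server.keystore.keypassword</name>
<value>123456</value>
</property>
<property>
<name>ssl.server.truststore.location</name>
<value>/data/hadoop/keystore.jks</value>
</property>
<property>
<name>ssl.server.truststore.password</name>
<value>123456</value>
</property>
</configuration>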
- Distribute the Hadoop configuration files to the other machines:
$ scp /data/hadoop-2.8.5/etc/hadoop/* root@slave1:/data/hadoop-2.8.5/etc/hadoop
$ scp /data/hadoop-2.8.5/etc/hadoop/* root@slave2:/data/hadoop-2.8.5/etc/hadoop
2.6 Format the NameNode
$ hdfs namenode -format
19/06/30 22:23:45 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: user = root
STARTUP_MSG: host = master/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.8.5
...
19/06/30 22:23:46 INFO util.ExitUtil: Exiting with status 0
19/06/30 22:23:46 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/127.0.0.1
3. Start the Hadoop Cluster
3.1 Start the HDFS cluster
$ start-dfs.sh
$ jps
19282 DataNode
28324 Jps
19480 SecondaryNameNode
18943 NameNode
Visit the NameNode UI at https://10.16.195.254:50470/.
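With Kerberos enabled, HDFS commands require the valid ticket obtained earlier with kinit. A quick sanity check that the NameNode responds and all three DataNodes have registered, for example:
$ hdfs dfs -ls /
$ hdfs dfsadmin -report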
3.2 Start the YARN cluster
$ start-yarn.sh
$ jps
21088 NodeManager
19282 DataNode
28324 Jps
19480 SecondaryNameNode
18943 NameNode
20959 ResourceManager
Visit the YARN UI at http://10.16.195.254:8088/.
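To confirm that YARN can actually run jobs under Kerberos, submit one of the example jobs bundled with the distribution; a sketch, assuming the standard example jar location in the Hadoop 2.8.5 binary package:
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar pi 2 10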
At this point, the fully distributed Hadoop cluster setup is complete.