1 Creating a Hadoop User
1.1 Create the new user
Create a user named hadoop with /bin/bash as its login shell:
$ sudo useradd -m hadoop -s /bin/bash
1.2 Set a password
$ sudo passwd hadoop
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
1.3 Grant the hadoop user administrator privileges
$ sudo adduser hadoop sudo
Adding user `hadoop' to group `sudo' ...
Adding user hadoop to group sudo
Done.
2 Installing the Java Environment
2.1 Install
$ sudo apt-get install default-jre default-jdk
2.2 Configure the environment variable
$ vim ~/.bashrc
Append the following line to the end of the file: export JAVA_HOME=/usr/lib/jvm/default-java
Then make the variable take effect:
$ source ~/.bashrc
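The effect of the export line can be checked without touching the real ~/.bashrc by using a throwaway rc file (the temporary file below is purely illustrative):

```shell
# Write the same export line to a temporary rc file, then load it into
# the current shell the way `source ~/.bashrc` does.
RC=$(mktemp)
echo 'export JAVA_HOME=/usr/lib/jvm/default-java' > "$RC"
. "$RC"            # `.` is the portable spelling of `source`
echo "$JAVA_HOME"  # the variable is now visible in this shell
# prints: /usr/lib/jvm/default-java
```

Without the `source` (or a fresh login shell), the edit to ~/.bashrc would not affect the current session.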
2.3 Verify that Java installed successfully
$ echo $JAVA_HOME
/usr/lib/jvm/default-java
$ java -version
openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
3 Setting Up SSH
SSH (Secure Shell) consists of a client and a server. The server is a daemon that runs in the background and responds to requests from clients; the client side includes programs such as scp (remote copy), sftp (secure file transfer), and slogin (remote login).
Ubuntu installs the SSH client by default, but the SSH server still needs to be installed.
[Note]: Hadoop's control scripts do not prompt for SSH passwords, so every machine must be configured for passwordless SSH login.
3.1 Install the SSH server
$ sudo apt-get install openssh-server
3.2 Log in to localhost
$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:MCT7ubGt3sPlkvS9v//KhAoa7vBO+EVPJN/JXenC8XM.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
hadoop@localhost's password:
Welcome to Ubuntu 16.04.3 LTS (GNU/Linux 4.10.0-42-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
243 packages can be updated.
11 updates are security updates.
之后會在~/文件夾下發(fā)現(xiàn)一個.ssh文件
3.3 設(shè)置為無密碼登錄
$ cd ~/.ssh/
$ ssh-keygen -t rsa #出現(xiàn)提示直接按enter
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:FaavA0T6j8XH0clbVu0pq5hkad7kADUBibL/76I2U00 hadoop@ubuntu
The key's randomart image is:
+---[RSA 2048]----+
| o.o.+ o|
| . + . = + . ..|
| + . o + + o..|
| . o o E . = ..|
| . o S = . o |
| . * X . . |
| + O B . |
| + o = + |
| ..+ +o |
+----[SHA256]-----+
$ cat ./id_rsa.pub >> ./authorized_keys # authorize the key
After this, $ ssh localhost logs in without a password.
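The key-generation and authorization steps can be combined into one script. The sketch below runs against a throwaway directory so it is safe to execute anywhere; for the real setup the target would be ~/.ssh (which itself needs mode 700):

```shell
# Passwordless-login setup in one pass, against a temporary directory.
DEMO=$(mktemp -d)
ssh-keygen -q -t rsa -N '' -f "$DEMO/id_rsa"       # empty passphrase, no prompts
cat "$DEMO/id_rsa.pub" >> "$DEMO/authorized_keys"  # authorize our own public key
chmod 600 "$DEMO/authorized_keys"                  # sshd rejects looser permissions
ls "$DEMO"                                         # authorized_keys  id_rsa  id_rsa.pub
```

The chmod matters: sshd silently ignores an authorized_keys file that is writable by other users.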
4 Installing Hadoop
Hadoop can be installed in three modes:
(1) Standalone mode: runs on a single machine and stores data in the local file system; the distributed file system HDFS is not used.
(2) Pseudo-distributed mode: storage uses HDFS, but the name node and data node run on the same machine.
(3) Distributed mode: storage uses HDFS, and the name node and data nodes run on different machines.
Hadoop downloads: http://mirrors.cnnic.cn/apache/hadoop/common
4.1 Standalone-mode configuration
Download the release tarball and simply unpack it:
$ sudo tar -zxvf hadoop-2.7.1.tar.gz -C /usr/local
$ cd /usr/local/
$ sudo mv ./hadoop-2.7.1/ ./hadoop # rename the directory to hadoop
$ sudo chown -R hadoop ./hadoop # change the owner to the hadoop user
Check the Hadoop version:
$ cd /usr/local/hadoop/bin
$ ./hadoop version
Hadoop 2.7.1
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a
Compiled by jenkins on 2015-06-29T06:04Z
Compiled with protoc 2.5.0
From source with checksum fc0a1a23fc1868e4d5ee7fa2b28a58a
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.1.jar
Hadoop ships with many example programs; run the following command to list them:
$ ./hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
Now run the grep example:
$ cd /usr/local/hadoop
$ mkdir input
$ cp ./etc/hadoop/*.xml ./input # copy the configuration files into input
$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep ./input ./output 'dfs[a-z.]+'
$ cat ./output/* # view the result
1 dfsadmin
On success, the grep example takes the input directory as its input, extracts every token matching the regular expression dfs[a-z.]+, and writes the per-token counts to /usr/local/hadoop/output.
[Note]: Running the command a second time fails, because Hadoop by default refuses to overwrite an existing output directory; delete the output directory before re-running.
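The regular expression dfs[a-z.]+ can be tried on its own with plain grep; the two sample lines below are made up for illustration:

```shell
# Extract every token matching dfs[a-z.]+ from two sample lines --
# the same pattern the Hadoop grep example searches for.
printf '%s\n' '<name>dfs.replication</name>' 'run dfsadmin now' \
  | grep -oE 'dfs[a-z.]+'
# prints:
# dfs.replication
# dfsadmin
```

Note that the dot inside the character class matches a literal dot, which is why property names like dfs.replication are captured whole.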
4.2 Pseudo-distributed configuration
Run Hadoop in pseudo-distributed mode on a single node (one machine).
4.2.1 Edit the configuration files
Edit core-site.xml and hdfs-site.xml under /usr/local/hadoop/etc/hadoop/.
In core-site.xml, change
<configuration>
</configuration>
to:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
- hadoop.tmp.dir holds temporary files. If it is not set, the default temporary directory is /tmp/hadoop-hadoop, which the system may wipe on reboot.
- fs.defaultFS specifies the HDFS access address.
Edit hdfs-site.xml as follows:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/data</value>
</property>
</configuration>
- dfs.replication: the number of replicas. A distributed file system normally stores redundant copies of each block for reliability, but in pseudo-distributed mode there is only one node, so only one replica is kept.
- dfs.namenode.name.dir: the directory where the name node stores its metadata.
- dfs.datanode.data.dir: the directory where the data node stores its data.
Here, both the name-node and data-node directories must be set.
[Note]: Hadoop's run mode is determined by its configuration files; to switch from pseudo-distributed mode back to standalone mode, simply remove the properties from core-site.xml.
4.2.2 Format the name node
Run the following commands:
$ cd /usr/local/hadoop
$ ./bin/hdfs namenode -format
[Error]: output containing "Exiting with status 1" means the format failed:
19/01/11 18:38:02 ERROR namenode.NameNode: Failed to start namenode.
java.lang.IllegalArgumentException: URI has an authority component
at java.io.File.<init>(File.java:423)
at org.apache.hadoop.hdfs.server.namenode.NNStorage.getStorageDirectory(NNStorage.java:329)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:276)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournalsForWrite(FSEditLog.java:247)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:985)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1429)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554)
19/01/11 18:38:02 INFO util.ExitUtil: Exiting with status 1
19/01/11 18:38:02 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
[Fix]: check the configuration in hdfs-site.xml.
If the output contains "/usr/local/hadoop/tmp/dfs/name has been successfully formatted." and "Exiting with status 0", the format succeeded:
19/01/11 18:46:35 INFO common.Storage: Storage directory /usr/local/hadoop/tmp/dfs/name has been successfully formatted.
19/01/11 18:46:36 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
19/01/11 18:46:36 INFO util.ExitUtil: Exiting with status 0
4.2.3 Start Hadoop
$ cd /usr/local/hadoop
$ ./sbin/start-dfs.sh
[Error]:
Starting namenodes on [localhost]
localhost: Error: JAVA_HOME is not set and could not be found.
localhost: Error: JAVA_HOME is not set and could not be found.
Starting secondary namenodes [0.0.0.0]
[Fix]:
$ echo $JAVA_HOME
/usr/lib/jvm/default-java
The check shows that JAVA_HOME is already set, so the remaining option is to hard-code the absolute path in /usr/local/hadoop/etc/hadoop/hadoop-env.sh. Change export JAVA_HOME=${JAVA_HOME} to
export JAVA_HOME=/usr/lib/jvm/default-java
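The edit to hadoop-env.sh can also be done with sed. The sketch below works on a stand-in copy in /tmp to show the substitution; on a real install the target would be /usr/local/hadoop/etc/hadoop/hadoop-env.sh:

```shell
# Replace whatever JAVA_HOME line is present with the absolute path.
ENV=/tmp/hadoop-env-demo.sh
echo 'export JAVA_HOME=${JAVA_HOME}' > "$ENV"   # stand-in for the shipped line
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/default-java|' "$ENV"
cat "$ENV"
# prints: export JAVA_HOME=/usr/lib/jvm/default-java
```

Using | as the sed delimiter avoids having to escape the slashes in the path.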
Use the jps command to check whether Hadoop started; if the DataNode, NameNode, and SecondaryNameNode processes all appear, the startup succeeded.
$ jps
4821 Jps
4459 DataNode
4348 NameNode
4622 SecondaryNameNode
If problems persist, run the following sequence:
$ ./sbin/stop-dfs.sh # stop Hadoop
$ rm -r ./tmp # delete the tmp directory; note: this erases all existing HDFS data
$ ./bin/hdfs namenode -format # re-format the name node
$ ./sbin/start-dfs.sh # restart
4.2.4 Inspect HDFS in a browser
Open the following URL in a browser to view the HDFS status page:
http://localhost:50070/dfshealth.html#tab-overview
(Screenshot of the web UI omitted.)
4.2.5 Run a pseudo-distributed example
$ cd /usr/local/hadoop
$ ./bin/hdfs dfs -mkdir -p /user/hadoop # create the user directory in HDFS
$ ./bin/hdfs dfs -mkdir input # create the hadoop user's input directory in HDFS
$ ./bin/hdfs dfs -put ./etc/hadoop/*.xml input # copy local files into HDFS
$ ./bin/hdfs dfs -ls input # list the files
Found 8 items
-rw-r--r-- 1 hadoop supergroup 4436 2019-01-11 19:35 input/capacity-scheduler.xml
-rw-r--r-- 1 hadoop supergroup 1075 2019-01-11 19:35 input/core-site.xml
-rw-r--r-- 1 hadoop supergroup 9683 2019-01-11 19:35 input/hadoop-policy.xml
-rw-r--r-- 1 hadoop supergroup 1130 2019-01-11 19:35 input/hdfs-site.xml
-rw-r--r-- 1 hadoop supergroup 620 2019-01-11 19:35 input/httpfs-site.xml
-rw-r--r-- 1 hadoop supergroup 3518 2019-01-11 19:35 input/kms-acls.xml
-rw-r--r-- 1 hadoop supergroup 5511 2019-01-11 19:35 input/kms-site.xml
-rw-r--r-- 1 hadoop supergroup 690 2019-01-11 19:35 input/yarn-site.xml
$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'
....
$ ./bin/hdfs dfs -cat output/* # view the result
1 dfsadmin
1 dfs.replication
1 dfs.namenode.name.dir
1 dfs.datanode.data.dir
Delete the output directory before running the job again:
$ ./bin/hdfs dfs -rm -r output # delete the output directory
4.2.6 Stop Hadoop
Run:
$ ./sbin/stop-dfs.sh
The next time Hadoop is started, do not re-run the name-node format command (doing so causes errors); just run start-dfs.sh directly.
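Since formatting must happen only once, a small guard helps avoid accidental re-formats: the name node is already formatted when its metadata directory contains a current/ subdirectory. In this sketch the format command is only echoed, and NAME_DIR can be overridden for testing (the real default is /usr/local/hadoop/tmp/dfs/name, per dfs.namenode.name.dir above):

```shell
# Only (pretend to) format when the metadata directory is still empty.
NAME_DIR=${NAME_DIR:-/usr/local/hadoop/tmp/dfs/name}
if [ -d "$NAME_DIR/current" ]; then
  echo "already formatted, skipping"
else
  echo "would run: ./bin/hdfs namenode -format"
fi
```

To use it for real, replace the echo in the else branch with the actual ./bin/hdfs namenode -format call.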
5 Summary
Hadoop installation steps:
1 Create a Hadoop user
2 Install the Java environment
3 Set up SSH
4 Edit the configuration files core-site.xml and hdfs-site.xml under /usr/local/hadoop/etc/hadoop/
5 Useful commands
$ cd /usr/local/hadoop
$ ./bin/hdfs namenode -format # format the name node (needed only once)
$ ./sbin/start-dfs.sh # start Hadoop
$ jps # check whether Hadoop started successfully
$ ./sbin/stop-dfs.sh # stop Hadoop
$ rm -r ./tmp # delete the tmp directory; note: this erases all existing HDFS data
$ ./sbin/start-dfs.sh # restart