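Getting Started
1. Introduction
Quickstart will get you up and running on a single-node, standalone instance of HBase.
2. Quick Start - Standalone HBase
This section describes the setup of a single-node standalone HBase. A standalone instance has all HBase daemons (the Master, RegionServers, and ZooKeeper) running in a single JVM persisting to the local filesystem. It is our most basic deploy profile. We will show you how to create a table in HBase using the hbase shell CLI, insert rows into the table, perform put and scan operations against the table, enable or disable the table, and start and stop HBase.
Apart from downloading HBase, this procedure should take less than 10 minutes.
Prior to HBase 0.94.x, HBase expected the loopback IP address to be 127.0.0.1. Ubuntu and some other distributions default to 127.0.1.1, and this will cause problems for you. See "Why does HBase care about /etc/hosts?" for details.
The following /etc/hosts file works correctly for HBase 0.94.x and earlier on Ubuntu. Use it as a template if you run into trouble.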
127.0.0.1 localhost
127.0.0.1 ubuntu.ubuntu-domain ubuntu
This issue has been fixed in hbase-0.96.0 and beyond.
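The loopback note above can be checked mechanically. The following is a minimal sketch, not part of HBase: has_ubuntu_loopback is a hypothetical helper name, and it only reports whether a hosts file still maps a name to 127.0.1.1.

```shell
# Hypothetical helper (not shipped with HBase): succeed if the given hosts
# file contains an entry whose address field is exactly 127.0.1.1.
# Commented-out lines are ignored because their first field starts with '#'.
has_ubuntu_loopback() {
    awk '$1 == "127.0.1.1" { found = 1 } END { exit found ? 0 : 1 }' "$1"
}

# Example use on a real system (prints a hint only when the entry exists):
#   has_ubuntu_loopback /etc/hosts && echo "consider mapping the hostname to 127.0.0.1"
```

The helper only reads the file. Editing /etc/hosts is left to you, because it normally requires root privileges.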
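2.1. JDK Version Requirements
HBase requires that a JDK be installed. See the Java section for information about the JDK versions supported by each HBase release.
2.2. Get Started with HBase
Procedure: Download, Configure, and Start HBase in Standalone Mode
1. Choose a download site from the list of Apache Download Mirrors. Click on the suggested top link. This will take you to a mirror of HBase Releases. Click on the folder named stable and then download the binary file that ends in .tar.gz to your local filesystem. Do not download the file ending in src.tar.gz for now.
2. Extract the downloaded file, and change to the newly created directory.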
$ tar xzvf hbase-2.0.0-SNAPSHOT-bin.tar.gz
$ cd hbase-2.0.0-SNAPSHOT/
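3. You are required to set the JAVA_HOME environment variable before starting HBase. You can set the variable via your operating system's usual mechanism, but HBase provides a central mechanism, conf/hbase-env.sh. Edit this file, uncomment the line starting with JAVA_HOME, and set it to the appropriate location for your operating system. The JAVA_HOME variable should be set to a directory which contains the executable file bin/java. Most modern Linux operating systems provide a mechanism, such as /usr/bin/alternatives on RHEL or CentOS, for transparently switching between versions of executables such as Java. In this case, you can set JAVA_HOME to the directory containing the symbolic link to bin/java, which is usually /usr.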
JAVA_HOME=/usr
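Before editing conf/hbase-env.sh it can help to confirm that the directory you plan to use really contains bin/java. This is a small sketch under that assumption. valid_java_home is a hypothetical name, not an HBase script.

```shell
# Hypothetical helper: succeed only if the candidate JAVA_HOME directory
# contains an executable bin/java, as the step above requires.
valid_java_home() {
    [ -n "$1" ] && [ -x "$1/bin/java" ]
}

# Example use:
#   valid_java_home /usr && echo "JAVA_HOME=/usr looks usable"
```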
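4. Edit conf/hbase-site.xml, which is the main HBase configuration file. At this time, you only need to specify the directory on the local filesystem where HBase and ZooKeeper write data. By default, a new directory is created under /tmp. Many servers are configured to delete the contents of /tmp upon reboot, so you should store the data elsewhere. The following configuration will store HBase's data in the hbase directory, in the home directory of the user called testuser. Paste the <property> tags beneath the <configuration> tags, which should be empty in a new HBase install.
Example 1. Example hbase-site.xml for Standalone HBase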
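<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/testuser/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/testuser/zookeeper</value>
  </property>
</configuration>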
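You do not need to create the HBase data directory. HBase will do this for you. If you create the directory, HBase will attempt to do a migration, which is not what you want.
The hbase.rootdir in the above example points to a directory in the local filesystem. The 'file:/' prefix is how we denote the local filesystem. To home HBase on an existing instance of HDFS, set hbase.rootdir to point at a directory on your instance, for example hdfs://namenode.example.org:8020/hbase. For more on this variant, see the section below on Standalone HBase over HDFS.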
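As an illustration of the configuration step, the sketch below writes a standalone hbase-site.xml with the two properties discussed above. write_standalone_site is a hypothetical helper, and the sketch assumes both data directories live under one home directory, as in the testuser example.

```shell
# Hypothetical helper: write a minimal standalone hbase-site.xml.
#   $1 = home directory that will hold the hbase/ and zookeeper/ data dirs
#   $2 = output file, e.g. conf/hbase-site.xml
write_standalone_site() {
    data_home=$1
    out=$2
    cat > "$out" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file://$data_home/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>$data_home/zookeeper</value>
  </property>
</configuration>
EOF
}

# Example use:
#   write_standalone_site /home/testuser conf/hbase-site.xml
```

The helper overwrites the target file, so back up any existing conf/hbase-site.xml first. It deliberately does not create the data directories, since HBase creates them itself.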
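5. The bin/start-hbase.sh script is provided as a convenient way to start HBase. Issue the command, and if all goes well, a message is logged to standard output showing that HBase started successfully. You can use the jps command to verify that you have one running process called HMaster. In standalone mode HBase runs all daemons within this single JVM: the HMaster, a single HRegionServer, and the ZooKeeper daemon. Go to http://localhost:16010 to view the HBase Web UI.
Java needs to be installed and available. If you get an error indicating that Java is not installed, but it is on your system, perhaps in a non-standard location, edit the conf/hbase-env.sh file and modify the JAVA_HOME setting to point to the directory that contains bin/java on your system.
Procedure: Use HBase For the First Time
1) Connect to HBase.
Connect to your running instance of HBase using the hbase shell command, located in the bin/ directory of your HBase install. In this example, some usage and version information that is printed when you start HBase Shell has been omitted. The HBase Shell prompt ends with a > character.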
$ ./bin/hbase shell
hbase(main):001:0>
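2) Display HBase Shell Help Text.
Type help and press Enter to display some basic usage information for HBase Shell, as well as several example commands. Notice that table names, rows, and columns must all be enclosed in quote characters.
3) Create a table.
Use the create command to create a new table. You must specify the table name and the ColumnFamily name.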
hbase(main):001:0> create 'test', 'cf'
0 row(s) in 0.4170 seconds
=> Hbase::Table - test
4) List information about your table.
Use the list command.
hbase(main):002:0> list 'test'
TABLE
test
1 row(s) in 0.0180 seconds
=> ["test"]
5) Put data into your table.
To put data into your table, use the put command.
hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.0850 seconds
hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0110 seconds
hbase(main):005:0> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0100 seconds
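Here, we insert three values, one at a time. The first insert is at row1, column cf:a, with a value of value1. Columns in HBase are comprised of a column family prefix, cf in this example, followed by a colon and then a column qualifier suffix, a in this case.
6) Scan the table for all data at once.
One of the ways to get data from HBase is to scan. Use the scan command to scan the table for data. You can limit your scan, but for now, all data is fetched.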
hbase(main):006:0> scan 'test'
ROW                                      COLUMN+CELL
 row1                                    column=cf:a, timestamp=1421762485768, value=value1
 row2                                    column=cf:b, timestamp=1421762491785, value=value2
 row3                                    column=cf:c, timestamp=1421762496210, value=value3
3 row(s) in 0.0230 seconds
7) Get a single row of data.
To get a single row of data at a time, use the get command.
hbase(main):007:0> get 'test', 'row1'
COLUMN                                   CELL
 cf:a                                    timestamp=1421762485768, value=value1
1 row(s) in 0.0350 seconds
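8) Disable a table.
If you want to delete a table or change its settings, as well as in some other situations, you need to disable the table first, using the disable command. You can re-enable it using the enable command.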
hbase(main):008:0> disable 'test'
0 row(s) in 1.1820 seconds
hbase(main):009:0> enable 'test'
0 row(s) in 0.1770 seconds
Disable the table again if you tested the enable command above:
hbase(main):010:0> disable 'test'
0 row(s) in 1.1820 seconds
9) Drop the table.
To drop (delete) a table, use the drop command.
hbase(main):011:0> drop 'test'
0 row(s) in 0.1370 seconds
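10) Exit the HBase Shell.
To exit the HBase Shell and disconnect from your cluster, use the exit command. HBase is still running in the background.
Procedure: Stop HBase
In the same way that the bin/start-hbase.sh script is provided to conveniently start all HBase daemons, the bin/stop-hbase.sh script stops them.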
$ ./bin/stop-hbase.sh
stopping hbase....................
$
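After issuing the command, it can take several minutes for the processes to shut down. Use jps to be sure that the HMaster and HRegionServer processes are shut down.
The above has shown you how to start and stop a standalone instance of HBase. In the next sections we give a quick overview of other modes of HBase deployment.
2.3. Pseudo-Distributed Local Install
After working your way through the quickstart in standalone mode, you can re-configure HBase to run in pseudo-distributed mode. Pseudo-distributed mode means that HBase still runs completely on a single host, but each HBase daemon (HMaster, HRegionServer, and ZooKeeper) runs as a separate process. In standalone mode, all daemons ran in one JVM instance. By default, unless you configure the hbase.rootdir property as described in the quickstart, your data is still stored in /tmp/. In this walk-through, we store your data in HDFS instead, assuming you have HDFS available. You can skip the HDFS configuration to continue storing your data in the local filesystem.
Hadoop Configuration
This procedure assumes that you have configured Hadoop and HDFS on your local system or a remote system, and that they are running and available. It also assumes you are using Hadoop 2. The guide on Setting up a Single Node Cluster in the Hadoop documentation is a good starting point.
1) Stop HBase if it is running.
If you have just finished the quickstart and HBase is still running, stop it. This procedure will create a totally new directory where HBase will store its data, so any databases you created before will be lost.
2) Configure HBase.
Edit the hbase-site.xml configuration. First, add the following property, which directs HBase to run in distributed mode, with one JVM instance per daemon.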
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
Next, change hbase.rootdir from the local filesystem to the address of your HDFS instance, using the hdfs://// URI syntax. In this example, HDFS is running on the localhost at port 8020.
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:8020/hbase</value>
</property>
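You do not need to create the directory in HDFS. HBase will do this for you. If you create the directory, HBase will attempt to do a migration, which is not what you want.
3) Start HBase.
Use the bin/start-hbase.sh command to start HBase. If your system is configured correctly, the jps command should show the HMaster and HRegionServer processes running.
4) Check the HBase directory in HDFS.
If everything worked correctly, HBase created its directory in HDFS. In the configuration above, it is stored in /hbase/ on HDFS. You can use the hadoop fs command in Hadoop's bin/ directory to list this directory.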
$ ./bin/hadoop fs -ls /hbase
Found 7 items
drwxr-xr-x   - hbase users          0 2014-06-25 18:58 /hbase/.tmp
drwxr-xr-x   - hbase users          0 2014-06-25 21:49 /hbase/WALs
drwxr-xr-x   - hbase users          0 2014-06-25 18:48 /hbase/corrupt
drwxr-xr-x   - hbase users          0 2014-06-25 18:58 /hbase/data
-rw-r--r--   3 hbase users         42 2014-06-25 18:41 /hbase/hbase.id
-rw-r--r--   3 hbase users          7 2014-06-25 18:41 /hbase/hbase.version
drwxr-xr-x   - hbase users          0 2014-06-25 21:49 /hbase/oldWALs
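5) Create a table and populate it with data.
You can use the HBase Shell to create a table, populate it with data, and scan and get values from it, using the same procedure as in the shell exercises.
6) Start and stop a backup HBase Master (HMaster) server.
Running multiple HMaster instances on the same hardware does not make sense in a production environment, in the same way that running a pseudo-distributed cluster does not make sense for production. This step is offered for testing and learning purposes only.
The HMaster server controls the HBase cluster. You can start up to 9 backup HMaster servers, which makes 10 total HMasters, counting the primary. To start a backup HMaster, use local-master-backup.sh. For each backup master you want to start, add a parameter representing the port offset for that master. Each HMaster uses three ports (16010, 16020, and 16030 by default). The port offset is added to these ports, so using an offset of 2, the backup HMaster would use ports 16012, 16022, and 16032. The following command starts 3 backup servers using ports 16012/16022/16032, 16013/16023/16033, and 16015/16025/16035.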
$ ./bin/local-master-backup.sh 2 3 5
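To kill a backup master without killing the entire cluster, you need to find its process ID (PID). The PID is stored in a file with a name like /tmp/hbase-USER-X-master.pid. The only content of the file is the PID. You can use the kill -9 command to kill that PID. The following command will kill the master with port offset 1, but leave the cluster running: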
$ cat /tmp/hbase-testuser-1-master.pid |xargs kill -9
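The port arithmetic for backup masters can be sketched as two small shell functions. Both names are hypothetical. The base ports and the PID file pattern are the ones quoted in this step, and nothing here talks to a running cluster.

```shell
# Print the three ports a backup HMaster would use for a given offset,
# starting from the default HMaster ports 16010, 16020 and 16030.
backup_master_ports() {
    offset=$1
    echo "$((16010 + offset)) $((16020 + offset)) $((16030 + offset))"
}

# Print the PID file path for a backup master, following the
# /tmp/hbase-USER-X-master.pid pattern ($1 = user, $2 = offset).
backup_master_pidfile() {
    echo "/tmp/hbase-$1-$2-master.pid"
}

# Example use:
#   backup_master_ports 2
#   cat "$(backup_master_pidfile testuser 1)"
```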
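7) Start and stop additional RegionServers.
The HRegionServer manages the data in its StoreFiles as directed by the HMaster. Generally, one HRegionServer runs per node in the cluster. Running multiple HRegionServers on the same system can be useful for testing in pseudo-distributed mode. The local-regionservers.sh command allows you to run multiple RegionServers. It works in a similar way to the local-master-backup.sh command, in that each parameter you provide represents the port offset for an instance. Each RegionServer requires two ports, and the default ports are 16020 and 16030. However, the base ports for additional RegionServers are not the default ports, since the default ports are used by the HMaster, which is also a RegionServer since HBase version 1.0.0. The base ports are 16200 and 16300 instead. You can run 99 additional RegionServers that are not an HMaster or backup HMaster on a server. The following command starts four additional RegionServers, running on sequential ports starting at 16202/16302 (base ports 16200/16300 plus 2).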
$ ./bin/local-regionservers.sh start 2 3 4 5
To stop a RegionServer manually, use the local-regionservers.sh command with the stop parameter and the offset of the server to stop.
$ ./bin/local-regionservers.sh stop 3
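To see which ports a set of additional RegionServers would occupy, the offsets can be added to the base ports 16200 and 16300 mentioned in this step. regionserver_ports is a hypothetical helper for illustration only.

```shell
# Print one "port/port" pair per offset, using the base ports 16200 and
# 16300 that apply to additional RegionServers.
regionserver_ports() {
    for offset in "$@"; do
        echo "$((16200 + offset))/$((16300 + offset))"
    done
}

# Example use, with the same offsets as the start command above:
#   regionserver_ports 2 3 4 5
```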
8) Stop HBase.
You can stop HBase the same way as in the quickstart procedure, using the bin/stop-hbase.sh command.
2.4. Advanced - Fully Distributed
In reality, you need a fully distributed configuration to fully test HBase and to use it in real-world scenarios. In a distributed configuration, the cluster contains multiple nodes, each of which runs one or more HBase daemons. These include primary and backup Master instances, multiple ZooKeeper nodes, and multiple RegionServer nodes.
This advanced quickstart adds two more nodes to your cluster. The architecture will be as follows:
Table 1. Distributed Cluster Demo Architecture
Node Name            Master   ZooKeeper   RegionServer
node-a.example.com   yes      yes         no
node-b.example.com   backup   yes         yes
node-c.example.com   no       yes         yes
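This quickstart assumes that each node is a virtual machine and that they are all on the same network. It builds upon the previous quickstart, Pseudo-Distributed Local Install, assuming that the system you configured in that procedure is now node-a. Stop HBase on node-a before continuing.
Be sure that all the nodes have full access to communicate, and that no firewall rules are in place which could prevent them from talking to each other. If you see any errors like no route to host, check your firewall.
Procedure: Configure Passwordless SSH Access
node-a needs to be able to log into node-b and node-c (and into itself) in order to start the daemons. The easiest way to accomplish this is to use the same username on all hosts, and configure passwordless SSH login from node-a to each of the others.
1) On node-a, generate a key pair.
While logged in as the user who will run HBase, generate an SSH key pair, using the following command: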
$ ssh-keygen -t rsa
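If the command succeeds, the location of the key pair is printed to standard output. The default name of the public key is id_rsa.pub.
2) Create the directory that will hold the shared keys on the other nodes.
On node-b and node-c, log in as the HBase user and create a .ssh/ directory in the user's home directory, if it does not already exist. If it already exists, be aware that it may already contain other keys.
3) Copy the public key to the other nodes.
Securely copy the public key from node-a to each of the nodes, using scp or some other secure means. On each of the other nodes, create a new file called .ssh/authorized_keys if it does not already exist, and append the contents of the id_rsa.pub file to the end of it. Note that you also need to do this for node-a itself.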
$ cat id_rsa.pub >> ~/.ssh/authorized_keys
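4) Test passwordless login.
If you performed the procedure correctly, then when you SSH from node-a to either of the other nodes using the same username, you should not be prompted for a password.
5) Since node-b will run a backup Master, repeat the procedure above, substituting node-b everywhere you see node-a. Be sure not to overwrite your existing .ssh/authorized_keys files. Instead, concatenate the new key onto the existing file using the >> operator rather than the > operator.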
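Appending with >> is safe but adds a duplicate line each time the procedure is repeated. The sketch below appends a public key only when that exact line is not already present. add_authorized_key is a hypothetical helper, and the 700/600 permissions are the conventional OpenSSH values rather than something this guide specifies.

```shell
# Hypothetical helper: append the key in $1 to the authorized_keys file $2
# unless an identical line is already there. Existing keys are never removed.
add_authorized_key() {
    pubkey_file=$1
    auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    chmod 700 "$(dirname "$auth_file")"
    touch "$auth_file"
    chmod 600 "$auth_file"
    # -F fixed strings, -x whole-line match, -q quiet, -f patterns from file
    if ! grep -qxF -f "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
}

# Example use on a node that received id_rsa.pub from node-a:
#   add_authorized_key id_rsa.pub ~/.ssh/authorized_keys
```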
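Procedure: Prepare node-a
node-a will run your primary master and ZooKeeper processes, but no RegionServers. Stop the RegionServer from starting on node-a.
1) Edit conf/regionservers and remove the line which contains localhost. Add lines with the hostnames or IP addresses for node-b and node-c.
Even if you did want to run a RegionServer on node-a, you should refer to it by the hostname the other servers would use to communicate with it. In this case, that would be node-a.example.com. This enables you to distribute the configuration to each node of your cluster without any hostname conflicts. Save the file.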
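The edit to conf/regionservers can also be scripted. The following sketch uses a hypothetical write_regionservers helper: it drops the localhost line, keeps any other existing entries, and adds the hosts given as arguments without duplicates.

```shell
# Hypothetical helper: rewrite a regionservers file ($1) so that it no longer
# lists localhost and contains each remaining or newly given host once.
write_regionservers() {
    file=$1
    shift
    {
        if [ -f "$file" ]; then
            # Keep every existing line except an exact "localhost" entry.
            grep -vx 'localhost' "$file" || true
        fi
        for host in "$@"; do
            echo "$host"
        done
    } | awk 'NF && !seen[$0]++' > "$file.tmp"   # skip blanks, drop duplicates
    mv "$file.tmp" "$file"
}

# Example use from the HBase install directory on node-a:
#   write_regionservers conf/regionservers node-b.example.com node-c.example.com
```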
2) Configure HBase to use node-b as a backup master.
Create a new file in conf/ called backup-masters, and add a new line to it with the hostname for node-b. In this demonstration, the hostname is node-b.example.com.
3) Configure ZooKeeper.
In reality, you should carefully consider your ZooKeeper configuration. You can find out more about configuring ZooKeeper in the zookeeper section. This configuration will direct HBase to start and manage a ZooKeeper instance on each node of the cluster.
On node-a, edit conf/hbase-site.xml and add the following properties.
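<property>
  <name>hbase.zookeeper.quorum</name>
  <value>node-a.example.com,node-b.example.com,node-c.example.com</value>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/usr/local/zookeeper</value>
</property>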
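4) Everywhere in your configuration that you have referred to node-a as localhost, change the reference to point to the hostname that the other nodes will use to refer to node-a. In these examples, the hostname is node-a.example.com.
Procedure: Prepare node-b and node-c
node-b will run a backup master server and a ZooKeeper instance.
1) Download and unpack HBase.
Download and unpack HBase to node-b, just as you did for the standalone and pseudo-distributed quickstarts.
2) Copy the configuration files from node-a to node-b and node-c.
Each node of your cluster needs to have the same configuration information. Copy the contents of the conf/ directory to the conf/ directory on node-b and node-c.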
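Procedure: Start and Test Your Cluster
1) Be sure HBase is not running on any node.
If you forgot to stop HBase from previous testing, you will have errors. Check to see whether HBase is running on any of your nodes by using the jps command. Look for the processes HMaster, HRegionServer, and HQuorumPeer. If they exist, kill them.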
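Picking the HBase processes out of jps output can be sketched with a small filter. hbase_pids is a hypothetical helper. It assumes the two-column "PID Name" layout that the jps examples in this guide show.

```shell
# Read jps-style output ("PID Name" per line) on stdin and print the PIDs
# of any HMaster, HRegionServer or HQuorumPeer processes, one per line.
hbase_pids() {
    awk '$2 == "HMaster" || $2 == "HRegionServer" || $2 == "HQuorumPeer" { print $1 }'
}

# Example use (review the list before killing anything):
#   jps | hbase_pids
```

The filter only prints PIDs. Whether and how to kill them is left as a manual decision.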
2) Start the cluster.
On node-a, issue the start-hbase.sh command. Your output will be similar to that below.
$ bin/start-hbase.sh
node-c.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-c.example.com.out
node-a.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-a.example.com.out
node-b.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-b.example.com.out
starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-node-a.example.com.out
node-c.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-c.example.com.out
node-b.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-b.example.com.out
node-b.example.com: starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-nodeb.example.com.out
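ZooKeeper starts first, followed by the master, then the RegionServers, and finally the backup masters.
3) Verify that the processes are running.
On each node of the cluster, run the jps command and verify that the correct processes are running on each server. You may see additional Java processes running on your servers as well, if they are used for other purposes.
Example 2. node-a jps Output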
$ jps
20355 Jps
20071 HQuorumPeer
20137 HMaster
Example 3. node-b jps Output
$ jps
15930 HRegionServer
16194 Jps
15838 HQuorumPeer
16010 HMaster
Example 4. node-c jps Output
$ jps
13901 Jps
13639 HQuorumPeer
13737 HRegionServer
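ZooKeeper Process Name
The HQuorumPeer process is a ZooKeeper instance which is started and controlled by HBase. If you use ZooKeeper this way, it is limited to one instance per cluster node and is appropriate for testing only. If ZooKeeper is run outside of HBase, the process is called QuorumPeer. For more about ZooKeeper configuration, including using an external ZooKeeper instance with HBase, see the zookeeper section.
4) Browse to the Web UI.
Web UI Port Changes
In HBase newer than 0.98.x, the HTTP port used by the Master Web UI changed from 60010 to 16010, and the port used by each RegionServer Web UI changed from 60030 to 16030.
If everything is set up correctly, you should be able to connect to the Master at http://node-a.example.com:16010/ and to the backup Master at http://node-b.example.com:16010/ using a web browser. If you can connect via localhost but not from another host, check your firewall rules. You can reach each RegionServer at port 16030 of its IP address, or by clicking its link in the Master's Web UI.
5) Test what happens when nodes or services disappear.
With only the three nodes you have configured, things will not always behave as you might hope. You can still see what happens when the primary Master or a RegionServer disappears by killing the corresponding process and watching the logs.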
The original text follows.
Getting Started
1. Introduction
Quickstart will get you up and running on a single-node, standalone instance of HBase.
2. Quick Start - Standalone HBase
This section describes the setup of a single-node standalone HBase. A standalone instance has all HBase daemons (the Master, RegionServers, and ZooKeeper) running in a single JVM persisting to the local filesystem. It is our most basic deploy profile. We will show you how to create a table in HBase using the hbase shell CLI, insert rows into the table, perform put and scan operations against the table, enable or disable the table, and start and stop HBase.
Apart from downloading HBase, this procedure should take less than 10 minutes.
Prior to HBase 0.94.x, HBase expected the loopback IP address to be 127.0.0.1. Ubuntu and some other distributions default to 127.0.1.1, and this will cause problems for you. See "Why does HBase care about /etc/hosts?" for details.
The following /etc/hosts file works correctly for HBase 0.94.x and earlier, on Ubuntu. Use this as a template if you run into trouble.
127.0.0.1 localhost
127.0.0.1 ubuntu.ubuntu-domain ubuntu
This issue has been fixed in hbase-0.96.0 and beyond.
2.1. JDK Version Requirements
HBase requires that a JDK be installed. See Java for information about supported JDK versions.
2.2. Get Started with HBase
Procedure: Download, Configure, and Start HBase in Standalone Mode
Choose a download site from this list of Apache Download Mirrors. Click on the suggested top link. This will take you to a mirror of HBase Releases. Click on the folder named stable and then download the binary file that ends in .tar.gz to your local filesystem. Do not download the file ending in src.tar.gz for now.
Extract the downloaded file, and change to the newly-created directory.
$ tar xzvf hbase-2.0.0-SNAPSHOT-bin.tar.gz
$ cd hbase-2.0.0-SNAPSHOT/
You are required to set the JAVA_HOME environment variable before starting HBase. You can set the variable via your operating system's usual mechanism, but HBase provides a central mechanism, conf/hbase-env.sh. Edit this file, uncomment the line starting with JAVA_HOME, and set it to the appropriate location for your operating system. The JAVA_HOME variable should be set to a directory which contains the executable file bin/java. Most modern Linux operating systems provide a mechanism, such as /usr/bin/alternatives on RHEL or CentOS, for transparently switching between versions of executables such as Java. In this case, you can set JAVA_HOME to the directory containing the symbolic link to bin/java, which is usually /usr.
JAVA_HOME=/usr
Edit conf/hbase-site.xml, which is the main HBase configuration file. At this time, you only need to specify the directory on the local filesystem where HBase and ZooKeeper write data. By default, a new directory is created under /tmp. Many servers are configured to delete the contents of /tmp upon reboot, so you should store the data elsewhere. The following configuration will store HBase's data in the hbase directory, in the home directory of the user called testuser. Paste the <property> tags beneath the <configuration> tags, which should be empty in a new HBase install.
Example 1. Example hbase-site.xml for Standalone HBase
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/testuser/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/testuser/zookeeper</value>
  </property>
</configuration>
You do not need to create the HBase data directory. HBase will do this for you. If you create the directory, HBase will attempt to do a migration, which is not what you want.
The hbase.rootdir in the above example points to a directory in the local filesystem. The 'file:/' prefix is how we denote the local filesystem. To home HBase on an existing instance of HDFS, set hbase.rootdir to point at a directory on your instance, e.g. hdfs://namenode.example.org:8020/hbase. For more on this variant, see the section below on Standalone HBase over HDFS.
The bin/start-hbase.sh script is provided as a convenient way to start HBase. Issue the command, and if all goes well, a message is logged to standard output showing that HBase started successfully. You can use the jps command to verify that you have one running process called HMaster. In standalone mode HBase runs all daemons within this single JVM, i.e. the HMaster, a single HRegionServer, and the ZooKeeper daemon. Go to http://localhost:16010 to view the HBase Web UI.
Java needs to be installed and available. If you get an error indicating that Java is not installed, but it is on your system, perhaps in a non-standard location, edit the conf/hbase-env.sh file and modify the JAVA_HOME setting to point to the directory that contains bin/java on your system.
Procedure: Use HBase For the First Time
Connect to HBase.
Connect to your running instance of HBase using the hbase shell command, located in the bin/ directory of your HBase install. In this example, some usage and version information that is printed when you start HBase Shell has been omitted. The HBase Shell prompt ends with a > character.
$ ./bin/hbase shell
hbase(main):001:0>
Display HBase Shell Help Text.
Type help and press Enter to display some basic usage information for HBase Shell, as well as several example commands. Notice that table names, rows, and columns must all be enclosed in quote characters.
Create a table.
Use the create command to create a new table. You must specify the table name and the ColumnFamily name.
hbase(main):001:0> create 'test', 'cf'
0 row(s) in 0.4170 seconds
=> Hbase::Table - test
List Information About your Table
Use the list command to list information about your table.
hbase(main):002:0> list 'test'
TABLE
test
1 row(s) in 0.0180 seconds
=> ["test"]
Put data into your table.
To put data into your table, use the put command.
hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.0850 seconds
hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0110 seconds
hbase(main):005:0> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0100 seconds
Here, we insert three values, one at a time. The first insert is at row1, column cf:a, with a value of value1. Columns in HBase are comprised of a column family prefix, cf in this example, followed by a colon and then a column qualifier suffix, a in this case.
Scan the table for all data at once.
One of the ways to get data from HBase is to scan. Use the scan command to scan the table for data. You can limit your scan, but for now, all data is fetched.
hbase(main):006:0> scan 'test'
ROW                                      COLUMN+CELL
 row1                                    column=cf:a, timestamp=1421762485768, value=value1
 row2                                    column=cf:b, timestamp=1421762491785, value=value2
 row3                                    column=cf:c, timestamp=1421762496210, value=value3
3 row(s) in 0.0230 seconds
Get a single row of data.
To get a single row of data at a time, use the get command.
hbase(main):007:0> get 'test', 'row1'
COLUMN                                   CELL
 cf:a                                    timestamp=1421762485768, value=value1
1 row(s) in 0.0350 seconds
Disable a table.
If you want to delete a table or change its settings, as well as in some other situations, you need to disable the table first, using the disable command. You can re-enable it using the enable command.
hbase(main):008:0> disable 'test'
0 row(s) in 1.1820 seconds
hbase(main):009:0> enable 'test'
0 row(s) in 0.1770 seconds
Disable the table again if you tested the enable command above:
hbase(main):010:0> disable 'test'
0 row(s) in 1.1820 seconds
Drop the table.
To drop (delete) a table, use the drop command.
hbase(main):011:0> drop 'test'
0 row(s) in 0.1370 seconds
Exit the HBase Shell.
To exit the HBase Shell and disconnect from your cluster, use the quit command. HBase is still running in the background.
Procedure: Stop HBase
In the same way that the bin/start-hbase.sh script is provided to conveniently start all HBase daemons, the bin/stop-hbase.sh script stops them.
$ ./bin/stop-hbase.sh
stopping hbase....................
$
After issuing the command, it can take several minutes for the processes to shut down. Use jps to be sure that the HMaster and HRegionServer processes are shut down.
The above has shown you how to start and stop a standalone instance of HBase. In the next sections we give a quick overview of other modes of hbase deploy.
2.3. Pseudo-Distributed Local Install
After working your way through the quickstart in standalone mode, you can re-configure HBase to run in pseudo-distributed mode. Pseudo-distributed mode means that HBase still runs completely on a single host, but each HBase daemon (HMaster, HRegionServer, and ZooKeeper) runs as a separate process. In standalone mode, all daemons ran in one JVM instance. By default, unless you configure the hbase.rootdir property as described in the quickstart, your data is still stored in /tmp/. In this walk-through, we store your data in HDFS instead, assuming you have HDFS available. You can skip the HDFS configuration to continue storing your data in the local filesystem.
Hadoop Configuration
This procedure assumes that you have configured Hadoop and HDFS on your local system and/or a remote system, and that they are running and available. It also assumes you are using Hadoop 2. The guide on Setting up a Single Node Cluster in the Hadoop documentation is a good starting point.
Stop HBase if it is running.
If you have just finished the quickstart and HBase is still running, stop it. This procedure will create a totally new directory where HBase will store its data, so any databases you created before will be lost.
Configure HBase.
Edit the hbase-site.xml configuration. First, add the following property, which directs HBase to run in distributed mode, with one JVM instance per daemon.
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
Next, change hbase.rootdir from the local filesystem to the address of your HDFS instance, using the hdfs://// URI syntax. In this example, HDFS is running on the localhost at port 8020.
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:8020/hbase</value>
</property>
You do not need to create the directory in HDFS. HBase will do this for you. If you create the directory, HBase will attempt to do a migration, which is not what you want.
Start HBase.
Use the bin/start-hbase.sh command to start HBase. If your system is configured correctly, the jps command should show the HMaster and HRegionServer processes running.
Check the HBase directory in HDFS.
If everything worked correctly, HBase created its directory in HDFS. In the configuration above, it is stored in /hbase/ on HDFS. You can use the hadoop fs command in Hadoop's bin/ directory to list this directory.
$ ./bin/hadoop fs -ls /hbase
Found 7 items
drwxr-xr-x   - hbase users          0 2014-06-25 18:58 /hbase/.tmp
drwxr-xr-x   - hbase users          0 2014-06-25 21:49 /hbase/WALs
drwxr-xr-x   - hbase users          0 2014-06-25 18:48 /hbase/corrupt
drwxr-xr-x   - hbase users          0 2014-06-25 18:58 /hbase/data
-rw-r--r--   3 hbase users         42 2014-06-25 18:41 /hbase/hbase.id
-rw-r--r--   3 hbase users          7 2014-06-25 18:41 /hbase/hbase.version
drwxr-xr-x   - hbase users          0 2014-06-25 21:49 /hbase/oldWALs
Create a table and populate it with data.
You can use the HBase Shell to create a table, populate it with data, and scan and get values from it, using the same procedure as in the shell exercises.
Start and stop a backup HBase Master (HMaster) server.
Running multiple HMaster instances on the same hardware does not make sense in a production environment, in the same way that running a pseudo-distributed cluster does not make sense for production. This step is offered for testing and learning purposes only.
The HMaster server controls the HBase cluster. You can start up to 9 backup HMaster servers, which makes 10 total HMasters, counting the primary. To start a backup HMaster, use local-master-backup.sh. For each backup master you want to start, add a parameter representing the port offset for that master. Each HMaster uses three ports (16010, 16020, and 16030 by default). The port offset is added to these ports, so using an offset of 2, the backup HMaster would use ports 16012, 16022, and 16032. The following command starts 3 backup servers using ports 16012/16022/16032, 16013/16023/16033, and 16015/16025/16035.
$ ./bin/local-master-backup.sh 2 3 5
To kill a backup master without killing the entire cluster, you need to find its process ID (PID). The PID is stored in a file with a name like /tmp/hbase-USER-X-master.pid. The only content of the file is the PID. You can use the kill -9 command to kill that PID. The following command will kill the master with port offset 1, but leave the cluster running:
$ cat /tmp/hbase-testuser-1-master.pid |xargs kill -9
Start and stop additional RegionServers
The HRegionServer manages the data in its StoreFiles as directed by the HMaster. Generally, one HRegionServer runs per node in the cluster. Running multiple HRegionServers on the same system can be useful for testing in pseudo-distributed mode. The local-regionservers.sh command allows you to run multiple RegionServers. It works in a similar way to the local-master-backup.sh command, in that each parameter you provide represents the port offset for an instance. Each RegionServer requires two ports, and the default ports are 16020 and 16030. However, the base ports for additional RegionServers are not the default ports, since the default ports are used by the HMaster, which is also a RegionServer since HBase version 1.0.0. The base ports are 16200 and 16300 instead. You can run 99 additional RegionServers that are not an HMaster or backup HMaster on a server. The following command starts four additional RegionServers, running on sequential ports starting at 16202/16302 (base ports 16200/16300 plus 2).
$ ./bin/local-regionservers.sh start 2 3 4 5
To stop a RegionServer manually, use the local-regionservers.sh command with the stop parameter and the offset of the server to stop.
$ ./bin/local-regionservers.sh stop 3
Stop HBase.
You can stop HBase the same way as in the quickstart procedure, using the bin/stop-hbase.sh command.
2.4. Advanced - Fully Distributed
In reality, you need a fully distributed configuration to fully test HBase and to use it in real-world scenarios. In a distributed configuration, the cluster contains multiple nodes, each of which runs one or more HBase daemons. These include primary and backup Master instances, multiple ZooKeeper nodes, and multiple RegionServer nodes.
This advanced quickstart adds two more nodes to your cluster. The architecture will be as follows:
Table 1. Distributed Cluster Demo Architecture
Node Name            Master   ZooKeeper   RegionServer
node-a.example.com   yes      yes         no
node-b.example.com   backup   yes         yes
node-c.example.com   no       yes         yes
This quickstart assumes that each node is a virtual machine and that they are all on the same network. It builds upon the previous quickstart, Pseudo-Distributed Local Install, assuming that the system you configured in that procedure is now node-a. Stop HBase on node-a before continuing.
Be sure that all the nodes have full access to communicate, and that no firewall rules are in place which could prevent them from talking to each other. If you see any errors like no route to host, check your firewall.
Procedure: Configure Passwordless SSH Access
node-a needs to be able to log into node-b and node-c (and into itself) in order to start the daemons. The easiest way to accomplish this is to use the same username on all hosts, and configure passwordless SSH login from node-a to each of the others.
On node-a, generate a key pair.
While logged in as the user who will run HBase, generate an SSH key pair, using the following command:
$ ssh-keygen -t rsa
If the command succeeds, the location of the key pair is printed to standard output. The default name of the public key is id_rsa.pub.
Create the directory that will hold the shared keys on the other nodes.
On node-b and node-c, log in as the HBase user and create a .ssh/ directory in the user's home directory, if it does not already exist. If it already exists, be aware that it may already contain other keys.
Copy the public key to the other nodes.
Securely copy the public key from node-a to each of the nodes, using scp or some other secure means. On each of the other nodes, create a new file called .ssh/authorized_keys if it does not already exist, and append the contents of the id_rsa.pub file to the end of it. Note that you also need to do this for node-a itself.
$ cat id_rsa.pub >> ~/.ssh/authorized_keys
Test password-less login.
If you performed the procedure correctly, if you SSH fromnode-ato either of the other nodes, using the same username, you should not be prompted for a password.
Sincenode-bwill run a backup Master, repeat the procedure above, substitutingnode-beverywhere you seenode-a. Be sure not to overwrite your existing.ssh/authorized_keysfiles, but concatenate the new key onto the existing file using the>>operator rather than the>operator.
Procedure: Preparenode-a
node-awill run your primary master and ZooKeeper processes, but no RegionServers. . Stop the RegionServer from starting onnode-a.
Editconf/regionserversand remove the line which containslocalhost. Add lines with the hostnames or IP addresses fornode-bandnode-c.
Even if you did want to run a RegionServer onnode-a, you should refer to it by the hostname the other servers would use to communicate with it. In this case, that would benode-a.example.com. This enables you to distribute the configuration to each node of your cluster any hostname conflicts. Save the file.
Configure HBase to use node-b as a backup master.
Create a new file in conf/ called backup-masters, and add a new line to it with the hostname for node-b. In this demonstration, the hostname is node-b.example.com.
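As an illustration of the two files described above, the sketch below writes both of them for this walkthrough's three-node layout. write_cluster_hosts is a hypothetical helper name, and the function overwrites any existing conf/regionservers, which is how the localhost line is removed.

```shell
# Hypothetical helper: write conf/regionservers and conf/backup-masters
# for the three-node layout used in this walkthrough.
write_cluster_hosts() {
  conf_dir=$1
  # One RegionServer hostname per line; node-a is deliberately absent.
  printf '%s\n' node-b.example.com node-c.example.com > "$conf_dir/regionservers"
  # One backup master hostname per line.
  printf '%s\n' node-b.example.com > "$conf_dir/backup-masters"
}

# Example, run from the HBase installation directory on node-a:
#   write_cluster_hosts conf
```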
Configure ZooKeeper
In reality, you should carefully consider your ZooKeeper configuration. You can find out more about configuring ZooKeeper in zookeeper. This configuration will direct HBase to start and manage a ZooKeeper instance on each node of the cluster.
On node-a, edit conf/hbase-site.xml and add the following properties.
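<property>
  <name>hbase.zookeeper.quorum</name>
  <value>node-a.example.com,node-b.example.com,node-c.example.com</value>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/usr/local/zookeeper</value>
</property>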
Everywhere in your configuration that you have referred to node-a as localhost, change the reference to point to the hostname that the other nodes will use to refer to node-a. In these examples, the hostname is node-a.example.com.
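One way to find leftover localhost references is to search the configuration directory. This is a minimal sketch, assuming a grep that supports recursive search (GNU and BSD grep both do). find_localhost_refs is a hypothetical helper name.

```shell
# Hypothetical helper: list every line in a conf directory that still
# mentions localhost. Returns 0 if any are found, 1 if none.
find_localhost_refs() {
  conf_dir=$1
  # -r recurse, -n show line numbers, -w match "localhost" as a whole word
  grep -rnw localhost "$conf_dir"
}

# Example, run from the HBase installation directory on node-a:
#   find_localhost_refs conf
```

Each reported line is a candidate for replacement with node-a.example.com. An empty result suggests the configuration is ready to distribute.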
Procedure: Prepare node-b and node-c
node-b will run a backup master server and a ZooKeeper instance.
Download and unpack HBase.
Download and unpack HBase to node-b, just as you did for the standalone and pseudo-distributed quickstarts.
Copy the configuration files from node-a to node-b and node-c.
Each node of your cluster needs to have the same configuration information. Copy the contents of the conf/ directory to the conf/ directory on node-b and node-c.
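The copy step could be scripted roughly as follows. This sketch assumes rsync is installed, password-less SSH is working, and HBase is unpacked at the same path on every node. sync_conf and the RUN variable are hypothetical names; by default the function only prints the commands it would run.

```shell
# Hypothetical helper: push node-a's conf/ directory to the other nodes.
# Assumes rsync is installed, password-less SSH works, and HBase is
# unpacked at the same path on every node.
# Prints the commands by default; set RUN=1 to execute them.
sync_conf() {
  hbase_home=$1
  shift
  for host in "$@"; do
    if [ "${RUN:-0}" = "1" ]; then
      # The trailing slash on the source copies the directory's contents
      # rather than the directory itself.
      rsync -a "$hbase_home/conf/" "$host:$hbase_home/conf/"
    else
      echo "rsync -a $hbase_home/conf/ $host:$hbase_home/conf/"
    fi
  done
}

# Example, using the install path that appears in this guide's log output:
#   sync_conf /home/hbuser/hbase-0.98.3-hadoop2 node-b.example.com node-c.example.com
```

Printing first makes it easy to review the target hosts and paths before anything is overwritten. Using scp instead, as in the SSH procedure, works equally well.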
Procedure: Start and Test Your Cluster
Be sure HBase is not running on any node.
If you forgot to stop HBase from previous testing, you will have errors. Check to see whether HBase is running on any of your nodes by using the jps command. Look for the processes HMaster, HRegionServer, and HQuorumPeer. If they exist, kill them.
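A small filter can pick the three process names out of jps output. This is a sketch only: hbase_procs is a hypothetical helper name, and it assumes the default jps output format of a PID followed by a class name.

```shell
# Hypothetical filter: read jps output on stdin and print the PID and
# name of each HBase-related process named in the step above.
hbase_procs() {
  awk '$2 == "HMaster" || $2 == "HRegionServer" || $2 == "HQuorumPeer" { print $1, $2 }'
}

# Example: list leftovers, then stop them with a plain kill (SIGTERM).
#   jps | hbase_procs
#   jps | hbase_procs | while read -r pid name; do kill "$pid"; done
```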
Start the cluster.
On node-a, issue the start-hbase.sh command. Your output will be similar to that below.
$ bin/start-hbase.sh
node-c.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-c.example.com.out
node-a.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-a.example.com.out
node-b.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-b.example.com.out
starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-node-a.example.com.out
node-c.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-c.example.com.out
node-b.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-b.example.com.out
node-b.example.com: starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-nodeb.example.com.out
ZooKeeper starts first, followed by the master, then the RegionServers, and finally the backup masters.
Verify that the processes are running.
On each node of the cluster, run the jps command and verify that the correct processes are running on each server. You may see additional Java processes running on your servers as well, if they are used for other purposes.
Example 2. node-a jps Output
$ jps
20355 Jps
20071 HQuorumPeer
20137 HMaster
Example 3. node-b jps Output
$ jps
15930 HRegionServer
16194 Jps
15838 HQuorumPeer
16010 HMaster
Example 4. node-c jps Output
$ jps
13901 Jps
13639 HQuorumPeer
13737 HRegionServer
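The per-node comparison can be automated along these lines. This is a minimal sketch: expect_procs is a hypothetical helper name, and the role lists in the comments are taken from the examples above.

```shell
# Hypothetical check: read jps output on stdin and confirm that every
# process name given as an argument appears in it.
# Prints any missing names and returns non-zero if something is missing.
expect_procs() {
  jps_out=$(cat)
  missing=0
  for name in "$@"; do
    if ! printf '%s\n' "$jps_out" | awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Expected roles in this walkthrough:
#   node-a: jps | expect_procs HQuorumPeer HMaster
#   node-b: jps | expect_procs HQuorumPeer HMaster HRegionServer
#   node-c: jps | expect_procs HQuorumPeer HRegionServer
```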
ZooKeeper Process Name
The HQuorumPeer process is a ZooKeeper instance which is controlled and started by HBase. If you use ZooKeeper this way, it is limited to one instance per cluster node, and is appropriate for testing only. If ZooKeeper is run outside of HBase, the process is called QuorumPeer. For more about ZooKeeper configuration, including using an external ZooKeeper instance with HBase, see zookeeper.
Browse to the Web UI.
Web UI Port Changes
In HBase newer than 0.98.x, the HTTP ports used by the HBase Web UI changed from 60010 for the Master and 60030 for each RegionServer to 16010 for the Master and 16030 for the RegionServer.
If everything is set up correctly, you should be able to connect with a web browser to the UI for the Master at http://node-a.example.com:16010/ or for the secondary master at http://node-b.example.com:16010/. If you can connect via localhost but not from another host, check your firewall rules. You can see the web UI for each of the RegionServers at port 16030 of their IP addresses, or by clicking their links in the web UI for the Master.
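For a quick check without a browser, the URLs can be built and probed from a shell. This sketch uses the ports given above for releases newer than 0.98.x. ui_url is a hypothetical helper name, and the commented probe assumes curl is installed.

```shell
# Hypothetical helper: build the Web UI URL for a host and role, using
# the ports described above for HBase newer than 0.98.x.
ui_url() {
  host=$1
  role=$2
  case "$role" in
    master)       port=16010 ;;
    regionserver) port=16030 ;;
    *) echo "unknown role: $role" >&2; return 1 ;;
  esac
  echo "http://$host:$port/"
}

# Example probe (assumes curl is installed); prints the HTTP status code:
#   curl -s -o /dev/null -w '%{http_code}\n' "$(ui_url node-a.example.com master)"
```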
Test what happens when nodes or services disappear.
With a three-node cluster like you have configured, things will not be very resilient. Still, you can test what happens when the primary Master or a RegionServer disappears, by killing the processes and watching the logs.