Introduction to HBase

https://hbase.apache.org/
HBase (Hadoop Database) is a highly reliable, high-performance, column-oriented, scalable, open-source distributed NoSQL database that supports real-time reads and writes. It is mainly used to store loosely structured data, both semi-structured and unstructured.
HBase's design comes from the Google paper by Fay Chang et al., "Bigtable: A Distributed Storage System for Structured Data". Just as Bigtable builds on the distributed storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop and HDFS: it uses HDFS as its file storage layer and can use Hadoop MapReduce to process the massive amounts of data it stores. HBase began as a subproject of Apache Hadoop. Unlike a typical relational database, it is suited to unstructured data storage, and it is column-based rather than row-based.
HBase Features
HBase is a type of "NoSQL" database. "NoSQL" is a general term meaning that the database is not an RDBMS with SQL as its primary access language, but there are many kinds of NoSQL databases: BerkeleyDB is an example of a local (embedded) NoSQL database, whereas HBase is very much a distributed one.
Technically speaking, HBase is really more a "data store" than a "database", because it lacks many of the features you find in an RDBMS, such as typed columns, secondary indexes, triggers, and an advanced query language.
However, HBase has many features that support both linear and modular scaling. HBase clusters expand by adding RegionServers hosted on commodity-class servers. If a cluster expands from 10 to 20 RegionServers, for example, it doubles in both storage and processing capacity. An RDBMS can scale well, but only up to a point, specifically the size of a single database server, and for the best performance it requires specialized hardware and storage devices. Notable HBase features are:
1.Strongly consistent reads/writes: HBase is not an "eventually consistent" data store, which makes it very suitable for tasks such as high-speed counter aggregation.
2.Automatic sharding: HBase tables are distributed over the cluster via regions, and regions split and redistribute automatically as your data grows.
3.Automatic RegionServer failover.
4.Hadoop/HDFS integration: HBase supports HDFS out of the box as its distributed file system.
5.MapReduce: HBase supports massively parallel processing via MapReduce, with HBase acting as both source and sink.
6.Java client API: HBase provides an easy-to-use Java API for programmatic access.
7.Thrift/REST APIs: HBase also supports Thrift and REST for non-Java front ends.
8.Block cache and Bloom filters: HBase supports a block cache and Bloom filters for high-volume query optimization.
9.Operational management: HBase provides built-in web pages for operational insight, as well as JMX metrics.

HBase's place in the Hadoop ecosystem is shown below:

Every Hadoop application is built on top of HDFS, which provides highly reliable low-level storage and has effectively become the industry standard for distributed file systems. HBase is a distributed column-oriented storage system built on HDFS, used mainly to store massive amounts of data. Its presence throughout the Hadoop ecosystem shows the important role it plays there: real-time, distributed storage for high-dimensional data.

Introduction to BigTable
Bigtable is a distributed storage system built to manage Google's structured data. It scales to petabytes of data across clusters of thousands of machines, and it achieves several goals: wide applicability, scalability, high performance, and high availability.
BigTable stores tablet location information in a structure similar to a B+ tree.

Level 1: the Chubby file. This level is a single file in Chubby that stores the location of the root tablet. The file is part of the Chubby service; if Chubby becomes unavailable, the location of the root tablet is lost and the whole Bigtable becomes unavailable.
Level 2: the root tablet. The root tablet is simply the first tablet of the metadata table (the METADATA table), and it stores the locations of the table's other tablets. The root tablet is special: to keep the depth of the tree fixed, it never splits.
Level 3: the other metadata tablets, which together with the root tablet make up the complete metadata table. Each metadata tablet holds the locations of many user tablets.
So the whole location system really has only two parts: a Chubby file and a metadata table.
Note that although the metadata table is special, it still follows the data model described earlier, and each of its tablets is served by a dedicated tablet server. This is why the master server does not need to provide location information.
Clients cache tablet locations. If a tablet's location is not in the cache, the client must walk this three-level structure, which involves one access to the Chubby service and two accesses to tablet servers.
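The three-level lookup above can be sketched as nested sorted maps with a client-side cache. This is a simplified illustration, not Bigtable's actual implementation; all names (servers, tablets) are invented for the example.

```python
from bisect import bisect_left

# Level 1: a Chubby file naming the root tablet and where it lives.
chubby_file = "root-tablet@server-A"

# Level 2: the root tablet maps METADATA-tablet end keys to their locations.
root_tablet = {
    "m": "meta-tablet-1@server-B",
    "z": "meta-tablet-2@server-C",
}
# Level 3: each METADATA tablet maps user-tablet end keys to their locations.
meta_tablets = {
    "meta-tablet-1@server-B": {"g": "user-tablet-1@server-D",
                               "m": "user-tablet-2@server-E"},
    "meta-tablet-2@server-C": {"z": "user-tablet-3@server-F"},
}

def _lookup(end_key_map, row_key):
    """Range lookup: find the entry whose end key is the first one >= row_key."""
    keys = sorted(end_key_map)
    return end_key_map[keys[bisect_left(keys, row_key)]]

cache = {}  # clients cache tablet locations to skip the three hops

def locate(row_key):
    """Resolve the user tablet serving row_key (three hops on a cache miss)."""
    if row_key in cache:
        return cache[row_key]
    root = chubby_file.split("@", 1)[0]         # hop 1: read the Chubby file
    assert root == "root-tablet"
    meta_loc = _lookup(root_tablet, row_key)    # hop 2: root tablet -> METADATA tablet
    user_loc = _lookup(meta_tablets[meta_loc], row_key)  # hop 3: METADATA -> user tablet
    cache[row_key] = user_loc
    return user_loc
```

A second call to `locate` for a cached row key answers from the cache, mirroring why the master is not on the read path.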
The details of BigTable will be covered separately in another article.
Distributed HBase

https://www.scaleyourapp.com/what-database-does-facebook-use-a-1000-feet-deep-dive/
HBase uses ZooKeeper as its distributed coordination service.
ZooKeeper provides coordination for the HBase cluster: it tracks the state (available/alive) of the HMaster and the HRegionServers and notifies the HMaster when any of them goes down, so that failover between HMasters can take place and the HRegions of a dead HRegionServer can be recovered (reassigned to other HRegionServers). The ZooKeeper ensemble itself uses a consensus protocol (ZAB, an atomic broadcast protocol similar in spirit to Paxos) to keep the state of its nodes consistent.

Tables are automatically partitioned horizontally by HBase into regions. Each region comprises a subset of a table’s rows, usually a range of sorted row keys.
Initially, a table comprises a single region, but as the region grows it eventually crosses a configurable size threshold, at which point it splits at a row boundary into two new regions of approximately equal size. Until this first split happens, all loading will be against the single server hosting the original region.
Regions are the units that get distributed over an HBase cluster.
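The split behavior described above can be sketched as follows: a region is a sorted run of rows, and once it crosses a size threshold it splits at a middle row boundary into two roughly equal daughters. A toy illustration, not HBase's split logic; the threshold and row names are invented.

```python
def split_region(region, threshold):
    """Split a region (a sorted list of (row_key, value) pairs) at its middle
    row boundary once it crosses the size threshold; below it, no split happens."""
    if len(region) <= threshold:
        return [region]
    mid = len(region) // 2
    return [region[:mid], region[mid:]]

rows = [(f"row-{i:03d}", i) for i in range(10)]   # already in row-key order
daughters = split_region(rows, threshold=8)        # crosses threshold: two daughters
```

Because the split happens at a row boundary of already-sorted data, each daughter still covers a contiguous key range, which is what lets regions be redistributed independently.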

https://jheck.gitbook.io/hadoop/data-storage
When to Use HBase
1.Concurrent, simple, random queries. (Note: HBase is not good at complex join queries, but performance can be improved with secondary indexes, i.e. global indexes.)
2.Storage of semi-structured and unstructured data.
A common pattern is to compute statistics offline over massive data in the data warehouse and then load the results into HBase for real-time queries.
HBase Data Model
Here we have a table that consists of cells organized by row keys and column families. Sometimes, a column family (CF) has a number of column qualifiers to help better organize data within a CF.
A cell contains a value and a timestamp. And a column is a collection of cells under a common column qualifier and a common CF.
Within a table, data is partitioned by a one-column row key in lexicographical order, where topically related data is stored close together to maximize performance. The design of the row key is crucial and has to be thoroughly thought through by the developer to ensure efficient data lookups.
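One practical consequence of lexicographic ordering is worth seeing concretely: numeric row keys must be fixed-width (e.g. zero-padded), or their byte order will not match their numeric order. A small sketch with invented ids:

```python
# Row keys are compared as raw bytes, lexicographically, so numeric ids must be
# zero-padded to a fixed width or their sort order will not match numeric order.
ids = [1, 2, 10, 11]
unpadded = sorted(str(n) for n in ids)    # "10" sorts before "2"
padded = sorted(f"{n:04d}" for n in ids)  # fixed width restores numeric order
```

With padded keys, a range scan over "related" ids actually visits contiguous rows, which is the point of the row-key design advice above.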

https://www.scnsoft.com/blog/cassandra-vs-hbase
HBase Data Model Introduction
HBase stores data in tables. A table consists of rows and columns, and the columns are grouped into several column families, as shown in the following figure.

1.Table:
HBase organizes data into tables. Note that a table name must be a legal name usable in a file path, because HBase tables are mapped onto files in HDFS.
2.Row:
Within a table, each row represents one data object. Each row is uniquely identified by a row key. Row keys have no specific data type and are stored as binary bytes.
3.Column family:
When defining an HBase table, you must declare the column families in advance. All columns in the table are organized into column families. Once the column families are set, they cannot easily be changed, because they determine HBase's physical storage layout. However, the column qualifiers (and their values) inside a column family can be added or deleted dynamically. Every row in a table has the same column families, but rows need not have the same column qualifiers or values within those families, so the table structure is sparse.
4.Column qualifier:
Data within a column family is addressed by its column qualifier. In fact you need not think rigidly in terms of "columns" here; a column can also be understood as a key-value pair whose key is the column qualifier. Column qualifiers also have no specific data type and are stored as binary bytes.
5.Cell:
Each combination of row key, column family, and column qualifier identifies a cell. The data stored in a cell is called the cell value. Cells and cell values have no specific data type and are stored as binary bytes.
6.Timestamp:
By default, the data in each cell is inserted with a timestamp that identifies its version. When reading cell data, if no timestamp is specified, the latest version is returned. When writing new cell data, if no timestamp is set, the current time is used. The version count of cell data is maintained per column family by HBase, and by default HBase retains three versions.
https://developpaper.com/hbase-learning-1-basic-introduction/
HBase table
The HBase table is shown in the following figure:

HBase is not a relational database and requires a different approach to modeling your data. HBase actually defines a four-dimensional data model, and the following four coordinates define each cell (see figure):
1.Row Key: Each row has a unique row key; the row key does not have a data type and is treated internally as a byte array.
2.Column Family: Data inside a row is organized into column families; each row has the same set of column families, but across rows, the same column families do not need the same column qualifiers. Under the hood, HBase stores column families in their own data files, so they need to be defined up front, and changes to column families are difficult to make.
3.Column Qualifier: Column families define actual columns, which are called column qualifiers. You can think of column qualifiers as the columns themselves.
4.Version: Each column can have a configurable number of versions, and you can access the data for a specific version of a column qualifier.

HBase as a Key/Value Store: the key is the row key we have been talking about, and the value is the collection of column families (that have their associated columns that have versions of the data).
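The key/value view above can be made concrete by treating the four coordinates as one flat map key. A minimal sketch with invented row and family names:

```python
# The four coordinates (row key, column family, column qualifier, version
# timestamp) as a flat key/value map: a sketch of the "HBase as a key/value
# store" view described above.
table = {}

def put(row, family, qualifier, value, ts):
    table[(row, family, qualifier, ts)] = value

def get_latest(row, family, qualifier):
    """A default Get: among all versions of the cell, the newest timestamp wins."""
    versions = {ts: v for (r, f, q, ts), v in table.items()
                if (r, f, q) == (row, family, qualifier)}
    return versions[max(versions)] if versions else None

put("row1", "personal", "name", "Ada", ts=1)
put("row1", "personal", "name", "Ada L.", ts=2)   # a newer version of the same cell
```

Both writes are retained as distinct entries; only the read path decides that the newest version is "the" value.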

Example
Consider a company employee table containing basic employee information (name, age) and detailed information (salary, role), plus timestamps.
1. EmployeeBasicInfoCLF, the basic-information column family: name, age.
2. DetailInfoCLF, the detailed-information column family: salary, role.
The overall table structure is as follows:

As shown above, each row has a RowKey that uniquely identifies and locates it, rows are sorted lexicographically by RowKey, and each column family contains several concrete columns.
Row Key:
The unique identifier of a row.
RowKeys are sorted lexicographically.
A row key can store at most 64 KB of bytes.
Column Family (CF1, CF2, CF3) & column qualifier:
Every column in an HBase table belongs to some column family, and column families must be declared up front as part of the table schema, e.g. create 'test', 'course'.
Column names are prefixed with their column family, and each column family can hold many member columns (from a few thousand up to tens of millions), e.g. CF1:q1, CF2:qw. New members (columns) can be added later, dynamically and on demand. Since a family contains multiple qualifiers, you can simply think of HBase columns as two-level: the family is the first level and the qualifier is the second, in a parent-child relationship.
Access control, storage, and tuning are all done at the column-family level.
HBase stores the data of one column family together in the same directory, saved in several files.
At present HBase handles at most two or three column families well.
Timestamp:
Each HBase cell can hold multiple versions of the same piece of data, distinguished by a unique timestamp. Versions are sorted in reverse chronological order, with the newest version first.
Timestamps are 64-bit integers.
A timestamp can be assigned by HBase automatically at write time, in which case it is the current system time in milliseconds.
A timestamp can also be assigned explicitly by the client; applications that must avoid version collisions have to generate unique timestamps themselves.
Cell:
A cell is determined by the intersection of a row and a column.
Cells are versioned, with the timestamp as the version.
The content of a cell is an uninterpreted byte array (byte[]); cell data has no type and is stored entirely as bytes. A cell is uniquely identified by
{row key, column (=<family> + <qualifier>), version}.
HBase Data Model Terminology
The HBase data model is a distributed, persistent, multidimensional sorted map indexed by row key, column key, and timestamp, which is why Apache HBase is also described as a key-value store.
The following are the data model terms used in Apache HBase.
1. Table
Apache HBase organizes data into tables. Table names are strings of characters that are safe to use in file-system paths.
2. Row
Apache HBase stores its data in rows, and each row has a unique row key. The row key is represented as a byte array.
3. Column Family
Column families group the data stored in rows and give structure to data in Apache HBase. They are composed of characters and strings that can be used in a file-system path. Every row in a table has the same column families, but a row does not need to store data in all of them.
4. Column Qualifier
A column qualifier points to the data stored within a column family. It is represented as a byte array.
5. Cell
A cell is the combination of row key, column family, and column qualifier; the data it holds is called the cell's value.
6. Timestamp
Values stored in cells are versioned, and each version is identified by a timestamp assigned at creation time. If we do not provide a timestamp when writing data, the current time is used.
A sample table in Apache HBase looks like the following.

The table above has two column families, named Personal and Office. Both column families have two columns. Data is stored in cells, and the rows are sorted by row key.
HBase Data Types
Apache HBase has no notion of data types; everything is a byte array. It is a bytes-in, bytes-out database: when a value is inserted, it is converted to a byte array using the Put and Result interfaces. Apache HBase uses a serialization framework to convert user data to byte arrays.
A value of up to roughly 10-15 MB can be stored in an Apache HBase cell. Larger values should be stored in Hadoop HDFS directly, with the file-path metadata kept in Apache HBase.
HBase Data Storage
The following are the conceptual and physical views of Apache HBase.
1. Conceptual view
At the conceptual level, a table is viewed as a set of rows.
The following is a conceptual diagram of how data is stored in HBase.

2. Physical view
In the physical view, a table is physically stored by column family.
The following example represents a table stored as column-family-based files.


Namespaces
A namespace is a logical grouping of tables, analogous to a schema grouping related tables in a relational database.
A representation of a namespace is shown below.

Now let us look at each component of a namespace.
1. Table
Every table belongs to a namespace. If no namespace is defined, the table is assigned to the default namespace.
2. RegionServer group
A namespace can have a default RegionServer group; in that case, a newly created table is served by that group of RegionServers.
3. Permission
Using namespaces, a user can define access control lists, such as read, delete, and update permissions; with write permission a user can create tables.
4. Quota
This component defines the quota that the namespace can hold for tables and regions.
5. Predefined namespaces
There are two predefined special namespaces.
hbase: the system namespace, which contains HBase internal tables.
default: the namespace for all tables that do not define a namespace.
HBase Data Model Operations

The main data model operations are Get, Put, Scan, and Delete. Using these operations we can read, write, and delete records from a table.
Let us go through each operation in detail.
1.Get
The Get operation is similar to a relational database's Select statement. It is used to fetch content from an HBase table.
We can execute a Get command in the HBase shell as shown below.
hbase(main):001:0> get 'table_name', 'row_key'
2.Put
The Put operation is used to write data to a table: it adds a new row identified by a row key, or updates an existing row. It can be executed via HTable.put().
3.Scan
The Scan operation is used to read multiple rows of a table. It differs from Get, where we must specify an exact row to read; with Scan we can iterate over a range of rows, or over all rows, in a table.
4.Delete
The Delete operation is used to remove a row or a set of rows from an HBase table. It can be executed via HTable.delete().
Once a delete is issued, the data is marked with a tombstone, and the row is finally removed from the table when a compaction occurs.
The various types of internal delete markers are as follows.
Delete: for a specific version of a column.
Delete column: for all versions of a column.
Delete family: for all columns of a particular ColumnFamily.
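The three marker kinds and the deferred physical delete can be sketched as a filter applied at compaction time. A conceptual model only, not HBase's implementation; cell contents are invented.

```python
cells = [
    # (row, column_family, qualifier, timestamp, value)
    ("r1", "cf1", "a", 1, "v1"),
    ("r1", "cf1", "a", 2, "v2"),
    ("r1", "cf1", "b", 1, "v3"),
    ("r1", "cf2", "a", 1, "v4"),
]

def masked(cell, marker):
    """True if the tombstone `marker` hides `cell`.
    marker kinds: 'version' (one version), 'column' (all versions of a column),
    'family' (all columns of a column family)."""
    kind, row, cf, qual, ts = marker
    r, f, q, t, _ = cell
    if r != row or t > ts:          # tombstones only hide cells at or before their ts
        return False
    if kind == "version":
        return f == cf and q == qual and t == ts
    if kind == "column":
        return f == cf and q == qual
    return f == cf                  # 'family' hides the whole column family

def major_compact(cells, markers):
    """Rewrite the store, physically dropping every masked cell."""
    return [c for c in cells if not any(masked(c, m) for m in markers)]

# A 'delete column' tombstone on r1/cf1:a at ts=2 removes both versions of that column.
survivors = major_compact(cells, [("column", "r1", "cf1", "a", 2)])
```

Until `major_compact` runs, the masked cells still exist on disk; the tombstone merely hides them from reads, which matches the deferred deletion described above.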
HBase System Architecture

HBase is better described as a data store than a database. It scales linearly and modularly by adding commodity nodes to the cluster: if we increase the nodes from 20 to 40, the HBase cluster's storage and processing capacity increase accordingly.

Client
Contains interfaces for accessing HBase and maintains a cache to speed up access.
Zookeeper
Guarantees that at any time there is only one active Master in the cluster.
Stores the addressing entry point for all Regions.
Monitors RegionServer online/offline status in real time and notifies the Master.
Stores HBase schema and table metadata.
Master
Assigns regions to RegionServers.
Balances load across RegionServers.
Detects failed RegionServers and reassigns their regions.
Manages users' create, delete, and alter operations on tables.
RegionServer
A RegionServer maintains regions and handles IO requests to those regions.
A RegionServer is responsible for splitting regions that grow too large at runtime.
HLog (WAL log):
An HLog file is an ordinary Hadoop SequenceFile. The SequenceFile key is an HLogKey object, which records the provenance of the written data: besides the table and region names, it includes a sequence number and a timestamp. The timestamp is the write time; the sequence number starts at 0, or at the last sequence number persisted to the file system.
The SequenceFile value is an HBase KeyValue object, i.e. the same KeyValue stored in HFiles.
Region
HBase automatically partitions a table horizontally into regions; each region holds a contiguous range of the table's rows. Each table starts with a single region; as data is inserted the region grows, and when it reaches a threshold it splits into two new regions of roughly equal size.
As the rows in a table keep growing, there are more and more regions, so a complete table ends up stored across multiple RegionServers.
Memstore and storefile
A region is made up of multiple stores, one store per CF (column family).
A store consists of an in-memory memstore and on-disk storefiles. Writes go to the memstore first; when the data in the memstore reaches a threshold, the HRegionServer starts a flush process that writes it out as a storefile, with each flush producing a separate storefile.
When the number of storefiles grows past a threshold, the system merges them (minor and major compactions); during a major compaction, versions are merged and deletes are applied, producing larger storefiles.
When the total size of all storefiles in a region exceeds a threshold, the region is split into two, and the HMaster assigns the halves to the appropriate RegionServers to balance the load.
When a client reads data, it looks in the memstore first, and only then in the storefiles.
HRegion is the smallest unit of distribution and load balancing in HBase, which means different HRegions can live on different HRegionServers.
An HRegion is composed of one or more Stores; each Store holds one column family.
Each Store in turn consists of one memstore and zero or more StoreFiles.
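The write path and read order just described (memstore first, then storefiles from newest to oldest) can be sketched compactly. A toy model with an invented flush threshold, not HBase code:

```python
class Store:
    """Sketch of one store (one column family of one region): writes land in the
    in-memory memstore; when it reaches flush_threshold entries it is flushed to a
    new immutable storefile; reads check the memstore, then storefiles newest-first."""
    def __init__(self, flush_threshold=2):
        self.flush_threshold = flush_threshold
        self.memstore = {}
        self.storefiles = []                     # each flush appends one dict here

    def put(self, row_key, value):
        self.memstore[row_key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.storefiles.append(dict(self.memstore))   # flush: freeze a snapshot
            self.memstore = {}

    def get(self, row_key):
        if row_key in self.memstore:             # 1) memstore first
            return self.memstore[row_key]
        for sf in reversed(self.storefiles):     # 2) then storefiles, newest first
            if row_key in sf:
                return sf[row_key]
        return None

store = Store(flush_threshold=2)
store.put("r1", "a")
store.put("r2", "b")          # second entry triggers a flush to a storefile
store.put("r1", "a2")         # a newer value for r1 now sits in the memstore
```

Note the newest-first read order is what makes the later write to "r1" win even though an older value still exists in a storefile; compaction would eventually merge the two.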

HBase Components
Let us discuss various components of HBase.
1. ZooKeeper
Apache ZooKeeper is a high-performance, centralized coordination service for distributed applications, which provides distributed synchronization and group services to HBase. It lets users focus on application logic rather than cluster coordination, and it provides an API through which a user can coordinate with the Master server.
Apache ZooKeeper's APIs provide consistency, ordering, and durability, as well as synchronization and concurrency, for a distributed clustered system.
2. HMaster
Apache HBase HMaster is an important component of the HBase cluster, responsible for monitoring RegionServers, handling failover, and managing region splits.
HMaster's functionality is as follows.
It monitors the RegionServers.
It handles RegionServer failover.
It handles metadata changes.
It assigns and unassigns regions.
It provides an interface for all metadata changes.
It performs region load balancing during idle time.
HMaster provides a web user interface that shows information about the HBase cluster.
3. RegionServers
RegionServers are responsible for storing the actual data. Just as in a Hadoop cluster, where the NameNode stores metadata and DataNodes store the actual data, in HBase the Master holds the metadata and RegionServers store the actual data. In a distributed cluster, a RegionServer runs on a DataNode.
RegionServer performs the following tasks.
It serves the regions (tables) assigned to it.
It handles read and write requests from clients.
It flushes the cache to HDFS.
It is responsible for handling region splits.
It maintains HLogs.
Components of a RegionServer
Let us see the components of RegionServer.
3.1 WAL(Write-Ahead logs)
The Apache HBase WAL is an intermediate file, also called the edit log. When data is written to HBase, it is not immediately written to disk; it is kept in memory for some time. Keeping data only in memory would be dangerous, because if the system went down all of it would be erased, so Apache HBase first writes every change to the write-ahead log file and only then to memory.
3.2 HFile
This is the actual file where row data is stored physically.
3.3 Store
It corresponds to a column family of a table in HBase. This is where HFiles are stored.
3.4 MemStore
This component resides in main memory and records current data operations: after an edit is written to the WAL, the RegionServer stores the key-value in its memstore.
3.5 Region
Regions are the splits of a table, divided by row-key ranges and hosted by RegionServers.
4. Client
The client can be written in Java or any other language, using the external APIs to connect to the RegionServer that manages the actual row data. The client queries the catalog tables to find a region; once the region is found, the client contacts the RegionServer directly, performs the data operation, and caches the location information for fast retrieval.
5. Catalog Tables
Catalog Tables are used to maintain metadata for all RegionServers and regions.
There are two types of Catalog tables that exist in HBase.
-ROOT-: This table holds the location of the META table.
.META: This table contains information about all regions and their locations.
HBase Basic Architecture
HBase consists of the HMaster and HRegionServers and follows the master-slave architecture. HBase divides a logical table into multiple data blocks, HRegions, and stores them in HRegionServers.
The HMaster is responsible for managing all HRegionServers. It does not store any data itself; it only stores the mappings (metadata) of data to HRegionServers.
All nodes in the cluster are coordinated by Zookeeper, which handles the various issues that may be encountered during HBase operation. The basic architecture of HBase is shown below:

Client: Uses HBase's RPC mechanism to communicate with the HMaster and HRegionServers, submitting requests and getting results. For management operations the client performs RPC with the HMaster; for data reads and writes it performs RPC with HRegionServers.
Zookeeper: By registering the status information of each cluster node in ZooKeeper, the HMaster can sense the health of every HRegionServer at any time, and the single point of failure of the HMaster is also avoided.
HMaster: Manages all HRegionServers, telling them which HRegions to maintain and monitoring their health. When a new HRegionServer registers with the HMaster, the HMaster tells it to wait for data to be allocated. When an HRegionServer dies, the HMaster marks all the HRegions it was responsible for as unallocated and then assigns them to other HRegionServers. The HMaster has no single point of failure: HBase can start multiple HMasters, and through ZooKeeper's election mechanism there is always exactly one HMaster running in the cluster, which improves the cluster's availability.
HRegion: When a table's size exceeds a preset value, HBase automatically divides it into different regions, each containing a subset of the table's rows. To the user, each table is one collection of data, distinguished by primary key (RowKey). Physically, a table is split into multiple blocks, each of which is an HRegion, identified by table name plus start/end primary key. One HRegion holds a contiguous piece of a table's data, and a complete table is stored across multiple HRegions.
HRegionServer: All data in HBase is generally stored in HDFS at the bottom layer, and users obtain it through a set of HRegionServers. Generally only one HRegionServer runs on each cluster node, and each HRegion is maintained by exactly one HRegionServer. The HRegionServer mainly responds to user I/O requests by reading and writing data to the HDFS file system; it is the core module of HBase. Internally, an HRegionServer manages a series of HRegion objects, each corresponding to a contiguous data segment of a logical table. An HRegion is composed of multiple HStores, each corresponding to the storage of one column family of the logical table. Each column family is therefore a centralized storage unit, so to improve operational efficiency it is best to place columns with common I/O characteristics in the same column family.
HStore: The core of HBase storage, consisting of a MemStore and StoreFiles. The MemStore is a memory buffer: data written by the user goes into the MemStore first, and when the MemStore is full it is flushed to a StoreFile (implemented underneath as an HFile). When the number of StoreFiles grows past a threshold, a Compact merge operation is triggered, merging multiple StoreFiles into one and performing version merging and data deletion during the merge. HBase thus only ever appends data; all updates and deletes happen in later Compact operations, so a user's write can return as soon as it reaches memory, ensuring high HBase I/O performance. As StoreFiles are compacted, they gradually form larger and larger StoreFiles; when the size of a single StoreFile exceeds a threshold, a Split operation is triggered: the current HRegion is split into two, the parent HRegion goes offline, and the two child HRegions are assigned by the HMaster to the appropriate HRegionServers, so that the load of the original HRegion is spread over two HRegions.
HLog: Each HRegionServer has an HLog object, a class implementing the write-ahead log. Every time a user writes data to the MemStore, a copy of the data is also written to the HLog file. The HLog file is rolled periodically, and old files are deleted once their data has been persisted to StoreFiles. When the HMaster detects via ZooKeeper that an HRegionServer has terminated unexpectedly, it first processes the leftover HLog file, splitting its entries by HRegion, placing them in the corresponding HRegion directories, and then redistributing the invalid HRegions. While loading one of these HRegions, its new HRegionServer finds that there is a historical HLog to process, replays the HLog data into the MemStore, and then flushes to StoreFiles to complete data recovery.
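The write-ahead discipline and crash recovery described for HLog can be sketched in miniature: every edit is appended to a durable log before touching memory, so a lost memstore can be rebuilt by replaying the log. A conceptual model, not HBase code:

```python
class WALRegionServer:
    """Sketch of the write-ahead-log discipline: every edit is appended to a
    durable log before the in-memory memstore, so a crash loses nothing;
    recovery replays the log into a fresh memstore."""
    def __init__(self, wal=None):
        self.wal = wal if wal is not None else []   # durable, append-only
        self.memstore = {}

    def put(self, key, value):
        self.wal.append((key, value))   # 1) write-ahead log first
        self.memstore[key] = value      # 2) then memory

    @classmethod
    def recover(cls, wal):
        """Rebuild the memstore by replaying surviving log edits in order."""
        server = cls(wal=list(wal))
        for key, value in wal:
            server.memstore[key] = value
        return server

crashed = WALRegionServer()
crashed.put("r1", "v1")
crashed.put("r1", "v2")
# Pretend the process died: only the WAL survives, and replay restores state.
recovered = WALRegionServer.recover(crashed.wal)
```

Replaying in log order matters: the later edit to "r1" must overwrite the earlier one, exactly as it did before the crash.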
https://towardsdatascience.com/hbase-working-principle-a-part-of-hadoop-architecture-fbe0453a031b
HBase Optimization Best Practices
1.Pre-split regions
By default, a single region is created automatically when an HBase table is created, and during data import every HBase client writes to this one region until it grows big enough to split. One way to speed up bulk writes is to pre-create a number of empty regions, so that as data is written to HBase it is load-balanced across the cluster according to the region boundaries.
2.Rowkey optimization
Rowkeys in HBase are stored in lexicographic order. When designing rowkeys, take full advantage of this sorting: store data that is often read together close together, and put data likely to be accessed soon in the same place.
Moreover, if rowkeys are generated in increasing order, do not write them as-is; instead, reverse the rowkey so that keys are roughly evenly distributed. This balances load across RegionServers and avoids piling all new data onto a single RegionServer. It can be designed together with the table pre-splitting above.
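The reversal trick can be illustrated with invented sequential ids: once reversed, consecutive writes start with different characters and therefore fall into different key ranges (regions), instead of all landing at the tail of the keyspace.

```python
def reversed_key(seq_id, width=8):
    """Reverse the digits of a zero-padded sequential id so consecutive writes
    land on different key prefixes (hence different regions) instead of piling
    onto the newest one. Width and ids are illustrative choices."""
    return f"{seq_id:0{width}d}"[::-1]

ids = (1001, 1002, 1003)
plain = [f"{i:08d}" for i in ids]           # all share one hot prefix
spread = [reversed_key(i) for i in ids]     # prefixes now differ
```

The trade-off (not shown) is that reversal destroys range-scan locality over the original ordering, so it suits write-heavy, point-read workloads.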
3.Reduce the number of column families
Do not define too many ColumnFamilies in one table. Currently HBase does not handle tables with more than two or three ColumnFamilies well: when one ColumnFamily is flushed, its neighboring ColumnFamilies are also triggered to flush by association, which ultimately causes the system to generate more I/O.
4.Caching strategy
When creating a table, you can pin it into the RegionServer's cache with HColumnDescriptor.setInMemory(true) so that reads hit the cache.
5.Set a storage lifetime
When creating a table, you can set the time-to-live of its data with HColumnDescriptor.setTimeToLive(int timeToLive); expired data is deleted automatically.
6.Disk configuration
Each RegionServer manages 10~1000 regions of 1~2 GB each, so each server needs at least 10 GB and at most 1000 * 2 GB = 2 TB; with 3-way replication that becomes 6 TB. Option one is three 2 TB disks; option two is twelve 500 GB disks. With sufficient bandwidth, the latter offers higher throughput, finer-grained redundancy, and faster recovery from a single-disk failure.
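A quick sanity check of the sizing arithmetic above (assuming 1 TB = 1024 GB; the figures in the text are rounded):

```python
# Worst case from the text: 1000 regions of 2 GB each, with 3-way HDFS replication.
regions_max, region_gb, replication = 1000, 2, 3
raw_tb = regions_max * region_gb / 1024    # worst-case user data per server
total_tb = raw_tb * replication            # including replication, ~6 TB
```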
7.Allocate appropriate memory to the RegionServer
The more the better, as long as other services are not affected. For example, add the following at the end of hbase-env.sh in HBase's conf directory: export HBASE_REGIONSERVER_OPTS="-Xmx16000m $HBASE_REGIONSERVER_OPTS"
where 16000m is the amount of memory allocated to the RegionServer.
8.Number of replicas for written data
The replica count is proportional to read performance, inversely proportional to write performance, and affects availability. There are two ways to configure it. One is to copy hdfs-site.xml into HBase's conf directory and set the dfs.replication property there to the desired replica count; this change applies to all HBase user tables. The other is to modify the HBase code so that HBase supports per-column-family replica counts, set at table creation time (default 3); such a replica count applies only to the configured column family.
9.WAL (write-ahead log)
There is a switch controlling whether HBase writes the log before writing data. It is on by default; turning it off improves performance, but data may be lost if the system fails (i.e. the RegionServer doing the insert goes down). To configure the WAL when writing through the Java API, call Put.setWriteToWAL(boolean) on the Put instance.
10. Batch writes
HBase's Put supports both single-row and batch inserts; batch writes are generally faster and save round-trip network overhead. When using the Java API from a client, collect the Puts into a list and then call HTable's put(List<Put>) to write them in one batch.
11. How much the client fetches from the server at a time
Configuring a larger per-fetch size reduces the time the client spends retrieving data, but it uses more client memory. It can be configured in three places:
1) in HBase's configuration file, via hbase.client.scanner.caching;
2) by calling HTable.setScannerCaching(int scannerCaching);
3) by calling Scan.setCaching(int caching). The three take increasing precedence in that order.
12. Number of request-handling I/O threads on the RegionServer
Fewer I/O threads suit scenarios where a single request consumes a lot of memory ("big put" scenarios: large single Puts, or Scans with a large cache setting), or where the RegionServer is tight on memory.
More I/O threads suit scenarios with low per-request memory consumption and very high required TPS (transactions per second). When setting this value, use memory monitoring as the main reference.
The configuration property is hbase.regionserver.handler.count in hbase-site.xml.
13. Region size
The property is hbase.hregion.max.filesize in hbase-site.xml, with a default of 256 MB.
This is the maximum storage size of a single region on the current RegionServer; when a region exceeds it, the region is automatically split into smaller regions. Small regions are friendly to splits and compactions, because splitting a region or compacting a small region's StoreFiles is fast and uses little memory. The drawback is that splits and compactions become very frequent; in particular, a large number of small regions constantly splitting and compacting makes cluster response times fluctuate wildly. Too many regions is not only a management headache, it can even trigger HBase bugs. Roughly speaking, anything under 512 MB counts as a small region. Large regions, on the other hand, are poorly suited to frequent splits and compactions, because a single compact or split causes a long pause that hits the application's read/write performance hard.
Moreover, a large region means large StoreFiles, and compaction then also strains memory. If your workload has periods of low traffic, doing compacts and splits then gets them done smoothly while keeping read/write performance steady the rest of the time. Compaction cannot be avoided, but splitting can be switched from automatic to manual: by raising this parameter to a value that is effectively unreachable, such as 100 GB, you indirectly disable automatic splits (the RegionServer will not split regions that have not reached 100 GB). Combined with the RegionSplitter tool, you can then split manually when needed. Manual splitting is far more flexible and stable than automatic splitting, with little added management cost, and is recommended for online real-time systems. Memory-wise, small regions are flexible in their memstore size setting, while for large regions neither too large nor too small works: too large increases the application's I/O wait during flushes, too small hurts read performance through too many StoreFiles.
14. Operating system parameters
On Linux, the maximum number of open files usually defaults to 1024. If you do not change it, you will hit "Too Many Open Files" errors under high concurrency, which makes HBase unable to run. You can change the limit with the ulimit -n command, or by editing /etc/security/limits.conf and /proc/sys/fs/file-max; for the details, search for "linux limits.conf".
15. JVM configuration
Adjust the configuration parameters in hbase-env.sh according to your machine's hardware and your JVM (32/64-bit).
HBASE_HEAPSIZE 4000: the size of the JVM heap used by HBase.
HBASE_OPTS "-server -XX:+UseConcMarkSweepGC": JVM GC options.
HBASE_MANAGES_ZK false: whether HBase manages its own ZooKeeper for distributed coordination.
16. Persistence
All data in HBase may be gone after an operating system restart: without changing anything, create a table, write one row, then reboot the machine; after rebooting, open the HBase shell and run the list command, and not a single table remains. Tragic, isn't it? No matter: you can set hbase.rootdir in hbase/conf/hbase-site.xml to choose where files are stored. Point it at a directory such as file:///you/hbase-data/path and the tables and data you create in HBase are written straight to your disk; you can also point it at your distributed file system (HDFS) path, e.g. hdfs://NAMENODE_SERVER:PORT/HBASE_ROOTDIR, and they are written to the distributed file system.
17. Buffer size
hbase.client.write.buffer
This parameter sets the size of the write buffer used when the client and server exchange data; the server allocates a write buffer of this size to improve performance. If it is set large, it places demands on memory and directly affects system performance.
18. Catalog table scan interval
hbase.master.meta.thread.rescanfrequency
Defines how often the HMaster scans the system tables root and meta. This parameter can be set longer to reduce system load.
19. Split/compaction check interval
hbase.regionserver.thread.splitcompactcheckfrequency
This parameter is the interval at which the RegionServer runs its split/compaction check. Before a split, a compact is performed first, which may be either a minor or a major compaction. After compacting, the midkey is taken from the largest StoreFile among all Stores; this midkey may not lie at the middle of the full data, and the data under one row key may span different HRegions.
20. Percentage of the JVM heap allocated to the block cache
hfile.block.cache.size
Specifies the percentage of the JVM heap allocated to the HFile/StoreFile block cache. The default is 0.2, i.e. 20%; setting it to 0 disables the cache.
21. Number of concurrent ZooKeeper client connections
hbase.zookeeper.property.maxClientCnxns
This option comes from ZooKeeper and is the number of concurrent connections a ZooKeeper client may open. ZooKeeper is the entry point for HBase, so this value can reasonably be raised.
22. Heap share used by memstores
hbase.regionserver.global.memstore.upperLimit
Configures the share of the heap occupied by all memstores on a RegionServer. The default is 0.4, i.e. 40%; setting it to 0 disables the option.
23. Memstore flush size
hbase.hregion.memstore.flush.size
When the content cached in a memstore exceeds the configured size, it is written to disk. For example, a delete is first written into the MemStore as a marker indicating which value, column, or family is to be deleted; HBase periodically runs a major compaction over the store files, at which point it flushes the MemStore into a new HFile. If no major compaction happens within a certain time window, the memstore content beyond the limit is still written to disk.
Summary
HBase is a NoSQL database commonly referred to as the Hadoop database. It is open source and based on Google's Bigtable paper. HBase runs on top of the Hadoop Distributed File System (HDFS), which allows it to be highly scalable, and it supports Hadoop's MapReduce programming model. HBase permits two types of access: random access to rows through their row keys, and offline or batch access through MapReduce queries.
References
https://blog.csdn.net/weixin_40535323/article/details/81704854
http://www.itdecent.cn/p/3832ae37fac4
https://hbase.apache.org/book.html#arch.overview
https://blog.csdn.net/whdxjbw/article/details/81101200
https://www.cloudduggu.com/hbase/data_model/
https://www.informit.com/articles/article.aspx?p=2253412
https://towardsdatascience.com/hbase-working-principle-a-part-of-hadoop-architecture-fbe0453a031b
https://developpaper.com/hbase-learning-1-basic-introduction/