
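I. Introduction to HBase
1. Basic concepts
HBase is a database built on Hadoop. It is often described as a sparse, distributed, persistent, multidimensional sorted map that is indexed by row key, column key, and timestamp, and it serves as a platform for storing and retrieving data with random access. HBase does not restrict the kind of data you store and allows a dynamic, flexible data model. It does not use SQL and does not emphasize relationships between data. HBase is designed to run on a cluster of servers and scales horizontally by adding servers.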
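2. HBase use cases and success stories
- Internet search: crawlers collect web pages and store them in BigTable, MapReduce jobs scan the whole table to build the search index, and search results are then queried from BigTable and shown to users.
- Capturing incremental data: for example monitoring metrics, user interaction data, telemetry, and targeted advertising.
- Content serving
- Information exchange
That is a brief introduction to HBase. I will write up its internals and architecture in a later post once I have organized my notes. For now I only know how to use it, so this post starts from usage.
II. Using HBase
HBase is a database, a database stores data, and that means CRUD (create, read, update, delete).
As with MySQL, you can work with it through a shell client or through language SDKs.
2.1 HBase shell
hbase shell is similar to the mysql command-line client.
help shows the help for all commands.
The commands are grouped as follows: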
COMMAND GROUPS:
Group name: general
Commands: processlist, status, table_help, version, whoami
Group name: ddl
Commands: alter, alter_async, alter_status, create, create_layered, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, list_regions, locate_region, show_filters
Group name: namespace
Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables
Group name: dml
Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve
Group name: tools
Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, cleaner_chore_enabled, cleaner_chore_run, cleaner_chore_switch, clear_block_cache, clear_compaction_queues, clear_deadservers, close_region, compact, compact_rs, compaction_state, flush, is_in_maintenance_mode, list_deadservers, major_compact, merge_region, move, normalize, normalizer_enabled, normalizer_switch, split, splitormerge_enabled, splitormerge_switch, trace, unassign, wal_roll, zk_dump
Group name: replication
Commands: add_peer, append_peer_namespaces, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, get_peer_config, list_peer_configs, list_peers, list_replicated_tables, remove_peer, remove_peer_namespaces, remove_peer_tableCFs, set_peer_bandwidth, set_peer_exclude_namespaces, set_peer_exclude_tableCFs, set_peer_namespaces, set_peer_replicate_all, set_peer_tableCFs, show_peer_tableCFs, update_peer_config
Group name: snapshots
Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, delete_table_snapshots, list_snapshots, list_table_snapshots, restore_snapshot, snapshot
Group name: configuration
Commands: update_all_config, update_config
Group name: quotas
Commands: list_quota_snapshots, list_quota_table_sizes, list_quotas, list_snapshot_sizes, set_quota
Group name: security
Commands: grant, list_security_capabilities, revoke, user_permission
Group name: procedures
Commands: abort_procedure, list_locks, list_procedures
Group name: visibility labels
Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility
Group name: rsgroup
Commands: add_rsgroup, balance_rsgroup, get_rsgroup, get_server_rsgroup, get_table_rsgroup, list_rsgroups, move_namespaces_rsgroup, move_servers_namespaces_rsgroup, move_servers_rsgroup, move_servers_tables_rsgroup, move_tables_rsgroup, remove_rsgroup, remove_servers_rsgroup
1. General commands:
- Cluster status: status
hbase(main):005:0> status
1 active master, 0 backup masters, 1 servers, 0 dead, 793.0000 average load
Took 0.9453 seconds
- Version: version
hbase(main):006:0> version
2.0.2, rc6f16dff66b5d7c4fb66d3bf7eda4f56515c63f3, Fri Jan 25 19:23:41 CST 2019
Took 0.0004 seconds
2. DDL commands
| Command | Meaning | Example |
|---|---|---|
| alter | Modify the attributes of a table's column families | alter 't1', NAME => 'f1', VERSIONS => 5 |
| alter_async | Modify column family attributes asynchronously, without waiting for all regions to finish. Usage is the same as alter | alter_async 't1', NAME => 'f1', VERSIONS => 5 |
| alter_status | Get the status of an alter command, showing how many regions have received the schema change. The argument is the table name | alter_status 't1' |
| create | Create a table | create 't1', {NAME => 'f1', VERSIONS => 5}; create 't1', 'f1', 'f2', 'f3' |
| describe | Get the table's metadata and whether it is enabled | describe 't1' |
| disable | Disable a table | disable 't1' |
| disable_all | Disable all tables matching a regex | disable_all 't1.*' |
| drop | Drop a table | drop 't1' |
| enable | Enable a table | enable 't1' |
| enable_all | Enable all tables matching a regex | enable_all 't1.*' |
| exists | Check whether a table exists | exists 't1' |
| is_disabled | Check whether a table is disabled | is_disabled 't1' |
| is_enabled | Check whether a table is enabled | is_enabled 't1' |
| show_filters | List the names of all supported filters | show_filters |
| list | List the names of all tables | list |
3. DML commands
- count
Count the total number of rows in a table
count 't1'
count 't1', INTERVAL => 1000
count 't1', CACHE => 1000
count 't1', INTERVAL => 10, CACHE => 1000
- delete
Delete a single cell
delete 't1', 'r1', 'c1', ts1
- deleteall
Delete a whole row or a column
deleteall 't1','r1'
deleteall 't1','r1','c1'
deleteall 't1', 'r1','c1', ts1
- get
Read a single row
hbase> get 'ns1:t1', 'r1'
hbase> get 't1', 'r1'
hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}
hbase> get 't1', 'r1', {COLUMN => 'c1'}
hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
hbase> get 't1', 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
hbase> get 't1', 'r1', 'c1'
hbase> get 't1', 'r1', 'c1', 'c2'
hbase> get 't1', 'r1', ['c1', 'c2']
hbase> get 't1', 'r1', {COLUMN => 'c1', ATTRIBUTES => {'mykey'=>'myvalue'}}
hbase> get 't1', 'r1', {COLUMN => 'c1', AUTHORIZATIONS => ['PRIVATE','SECRET']}
hbase> get 't1', 'r1', {CONSISTENCY => 'TIMELINE'}
hbase> get 't1', 'r1', {CONSISTENCY => 'TIMELINE', REGION_REPLICA_ID => 1}
- get_counter
Read a counter
hbase> get_counter 'ns1:t1', 'r1', 'c1'
hbase> get_counter 't1', 'r1', 'c1'
- incr
Increment a counter
hbase> incr 'ns1:t1', 'r1', 'c1'
hbase> incr 't1', 'r1', 'c1'
hbase> incr 't1', 'r1', 'c1', 1
hbase> incr 't1', 'r1', 'c1', 10
hbase> incr 't1', 'r1', 'c1', 10, {ATTRIBUTES=>{'mykey'=>'myvalue'}}
hbase> incr 't1', 'r1', 'c1', {ATTRIBUTES=>{'mykey'=>'myvalue'}}
hbase> incr 't1', 'r1', 'c1', 10, {VISIBILITY=>'PRIVATE|SECRET'}
- put
Write data
hbase> put 'ns1:t1', 'r1', 'c1', 'value'
hbase> put 't1', 'r1', 'c1', 'value'
hbase> put 't1', 'r1', 'c1', 'value', ts1
hbase> put 't1', 'r1', 'c1', 'value', {ATTRIBUTES=>{'mykey'=>'myvalue'}}
hbase> put 't1', 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}
hbase> put 't1', 'r1', 'c1', 'value', ts1, {VISIBILITY=>'PRIVATE|SECRET'}
- scan
Scan a table
hbase> scan 'hbase:meta'
// Return only the specified columns
hbase> scan 'hbase:meta', {COLUMNS => 'info:regioninfo'}
// limit start
hbase> scan 'ns1:t1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
// Time range
hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804000, 1303668904000]}
hbase> scan 't1', {REVERSED => true}
hbase> scan 't1', {ALL_METRICS => true}
hbase> scan 't1', {METRICS => ['RPC_RETRIES', 'ROWS_FILTERED']}
// Use filters; show_filters lists every filter that is available
hbase> scan 't1', {ROWPREFIXFILTER => 'row2', FILTER => "
(QualifierFilter (>=, 'binary:xyz')) AND (TimestampsFilter ( 123, 456))"}
hbase> scan 't1', {FILTER =>
org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
hbase> scan 't1', {CONSISTENCY => 'TIMELINE'}
For setting the Operation Attributes
hbase> scan 't1', { COLUMNS => ['c1', 'c2'], ATTRIBUTES => {'mykey' => 'myvalue'}}
hbase> scan 't1', { COLUMNS => ['c1', 'c2'], AUTHORIZATIONS => ['PRIVATE','SECRET']}
- truncate
Empty a table
truncate 't1'
There are more commands that I will not cover here; use help to explore them.
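The STARTROW and LIMIT arguments in the scan examples above rely on row keys being kept in sorted order. The following is a rough sketch of that behaviour over a plain sorted slice. `scanRows` is a hypothetical helper written for this post, not part of HBase or any client library, and it assumes the start row is inclusive.

```go
package main

import (
	"fmt"
	"sort"
)

// scanRows mimics STARTROW and LIMIT over row keys held in sorted order.
// limit <= 0 means "no limit".
func scanRows(sortedKeys []string, startRow string, limit int) []string {
	// First index whose key is >= startRow (the start row is inclusive).
	i := sort.SearchStrings(sortedKeys, startRow)
	out := []string{}
	for ; i < len(sortedKeys); i++ {
		if limit > 0 && len(out) == limit {
			break
		}
		out = append(out, sortedKeys[i])
	}
	return out
}

func main() {
	keys := []string{"abc", "row1", "row2", "xyz", "xyz1", "zzz"}
	fmt.Println(scanRows(keys, "xyz", 2))
}
```

Because the keys are sorted, finding the start position is a binary search and the scan itself is a sequential read, which is why row key design matters so much for scan performance.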
2.2 Using HBase from Go
This section shows how to work with HBase from Go using gohbase.
Install
go get github.com/tsuna/gohbase
Create a client
client := gohbase.NewClient("localhost")
Insert a cell
// Values maps a ColumnFamily -> Qualifiers -> Values.
values := map[string]map[string][]byte{"cf": map[string][]byte{"a": []byte{0}}}
putRequest, err := hrpc.NewPutStr(context.Background(), "table", "key", values)
rsp, err := client.Put(putRequest)
Get an entire row
getRequest, err := hrpc.NewGetStr(context.Background(), "table", "row")
getRsp, err := client.Get(getRequest)
Get a specific cell
// Perform a get for the cell with key "15", column family "cf" and qualifier "a"
family := map[string][]string{"cf": []string{"a"}}
getRequest, err := hrpc.NewGetStr(context.Background(), "table", "15",
hrpc.Families(family))
getRsp, err := client.Get(getRequest)
Get a specific cell with a filter
pFilter := filter.NewKeyOnlyFilter(true)
family := map[string][]string{"cf": []string{"a"}}
getRequest, err := hrpc.NewGetStr(context.Background(), "table", "15",
hrpc.Families(family), hrpc.Filters(pFilter))
getRsp, err := client.Get(getRequest)
Scan with a filter
pFilter := filter.NewPrefixFilter([]byte("7"))
scanRequest, err := hrpc.NewScanStr(context.Background(), "table",
hrpc.Filters(pFilter))
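// Scan returns an hrpc.Scanner (see the Client interface); call scanner.Next() until it returns io.EOF.
scanner := client.Scan(scanRequest)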
Let's look at how the repository is laid out:
├── AUTHORS
├── COPYING
├── Makefile
├── README.md
├── admin_client.go
├── caches.go
├── check_line_len.awk
├── client.go
├── discovery_test.go
├── filter
├── hrpc
├── install_ci.sh
├── integration_test.go
├── metacache_test.go
├── pb
├── region
├── rpc.go
├── rpc_test.go
├── scanner.go
├── scanner_test.go
├── table_test.go
├── test
└── zk
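The code is organized very clearly:
hrpc mainly holds the RPC request types and their constructors.
filter holds the filters used by get and scan.
region holds the region-related interfaces.
caches.go is the cache; HBase uses caching in many places to improve performance.
zk holds the ZooKeeper-related code.
Now let's read some of the source.
The main entry points of gohbase are client and admin_client,
so we will read the code starting from those two.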
// AdminClient to perform admistrative operations with HMaster
type AdminClient interface {
CreateTable(t *hrpc.CreateTable) error
DeleteTable(t *hrpc.DeleteTable) error
EnableTable(t *hrpc.EnableTable) error
DisableTable(t *hrpc.DisableTable) error
ClusterStatus() (*pb.ClusterStatus, error)
}
// CreateTable represents a CreateTable HBase call
type CreateTable struct {
base
families map[string]map[string]string
splitKeys [][]byte
}
// NewCreateTable creates a new CreateTable request that will create the given
// table in HBase. 'families' is a map of column family name to its attributes.
// For use by the admin client.
func NewCreateTable(ctx context.Context, table []byte,
families map[string]map[string]string,
options ...func(*CreateTable)) *CreateTable {
ct := &CreateTable{
base: base{
table: table,
ctx: ctx,
resultch: make(chan RPCResult, 1),
},
families: make(map[string]map[string]string, len(families)),
}
for _, option := range options {
option(ct)
}
for family, attrs := range families {
ct.families[family] = make(map[string]string, len(defaultAttributes))
for k, dv := range defaultAttributes {
if v, ok := attrs[k]; ok {
ct.families[family][k] = v
} else {
ct.families[family][k] = dv
}
}
}
return ct
}
These are mainly DDL operations.
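The attribute loop in NewCreateTable is worth a second look: it iterates over the default keys, not over the caller's map, so any key the caller supplies that is not in the defaults is silently dropped. Here is a standalone sketch of that merge. The contents of `defaultAttrs` are made up for illustration, since the real defaultAttributes map is not shown in this post.

```go
package main

import "fmt"

// defaultAttrs is an illustrative set of defaults (assumed values).
var defaultAttrs = map[string]string{
	"VERSIONS":    "1",
	"COMPRESSION": "NONE",
}

// mergeAttrs follows the loop in NewCreateTable: for every default key,
// use the caller's value when present, otherwise fall back to the default.
func mergeAttrs(attrs map[string]string) map[string]string {
	out := make(map[string]string, len(defaultAttrs))
	for k, dv := range defaultAttrs {
		if v, ok := attrs[k]; ok {
			out[k] = v
		} else {
			out[k] = dv
		}
	}
	return out
}

func main() {
	merged := mergeAttrs(map[string]string{"VERSIONS": "5", "UNKNOWN": "x"})
	fmt.Println(merged["VERSIONS"], merged["COMPRESSION"], len(merged))
}
```

In this sketch the caller's VERSIONS wins, COMPRESSION falls back to the default, and UNKNOWN never reaches the result.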
Next, the client:
// Client a regular HBase client
type Client interface {
Scan(s *hrpc.Scan) hrpc.Scanner
Get(g *hrpc.Get) (*hrpc.Result, error)
Put(p *hrpc.Mutate) (*hrpc.Result, error)
Delete(d *hrpc.Mutate) (*hrpc.Result, error)
Append(a *hrpc.Mutate) (*hrpc.Result, error)
Increment(i *hrpc.Mutate) (int64, error)
CheckAndPut(p *hrpc.Mutate, family string, qualifier string,
expectedValue []byte) (bool, error)
Close()
}
These are mainly DML operations.
Let's take put as an example:
// NewPut creates a new Mutation request to insert the given
// family-column-values in the given row key of the given table.
func NewPut(ctx context.Context, table, key []byte,
values map[string]map[string][]byte, options ...func(Call) error) (*Mutate, error) {
m, err := baseMutate(ctx, table, key, values, options...)
if err != nil {
return nil, err
}
m.mutationType = pb.MutationProto_PUT
return m, nil
}
// NewPutStr is just like NewPut but takes table and key as strings.
func NewPutStr(ctx context.Context, table, key string,
values map[string]map[string][]byte, options ...func(Call) error) (*Mutate, error) {
return NewPut(ctx, []byte(table), []byte(key), values, options...)
}
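The values argument of NewPut is a nested map: column family, then qualifier, then value. Writing that literal by hand gets noisy, so here is a small sketch of a builder for it. `Values` and `Set` are hypothetical names introduced for this example and are not part of gohbase.

```go
package main

import "fmt"

// Values has the shape NewPut expects: column family -> qualifier -> value.
type Values map[string]map[string][]byte

// Set adds one cell, creating the inner map for the family on first use.
func (v Values) Set(family, qualifier string, value []byte) Values {
	if v[family] == nil {
		v[family] = make(map[string][]byte)
	}
	v[family][qualifier] = value
	return v
}

func main() {
	vals := Values{}.
		Set("cf", "a", []byte("1")).
		Set("cf", "b", []byte("2")).
		Set("meta", "owner", []byte("alice"))
	fmt.Println(len(vals), len(vals["cf"]), string(vals["meta"]["owner"]))
}
```

Since the underlying type is identical, a `Values` can be passed wherever a `map[string]map[string][]byte` is expected.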
Here baseMutate is:
// baseMutate returns a Mutate struct without the mutationType filled in.
func baseMutate(ctx context.Context, table, key []byte, values map[string]map[string][]byte,
options ...func(Call) error) (*Mutate, error) {
m := &Mutate{
base: base{
table: table,
key: key,
ctx: ctx,
resultch: make(chan RPCResult, 1),
},
values: values,
timestamp: MaxTimestamp,
}
err := applyOptions(m, options...)
if err != nil {
return nil, err
}
return m, nil
}
// Note: every option is applied to the call in order, and the first error aborts the request
func applyOptions(call Call, options ...func(Call) error) error {
call.(withOptions).setOptions(options)
for _, option := range options {
err := option(call)
if err != nil {
return err
}
}
return nil
}
Options are used like this:
client := gohbase.NewClient("localhost")
pFilter := filter.NewKeyOnlyFilter(true)
family := map[string][]string{"cf": []string{"a"}}
// Options are passed as trailing arguments to the request constructor.
getRequest, _ := hrpc.NewGetStr(context.Background(), "table", "15",
hrpc.Families(family), hrpc.Filters(pFilter), hrpc.MaxVersions(2))
_, _ = client.Get(getRequest)
values := map[string]map[string][]byte{"cf": map[string][]byte{"a": []byte{0}}}
// Check err before using putRequest: if any option returns an error, NewPutStr returns nil and that error.
putRequest, err := hrpc.NewPutStr(context.Background(), "table", "key", values, hrpc.Timestamp(time.Time{}), hrpc.MaxVersions(1))
rsp, err := client.Put(putRequest)
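The `options ...func(Call) error` parameter is the functional options pattern. The sketch below reproduces it with toy types, to show why an option can reject a request it does not apply to. `getReq`, `putReq`, and this `MaxVersions` are stand-ins written for illustration and are not the gohbase implementations.

```go
package main

import (
	"errors"
	"fmt"
)

// Call is a stand-in for hrpc.Call; getReq and putReq are toy request types.
type Call interface{ Name() string }

type getReq struct{ maxVersions uint32 }
type putReq struct{}

func (*getReq) Name() string { return "Get" }
func (*putReq) Name() string { return "Put" }

// MaxVersions only makes sense for reads in this sketch, so it returns
// an error when applied to any other request type.
func MaxVersions(n uint32) func(Call) error {
	return func(c Call) error {
		g, ok := c.(*getReq)
		if !ok {
			return errors.New("MaxVersions cannot be used with " + c.Name())
		}
		g.maxVersions = n
		return nil
	}
}

// applyOptions runs each option in order and stops at the first error.
func applyOptions(c Call, options ...func(Call) error) error {
	for _, opt := range options {
		if err := opt(c); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	g := &getReq{}
	fmt.Println(applyOptions(g, MaxVersions(2)), g.maxVersions)
	fmt.Println(applyOptions(&putReq{}, MaxVersions(1)))
}
```

The appeal of this design is that constructors keep a short fixed signature while new optional settings can be added later without breaking existing callers.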
