Phoenix常用操作記錄

Apache Phoenix 常用操作

基礎(chǔ)知識

1****、****Phoenix 主要技術(shù)點

a、將SQL轉(zhuǎn)化為HBase Scan,將結(jié)果封裝為JDBC Result Set。

b、表的元數(shù)據(jù)保存在HBase表(系統(tǒng)表)中。

c、使用了coprocessor 和 custom filter 保證高效,使得小規(guī)模查詢的延時在毫秒級,百萬行的查詢延時在秒級。

· coprocessors to perform operations on the server-side thus minimizing client/server data transfer

· custom filters to prune data as close to the source as possible In addition, to minimize any startup costs, Phoenix uses native HBase APIs rather than going through the map/reduce framework

2****、****JDBC****連接的****URL

jdbc:phoenix [ :<zookeeper quorum> [ :<port number> ] [ :<root node> ] ]

如:

Connection conn = DriverManager.getConnection("jdbc:phoenix:server1,server2:2181");

屬性對應(yīng)于:hbase.zookeeper.quorum, hbase.zookeeper.property.clientPort, zookeeper.znode.parent

3、Phoenix 不支持特性

a、Full Transaction, 現(xiàn)在只支持 TRANSACTION_READ_COMMITTED,不支持其他類型。

b、關(guān)系操作. Union, Intersect, Minus

c、雜項內(nèi)置函數(shù)

4****、****映射到已存在****HBase****表

使用 CREATE TABLE and CREATE VIEW

區(qū)別:

a、CREATE TABLE 啟用 KEEP_DELETED_CELLS = true, CREATE VIEW 不會

b、CREATE TABLE 能添加HBase 中不存在的列族,CREATE VIEW 不會

加鹽處理

因為HBase 數(shù)據(jù)儲存按照 row key 排序,如果HBase表的 row key 是單調(diào)遞增的,則HBase 容易有RegionServer 的局部熱點問題。加鹽可以緩解這個問題。

create table H3 (id varchar not null primary key, cf1.a varchar, cf2.b varchar) SALT_BUCKETS=20; 只能在創(chuàng)建表格時候加,創(chuàng)建后不可更改。

alter table h1 set salt_buckets=10;
Error: ERROR 1024 (42Y83): Salt bucket number may only be specified when creating a table. tableName=H1

加鹽后的注意事項:

a、sequential scan 返回的結(jié)果可能不是自然排序的,如果sequential scan使用了LIMIT語句,將與不加鹽的情況不一樣。

b、 Spit point:If no split points are specified for the table, the salted table would be pre-split on salt bytes boundaries to ensure load distribution among region servers even during the initial phase of the table. If users are to provide split points manually, users need to include a salt byte in the split points they provide.

c、Row Key 排序:Pre-spliting also ensures that all entries in the region server all starts with the same salt byte, and therefore are stored in a sorted manner. When doing a parallel scan across all region servers, we can take advantage of this properties to perform a merge sort of the client side. The resulting scan would still be return sequentially as if it is from a normal table

實際上是改寫了Row Key,添加了一個prefix

new_row_key = (++index % BUCKETS_NUMBER) + original_key

數(shù)據(jù)存儲到 Buckects_Number 個Bucket中 ,每個Bucket的Prefix 相同,在query的時候,同時在各個Bucket進(jìn)行。

提升效率常用方法

1****、加鹽:

加鹽可以將數(shù)據(jù)存入多個region里,從而提升讀寫性能。

CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) SALT_BUCKETS=42

如果有16臺region server,每臺server有4核CPU,則SALT_BUCKETS 設(shè)置為32~64之間。

即如果集群總的CPU核數(shù)為N,則SALT_BUCKETS為 0.5N ~ N 之間。

2****、****split

如果不想通過加鹽來分區(qū),可以自己手動設(shè)置分區(qū)的方法。這樣可以不引入額外的byte,或者改變row key的順序,例子

CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) SPLIT ON ('CS','EU','NA')

3****、使用多個列族

CREATE TABLE TEST (MYKEY VARCHAR NOT NULL PRIMARY KEY, A.COL1 VARCHAR, A.COL2 VARCHAR, B.COL3 VARCHAR)

4****、使用壓縮

CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) COMPRESSION='GZ'

5****、使用二級索引

參考其余博文

6****、優(yōu)化集群

參考其余博文

7****、優(yōu)化****phoenix 參數(shù)

使用任意時間戳

在Property里面設(shè)置屬性 "CurrentSCN"。

ts是一個long。

Properties props = new Properties();

props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));

Connection conn = DriverManager.connect(myUrl, props);

conn.createStatement().execute("UPSERT INTO myTable VALUES ('a')");

conn.commit();

相當(dāng)于:

myTable.put(Bytes.toBytes('a'),ts);

索引基礎(chǔ)

1、immutable Index

原文:Immutable indexing targets use cases that are write once, append only; this is common in time-series data, where you log once, but read multiple times. In this case, the indexing is managed entirely on the client - either we successfully write all the primary and index data or we return a failure to the client. Since once written, rows are never updated, no incremental index maintenance is required making them perform very well. This reduces the overhead of secondary indexing at write time. However, keep in mind that immutable indexing are only applicable in a limited set of use cases.

One restriction of immutable indexes is that rows from the data table may not be deleted. Instead, the only way to delete rows is to drop the entire data table.

Immutable 索引適用于一次寫入,數(shù)據(jù)只添加不修改的情況,例如時間序列數(shù)據(jù)。因為只需要一次寫入,數(shù)據(jù)行不會更新,不需要額外的索引維護(hù),所以性能非常好。

例子:

<pre style="line-height:15.0pt;background:whitesmoke;white-space:pre-wrap;
word-wrap: break-word">CREATE TABLE my_table (k VARCHAR PRIMARY KEY, v1 VARCHAR, v2 VARCHAR , v3 VARCHAR) IMMUTABLE_ROWS=true;</pre>

<pre style="margin-bottom:7.5pt;line-height:15.0pt;background:whitesmoke;
white-space:pre-wrap;word-wrap: break-word">CREATE INDEX my_index ON my_table (v2 DESC, v1) INCLUDE (v3);</pre>

2.Global Indexing (Mutable)

原文:Global indexing targets read heavy, low write uses cases. With global indexes, all the performance penalties for indexes occur at write time. We intercept the data table updates on write (DELETE, UPSERT VALUES and UPSERT SELECT), build the index update and then sent any necessary updates to all interested index tables. At read time, Phoenix will select the index table to use that will produce the fastest query time and directly scan it just like any other HBase table. Note, however, if a column is referenced in a query that isn’t part of the index, the index will not be used for that query.

全局索引適用于 高數(shù)據(jù)量讀取,低數(shù)據(jù)量寫入的情況。全局索引的消耗主要在索引寫入的時候,在對數(shù)據(jù)表進(jìn)行寫操作的時候,同時更新所有的索引表。在讀取的時候,將直接從索引表掃描讀取數(shù)據(jù),如果讀取另外一個表一樣。如果query中某個列在索引表中不存在,全局索引將不會用到。

CREATE TABLE my_table (k VARCHAR PRIMARY KEY, v1 VARCHAR, v2 VARCHAR , v3 VARCHAR)

CREATE INDEX my_index ON my_table (v2 DESC, v1) INCLUDE (v3);

3.Local Indexing (Mutable)

原文: Local indexing targets *****write heavy*****, *****space constrained***** use cases. With local indexes index data and table data are co-reside at same server so no network overhead during writes and reads. Local indexes can be used even when the query isn’t fully covered i.e. Phoenix automatically retrieve the columns not in the index through point gets against the data table. Unlike global indexes all local indexes data of a table are stored in a separate shared table. At read time when the local index is used, every region must be examined for the data as the exact region location of index data cannot be predetermined which incurs some overhead.

局部索引適用于大數(shù)據(jù)量寫、空間受限的情況下。使用局部索引,索引數(shù)據(jù)和表數(shù)據(jù)將同時放在同一臺server上,所以在讀寫的時候不會有網(wǎng)絡(luò)通信的開銷。如果query中某個列在索引表中不存在,局部索引也能用到。不同于全局索引,一個表的全部局部索引的數(shù)據(jù)保存在同一個共享表中。在讀取的時候,每一個region 都必須檢查數(shù)據(jù),因為索引數(shù)據(jù)的確切的區(qū)域位置無法預(yù)先,確定這會增加一些系統(tǒng)開銷。

例子:

CREATE TABLE my_table (k VARCHAR PRIMARY KEY, v1 VARCHAR, v2 VARCHAR , v3 VARCHAR)

CREATE LOCAL INDEX my_index ON my_table (v2 DESC, v1) INCLUDE (v3);

4****、不會使用到二級索引的情況

創(chuàng)建表:create table usertable (id varchar primary key, firstname varchar, lastname varchar);

創(chuàng)建全局索引:create index idx_name on usertable (firstname);

檢索:select id, firstname, lastname from usertable where firstname = 'foo';

不會使用到索引 idx_name。若要使用到,必須這樣:

create idx_name on usertable (firstname)include(lastname);

5****、不會使用主鍵索引情況:

創(chuàng)建表:CREATE TABLE TEST (pk1 char(1) not null, pk2 char(1) not null, pk3 char(1) not null, non-pk varchar **CONSTRAINT PK PRIMARY KEY(pk1, pk2, pk3) **);

不會使用到索引的檢索:select * from test where pk2='x' and pk3='y'

會使用到索引的檢索:select * from test where pk1='x' and pk2='y'

子查詢

1、IN 和 Not In 的子查詢

SELECT ItemName

FROM Items

WHERE ItemID IN

(SELECT ItemID

FROM Orders

WHERE Date >= to_date('2013/09/02'));

2、Exists 和Not Exists的子查詢

SELECT ItemName

FROM Items i

WHERE EXISTS

(SELECT *

FROM Orders

WHERE Date >= to_date('2013/09/02')

AND ItemID = i.ItemID);

3、半連接、反連接

見Join

4、比較運算

SELECT ID, Name

FROM Contest

WHERE Score >

(SELECT avg(Score)

FROM Contest)

ORDER BY Score DESC;

5、ANY/SOME/ALL 運算

SELECT OrderID

FROM Orders

WHERE quantity >= ANY

(SELECT max(quantity)

FROM Orders

GROUP BY ItemID);

6、相關(guān)子查詢

SELECT PatentID, Title

FROM Patents p

WHERE FileDate <= ALL

(SELECT FileDate

FROM Patents

WHERE Region = p.Region);

7、多重嵌套

SELECT ItemID, ItemName

FROM Items i

WHERE NOT EXISTS

(SELECT *

FROM Orders

WHERE CustomerID IN

(SELECT CustomerID

FROM Customers

WHERE Country = ‘Belgium’)

AND Quantity < 1000

AND ItemID = i.ItemID)

OR ItemID != ALL

(SELECT ItemID

FROM Orders

WHERE CustomerID IN

(SELECT CustomerID

FROM Customers

WHERE Country = ‘Germany’)

AND Quantity < 2000);

8、衍生表

SELECT m, count(*)

FROM

(SELECT max(x) m

FROM a1

GROUP BY name) AS t

GROUP BY m

ORDER BY count(*) DESC;

多租戶

創(chuàng)建多租戶表:

CREATE TABLE base.event (tenant_id VARCHAR, event_type CHAR(1), created_date DATE, event_id BIGINT)

MULTI_TENANT=true;

連接到某租戶的數(shù)據(jù)庫表:

Properties props = new Properties();

props.setProperty("TenantId", "Acme");

Connection conn = DriverManager.getConnection("localhost", props);

在特定租戶連接的情況下,以下語句只創(chuàng)建特定租戶的視圖

CREATE VIEW acme.event AS

SELECT * FROM base.event;

CREATE VIEW acme.login_event AS

SELECT * FROM base.event

WHERE event_type='L';

Array類型

creating a table:

CREATE TABLE regions (

region_name VARCHAR PRIMARY KEY,

zips VARCHAR ARRAY[10],

CONSTRAINT pk PRIMARY KEY (region_name));

Insert

UPSERT INTO regions(region_name,zips)

VALUES('SF Bay Area',ARRAY['94115','94030','94125']);

select:

SELECT zip[1] FROM regions WHERE region_name = 'SF Bay Area';

SELECT region_name FROM regions WHERE zip[1] = '94030' OR zip[2] = '94030' OR zip[3] = '94030';

SELECT region_name FROM regions WHERE zip[1] = '94030' OR zip[2] = '94030' OR zip[3] = '94030';

分頁查詢

組合使用order by, > , LIMIT :

SELECT title, author, isbn, description

FROM library

WHERE published_date > 2010

AND (title, author, isbn) > (?, ?, ?)

ORDER BY title, author, isbn

LIMIT 20

Skip Scan

SELECT * from T

WHERE ((KEY1 >='a' AND KEY1 <= 'b') OR (KEY1 > 'c' AND KEY1 <= 'e'))

AND KEY2 IN (1, 2)

The List<List<KeyRange>> for SkipScanFilter for the above query would be [ [ [ a - b ], [ d - e ] ], [ 1, 2 ] ] where [ [ a - b ], [ d - e ] ] is the range for KEY1and [ 1, 2 ] keys for KEY2.

跟蹤Tracing

只支持Hadoop2

配置 hadoop-metrics2-phoenix.properties

Sample from all the sources every 10 seconds

*.period=10

Write Traces to Phoenix

##########################

ensure that we receive traces on the server

phoenix.sink.tracing.class=org.apache.phoenix.trace.PhoenixMetricsSink

Tell the sink where to write the metrics

phoenix.sink.tracing.writer-class=org.apache.phoenix.trace.PhoenixTableMetricsWriter

Only handle traces with a context of "tracing"

phoenix.sink.tracing.context=tracing

配置 hadoop-metrics2-hbase.properties

ensure that we receive traces on the server

hbase.sink.tracing.class=org.apache.phoenix.trace.PhoenixMetricsSink

Tell the sink where to write the metrics

hbase.sink.tracing.writer-class=org.apache.phoenix.trace.PhoenixTableMetricsWriter

Only handle traces with a context of "tracing"

hbase.sink.tracing.context=tracing

配置 hbase-site.xml

<configuration>

<property>

<name>phoenix.trace.frequency</name>

<value>always</value>

</property>

</configuration>

<property>

<name>phoenix.trace.statsTableName</name>

<value><your custom tracing table name></value>

</property>

The tracing table is initialized via the ddl:

CREATE TABLE SYSTEM.TRACING_STATS (

trace_id BIGINT NOT NULL,

parent_id BIGINT NOT NULL,

span_id BIGINT NOT NULL,

description VARCHAR,

start_time BIGINT,

end_time BIGINT,

hostname VARCHAR,

tags.count SMALLINT,

annotations.count SMALLINT,

CONSTRAINT pk PRIMARY KEY (trace_id, parent_id, span_id)

統(tǒng)計收集

統(tǒng)計收集有助于提升query性能。

命令:

UPDATE STATISTICS my_table

等效于

UPDATE STATISTICS my_table ALL

如果只收集index或者column

UPDATE STATISTICS my_table INDEX

UPDATE STATISTICS my_table COLUMNS

參數(shù)配置

phoenix.stats.guidepost.width 默認(rèn)104857600 (100 MB)

phoenix.stats.guidepost.per.region

phoenix.stats.updateFrequency 默認(rèn)900000 (15 mins)

phoenix.stats.minUpdateFrequency 默認(rèn)7.5 mins

phoenix.stats.useCurrentTime 默認(rèn)true

mutable 和 immutable 表區(qū)別

分別創(chuàng)建表:

create table** my_mutable** (id varchar not null primary key, cf1.a varchar , cf1.b varchar, cf2.c varchar, cf2.d varchar) ;

create table my_immutable (id varchar not null primary key, cf1.a varchar , cf1.b varchar, cf2.c varchar, cf2.d varchar) immutable_rows=true ;

分別創(chuàng)建索引:

create index index_my_mutable on** my_mutable**(a,c) include (b,d);

create index** index_my_immutable** on my_immutable(a,c) include (b,d);

分別插入數(shù)據(jù)

upsert into my_mutable values ('1000001','a1','b1','c1','d1');
upsert into my_mutable values ('1000001','a2','b2','c2','d2');
upsert into my_mutable values ('1000001','a3','b3','c3','d3');

upsert into my_immutable values ('1000001','a1','b1','c1','d1');
upsert into my_immutable values ('1000001','a2','b2','c2','d2');
upsert into my_immutable values ('1000001','a3','b3','c3','d3');

查看數(shù)據(jù):

select * from my_mutable ;

| ID | A | B | C | D |

| 1000001 | a3 | b3 | c3 | d3 |

select * from my_immutable ;

| ID | A | B | C | D |

| 1000001 | a1 | b1 | c1 | d1 |
| 1000001 | a2 | b2 | c2 | d2 |
| 1000001 | a3 | b3 | c3 | d3 |


select * from** index_my_mutable** ;

| CF1:A | CF2:C | :ID | CF1:B | CF2:D |

| a3 | c3 | 1000001 | b3 | d3 |

select * from** index_my_immutable** ;

| CF1:A | CF2:C | :ID | CF1:B | CF2:D |

| a1 | c1 | 1000001 | b1 | d1 |
| a2 | c2 | 1000001 | b2 | d2 |
| a3 | c3 | 1000001 | b3 | d3 |


Global 和 Local 索引。

1****、創(chuàng)建表:

create table **immutable_local **(id varchar not null primary key, cf1.a varchar, cf1.b varchar, cf2.c varchar, cf2.d varchar ) immutable_rows=true;

create table **immutable_global **(id varchar not null primary key, cf1.a varchar, cf1.b varchar, cf2.c varchar, cf2.d varchar ) immutable_rows=true;

create table **mutable_local **(id varchar not null primary key, cf1.a varchar, cf1.b varchar, cf2.c varchar, cf2.d varchar ) immutable_rows=false;

create table **mutable_global **(id varchar not null primary key, cf1.a varchar, cf1.b varchar, cf2.c varchar, cf2.d varchar ) immutable_rows=false;

2****、創(chuàng)建索引:

create index index_mutable_global on mutable_global(a,c) include(b); 成功

create **local **index index_mutable_local on mutable_local(a,c) include(b); 成功

create index** index_immutable_global** on immutable_global(a,c) include(b); 成功

create local index index_immutable_local on immutable_local(a,c) include(b); 失敗,immutable 表無法創(chuàng)建Local . Error: ERROR 1048 (43A04): Local indexes aren't allowed on tables with immutable rows. tableName=INDEX_IMMUTABLE_LOCAL (state=43A04,code=1048)

3.****插入數(shù)據(jù)

upsert into mutable_global values ('100001','a1','b1','c1','d1');

upsert into mutable_global values ('100002','a2','b2','c2','d2');

upsert into mutable_local values ('100001','a1','b1','c1','d1');

upsert into mutable_local values ('100002','a2','b2','c2','d2');

upsert into immutable_global values ('100001','a1','b1','c1','d1');

upsert into immutable_global values ('100002','a2','b2','c2','d2');

4****、測試檢索

檢索中包含了列d, 此列不包含在索引中。

a****、****immutable 表使用的時全表掃描,沒有使用索引

explain select a,b,c,d from immutable_global where a='a1';

CLIENT PARALLEL 1-WAY **FULL SCAN **OVER IMMUTABLE_GLOBAL
SERVER FILTER BY CF1.A = 'a1'

b、mutable 表使用的時全表掃描,沒有使用****Global****索引

explain select a,b,c,d from mutable_global where a='a1';
CLIENT PARALLEL 1-WAY** FULL SCAN** OVER MUTABLE_GLOBAL
SERVER FILTER BY CF1.A = 'a1'

c、mutable 表使用了 LOCAL 索引

explain select a,b,c,d from mutable_local where a='a1';

CLIENT PARALLEL 1-WAY RANGE SCAN OVER _LOCAL_IDX_MUTABLE_LOCAL [-32768,'a1'] |
CLIENT MERGE SORT

5****、****Local 索引細(xì)節(jié)

索引定義:create local index index_mutable_local on mutable_local(a,c) include(b);

索引內(nèi)容:

a****、使用部分索引:索引組合的第一個

explain select a,b,c,d from mutable_local where a='a1' ;

CLIENT PARALLEL 1-WAY **RANGE SCAN **OVER _LOCAL_IDX_MUTABLE_LOCAL [-32768,'a1']
CLIENT MERGE SORT

b****、使用部分索引:索引組合的第二個

explain select a,b,c,d from mutable_local where c='c1' ;

CLIENT PARALLEL 1-WAY RANGE SCAN OVER _LOCAL_IDX_MUTABLE_LOCAL [-32768]
**SERVER FILTER BY **C = 'c1'
CLIENT MERGE SORT

c****、使用部分索引:****include****的部分

explain select a,b,c,d from mutable_local where b='b1' ;
CLIENT PARALLEL 1-WAY** RANGE SCAN** OVER _LOCAL_IDX_MUTABLE_LOCAL [-32768]
SERVER FILTER BY CF1.B = 'b1'
CLIENT MERGE SORT

d****、使用全部索引

explain select a,b,c,d from mutable_local where a='a1' and c='c1' ;
CLIENT PARALLEL 1-WAY **RANGE SCAN **OVER _LOCAL_IDX_MUTABLE_LOCAL [-32768,'a1','c1']
CLIENT MERGE SORT

調(diào)換a和c的位置,phoenix會自動優(yōu)化。

explain select a,b,c,d from mutable_local where c='c1' and a='a1' ;
CLIENT PARALLEL 1-WAY RANGE SCAN OVER _LOCAL_IDX_MUTABLE_LOCAL [-32768,'a1','c1']
CLIENT MERGE SORT

e****、使用索引中全部字段

explain select a,b,c,d from mutable_local where c='c1' and b='b1' and a='a1' ;
CLIENT PARALLEL 1-WAY RANGE SCAN OVER _LOCAL_IDX_MUTABLE_LOCAL [-32768,'a1','c1']
SERVER FILTER BY CF1.B = 'b1'
CLIENT MERGE SORT

f****、使用了不存在于索引中的字段。

explain select a,b,c,d from mutable_local where a='a1' and d='d1';

CLIENT PARALLEL 1-WAY** FULL SCAN** OVER MUTABLE_LOCAL
SERVER FILTER BY (CF1.A = 'a1' AND CF2.D = 'd1')

g. 在****Select ****中不要使用*********,*********會導(dǎo)致全表掃描

explain select **a,b,c,d **from MUTABLE_LOCAL where a='a';
CLIENT PARALLEL 1-WAY RANGE SCAN OVER _LOCAL_IDX_MUTABLE_LOCAL [-32768,'a']
CLIENT MERGE SORT

explain select * from MUTABLE_LOCAL where a='a';
** CLIENT PARALLEL 1-WAY FULL SCAN OVER MUTABLE_LOCAL
SERVER FILTER BY CF1.A = 'a' **

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時請結(jié)合常識與多方信息審慎甄別。
平臺聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點,簡書系信息發(fā)布平臺,僅提供信息存儲服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容