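1. Selecting columns with a regular expression in Hive

When hive.support.quoted.identifiers is set to none, a backquoted name in the SELECT list is treated as a Java regular expression. It expands to every column whose whole name matches. The pattern (pin)?+.+ matches every column except pin, because the possessive ?+ never gives back a matched "pin". The query below therefore returns pin first, followed by all the other columns.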

set hive.support.quoted.identifiers=none;
select a.pin, `(pin)?+.+` from some_table a;
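The Python sketch below makes the matching rule concrete. It filters a hypothetical column list the same way the backquoted pattern does. Python's re module supports possessive quantifiers only from 3.11, so the sketch uses (?!pin$).+ instead. Under full matching, that pattern accepts exactly the same names as (pin)?+.+. The column names are made up for illustration.

```python
import re

# Sketch only: Hive compiles the backquoted pattern as a Java regex and keeps
# the columns whose *whole* name matches it. Python's `re` has possessive
# quantifiers only from 3.11, so we use an equivalent pattern: `(pin)?+.+`
# never gives back a consumed "pin", so it rejects exactly the name "pin",
# which is what the negative lookahead below does.
EXCEPT_PIN = r"(?!pin$).+"


def select_columns(columns, pattern=EXCEPT_PIN):
    """Return, in table order, the column names that fully match `pattern`."""
    regex = re.compile(pattern)
    return [name for name in columns if regex.fullmatch(name)]


# Hypothetical schema, for illustration only.
columns = ["pin", "name", "pin_type", "dt"]
# Mirrors the Hive query: pin first, then every column the pattern keeps.
print(["pin"] + select_columns(columns))  # ['pin', 'name', 'pin_type', 'dt']
```

The pattern excludes only the exact name pin. A column such as pin_type still matches, because .+ can consume the remaining "_type".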

2. Showing column names when printing table data

set hive.cli.print.header=true;

3. Sorting optimization

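ORDER BY produces a global ordering using a single reducer. It cannot run in parallel, so it is relatively slow;
SORT BY only orders rows within each reducer (partial ordering). Use it together with DISTRIBUTE BY, which decides which reducer each row is sent to;
CLUSTER BY col1 == DISTRIBUTE BY col1 SORT BY col1, but the sort direction cannot be specified (it is always ascending);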
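The difference between the three clauses can be sketched in Python. The partitioner below (hash of the key modulo the reducer count) is an illustrative assumption, not Hive's actual hash function. What matters is that equal keys always reach the same reducer, and each reducer sorts only its own rows.

```python
def distribute_sort_by(values, num_reducers, descending=False):
    """DISTRIBUTE BY v SORT BY v: route each value to a reducer, then sort per reducer."""
    reducers = [[] for _ in range(num_reducers)]
    for v in values:
        # Illustrative partitioner (hash modulo reducer count); Hive's real hash
        # differs, but equal keys always end up on the same reducer.
        reducers[hash(v) % num_reducers].append(v)
    return [sorted(r, reverse=descending) for r in reducers]


def cluster_by(values, num_reducers):
    """CLUSTER BY v: same as DISTRIBUTE BY v SORT BY v, always ascending."""
    return distribute_sort_by(values, num_reducers)


def order_by(values, descending=False):
    """ORDER BY v: a single reducer produces one totally ordered result."""
    return sorted(values, reverse=descending)


values = [5, 2, 8, 1, 4, 7]
print(distribute_sort_by(values, 2))  # [[2, 4, 8], [1, 5, 7]]: sorted per reducer only
print(order_by(values))               # [1, 2, 4, 5, 7, 8]: globally sorted
```

Concatenating the per-reducer outputs ([2, 4, 8, 1, 5, 7]) is not globally sorted. This is the price SORT BY pays for running in parallel.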

4. Join optimization

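If a multi-table join uses the same join key for every table, Hive can run it as a single MapReduce job (one reduce stage);
Filter first, then join;
Put the small table first so that it is buffered in memory, and the large table last (it is streamed);
Use LEFT SEMI JOIN instead of IN (subquery); it is more efficient;
Data-skew optimization when joining a small table with a large table (map join):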

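-- common join: both tables are shuffled to the reducers, so a hot key can overload one reducer
select t1.a, t1.b from table1 t1 join table2 t2 on (t1.a = t2.a);
-- map join: the small table t1 is loaded into memory on each mapper, and no reduce stage is needed
select /*+ mapjoin(t1) */ t1.a, t1.b from table1 t1 join table2 t2 on (t1.a = t2.a);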
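The Python sketch below shows the idea behind the map join and the semantics of LEFT SEMI JOIN. It assumes rows are plain dicts and that the small table fits in memory. The table contents are hypothetical. It models only the data flow, not Hive's execution engine.

```python
def map_join(small, large, key):
    """Broadcast (map-side) hash join of `small` with `large` on `key`.

    The small table is loaded into an in-memory dict, as on every mapper;
    the large table is streamed once, so no shuffle or reduce step is needed.
    """
    index = {}
    for row in small:
        index.setdefault(row[key], []).append(row)
    result = []
    for row in large:
        for match in index.get(row[key], []):
            result.append((match, row))
    return result


def left_semi_join(left, right, key):
    """LEFT SEMI JOIN: keep each left row once if its key appears in `right`.

    Unlike an inner join, duplicate keys on the right never duplicate left
    rows, and only left-side columns are available, as with IN (subquery).
    """
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] in right_keys]


# Hypothetical rows: t1 is the small table, t2 the large one.
t1 = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]
t2 = [{"a": 1}, {"a": 1}, {"a": 3}]
print([(m["a"], m["b"]) for m, _ in map_join(t1, t2, "a")])  # [(1, 'x'), (1, 'x')]
print(left_semi_join(t1, t2, "a"))  # [{'a': 1, 'b': 'x'}]
```

Because every mapper has the whole small table, rows with a skewed key are never funneled into a single reducer.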

5. Inserting into partitions

Static insert: the values of the partition columns dt and name must be specified;

insert overwrite table test partition (dt='2018-10-17', name='a') 
select col1, col2 from data_table where dt='2018-10-17' and  name='a';

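Dynamic insert: partition values are taken automatically from the last columns of the SELECT list, in the order the partition columns are declared;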

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table test partition (dt, name) 
select col1, col2, dt, name from data_table where dt='2018-10-17';
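The Python sketch below shows how a dynamic-partition insert splits the SELECT output. It takes the trailing values of each row as the partition values and groups the remaining columns under a col=value directory name. The rows are hypothetical. Hive's escaping of special characters in partition paths is ignored.

```python
from collections import defaultdict


def dynamic_partition_insert(rows, partition_cols):
    """Group SELECT output rows into partitions, as a dynamic-partition insert does.

    The last len(partition_cols) values of each row are the partition values,
    in the order the partition columns are declared; the rest are data columns.
    Returns {partition directory name: [data rows]}.
    """
    n = len(partition_cols)
    if n == 0:
        raise ValueError("at least one partition column is required")
    partitions = defaultdict(list)
    for row in rows:
        data, values = row[:-n], row[-n:]
        # Directory names follow col=value/col=value (no escaping in this sketch).
        path = "/".join(f"{col}={val}" for col, val in zip(partition_cols, values))
        partitions[path].append(data)
    return dict(partitions)


# Hypothetical output of: select col1, col2, dt, name from data_table where dt='2018-10-17'
rows = [
    ("c1", "c2", "2018-10-17", "a"),
    ("c3", "c4", "2018-10-17", "b"),
    ("c5", "c6", "2018-10-17", "a"),
]
for path, data in dynamic_partition_insert(rows, ["dt", "name"]).items():
    print(path, data)
# dt=2018-10-17/name=a [('c1', 'c2'), ('c5', 'c6')]
# dt=2018-10-17/name=b [('c3', 'c4')]
```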

6. Sampling

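---   Shuffle randomly, then take the first 10 rows
select * from table1 distribute by rand() sort by rand() limit 10;
select * from table1 order by rand() limit 10;   -- cannot be distributed (single reducer), so it is slow
---   Sampling functions (TABLESAMPLE)
select * from table1 tablesample(50 PERCENT);    -- at least ~50% of the input data size, not of the row count
select * from table1 tablesample(1M);            -- about 1 MB of input data
select * from table1 tablesample(10 ROWS);       -- 10 rows from each input split
---   Create a bucketed table
create table bucketed_users(id int, name string)
clustered by (id) sorted by (name)  -- bucketing column and sort column
into 4 buckets
row format delimited fields terminated by '\t' stored as textfile;
---   Insert data into the bucketed table
-- hive.enforce.bucketing is only needed before Hive 2.0; it was removed there and bucketing is always enforced
set hive.enforce.bucketing=true;
insert overwrite table bucketed_users select id, name from users;
select * from bucketed_users tablesample(bucket 1 out of 2 on id);
-- Starting from bucket 1, read every 2nd bucket: 4 / 2 = 2 of the 4 buckets (buckets 1 and 3), i.e. about half of the data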
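The Python sketch below shows which rows TABLESAMPLE(BUCKET x OUT OF y ON id) keeps for an INT column. It assumes the pre-Hive-3 bucketing scheme (bucketing_version 1). Under that scheme an INT hashes to its own value, and the bucket is (hash & Integer.MAX_VALUE) % number_of_buckets. Hive 3 can use a different hash, so treat the exact bucket numbers as illustrative.

```python
def int_bucket(value, num_buckets):
    """0-based bucket of an INT value.

    Assumption: pre-Hive-3 bucketing (bucketing_version 1), where an INT hashes
    to itself and the bucket is (hash & Integer.MAX_VALUE) % num_buckets.
    """
    return (value & 0x7FFFFFFF) % num_buckets


def tablesample_bucket(ids, x, y):
    """TABLESAMPLE(BUCKET x OUT OF y ON id): keep rows whose hash lands in bucket x of y."""
    return [i for i in ids if int_bucket(i, y) == x - 1]


def buckets_read(total_buckets, x, y):
    """1-based bucket files scanned when y divides the table's bucket count."""
    return list(range(x, total_buckets + 1, y))


ids = list(range(8))
print(buckets_read(4, 1, 2))          # [1, 3]
print(tablesample_bucket(ids, 1, 2))  # [0, 2, 4, 6]
# Same rows as reading buckets 1 and 3 of the 4-bucket table directly:
print([i for i in ids if int_bucket(i, 4) + 1 in (1, 3)])  # [0, 2, 4, 6]
```

When y divides the table's bucket count, Hive can answer the sample by reading whole bucket files instead of hashing every row. That is why bucket sampling on the bucketing column is cheap.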

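998. Differences from Presto
Presto is strictly typed. It does not implicitly convert between unrelated types (for example varchar and bigint), so the two sides of a comparison must be cast to a common type by hand. Integer division also truncates in Presto, whereas in Hive 6 / 4 returns 1.5.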

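cast(a as bigint) > b    -- cast so that both sides of the comparison have the same type
cast(6 as double) / 4    -- 1.5; plain 6 / 4 is integer division and returns 1
6 * 1.0 / 4              -- also avoids integer division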
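The Python sketch below models why the cast matters for division. It assumes that Presto's / on two integer operands returns an integer truncated toward zero (Java-style), and that any floating-point operand yields a double. Python's own // floors rather than truncates, so the sign is handled explicitly. Presto treats 1.0 as a DECIMAL literal in recent versions, which this float-only sketch does not model.

```python
def presto_divide(a, b):
    """Sketch of how Presto's `/` treats operand types.

    Assumption: two integer operands give an integer result truncated toward
    zero (Java-style), while any floating-point operand gives a double.
    Python's own `//` floors instead, so the sign is handled explicitly.
    """
    if isinstance(a, int) and isinstance(b, int):
        q = abs(a) // abs(b)
        return q if (a >= 0) == (b >= 0) else -q
    return float(a) / float(b)


print(presto_divide(6, 4))    # 1   (6 / 4 in Presto)
print(presto_divide(6.0, 4))  # 1.5 (cast(6 as double) / 4)
```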
Differences in date/time functions:

dt=cast(current_date + interval '-1' day as varchar)   -- yesterday
dt=date_format(current_timestamp - interval '1' month, '%Y-%m-01')   -- first day of last month
dt=date_format(cast(date_format(current_timestamp, '%Y-%m-01') as date) - interval '1' day, '%Y-%m-%d')   -- last day of last month
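The same three dates can be computed in Python as a cross-check of the Presto expressions. The sketch uses a fixed "today" instead of current_date so that the results are reproducible. The date.isoformat() output has the same layout as the '%Y-%m-%d' strings used for dt.

```python
from datetime import date, timedelta


def yesterday(today):
    return today - timedelta(days=1)


def first_day_of_last_month(today):
    # The day before the 1st of this month is in last month; go to its 1st.
    return (today.replace(day=1) - timedelta(days=1)).replace(day=1)


def last_day_of_last_month(today):
    # Same trick as the third Presto expression: 1st of this month minus one day.
    return today.replace(day=1) - timedelta(days=1)


today = date(2018, 10, 17)  # fixed date instead of current_date, for reproducibility
print(yesterday(today).isoformat())                # 2018-10-16
print(first_day_of_last_month(today).isoformat())  # 2018-09-01
print(last_day_of_last_month(today).isoformat())   # 2018-09-30
```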
999. Splitting data into rows
