
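Bucketing: splitting loaded data into separate files
    The data files under a single directory are split into several files
    One directory, multiple files
    Speeds up table joins
Use cases: data sampling and map-join

Bucketing is not recommended in other cases, because it produces many small files.
Scheduling and allocating resources for those files is the most time-consuming part of a job.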

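Hive Bucketing

A bucketed table hashes the value of a column and stores rows in different files according to that hash.
Any table or partition in Hive can be further divided into buckets.
The bucket for each row is determined by the hash of the column value modulo the number of buckets.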
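
The assignment rule above can be sketched in a few lines of Python. This is a minimal model, not Hive's actual code: the function name `hive_bucket` is hypothetical, and it assumes that the hash of an INT column is the integer itself and that the hash is masked to a non-negative value before the modulo is taken.

```python
def hive_bucket(hash_code: int, num_buckets: int) -> int:
    """Return the 0-based bucket index for a row, given the hash of its bucketing column."""
    # Assumption: the hash is masked to a non-negative 31-bit value,
    # so negative hash codes still map to a valid bucket.
    return (hash_code & 0x7FFFFFFF) % num_buckets


# Assumption: for an INT column the hash code is the value itself.
print(hive_bucket(11, 4))  # age 11 with 4 buckets
print(hive_bucket(44, 4))  # age 44 with 4 buckets
print(hive_bucket(-1, 4))  # a negative hash code still lands in range
```

With 4 buckets every result falls in the range 0 to 3, so each row maps to exactly one of the 4 files.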

Enabling bucketing

set hive.enforce.bucketing=true;
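Default: false. When set to true, the MapReduce job automatically sets the number of reduce tasks to match the number of buckets. (You can also set the number of reduce tasks yourself via mapred.reduce.tasks, but this is not recommended when bucketing.)
Note: the number of buckets (files) produced by one job equals the number of reduce tasks.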

Loading data into a bucketed table

insert into table bucket_table select columns from tbl;
insert overwrite table bucket_table select columns from tbl;

Sampling queries on a bucketed table

hive> select * from bucket_table tablesample(bucket 1 out of 4 on columns);

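TABLESAMPLE syntax:
    TABLESAMPLE(BUCKET x OUT OF y)
    x: the bucket to start sampling from
    y: must be a multiple or a factor of the table's total number of buckets

When the table has 32 buckets in total
(1) TABLESAMPLE(BUCKET 3 OUT OF 8): which data is sampled?
    32 / 8 = 4, so 4 buckets are sampled
    Starting from bucket 3: buckets 3, 11, 19, and 27
(2) TABLESAMPLE(BUCKET 3 OUT OF 256): which data is sampled?
    32 / 256 = 1/8, so 1/8 of the data in bucket 3 is sampled

Whether y is a multiple or a factor, (number of buckets) / y gives the amount to sample: a count of whole buckets when the result is 1 or more, or a fraction of a single bucket when it is less than 1.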
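
The arithmetic in the two cases above can be checked with a short Python sketch. The helper name `sampled_buckets` and its return shape are hypothetical, chosen only for illustration; the function simply encodes the rule as stated in this post and is not Hive's implementation.

```python
from fractions import Fraction


def sampled_buckets(total_buckets: int, x: int, y: int):
    """Return (1-based bucket numbers that are read, fraction of each bucket kept)."""
    if total_buckets % y == 0:
        # y is a factor of the bucket count: read total/y whole buckets,
        # starting at bucket x and stepping by y.
        return list(range(x, total_buckets + 1, y)), Fraction(1)
    if y % total_buckets == 0:
        # y is a multiple of the bucket count: read part of a single bucket.
        bucket = (x - 1) % total_buckets + 1
        return [bucket], Fraction(total_buckets, y)
    raise ValueError("y must be a multiple or a factor of the bucket count")


print(sampled_buckets(32, 3, 8))    # case (1): four whole buckets
print(sampled_buckets(32, 3, 256))  # case (2): 1/8 of one bucket
```

The first call yields buckets 3, 11, 19, and 27 in full; the second yields bucket 3 with a kept fraction of 1/8, matching the two worked cases.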

Creating the source table

hive> create table psn31(id int, name string, age int) 
row format delimited fields terminated by ',';

hive> load data local inpath '/root/psn31.data' into table psn31;

Creating the bucketed table

Bucket on a specified column

hive> create table psnbucket(id int, name string, age int)
clustered by (age) into 4 buckets
row format delimited fields terminated by ',';

Load the data:
hive> insert into table psnbucket select id, name, age from psn31;
This runs a MapReduce job.

Sampling
hive> select id, name, age from psnbucket tablesample(bucket 2 out of 4 on age);
id     name     age
7      alice     77
3      dog       33

Test data

Raw data	Bucket number (age % 4)	Rows in bucket order
1,tom,11	3	8,scala,88
2,cat,22	2	4,hive,44
3,dog,33	1	7,alice,77
4,hive,44	0	3,dog,33
5,hbase,55	3	6,mr,66
6,mr,66	2	2,cat,22
7,alice,77	1	5,hbase,55
8,scala,88	0	1,tom,11
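
The bucket numbers in the test data can be reproduced with a small Python simulation. This is a sketch under the same assumption as before (the hash of an INT is the value itself, masked to be non-negative); `bucket_index` is a hypothetical helper, and the order of rows inside each bucket here is simply input order, which need not match the order Hive writes.

```python
from collections import defaultdict

NUM_BUCKETS = 4

# The eight rows of psn31.data: (id, name, age)
rows = [
    (1, "tom", 11), (2, "cat", 22), (3, "dog", 33), (4, "hive", 44),
    (5, "hbase", 55), (6, "mr", 66), (7, "alice", 77), (8, "scala", 88),
]


def bucket_index(age: int, num_buckets: int = NUM_BUCKETS) -> int:
    # Assumption: hash(INT) is the value itself, masked to be non-negative.
    return (age & 0x7FFFFFFF) % num_buckets


# Group rows by the bucket their age hashes to.
buckets = defaultdict(list)
for row in rows:
    buckets[bucket_index(row[2])].append(row)

for index in sorted(buckets):
    print(index, buckets[index])

# tablesample(bucket 2 out of 4 on age) reads the second bucket, i.e. index 1.
sample = buckets[2 - 1]
print(sample)
```

The sampled bucket holds the rows for dog (33) and alice (77), the same two rows returned by the sampling query on psnbucket.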

Bucketed data

