Configuration
-
hive-site.xml configuration parameters (for CDH, refer to the screenshot below)
hive.support.concurrency=true
hive.enforce.bucketing=true
hive.exec.dynamic.partition.mode=nonstrict
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.compactor.initiator.on=true
hive.compactor.worker.threads=1
hive.in.test=true
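For readers editing hive-site.xml by hand rather than through the CDH UI, the parameters listed above could be written as the following fragment. This is only a sketch: the property names and values are copied from the list above, and the standard Hadoop `<configuration>`/`<property>` layout is assumed. Which service (HiveServer2 or the metastore) needs each property depends on the deployment and is not stated in the original.

```xml
<!-- Sketch of a hive-site.xml fragment; names and values are taken from the list above -->
<configuration>
  <property>
    <name>hive.support.concurrency</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.enforce.bucketing</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.exec.dynamic.partition.mode</name>
    <value>nonstrict</value>
  </property>
  <property>
    <name>hive.txn.manager</name>
    <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
  </property>
  <property>
    <name>hive.compactor.initiator.on</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.compactor.worker.threads</name>
    <value>1</value>
  </property>
  <property>
    <name>hive.in.test</name>
    <value>true</value>
  </property>
</configuration>
```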
Bucket the data and specify the storage format. Both must be specified, otherwise an error is raised.
2.1 LOAD can write data into partitions but not into buckets, so first load the data into an unbucketed temporary table, then INSERT INTO the bucketed table.

create table test_tmp(id string, name string)
row format delimited fields terminated by '\t';

create table test(key string, id string, name string, device_id string)
clustered by (name) into 2 buckets
stored as orc
TBLPROPERTIES('transactional'='true');
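The load-then-insert flow described above might look roughly like the following HiveQL. It is a sketch, not the author's exact commands. The HDFS path `/tmp/test_data.tsv` is hypothetical. Because `test` has four columns and `test_tmp` only two, filling `key` from `id` and leaving `device_id` NULL is an assumption made purely for illustration, as are the literal values in the UPDATE and DELETE statements.

```sql
-- Step 1: load the tab-delimited file into the unbucketed staging table.
-- The path is hypothetical.
LOAD DATA INPATH '/tmp/test_data.tsv' INTO TABLE test_tmp;

-- Step 2: copy the rows into the bucketed, transactional ORC table.
-- Column mapping is an assumption: key is filled from id, device_id is left NULL.
INSERT INTO TABLE test
SELECT id, id, name, CAST(NULL AS STRING)
FROM test_tmp;

-- With 'transactional'='true', row-level changes become possible.
-- The bucketing column (name) is deliberately not updated here.
UPDATE test SET device_id = 'd-001' WHERE id = '1';
DELETE FROM test WHERE id = '2';
```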
ORC compression ratio and queries
-
Compression ratio
(Figure: image.png)

Query and update efficiency
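Update efficiency showed no noticeable change. A count query over 120 million rows took 3 minutes, while the same query on the ORC-stored table took 5 minutes; the number of mappers and reducers differed between the two runs.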

