ClickHouse Reading and Writing HDFS
1. Exporting Data from ClickHouse to HDFS
1.1 Preparing Test Data
First, create a local table in ClickHouse and generate some test data:
-- Create the table
drop table if exists account_info_src;
create table account_info_src(
account_id UInt64 comment '',
user_id String comment '',
open_id String comment '',
device_id String comment '',
client_id String comment '',
car_number String comment '',
tel String comment '',
process_date UInt32 DEFAULT toYYYYMMDD(today()) comment '',
create_time DateTime DEFAULT now() ,
update_time DateTime DEFAULT now()
)
ENGINE=MergeTree()
order by account_id
PARTITION BY process_date
;
-- Insert data into the table
insert into account_info_src(account_id, user_id, open_id, device_id, client_id, car_number, tel, process_date)
select number as account_id,
transform(number % 8, [0, 1, 4], ['', 'unknown', 'UNKNOWN'], concat('userid_', toString(number)) ) AS user_id,
transform(number % 7, [1, 5], ['', 'unknown'], concat('openid_', toString(number)) ) AS open_id,
transform(number % 6, [2, 6], ['', 'unknown'], concat('deviceid_', toString(number)) ) AS device_id,
transform(number % 6, [2, 6], ['', 'unknown'], concat('clientid_', toString(number)) ) AS client_id,
transform(number % 9, [5, 3], ['', 'unknown'], concat('car_number_', toString(number)) ) AS car_number,
transform(number % 10, [1, 4], ['', 'unknown'], concat('tel_', toString(number)) ) AS tel,
toYYYYMMDD(today()) AS process_date
from numbers(100)
;
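The transform calls above map selected remainders to empty or placeholder strings to simulate dirty data. transform(x, array_from, array_to, default) returns the element of array_to at the position where x appears in array_from, or the default expression when there is no match. For example, the user_id expression behaves like this:

```sql
-- number = 0 -> ''          (0 % 8 = 0, first element)
-- number = 1 -> 'unknown'   (1 % 8 = 1, second element)
-- number = 3 -> 'userid_3'  (3 % 8 = 3, no match, so the default)
SELECT number,
       transform(number % 8, [0, 1, 4], ['', 'unknown', 'UNKNOWN'],
                 concat('userid_', toString(number))) AS user_id
FROM numbers(5);
```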
1.2 Creating a ClickHouse External Engine Table
First, create the target directory in HDFS:
hdfs dfs -mkdir -p /bdp/tmp/clickhouse/account
Then create the external engine table in ClickHouse:
drop table if exists account_info_hdfs;
create table account_info_hdfs(
account_id UInt64 comment '',
user_id String comment '',
open_id String comment '',
device_id String comment '',
client_id String comment '',
car_number String comment '',
tel String comment '',
process_date UInt32 DEFAULT toYYYYMMDD(today()) comment '',
create_time DateTime DEFAULT now() ,
update_time DateTime DEFAULT now()
)
engine=HDFS('hdfs://node1:8020/bdp/tmp/clickhouse/account/info.Parquet','Parquet')
;
-- Note: exporting from ClickHouse creates the info.Parquet file automatically; running the export a second time fails because the file already exists
1.3 Exporting the Data
Copy the data from account_info_src (stored in ClickHouse) into account_info_hdfs (stored on HDFS):
insert into account_info_hdfs
select * from account_info_src
;
The ClickHouse data has now been exported to HDFS.
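A quick sanity check is to compare row counts between the source table and the HDFS engine table; with the 100-row test data above, both queries should return 100:

```sql
-- Counts should match after the export
SELECT count() FROM account_info_src;
SELECT count() FROM account_info_hdfs;
```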
1.4 Viewing the Data in Hive
drop table if exists account_info_hive;
create external table if not exists account_info_hive (
account_id BIGINT comment '',
user_id String comment '',
open_id String comment '',
device_id String comment '',
client_id String comment '',
car_number String comment '',
tel String comment '',
process_date BIGINT ,
create_time BIGINT ,
update_time BIGINT
)
stored as parquet
location '/bdp/tmp/clickhouse/account';
drop table if exists account_info_hive2;
create external table if not exists account_info_hive2 (
account_id BIGINT comment '',
user_id String comment '',
open_id String comment '',
device_id String comment '',
client_id String comment '',
car_number String comment '',
tel String comment ''
)
stored as parquet
location '/bdp/tmp/clickhouse/account2';
The data can now be queried from Hive.
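For example, a quick sample query (the column choice is arbitrary):

```sql
-- create_time / update_time come back as integers here
-- because the Hive table declares them BIGINT
SELECT account_id, user_id, tel, process_date, create_time
FROM account_info_hive
LIMIT 5;
```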
Notes:
DateTime columns can only be mapped to BIGINT on the Hive side; declaring them as TIMESTAMP at first produced an error.
You may also hit a permission problem; if so, fix the directory ownership:
sudo -u hdfs hdfs dfs -chown -R clickhouse:clickhouse /bdp/tmp/clickhouse/account
2. Importing HDFS Data into ClickHouse
2.1 Preparing the Data
Reuse the data just exported from ClickHouse.
2.2 Creating the ClickHouse Table
drop table if exists account_info_hdfs2;
create table account_info_hdfs2(
account_id UInt64 comment '',
user_id String comment '',
open_id String comment '',
device_id String comment '',
client_id String comment '',
car_number String comment '',
tel String comment '',
process_date UInt32 DEFAULT toYYYYMMDD(today()) comment '',
create_time DateTime DEFAULT now() ,
update_time DateTime DEFAULT now()
)
engine=HDFS('hdfs://node1:8020/bdp/tmp/clickhouse/account/*.Parquet','Parquet')
;
-- Note: *.Parquet matches every Parquet file in the directory
That is all that is needed.
To verify the wildcard behavior, add a second copy of the file in HDFS:
hdfs dfs -cp /bdp/tmp/clickhouse/account/info.Parquet /bdp/tmp/clickhouse/account/a.Parquet
Querying the table in ClickHouse now shows that the row count has doubled.
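To confirm, count the rows through the wildcard table; with the 100-row test data plus the copied file, this should return 200:

```sql
-- Reads both info.Parquet and a.Parquet via the *.Parquet pattern
SELECT count() FROM account_info_hdfs2;
```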
Done.