Reading and Writing HDFS with ClickHouse


1. Exporting Data from ClickHouse to HDFS

1.1 Prepare the Test Data

First, create a local table in ClickHouse and populate it with test data.

-- Create the table
drop table if exists account_info_src;
create table account_info_src(
  account_id   UInt64,
  user_id      String,
  open_id      String,
  device_id    String,
  client_id    String,
  car_number   String,
  tel          String,
  process_date UInt32   DEFAULT toYYYYMMDD(today()),
  create_time  DateTime DEFAULT now(),
  update_time  DateTime DEFAULT now()
)
ENGINE = MergeTree()
PARTITION BY process_date
ORDER BY account_id
;

-- Populate the table with test data
insert into account_info_src(account_id, user_id, open_id, device_id, client_id, car_number, tel, process_date)
select number as account_id,
  transform(number % 8,  [0, 1, 4], ['', 'unknown', 'UNKNOWN'], concat('userid_',     toString(number))) AS user_id,
  transform(number % 7,  [1, 5],    ['', 'unknown'],            concat('openid_',     toString(number))) AS open_id,
  transform(number % 6,  [2, 6],    ['', 'unknown'],            concat('deviceid_',   toString(number))) AS device_id,
  transform(number % 6,  [2, 6],    ['', 'unknown'],            concat('clientid_',   toString(number))) AS client_id,
  transform(number % 9,  [5, 3],    ['', 'unknown'],            concat('car_number_', toString(number))) AS car_number,
  transform(number % 10, [1, 4],    ['', 'unknown'],            concat('tel_',        toString(number))) AS tel,
  toYYYYMMDD(today()) as process_date
from numbers(100)
;

1.2 Create a ClickHouse HDFS Engine Table

First, create the target directory on HDFS:

hdfs dfs -mkdir -p /bdp/tmp/clickhouse/account

Then create the HDFS engine table in ClickHouse:
drop table if exists account_info_hdfs;
create table account_info_hdfs(
  account_id   UInt64,
  user_id      String,
  open_id      String,
  device_id    String,
  client_id    String,
  car_number   String,
  tel          String,
  process_date UInt32   DEFAULT toYYYYMMDD(today()),
  create_time  DateTime DEFAULT now(),
  update_time  DateTime DEFAULT now()
)
engine=HDFS('hdfs://node1:8020/bdp/tmp/clickhouse/account/info.Parquet', 'Parquet')
;

-- Note: the insert below creates info.Parquet automatically; running it a
-- second time fails because the file already exists on HDFS.
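
If the export ever needs to be re-run, one workaround (assuming the HDFS path used above) is to delete the generated file first:

```shell
# Remove the previously exported file so the insert can recreate it
hdfs dfs -rm /bdp/tmp/clickhouse/account/info.Parquet
```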

1.3 Export the Data

Copy the data from account_info_src (stored in ClickHouse) into account_info_hdfs (backed by HDFS):

insert into account_info_hdfs
select * from account_info_src
;

The ClickHouse data has now been exported to HDFS.
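
To sanity-check the export, the row counts of the two tables can be compared; with the test data above, both queries should return 100:

```sql
select count() from account_info_src;
select count() from account_info_hdfs;
```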

1.4 View the Data in Hive

drop table if exists account_info_hive;
create external table if not exists account_info_hive (
  account_id   BIGINT,
  user_id      STRING,
  open_id      STRING,
  device_id    STRING,
  client_id    STRING,
  car_number   STRING,
  tel          STRING,
  process_date BIGINT,
  create_time  BIGINT,
  update_time  BIGINT
)
stored as parquet
location '/bdp/tmp/clickhouse/account';


drop table if exists account_info_hive2;
create external table if not exists account_info_hive2 (
  account_id   BIGINT,
  user_id      STRING,
  open_id      STRING,
  device_id    STRING,
  client_id    STRING,
  car_number   STRING,
  tel          STRING
)
stored as parquet
location '/bdp/tmp/clickhouse/account2';

The data can now be queried in Hive.
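
For example, a quick spot check against the external table defined above (any Hive query works here):

```sql
select count(*) from account_info_hive;
select account_id, user_id, tel from account_info_hive limit 10;
```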

Notes:

  • After export, the DateTime columns can only be mapped to BIGINT in Hive; declaring them as TIMESTAMP initially caused errors.

  • Permission errors may be reported; if so, give the clickhouse user ownership of the directory:

  • sudo -u hdfs hdfs dfs -chown -R clickhouse:clickhouse /bdp/tmp/clickhouse/account
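
Since ClickHouse DateTime values are epoch seconds, the BIGINT columns can still be rendered as timestamps at query time in Hive; a sketch, assuming the account_info_hive table above:

```sql
-- from_unixtime converts epoch seconds (BIGINT) to a readable timestamp string
select account_id,
       from_unixtime(create_time) as create_time_ts,
       from_unixtime(update_time) as update_time_ts
from account_info_hive
limit 10;
```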

2. Importing HDFS Data into ClickHouse

2.1 Prepare the Data

Reuse the data just exported from ClickHouse.

2.2 Create the ClickHouse Table

drop table if exists account_info_hdfs2;
create table account_info_hdfs2(
  account_id   UInt64,
  user_id      String,
  open_id      String,
  device_id    String,
  client_id    String,
  car_number   String,
  tel          String,
  process_date UInt32   DEFAULT toYYYYMMDD(today()),
  create_time  DateTime DEFAULT now(),
  update_time  DateTime DEFAULT now()
)
engine=HDFS('hdfs://node1:8020/bdp/tmp/clickhouse/account/*.Parquet', 'Parquet')
;

-- Note: *.Parquet matches multiple files in the directory

That is all that is needed; the table reads directly from HDFS.
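
Besides the engine table, ClickHouse also provides an hdfs() table function for ad-hoc reads of the same files; the column structure must be spelled out inline (path and columns as above):

```sql
select count()
from hdfs('hdfs://node1:8020/bdp/tmp/clickhouse/account/*.Parquet',
          'Parquet',
          'account_id UInt64, user_id String, open_id String, device_id String,
           client_id String, car_number String, tel String, process_date UInt32,
           create_time DateTime, update_time DateTime');
```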

To verify the glob behavior, add a second file to the HDFS directory:

hdfs dfs -cp /bdp/tmp/clickhouse/account/info.Parquet /bdp/tmp/clickhouse/account/a.Parquet

Querying the table in ClickHouse now shows twice as many rows.
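
With the test data above, the doubled count can be confirmed directly; after copying the file, this should return 200 instead of 100:

```sql
select count() from account_info_hdfs2;
```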

The end.
