題記
本文資料來自于拉鉤教育大數(shù)據(jù)高薪訓練營
數(shù)據(jù)庫操作
- Hive有一個默認的數(shù)據(jù)庫default,在操作HQL時,如果不明確的指定要使用哪個庫,則使用默認數(shù)據(jù)庫;
- Hive的數(shù)據(jù)庫名、表名均不區(qū)分大小寫;
- 創(chuàng)建數(shù)據(jù)庫語法
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[MANAGEDLOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];
# example
hive (default)> create database if not exists mydb;
- 查看數(shù)據(jù)庫
# 查看有哪些數(shù)據(jù)庫
hive (default)> show databases;
# 查看數(shù)據(jù)庫的具體信息
hive (default)> desc database mydb;
# 查看更加詳細的信息
hive (default)> desc database extended mydb;
- 使用數(shù)據(jù)庫
hive (default)> use mydb;
- 刪除數(shù)據(jù)庫
# 刪除一個空數(shù)據(jù)庫
hive (default)> drop database databasename;
# 如果數(shù)據(jù)庫不為空,使用 cascade 強制刪除
hive (default)> drop database databasename cascade;
建表語法
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)]
ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
[STORED AS DIRECTORIES]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)
[AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS]
[db_name.]table_name
LIKE existing_table_or_view_name
[LOCATION hdfs_path];
- create table table_name: 按照給定的名稱創(chuàng)建表,若表已經(jīng)在當前數(shù)據(jù)庫存在,會拋出異常,可以通過使用 [if not exists] 來避免異常。
- EXTERNAL關(guān)鍵字。創(chuàng)建外部表,否則創(chuàng)建的是內(nèi)部表(管理表)。
刪除內(nèi)部表時,數(shù)據(jù)和表的定義同時被刪除;
刪除外部表時,僅僅刪除了表的定義,數(shù)據(jù)保留;
在生產(chǎn)環(huán)境中,多使用外部表; - (colName colType) : 標識表字段和類型,其后可以使用comment來對當前字段進行注釋。
- comment也可以對表進行注釋。
- partition by (colName colType,...) : 對表中的數(shù)據(jù)進行分區(qū),指定分區(qū)字段(注意:分區(qū)字段不能和表中的字段相同)。
- clustered by (colName, colName,...) : 創(chuàng)建分桶表,指定分桶字段。
- sorted by :對桶中的一個或多個列排序,較少使用
- 存儲子句:指定字段等數(shù)據(jù)結(jié)構(gòu)存儲數(shù)據(jù)的分隔符
ROW FORMAT DELIMITED
[FIELDS TERMINATED BY char]
[COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char]
[LINES TERMINATED BY char] | SERDE serde_name # serde 指定具體列的序列化
[WITH SERDEPROPERTIES (property_name=property_value,
property_name=property_value, ...)]
- stored as file_format : 數(shù)據(jù)的存儲方式, SEQUENCEFILE|TEXTFILE|RCFILE ,默認是
TEXTFILE,如果數(shù)據(jù)需要壓縮,可以選擇SEQUENCEFILE。 - location hdfs_path:指定數(shù)據(jù)在hdfs上存儲的位置。
- TBLPROPERTIES。定義表的屬性
- AS。后面可以接查詢語句,表示根據(jù)后面的查詢結(jié)果創(chuàng)建表
- LIKE。like 表名,允許用戶復(fù)制現(xiàn)有的表結(jié)構(gòu),但是不復(fù)制數(shù)據(jù)
example
CREATE TABLE page_view(
viewTime INT,
userid BIGINT,
page_url STRING,
referrer_url STRING,
ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
CLUSTERED BY(userid) SORTED BY(viewTime) INTO 32 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
STORED AS SEQUENCEFILE;
外部表和內(nèi)部表
在創(chuàng)建表的時候,可指定表的類型。表有兩種類型,分別是內(nèi)部表(管理表)、外部
表。
- 默認情況下,創(chuàng)建內(nèi)部表。如果要創(chuàng)建外部表,需要使用關(guān)鍵字 external
- 在刪除內(nèi)部表時,表的定義(元數(shù)據(jù)) 和 數(shù)據(jù) 同時被刪除
- 在刪除外部表時,僅刪除表的定義,數(shù)據(jù)被保留
- 在生產(chǎn)環(huán)境中,多使用外部表
# 創(chuàng)建內(nèi)部表,不指定關(guān)鍵字external,創(chuàng)建的是內(nèi)部表
create table table_name(...);
# 創(chuàng)建外部表,指定關(guān)鍵字external
create extarnal table table_name(...);
# 內(nèi)外部表相互轉(zhuǎn)換
alter table t1 set tblproperties('EXTERNAL'='TRUE'); # 內(nèi)部表轉(zhuǎn)外部表
alter table t1 set tblproperties('EXTERNAL'='FALSE'); # 外部表轉(zhuǎn)內(nèi)部表
# 查詢表信息
desc formatted t1;
分區(qū)表
- 創(chuàng)建分區(qū)表并加載數(shù)據(jù)
# 數(shù)據(jù)
2;zhangsan;book,TV,code;beijing:chaoyang,shagnhai:pudong
3;lishi;book,code;nanjing:jiangning,taiwan:taibei
4;wangwu;music,book;heilongjiang:haerbin
# 建表
create table if not exists t1(
id int,
name string,
hobby array<string>,
addr map<string,string>
)
partitioned by (dt string COMMENT 'for date') # 建立分區(qū)
row format delimited
fields terminated by ';'
collection items terminated by ','
map keys terminated by ':';
# 導入數(shù)據(jù)到指定分區(qū)
load data local inpath '/root/data/hive_data/t1.dat' into table t1 partition(dt="20200102");
# 查看分區(qū)
show partitions t1;
- 新增分區(qū)并設(shè)置數(shù)據(jù)
# 新增分區(qū)
alter table t1 add partition(dt='20200103') ...;
# 設(shè)置數(shù)據(jù)
load data local inpath '/root/data/hive_data/t1.dat' into table t1 partition(dt='20200103');
dfs -cp /user/hive/warehouse/mydb.db/t1/dt=20200101/t1.dat /user/hive/warehouse/mydb.db/t1/dt=20200103; # 僅做測試使用
# 修改分區(qū)數(shù)據(jù)文件在HDFS上的位置
alter table t3 partition(dt='2020-06-01') set location '/user/hive/warehouse/t3/dt=2020-06-03';
- 刪除分區(qū)
alter table t1 drop partition(dt=20200104);
分桶表
當單個的分區(qū)或者表的數(shù)據(jù)量過大,分區(qū)不能更細粒度的劃分數(shù)據(jù),就需要使用分桶技術(shù)將數(shù)據(jù)劃分成更細的粒度。將數(shù)據(jù)按照指定的字段進行分成多個桶中去,即將數(shù)據(jù)按照字段進行劃分,數(shù)據(jù)按照字段劃分到多個文件當中去。分桶的原理:
- MR中:key.hashCode % reductTask
- Hive中:分桶字段.hashCode % 分桶個數(shù)
創(chuàng)建分桶表且加載數(shù)據(jù)
# 創(chuàng)建分桶表
hive (mydb)> create table if not exists course(
> id int,
> name string,
> score int
> )
> clustered by (id) into 3 buckets
> row format delimited
> fields terminated by '\t';
# 加載數(shù)據(jù)
hive (mydb)> load data local inpath '/root/data/hive_data/course.dat' into table course;
FAILED: SemanticException Please load into an intermediate table and use 'insert... select' to allow Hive to enforce bucketing. Load into bucketed tables are disabled for safety reasons. If you know what you are doing, please sethive.strict.checks.bucketing to false and that hive.mapred.mode is not set to 'strict' to proceed. Note that if you may get errors or incorrect results if you make a mistake while using some of the unsafe features.
# 說明不能使用給普通表加載數(shù)據(jù)的方式給分桶表加載數(shù)據(jù),方法如下:
# 先創(chuàng)建結(jié)構(gòu)相同的普通表
hive (mydb)> create table if not exists course_common(
> id int,
> name string,
> score int
> )
> row format delimited
> fields terminated by '\t';
# 給普通表加載數(shù)據(jù)
hive (mydb)> load data local inpath '/root/data/hive_data/course.dat' into table course_common;
# 使用insert ... into ...select
hive (mydb)> insert into table course select * from course_common;
修改表&刪除表
# 修改表名
hive (mydb)> alter table course_common rename to course_common1;
# 修改列名
hive (mydb)> alter table course_common change column id cid int;
# 修改字段類型
hive (mydb)> alter table course_common change column cid cid string;
# 修改字段類型時,需要注意類型不能溢出的細節(jié),如下將string轉(zhuǎn)為int
hive (mydb)> alter table course_common change column cid cid int;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. The following columns have types incompatible with the existing columns in their respective positions :
cid
# 增加字段
hive (mydb)> alter table course_common add columns (height float);
# 刪除字段
# 注意是把需要留下的字段寫在columns中
hive (mydb)> alter table course_common replace columns (cid string, name string, score int);
# 刪除表
hive (mydb)> drop table course_common;
數(shù)據(jù)導入
基本語法
LOAD DATA [LOCAL] INPATH 'filepath'
[OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1,
partcol2=val2 ...)]
- LOCAL:
- LOAD DATA LOCAL ... 從本地文件系統(tǒng)加載數(shù)據(jù)到Hive表中。本地文件會拷
貝到Hive表指定的位置 - LOAD DATA ... 從HDFS加載數(shù)據(jù)到Hive表中。HDFS文件移動到Hive表指定
的位置
- LOAD DATA LOCAL ... 從本地文件系統(tǒng)加載數(shù)據(jù)到Hive表中。本地文件會拷
- INPATH:加載數(shù)據(jù)的路徑
- OVERWRITE:覆蓋表中已有數(shù)據(jù);否則表示追加數(shù)據(jù)
- PARTITION:將數(shù)據(jù)加載到指定的分區(qū)
example
# 創(chuàng)建表
hive (mydb)> create table tabA(
> id int,
> name string,
> area string
> )
> row format delimited
> fields terminated by ",";
# 從本地導入數(shù)據(jù)
hive (mydb)> load data local inpath '/root/data/hive_data/sourceA.txt' into table tabA;
# 從hdfs導入數(shù)據(jù)
hive (mydb)> load data inpath '/test/data/sourceA.txt' into table tabA;
# 加載數(shù)據(jù)覆蓋表中數(shù)據(jù)
hive (mydb)> load data inpath '/test/data/sourceA.txt' overwrite into table tabA;
# 通過Location加載數(shù)據(jù)
hive (mydb)> create table tabA(
> id int,
> name string,
> area string
> )
> row format delimited
> fields terminated by ","
> location '/test/data/sourceA.txt';
插入數(shù)據(jù)
# 創(chuàng)建表
hive (mydb)> create table tabC(
> id int,
> name string,
> area string
> )
> partitioned by (month string)
> row format delimited
> fields terminated by ",";
# 插入數(shù)據(jù)
hive (mydb)> insert into table tabC partition(month='202001') values (1, 'wangwu', 'gz'), (2, 'lisi', 'sh');
# 插入查詢的數(shù)據(jù)
hive (mydb)> insert into table tabC partition(month='202002')
> select id,name,area from tabC where month='202001';
# 多表多分區(qū)查詢
from tabC
insert overwrite table tabC partition(month='202003')
select id, name, area where month='202002'
insert overwrite table tabC partition(month='202004')
select id, name, area where month='202002';
# 創(chuàng)建并插入數(shù)據(jù)(只是復(fù)制數(shù)據(jù),創(chuàng)建的是普通表,不會復(fù)制分區(qū)表結(jié)構(gòu))
create table if not exists tabD as select * from tabC;
# import 導入數(shù)據(jù)
import table student2 partition(month='201709')
from '/user/hive/warehouse/export/student';
數(shù)據(jù)導出
# 數(shù)據(jù)導出本地覆蓋文件夾下的所有數(shù)據(jù)
hive (mydb)> insert overwrite local directory '/root/data/hive_data' select *from tabC;
# 數(shù)據(jù)導出具有格式的數(shù)據(jù)
hive (mydb)> insert overwrite local directory '/root/data/hive_data'
row format delimited fields terminated by ' '
select *from tabC;
# 數(shù)據(jù)導出到Hdfs
hive (mydb)> insert overwrite directory '/root/data/hive_data' select *from tabC;
# export 導出數(shù)據(jù)到HDFS。使用export導出數(shù)據(jù)時,不僅有數(shù)還有表的元數(shù)據(jù)信息
hive (mydb)> export table tabC to '/user/hadoop/data/tabC4';
# export 導出的數(shù)據(jù),可以使用 import 命令導入到 Hive 表中
hive (mydb)> import table tabE from ''/user/hadoop/data/tabC4';
# 使用like tname創(chuàng)建的表結(jié)構(gòu)與原表一致。create ... as select ... 結(jié)構(gòu)可能不一致
hive (mydb)> create table tabE like tabc;
刪除表數(shù)據(jù)
# 截斷表,清空數(shù)據(jù)。(注意:僅能操作內(nèi)部表)
truncate table tabE;
# 以下語句報錯,外部表不能執(zhí)行 truncate 操作
alter table tabC set tblproperties("EXTERNAL"="TRUE");
truncate table tabC;