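Exercise:
Requirement: a file named score.csv is stored on the cluster in the directory /scoredatas/month=201806. A new file is generated every day and placed in the folder for the corresponding date. Other people also use the file, so it must not be moved. Task: create a matching Hive table, load the data into it, and run statistical analysis on it; dropping the table must not delete the data.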

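1) An external table, so that dropping the table removes only the metadata and leaves the files where they are
2) A partitioned table, partitioned by the month field
3) The table's storage location set with LOCATION, pointing at /scoredatas (the parent of the month=201806 folder)
After creating the table, repair it so that Hive discovers the partitions that already exist on disk: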
msck repair table score4;
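The repair step is needed because the month=201806 folders are created outside Hive, so the metastore does not know about them. The toy Python model below illustrates the idea: scan the table location for key=value directories and report the partitions the metastore does not list yet. The helper names are hypothetical and this is not Hive's implementation, only a sketch of what MSCK REPAIR TABLE looks for.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def partition_spec(path: str, table_location: str) -> dict[str, str] | None:
    """Partition spec encoded in a directory path below the table location.

    '/scoredatas/month=201806' under '/scoredatas' -> {'month': '201806'}.
    Returns None when a path component is not of the form key=value.
    """
    rel = PurePosixPath(path).relative_to(table_location)
    spec: dict[str, str] = {}
    for part in rel.parts:
        key, sep, value = part.partition("=")
        if not sep or not key:
            return None  # e.g. a stray folder such as 'tmp'
        spec[key] = value
    return spec or None


def partitions_to_add(dirs, table_location: str, known: list[dict[str, str]]):
    """Partitions found on storage that the (modelled) metastore does not list yet."""
    missing: list[dict[str, str]] = []
    for d in dirs:
        spec = partition_spec(d, table_location)
        if spec is not None and spec not in known and spec not in missing:
            missing.append(spec)
    return missing
```

For example, with directories month=201806, month=201807 and tmp under /scoredatas and only month=201806 registered, `partitions_to_add` reports just `{'month': '201807'}`; the tmp folder is ignored because it is not a key=value directory.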
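4) Bucketed tables

Rows are split into different files according to the bucketing column (comparable to the partitioner in Hadoop MapReduce, which decides which reducer, and therefore which output file, each key goes to).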
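As a rough model of "different rows go to different files": each row's bucket is a hash of the bucketing column modulo the number of buckets, so rows with the same c_id always land in the same file. The sketch below uses CRC-32 purely as a stable stand-in hash (an assumption for illustration); Hive's real bucketing hash depends on the column type and Hive version, so these bucket numbers will not match the files Hive writes.

```python
import zlib


def bucket_of(key: str, num_buckets: int) -> int:
    """Bucket index = hash(key) mod num_buckets (CRC-32 is an illustrative hash only)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def split_into_buckets(rows, key_index: int, num_buckets: int):
    """Distribute rows into num_buckets lists; list i stands for bucket file i."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_of(row[key_index], num_buckets)].append(row)
    return buckets


# Hypothetical course rows: (c_id, c_name, t_id)
course_rows = [("01", "chinese", "02"), ("02", "math", "01"),
               ("03", "english", "03"), ("01", "chinese", "02")]
buckets = split_into_buckets(course_rows, key_index=0, num_buckets=3)
```

Both rows with c_id "01" end up in the same list, which is the property that makes bucketed joins and sampling possible.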
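Enable bucketing in Hive (needed on Hive 1.x; since Hive 2.0 bucketing is always enforced and this property has been removed):
set hive.enforce.bucketing=true;
Set the number of reducers to match the number of buckets:
set mapreduce.job.reduces=3;
Create the bucketed table: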
create table course (c_id string,c_name string,t_id string) clustered by(c_id) into 3 buckets row format delimited fields terminated by '\t';
A bucketed table cannot be filled directly with LOAD DATA; the data has to go through an intermediate table.
Create an ordinary table:
create table course_common (c_id string,c_name string,t_id string) row format delimited fields terminated by '\t';
Load the data into the ordinary table:
load data local inpath '/export/servers/hivedatas/course.csv' into table course_common;
Load the data into the bucketed table with INSERT OVERWRITE:
insert overwrite table course select * from course_common cluster by(c_id);
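5) Altering tables
Rename a table:
alter table score4 rename to score5;
Add and modify columns:
(1) Add columns
alter table score5 add columns (mycol string, mysco string);
(2) Change a column (rename mysco to mysconew and change its type to int)
alter table score5 change column mysco mysconew int;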
6) Dropping tables
drop table score5;
7) Loading data into Hive
Data can be brought into a table either with LOAD or with a query.
Loading data with LOAD:
load data local inpath '/export/servers/hivedatas/score.csv' overwrite into table score partition(month='201806');
Loading data with a query:
create table score4 like score;
insert overwrite table score4 partition(month = '201806') select s_id,c_id,s_score from score;
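Both statements above use OVERWRITE with a fixed partition, so only the month=201806 partition is replaced and other months are left alone; without OVERWRITE, LOAD appends. A toy Python model of that rule, with a hypothetical function name and a dict standing in for the partition directories (file-name collisions are ignored in this model):

```python
from __future__ import annotations


def load_into_partition(table: dict[str, list[str]], partition: str,
                        new_files: list[str], overwrite: bool) -> None:
    """Toy model of loading files into one partition of a partitioned table.

    `table` maps a partition directory such as 'month=201806' to its data files.
    overwrite=True replaces that partition's files; otherwise the new files are
    appended. Other partitions are never touched.
    """
    if overwrite:
        table[partition] = list(new_files)
    else:
        table.setdefault(partition, []).extend(new_files)
```

Loading score.csv twice with INTO would leave two files in the partition, while OVERWRITE always leaves exactly the newly loaded data.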
8) Exporting data
(1) Export query results to the local file system:
insert overwrite local directory '/export/servers/exporthive' select * from score;
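Without a ROW FORMAT clause, INSERT OVERWRITE LOCAL DIRECTORY writes plain text with columns separated by ^A (\001) and NULL written as \N, which is hard to read by eye (from Hive 0.11.0 a clause such as row format delimited fields terminated by '\t' can be added to choose another separator). The Python sketch below, with hypothetical helper names and assuming UTF-8 output of primitive columns only, shows how such files split back into columns.

```python
from pathlib import Path

FIELD_SEP = "\x01"   # ^A: default column separator when no ROW FORMAT is given
NULL_MARKER = "\\N"  # how NULL is written in Hive's default text format


def parse_exported_lines(lines):
    """Split exported text lines into column lists, mapping the NULL marker to None."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split(FIELD_SEP)
        rows.append([None if f == NULL_MARKER else f for f in fields])
    return rows


def read_export_dir(export_dir: str):
    """Parse every visible data file in an export directory.

    Hidden files (names starting with '.' or '_', e.g. checksum files) are skipped.
    """
    rows = []
    for path in sorted(Path(export_dir).iterdir()):
        if path.is_file() and not path.name.startswith((".", "_")):
            with path.open(encoding="utf-8") as f:
                rows.extend(parse_exported_lines(f))
    return rows
```

Calling `read_export_dir('/export/servers/exporthive')` would then return one list of strings per exported score row.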
9) Truncating table data
Only managed (internal) tables can be truncated; TRUNCATE fails on external tables.
truncate table score6;
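The drop and truncate rules above, together with the exercise's requirement that dropping the table keeps the data, come down to who owns the files. The toy Python model below uses a hypothetical class and method names and is not Hive's code; in real Hive, dropped managed data may also be moved to the trash rather than deleted outright.

```python
from dataclasses import dataclass, field


@dataclass
class Warehouse:
    """Toy model: metastore (table name -> metadata) plus file system (directory -> files)."""
    metastore: dict = field(default_factory=dict)
    fs: dict = field(default_factory=dict)

    def create_table(self, name: str, location: str, external: bool) -> None:
        self.metastore[name] = {"location": location, "external": external}
        self.fs.setdefault(location, [])  # files already at the location are kept

    def drop_table(self, name: str) -> None:
        meta = self.metastore.pop(name)  # metadata is always removed
        if not meta["external"]:
            self.fs.pop(meta["location"], None)  # managed table: its data goes too

    def truncate_table(self, name: str) -> None:
        meta = self.metastore[name]
        if meta["external"]:
            raise ValueError(f"{name} is external; only managed tables can be truncated")
        self.fs[meta["location"]] = []  # table definition stays, data files are removed
```

Dropping an external table such as score4 at /scoredatas leaves the score.csv files in place, which is exactly why the exercise asks for an external table.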
2) Hive query syntax

SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list [HAVING condition]]              #grouping
[ORDER BY col_list]                                 #ORDER BY: global sort
[CLUSTER BY col_list                                #CLUSTER BY: distribute and sort by the same columns
  | [DISTRIBUTE BY col_list] [SORT BY col_list]     #SORT BY: local sort within each reducer
]
[LIMIT number]                                      #limit the number of rows returned
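The clauses in this template are applied in a fixed logical order: WHERE filters rows, GROUP BY forms groups, HAVING filters groups, then ORDER BY and LIMIT shape the result. A toy Python pipeline (hypothetical function and sample data, not Hive's planner) that follows that order:

```python
from collections import defaultdict


def run_query(rows, where, group_key, aggregate, having, sort_key, limit):
    """Apply the clauses in logical order: WHERE -> GROUP BY -> HAVING -> ORDER BY -> LIMIT."""
    groups = defaultdict(list)
    for row in rows:
        if where(row):                      # WHERE filters individual rows
            groups[group_key(row)].append(row)
    result = [(key, aggregate(g)) for key, g in groups.items()]  # GROUP BY + aggregate
    result = [r for r in result if having(r)]                     # HAVING filters groups
    result.sort(key=sort_key)                                     # ORDER BY
    return result[:limit]                                         # LIMIT


# Hypothetical score rows: (s_id, c_id, s_score)
score = [("01", "01", 80), ("01", "02", 90), ("02", "01", 70),
         ("02", "02", 60), ("03", "01", 50), ("03", "02", 95)]

# Roughly: SELECT s_id, avg(s_score) FROM score WHERE s_score >= 60
#          GROUP BY s_id HAVING avg(s_score) > 70 ORDER BY avg(s_score) DESC LIMIT 2
top = run_query(score,
                where=lambda r: r[2] >= 60,
                group_key=lambda r: r[0],
                aggregate=lambda g: sum(r[2] for r in g) / len(g),
                having=lambda r: r[1] > 70,
                sort_key=lambda r: -r[1],
                limit=2)
```

Here `top` is `[("03", 95.0), ("01", 85.0)]`: the score of 50 is dropped by WHERE before averaging, and student 02 (average 65.0) is dropped by HAVING.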

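8) Joins
Hive originally supported only equi-joins (conditions of the form a.col = b.col) in the ON clause; more complex, non-equi join conditions are supported only from Hive 2.2.0 onward.
Inner join (INNER JOIN): a row is kept only when both tables contain data matching the join condition.
Left outer join (LEFT OUTER JOIN): the left table is the base; every left row is returned, with NULLs where the right table has no match.
Right outer join (RIGHT OUTER JOIN): the right table is the base; every right row is returned, with NULLs where the left table has no match.
Full outer join (FULL OUTER JOIN): both tables are the base; all rows from both are returned, with NULLs wherever a row has no match.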
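The four join types differ only in which unmatched rows survive. A toy Python equi-join (hypothetical function and sample data) makes the difference concrete; None plays the role of the NULL-padded side, and NULL keys never match, as in SQL.

```python
def equi_join(left, right, left_key, right_key, how="inner"):
    """Equi-join two row lists; returns (left_row, right_row) pairs.

    how is 'inner', 'left', 'right' or 'full'; None stands for the NULL-padded
    side of an unmatched row. None keys never match, as with NULL in SQL.
    """
    if how not in ("inner", "left", "right", "full"):
        raise ValueError(f"unknown join type: {how}")
    index = {}
    for j, rrow in enumerate(right):
        k = right_key(rrow)
        if k is not None:
            index.setdefault(k, []).append(j)
    out, matched = [], set()
    for lrow in left:
        k = left_key(lrow)
        hits = index.get(k, []) if k is not None else []
        for j in hits:
            out.append((lrow, right[j]))
            matched.add(j)
        if not hits and how in ("left", "full"):
            out.append((lrow, None))  # keep unmatched left row
    if how in ("right", "full"):
        out.extend((None, rrow) for j, rrow in enumerate(right) if j not in matched)
    return out


# Hypothetical data: student (s_id, s_name) and score (s_id, c_id, s_score)
student = [("01", "Zhao"), ("02", "Qian"), ("04", "Li")]
score = [("01", "01", 80), ("02", "01", 70), ("03", "01", 60)]
```

Joining on s_id, the inner join yields 2 pairs, the left join adds (("04", "Li"), None), the right join adds (None, ("03", "01", 60)), and the full join returns all 4.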
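9) Sorting
Global sort (ORDER BY): uses a single reducer, so the entire result is in one total order.
Local sort (SORT BY): sorts the rows within each reducer only; set the number of reducers (for example set mapreduce.job.reduces=3;) to get several individually sorted outputs.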
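The difference between the two can be modelled in a few lines of Python (hypothetical function names). One assumption is labelled in the code: SORT BY alone does not say which reducer a row goes to, so rows are dealt round-robin here purely for illustration.

```python
def order_by(rows, key):
    """ORDER BY: every row goes to a single reducer, giving one totally ordered output."""
    return [sorted(rows, key=key)]


def sort_by(rows, key, num_reducers):
    """SORT BY: each reducer sorts only the rows it received.

    Assumption: rows are dealt round-robin; Hive promises no particular
    assignment when DISTRIBUTE BY is absent.
    """
    parts = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        parts[i % num_reducers].append(row)
    return [sorted(p, key=key) for p in parts]


s_scores = [90, 50, 70, 80, 60, 100]
global_out = order_by(s_scores, key=lambda x: x)
local_out = sort_by(s_scores, key=lambda x: x, num_reducers=3)
```

`global_out` is one sorted list, while `local_out` is `[[80, 90], [50, 60], [70, 100]]`: each reducer's file is sorted, but reading the files one after another does not give a sorted sequence.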
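10) Partitioned sorting:
DISTRIBUTE BY: decides which reducer each row is sent to based on the given columns, so all rows with the same DISTRIBUTE BY values go to the same reducer; it is usually combined with SORT BY and must be written before SORT BY.
11) CLUSTER BY:
When DISTRIBUTE BY and SORT BY use the same columns, CLUSTER BY can be used instead. CLUSTER BY sorts only in ascending order; ASC/DESC cannot be specified.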
select * from score cluster by s_id;
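The equivalence can be sketched as a toy Python model: route each row to a reducer by hashing its key, then sort inside each reducer. Function names are hypothetical, and CRC-32 is only an illustrative hash, not the one Hive uses.

```python
import zlib


def reducer_for(key, num_reducers: int) -> int:
    """Reducer index for a DISTRIBUTE BY key (CRC-32 is an illustrative hash, not Hive's)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_reducers


def distribute_sort_by(rows, dist_key, sort_key, num_reducers: int):
    """DISTRIBUTE BY dist_key SORT BY sort_key: route rows by key, then sort inside each reducer."""
    parts = [[] for _ in range(num_reducers)]
    for row in rows:
        parts[reducer_for(dist_key(row), num_reducers)].append(row)
    return [sorted(p, key=sort_key) for p in parts]


def cluster_by(rows, key, num_reducers: int):
    """CLUSTER BY key: shorthand for DISTRIBUTE BY key SORT BY key (ascending)."""
    return distribute_sort_by(rows, key, key, num_reducers)


# Hypothetical score rows: (s_id, c_id, s_score)
score = [("01", "01", 80), ("02", "01", 70), ("01", "02", 90),
         ("03", "01", 60), ("02", "02", 60)]
parts = cluster_by(score, key=lambda r: r[0], num_reducers=3)
```

Every s_id ends up in exactly one reducer's output, and each output is sorted by s_id, which is what select * from score cluster by s_id produces per reducer.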

Hive Basics (8): Test Questions
