
How to use Hive

1. Using the CLI

Run the hive command directly to enter the client.

2. Using the hiveserver2 service

Modify hdfs-site.xml and core-site.xml:
- In hdfs-site.xml, set dfs.webhdfs.enabled --> true
- In core-site.xml, set hadoop.proxyuser.hadoop.hosts --> *
  and hadoop.proxyuser.hadoop.groups --> *
Start Hive as a background service. Only once it is running as a background service can programs such as JDBC and ODBC clients connect to Hive.

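nohup command &
nohup means "no hang up": the command keeps running after the terminal is closed.
Run: nohup hiveserver2 1>~/hive_std.log 2>~/hive_err.log &
Logs: 0 is standard input, 1 is standard output, and 2 is standard error. The redirections must come before the trailing &, otherwise they are not applied to hiveserver2.
nohup hiveserver2 1>/dev/null 2>/dev/null &
This sends everything to the "black hole" /dev/null, so no logs are kept.
Run jps; a RunJar process in the output means the service started successfully.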
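The stream numbers above can be seen in a small Python sketch. It starts a child process that writes one line to each output stream and sends the two streams to separate files, which is the same split that 1>... 2>... performs in the shell. The child command and the temporary directory are made up for illustration; hiveserver2 itself is not involved.

```python
import os
import subprocess
import sys
import tempfile


def run_with_split_logs(cmd, out_path, err_path):
    """Run cmd with fd 1 written to out_path and fd 2 written to err_path."""
    with open(out_path, "w") as out, open(err_path, "w") as err:
        # stdin=DEVNULL mirrors a background service that reads nothing from fd 0.
        done = subprocess.run(cmd, stdin=subprocess.DEVNULL, stdout=out, stderr=err)
    return done.returncode


# Hypothetical stand-in for a service: one line to stdout, one line to stderr.
child = [sys.executable, "-c",
         "import sys; print('started'); print('warning', file=sys.stderr)"]

log_dir = tempfile.mkdtemp()
std_log = os.path.join(log_dir, "hive_std.log")
err_log = os.path.join(log_dir, "hive_err.log")
exit_code = run_with_split_logs(child, std_log, err_log)

with open(std_log) as f:
    std_text = f.read()
with open(err_log) as f:
    err_text = f.read()
print(repr(std_text), repr(err_text))
```

Pointing both paths at os.devnull would reproduce the /dev/null variant, where nothing is kept.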
Use the beeline client tool to connect to hiveserver2:
1. $ beeline
2. > !connect jdbc:hive2://hadoop02:10000

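Using HQL

Database DDL

Create a database
create database if not exists hadoop;
When creating, if not exists suppresses the error raised if the database already exists.
When dropping, if exists suppresses the error raised if it does not exist.
Both clauses also apply to tables.
List databases
show databases;
Show the database currently in use
select current_database();
Switch database
use dname;
Show database details
desc database dname;
desc database extended dname;
Drop a database
drop database dname;
drop database dname restrict;
These two forms are equivalent (restrict is the default): a database that still contains tables cannot be dropped.
drop database dname cascade;
Drops the database in cascade mode, together with its tables.
Alter a database: rarely used.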

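Table DDL

Create a table
create [external] table [if not exists] table_name (col_name data_type [comment col_comment], ...)

comment table_comment: table comment
partitioned by (col_name data_type, ...)
Partition columns must not appear among the table columns.
clustered by (col_name, ...): bucketing
[sorted by (col_name [asc|desc], ...)]: whether to sort within buckets, and by which columns
into num_buckets buckets: how many buckets the whole table is split into
Bucketing columns must be a subset of the table columns.
row format row_format: which characters delimit fields and rows
row format delimited fields terminated by "," lines terminated by "\n"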
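stored as file_format: which file format to store the data in
file_format:
1. textfile: plain text
2. sequencefile: serialized binary key/value file
3. rcfile: hybrid row-columnar storage file
4. a custom file format
location hdfs_path
The table path can be specified when the table is created.
Both internal and external tables can specify an HDFS storage path.
Best practice: if a data set is already stored on HDFS and is used by multiple clients, use an external table.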
set hive.exec.mode.local.auto=true;
Lets Hive try to run in local mode.
The setting is lost automatically when the session ends or after reset.
Copy a table
create table student_1 like student;
Copies a table's definition, without its data.
CTAS
create table .... as select ...
set property
Shows a configuration setting.

Six table DDL examples

Create an internal table: create table student (id int, name string) row format delimited fields terminated by ',';
Create an external table: create external table studen_ext (id int, name string) row format delimited fields terminated by ',' location '/hive/student';
desc formatted table_name shows that the table type is EXTERNAL_TABLE.
Partitioned tables:
create table student_ptn (id int, name string) partitioned by (city string) row format delimited fields terminated by ',';

create table t01_ptn02 (count int) partitioned by (username string, month string) row format delimited fields terminated by ',';

Add a partition: alter table student_ptn add partition (city="beijing");

city is the partition column. If there were a second partition column such as zip, the directory structure would be /city=beijing/zip=10011.

Partition columns cannot reuse columns that already exist in the table.
For a partitioned table, each partition is a partition directory under the table's directory.
Data files must be placed in the partition directories, not directly in the table directory.

List partitions: show partitions student_ptn;
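The directory layout described above can be sketched in a few lines of Python. The function only builds path strings, one column=value level per partition column. The warehouse location is an assumption (the real one depends on the Hive configuration), and the zip column is the hypothetical second partition column from the note.

```python
def partition_path(table_dir, partition_spec):
    """Build one partition's directory from ordered (column, value) pairs."""
    levels = [f"{col}={value}" for col, value in partition_spec]
    return "/".join([table_dir.rstrip("/")] + levels)


# Assumed table directory, for illustration only.
table_dir = "/user/hive/warehouse/student_ptn"

one_level = partition_path(table_dir, [("city", "beijing")])
two_level = partition_path(table_dir, [("city", "beijing"), ("zip", "10011")])
print(one_level)
print(two_level)
```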
Bucketed table
create table studen_bck (id int, name string) clustered by (id) sorted by (id asc, name desc) into 4 buckets row format delimited fields terminated by ',';
Create a table with CTAS
This creates a table from the result of a query and stores that result in it.
create table studnet_ctas as select * from student where id < 10;
Copy a table structure
create table sut_copy like student;

Whether the source table is internal or external, the new table is an internal table unless external is written before table.

Show commands

show tables;
show tables in dname;
show tables like 'stu*'; -- pattern match; only the * and | wildcards are supported

Show table details
desc student;
desc extended student;
desc formatted student;
show partitions stu; -- show partition information
show functions; -- list functions
desc function extended substring; -- show how a function is used
show create table stu; -- show the full create table statement

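Altering tables

Rename a table
alter table stu rename to new_stu;
Change column definitions
- Add columns
  alter table stu add columns (sex string, age int);
- Change a column definition
  alter table stu change age new_age string;
- Drop a column
  Not supported.
- Replace all columns
  alter table stu replace columns (id int, name string);
  An int column can be changed to string, but a string column cannot be changed to int.
  In hive-1.2.2, however, columns can be replaced with any type.
  Hive is schema on read: the schema is applied when data is read, not when it is written.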
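Schema on read can be illustrated with a small Python model. The stored text never changes; only the schema applied while reading does, and a value that cannot be parsed as the declared type comes back as NULL (None here). This is a simplified sketch that handles only int and string columns in comma-delimited text. The function name and sample line are made up, and it is not Hive's actual reader.

```python
def read_row(line, schema, delimiter=","):
    """Apply a list of (name, type) columns to one stored text line at read time."""
    fields = line.rstrip("\n").split(delimiter)
    row = {}
    for i, (name, col_type) in enumerate(schema):
        raw = fields[i] if i < len(fields) else None
        if raw is None:
            row[name] = None          # column missing from the file reads as NULL
        elif col_type == "int":
            try:
                row[name] = int(raw)
            except ValueError:
                row[name] = None      # text that is not a number reads as NULL
        else:
            row[name] = raw           # string: keep the text as stored
    return row


stored_line = "1111,ss,twelve\n"      # hypothetical file content

as_int = read_row(stored_line, [("id", "int"), ("name", "string"), ("age", "int")])
as_string = read_row(stored_line, [("id", "string"), ("name", "string"), ("age", "string")])
print(as_int)
print(as_string)
```

Reading the same line with a string schema always succeeds, while an int schema can lose values, which matches the int-to-string versus string-to-int asymmetry noted above.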
- Change partition information
  - Add static partitions: alter table stu_ptn add partition (city="chongqing") partition (city="kunming") ......;
  - Modify a partition
    Usually only the storage directory of a partition's data is changed: alter table stu_ptn partition (city='beijing') set location '/stu_ptn_beijing';
  - Drop a partition
    alter table stu_ptn drop partition (city='beijing');
Empty a table
truncate table stu;
Drop a table
drop table stu;

DML: data manipulation language

Importing data

Loading data with load
Hive is schema on read, so any data can be loaded.
- load data local inpath "/home/" into table student;
  Loads data from the local Linux file system into the student table.
  The data file is uploaded to /user/hive/warehouse/student.
- load data inpath "/stu/test.txt" into table stu;
  Loads data from HDFS.
  If the data is already on HDFS, do not create an internal table for it,
  because loading moves the data into the /user/hive/warehouse/ directory,
  and dropping the internal table then deletes the data.
- hadoop fs -put file /user/hive/warehouse/student/
  Uploads the file directly into the Hive table's directory.
- load data local inpath "....." overwrite into table table_name;
  Overwriting load: existing data in the table is replaced.
Inserting data with insert
- insert into student (id, name, sex, age, department) values (1111, 'ss', 'f', 12, 'nn'), (...);
  An insert ... values statement first creates a temporary table, such as values__tmp__table__1, to hold the rows from the statement, and then inserts those rows into the target table.
- insert into table student_c select * from student where age <= 18;

Multi-insert

Create a partitioned table: create table stu_ptn_age (id int, name string, sex string, department string) partitioned by (age int) .....

Split the data in the student table into three groups and insert them into three partitions of the stu_ptn_age table.
When inserting into a partitioned table, the target partition does not need to exist; it is created automatically.

insert into table stu_ptn_age partition (age=18) select id, name, sex, department from student where age <= 18;
insert into table stu_ptn_age partition (age=19) select id, name, sex, department from student where age = 19;
insert into table stu_ptn_age partition (age=20) select id, name, sex, department from student where age >= 20;

This approach is slow, because each statement scans the source table again.

Multi-insert reduces the cost of the job,
mainly by reducing the number of scans of the source table:

  from student
  insert into table stu_ptn_age partition (age=18) select id, name, sex, department where age <= 18
  insert into table stu_ptn_age partition (age=19) select id, name, sex, department where age = 19
  insert into table stu_ptn_age partition (age=20) select id, name, sex, department where age >= 20;
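The saving in source scans can be modelled in plain Python. The first function mimics three separate insert statements, each reading the whole source; the second mimics a multi-insert, which reads the source once and routes every row to the branch whose condition it meets. The sample rows and function names are made up for illustration, and the sketch models only the scan count, not Hive's execution plan.

```python
def separate_inserts(rows):
    """One full pass over the source for each insert statement."""
    conditions = {
        18: lambda age: age <= 18,
        19: lambda age: age == 19,
        20: lambda age: age >= 20,
    }
    partitions = {part: [] for part in conditions}
    scans = 0
    for part, matches in conditions.items():
        scans += 1                      # each statement rescans the source
        for row in rows:
            if matches(row["age"]):
                partitions[part].append(row["id"])
    return partitions, scans


def multi_insert(rows):
    """A single pass; every row is routed to the branch it satisfies."""
    partitions = {18: [], 19: [], 20: []}
    for row in rows:
        if row["age"] <= 18:
            partitions[18].append(row["id"])
        elif row["age"] == 19:
            partitions[19].append(row["id"])
        else:
            partitions[20].append(row["id"])
    return partitions, 1


# Hypothetical source rows.
student = [{"id": 1, "age": 17}, {"id": 2, "age": 19},
           {"id": 3, "age": 22}, {"id": 4, "age": 18}]

slow_result, slow_scans = separate_inserts(student)
fast_result, fast_scans = multi_insert(student)
print(slow_scans, fast_scans, fast_result)
```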

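truncate empties the table's data but does not remove the age=xx partition information.
select * from stu_ptn_age;
The partition column is shown as well.
In queries, a partition column behaves just like an ordinary column.
Partition information is stored in the PARTITIONS table of the metastore.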

Question: what if the real requirement is one partition per age?

Dynamic partition insert

Create a test table: create table stu_ptn_dpt ..... partitioned by (department string) ....
In the default strict mode, this insert fails: insert into table t01_ptn partition (username, month) select count, username, month from table01;
set hive.exec.dynamic.partition.mode=nonstrict;
If a table has several partition columns, a dynamic partition insert in strict mode requires at least one of them to be a static partition. To lift this restriction, set the mode to nonstrict.
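A dynamic partition insert can be pictured as grouping rows by the values of the partition columns and creating a partition for each distinct value as it appears. The Python sketch below models that for the one-partition-per-age question. The sample rows and the function name are made up, and the sketch ignores strict and nonstrict mode entirely.

```python
from collections import defaultdict


def dynamic_partition_insert(rows, partition_columns):
    """Group rows by partition column values, creating partitions on demand."""
    partitions = defaultdict(list)
    for row in rows:
        # The partition name uses the same column=value form as the directories.
        key = "/".join(f"{col}={row[col]}" for col in partition_columns)
        # Partition columns live in the directory name, not in the data files.
        data = {k: v for k, v in row.items() if k not in partition_columns}
        partitions[key].append(data)
    return dict(partitions)


# Hypothetical source rows.
students = [
    {"id": 1, "name": "a", "age": 17},
    {"id": 2, "name": "b", "age": 19},
    {"id": 3, "name": "c", "age": 17},
]

result = dynamic_partition_insert(students, ["age"])
print(sorted(result))
```

No partition value is written by hand here: the set of partitions follows from the data, which is what distinguishes the dynamic form from partition (age=18).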

When inserting data into a partitioned table, avoid load unless you are certain which partition the data belongs to, because it can easily mix up the data inside a partition.

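Exporting data with insert
insert overwrite local directory "/home/hadoop/tem/stu_le18" select * from student where age <= 18;
Be careful with the path, because overwrite replaces whatever is already in that directory.
When viewing the exported data, use sed -e 's/\x01/\t/g' file.txt to replace the default Ctrl+A (\x01) field delimiter with tabs.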
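The same delimiter replacement can be written in Python, which helps when the exported file is processed by a script rather than viewed in a terminal. The sample content below is made up; only the \x01 separator comes from the note above.

```python
def ctrl_a_to_tabs(text):
    """Replace Hive's default Ctrl+A field delimiter with tabs."""
    return text.replace("\x01", "\t")


# Hypothetical exported content: two rows, three fields each.
exported = "1111\x01ss\x0112\n1112\x01tt\x0119\n"

readable = ctrl_a_to_tabs(exported)
first_row = readable.split("\n")[0].split("\t")
print(first_row)
```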

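String substitution: the s command
sed 's/hello/hi/g' sed.txt
##  Replaces hello with hi across the whole line. Without the g flag, only the first hello on each line is replaced.

Multiple edits: the -e option
sed -e '1,5d' -e 's/hello/hi/' sed.txt
##  The -e option allows several commands to run in one invocation. Here the first command deletes lines 1 to 5 and the second replaces hello with hi.
The order of the commands affects the result. If both are substitutions, the first substitution affects the input of the second.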
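The effect of command order can be checked with a short Python sketch that applies substitutions one after another, the way chained -e expressions do. Only the first match of each pattern is replaced, mirroring sed's s command without the g flag. The second substitution pair (hi to hey) is made up to show the interaction.

```python
import re


def apply_in_order(line, substitutions):
    """Apply (pattern, replacement) pairs in sequence, first match only."""
    for pattern, replacement in substitutions:
        line = re.sub(pattern, replacement, line, count=1)
    return line


line = "hello world"

# hello -> hi runs first, so the second substitution sees "hi" and rewrites it.
first = apply_in_order(line, [("hello", "hi"), ("hi", "hey")])

# hi -> hey runs first and finds nothing; hello -> hi then produces the final text.
second = apply_in_order(line, [("hi", "hey"), ("hello", "hi")])

print(first)
print(second)
```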

sed --expression='s/hello/hi/' --expression='/today/d' sed.txt
##  --expression is the long form of -e; each occurrence adds one sed expression.

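Queries

distinct removes duplicates
show functions; lists the built-in functions (271 in version 2.3.3)
UDF: one-to-one function, 1 row in, 1 row out
UDAF: many-to-one (aggregate) function, n rows in, 1 row out
UDTF: one-to-many (table-generating) function, 1 row in, n rows out
update and delete are not supported on ordinary (non-transactional) tables,
because Hive is a data warehouse built for online analytical processing (OLAP) rather than transaction processing.
in and exists are supported
select * from student where age in (18, 19);
Older versions do not support them; Hive recommends using a semi join (left semi join) instead.
case when is supported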

select id, t_job, t_edu, case t_edu
when "碩士" then 1
when "本科" then 2
else 3
end as level
from lagou limit 1,100;
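The query above can be read as a per-row mapping followed by a slice. The Python sketch below mirrors it: the case expression becomes a function with a fallback branch, and limit 1,100 becomes an offset of 1 and at most 100 rows. The two matched values are copied from the query; the sample rows, including the third education value, are made up.

```python
def edu_level(t_edu):
    """Mirror of the case expression: two matched values, everything else is 3."""
    if t_edu == "碩士":
        return 1
    if t_edu == "本科":
        return 2
    return 3


def limit(rows, offset, count):
    """limit offset,count: skip offset rows, then return at most count rows."""
    return rows[offset:offset + count]


# Hypothetical rows standing in for the lagou table.
rows = [
    {"id": 1, "t_edu": "碩士"},
    {"id": 2, "t_edu": "本科"},
    {"id": 3, "t_edu": "大專"},
]

leveled = [dict(row, level=edu_level(row["t_edu"])) for row in rows]
page = limit(leveled, 1, 100)
print([row["level"] for row in leveled], [row["id"] for row in page])
```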

select count(distinct age) from ... join .. on .. where ... group by ... having ... cluster by ... distribute by .. sort by .. order by ... limit ....

order by: global sort. select * from student order by age desc;
sort by
Local sort: the rows within each partition are sorted, but rows with the same age can end up in different partitions because no hashing is applied.
Each SQL statement runs as a MapReduce job. Local sort means that when several reduce tasks run, the output of each reduce task is sorted on its own. With a single reduce task, sort by is equivalent to order by.
set mapreduce.job.reduces=3;
select * from student sort by age desc;
With select * and no distribute by, rows are assigned to partitions at random.
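The difference between order by and sort by can be modelled with plain lists. In the sketch below each inner list stands for the output of one reduce task. Round-robin assignment is used as a deterministic stand-in for the random distribution described above, which is an assumption made only so the example is reproducible. The ages are made up.

```python
def order_by_desc(values):
    """Global sort: a single sorted output."""
    return sorted(values, reverse=True)


def sort_by_desc(values, num_reducers):
    """Local sort: rows are spread over reducers, each reducer sorts its own share."""
    reducers = [[] for _ in range(num_reducers)]
    for i, value in enumerate(values):
        # Round-robin stands in for Hive's random assignment (assumption).
        reducers[i % num_reducers].append(value)
    return [sorted(part, reverse=True) for part in reducers]


ages = [20, 18, 19, 18, 22, 19]

global_sorted = order_by_desc(ages)
local_sorted = sort_by_desc(ages, num_reducers=3)
single_reducer = sort_by_desc(ages, num_reducers=1)

print(global_sorted)
print(local_sorted)
print(single_reducer)
```

Age 18 lands in two different reducer outputs, and with one reducer the local sort matches the global one.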
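distribute by
Bucketing operation
select * from student distribute by age sort by age desc;
Bucketing hashes age and takes the result modulo the number of buckets to decide which bucket a row goes to; the number of buckets equals the number of reduce tasks.
sort by performs a local sort, so the rows within each bucket are sorted.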
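The hash-then-modulo rule can be sketched as follows. The sketch assumes an int column whose hash is the value itself, masked to a non-negative number before the modulo; other column types hash differently, and this is a simplified model rather than Hive's code. The ages are made up.

```python
INT_MAX = 0x7FFFFFFF


def bucket_for(value, num_buckets):
    """Bucket index for an int value: non-negative hash modulo the bucket count."""
    return (value & INT_MAX) % num_buckets


def distribute_and_sort_desc(values, num_reducers):
    """distribute by + sort by desc: route by hash, then sort inside each bucket."""
    buckets = [[] for _ in range(num_reducers)]
    for value in values:
        buckets[bucket_for(value, num_reducers)].append(value)
    return [sorted(bucket, reverse=True) for bucket in buckets]


ages = [20, 18, 19, 18, 22, 19]
buckets = distribute_and_sort_desc(ages, 3)
print(buckets)
```

Unlike sort by alone, every row with the same age now ends up in the same bucket, and each bucket is sorted internally.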
cluster by
cluster by age = distribute by age sort by age;
distribute by id sort by id, age != cluster by id sort by age;
cluster by cannot be combined with sort by.
To hash on one column and then sort each partition on several columns, the only option is to combine distribute by and sort by.
