查詢?nèi)w學(xué)生的學(xué)號與姓名
hive> select Sno,Sname from student;
查詢選修了課程的學(xué)生姓名
hive> select distinct Sname from student inner join sc on student.Sno=Sc.Sno;
----hive的group by 和集合函數(shù)
查詢學(xué)生的總?cè)藬?shù)
hive> select count(distinct Sno)count from student;
計(jì)算1號課程的學(xué)生平均成績
hive> select avg(distinct Grade) from sc where Cno=1;
查詢各科成績平均分
hive> select Cno,avg(Grade) from sc group by Cno;
查詢選修1號課程的學(xué)生最高分?jǐn)?shù)
select Grade from sc where Cno=1 sort by Grade desc limit 1;
(注意比較:select * from sc where Cno=1 sort by Grade
select Grade from sc where Cno=1 order by Grade)
求各個(gè)課程號及相應(yīng)的選課人數(shù)
hive> select Cno,count(1) from sc group by Cno;
查詢選修了3門以上的課程的學(xué)生學(xué)號
hive> select Sno from (select Sno,count(Cno) CountCno from sc group by Sno)a where a.CountCno>3;
或 hive> select Sno from sc group by Sno having count(Cno)>3;
hive的Order By/Sort By/Distribute By
Order By ,在strict 模式下(hive.mapred.mode=strict),order by 語句必須跟著limit語句,但是在nonstrict下就不是必須的,這樣做的理由是必須有一個(gè)reduce對最終的結(jié)果進(jìn)行排序,如果最后輸出的行數(shù)過多,一個(gè)reduce需要花費(fèi)很長的時(shí)間。
查詢學(xué)生信息,結(jié)果按學(xué)號全局有序
hive> set hive.mapred.mode=strict; <默認(rèn)nonstrict>
hive> select Sno from student order by Sno;//必須先設(shè)置為nonstrict或者跟著limit語句
FAILED: Error in semantic analysis: 1:33 In strict mode, if ORDER BY is specified, LIMIT must also be specified. Error encountered near token 'Sno'
Sort By,它通常發(fā)生在每一個(gè)redcue里,“order by” 和“sort by"的區(qū)別在于,前者能給保證輸出都是有順序的,而后者如果有多個(gè)reduce的時(shí)候只是保證了輸出的部分有序。set mapred.reduce.tasks=<number>在sort by可以指定,在用sort by的時(shí)候,如果沒有指定列,它會(huì)隨機(jī)的分配到不同的reduce里去。distribute by 按照指定的字段對數(shù)據(jù)進(jìn)行劃分到不同的輸出reduce中
此方法會(huì)根據(jù)性別劃分到不同的reduce中 ,然后按年齡排序并輸出到不同的文件中。
查詢學(xué)生信息,按性別分區(qū),在分區(qū)內(nèi)按年齡有序
hive> set mapred.reduce.tasks=2;//設(shè)置兩個(gè)reduce 和性別對齊
hive> insert overwrite local directory '/home/hadoop/out' //導(dǎo)出排序結(jié)果 也可以直接導(dǎo)出到某個(gè)目錄的文件
select * from student distribute by Sex sort by Sage;//sort區(qū)內(nèi)排序
Join查詢,join只支持等值連接
查詢每個(gè)學(xué)生及其選修課程的情況
hive> select student.*,sc.* from student join sc on (student.Sno =sc.Sno);
查詢學(xué)生的得分情況。
hive>select student.Sname,course.Cname,sc.Grade from student
join sc on student.Sno=sc.Sno
join course on sc.cno=course.cno;
查詢選修2號課程且成績在90分以上的所有學(xué)生。---//學(xué)生表和成績表做join 過了才是限制條件
hive> select student.Sname,sc.Grade from student
join sc on student.Sno=sc.Sno
where sc.Cno=2 and sc.Grade>90;
LEFT,RIGHT 和 FULL OUTER JOIN ,inner join, left semi join
查詢所有學(xué)生的信息,如果在成績表中有成績,則輸出成績表中的課程號
hive> select student.Sname,sc.Cno from student left outer join sc on student.Sno=sc.Sno;
如果student的sno值對應(yīng)的sc在中沒有值,則會(huì)輸出student.Sname null.如果用right out join會(huì)保留右邊的值,左邊的為null。
Join 發(fā)生在WHERE 子句之前。如果你想限制 join 的輸出,應(yīng)該在 WHERE 子句中寫過濾條件——或是在join 子句中寫。
LEFT SEMI JOIN
Hive 當(dāng)前沒有實(shí)現(xiàn) IN/EXISTS 子查詢,可以用 LEFT SEMI JOIN 重寫子查詢語句
重寫以下子查詢?yōu)長EFT SEMI JOIN
SELECT a.key, a.value
FROM a
WHERE a.key exist in
(SELECT b.key
FROM B);//不支持exist子查詢
可以被重寫為:
SELECT a.key, a.val
FROM a LEFT SEMI JOIN b on (a.key = b.key) //只返回innor join左邊部分
查詢與“劉晨”在同一個(gè)系學(xué)習(xí)的學(xué)生---兩個(gè)表自己join自己
hive> select s1.Sname from student s1 left semi join student s2 on s1.Sdept=s2.Sdept and s2.Sname='劉晨';
注意比較:
select * from student s1 left join student s2 on s1.Sdept=s2.Sdept and s2.Sname='劉晨';
select * from student s1 right join student s2 on s1.Sdept=s2.Sdept and s2.Sname='劉晨';
select * from student s1 inner join student s2 on s1.Sdept=s2.Sdept and s2.Sname='劉晨';
select * from student s1 left semi join student s2 on s1.Sdept=s2.Sdept and s2.Sname='劉晨';
select s1.Sname from student s1 right semi join student s2 on s1.Sdept=s2.Sdept and s2.Sname='劉晨';
保存數(shù)據(jù)的幾種方式
保存select查詢結(jié)果的幾種方式:
1、將查詢結(jié)果保存到一張新的hive表中
create table t_tmp
as
select * from t_p;
2、將查詢結(jié)果保存到一張已經(jīng)存在的hive表中
insert into table t_tmp
select * from t_p;
3、將查詢結(jié)果保存到指定的文件目錄(可以是本地,也可以是hdfs)
insert overwrite local directory '/home/hadoop/test'
select * from t_p;
insert overwrite directory '/aaa/test'
select * from t_p;
數(shù)據(jù)準(zhǔn)備
stu
95001,李勇,男,20,CS
95002,劉晨,女,19,IS
95003,王敏,女,22,MA
95004,張立,男,19,IS
95005,劉剛,男,18,MA
95006,孫慶,男,23,CS
95007,易思玲,女,19,MA
95008,李娜,女,18,CS
95009,夢圓圓,女,18,MA
95010,孔小濤,男,19,CS
95011,包小柏,男,18,MA
95012,孫花,女,20,CS
95013,馮偉,男,21,CS
95014,王小麗,女,19,CS
95015,王君,男,18,MA
95016,錢國,男,21,MA
95017,王風(fēng)娟,女,18,IS
95018,王一,女,19,IS
95019,邢小麗,女,19,IS
95020,趙錢,男,21,IS
95021,周二,男,17,MA
95022,鄭明,男,20,MA
sc
95001,1,81
95001,2,85
95001,3,88
95001,4,70
95002,2,90
95002,3,80
95002,4,71
95002,5,60
95003,1,82
95003,3,90
95003,5,100
95004,1,80
95004,2,92
95004,4,91
95004,5,70
95005,1,70
95005,2,92
95005,3,99
95005,6,87
95006,1,72
95006,2,62
95006,3,100
95006,4,59
95006,5,60
95006,6,98
95007,3,68
95007,4,91
95007,5,94
95007,6,78
95008,1,98
95008,3,89
95008,6,91
95009,2,81
95009,4,89
95009,6,100
95010,2,98
95010,5,90
95010,6,80
95011,1,81
95011,2,91
95011,3,81
95011,4,86
95012,1,81
95012,3,78
95012,4,85
95012,6,98
95013,1,98
95013,2,58
95013,4,88
95013,5,93
95014,1,91
95014,2,100
95014,4,98
95015,1,91
95015,3,59
95015,4,100
95015,6,95
95016,1,92
95016,2,99
95016,4,82
95017,4,82
95017,5,100
95017,6,58
95018,1,95
95018,2,100
95018,3,67
95018,4,78
95019,1,77
95019,2,90
95019,3,91
95019,4,67
95019,5,87
95020,1,66
95020,2,99
95020,5,93
95021,2,93
95021,5,91
95021,6,99
95022,3,69
95022,4,93
95022,5,82
95022,6,100
course
1,數(shù)據(jù)庫
2,數(shù)學(xué)
3,信息系統(tǒng)
4,操作系統(tǒng)
5,數(shù)據(jù)結(jié)構(gòu)
6,數(shù)據(jù)處理
stu-sc-course-ddl
create table student(Sno int,Sname string,Sex string,Sage int,Sdept string)row format delimited fields terminated by ','stored as textfile;
create table course(Cno int,Cname string) row format delimited fields terminated by ',' stored as textfile;
create table sc(Sno int,Cno int,Grade int)row format delimited fields terminated by ',' stored as textfile;
load data local inpath '/home/hadoop/hivedata/students.txt' overwrite into table student;
load data local inpath '/home/hadoop/hivedata/sc.txt' overwrite into table sc;
load data local inpath '/home/hadoop/hivedata/course.txt' overwrite into table course;