Clean the raw data with MapReduce and load it into Hive, query it in Hive and write the result set into a second Hive table, then export that table to MySQL with Sqoop.
1. Create a table in Hive
create external table mydb.access(ip string,day string,url string,upflow string) row format delimited fields terminated by ',';
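2. Load the cleaned data into the newly created table (load data inpath moves the files out of /hive/output/ rather than copying them)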
load data inpath '/hive/output/' into table mydb.access;
3. Create a second table to hold the result set
create external table mydb.upflow (ip string,sum string) row format delimited fields terminated by ',';
4. Store the query result in the result table
insert into mydb.upflow select ip, sum(upflow) as sum from mydb.access group by ip order by sum desc;
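To make the aggregation in step 4 concrete, here is a minimal Python sketch of what the query computes: group the rows by ip, sum upflow, and sort by the total in descending order. The function name and the sample rows are hypothetical and only illustrate the logic. The sketch converts upflow to float on the assumption that Hive casts the string column to a double before summing.

```python
from collections import defaultdict


def total_upflow_by_ip(rows):
    """Mimic: select ip, sum(upflow) ... group by ip order by sum desc."""
    totals = defaultdict(float)
    for ip, day, url, upflow in rows:
        # upflow is stored as a string, so convert it before adding.
        totals[ip] += float(upflow)
    # Sort by the summed value, largest first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical rows in the column order of mydb.access: ip, day, url, upflow.
sample_rows = [
    ("10.0.0.1", "2020-01-01", "/a", "100"),
    ("10.0.0.2", "2020-01-01", "/b", "50"),
    ("10.0.0.1", "2020-01-02", "/c", "25"),
]

result = total_upflow_by_ip(sample_rows)
print(result)
```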
5. In MySQL, create a table to hold the result set
create table upflow (
ip varchar(200),
sum varchar(200)
);
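Because sum is declared as varchar here, ordering by that column in MySQL compares text rather than numbers. The short Python sketch below shows the difference with hypothetical stored values. It is only an illustration of the pitfall, not part of the pipeline. Casting the column to a numeric type in the query, or declaring it as a numeric type, avoids the problem.

```python
def sort_totals_desc(values):
    """Sort numeric strings by their numeric value, largest first."""
    return sorted(values, key=float, reverse=True)


# Hypothetical totals as they would be stored in a varchar column.
stored = ["9.0", "125.0", "50.0"]

# Plain string comparison, which is how a text column is ordered.
lexicographic = sorted(stored, reverse=True)

# Comparison by numeric value, which is usually what is wanted.
numeric = sort_totals_desc(stored)

print(lexicographic)
print(numeric)
```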
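6. Use Sqoop to export the result set from Hive into the MySQL table (the export directory below assumes the default Hive warehouse location; run describe formatted mydb.upflow to confirm the actual path)
sqoop export --connect jdbc:mysql://localhost:3306/test --username root --password admin --table upflow --export-dir /user/hive/warehouse/mydb.db/upflow --input-fields-terminated-by ','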