1. Implementing wordcount with HQL
First, create a table with a single string column to hold each raw input line:
hive> create table hive_wordcount(context string);
OK
Time taken: 0.897 seconds
hive> show tables;
OK
hive_wordcount
Time taken: 0.117 seconds, Fetched: 1 row(s)
hive> select * from hive_wordcount;
OK
Time taken: 0.394 seconds
Hive keeps its metadata in the MySQL metastore; the new table shows up in TBLS and its column in COLUMNS_V2:
mysql> select * from TBLS;
+--------+-------------+-------+------------------+--------+-----------+-------+----------------+---------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER  | RETENTION | SD_ID | TBL_NAME       | TBL_TYPE      | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+--------+-----------+-------+----------------+---------------+--------------------+--------------------+
|      1 |  1531277888 |     1 |                0 | hadoop |         0 |     1 | hive_wordcount | MANAGED_TABLE | NULL               | NULL               |
+--------+-------------+-------+------------------+--------+-----------+-------+----------------+---------------+--------------------+--------------------+
1 row in set (0.00 sec)
mysql> select * from COLUMNS_V2;
+-------+---------+-------------+-----------+-------------+
| CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX |
+-------+---------+-------------+-----------+-------------+
|     1 | NULL    | context     | string    |           0 |
+-------+---------+-------------+-----------+-------------+
1 row in set (0.00 sec)
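To tie the two metadata tables together: in the stock metastore schema, TBLS links to COLUMNS_V2 through SDS (TBLS.SD_ID -> SDS.SD_ID, SDS.CD_ID -> COLUMNS_V2.CD_ID). A sketch, assuming that default layout:

-- List every Hive table together with its columns, in column order.
select t.TBL_NAME, c.COLUMN_NAME, c.TYPE_NAME, c.INTEGER_IDX
from TBLS t
join SDS s on t.SD_ID = s.SD_ID
join COLUMNS_V2 c on s.CD_ID = c.CD_ID
order by t.TBL_NAME, c.INTEGER_IDX;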
hive> load data local inpath '/home/hadoop/data/hello.txt' into table hive_wordcount;
Loading data to table default.hive_wordcount
Table default.hive_wordcount stats: [numFiles=1, totalSize=34]
OK
Time taken: 1.344 seconds
hive> select * from hive_wordcount;
OK
hello world welcome
hello welcome
Time taken: 0.405 seconds, Fetched: 2 row(s)
- Implementing wordcount with HQL:
- select word, count(1) from hive_wordcount lateral view explode(split(context, '\t')) wc as word group by word;
- lateral view explode(split(context, '\t')) splits each record on the given delimiter (tab here) and expands the resulting array into one row per word (a standalone sketch of this step follows the list);
- The HQL is compiled into a MapReduce job and submitted to YARN for execution, as the transcript below shows.
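To see what the lateral view produces before the GROUP BY, the explode step can be run on its own; a minimal sketch (the alias w is arbitrary):

-- Expand each line of hive_wordcount into one row per word.
select w.word
from hive_wordcount
lateral view explode(split(context, '\t')) w as word;

Against the two-line hello.txt above, this should return one row per word: hello, world, welcome, hello, welcome.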
hive> select word, count(1) from hive_wordcount lateral view explode(split(context, '\t')) wc as word group by word;
Query ID = hadoop_20180711030606_4dd72901-a4ea-4b79-b00f-b27f599343c8
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1531222938551_0002, Tracking URL = http://hadoop001:8088/proxy/application_1531222938551_0002/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job -kill job_1531222938551_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-07-11 03:12:32,410 Stage-1 map = 0%, reduce = 0%
2018-07-11 03:12:40,914 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.68 sec
2018-07-11 03:12:48,273 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.07 sec
MapReduce Total cumulative CPU time: 3 seconds 70 msec
Ended Job = job_1531222938551_0002
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.07 sec HDFS Read: 8801 HDFS Write: 26 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 70 msec
OK
hello 2
welcome 2
world 1
Time taken: 28.304 seconds, Fetched: 3 row(s)
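The counts can also be persisted rather than just printed; a sketch using CREATE TABLE AS SELECT (the table name wordcount_result is an assumption, not from the original session):

-- Hypothetical: store the word counts in a new managed table.
create table wordcount_result as
select word, count(1) as cnt
from hive_wordcount lateral view explode(split(context, '\t')) wc as word
group by word;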
2. Counting employees per department with HQL
Sample input, first emp.txt and then dept.txt (both tab-separated; emp columns: empno, ename, job, mgr, hiredate, sal, comm, deptno; dept columns: deptno, dname, location):
7369 SMITH CLERK 7902 1980-12-17 800.00 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.00 300.00 30
7521 WARD SALESMAN 7698 1981-2-22 1250.00 500.00 30
7566 JONES MANAGER 7839 1981-4-2 2975.00 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.00 1400.00 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.00 30
7782 CLARK MANAGER 7839 1981-6-9 2450.00 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.00 20
7839 KING PRESIDENT 1981-11-17 5000.00 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.00 0.00 30
7876 ADAMS CLERK 7788 1987-5-23 1100.00 20
7900 JAMES CLERK 7698 1981-12-3 950.00 30
7902 FORD ANALYST 7566 1981-12-3 3000.00 20
7934 MILLER CLERK 7782 1982-1-23 1300.00 10
8888 HIVE PROGRAM 7839 1988-1-23 10300.00
10 ACCOUNTING NEW YORK
20 RESEARCH DALLAS
30 SALES CHICAGO
40 OPERATIONS BOSTON
hive> create table emp(
> empno int,
> ename string,
> job string,
> mgr int,
> hiredate string,
> sal double,
> comm double,
> deptno int
> ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 0.399 seconds
hive> create table dept(
> deptno int,
> dname string,
> location string
> ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 0.106 seconds
hive> load data local inpath '/home/hadoop/data/emp.txt' into table emp;
Loading data to table default.emp
Table default.emp stats: [numFiles=1, totalSize=700]
OK
Time taken: 0.316 seconds
hive> load data local inpath '/home/hadoop/data/dept.txt' into table dept;
Loading data to table default.dept
Table default.dept stats: [numFiles=1, totalSize=79]
OK
Time taken: 0.284 seconds
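Before aggregating, a quick sanity check that both files parsed against the '\t' delimiter (a hedged example; the limit just keeps the output short):

-- Spot-check a few parsed rows from each table.
select empno, ename, sal, deptno from emp limit 5;
select * from dept;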
hive> select deptno, count(1) from emp group by deptno;
Query ID = hadoop_20180711030606_4dd72901-a4ea-4b79-b00f-b27f599343c8
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1531222938551_0003, Tracking URL = http://hadoop001:8088/proxy/application_1531222938551_0003/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job -kill job_1531222938551_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-07-11 03:56:44,051 Stage-1 map = 0%, reduce = 0%
2018-07-11 03:56:51,509 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.22 sec
2018-07-11 03:56:59,837 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.89 sec
MapReduce Total cumulative CPU time: 2 seconds 890 msec
Ended Job = job_1531222938551_0003
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.89 sec HDFS Read: 7958 HDFS Write: 20 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 890 msec
OK
NULL 1
10 3
20 5
30 6
Time taken: 23.755 seconds, Fetched: 4 row(s)
The NULL group comes from employee 8888 (HIVE), whose row carries no deptno value.
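A natural follow-up is to resolve department numbers to names by joining emp with dept; a minimal sketch (the inner join drops the NULL-deptno row seen above):

-- Count employees per department name; employee 8888 (HIVE) has no
-- deptno, so the inner join filters it out.
select d.dname, count(1) as emp_cnt
from emp e join dept d on e.deptno = d.deptno
group by d.dname;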