Environment Preparation (08): Basic Hive Usage

1. Implementing wordcount with HQL

  • Create the table
hive> create table hive_wordcount(context string);
OK
Time taken: 0.897 seconds

hive> show tables;
OK
hive_wordcount
Time taken: 0.117 seconds, Fetched: 1 row(s)

hive> select * from hive_wordcount;
OK
Time taken: 0.394 seconds
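
To double-check the definition, Hive's built-in DESCRIBE FORMATTED command prints the columns along with metadata such as the table type and its HDFS location (this step is an addition, not part of the original session):

hive> describe formatted hive_wordcount;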
  • Check the metadata changes in MySQL
mysql> select * from TBLS;
+--------+-------------+-------+------------------+--------+-----------+-------+----------------+---------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER  | RETENTION | SD_ID | TBL_NAME       | TBL_TYPE      | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+--------+-----------+-------+----------------+---------------+--------------------+--------------------+
|      1 |  1531277888 |     1 |                0 | hadoop |         0 |     1 | hive_wordcount | MANAGED_TABLE | NULL               | NULL               |
+--------+-------------+-------+------------------+--------+-----------+-------+----------------+---------------+--------------------+--------------------+
1 row in set (0.00 sec)

mysql> select * from COLUMNS_V2;
+-------+---------+-------------+-----------+-------------+
| CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX |
+-------+---------+-------------+-----------+-------------+
|     1 | NULL    | context     | string    |           0 |
+-------+---------+-------------+-----------+-------------+
1 row in set (0.00 sec)
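The two metastore tables are linked through the storage descriptor table SDS: TBLS.SD_ID references SDS, and SDS.CD_ID references the column descriptor rows in COLUMNS_V2. A minimal sketch that lists every table together with its columns, assuming the standard Hive 1.x metastore schema:

mysql> select t.TBL_NAME, c.COLUMN_NAME, c.TYPE_NAME
    ->   from TBLS t
    ->   join SDS s on t.SD_ID = s.SD_ID
    ->   join COLUMNS_V2 c on s.CD_ID = c.CD_ID
    ->  order by t.TBL_NAME, c.INTEGER_IDX;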
  • Load data into the Hive table
hive> load data local inpath '/home/hadoop/data/hello.txt' into table hive_wordcount;
Loading data to table default.hive_wordcount
Table default.hive_wordcount stats: [numFiles=1, totalSize=34]
OK
Time taken: 1.344 seconds

hive> select * from hive_wordcount;
OK
hello   world   welcome
hello   welcome
Time taken: 0.405 seconds, Fetched: 2 row(s)
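Since hive_wordcount is a MANAGED_TABLE, the loaded file was moved into the table's directory under the Hive warehouse. Assuming the default hive.metastore.warehouse.dir, this can be verified with:

hadoop fs -ls /user/hive/warehouse/hive_wordcount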
  • Implement wordcount with HQL
    • select word, count(1) from hive_wordcount lateral view explode(split(context, '\t')) wc as word group by word;
    • lateral view explode(split(context, '\t')) splits each row on the given delimiter and turns every resulting word into a row of its own (see the standalone explode query below);
    • Hive compiles the HQL into a MapReduce job and submits it to YARN;
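To see what the lateral view produces before the aggregation, the explode step can be run on its own (an illustration, not part of the original session); it returns one word per row:

hive> select explode(split(context, '\t')) as word from hive_wordcount;

The full wordcount query then runs as a single MapReduce job: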
hive> select word, count(1) from hive_wordcount lateral view explode(split(context, '\t')) wc as word group by word;
Query ID = hadoop_20180711030606_4dd72901-a4ea-4b79-b00f-b27f599343c8
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1531222938551_0002, Tracking URL = http://hadoop001:8088/proxy/application_1531222938551_0002/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job  -kill job_1531222938551_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-07-11 03:12:32,410 Stage-1 map = 0%,  reduce = 0%
2018-07-11 03:12:40,914 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.68 sec
2018-07-11 03:12:48,273 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.07 sec
MapReduce Total cumulative CPU time: 3 seconds 70 msec
Ended Job = job_1531222938551_0002
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.07 sec   HDFS Read: 8801 HDFS Write: 26 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 70 msec
OK
hello   2
welcome 2
world   1
Time taken: 28.304 seconds, Fetched: 3 row(s)
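
In practice the result is usually persisted rather than just printed to the console. A minimal CTAS sketch (the table name wc_result is hypothetical):

hive> create table wc_result as
    > select word, count(1) as cnt
    > from hive_wordcount lateral view explode(split(context, '\t')) wc as word
    > group by word;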

2. Counting employees per department with HQL

  • Raw data: employees (emp.txt)
7369    SMITH   CLERK   7902    1980-12-17  800.00      20
7499    ALLEN   SALESMAN    7698    1981-2-20   1600.00 300.00  30
7521    WARD    SALESMAN    7698    1981-2-22   1250.00 500.00  30
7566    JONES   MANAGER 7839    1981-4-2    2975.00     20
7654    MARTIN  SALESMAN    7698    1981-9-28   1250.00 1400.00 30
7698    BLAKE   MANAGER 7839    1981-5-1    2850.00     30
7782    CLARK   MANAGER 7839    1981-6-9    2450.00     10
7788    SCOTT   ANALYST 7566    1987-4-19   3000.00     20
7839    KING    PRESIDENT       1981-11-17  5000.00     10
7844    TURNER  SALESMAN    7698    1981-9-8    1500.00 0.00    30
7876    ADAMS   CLERK   7788    1987-5-23   1100.00     20
7900    JAMES   CLERK   7698    1981-12-3   950.00      30
7902    FORD    ANALYST 7566    1981-12-3   3000.00     20
7934    MILLER  CLERK   7782    1982-1-23   1300.00     10
8888    HIVE    PROGRAM 7839    1988-1-23   10300.00
  • Raw data: departments (dept.txt)
10  ACCOUNTING  NEW YORK
20  RESEARCH    DALLAS
30  SALES   CHICAGO
40  OPERATIONS  BOSTON
  • Create the tables
hive> create table emp(
    > empno int,
    > ename string,
    > job string,
    > mgr int,
    > hiredate string,
    > sal double,
    > comm double,
    > deptno int
    > ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 0.399 seconds

hive> create table dept(
    > deptno int,
    > dname string,
    > location string
    > ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 0.106 seconds
  • Load the data
hive> load data local inpath '/home/hadoop/data/emp.txt' into table emp;
Loading data to table default.emp
Table default.emp stats: [numFiles=1, totalSize=700]
OK
Time taken: 0.316 seconds

hive> load data local inpath '/home/hadoop/data/dept.txt' into table dept;
Loading data to table default.dept
Table default.dept stats: [numFiles=1, totalSize=79]
OK
Time taken: 0.284 seconds
  • Count employees per department with HQL
hive> select deptno, count(1) from emp group by deptno;
Query ID = hadoop_20180711030606_4dd72901-a4ea-4b79-b00f-b27f599343c8
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1531222938551_0003, Tracking URL = http://hadoop001:8088/proxy/application_1531222938551_0003/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job  -kill job_1531222938551_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-07-11 03:56:44,051 Stage-1 map = 0%,  reduce = 0%
2018-07-11 03:56:51,509 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.22 sec
2018-07-11 03:56:59,837 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.89 sec
MapReduce Total cumulative CPU time: 2 seconds 890 msec
Ended Job = job_1531222938551_0003
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 2.89 sec   HDFS Read: 7958 HDFS Write: 20 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 890 msec
OK
NULL    1
10  3
20  5
30  6
Time taken: 23.755 seconds, Fetched: 4 row(s)
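The NULL row comes from employee 8888 (HIVE), whose record in emp.txt carries no deptno field. To report department names instead of numbers, the count can be joined against the dept table loaded earlier; a sketch not run in the original session:

hive> select d.dname, count(1)
    > from emp e join dept d on e.deptno = d.deptno
    > group by d.dname;

Note that the inner join also drops the employee with the NULL deptno.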