Merge data from different HDFS directories and write the result to a specified directory
1. Data
new_staff
1 AAA male
2 BBB male
3 CCC male
4 DDD male
old_staff
1 AAA female
2 CCC female
3 BBB female
6 DDD female
Important: the columns in the data above must be separated by \t and the rows by \n. Do not copy and paste the data; type it by hand in the vim editor.
2. Create the data and upload it to HDFS
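If typing the tabs by hand proves error-prone, one alternative is to generate the two files with `printf`, which expands `\t` and `\n` itself. This is a minimal sketch, assuming bash and the same `tdata/newdata` and `tdata/olddata` layout used in step 2; `WORKDIR` is a hypothetical variable introduced only for this example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# WORKDIR is a hypothetical variable for this sketch; it defaults to ./tdata.
WORKDIR="${WORKDIR:-tdata}"
mkdir -p "$WORKDIR/newdata" "$WORKDIR/olddata"

# printf expands \t and \n in the format string, so the files contain real
# tab and newline characters. The format is reused for every three arguments.
printf '%s\t%s\t%s\n' \
  1 AAA male \
  2 BBB male \
  3 CCC male \
  4 DDD male > "$WORKDIR/newdata/new.txt"

printf '%s\t%s\t%s\n' \
  1 AAA female \
  2 CCC female \
  3 BBB female \
  6 DDD female > "$WORKDIR/olddata/old.txt"

# Sanity check: fail if any line does not have exactly 3 tab-separated fields.
awk -F'\t' 'NF != 3 { bad = 1 } END { exit bad + 0 }' \
  "$WORKDIR/newdata/new.txt" "$WORKDIR/olddata/old.txt"
```

Running `cat -A` (GNU coreutils) on either file should show `^I` between columns, which confirms the separators are tabs rather than spaces.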
[yinggu@hadoop102 sqoop]$ mkdir tdata
[yinggu@hadoop102 sqoop]$ cd tdata/
[yinggu@hadoop102 tdata]$ mkdir newdata
[yinggu@hadoop102 tdata]$ mkdir olddata
[yinggu@hadoop102 tdata]$ vim newdata/new.txt
[yinggu@hadoop102 tdata]$ vim olddata/old.txt
[yinggu@hadoop102 sqoop]$ ../hadoop-2.8.2/bin/hadoop fs -put tdata/ /
3. Create the JavaBean
[victor@node1 sqoop-1.4.7]$ bin/sqoop codegen \
--connect jdbc:mysql://node1:3306/company \
--username root \
--password 000000 \
--table staff \
--bindir /opt/module/sqoop/staff \
--class-name Staff \
--fields-terminated-by "\t"
4. Run the merge
[victor@node1 sqoop-1.4.7]$ bin/sqoop merge \
--new-data /tdata/newdata/ \
--onto /tdata/olddata/ \
--target-dir /tdata/merged \
--jar-file /opt/module/sqoop/staff/Staff.jar \
--class-name Staff \
--merge-key id
5. Result
1 AAA MALE
2 BBB MALE
3 CCC MALE
4 DDD MALE
6 DDD FEMALE
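The rule behind this result is that, for each merge key, the row from `--new-data` wins, and keys that exist only in `--onto` are carried over. That rule can be imitated locally with awk. The sketch below only illustrates the semantics; it is not how Sqoop implements the merge (Sqoop runs it as a MapReduce job). It assumes tab-separated rows with the merge key `id` in the first column, and `merge_by_key` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# merge_by_key OLD NEW: hypothetical helper for this sketch.
# Rows are tab-separated and column 1 is the merge key.
merge_by_key() {
  # OLD is read first, so a NEW row with the same key replaces it.
  awk -F'\t' '{ row[$1] = $0 } END { for (k in row) print row[k] }' "$1" "$2" |
    sort -n   # awk's for-in order is unspecified, so sort by key
}

# Recreate the sample data in a scratch directory.
tmp="$(mktemp -d)"
printf '%s\t%s\t%s\n' 1 AAA male 2 BBB male 3 CCC male 4 DDD male > "$tmp/new.txt"
printf '%s\t%s\t%s\n' 1 AAA female 2 CCC female 3 BBB female 6 DDD female > "$tmp/old.txt"

merge_by_key "$tmp/old.txt" "$tmp/new.txt"
rm -rf "$tmp"
```

The sketch prints field values exactly as they appear in the input files. Keys 1 to 4 come from the new file, and key 6 is the only row that survives from the old file.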
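6. Parameters
| No. | Parameter | Description |
|---|---|---|
| 1 | `--new-data <path>` | HDFS directory of the newer dataset to merge; its records are kept in the merged result |
| 2 | `--onto <path>` | HDFS directory of the older dataset; records whose merge key also appears in the newer dataset are overwritten by the newer records |
| 3 | `--merge-key <col>` | Merge key column, usually the primary key ID |
| 4 | `--jar-file <file>` | Jar file to load for the merge; this is the jar generated by the codegen tool |
| 5 | `--class-name <class>` | Name of the record class corresponding to the table or object; the class is contained in the jar file |
| 6 | `--target-dir <path>` | HDFS directory where the merged data is written |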