一、Sqoop 介紹
Sqoop 是一款用于 hadoop 和關(guān)系型數(shù)據(jù)庫之間數(shù)據(jù)導(dǎo)入導(dǎo)出的工具??梢酝ㄟ^ Sqoop 把數(shù)據(jù)從數(shù)據(jù)庫(比如 mysql,oracle)導(dǎo)入到 hdfs 中;也可以把數(shù)據(jù)從 hdfs 中導(dǎo)出到關(guān)系型數(shù)據(jù)庫中。
通過將 Sqoop 的操作命令轉(zhuǎn)化為 Hadoop 的 MapReduce 作業(yè)(通常只涉及到 Map 任務(wù))進(jìn)行導(dǎo)入導(dǎo)出,即 Sqoop 生成的 Job 主要是并發(fā)運行 MapTask 實現(xiàn)數(shù)據(jù)并行傳輸以提升數(shù)據(jù)傳送速度和效率,如果使用 Shell 腳本來實現(xiàn)多線程數(shù)據(jù)傳送則存在很大的難度。
Sqoop2(Sqoop1.99.7)需要在 Hadoop 安裝目錄下的配置文件中設(shè)置代理,屬于重量級嵌入安裝,文中使用 Sqoop1(Sqoop1.4.7)。
二、Sqoop 安裝
安裝前提:
已經(jīng)具備了Java和hadoop的環(huán)境。
下載安裝包:
http://mirror.bit.edu.cn/apache/sqoop/1.4.7/
解壓:
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="shell" cid="n13" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); --select-text-font-color: #fff; font-size: 0.9rem; line-height: 1.71429em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(218, 218, 218); position: relative !important; margin-bottom: 3em; margin-left: 2em; padding-left: 1ch; padding-right: 1ch; width: inherit; color: rgb(31, 9, 9); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">sudo tar -zxv -f sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
sudo mv sqoop-1.4.7.bin__hadoop-2.6.0 sqoop</pre>
配置環(huán)境變量:
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="sh" cid="n15" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); --select-text-font-color: #fff; font-size: 0.9rem; line-height: 1.71429em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(218, 218, 218); position: relative !important; margin-bottom: 3em; margin-left: 2em; padding-left: 1ch; padding-right: 1ch; width: inherit; color: rgb(31, 9, 9); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">export SQOOP_HOME=/opt/sqoop
export PATH=SQOOP_HOME/bin</pre>
使配置生效:
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="shell" cid="n17" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); --select-text-font-color: #fff; font-size: 0.9rem; line-height: 1.71429em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(218, 218, 218); position: relative !important; margin-bottom: 3em; margin-left: 2em; padding-left: 1ch; padding-right: 1ch; width: inherit; color: rgb(31, 9, 9); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">source /etc/profile</pre>
上傳 mysql 驅(qū)動至 Sqoop lib 下:
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="shell" cid="n19" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); --select-text-font-color: #fff; font-size: 0.9rem; line-height: 1.71429em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(218, 218, 218); position: relative !important; margin-bottom: 3em; margin-left: 2em; padding-left: 1ch; padding-right: 1ch; width: inherit; color: rgb(31, 9, 9); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">cp mysql-connector-java-5.1.46.jar /opt/sqoop/lib/</pre>
修改 Sqoop 配置:
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="shell" cid="n21" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); --select-text-font-color: #fff; font-size: 0.9rem; line-height: 1.71429em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(218, 218, 218); position: relative !important; margin-bottom: 3em; margin-left: 2em; padding-left: 1ch; padding-right: 1ch; width: inherit; color: rgb(31, 9, 9); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">cd /opt/sqoop/conf/
cp sqoop-env-template.sh sqoop-env.sh
vim sqoop-env.sh</pre>
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="sh" cid="n22" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); --select-text-font-color: #fff; font-size: 0.9rem; line-height: 1.71429em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(218, 218, 218); position: relative !important; margin-bottom: 3em; margin-left: 2em; padding-left: 1ch; padding-right: 1ch; width: inherit; color: rgb(31, 9, 9); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/opt/hadoop
?
Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/opt/hadoop
?
set the path to where bin/hbase is available
export HBASE_HOME=/opt/hbase
?
Set the path to where bin/hive is available
export HIVE_HOME=/opt/hive
?
Set the path for where zookeper config dir is
export ZOOCFGDIR=/opt/zookeeper/conf</pre>
注意:這里的 HADOOP_COMMON_HOME 和 HADOOP_MAPRED_HOME 配成一個就行了,因為現(xiàn)在安裝的 hadoop 的開源的版本,但是在 hadoop 的商業(yè)版本中這兩個配置是分別安裝在不同的目錄下的。
三、驗證安裝是否成功
執(zhí)行 sqoop version,看到版本信息,即安裝成功。
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="" cid="n26" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); --select-text-font-color: #fff; font-size: 0.9rem; line-height: 1.71429em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(218, 218, 218); position: relative !important; margin-bottom: 3em; margin-left: 2em; padding-left: 1ch; padding-right: 1ch; width: inherit; color: rgb(31, 9, 9); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Sqoop 1.4.7
git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8
Compiled by maugli on Thu Dec 21 15:59:58 STD 2017</pre>
四、測試
4.1、mysql 整個表導(dǎo)入 hdfs
運行下面的指令前確保 mysql 中存在 test db 和 mysql_input table。
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="shell" cid="n30" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); --select-text-font-color: #fff; font-size: 0.9rem; line-height: 1.71429em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(218, 218, 218); position: relative !important; margin-bottom: 3em; margin-left: 2em; padding-left: 1ch; padding-right: 1ch; width: inherit; color: rgb(31, 9, 9); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">sqoop import
--connect jdbc:mysql://master:3306/test
--username root
--password xxx
--table mysql_input
--target-dir /user/sqoop/data_from_mysql</pre>
運行后有報錯:
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="" cid="n32" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); --select-text-font-color: #fff; font-size: 0.9rem; line-height: 1.71429em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(218, 218, 218); position: relative !important; margin-bottom: 3em; margin-left: 2em; padding-left: 1ch; padding-right: 1ch; width: inherit; color: rgb(31, 9, 9); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">ERROR tool.ImportTool: Import failed: No primary key could be found for table mysql_input. Please specify one with --split-by or perform a sequential import with '-m 1'.</pre>
因為沒有主鍵,又沒有指定切分的列,Sqoop 不知道用多少個 map 進(jìn)行數(shù)據(jù)同步,可以通過設(shè)置 split-by 和 -m 1來規(guī)避。
當(dāng)-m 設(shè)置的值大于1時,split-by 必須設(shè)置字段。
4.2、mysql 表按查詢條件導(dǎo)入 hdfs
<pre spellcheck="false" class="md-fences md-end-block ty-contain-cm modeLoaded" lang="shell" cid="n36" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: var(--monospace); --select-text-font-color: #fff; font-size: 0.9rem; line-height: 1.71429em; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(218, 218, 218); position: relative !important; margin-bottom: 3em; margin-left: 2em; padding-left: 1ch; padding-right: 1ch; width: inherit; color: rgb(31, 9, 9); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">sqoop import
--connect jdbc:mysql://master:3306/test
--username root
--password xxx
--query 'select * from mysql_input where size>1 and $CONDITIONS'
--target-dir /user/sqoop/data_from_mysql
-m 1</pre>