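1. Installing DataX for Greenplum
Download location
https://github.com/HashDataInc/DataX
Prerequisites
Install Maven
Download location 1: https://maven.apache.org/download.cgi
Version 3.5.4 is used here. Download the binary archive; it is ready to use once extracted.
Download location 2: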
wget https://mirrors.cnnic.cn/apache/maven/maven-3/3.5.4/binaries/apache-maven-3.5.4-bin.tar.gz --no-check-certificate
2) Extract and install the Maven package
tar -xf apache-maven-3.5.4-bin.tar.gz
mv apache-maven-3.5.4 /usr/local/maven
ln -s /usr/local/maven/bin/mvn /usr/bin/mvn # When used together with Jenkins, Jenkins looks for the mvn command under /usr/bin/ and reports an error if it is not there
ll /usr/local/maven/
ll /usr/bin/mvn
3) Configure environment variables
echo " ">>/etc/profile
echo "# Made for mvn env by zhaoshuai on $(date +%F)">>/etc/profile
echo 'export MAVEN_HOME=/usr/local/maven'>>/etc/profile
echo 'export PATH=$MAVEN_HOME/bin:$PATH'>>/etc/profile
tail -4 /etc/profile
source /etc/profile
echo $PATH
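Running the echo lines above a second time appends the same exports to /etc/profile again. A minimal sketch of an idempotent variant follows. append_once is a hypothetical helper name, not part of Maven or of the original steps; it only adds a line when an identical one is not already present.

```shell
# append_once FILE LINE: append LINE to FILE unless an identical line already exists.
# Hypothetical helper, for illustration only.
append_once() {
    local file="$1" line="$2"
    # -x matches the whole line, -F treats LINE as a fixed string, -q prints nothing.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (as root):
#   append_once /etc/profile 'export MAVEN_HOME=/usr/local/maven'
#   append_once /etc/profile 'export PATH=$MAVEN_HOME/bin:$PATH'
```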
4) Check the installed mvn version
which mvn
mvn -version
Maven is now installed.
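In a provisioning script it can help to confirm the version rather than read it by eye. The sketch below assumes that the output of mvn -version contains a line of the form "Apache Maven 3.5.4 (...)"; maven_version_from is a hypothetical helper that extracts the version number from such text.

```shell
# maven_version_from TEXT: print the version found in a line like "Apache Maven 3.5.4 (...)".
# Hypothetical helper; prints nothing if no such line is present.
maven_version_from() {
    printf '%s\n' "$1" | sed -n 's/^Apache Maven \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Usage:
#   [ "$(maven_version_from "$(mvn -version)")" = "3.5.4" ] || echo "unexpected Maven version" >&2
```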
Installing DataX from source
Directory layout. Note that this is the source tree, so the layout differs from the packaged release:
adswriter elasticsearchwriter hbase094xwriter hdfsreader mongodbreader odpswriter otsstreamreader postgresqlreader rpm txtfilereader
common ftpreader hbase11xreader hdfswriter mongodbwriter oraclereader otswriter postgresqlwriter sqlserverreader txtfilewriter
core ftpwriter hbase11xsqlwriter images mysqlreader oraclewriter package.xml rdbmsreader sqlserverwriter userGuid.md
datax-opensource-dingding.png gpdbjsonwriter hbase11xwriter introduction.md mysqlwriter ossreader plugin-rdbms-util rdbmswriter streamreader
drdsreader gpdbwriter hbasereader license.txt ocswriter osswriter plugin-unstructured-storage-util README streamwriter
drdswriter hbase094xreader hbasewriter mongodbjsonreader odpsreader otsreader pom.xml README.md transformer
Build and install:
mvn -U clean package assembly:assembly -Dmaven.test.skip=true
Final output:
[WARNING] Assembly file: /app/DataX/target/datax-v1.0.4-hashdata is not a regular file (it may be a directory). It cannot be attached to the project build for installation or deployment.
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] datax-all v1.0.4-hashdata .......................... SUCCESS [ 11.003 s]
[INFO] datax-common ....................................... SUCCESS [01:33 min]
[INFO] datax-transformer .................................. SUCCESS [ 47.629 s]
[INFO] datax-core ......................................... SUCCESS [ 26.107 s]
[INFO] plugin-rdbms-util .................................. SUCCESS [ 8.208 s]
[INFO] mysqlreader ........................................ SUCCESS [ 0.990 s]
[INFO] sqlserverreader .................................... SUCCESS [ 3.124 s]
[INFO] streamreader ....................................... SUCCESS [ 5.794 s]
[INFO] mysqlwriter ........................................ SUCCESS [ 0.730 s]
[INFO] streamwriter ....................................... SUCCESS [ 0.582 s]
[INFO] sqlserverwriter .................................... SUCCESS [ 0.715 s]
[INFO] gpdbwriter ......................................... SUCCESS [ 2.225 s]
[INFO] plugin-unstructured-storage-util v1.0.4-hashdata ... SUCCESS [01:39 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 05:11 min
[INFO] Finished at: 2020-12-24T11:01:03+08:00
[INFO] ------------------------------------------------------------------------
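Locate the output directory
This differs from the official documentation: the directory location is shown in the build output after packaging succeeds, so look for it there.
After a successful build, the DataX package is located at {DataX_source_code_home}/target/datax-v1.0.4-hashdata/datax/ , with the following structure: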
[root@ares datax]# ls /app/DataX/target/datax-v1.0.4-hashdata
datax
[root@ares datax]# ls /app/DataX/target/datax-v1.0.4-hashdata/datax/
bin conf job lib plugin script tmp
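Because the output directory name depends on the version, a script can search for it instead of hard-coding it. A minimal sketch, assuming the layout shown above (a datax/bin/datax.py somewhere under target/); find_datax_home is a hypothetical helper.

```shell
# find_datax_home SRC_ROOT: print the directory that contains bin/datax.py under SRC_ROOT/target.
# Hypothetical helper; returns non-zero if nothing is found.
find_datax_home() {
    local src_root="$1" script
    script=$(find "$src_root/target" -type f -path '*/datax/bin/datax.py' 2>/dev/null | head -n 1)
    [ -n "$script" ] || return 1
    # Strip the trailing /bin/datax.py to get the DataX home directory.
    printf '%s\n' "${script%/bin/datax.py}"
}

# Usage:
#   DATAX_HOME=$(find_datax_home /app/DataX) && ls "$DATAX_HOME"
```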
Self-check script
Self-check command: python {YOUR_DATAX_HOME}/bin/datax.py {YOUR_DATAX_HOME}/job/job.json
python /app/DataX/target/datax-v1.0.4-hashdata/datax/bin/datax.py /app/datax/job/job.json
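The job.json used here comes from the one-click installation package and serves only to test that DataX works; the purpose of the job.json bundled with this build is unclear.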
2020-12-24 11:13:58.859 [main] WARN Engine - prioriy set to 0, because NumberFormatException, the value is: null
2020-12-24 11:13:58.862 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2020-12-24 11:13:58.862 [main] INFO JobContainer - DataX jobContainer starts job.
2020-12-24 11:13:58.865 [main] INFO JobContainer - Set jobId = 0
2020-12-24 11:13:58.890 [job-0] INFO JobContainer - jobContainer starts to do prepare ...
2020-12-24 11:13:58.890 [job-0] INFO JobContainer - DataX Reader.Job [streamreader] do prepare work .
2020-12-24 11:13:58.891 [job-0] INFO JobContainer - DataX Writer.Job [streamwriter] do prepare work .
2020-12-24 11:13:58.891 [job-0] INFO JobContainer - jobContainer starts to do split ...
2020-12-24 11:13:58.893 [job-0] INFO JobContainer - Job set Max-Byte-Speed to 10485760 bytes.
2020-12-24 11:13:58.894 [job-0] INFO JobContainer - DataX Reader.Job [streamreader] splits to [1] tasks.
2020-12-24 11:13:58.895 [job-0] INFO JobContainer - DataX Writer.Job [streamwriter] splits to [1] tasks.
2020-12-24 11:13:58.924 [job-0] INFO JobContainer - jobContainer starts to do schedule ...
2020-12-24 11:13:58.930 [job-0] INFO JobContainer - Scheduler starts [1] taskGroups.
2020-12-24 11:13:58.933 [job-0] INFO JobContainer - Running by standalone Mode.
2020-12-24 11:13:58.944 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2020-12-24 11:13:58.950 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
2020-12-24 11:13:58.950 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
2020-12-24 11:13:58.966 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2020-12-24 11:13:59.067 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[102]ms
2020-12-24 11:13:59.068 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it's tasks.
2020-12-24 11:14:08.958 [job-0] INFO StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.051s | All Task WaitReaderTime 0.065s | Percentage 100.00%
2020-12-24 11:14:08.958 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
2020-12-24 11:14:08.959 [job-0] INFO JobContainer - DataX Writer.Job [streamwriter] do post work.
2020-12-24 11:14:08.960 [job-0] INFO JobContainer - DataX Reader.Job [streamreader] do post work.
2020-12-24 11:14:08.960 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.
2020-12-24 11:14:08.961 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: /app/DataX/target/datax-v1.0.4-hashdata/datax/hook
2020-12-24 11:14:08.963 [job-0] INFO JobContainer -
[total cpu info] =>
averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%
[total gc info] =>
NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
PS MarkSweep | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s
PS Scavenge | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s
2020-12-24 11:14:08.964 [job-0] INFO JobContainer - PerfTrace not enable!
2020-12-24 11:14:08.964 [job-0] INFO StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.051s | All Task WaitReaderTime 0.065s | Percentage 100.00%
2020-12-24 11:14:08.965 [job-0] INFO JobContainer -
Job start time : 2020-12-24 11:13:58
Job end time : 2020-12-24 11:14:08
Total elapsed time : 10s
Average throughput : 253.91KB/s
Record write speed : 10000rec/s
Total records read : 100000
Total read/write failures : 0
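The log above shows a streamreader to streamwriter job that moved 100000 records. For reference, a job of that shape can be generated as below. This is a sketch and not the exact file used here: the column definitions and the channel setting are assumptions, and sliceRecordCount and print are the streamreader and streamwriter parameters as I understand them from the DataX plugin documentation. write_stream_job is a hypothetical helper.

```shell
# write_stream_job PATH: write a minimal streamreader -> streamwriter job file to PATH.
# Hypothetical helper; the column values below are placeholders.
write_stream_job() {
    cat > "$1" <<'EOF'
{
  "job": {
    "setting": { "speed": { "channel": 1 } },
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "sliceRecordCount": 100000,
            "column": [
              { "type": "string", "value": "DataX" },
              { "type": "long", "value": 19890604 }
            ]
          }
        },
        "writer": {
          "name": "streamwriter",
          "parameter": { "print": false, "encoding": "UTF-8" }
        }
      }
    ]
  }
}
EOF
}

# Usage:
#   write_stream_job /tmp/stream_job.json
#   python {YOUR_DATAX_HOME}/bin/datax.py /tmp/stream_job.json
```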
The bundled job.json contains the following:
{
"job": {
"setting": {
"speed": {
"byte": 1048576
}
},
"content": [
{
"reader": {
"name": "sqlserverreader",
"parameter": {
// database connection username
"username": "ReadOnly01",
// database connection password
"password": "1qaz!QAZ",
"column": [
"id"
],
// "splitPk": "db_id",
"connection": [
{
"table": [
"table"
],
"jdbcUrl": [
"jdbc:sqlserver://192.168.0.65;DatabaseName=MyCost_Erp352"
]
}
]
}
},
"writer": {
"name": "sqlserverwriter",
"parameter": {
"username": "root",
"password": "root",
"column": [
"db_id",
"db_type",
"db_ip",
"db_port",
"db_role",
"db_name",
"db_username",
"db_password",
"db_modify_time",
"db_modify_user",
"db_description",
"db_tddl_info"
],
"connection": [
{
"table": [
"db_info_for_writer"
],
"jdbcUrl": "jdbc:sqlserver://[HOST_NAME]:PORT;DatabaseName=[DATABASE_NAME]"
}
],
"preSql": [
"delete from @table where db_id = -1;"
],
"postSql": [
"update @table set db_modify_time = now() where db_id = 1;"
]
}
}
}
]
}
}
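The bundled file contains // comment lines, which strict JSON parsers reject. If another tool needs to read the file, whole-line comments can be removed first. A minimal sketch: strip_json_line_comments is a hypothetical helper, and it deliberately drops only lines that begin with //, so a // inside a value such as a jdbcUrl is left alone.

```shell
# strip_json_line_comments [FILE...]: print the input without lines that start with //.
# Hypothetical helper; reads standard input when no FILE is given.
strip_json_line_comments() {
    sed '/^[[:space:]]*\/\//d' "$@"
}

# Usage:
#   strip_json_line_comments job.json > job.clean.json
```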
2. Installing datax-web
Project repository:
https://github.com/WeiYe-Jing/datax-web
The one-click deployment package is used here:
https://pan.baidu.com/s/13yoqhGpD00I82K4lOYtQhg extraction code: cpsk
After extraction, the directory layout is as follows:
[root@ares app]# ls datax-web-2.1.2
bin modules packages README.md userGuid.md
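The rest of this section is reproduced in full from the original guide.
Deployment
1) Extract the installation package
In the chosen installation directory, extract the package: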
tar -zxvf datax-web-{VERSION}.tar.gz
2) Run the one-click installation script
Enter the extracted directory and locate install.sh under the bin directory. For an interactive installation, run it directly:
./bin/install.sh
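In interactive mode, the script asks for confirmation before extracting each module's package archive and before calling each configure script. Follow the prompts to see whether installation succeeded; if it did not, you can try again. To skip the confirmations and install non-interactively, run: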
./bin/install.sh --force
3) Initialize the database
If the mysql command is installed on your server, the following prompts appear while the installation script runs:
Scan out mysql command, so begin to initalize the database
Do you want to initalize database with sql: [{INSTALL_PATH}/bin/db/datax-web.sql]? (Y/N)y
Please input the db host(default: 127.0.0.1):
Please input the db port(default: 3306):
Please input the db username(default: root):
Please input the db password(default: ):
Please input the db name(default: exchangis)
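Enter the database host, port, username, password, and database name as prompted; in most cases this completes initialization quickly. If the mysql command is not installed on the server, run the bin/db/datax-web.sql script manually, then edit the relevant configuration file: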
vi ./modules/datax-admin/conf/bootstrap.properties
#Database
#DB_HOST=
#DB_PORT=
#DB_USERNAME=
#DB_PASSWORD=
#DB_DATABASE=
Set these values to match your environment.
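The same edit can be scripted. A minimal sketch, assuming each setting sits on its own line as KEY=value, possibly commented out with a leading # as shown above. set_prop is a hypothetical helper and the values in the usage lines are placeholders.

```shell
# set_prop FILE KEY VALUE: set KEY=VALUE in FILE.
# Replaces an existing "KEY=" or commented-out "#KEY=" line, or appends if neither exists.
# Hypothetical helper, for illustration only.
set_prop() {
    local file="$1" key="$2" value="$3" tmp
    tmp=$(mktemp) || return 1
    # KEY and VALUE are passed through the environment so awk does not interpret backslashes.
    KEY="$key" VALUE="$value" awk '
        BEGIN { k = ENVIRON["KEY"]; v = ENVIRON["VALUE"] }
        index($0, k "=") == 1 || index($0, "#" k "=") == 1 { print k "=" v; found = 1; next }
        { print }
        END { if (!found) print k "=" v }
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back in place so the original file keeps its permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage:
#   set_prop ./modules/datax-admin/conf/bootstrap.properties DB_HOST 127.0.0.1
#   set_prop ./modules/datax-admin/conf/bootstrap.properties DB_PORT 3306
```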
4) Configuration
After installation, configure the mail service in modules/datax-admin/bin/env.properties under the project directory (optional):
MAIL_USERNAME=""
MAIL_PASSWORD=""
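This file also contains some default configuration parameters, such as server.port; see the file for details.
In modules/datax-executor/bin/env.properties under the project directory, set PYTHON_PATH to the path of datax.py. This is very important!
In this setup the value is:
/app/DataX/target/datax-v1.0.4-hashdata/datax/bin/datax.py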
vim /app/datax-web-2.1.2/modules/datax-executor/bin/env.properties
vi ./modules/{module_name}/bin/env.properties
### Path to the Python script that launches DataX
PYTHON_PATH=
### Must match the port of the datax-admin service; the default is 9527, so ignore this if the datax-admin port was not changed
DATAX_ADMIN_PORT=
This file also contains some default configuration parameters, such as executor.port, json.path, and data.path; see the file for details.
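Since a wrong PYTHON_PATH only shows up later when a job runs, it can be checked right after editing. A minimal sketch, assuming env.properties holds a plain PYTHON_PATH=/path/to/datax.py line, optionally wrapped in double quotes; check_python_path is a hypothetical helper.

```shell
# check_python_path ENV_FILE: succeed only if PYTHON_PATH in ENV_FILE names an existing file.
# Hypothetical helper, for illustration only.
check_python_path() {
    local env_file="$1" path
    # Take the value of the last uncommented PYTHON_PATH= line.
    path=$(sed -n 's/^PYTHON_PATH=//p' "$env_file" | tail -n 1)
    # Drop one pair of surrounding double quotes, if present.
    path=${path#\"}
    path=${path%\"}
    if [ -z "$path" ]; then
        echo "PYTHON_PATH is not set in $env_file" >&2
        return 1
    fi
    if [ ! -f "$path" ]; then
        echo "PYTHON_PATH points to a missing file: $path" >&2
        return 1
    fi
    echo "PYTHON_PATH OK: $path"
}

# Usage:
#   check_python_path ./modules/datax-executor/bin/env.properties
```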
5) Start the services
- Start all services at once
./bin/start-all.sh
Some modules may fail to start or hang along the way; you can exit and run the command again. To change the port of a module's service:
vi ./modules/{module_name}/bin/env.properties
Find the SERVER_PORT setting and change its value. You can also start a single module's service on its own:
./bin/start.sh -m {module_name}
- Stop all services at once
./bin/stop-all.sh
You can also stop a single module's service on its own:
./bin/stop.sh -m {module_name}
6) Check the services (important!)
On Linux, use the jps command to check whether the DataXAdminApplication and DataXExecutorApplication processes appear. If both are present, the project is running.
If startup fails, check the startup logs: modules/datax-admin/bin/console.out or modules/datax-executor/bin/console.out
Tips: The scripts use bash-specific syntax; invoking them with sh may cause unexpected errors.
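The jps check can be automated. A minimal sketch that takes jps output as text and reports whether both process names mentioned above are present; datax_web_running is a hypothetical helper.

```shell
# datax_web_running JPS_OUTPUT: succeed only if both datax-web processes appear in the given text.
# Hypothetical helper; prints the name of each missing process to stderr.
datax_web_running() {
    local out="$1" name status=0
    for name in DataXAdminApplication DataXExecutorApplication; do
        if ! printf '%s\n' "$out" | grep -qF "$name"; then
            echo "not running: $name" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   datax_web_running "$(jps)" && echo "datax-web is up"
```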
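7) Run
After deployment, open http://ip:port/index.html in a browser to reach the main interface (ip is the IP of the server where datax-admin is deployed, and port is the port configured for datax-admin).
Log in with username admin and password 123456.
8) Runtime logs
After deployment, logs are written under modules/{module_name}/data/applogs (you can also choose a different location by changing the logpath setting in application.yml). Use these logs to track how the project actually started.
If the executor starts before admin, its connection attempt fails and the log reports a "connection refused" error. Normally, start admin first and then the executor. The executor retries after 30 seconds; if the reconnection succeeds, this exception can be ignored.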
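To avoid the refused connection in the first place, a start script can wait until admin answers before starting the executor. A minimal sketch: retry is a hypothetical helper, the curl probe and the module names passed to start.sh in the usage lines are assumptions based on the directory names above, and 9527 is the default admin port mentioned earlier.

```shell
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds, at most ATTEMPTS times,
# sleeping DELAY seconds between tries. Hypothetical helper, for illustration only.
retry() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$i" -ge "$attempts" ]; then
            return 1
        fi
        i=$((i + 1))
        sleep "$delay"
    done
}

# Usage (module names and probe URL are assumptions):
#   ./bin/start.sh -m datax-admin
#   retry 30 2 curl -s -o /dev/null http://127.0.0.1:9527/index.html
#   ./bin/start.sh -m datax-executor
```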
When accessing datax-web, be sure to append /index.html:
http://172.18.1.25:9527/index.html
Without it, the request fails:
http://192.168.10.227:9527/
Whitelabel Error Page
This application has no explicit mapping for /error, so you are seeing this as a fallback.
Thu Dec 24 11:36:39 CST 2020
There was an unexpected error (type=Forbidden, status=403).
Access Denied

Configuration still to be done: mail settings.
Installing datax-web from source (not the one-click deployment)
Directory layout
[root@ares datax-web-master]# ls /app/datax-web-master
bin datax-admin datax-assembly datax-core datax-executor datax-rpc doc LICENSE pom.xml README.md userGuid.md
Run the build. It takes a long time, depending on network speed:
mvn clean install
[INFO] Building tar : /app/datax-web-master/build/datax-web-2.1.2.tar.gz
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] datax-web 2.1.2 .................................... SUCCESS [ 20.613 s]
[INFO] datax-rpc .......................................... SUCCESS [06:09 min]
[INFO] datax-core ......................................... SUCCESS [06:23 min]
[INFO] datax-admin ........................................ SUCCESS [44:46 min]
[INFO] datax-executor ..................................... SUCCESS [ 21.653 s]
[INFO] datax-assembly 2.1.2 ............................... SUCCESS [ 13.877 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 58:34 min
[INFO] Finished at: 2020-12-24T15:06:53+08:00
[INFO] ------------------------------------------------------------------------
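1. Linux environment deployment
2. Development environment deployment (or see the Debug document)
2.1 Create the database
Run the datax_web.sql file under bin/db (note that the update statements for older versions specify a database name).
2.2 Modify the project configuration
1. Edit resources/application.yml under datax_admin
# Data source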
datasource:
username: root
password: root
url: jdbc:mysql://localhost:3306/datax_web?serverTimezone=Asia/Shanghai&useLegacyDatetimeCode=false&useSSL=false&nullNamePatternMatchesAll=true&useUnicode=true&characterEncoding=UTF-8
driver-class-name: com.mysql.jdbc.Driver
Edit the data source configuration; only MySQL is currently supported.
# Configure mybatis-plus SQL logging
logging:
level:
com.wugui.datax.admin.mapper: error
path: ./data/applogs/admin
Edit the log path (path).
# datax-web email
mail:
host: smtp.qq.com
port: 25
username: xxx@qq.com
password: xxx
properties:
mail:
smtp:
auth: true
starttls:
enable: true
required: true
socketFactory:
class: javax.net.ssl.SSLSocketFactory
Edit the mail sending configuration (leave it unchanged if you do not need it).
2. Edit resources/application.yml under datax_executor
# log config
logging:
config: classpath:logback.xml
path: ./data/applogs/executor/jobhandler
Edit the log path (path).
datax:
job:
admin:
### datax-web admin address
addresses: http://127.0.0.1:8080
executor:
appname: datax-executor
ip:
port: 9999
### job log path
logpath: ./data/applogs/executor/jobhandler
### job log retention days
logretentiondays: 30
executor:
jsonpath: /Users/mac/data/applogs
pypath: /Users/mac/tools/datax/bin/datax.py
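Edit the datax.job configuration:
- admin.addresses: the address where datax_admin is deployed. If the scheduling center is deployed as a cluster with multiple addresses, separate them with commas. The executor uses this address for executor heartbeat registration and job result callbacks.
- executor.appname: the executor AppName, a unique identifier for each executor cluster and the basis for grouping executor heartbeat registrations.
- executor.ip: empty by default, meaning the IP is detected automatically. With multiple network interfaces you can set a specific IP manually. This IP is not bound to a host and is used for communication only; the address is used for executor registration and for the scheduling center to send job trigger requests.
- executor.port: the executor server port, 9999 by default. When deploying several executors on one machine, give each a different port.
- executor.logpath: the disk path where executor run logs are stored; read and write permission on this path is required.
- executor.logretentiondays: the number of days executor log files are kept; expired logs are cleaned up automatically. The limit takes effect only when the value is 3 or greater; otherwise (for example -1) automatic cleanup is disabled.
- executor.jsonpath: the path where temporary DataX JSON files are saved.
- pypath: the path to the DataX launch script, for example xxx/datax/bin/datax.py
If the DataX environment variable (DATAX_HOME) is set on the system, logpath, jsonpath, and pypath can be left unset; log files and temporary JSON files are then stored under the path given by the environment variable.
4. Start the project
1. Local IDEA development environment
- 1. Run DataXAdminApplication under datax_admin
- 2. Run DataXExecutorApplication under datax_executor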

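After admin starts successfully, the log prints three addresses: two for the API documentation and one for the front-end page.
5. Startup succeeded
After startup succeeds, open the page (default administrator username: admin, password: 123456):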
http://localhost:8080/index.html#/dashboard
