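Summary
Audience:
Big data developers, DevOps engineers, and operations engineers
What you will learn:
Azkaban API calls, parameterized workflows, email alerts, the web console, and user management
Why choose Azkaban:
It is open source and well documented; compared with Airflow, its handling of time is clearer, its UI is excellent, and it offers solid user permission control
Use case:
When the data warehouse must be cleaned and refreshed every day, Azkaban provides both scheduling and task-processing logic; it also gives DevOps a secure way to deliver jobs
What you will gain:
A quick path to implementing Azkaban deployment, API calls, ETL parameters, users, and notifications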
Contents
azkaban deploy
azkaban user management
azkaban API
azkaban parameterized Run Job
azkaban email notice, azkaban UI console
--------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------
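Article
-azkaban deploy
Deployment notes:
As a workflow engine, Azkaban runs most of its tasks on remote hosts;
the host that provides the Azkaban service stores the ETL task scripts.
When many tasks must run concurrently, Azkaban can be deployed in azkaban-multi-executor mode;
in that mode, each additional executor only needs to connect to the same MySQL instance and use the same configuration file.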
Example of adding an executor:
 .../azkaban/azkaban-exec-server/build/install/azkaban-exec-server/conf/azkaban.properties
 # mysql
 ......
 mysql.host=mysqlhost
 ......
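To make the "same configuration on every executor" rule concrete, here is a minimal shell sketch that reads mysql.host from several azkaban.properties files and reports whether they agree. The helper names get_prop and same_mysql_host are made up for illustration and are not part of Azkaban.

```shell
#!/usr/bin/env bash
# Print the value of one key from a Java-style .properties file.
# Usage: get_prop KEY FILE
get_prop() {
  local key=$1 file=$2
  # Escape dots so "mysql.host" is matched literally by the regex.
  local pattern="${key//./\\.}"
  # Keep the last definition of the key and strip everything up to "=".
  grep -E "^[[:space:]]*${pattern}[[:space:]]*=" "$file" \
    | tail -n 1 \
    | sed -E 's/^[^=]*=[[:space:]]*//'
}

# Return 0 when every given properties file has the same mysql.host value.
# Usage: same_mysql_host FILE [FILE...]
same_mysql_host() {
  local first="" host file
  for file in "$@"; do
    host=$(get_prop mysql.host "$file")
    if [ -z "$first" ]; then first=$host; fi
    [ "$host" = "$first" ] || return 1
  done
  return 0
}
```

Calling same_mysql_host with the azkaban.properties file of each executor before starting them gives a quick check that they all point at the same database.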
Deployment roles:
mysql: Azkaban's back-end database
azkaban-web: host for the UI console and the API
azkaban-executor: host that executes workflow tasks
Deployment steps:
Build Azkaban from source
Initialize the Azkaban database
Configure Azkaban's connection, users, task dispatch, and mail settings
- **Build from source:
git clone https://github.com/azkaban/azkaban.git
cd azkaban
./gradlew installDist
Main build outputs:
azkaban/azkaban-db/build/sql
azkaban/azkaban-exec-server/build/
azkaban/azkaban-web-server/build/
azkaban/az-exec-util
azkaban/az-examples/flow20-projects/basicFlow20Project.zip
- **Initialize the database:
SQL file: azkaban/azkaban-db/build/sql/create-all-sql-3.82.0-8-g11595ad.sql
- **Connection settings:
database.type=mysql
- **Users
azkaban-users.xml
- **Mail
mail.sender=help@xxx.com
……
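- **Task dispatch
Each value below is the weight the executor selector gives to that comparator when ranking executors.
Weight 0: a flow may be dispatched to the executor that was used last
azkaban.executorselector.comparator.LastDispatched=0
Weight 0: free memory is not considered, so an executor with less than 1G free can still be assigned tasks
azkaban.executorselector.comparator.Memory=0
Weight for the number of flows already assigned to an executor
azkaban.executorselector.comparator.NumberOfAssignedFlowComparator=10
Weight 0: CPU usage is not considered when assigning tasks
azkaban.executorselector.comparator.CpuUsage=0
……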
- **azkaban lib
cd azkaban/az-exec-util/src/main/c
gcc execute-as-user.c -o execute-as-user
chown root execute-as-user
chmod 6050 execute-as-user
azkaban.jobtype.plugin.dir=azkaban/azkaban-exec-server/build/install/azkaban-exec-server/plugins/jobtypes
azkaban.native.lib=azkaban/az-exec-util/src/main/c
--------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------
-azkaban user management
In Azkaban, user management is also permission management.
user.manager.xml.file=azkaban/azkaban-web-server/build/install/azkaban-web-server/conf/azkaban-users.xml
lockdown.create.projects=true
Permission levels:
Permissions       Values
ADMIN             Grants all access to everything in Azkaban.
READ              Gives users read-only access to every project and their logs
WRITE             Allows users to upload files, change job properties or remove any project
EXECUTE           Allows users to trigger the execution of any flow
SCHEDULE          Users can add or remove schedules for any flows
CREATEPROJECTS    Allows users to create new projects if project creation is locked down
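As an illustration of how these permissions map onto azkaban-users.xml, the sketch below writes a sample file to a scratch directory rather than the live conf directory. The etl_operator user, its password, and the operator role are hypothetical; the element and attribute names follow the default azkaban-users.xml shipped with Azkaban, so check them against your version before use.

```shell
#!/usr/bin/env bash
# Write a sample azkaban-users.xml into a scratch directory (not the live conf dir).
conf_dir=$(mktemp -d)

# The quoted 'EOF' keeps the shell from touching anything inside the XML.
cat > "$conf_dir/azkaban-users.xml" <<'EOF'
<azkaban-users>
  <!-- Full administrator -->
  <user username="azkaban" password="azkaban" roles="admin"/>
  <!-- Hypothetical account: may view, run and schedule flows, but not upload or create projects -->
  <user username="etl_operator" password="change-me" roles="operator"/>

  <role name="admin" permissions="ADMIN"/>
  <role name="operator" permissions="READ,EXECUTE,SCHEDULE"/>
</azkaban-users>
EOF

echo "wrote $conf_dir/azkaban-users.xml"
```

Splitting roles this way keeps day-to-day accounts away from WRITE and ADMIN, which fits the lockdown.create.projects=true setting above.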
--------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------
-azkaban API
SessionId:
curl -k -X POST --data "action=login&username=azkaban&password=azkaban" http://localhost:8081
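The login call returns a JSON body containing a session.id field, which the later API calls need. Below is a minimal sketch for capturing it in shell, assuming the field appears as "session.id" : "..." in the response. extract_session_id and azkaban_login are illustrative helper names, not Azkaban commands.

```shell
#!/usr/bin/env bash
# Read a login response on stdin and print the value of its "session.id" field.
# Prints nothing when the field is absent (for example on a failed login).
extract_session_id() {
  sed -n 's/.*"session\.id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Log in and print the session id.
# Usage: azkaban_login BASE_URL USERNAME PASSWORD
azkaban_login() {
  curl -s -k -X POST \
    --data-urlencode "action=login" \
    --data-urlencode "username=$2" \
    --data-urlencode "password=$3" \
    "$1" | extract_session_id
}
```

With a running web server, SESSION_ID=$(azkaban_login http://localhost:8081 azkaban azkaban) would store the id for the calls that follow.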
Execute a Flow (replace the ${...} placeholders with real values):
curl -k --get \
  --data 'session.id=${session.id}' \
  --data 'ajax=executeFlow' \
  --data 'project=${projectname}' \
  --data 'flow=${flowname}' \
  http://localhost:8081/executor
-- - parameter:
          failureEmails=xxx@xxx.com,xxy@xxx.com
-- - script parameters & other parameters:
          flowOverride[parameter_name]=value
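To show how flowOverride parameters fit into the executeFlow call, here is a hedged sketch that assembles the curl arguments from name=value pairs. build_execute_args, execute_flow, and the EXEC_ARGS array are hypothetical names introduced for this example; only the request parameters themselves come from the text above.

```shell
#!/usr/bin/env bash
# Fill the global array EXEC_ARGS with curl arguments for ajax=executeFlow.
# Usage: build_execute_args SESSION_ID PROJECT FLOW [name=value ...]
build_execute_args() {
  local session=$1 project=$2 flow=$3 pair
  shift 3
  EXEC_ARGS=(--get
    --data-urlencode "session.id=$session"
    --data-urlencode "ajax=executeFlow"
    --data-urlencode "project=$project"
    --data-urlencode "flow=$flow")
  for pair in "$@"; do
    # name=value becomes flowOverride[name]=value
    EXEC_ARGS+=(--data-urlencode "flowOverride[${pair%%=*}]=${pair#*=}")
  done
}

# Send the request and print the server's response.
# Usage: execute_flow BASE_URL SESSION_ID PROJECT FLOW [name=value ...]
execute_flow() {
  local base_url=$1
  shift
  build_execute_args "$@"
  curl -s -k "${EXEC_ARGS[@]}" "$base_url/executor"
}
```

For example, execute_flow http://localhost:8081 "$SESSION_ID" myproject myflow dt=2020-01-01 would pass dt to the flow's scripts as a flow parameter; the project, flow, and dt names here are placeholders.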
Schedule a period-based Flow:
curl -k http://HOST:PORT/schedule -d "ajax=scheduleFlow&isrecurring=on&period=5w&projectName=PROJECTNAME&flow=FLOWNAME&projectId=PROJECTID&scheduleTime=12,00,pm,PDT&scheduleDate=07/22/2014" -b azkaban.browser.session.id=SESSION_ID
-- - parameter:
      PROJECT_ID: select * from azkaban.projects where name = 'project_name';
      scheduleTime: if you work in Beijing time (UTC+8), subtract 8 hours from the intended time so that the schedule lines up with the current local time
      scheduleDate: the date on which the flow starts
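The 8-hour adjustment for scheduleTime can be sketched as a small shell function. This assumes the note above means the time is submitted in UTC, and it reuses the hour,minute,am/pm,timezone layout of the example request; beijing_to_schedule_time is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Convert a Beijing (UTC+8) wall-clock time to the "h,mm,am|pm,TZ" form, in UTC.
# Usage: beijing_to_schedule_time HOUR MINUTE
# HOUR (0-23) and MINUTE (0-59) must be plain integers without leading zeros.
beijing_to_schedule_time() {
  local hour=$1 minute=$2 utc_hour ampm hour12
  # Subtract 8 hours, wrapping around midnight.
  utc_hour=$(( (hour + 24 - 8) % 24 ))
  if [ "$utc_hour" -ge 12 ]; then ampm=pm; else ampm=am; fi
  # Map 0..23 onto a 12-hour clock, where hour 0 is written as 12.
  hour12=$(( utc_hour % 12 ))
  if [ "$hour12" -eq 0 ]; then hour12=12; fi
  printf '%d,%02d,%s,UTC\n' "$hour12" "$minute" "$ampm"
}
```

For Beijing times earlier than 08:00 the UTC time falls on the previous day, so scheduleDate would need to move back by one day as well; the function does not handle that.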
---
--------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------
-azkaban parameterized Run Job
Template flow file:
azkaban/az-examples/flow20-projects/basicFlow20Project.zip
-- - flow file: flowname.flow
"${parame}"
-- - parameter file: flowname.job
parame={parame}
-- - flow dependencies
flowname.flow:
dependsOn:
 - jobA
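Putting the pieces together, the sketch below writes a small Flow 2.0 project to a scratch directory. It is modeled on the layout of basicFlow20Project.zip (a flow20.project marker plus a .flow file); the job names, the commands, and the default_value default are illustrative assumptions, and a flowOverride[parame] value passed at execution time should take precedence over the default.

```shell
#!/usr/bin/env bash
# Create a minimal Flow 2.0 project in a scratch directory.
project_dir=$(mktemp -d)

# Marker file that tells Azkaban the project uses Flow 2.0.
printf 'azkaban-flow-version: 2.0\n' > "$project_dir/flow20.project"

# The quoted 'EOF' keeps ${parame} literal so that Azkaban, not the shell, resolves it.
cat > "$project_dir/flowname.flow" <<'EOF'
config:
  # Default value used when no override is supplied at execution time
  parame: default_value
nodes:
  - name: jobA
    type: command
    config:
      command: echo "jobA got ${parame}"
  - name: jobB
    type: command
    dependsOn:
      - jobA
    config:
      command: echo "jobB runs after jobA"
EOF

echo "wrote $project_dir"
```

Zipping the two files and uploading the archive through the UI gives a flow in which jobB waits for jobA and jobA echoes the parameter.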
--------------------------------------------------------------------------------------------------------
-azkaban email notice
Configure the sender:
#mail settings
mail.sender=help@xxx.com
mail.host=smtp.xxx.xxx.cn
mail.port=587
mail.user=help@xxx.com
mail.password=xxx
mail.tls=true
Notification settings (replace the ${...} placeholders with real values):
curl -k --get \
  --data 'session.id=${session.id}' \
  --data 'ajax=executeFlow' \
  --data 'project=${projectname}' \
  --data 'failureEmails=xxx@xxx.com,xxy@xxx.com' \
  --data 'flow=${flowname}' \
  http://localhost:8081/executor
---
--------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------
-azkaban UI console & run a project
Web resource directory:
web.resource.dir=azkaban/azkaban-web-server/build/install/azkaban-web-server/web/
Executor port:
 executor.port=12321
Start the servers
-- - executor:
azkaban/azkaban-exec-server/build/install/azkaban-exec-server/bin/start-exec.sh
curl "http://executorhost:12321/executor?action=activate"
-- - web:
azkaban/azkaban-web-server/build/install/azkaban-web-server/bin/start-web.sh
Run a project flow from the UI








