Overview
A while back I set up a high-availability (HA) Airflow environment, and to keep myself from forgetting the steps I have written the installation process up as this document. Our environment has no Internet access, only a local yum repository, so all of the required packages were collected in advance. The package layout is given below to make the document easier to follow; as you read, substitute paths, hostnames, and IPs to match your own environment.
airflow_ha_install/
├── airflow-packages //third-party packages Airflow depends on; with no Internet access they have to be downloaded ahead of time
├── airflow-scheduler-failover-controller-master
│ ├── CHANGELOG.md
│ ├── License.md
│ ├── README.md
│ ├── scheduler_failover_controller
│ ├── scripts
│ ├── setup.cfg
│ └── setup.py
├── pip-9.0.1.tar.gz
├── rabbitmq
│ ├── erlang-19.3.6.4-1.el7.x86_64.rpm //this RPM is one I built myself; you can build it from source as well
│ └── rabbitmq-server-3.7.4-1.el7.noarch.rpm
└── systemd
├── airflow
├── airflow.conf
├── airflow-flower.service
├── airflow-kerberos.service
├── airflow-scheduler.service
├── airflow-webserver.service
├── airflow-worker.service
└── README
Environment plan
| host | IP | service |
|---|---|---|
| airflow-01 | 192.168.3.191 | airflow-worker/webserver/ASFC/rabbitmq |
| airflow-02 | 192.168.3.192 | airflow-worker/webserver/ASFC/rabbitmq |
| airflow-03 | 192.168.3.193 | airflow-worker/webserver/Haproxy |
Prepare the environment
CentOS 7
The preparation steps must be performed as the root user.
Disable the firewall and SELinux
systemctl stop firewalld
systemctl disable firewalld
setenforce 0
sed -i 's/^SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config
Create the airflow user and grant it sudo privileges
useradd airflow
# one way to grant sudo; adjust to your own security policy
echo 'airflow ALL=(ALL) NOPASSWD: ALL' > /etc/sudoers.d/airflow
Install dependency packages
yum groupinstall "Development tools"
yum install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel python-devel wget cyrus-sasl-devel.x86_64 libffi-devel python-psycopg2
Install pip
Upload the installation packages to every server in the target environment.
Log in to every server in the target environment as root and run the following commands:
yum groupinstall "Development tools" -y
yum install python-setuptools python-devel -y
tar xzvf pip-9.0.1.tar.gz
cd pip-9.0.1
python setup.py install
Install RabbitMQ (HA)
Single-node installation
Install the RabbitMQ server on both 192.168.3.191 and 192.168.3.192:
- Install Erlang
yum install airflow_ha_install/rabbitmq/erlang-19.3.6.4-1.el7.x86_64.rpm
- Install RabbitMQ Server
yum install airflow_ha_install/rabbitmq/rabbitmq-server-3.7.4-1.el7.noarch.rpm
- Start the server
systemctl start rabbitmq-server
- Enable it on boot
systemctl enable rabbitmq-server
- Enable the rabbitmq_management plugin
rabbitmq-plugins enable rabbitmq_management
The management UI is now reachable at http://ip:15672. The default guest/guest account can only log in via http://localhost:15672; for remote access you need to add a new user:
rabbitmqctl add_user admin admin
# the user must be tagged administrator to access the UI remotely
rabbitmqctl set_user_tags admin administrator
# grant admin configure/write/read permissions on all resources in the '/' virtual host
rabbitmqctl set_permissions -p / admin ".*" ".*" ".*"
# list all users
rabbitmqctl list_users
Building the cluster
# Copy /var/lib/rabbitmq/.erlang.cookie from 192.168.3.191 to /var/lib/rabbitmq/.erlang.cookie on 192.168.3.192; the nodes must share the same Erlang cookie or they cannot form a cluster. For example:
scp /var/lib/rabbitmq/.erlang.cookie root@192.168.3.192:/var/lib/rabbitmq/.erlang.cookie
# On 192.168.3.192, run the following commands to join the cluster:
systemctl restart rabbitmq-server
rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@airflow-01
rabbitmqctl start_app
# A node joined with --ram becomes a RAM node; omit --ram for a disk node. A RabbitMQ cluster needs at least one disk node.
# Check the cluster status
rabbitmqctl cluster_status
Configure mirrored queues
Run on 192.168.3.191:
rabbitmqctl set_policy ha-all "" '{"ha-mode":"all","ha-sync-mode": "automatic"}'
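Broken out, the policy document passed to set_policy above is plain JSON: "ha-mode": "all" mirrors every matching queue to all nodes in the cluster, and "ha-sync-mode": "automatic" makes a newly attached mirror synchronize existing messages without manual intervention.

```json
{
  "ha-mode": "all",
  "ha-sync-mode": "automatic"
}
```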
Install HAProxy
Install HAProxy on 192.168.3.191:
yum install haproxy
vi /etc/haproxy/haproxy.cfg
# 1. Change the default mode to tcp
# 2. Append the following at the end of the file:
# port forwarding from 8080 to the airflow webserver on 8080
listen airflow-webserver
bind 0.0.0.0:8080
balance roundrobin
server airflow_webserver_1 auto-nn-01.embrace.com:8080 check
server airflow_webserver_2 auto-nn-02.embrace.com:8080 check
listen rabbitmq-web-ui
bind 0.0.0.0:15677
balance roundrobin
server rabbitmq_server_1 auto-cn-01.embrace.com:15672 check
server rabbitmq_server_2 auto-nn-01.embrace.com:15672 check
listen rabbitmq-ui
bind 0.0.0.0:5677
balance roundrobin
server rabbitmq_server_1 auto-cn-01.embrace.com:5672 check
server rabbitmq_server_2 auto-nn-01.embrace.com:5672 check
# This sets up the admin page for HA Proxy at port 1936.
listen stats :1936
mode http
stats enable
stats uri /
stats hide-version
stats refresh 30s
Start (you can check the configuration file first with haproxy -c -f /etc/haproxy/haproxy.cfg)
systemctl start haproxy
Stop
systemctl stop haproxy
Enable on boot
systemctl enable haproxy
Install and configure Airflow
Installation
Log in as root on every server in the target environment and run the following commands to install:
cd airflow_ha_install/airflow-packages
pip install --no-index --find-links . apache-airflow
pip install --no-index --find-links . apache-airflow[celery]
pip install --no-index --find-links . apache-airflow[crypto]
Set the AIRFLOW_HOME environment variable and create the required directories
Log in as the airflow user on every server in the target environment and run:
mkdir ~/airflow
# vi ~/.bash_profile and add the following line
export AIRFLOW_HOME=~/airflow
# load the environment variable
source ~/.bash_profile
# create the required directories
cd $AIRFLOW_HOME
mkdir dags
mkdir logs
Configure Airflow
This part only needs to be done on one server in the target environment; afterwards, sync the configuration file to the other servers.
1. Initialize the database
airflow initdb
2. Edit the configuration file ($AIRFLOW_HOME/airflow.cfg)
* Set the executor to CeleryExecutor
executor = CeleryExecutor
* Set the database connection string
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@$ip4addr:5432/airflow
* Have newly created DAGs start in the paused state
dags_are_paused_at_creation = True
* Do not load the example DAGs
load_examples = False
* Set the message-broker connection string
broker_url = amqp://admin:admin@192.168.3.191:5677/
* Point the Celery result backend at PostgreSQL
celery_result_backend = db+postgresql+psycopg2://postgres:postgres@192.168.3.191:5432/airflow
3. Run airflow initdb again to create the tables
airflow initdb
4. Distribute airflow.cfg
Copy airflow.cfg to every machine in the cluster.
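Taken together, the edits above leave the relevant parts of $AIRFLOW_HOME/airflow.cfg looking roughly like the sketch below (section names follow the Airflow 1.x config layout; $ip4addr and the credentials are this walkthrough's placeholders, so substitute your own):

```ini
[core]
executor = CeleryExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@$ip4addr:5432/airflow
dags_are_paused_at_creation = True
load_examples = False

[celery]
# 5677 is the HAProxy front end for the RabbitMQ cluster
broker_url = amqp://admin:admin@192.168.3.191:5677/
celery_result_backend = db+postgresql+psycopg2://postgres:postgres@192.168.3.191:5432/airflow
```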
Register Airflow as a system service
Log in as root on every server in the target environment and run the following:
- Copy the service files
cd airflow_ha_install/systemd
cp airflow*.service /usr/lib/systemd/system
chown airflow:airflow /usr/lib/systemd/system/airflow*.service
# make sure the path to the airflow command inside these files is correct
- Copy airflow.conf to /etc/tmpfiles.d/ or /usr/lib/tmpfiles.d/
cd airflow_ha_install/systemd
cp airflow.conf /etc/tmpfiles.d/
- Make sure /run/airflow exists with the correct owner and group (0755 airflow airflow)
- Copy airflow to /etc/sysconfig/airflow and adjust AIRFLOW_HOME in it
cd airflow_ha_install/systemd
cp airflow /etc/sysconfig/airflow
- Enable the services on boot
systemctl enable airflow-webserver
systemctl enable airflow-worker
systemctl enable airflow-flower
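For reference, a webserver unit of the kind shipped in the systemd directory looks roughly like the following sketch (the ExecStart path, PID file location, and EnvironmentFile name are assumptions based on the steps above, so verify them against the actual files):

```ini
[Unit]
Description=Airflow webserver daemon
After=network.target rabbitmq-server.service
Wants=rabbitmq-server.service

[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=airflow
Group=airflow
Type=simple
ExecStart=/usr/bin/airflow webserver --pid /run/airflow/webserver.pid
Restart=on-failure
RestartSec=5s
PrivateTmp=true

[Install]
WantedBy=multi-user.target
```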
Install airflow-scheduler-failover-controller
- Install ASFC
cd airflow_ha_install/airflow-scheduler-failover-controller-master
pip install -e . --no-index --find-links ../airflow-packages
- Initialize
scheduler_failover_controller init
- Update airflow.cfg
scheduler_nodes_in_cluster = auto-nn-01,auto-nn-02
# it is recommended that you use the value printed by: scheduler_failover_controller get_current_host
- Set up passwordless SSH for the airflow user between the scheduler nodes
- Test connectivity
scheduler_failover_controller test_connection
- Make sure the metadata service is reachable
scheduler_failover_controller metadata
- Register as a system service
Log in as root on each of the machines acting as a Scheduler Failover Controller.
Copy the scheduler_failover_controller.service file to the systemd directory; see the list below for the correct location in your environment:
  * /usr/lib/systemd/system/ for CentOS
  * /lib/systemd/system/ for Ubuntu
Edit scheduler_failover_controller.service and change any configuration you would like; in particular, make sure the user and group options match the user and group the main Airflow processes run as.
Enable the service to run on machine startup:
systemctl enable scheduler_failover_controller
You're done!
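The passwordless-SSH step above can be sketched as follows; the target hostname airflow-02 comes from this walkthrough's environment plan, and the ssh-copy-id line is left commented out since it needs a live peer node:

```shell
# run as the airflow user on each scheduler node
mkdir -p ~/.ssh && chmod 700 ~/.ssh
# generate a key pair if the user does not already have one
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa -q
# push the public key to the other scheduler node (hostname taken from
# the environment plan above; substitute your own)
# ssh-copy-id airflow@airflow-02
```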