Airflow High-Availability (HA) Environment Setup

Overview

A while back I set up a high-availability (HA) environment for Airflow. To keep from forgetting the details, I am writing the installation process up as a document. Our company environment has no internet access, only a local yum repository, so all required packages were collected in advance. For readability, the package layout is given below; please substitute values for your own environment where appropriate.
airflow_ha_install/
├── airflow-packages //third-party packages Airflow depends on; with no internet access they had to be downloaded ahead of time
├── airflow-scheduler-failover-controller-master
│ ├── CHANGELOG.md
│ ├── License.md
│ ├── README.md
│ ├── scheduler_failover_controller
│ ├── scripts
│ ├── setup.cfg
│ └── setup.py
├── pip-9.0.1.tar.gz
├── rabbitmq
│ ├── erlang-19.3.6.4-1.el7.x86_64.rpm //this package was built by me; you can compile it yourself from source
│ └── rabbitmq-server-3.7.4-1.el7.noarch.rpm
└── systemd
├── airflow
├── airflow.conf
├── airflow-flower.service
├── airflow-kerberos.service
├── airflow-scheduler.service
├── airflow-webserver.service
├── airflow-worker.service
└── README

Environment Plan

host IP service
airflow-01 192.168.3.191 airflow-worker/webserver/ASFC/rabbitmq
airflow-02 192.168.3.192 airflow-worker/webserver/ASFC/rabbitmq
airflow-03 192.168.3.193 airflow-worker/webserver/Haproxy

Prepare the Environment

CentOS 7
The preparation steps below must be performed as the root user.

Disable the firewall and SELinux

systemctl stop firewalld
systemctl disable firewalld
setenforce 0 
sed -i 's/^SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config
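The sed edit above can be dry-run against a throwaway copy before touching the real config. A minimal sketch (selinux_config.example is a stand-in for /etc/selinux/config):

```shell
# Stand-in for /etc/selinux/config so the edit can be tested safely
cat > selinux_config.example <<'EOF'
# This file controls the state of SELinux on the system.
SELINUX=enforcing
SELINUXTYPE=targeted
EOF
# Same substitution as above, applied to the copy
sed -i 's/^SELINUX=.*/SELINUX=disabled/g' selinux_config.example
grep '^SELINUX=' selinux_config.example   # -> SELINUX=disabled
```

Note that `^SELINUX=` only matches the SELINUX line, not SELINUXTYPE, so the type setting is left untouched.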

Create the airflow user and grant it sudo privileges

useradd airflow
# grant sudo, e.g. by adding the user to the wheel group:
usermod -aG wheel airflow

Install dependency packages

yum groupinstall "Development tools"
yum install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel python-devel wget cyrus-sasl-devel.x86_64 libffi-devel python-psycopg2

Install pip

Upload the installation package to every server in the target environment.
Log in to each server as root and run the following commands:

yum groupinstall "Development tools" -y
yum install python-setuptools python-devel -y
tar xzvf pip-9.0.1.tar.gz
cd pip-9.0.1
python setup.py install

Install RabbitMQ (HA)

Single-node installation

Install RabbitMQ Server on nodes 192.168.3.191 and 192.168.3.192:

  1. Install erlang
    yum install airflow_ha_install/rabbitmq/erlang-19.3.6.4-1.el7.x86_64.rpm
    
  2. Install RabbitMQ Server
    yum install airflow_ha_install/rabbitmq/rabbitmq-server-3.7.4-1.el7.noarch.rpm
    
  3. Start the server
    systemctl start rabbitmq-server
    
  4. Enable start at boot
    systemctl enable rabbitmq-server
    
  5. Enable the rabbitmq_management plugin
    rabbitmq-plugins enable rabbitmq_management
    The management UI is then reachable at http://ip:15672. The default guest/guest account can only log in via http://localhost:15672; for remote access, create a new user:
    rabbitmqctl add_user admin admin
    # the user must be tagged administrator to access the UI remotely
    rabbitmqctl set_user_tags admin administrator
    rabbitmqctl set_permissions -p / admin ".*" ".*" ".*" //grants admin configure/write/read permissions on all resources in the '/' virtual host; list all users with: rabbitmqctl list_users
    

Cluster setup

# Copy /var/lib/rabbitmq/.erlang.cookie from 192.168.3.191 to /var/lib/rabbitmq/.erlang.cookie on 192.168.3.192. The servers must share the same cookie; otherwise the cluster cannot be formed.
# On node 192.168.3.192, run the following commands to join the cluster:
systemctl restart rabbitmq-server
rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@airflow-01
rabbitmqctl start_app
# Appending --ram to join_cluster makes the node a RAM node; omit it for a disk node. A RabbitMQ cluster needs at least one disk node.
# Check the cluster status
rabbitmqctl cluster_status
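A mismatched .erlang.cookie is the most common reason join_cluster fails, so it is worth comparing the cookies before joining. A minimal sketch, using local stand-in files for cookies fetched from each node:

```shell
# node1_cookie / node2_cookie stand in for /var/lib/rabbitmq/.erlang.cookie
# copied down from 192.168.3.191 and 192.168.3.192 (e.g. via scp)
echo "SECRETCOOKIE" > node1_cookie
echo "SECRETCOOKIE" > node2_cookie
if cmp -s node1_cookie node2_cookie; then
  echo "cookies match"
else
  echo "cookies differ - join_cluster will fail"
fi
```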

Configure mirrored queues
Run on 192.168.3.191:

rabbitmqctl set_policy ha-all "" '{"ha-mode":"all","ha-sync-mode": "automatic"}'
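Quoting mistakes in the policy JSON are easy to make on the command line. One way to sanity-check it before handing it to rabbitmqctl (assumes python3 is available on the node):

```shell
# json.tool exits non-zero on invalid JSON, so this catches quoting errors
policy='{"ha-mode":"all","ha-sync-mode":"automatic"}'
echo "$policy" | python3 -m json.tool >/dev/null && echo "policy JSON is valid"
```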

Install HAProxy

Install HAProxy on 192.168.3.191:

yum install haproxy
vi /etc/haproxy/haproxy.cfg
# 1. Change the default mode to tcp
# 2. Append the following at the end of the file:
# port forwarding from 8080 to the airflow webserver on 8080
listen airflow-webserver
 bind 0.0.0.0:8080
 balance roundrobin
 server airflow_webserver_1 auto-nn-01.embrace.com:8080 check
 server airflow_webserver_2 auto-nn-02.embrace.com:8080 check
 
listen rabbitmq-web-ui
 bind 0.0.0.0:15677
 balance roundrobin
 server rabbitmq_server_1 auto-cn-01.embrace.com:15672 check
 server rabbitmq_server_2 auto-nn-01.embrace.com:15672 check
 
listen rabbitmq-ui
 bind 0.0.0.0:5677
 balance roundrobin
 server rabbitmq_server_1 auto-cn-01.embrace.com:5672 check
 server rabbitmq_server_2 auto-nn-01.embrace.com:5672 check

# This sets up the admin page for HA Proxy at port 1936.
listen stats :1936
 mode http
 stats enable
 stats uri /
 stats hide-version
 stats refresh 30s

Start

systemctl start haproxy

Stop

systemctl stop haproxy

Enable as a system service

systemctl enable haproxy

Install and Configure Airflow

Installation

Log in as root to each server in the target environment and run the following commands to install:

cd airflow_ha_install/airflow-packages
pip install --no-index --find-links . apache-airflow
pip install --no-index --find-links . apache-airflow[celery]
pip install --no-index --find-links . apache-airflow[crypto]

Configure the environment variable (AIRFLOW_HOME) and create the required directories

Log in as the airflow user on every server in the target environment and run the following commands:

mkdir ~/airflow
# vi ~/.bash_profile and add the following line
export AIRFLOW_HOME=~/airflow

# load the environment variable
source ~/.bash_profile
# create the required directories

cd $AIRFLOW_HOME
mkdir dags
mkdir logs

Configure Airflow

This part only needs to be performed on one server in the target environment; once done, sync the configuration file to the other servers.

  1. Initialize the database
    airflow initdb

  2. Edit the configuration file ($AIRFLOW_HOME/airflow.cfg)
    * Change the executor to CeleryExecutor
    executor = CeleryExecutor
    * Set the database connection string
    sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@$ip4addr:5432/airflow
    * Pause DAGs on creation
    dags_are_paused_at_creation = True
    * Do not load the example DAGs
    load_examples = False
    * Set the message broker connection string
    broker_url = amqp://admin:admin@192.168.3.191:5677/
    * Point the Celery result backend at PostgreSQL
    celery_result_backend = db+postgresql+psycopg2://postgres:postgres@192.168.3.191:5432/airflow
    * Run airflow initdb again to create the related tables
    airflow initdb

  3. Distribute airflow.cfg
    Copy airflow.cfg to every machine in the cluster
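The config edits in step 2 can be scripted with sed so they are applied identically on every run. A minimal sketch, run here against a stand-in file rather than a real config:

```shell
cfg=airflow.cfg.example   # stand-in for $AIRFLOW_HOME/airflow.cfg
# a few default lines as airflow initdb would have generated them
cat > "$cfg" <<'EOF'
executor = SequentialExecutor
dags_are_paused_at_creation = False
load_examples = True
EOF
# apply the changes from step 2
sed -i \
  -e 's/^executor = .*/executor = CeleryExecutor/' \
  -e 's/^dags_are_paused_at_creation = .*/dags_are_paused_at_creation = True/' \
  -e 's/^load_examples = .*/load_examples = False/' "$cfg"
grep '^executor' "$cfg"   # -> executor = CeleryExecutor
```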

Register Airflow as a system service

Log in as root to every server in the target environment and run the following:

  1. Copy the service files
    cd airflow_ha_install/systemd
    cp airflow*.service /usr/lib/systemd/system
    chown airflow:airflow /usr/lib/systemd/system/airflow*.service
    # make sure the path to the airflow command inside each file is correct

  2. Copy airflow.conf to /etc/tmpfiles.d/ or /usr/lib/tmpfiles.d/
    cd airflow_ha_install/systemd
    cp airflow.conf /etc/tmpfiles.d/

  3. Make sure /run/airflow exists with the correct owner and group (0755 airflow airflow)
  4. Copy the airflow file to /etc/sysconfig/airflow and adjust AIRFLOW_HOME
    cd airflow_ha_install/systemd
    cp airflow /etc/sysconfig/airflow

  5. Enable the services at boot
    systemctl enable airflow-webserver
    systemctl enable airflow-worker
    systemctl enable airflow-flower
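For reference, the airflow file copied in step 4 is a plain environment file read by the service units. A sketch of its contents, assuming AIRFLOW_HOME is /home/airflow/airflow (adjust both paths for your environment):

```shell
# Example contents of /etc/sysconfig/airflow (paths are assumptions)
AIRFLOW_CONFIG=/home/airflow/airflow/airflow.cfg
AIRFLOW_HOME=/home/airflow/airflow
```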
    

Install airflow-scheduler-failover-controller

  1. Install ASFC

    cd airflow_ha_install/airflow-scheduler-failover-controller-master
    pip install -e . --no-index --find-links ../airflow-packages
    
  2. Initialize

    scheduler_failover_controller init
    
  3. Update airflow.cfg

    scheduler_nodes_in_cluster = auto-nn-01,auto-nn-02
    # it is recommended to use the value printed by: scheduler_failover_controller get_current_host
    
  4. Set up passwordless SSH for the airflow user between the scheduler nodes

  5. Test connectivity

    scheduler_failover_controller test_connection
    
  6. Verify it is working

    scheduler_failover_controller metadata
    
  7. Register as a system service

    1. Login to each of the machines acting as Scheduler Failover Controllers

    2. Login as root

    3. Copy the scheduler_failover_controller.service file to the systemd directory. See the list below for the correct location based on your environment.
      * /usr/lib/systemd/system/ for CentOS
      * /lib/systemd/system/ for Ubuntu

    4. Edit scheduler_failover_controller.service and change any configuration you would like
      a. Change the user and group options as needed so they match the user and group the main Airflow processes run as

    5. Enable the service to run on startup of the machine
      systemctl enable scheduler_failover_controller

    6. You're done!
