Introduction
Swarm is Docker's official cluster-management tool. It abstracts a group of Docker hosts into a single virtual host and manages the Docker resources on all of them through one entry point. Swarm is similar to Kubernetes, but it is considerably lighter and offers a smaller feature set.
node
Every Docker Engine participating in a swarm is a node, and there are two kinds of node: manager and worker.
Deployment commands are issued on a manager node, which breaks the deployment into tasks and assigns them to one or more worker nodes.
Manager nodes perform orchestration and cluster management, keeping the swarm in its desired state. If a swarm has several manager nodes, they automatically negotiate and elect one leader to carry out orchestration.
Worker nodes accept and execute the tasks dispatched by manager nodes. By default a manager node is also a worker node, but it can be configured as a manager-only node dedicated to orchestration and cluster management.
Worker nodes periodically report their own state and the state of the tasks they are running to the manager nodes, which is how the managers maintain a view of the whole cluster.
service
A service defines the tasks to be executed on worker nodes. Swarm's main orchestration job is keeping each service in its desired state.
An example service: start an HTTP service in the swarm using the httpd:latest image, with 3 replicas.
The manager node creates the service, determines that 3 httpd containers are needed, and assigns the containers according to the current state of the worker nodes, say two containers on worker1 and one on worker2.
If worker2 crashes some time later, the manager detects the failure and immediately starts a replacement httpd container on worker3.
The service is thus kept in its desired state of three replicas.
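The desired state described above can also be written down declaratively as a stack file and handed to a manager. A minimal sketch (file name, stack name, and service name are illustrative):

```yaml
# docker-stack.yml: the desired state the managers continuously reconcile
version: "3.3"
services:
  web:
    image: httpd:latest
    deploy:
      replicas: 3        # swarm keeps three tasks running, rescheduling on node failure
    ports:
      - "8080:80"        # httpd listens on 80 inside the container
```

Deploy it with `docker stack deploy -c docker-stack.yml demo`; `docker stack rm demo` tears it down.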
Initializing the swarm
Command reference
[root@node191 docker]# docker swarm --help
Usage: docker swarm COMMAND
Manage Swarm
Options:
Commands:
ca Display and rotate the root CA
init Initialize a swarm
join Join a swarm as a node and/or manager
join-token Manage join tokens
leave Leave the swarm
unlock Unlock swarm
unlock-key Manage the unlock key
update Update the swarm
Run 'docker swarm COMMAND --help' for more information on a command.
[root@node191 docker]# docker node --help
Usage: docker node COMMAND
Manage Swarm nodes
Options:
Commands:
demote Demote one or more nodes from manager in the swarm
inspect Display detailed information on one or more nodes
ls List nodes in the swarm
promote Promote one or more nodes to manager in the swarm
ps List tasks running on one or more nodes, defaults to current node
rm Remove one or more nodes from the swarm
update Update a node
Run 'docker node COMMAND --help' for more information on a command.
Initialize and join nodes (manager|worker)
- [x] --advertise-addr specifies the address other nodes use to communicate with this one. Open the firewall for the required ports.
[root@localhost ~]# docker swarm init --advertise-addr 172.16.1.146
Swarm initialized: current node (v2tjxinr9jxfg52evpswn4yb6) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join \
--token SWMTKN-1-5tvspbnrp9g6oxu6qixwhx98wtzx0t7efwfrh6wbpfbk4id1f7-f01zejqjqfnry2tubl3cractn \
172.16.1.146:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
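Swarm needs a few ports open between the hosts. A sketch for firewalld (assuming firewalld is the host firewall; adapt for iptables or other tooling):

```shell
# Ports used by swarm, per Docker's documentation:
firewall-cmd --permanent --add-port=2377/tcp   # cluster management (docker swarm join)
firewall-cmd --permanent --add-port=7946/tcp   # node-to-node gossip
firewall-cmd --permanent --add-port=7946/udp
firewall-cmd --permanent --add-port=4789/udp   # overlay network (VXLAN) data plane
firewall-cmd --reload
```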
- [x] If you did not record the full worker join command printed by docker swarm init, retrieve it with docker swarm join-token worker.
- [x] Likewise, retrieve the manager join command with docker swarm join-token manager.
[root@localhost ~]# docker swarm join-token worker
To add a worker to this swarm, run the following command:
docker swarm join \
--token SWMTKN-1-5tvspbnrp9g6oxu6qixwhx98wtzx0t7efwfrh6wbpfbk4id1f7-f01zejqjqfnry2tubl3cractn \
172.16.1.146:2377
[root@localhost ~]# docker swarm join-token manager
To add a manager to this swarm, run the following command:
docker swarm join \
--token SWMTKN-1-5tvspbnrp9g6oxu6qixwhx98wtzx0t7efwfrh6wbpfbk4id1f7-8pm1wzhfqx5e7jvl8fg61an3w \
172.16.1.146:2377
[root@node146 ~]# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
c9kynm13tvcf1vfrt0m6y7pbi node135 Ready Active Reachable
n494afsdjzs74q5y5vb4xlgd4 node136 Ready Active
v2tjxinr9jxfg52evpswn4yb6 * node146 Ready Active Leader
Removing swarm nodes
- [x] To remove a node from the swarm, first drain the containers off it, then remove it from the cluster.
- [x] Drain a node
  - The node's containers are first started on other nodes, and only then are the ones on the drained node stopped, so the service is unaffected.
## Drain node136
[root@node146 ~]# docker node update --availability drain n494afsdjzs74q5y5vb4xlgd4
n494afsdjzs74q5y5vb4xlgd4
- [x] Remove the specified node
docker node rm node136
docker node rm --force node136
- [x] Restore a node
## Bring a drained node back into normal scheduling
docker node update --availability active n494afsdjzs74q5y5vb4xlgd4
- [x] Leave the swarm (run on the node itself)
## Force-leave the swarm: docker swarm leave --force
[root@node136 ~]# docker swarm leave
Node left the swarm.
## At this point node136 shows as Down.
[root@node146 ~]# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
c9kynm13tvcf1vfrt0m6y7pbi node135 Ready Active Reachable
n494afsdjzs74q5y5vb4xlgd4 node136 Down Active
v2tjxinr9jxfg52evpswn4yb6 * node146 Ready Active Leader
## On a manager node, remove the stale node
[root@node146 ~]# docker node rm n494afsdjzs74q5y5vb4xlgd4
n494afsdjzs74q5y5vb4xlgd4
## Rejoin as a manager
[root@node136 ~]# docker swarm join \
> --token SWMTKN-1-5tvspbnrp9g6oxu6qixwhx98wtzx0t7efwfrh6wbpfbk4id1f7-8pm1wzhfqx5e7jvl8fg61an3w \
> 172.16.1.146:2377
This node joined a swarm as a manager.
[root@node146 ~]# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
c9kynm13tvcf1vfrt0m6y7pbi node135 Ready Active Reachable
n8dgcax0vcqmsjtc0aosx9k2q node136 Ready Active Reachable
v2tjxinr9jxfg52evpswn4yb6 * node146 Ready Active Leader
Demoting a node
Demote a node from manager to worker:
docker node demote v2tjxinr9jxfg52evpswn4yb6
[root@node146 ~]# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
c9kynm13tvcf1vfrt0m6y7pbi node135 Ready Active Reachable
n8dgcax0vcqmsjtc0aosx9k2q node136 Ready Active Leader
v2tjxinr9jxfg52evpswn4yb6 node146 Down Active Unreachable
yvjirlxwpgvjohi3iagtzzkh2 * node146 Ready Active Reachable
[root@node146 ~]# docker node demote v2tjxinr9jxfg52evpswn4yb6
Manager v2tjxinr9jxfg52evpswn4yb6 demoted in the swarm.
[root@node146 ~]# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
c9kynm13tvcf1vfrt0m6y7pbi node135 Ready Active Reachable
n8dgcax0vcqmsjtc0aosx9k2q node136 Ready Active Leader
v2tjxinr9jxfg52evpswn4yb6 node146 Down Active
yvjirlxwpgvjohi3iagtzzkh2 * node146 Ready Active Reachable
Promoting a node
- [x] Promote a node from worker to manager
- [x] docker node promote c9kynm13tvcf1vfrt0m6y7pbi
[root@node146 ~]# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
c9kynm13tvcf1vfrt0m6y7pbi node135 Ready Active
n8dgcax0vcqmsjtc0aosx9k2q node136 Ready Active
yvjirlxwpgvjohi3iagtzzkh2 * node146 Ready Active Leader
[root@node146 ~]# docker node promote c9kynm13tvcf1vfrt0m6y7pbi
Node c9kynm13tvcf1vfrt0m6y7pbi promoted to a manager in the swarm.
[root@node146 ~]# docker node promote n8dgcax0vcqmsjtc0aosx9k2q
Node n8dgcax0vcqmsjtc0aosx9k2q promoted to a manager in the swarm.
[root@node146 ~]# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
c9kynm13tvcf1vfrt0m6y7pbi node135 Ready Active Reachable
n8dgcax0vcqmsjtc0aosx9k2q node136 Ready Active Reachable
yvjirlxwpgvjohi3iagtzzkh2 * node146 Ready Active Leader
Day-to-day swarm operations
Command reference
[root@node191 docker]# docker service --help
Usage: docker service COMMAND
Manage services
Options:
Commands:
create Create a new service
inspect Display detailed information on one or more services
logs Fetch the logs of a service or task
ls List services
ps List the tasks of one or more services
rm Remove one or more services
rollback Revert changes to a service's configuration
scale Scale one or multiple replicated services
update Update a service
Run 'docker service COMMAND --help' for more information on a command.
Official documentation:
https://docs.docker.com/engine/reference/commandline/service/
Creating a service
- [x] Reference: https://docs.docker.com/engine/reference/commandline/service_create/#options
- [x] --publish exposes the port on every swarm node through the routing mesh, even on nodes that are not running a container for the service.
docker service create --name nginx-service --replicas=3 --publish 8080:80 nginx:latest
If the registry is private, remember to add --with-registry-auth; otherwise the other nodes cannot pull the image. For example:
docker login 172.16.1.146 -p ***** -u admin; docker service create --with-registry-auth --name tomcat-logs-test --replicas=2 --publish 10080:8080 172.16.1.146/wondertek/docker-test:1.0.0-2018091910
Viewing service information
docker service ps docker-test
Scaling a service
docker service scale docker-test=3
Defining labels
- [x] Constraints can match node attributes or Docker Engine labels, as follows:
| Node attribute | Matches | Example |
|---|---|---|
| node.id | node ID | node.id == 2ivku8v2gvtg4 |
| node.hostname | node hostname | node.hostname != node-2 |
| node.role | node role: manager | node.role == manager |
| node.labels | user-defined node labels | node.labels.security == high |
| engine.labels | Docker Engine labels | engine.labels.operatingsystem == ubuntu 14.04 |
- [x] engine.labels match Docker Engine labels such as the operating system or drivers. Cluster administrators add node.labels with the docker node update command to make better use of specific nodes.
- [x] Add a label
docker node update --label-add type=manager node146
[root@node146 ~]# docker node inspect node146 --pretty
ID: v2tjxinr9jxfg52evpswn4yb6
Labels:
- type = manager
Hostname: node146
Joined at: 2018-07-16 06:26:49.516457267 +0000 utc
Status:
State: Ready
Availability: Active
Address: 127.0.0.1
Manager Status:
Address: 172.16.1.146:2377
Raft Status: Reachable
Leader: Yes
Platform:
Operating System: linux
Architecture: x86_64
Resources:
CPUs: 8
Memory: 9.765 GiB
Plugins:
Network: bridge, host, macvlan, null, overlay
Volume: local
Engine Version: 1.13.1
- [x] Remove a label
docker node update --label-rm type node146
- [x] Run only on nodes with a given label
docker service rm my_web
docker node update --label-add env=test node135
docker node update --label-add env=prod node136
docker service create \
--constraint node.labels.env==test \
--replicas 3 \
--name my_web2 \
--publish 8080:80 \
httpd
[root@node146 ~]# docker service ps my_web2
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
lzle9hto7mk0 my_web2.1 httpd:latest node135 Running Running 4 seconds ago
j9ujd6mcs2ex my_web2.2 httpd:latest node135 Running Running 5 seconds ago
lqc4apjhonen my_web2.3 httpd:latest node135 Running Running 3 seconds ago
[root@node146 ~]# docker service inspect my_web2 --pretty
ID: m7s5ura6bmjg1nd60lfwn8voa
Name: my_web2
Service Mode: Replicated
Replicas: 3
Placement:
 Contraints:	[node.labels.env==test]
UpdateConfig:
Parallelism: 1
On failure: pause
Max failure ratio: 0
ContainerSpec:
Image: httpd:latest@sha256:2edbf09d0dbdf2a3e21e4cb52f3385ad916c01dc2528868bc3499111cc54e937
Resources:
Endpoint Mode: vip
Ports:
PublishedPort 8080
Protocol = tcp
TargetPort = 80
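The same placement constraint can be expressed in a stack file's deploy section. A sketch using the same labels as above:

```yaml
version: "3.3"
services:
  my_web2:
    image: httpd:latest
    deploy:
      replicas: 3
      placement:
        constraints:
          - node.labels.env == test   # tasks land only on nodes carrying this label
    ports:
      - "8080:80"
```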
Deleting a service
docker service rm docker-test
Custom overlay networks
[root@node135 ~]# docker network ls
NETWORK ID NAME DRIVER SCOPE
4888eb34115b bridge bridge local
5dda44146214 docker_gwbridge bridge local
4dda8692018b host host local
mumblsrh5oe4 ingress overlay swarm
1fcd0ef0748f none null local
docker network create --driver overlay --subnet 10.22.1.0/24 swarm_net
- [x] Create services on the custom network
docker service create --name my_web --replicas=3 --network swarm_net httpd
docker service create --name util --network swarm_net busybox sleep 10000000
- [x] Connectivity test within the same overlay network
docker exec util.1.muu3o4906mihbp1v8r3ejh80p nslookup tasks.my_web
docker exec util.1.muu3o4906mihbp1v8r3ejh80p ping -c 3 my_web
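Declared in a stack file, the same custom overlay network would look roughly like this (a sketch; the names match the commands above):

```yaml
version: "3.3"
services:
  my_web:
    image: httpd:latest
    deploy:
      replicas: 3
    networks:
      - swarm_net
networks:
  swarm_net:
    driver: overlay
    ipam:
      config:
        - subnet: 10.22.1.0/24
```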
Upgrading a service
docker service update --image httpd:2.2.32 my_web
- [x] Swarm can adjust the number of container replicas with --replicas, either at service creation time or while the service is running; the internal scheduler starts and stops containers on different nodes according to current cluster resource usage. This is the default replicated mode of a service.
- [x] In this mode the replica count varies per node: in general, nodes with more resources run more replicas, and vice versa.
- [x] Besides replicated mode, a service also offers global mode, which forces exactly one replica (and at most one) on every node.
## global mode
docker service create \
--mode global \
--name logspout \
--mount type=bind,source=/var/run/docker.sock,destination=/var/run/docker.sock \
gliderlabs/logspout
- [x] Scale the service to six replicas, updating two replicas at a time with a delay of one and a half minutes between batches.
docker service update --replicas 6 --update-parallelism 2 --update-delay 1m30s my_web
## Specify a new image
docker service update --image httpd:2.2.32 --replicas 6 --update-parallelism 2 --update-delay 1m30s my_web
- [x] Watch the rolling-update progress
[root@node146 ~]# docker service ps my_web
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
ku14zmzkpo9a my_web.1 httpd:2.2.32 node135 Running Running about a minute ago
qh6pzjb6syt0 \_ my_web.1 httpd:latest node135 Shutdown Shutdown about a minute ago
0muer26mxx1d my_web.2 httpd:latest node136 Running Running 22 hours ago
k8ybfbc6j20y my_web.3 httpd:2.2.32 node146 Running Running about a minute ago
xr0adp42t7tm \_ my_web.3 httpd:latest node146 Shutdown Shutdown about a minute ago
acd06qrmmnrr my_web.4 httpd:2.2.32 node135 Running Running about a minute ago
jae5i5lhlnb2 my_web.5 httpd:2.2.32 node146 Running Running about a minute ago
3zk4i1drb1nk my_web.6 httpd:2.2.32 node136 Running Running about a minute ago
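The rolling-update policy can also be recorded in a stack file so it applies to every future update. A sketch matching the flags above:

```yaml
version: "3.3"
services:
  my_web:
    image: httpd:2.2.32
    deploy:
      replicas: 6
      update_config:
        parallelism: 2   # update two tasks per batch
        delay: 1m30s     # wait between batches
```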
- [x] Remove the old constraint and add a new one, setting node.labels.env==prod
docker service update --constraint-rm node.labels.env==test my_web2
docker service update --constraint-add node.labels.env==prod my_web2
Rollback
- [x] Roll back to the previous configuration; only one step of history is kept, so you can roll back only once.
docker service update --rollback my_web
- [x] Rolling back a second time simply repeats the upgrade you just performed.
Health checks
- [x] For applications exposing an HTTP interface, a common Health Check is to probe the HTTP status code with curl, for example:
curl --fail http://localhost:8080/ || exit 1
- [x] If curl detects any error HTTP status code, the command returns 1 and the Health Check fails.
docker service create --name my_web3 \
--health-cmd "curl --fail http://localhost:8091 || exit 1" \
httpd
- [x] --health-cmd sets the Health Check command; the related options are:
- [x] 1. --timeout: timeout for the command, default 30s.
- [x] 2. --interval: interval between runs, default 30s.
- [x] 3. --retries: number of retries on failure, default 3; after 3 consecutive failures the container is marked unhealthy, and swarm destroys and recreates unhealthy replicas.
- [x] Inspect the Health Check results
docker inspect b671e3100133
"Health": {
"Status": "unhealthy",
"FailingStreak": 3,
"Log": [
{
"Start": "2018-07-18T14:40:18.941056152+08:00",
"End": "2018-07-18T14:40:19.027466281+08:00",
"ExitCode": 1,
"Output": "/bin/sh: 1: curl: not found\n"
},
{
"Start": "2018-07-18T14:40:49.027620925+08:00",
"End": "2018-07-18T14:40:49.076160261+08:00",
"ExitCode": 1,
"Output": "/bin/sh: 1: curl: not found\n"
},
{
"Start": "2018-07-18T14:41:19.076291897+08:00",
"End": "2018-07-18T14:41:19.124894642+08:00",
"ExitCode": 1,
"Output": "/bin/sh: 1: curl: not found\n"
}
]
}
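The unhealthy log above reveals the real failure: the official httpd image does not ship curl, so the probe itself cannot run. One fix is to bake curl and the check into a custom image. A sketch:

```dockerfile
FROM httpd:2.4
# httpd:2.4 is Debian-based; install curl so the health probe can execute
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
# Image-level default check; docker service create flags can still override it
HEALTHCHECK --interval=30s --timeout=30s --retries=3 \
    CMD curl --fail http://localhost:80/ || exit 1
```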