References:
- https://zhuanlan.zhihu.com/p/32328591
- https://juejin.im/post/5d397288f265da1bce3e1585
- http://www.yfshare.vip/2019/01/28/k8s%E9%9B%86%E7%BE%A4%E6%B0%B4%E5%B9%B3%E6%89%A9%E5%B1%95-HPA/
Introduction
Horizontal Pod Autoscaling (HPA) is the Kubernetes feature that scales the number of Pod replicas horizontally and automatically.
A K8S cluster can grow or shrink a service through the Replication Controller's scale mechanism, giving services elasticity.
K8S scaling comes in two flavors:
- Manual scaling with scale; see the k8s rolling upgrade (RollingUpdate) notes.
- Automatic scaling with autoscale; see HPA.
Automatic scaling itself falls into two categories:
- Horizontal scaling (scale out): changing the number of instances.
- Vertical scaling (scale up): changing the resources a single instance may use, e.g. more CPU or more memory.
HPA is the former: it adjusts the Pod count automatically based on CPU utilization or custom application metrics (Replication Controllers, Deployments and ReplicaSets are supported).
Two ways to obtain metrics:
- Heapster: heapster provides the metrics service, but in v1 (autoscaling/v1) only CPU is supported as a scaling metric. Other metrics such as memory, network traffic and QPS are still in beta (autoscaling/v2beta1).
- Custom: also in beta (autoscaling/v2beta1). This route requires developing a custom REST API, which adds complexity, and when the data comes from custom monitoring only absolute values can be set, not utilization percentages.
Workflow
- 創(chuàng)建HPA資源,設(shè)定目標CPU使用率限額,以及最大/最小實例數(shù),一定要設(shè)置Pod的資源限制參數(shù): request,否則HPA不會工作。
- 控制管理器每隔30s(在kube-controller-manager.service中可以通過
–-horizontal-pod-autoscaler-sync-period修改)查詢metrics的資源使用情況。 - 然后與創(chuàng)建時設(shè)定的值和指標做對比(平均值之和/限額),求出目標調(diào)整的實例個數(shù)。
- 目標調(diào)整的實例數(shù)不能超過第一條中設(shè)定的最大/最小實例數(shù)。如果沒有超過,則擴容;超過,則擴容至最大的實例個數(shù)。
- 重復第2-4步。
Autoscaling algorithm
The HPA Controller adjusts the replica count so that CPU utilization approaches the target value; it does not match it exactly. Upstream also accounts for the fact that a scaling decision takes time to have an effect: for example, while a new Pod is being created under heavy CPU load, overall CPU usage may still be climbing. So after each decision, no further scaling decisions are made for a while: 3 minutes for scale-out, 5 minutes for scale-in (tunable via `--horizontal-pod-autoscaler-downscale-delay` and `--horizontal-pod-autoscaler-upscale-delay`).
- The HPA Controller has a notion of tolerance, which absorbs a certain amount of usage jitter; the default is 0.1, again for cluster stability. For example, with a policy of "scale out above 50% CPU utilization", scaling only triggers when usage rises above 55% or falls below 45%; within that band the HPA keeps the Pods' utilization where it is.
- The formula for how many Pods each scale operation targets is: ceil((currentUtilization / targetUtilization) × currentReplicas).
- A single scale-out will never more than double the current replica count.
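The tolerance band and the doubling cap combine as sketched below (again a simplified model, not the controller source; where exactly the real controller applies the tolerance is glossed over here and treated as a check on the usage/target ratio, per the 0.1 default described above):

```python
import math

TOLERANCE = 0.1  # default jitter band of the HPA controller

def rescale(current_replicas, current_utilization, target_utilization):
    ratio = current_utilization / target_utilization
    # Within the tolerance band (ratio 0.9 .. 1.1) the HPA does nothing:
    # a 50% target only reacts below 45% or above 55% actual usage.
    if abs(ratio - 1.0) <= TOLERANCE:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # A single scale-out never more than doubles the replica count.
    return min(desired, current_replicas * 2)

print(rescale(3, 52, 50))   # 3: within tolerance, no change
print(rescale(3, 70, 50))   # 5
print(rescale(2, 850, 80))  # 4: raw result is 22, capped at 2x current
```

The capped case mirrors the v2beta1 load test later in this document, where usage of 850%/80% moved REPLICAS from 2 to 4 and then 8 rather than jumping straight to the maximum.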
The three HPA apiVersions
- autoscaling/v1 supports only CPU as a scaling metric.
- autoscaling/v2beta1 adds support for Resource Metrics and Custom Metrics.
- autoscaling/v2beta2 additionally adds support for External Metrics.
Basic principle
假設(shè)存在一個叫 A 的 Deployment,包含3個 Pod,每個副本的 Request 值是 1 核,當前 3 個 Pod 的 CPU 利用率分別是 60%、70% 與 80%,此時我們設(shè)置 HPA閾值為 50%,最小副本為 3,最大副本為 10。接下來我們將上述的數(shù)據(jù)帶入公式中:
- 總的 Pod 的利用率是 60%+70%+80% = 210%
- 當前的 Target 是 3
- 算式的結(jié)果是 70%,大于50%閾值,因此當前的 Target 數(shù)目過小,需要進行擴容
- 重新設(shè)置 Target 值為 5,此時算式的結(jié)果為 42% 低于 50%,判斷還需要擴容兩個容器
- 此時 HPA 設(shè)置 Replicas 為 5,進行 Pod 的水平擴容。
經(jīng)過上面的推演,可以協(xié)助開發(fā)者快速理解 HPA 最核心的原理,不過上面的推演結(jié)果和實際情況下是有所出入的,如果開發(fā)者進行試驗的話,會發(fā)現(xiàn) Replicas 最終的結(jié)果是 6 而不是 5。這是由于 HPA 中一些細節(jié)的處理導致的,主要包含如下三個主要的方面:
Noise handling
From the formula above, the Target count strongly influences the result, and Kubernetes prefers Recreate over Restart when handling changes and upgrades. So during a Deployment's lifetime there can be moments when Target is inflated by Pods that are Starting or Stopping, injecting heavy noise into the HPA's calculation. For that reason, when the HPA Controller finds Starting or Stopping Pods on the current object, it skips the calculation cycle entirely and waits until all Pods are Running before computing again.
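The skip-while-unstable rule amounts to a simple gate (an illustrative model only; real Pod phases also include Succeeded, Failed and Unknown, and the actual controller's readiness handling is more involved):

```python
def stable_for_hpa(pods):
    """Return True only when every Pod is Running; the HPA skips its
    calculation cycle otherwise, so starting/stopping Pods cannot
    inflate the Target and distort the result."""
    return all(pod["phase"] == "Running" for pod in pods)

print(stable_for_hpa([{"name": "a", "phase": "Running"},
                      {"name": "b", "phase": "Pending"}]))   # False: skip this cycle
print(stable_for_hpa([{"name": "a", "phase": "Running"},
                      {"name": "b", "phase": "Running"}]))   # True: safe to compute
```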
Cooldown period
Cooldown periods are an unavoidable topic in elastic scaling. We usually want fast scale-out and fast reclamation, yet we also do not want the cluster to thrash, so the right length for a scaling cooldown is perpetually contested by developers. In the HPA, the default scale-out cooldown is 3 minutes and the scale-in cooldown is 5 minutes.
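A minimal model of the cooldown gate, with the two defaults wired in (the function and timestamps are illustrative; the real controller tracks a history of scaling events rather than a single last-scale time):

```python
UPSCALE_COOLDOWN = 3 * 60    # seconds; --horizontal-pod-autoscaler-upscale-delay
DOWNSCALE_COOLDOWN = 5 * 60  # seconds; --horizontal-pod-autoscaler-downscale-delay

def may_scale(now, last_scale_time, scaling_up):
    """Gate a scaling decision behind the per-direction cooldown."""
    cooldown = UPSCALE_COOLDOWN if scaling_up else DOWNSCALE_COOLDOWN
    return now - last_scale_time >= cooldown

print(may_scale(now=100, last_scale_time=0, scaling_up=True))   # False: 3 min not elapsed
print(may_scale(now=200, last_scale_time=0, scaling_up=True))   # True
print(may_scale(now=200, last_scale_time=0, scaling_up=False))  # False: scale-in waits 5 min
```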
Boundary calculation
Back to the formula: the first pass said to scale out to 5 containers, leaving the overall load at 42% afterwards. But we glossed over something: doesn't a brand-new Pod consume some resources just by starting up? And is an 8% buffer really enough to absorb the load, given that after one scale-out the next one must wait at least 3 minutes? To address this, the HPA introduces a boundary value △ and currently adds a 10% buffer when evaluating the boundary condition, which is why the example above actually ends at 6.
Pre-test preparation
環(huán)境說明
| IP | Role |
|---|---|
| 192.168.1.155 | master |
| 192.168.1.156 | node01 |
| 192.168.1.157 | node02 |
$ kubectl get hpa
No resources found.
[root@master rbtest]# kubectl cluster-info
Kubernetes master is running at https://192.168.1.155:6443
KubeDNS is running at https://192.168.1.155:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
$ kubectl get componentstatuses
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health": "true"}
wget http://download.baiyongjie.com/kubernetes/yaml/heapster-1.5.3.tar.gz
tar zxf heapster-1.5.3.tar.gz
cd heapster
kubectl apply -f .
kubectl apply -f influxdb/influxdb.yaml
kubectl apply -f influxdb/heapster.yaml
# check that the pods started
$ kubectl get pods -n kube-system |grep -E 'heap|influxdb'
heapster-5478bf8664-qttvz 1/1 Running 0 23s
monitoring-influxdb-c5c9dfd5d-tqxtm 1/1 Running 0 27s
Verify that heapster works
# kubectl top nodes node02
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
node02 62m 0% 792Mi 21%
# kubectl top pod -n kube-system heapster-5478bf8664-qttvz
NAME CPU(cores) MEMORY(bytes)
heapster-5478bf8664-qttvz 2m 39Mi
Testing HPA (apiVersion: autoscaling/v1)
Deploy a Deployment for testing
$ vim hpatest.yaml
apiVersion: apps/v1beta2
kind: Deployment
metadata:
name: hpatest
spec:
replicas: 2
selector:
matchLabels:
app: hpatest
template:
metadata:
labels:
app: hpatest
spec:
containers:
- name: hpatest
image: nginx:1.10
imagePullPolicy: IfNotPresent
command: ["/bin/sh"]
args: ["-c","/usr/sbin/nginx; while true;do echo `hostname -I` > /usr/share/nginx/html/index.html; sleep 120;done"]
ports:
- name: http
containerPort: 80
resources:
requests:
cpu: 1m
memory: 100Mi
limits:
cpu: 3m
memory: 400Mi
---
apiVersion: v1
kind: Service
metadata:
name: hpatest-svc
spec:
selector:
app: hpatest
ports:
- port: 80
targetPort: 80
protocol: TCP
# 創(chuàng)建svc和deploy
$ kubectl apply -f hpatest.yaml
# check that they started
$ kubectl get pods,svc | grep hpatest
pod/hpatest-5fb79d5cd-9w2kv 1/1 Running 0 9s
pod/hpatest-5fb79d5cd-k4pb8 1/1 Running 0 16s
service/hpatest-svc ClusterIP 10.99.75.184 <none> 80/TCP 42s
Adjust the kube-controller-manager flags
Since this cluster was installed with kubeadm, edit /etc/kubernetes/manifests/kube-controller-manager.yaml:
spec:
containers:
- command:
......
- --horizontal-pod-autoscaler-use-rest-clients=false # added
image: registry.aliyuncs.com/google_containers/kube-controller-manager:v1.13.4
==If the cluster was installed from binaries, edit the /etc/systemd/system/kube-controller-manager.service unit file instead.==
創(chuàng)建hpa資源文件
$ vim hpatest-hpa.yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpatest-nginx
spec:
scaleTargetRef:
apiVersion: extensions/v1beta1
kind: Deployment
    name: hpatest
minReplicas: 2
maxReplicas: 6
targetCPUUtilizationPercentage: 50
$ kubectl apply -f hpatest-hpa.yaml
# check whether the Pod CPU metric has been picked up
# kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
hpatest-nginx Deployment/hpatest 0%/50% 2 6 2 2m
Simulate load and check that it scales automatically
Load-test script
$ vim hpatest.sh
while true
do
wget -q -O- http://10.99.75.184
done
$ sh hpatest.sh
關(guān)注壓測過程中pod副本數(shù)變化
開始壓測前監(jiān)控pod狀態(tài),查看是可以做到自動擴容
$ kubectl get pods -o wide -w
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
hpatest-5fb79d5cd-9w2kv 1/1 Running 0 46m 10.244.1.121 node01 <none> <none>
hpatest-5fb79d5cd-k4pb8 1/1 Running 0 46m 10.244.0.201 master <none> <none>
--- the new Pods appearing below ---
hpatest-5fb79d5cd-w7q8z 0/1 Pending 0 0s <none> <none> <none> <none>
hpatest-5fb79d5cd-wcpf4 0/1 Pending 0 0s <none> <none> <none> <none>
hpatest-5fb79d5cd-w7q8z 0/1 Pending 0 0s <none> node02 <none> <none>
hpatest-5fb79d5cd-wcpf4 0/1 Pending 0 0s <none> node01 <none> <none>
hpatest-5fb79d5cd-w7q8z 0/1 ContainerCreating 0 0s <none> node02 <none> <none>
hpatest-5fb79d5cd-wcpf4 0/1 ContainerCreating 0 0s <none> node01 <none> <none>
hpatest-5fb79d5cd-w7q8z 1/1 Running 0 6s 10.244.2.156 node02 <none> <none>
hpatest-5fb79d5cd-wcpf4 1/1 Running 0 7s 10.244.1.122 node01 <none> <none>
hpatest-5fb79d5cd-4vgpb 0/1 Pending 0 0s <none> <none> <none> <none>
hpatest-5fb79d5cd-4zp2q 0/1 Pending 0 0s <none> <none> <none> <none>
hpatest-5fb79d5cd-4vgpb 0/1 Pending 0 0s <none> node02 <none> <none>
hpatest-5fb79d5cd-4zp2q 0/1 Pending 0 0s <none> master <none> <none>
hpatest-5fb79d5cd-4vgpb 0/1 ContainerCreating 0 0s <none> node02 <none> <none>
hpatest-5fb79d5cd-4zp2q 0/1 ContainerCreating 0 0s <none> master <none> <none>
hpatest-5fb79d5cd-4zp2q 1/1 Running 0 6s 10.244.0.202 master <none> <none>
hpatest-5fb79d5cd-4vgpb 1/1 Running 0 6s 10.244.2.157 node02 <none> <none>
停止壓測后查看pod能否在自動伸縮
# kubectl get pods -o wide -w
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
hpatest-5fb79d5cd-4vgpb 1/1 Running 0 2m4s 10.244.2.157 node02 <none> <none>
hpatest-5fb79d5cd-4zp2q 1/1 Running 0 2m4s 10.244.0.202 master <none> <none>
hpatest-5fb79d5cd-9w2kv 1/1 Running 0 50m 10.244.1.121 node01 <none> <none>
hpatest-5fb79d5cd-k4pb8 1/1 Running 0 50m 10.244.0.201 master <none> <none>
hpatest-5fb79d5cd-w7q8z 1/1 Running 0 2m19s 10.244.2.156 node02 <none> <none>
hpatest-5fb79d5cd-wcpf4 1/1 Running 0 2m19s 10.244.1.122 node01 <none> <none>
--- deletion has begun ---
hpatest-5fb79d5cd-w7q8z 1/1 Terminating 0 6m45s 10.244.2.156 node02 <none> <none>
hpatest-5fb79d5cd-wcpf4 1/1 Terminating 0 6m45s 10.244.1.122 node01 <none> <none>
hpatest-5fb79d5cd-4vgpb 1/1 Terminating 0 6m30s 10.244.2.157 node02 <none> <none>
hpatest-5fb79d5cd-4zp2q 1/1 Terminating 0 6m30s 10.244.0.202 master <none> <none>
Inspect the events produced by the autoscaling
$ kubectl describe hpa hpatest-nginx
.....
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulRescale 21m horizontal-pod-autoscaler New size: 4; reason: cpu resource utilization (percentage of request) above target
Normal SuccessfulRescale 21m horizontal-pod-autoscaler New size: 6; reason: cpu resource utilization (percentage of request) above target
Normal SuccessfulRescale 14m horizontal-pod-autoscaler New size: 2; reason: All metrics below target
Testing HPA (autoscaling/v2beta1)
Delete the autoscaling/v1 HPA
$ kubectl delete horizontalpodautoscalers.autoscaling hpatest-nginx
horizontalpodautoscaler.autoscaling "hpatest-nginx" deleted
創(chuàng)建autoscaling/v2beta1的HPA
$ vim hpatest-hpa-v2beta1.yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
name: hpav2beta1
spec:
scaleTargetRef:
apiVersion: extensions/v1beta1
kind: Deployment
name: hpatest
minReplicas: 2
maxReplicas: 8
metrics:
- type: Resource
resource:
name: cpu
targetAverageUtilization: 80
- type: Resource
resource:
name: memory
targetAverageValue: 200Mi
$ kubectl apply -f hpatest-hpa-v2beta1.yaml
horizontalpodautoscaler.autoscaling/hpav2beta1 created
$ kubectl get hpa hpav2beta1
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
hpav2beta1 Deployment/hpatest 4%/70%, 0%/80% 2 8 2 14m
Simulate load and check that it scales automatically
Get the svc's IP address
$ kubectl get svc|grep hpatest
hpatest-svc ClusterIP 10.99.75.184 <none> 80/TCP 4h32m
$ vim hpatest.sh
while true
do
wget -q -O- http://10.99.75.184
done
$ sh hpatest.sh
以下監(jiān)控同時進行 不分先后順序
$ kubectl get pods -w
NAME READY STATUS RESTARTS AGE
hpatest-59fc9f47b6-fmnm6 1/1 Running 0 36m
hpatest-59fc9f47b6-vmgz7 1/1 Running 0 36m
--- the new Pods appearing below ---
hpatest-59fc9f47b6-xhvv4 0/1 Pending 0 0s
hpatest-59fc9f47b6-k5gmz 0/1 Pending 0 0s
hpatest-59fc9f47b6-xhvv4 0/1 Pending 0 0s
hpatest-59fc9f47b6-k5gmz 0/1 Pending 0 0s
hpatest-59fc9f47b6-xhvv4 0/1 ContainerCreating 0 0s
hpatest-59fc9f47b6-k5gmz 0/1 ContainerCreating 0 0s
hpatest-59fc9f47b6-k5gmz 1/1 Running 0 7s
hpatest-59fc9f47b6-xhvv4 1/1 Running 0 8s
hpatest-59fc9f47b6-lx7dv 0/1 Pending 0 0s
hpatest-59fc9f47b6-mdjbl 0/1 Pending 0 0s
hpatest-59fc9f47b6-lx7dv 0/1 Pending 0 0s
hpatest-59fc9f47b6-dj6dj 0/1 Pending 0 0s
hpatest-59fc9f47b6-zncgh 0/1 Pending 0 0s
hpatest-59fc9f47b6-mdjbl 0/1 Pending 0 0s
hpatest-59fc9f47b6-dj6dj 0/1 Pending 0 0s
hpatest-59fc9f47b6-zncgh 0/1 Pending 0 0s
hpatest-59fc9f47b6-mdjbl 0/1 ContainerCreating 0 0s
hpatest-59fc9f47b6-zncgh 0/1 ContainerCreating 0 0s
hpatest-59fc9f47b6-lx7dv 0/1 ContainerCreating 0 0s
hpatest-59fc9f47b6-dj6dj 0/1 ContainerCreating 0 0s
hpatest-59fc9f47b6-mdjbl 1/1 Running 0 7s
hpatest-59fc9f47b6-dj6dj 1/1 Running 0 7s
hpatest-59fc9f47b6-lx7dv 1/1 Running 0 7s
hpatest-59fc9f47b6-zncgh 1/1 Running 0 9s
$ kubectl get hpa -o wide -w
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
hpav2beta1 Deployment/hpatest 4%/70%, 50%/80% 2 8 2 20m
hpav2beta1 Deployment/hpatest 4%/70%, 850%/80% 2 8 2 21m
hpav2beta1 Deployment/hpatest 4%/70%, 850%/80% 2 8 4 21m
hpav2beta1 Deployment/hpatest 4%/70%, 850%/80% 2 8 8 21m
hpav2beta1 Deployment/hpatest 4%/70%, 750%/80% 2 8 8 22m
hpav2beta1 Deployment/hpatest 4%/70%, 587%/80% 2 8 8 23m
Watches after stopping the load test (run in parallel; order not significant)
$ kubectl get pods -w
NAME READY STATUS RESTARTS AGE
hpatest-59fc9f47b6-dj6dj 1/1 Running 0 2m42s
hpatest-59fc9f47b6-fmnm6 1/1 Running 0 41m
hpatest-59fc9f47b6-k5gmz 1/1 Running 0 2m58s
hpatest-59fc9f47b6-lx7dv 1/1 Running 0 2m42s
hpatest-59fc9f47b6-mdjbl 1/1 Running 0 2m42s
hpatest-59fc9f47b6-vmgz7 1/1 Running 0 41m
hpatest-59fc9f47b6-xhvv4 1/1 Running 0 2m58s
hpatest-59fc9f47b6-zncgh 1/1 Running 0 2m42s
--- deletion has begun ---
hpatest-59fc9f47b6-zncgh 1/1 Terminating 0 8m16s
hpatest-59fc9f47b6-mdjbl 1/1 Terminating 0 8m16s
hpatest-59fc9f47b6-xhvv4 1/1 Terminating 0 8m32s
hpatest-59fc9f47b6-k5gmz 1/1 Terminating 0 8m32s
hpatest-59fc9f47b6-xhvv4 0/1 Terminating 0 9m4s
hpatest-59fc9f47b6-mdjbl 0/1 Terminating 0 8m48s
hpatest-59fc9f47b6-zncgh 0/1 Terminating 0 8m48s
hpatest-59fc9f47b6-zncgh 0/1 Terminating 0 8m48s
hpatest-59fc9f47b6-mdjbl 0/1 Terminating 0 8m49s
hpatest-59fc9f47b6-mdjbl 0/1 Terminating 0 8m49s
hpatest-59fc9f47b6-xhvv4 0/1 Terminating 0 9m5s
hpatest-59fc9f47b6-xhvv4 0/1 Terminating 0 9m5s
hpatest-59fc9f47b6-k5gmz 0/1 Terminating 0 9m5s
hpatest-59fc9f47b6-k5gmz 0/1 Terminating 0 9m10s
hpatest-59fc9f47b6-k5gmz 0/1 Terminating 0 9m10s
hpatest-59fc9f47b6-zncgh 0/1 Terminating 0 8m54s
hpatest-59fc9f47b6-zncgh 0/1 Terminating 0 8m54s
$ kubectl get hpa -o wide -w
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
hpav2beta1 Deployment/hpatest 4%/70%, 50%/80% 2 8 2 20m
hpav2beta1 Deployment/hpatest 4%/70%, 850%/80% 2 8 2 21m
hpav2beta1 Deployment/hpatest 4%/70%, 850%/80% 2 8 4 21m
hpav2beta1 Deployment/hpatest 4%/70%, 850%/80% 2 8 8 21m
hpav2beta1 Deployment/hpatest 4%/70%, 750%/80% 2 8 8 22m
hpav2beta1 Deployment/hpatest 4%/70%, 587%/80% 2 8 8 23m
--- watch the REPLICAS column: scale-in only starts after the 5-minute cooldown ---
hpav2beta1 Deployment/hpatest 4%/70%, 387%/80% 2 8 8 24m
hpav2beta1 Deployment/hpatest 4%/70%, 0%/80% 2 8 8 25m
hpav2beta1 Deployment/hpatest 4%/70%, 37%/80% 2 8 8 26m
hpav2beta1 Deployment/hpatest 4%/70%, 0%/80% 2 8 8 27m
hpav2beta1 Deployment/hpatest 4%/70%, 0%/80% 2 8 8 28m
hpav2beta1 Deployment/hpatest 4%/70%, 0%/80% 2 8 8 29m
hpav2beta1 Deployment/hpatest 5%/70%, 0%/80% 2 8 4 30m
hpav2beta1 Deployment/hpatest 4%/70%, 0%/80% 2 8 4 31m
hpav2beta1 Deployment/hpatest 4%/70%, 0%/80% 2 8 2 32m