k8s Probes (Part 4)
Liveness probe
Kubernetes can use a liveness probe to check whether a container is still running, and a liveness probe can be specified individually for each container in a pod. Kubernetes executes the probe periodically; if the probe fails, it restarts the container.
Three mechanisms for probing a container
An HTTP GET probe performs an HTTP GET request against the container's IP address, on the port and path you specify.
If the probe receives a response, and the response status code does not represent an error (in other words, a 2xx or 3xx status code), the probe is considered successful.
If the server returns an error status code, or doesn't respond at all, the probe is considered failed and the container is restarted.
A TCP socket probe tries to open a TCP connection to the specified port of the container. If the connection is established successfully, the probe succeeds; otherwise, the container is restarted.
An exec probe executes an arbitrary command inside the container and checks the command's exit status code. If the status code is 0, the probe succeeds; any other status code is considered a failure.
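As a quick sketch of how the three mechanisms map onto a container spec (the names and values here are illustrative, not taken from the examples below; exactly one mechanism is chosen per probe):

```yaml
# Sketch: the three liveness probe mechanisms (illustrative names/values).
containers:
- name: my-app              # hypothetical container
  image: my-app:latest
  livenessProbe:
    httpGet:                # 1) HTTP GET against the container's IP
      path: /
      port: 8080
    # tcpSocket:            # 2) or: succeed if a TCP connection can be opened
    #   port: 8080
    # exec:                 # 3) or: succeed if the command exits with code 0
    #   command: ["ls", "/var/ready"]
```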
HTTP GET probe
Preparation
Modify app.js so that it returns error responses on several consecutive requests:
// app.js
const http = require('http');
const os = require('os');

console.log("Kubia server starting...");

var x = 0;
var tmp = 0;

var handler = function(request, response) {
  console.log("Received request from " + request.connection.remoteAddress);
  x++;
  tmp = x % 10;
  // Requests 7, 8, 9 and 10 of every 10 fail (tmp == 0 also matches, since 0 % 7 == 0),
  // which is more than enough consecutive failures to trip the liveness probe.
  if (tmp % 7 == 0 || tmp % 8 == 0 || tmp % 9 == 0) {
    response.writeHead(400);
    console.log("error 400");
    response.end();
  } else {
    response.writeHead(200); // writeHead may only be called once per response
    response.end("You've hit " + os.hostname() + "\n");
  }
};

var www = http.createServer(handler);
www.listen(8080);
Build the corresponding image from this app.js.
[root@node1 kubia-err]# docker images
REPOSITORY     TAG      IMAGE ID       CREATED       SIZE
kubia-err400   latest   f6427314b863   2 weeks ago   64.6MB
[root@node1 kubia-err]#
YAML definition
apiVersion: v1
kind: Pod
metadata:
  name: kubia-err400
spec:
  containers:
  - name: kubia-err400
    image: kubia-err400:latest
    imagePullPolicy: Never
    ports:
    - containerPort: 8080
    livenessProbe:             ## the liveness probe
      httpGet:
        path: /                ## path to request
        port: 8080             ## port to request
      initialDelaySeconds: 15  ## probing starts initialDelaySeconds after the container starts
Probes belong to the pod's containers, so they only need to be configured under template.spec. Here a ReplicaSet is used for the test, with the following YAML.
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: kubia-err400
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kubia-err400
  template:
    metadata:
      name: kubia-err400
      labels:
        app: kubia-err400
    spec:
      containers:
      - name: kubia-err400
        image: kubia-err400:latest
        imagePullPolicy: Never
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /
            port: 8080
          initialDelaySeconds: 15
Demo
[root@master1 kubeyaml]# vi kubia-err-liveness.yaml
[root@master1 kubeyaml]# kubectl apply -f kubia-err-liveness.yaml
replicaset.apps/kubia-err400 created
[root@master1 kubeyaml]# kubectl get rs kubia-err400
NAME           DESIRED   CURRENT   READY   AGE
kubia-err400   1         1         1       21m
[root@master1 kubeyaml]#
Check the rs status
[root@master1 kubeyaml]# kubectl get rs kubia-err400
NAME           DESIRED   CURRENT   READY   AGE
kubia-err400   1         1         0       21m
[root@master1 kubeyaml]# kubectl describe rs kubia-err400
Name: kubia-err400
Namespace: default
Selector: app=kubia-err400
Labels: <none>
Annotations: <none>
Replicas: 1 current / 1 desired
Pods Status: 1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: app=kubia-err400
Containers:
kubia-err400:
Image: kubia-err400:latest
Port: 8080/TCP
Host Port: 0/TCP
Liveness: http-get http://:8080/ delay=15s timeout=1s period=10s #success=1 #failure=3
...
Besides the explicitly specified liveness probe options, you can see additional attributes such as delay, timeout, and period. delay=15s means probing starts 15 s after the container starts. timeout is only 1 second, so the container must respond within 1 second, otherwise the probe counts as failed. The container is probed every 10 seconds (period=10s) and is restarted after three consecutive probe failures (#failure=3).
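All of these can be set explicitly in the probe spec. A sketch using the standard Kubernetes probe fields, with the same values that kubectl describe reported above:

```yaml
# Sketch: tuning the probe parameters shown by `kubectl describe`.
livenessProbe:
  httpGet:
    path: /
    port: 8080
  initialDelaySeconds: 15  # delay:    start probing 15 s after container start
  timeoutSeconds: 1        # timeout:  the response must arrive within 1 s
  periodSeconds: 10        # period:   probe every 10 s
  successThreshold: 1      # #success: one success resets the failure count
  failureThreshold: 3      # #failure: restart after 3 consecutive failures
```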
Check the pods again after a while
## The container has already been restarted 8 times, because our app.js fails several requests in a row
[root@master1 kubeyaml]# kubectl get po
NAME                 READY   STATUS    RESTARTS   AGE
kubia-err400-q4sr9   1/1     Running   8          26m
[root@master1 kubeyaml]#
Check why the container failed
[root@master1 kubeyaml]# kubectl describe po kubia-err400-q4sr9
Name: kubia-err400-q4sr9
Namespace: default
Priority: 0
Node: node1.wt.com/192.168.2.15
Start Time: Mon, 05 Jul 2021 13:59:20 +0800
Labels: app=kubia-err400
Annotations: cni.projectcalico.org/podIP: 100.109.35.224/32
Status: Running
IP: 100.109.35.224
IPs:
IP: 100.109.35.224
Controlled By: ReplicaSet/kubia-err400
Containers:
kubia-err400:
Container ID: docker://ee51eba377b5d55f3e060526385c5cdd6f40645f3b68444ad9af384d8ea54480
Image: kubia-err400:latest
Image ID: docker://sha256:f6427314b863d764356d554660b6fcd0febc9de608b2498cd7e547cff9c90777
Port: 8080/TCP
Host Port: 0/TCP
State: Running
Started: Mon, 05 Jul 2021 14:26:41 +0800
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Mon, 05 Jul 2021 14:24:27 +0800
Finished: Mon, 05 Jul 2021 14:26:41 +0800
Ready: True
Restart Count: 9
Liveness: http-get http://:8080/ delay=15s timeout=1s period=10s #success=1 #failure=3
...
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
...
[root@master1 kubeyaml]#
From the Last State section onward you can see that the last termination had Reason=Error and Exit Code=137. An exit code of 128+x means the process was killed by signal x; here 137 = 128 + 9, i.e. SIGKILL. The signal is 9 because the container process itself was still running fine, so once the probe detected the failure, Kubernetes had to kill the process outright (the equivalent of kill -9) in order to restart the container.
Readiness probe
A pod may need time to load configuration or data, or to run a warm-up procedure so that the first user request isn't so slow that it hurts the user experience. In such cases you don't want the pod to start receiving requests immediately, especially when instances that are already running can handle requests quickly and correctly. Requests should not be forwarded to a pod that is still starting up until it is fully ready.
What "ready" means is obviously specific to each container. Kubernetes can only check whether the application running in the container responds to a simple GET request, or it can hit a specific URL path that causes the application to perform a series of checks to determine whether it is ready.
Readiness probe types
Like liveness probes, readiness probes come in three types:
An exec probe, which executes a process; the container's state is determined by the process's exit status code.
An HTTP GET probe, which sends an HTTP GET request to the container and uses the response's HTTP status code to decide whether the container is ready.
A TCP socket probe, which opens a TCP connection to a specified port of the container; if the connection is established, the container is considered ready.
Introduction
When a container starts, Kubernetes can be configured to wait for a set time before performing the first readiness check. After that, it invokes the probe periodically and acts based on the result: if a pod reports that it is not ready, it is removed from the service; if the pod becomes ready again, it is added back.
Unlike with liveness probes, a container that fails a readiness check is not terminated or restarted. This is the important difference between the two: liveness probes keep pods healthy by killing abnormal containers and replacing them with new, working ones, while readiness probes make sure that only pods ready to handle requests actually receive them.
If a container's readiness probe fails, the pod is removed from the service's endpoints.
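Since the two probes serve different purposes, a container often carries both. A minimal sketch (the /healthz path and the names here are hypothetical, not from the examples in this post):

```yaml
# Sketch: one container with both probe types (hypothetical names/path).
containers:
- name: my-app
  image: my-app:latest
  livenessProbe:           # failure here => the container is killed and restarted
    httpGet:
      path: /healthz
      port: 8080
  readinessProbe:          # failure here => the pod is only removed from endpoints
    httpGet:
      path: /healthz
      port: 8080
```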

Exec probe
YAML definition
apiVersion: v1
kind: Pod
metadata:
  name: kubia-err400
spec:
  containers:
  - name: kubia-err400
    image: kubia-err400:latest
    imagePullPolicy: Never
    ports:
    - containerPort: 8080
    readinessProbe:  ## each container can have its own readiness probe
      exec:
        command:     ## check readiness by running a command, which is easy to control in a test
        - ls
        - /var/ready
The readiness probe will periodically execute ls /var/ready inside the container. ls returns exit code 0 if the file exists and a non-zero code otherwise, so the probe succeeds when the file exists and fails when it doesn't.
Test
Again testing through an rs, with the following YAML:
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: kubia-readiness
spec:
  replicas: 3
  selector:
    matchLabels:
      app: kubia-readiness
  template:
    metadata:
      name: kubia-readiness
      labels:
        app: kubia-readiness
    spec:
      containers:
      - name: kubia-readiness
        image: kubia:latest
        imagePullPolicy: Never
        ports:
        - containerPort: 8080
        readinessProbe:
          exec:
            command:
            - ls
            - /var/ready
Check the pods: /var/ready doesn't exist yet, so the probe fails and all pods are in the NotReady state (0/1).
[root@master1 kubeyaml]# kubectl apply -f kubia-readiness.yaml
replicaset.apps/kubia-readiness created
[root@master1 kubeyaml]# kubectl get po
NAME                    READY   STATUS    RESTARTS   AGE
kubia-readiness-d7nfm   0/1     Running   0          29s
kubia-readiness-nmnbx   0/1     Running   0          5s
kubia-readiness-zf5gc   0/1     Running   0          5s
[root@master1 kubeyaml]#
Create a service for the pods
apiVersion: v1
kind: Service
metadata:
  name: kubia-readiness
spec:
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: kubia-readiness
Check the endpoints
[root@master1 kubeyaml]# kubectl apply -f kubia-service-readiness.yaml
service/kubia-readiness created
[root@master1 kubeyaml]# kubectl get svc
NAME              TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
kubernetes        ClusterIP   10.96.0.1      <none>        443/TCP   14d
kubia-readiness   ClusterIP   10.110.17.61   <none>        80/TCP    9s
[root@master1 kubeyaml]# kubectl get endpoints
NAME              ENDPOINTS           AGE
kubernetes        192.168.2.14:6443   14d
kubia-readiness                       22s
[root@master1 kubeyaml]#
As you can see, because none of the pods is ready, the endpoints list contains no IPs at all.
Create /var/ready in the first container, then wait about 10 seconds (by default the probe runs every 10 seconds):
[root@master1 kubeyaml]# kubectl exec kubia-readiness-d7nfm -- touch /var/ready
[root@master1 kubeyaml]# kubectl get po
NAME                    READY   STATUS    RESTARTS   AGE
kubia-readiness-d7nfm   1/1     Running   0          12m
kubia-readiness-nmnbx   0/1     Running   0          11m
kubia-readiness-zf5gc   0/1     Running   0          11m
[root@master1 kubeyaml]# kubectl get endpoints
NAME              ENDPOINTS             AGE
kubernetes        192.168.2.14:6443     14d
kubia-readiness   100.109.35.244:8080   7m54s
[root@master1 kubeyaml]# kubectl get svc
NAME              TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
kubernetes        ClusterIP   10.96.0.1      <none>        443/TCP   14d
kubia-readiness   ClusterIP   10.110.17.61   <none>        80/TCP    8m59s
[root@master1 kubeyaml]# curl 10.110.17.61
You've hit kubia-readiness-d7nfm
[root@master1 kubeyaml]# curl 10.110.17.61
You've hit kubia-readiness-d7nfm
[root@master1 kubeyaml]#
The first container is now ready, its pod IP has been added to the endpoints, and access works normally; only this one pod answers the requests.
A pod that is not ready
The pods have in fact started; they show NotReady (0/1) only because the readiness probe keeps failing. Let's try accessing one of the not-ready containers directly.
[root@master1 kubeyaml]# kubectl get po
NAME                    READY   STATUS    RESTARTS   AGE
kubia-err400-q4sr9      1/1     Running   28         120m
kubia-m982c             1/1     Running   3          10d
kubia-n6sgn             1/1     Running   3          10d
kubia-readiness-d7nfm   1/1     Running   0          18m
kubia-readiness-nmnbx   0/1     Running   0          18m
kubia-readiness-zf5gc   0/1     Running   0          18m
kubia-sx7wp             1/1     Running   3          10d
[root@master1 kubeyaml]# kubectl describe po kubia-readiness-nmnbx
Name: kubia-readiness-nmnbx
Namespace: default
Priority: 0
Node: node1.wt.com/192.168.2.15
Start Time: Mon, 05 Jul 2021 15:41:30 +0800
Labels: app=kubia-readiness
Annotations: cni.projectcalico.org/podIP: 100.109.35.235/32
Status: Running
IP: 100.109.35.235
IPs:
IP: 100.109.35.235
Controlled By: ReplicaSet/kubia-readiness
...
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
...
Normal Created 18m kubelet, node1.wt.com Created container kubia-readiness
Normal Started 18m kubelet, node1.wt.com Started container kubia-readiness
Warning Unhealthy 3m41s (x91 over 18m) kubelet, node1.wt.com Readiness probe failed: ls: /var/ready: No such file or directory
[root@master1 kubeyaml]# curl 100.109.35.235:8080
You've hit kubia-readiness-nmnbx
[root@master1 kubeyaml]#
As you can see, the container is actually running; the probe simply never reports it ready. A readiness probe does not kill the container (this is the difference from a liveness probe), and curling the pod's IP directly still returns a normal response.
The relevant part of the description shows the not-ready state:
Warning Unhealthy 3m41s (x91 over 18m) kubelet, node1.wt.com Readiness probe failed: ls: /var/ready: No such file or directory