Periodically Cleaning Up Evicted Pods in a K8S Cluster

Symptom

The k8s cluster contains pods in the Evicted state that were never cleaned up:

# kubectl get pod -o wide -A | grep Evicted
simulation-prod      cloud-simulation-dead-letter-worker-d96bdcf98-dxt7h               0/1     Evicted       0          42d     <none>          cn-shanghai.172.22.0.194   <none>           <none>
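
A plain grep matches "Evicted" anywhere in the line. As a minimal sketch, the hypothetical helper below (filter_evicted is not part of the original post) matches only on the STATUS column, assuming the default column layout of `kubectl get pod -A`. The second sample row is made up for illustration.

```shell
#!/bin/sh
# Print "namespace/name" for every row whose STATUS column is exactly "Evicted".
# Assumes the column order NAMESPACE NAME READY STATUS ... of `kubectl get pod -A`.
filter_evicted() {
  awk '$4 == "Evicted" { print $1 "/" $2 }'
}

# First row: taken from the listing above. Second row: a made-up Running pod.
printf '%s\n' \
  'simulation-prod cloud-simulation-dead-letter-worker-d96bdcf98-dxt7h 0/1 Evicted 0 42d' \
  'simulation-prod hypothetical-running-pod-abc12 1/1 Running 0 3d' |
  filter_evicted
```

Against a live cluster the same helper could be fed with `kubectl get pod -A --no-headers | filter_evicted`.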

Investigation

Describing the pod shows Status: Failed and Reason: Evicted. The Message field shows why it was evicted: the node ran low on disk resources (ephemeral-storage).

# kubectl -n simulation-prod describe pod cloud-simulation-dead-letter-worker-d96bdcf98-dxt7h
Name:           cloud-simulation-dead-letter-worker-d96bdcf98-dxt7h
Namespace:      simulation-prod
Priority:       0
Node:           cn-shanghai.172.22.0.194/
Start Time:     Mon, 29 Nov 2021 15:48:25 +0800
Labels:         app.kubernetes.io/instance=cloud-simulation-dead-letter-worker
                app.kubernetes.io/name=cloud-simulation-dead-letter-worker
                pod-template-hash=d96bdcf98
Annotations:    kubernetes.io/psp: ack.privileged
Status:         Failed
Reason:         Evicted
Message:        The node was low on resource: ephemeral-storage. Container cloud-simulation-dead-letter-worker was using 291599484Ki, which exceeds its request of 0. 
IP:             
IPs:            <none>
Controlled By:  ReplicaSet/cloud-simulation-dead-letter-worker-d96bdcf98
Containers:
  cloud-simulation-dead-letter-worker:
    Image:      registry-vpc.cn-shanghai.aliyuncs.com/xxx/cloud_sim:1.1.2111290718.f0cfa04
    Port:       <none>
    Host Port:  <none>
    Command:
      /root/entry/dead_letter_worker.py
    Environment:
      DEPLOYMENT:  prod
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from cloud-simulation-dead-letter-worker-token-4z2xv (ro)
Volumes:
  cloud-simulation-dead-letter-worker-token-4z2xv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cloud-simulation-dead-letter-worker-token-4z2xv
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-type=simulation-prod
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>
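
The Message field reports usage in Ki, which is hard to judge at a glance. A small sketch of the conversion to GiB follows. The helper name ki_to_gib is hypothetical, and the only input is the figure from the Message above.

```shell
#!/bin/sh
# Convert a Kubernetes quantity given in Ki (kibibytes) to GiB, two decimals.
ki_to_gib() {
  awk -v q="$1" 'BEGIN { sub(/Ki$/, "", q); printf "%.2f\n", q / 1024 / 1024 }'
}

# Usage reported in the eviction Message above.
ki_to_gib 291599484Ki
```

This prints 278.09, so the container was using roughly 278 GiB of ephemeral storage while declaring a request of 0, which is consistent with its BestEffort QoS class.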

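Root cause

Node-pressure eviction is the process by which the kubelet proactively terminates pods to reclaim resources on a node.
The kubelet monitors resources such as CPU, memory, disk space, and filesystem inodes on the cluster's nodes. When one or more of these resources reaches a specific consumption level, the kubelet can proactively fail one or more pods on the node to reclaim resources and prevent starvation.
During a node-pressure eviction, the kubelet sets the PodPhase of the selected pods to Failed, which terminates them.
Node-pressure eviction is not the same as API-initiated eviction. The kubelet does not respect your configured PodDisruptionBudget or the pod's terminationGracePeriodSeconds.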

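Solution

The kubelet does not delete pods left in Status: Failed with Reason: Evicted, so a k8s CronJob is used to delete them on a schedule. The CronJob below runs every 30 minutes and deletes Evicted pods that were created more than 3 days (259200 seconds) ago.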

$ vim 01-sa.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: delete-evicted-pods
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: delete-evicted-pods
  namespace: delete-evicted-pods

$ vim 02-cr.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: delete-evicted-pods
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list", "delete"]



$ vim 03-crb.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: delete-evicted-pods
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: delete-evicted-pods
subjects:
  - kind: ServiceAccount
    name: delete-evicted-pods
    namespace: delete-evicted-pods
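
Once the ClusterRole and ClusterRoleBinding are applied, the grant can be checked by impersonating the ServiceAccount. The sketch below only prints the `kubectl auth can-i` commands rather than running them. The helper name sa_user is hypothetical, and the namespace and account name are the ones defined above.

```shell
#!/bin/sh
# Build the user name RBAC uses for a ServiceAccount:
# system:serviceaccount:<namespace>:<name>
sa_user() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Print (do not run) one permission check per verb granted by the ClusterRole.
for verb in get watch list delete; do
  echo "kubectl auth can-i $verb pods --all-namespaces --as=$(sa_user delete-evicted-pods delete-evicted-pods)"
done
```

Running each printed command against the cluster should answer "yes" if the binding is in effect.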

$ vim 04-cj.yaml
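# batch/v1beta1 CronJob was removed in Kubernetes 1.25; on clusters running 1.21 or later, use batch/v1 instead.
apiVersion: batch/v1beta1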
kind: CronJob
metadata:
  name: delete-evicted-pods
  namespace: delete-evicted-pods
spec:
  schedule: "*/30 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: delete-evicted-pods
          containers:
          - name: kubectl-runner
            image: bitnami/kubectl:1.21.8
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - |
              # Delete pods that are Failed with reason Evicted and were created more than 3 days (259200 s) ago.
              kubectl get pods --all-namespaces -o go-template='{{range .items}}{{if eq .status.phase "Failed"}}{{.metadata.name}} {{.metadata.namespace}} {{.metadata.creationTimestamp}} {{.status.reason}}{{"\n"}}{{end}}{{end}}' |
              while read -r epod namespace ct reason; do
                age=$(( $(date +%s) - $(date -d "$ct" +%s) ))
                if [ "$reason" = "Evicted" ] && [ "$age" -gt 259200 ]; then
                  echo "$(date '+%Y-%m-%d %H:%M:%S') delete $namespace $reason $epod"
                  kubectl -n "$namespace" delete pod "$epod"
                fi
              done
          restartPolicy: OnFailure
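
The selection logic of the CronJob command can be exercised offline before it is given delete permission. The sketch below is a dry run, not the job itself: select_evicted is a hypothetical helper that reads rows shaped like the go-template output ("name namespace creationTimestamp reason") and prints what would be deleted. It assumes GNU date (for `-d`), and every sample row except the first is made up.

```shell
#!/bin/sh
# Print "namespace/name" for rows whose reason is Evicted and whose age exceeds a limit.
# $1 = current time in epoch seconds, $2 = minimum age in seconds.
# Assumes GNU date, which parses RFC 3339 timestamps such as 2021-11-29T07:48:25Z.
select_evicted() {
  now=$1
  min_age=$2
  while read -r epod namespace ct reason; do
    [ "$reason" = "Evicted" ] || continue
    age=$(( $now - $(date -u -d "$ct" +%s) ))
    if [ "$age" -gt "$min_age" ]; then
      echo "$namespace/$epod"
    fi
  done
}

# Fixed "now" so the result does not depend on when the sketch is run.
now=$(date -u -d '2022-01-10T00:00:00Z' +%s)
printf '%s\n' \
  'cloud-simulation-dead-letter-worker-d96bdcf98-dxt7h simulation-prod 2021-11-29T07:48:25Z Evicted' \
  'hypothetical-recent-pod simulation-prod 2022-01-09T12:00:00Z Evicted' \
  'hypothetical-crashed-pod simulation-prod 2021-11-01T00:00:00Z Error' |
  select_evicted "$now" 259200
```

Only the first row is selected: the second is Evicted but younger than 3 days, and the third failed for a different reason. Note that age is measured from the pod's creationTimestamp, as in the CronJob, not from the moment of eviction.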

References:

  1. Pod Lifecycle: https://kubernetes.io/zh/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination
  2. Node-pressure Eviction: https://kubernetes.io/zh/docs/concepts/scheduling-eviction/node-pressure-eviction/
  3. Pod selection for kubelet eviction: https://kubernetes.io/zh/docs/concepts/scheduling-eviction/node-pressure-eviction/#kubelet-%E9%A9%B1%E9%80%90%E6%97%B6-pod-%E7%9A%84%E9%80%89%E6%8B%A9
  4. Kubelet does not delete evicted pods: https://github.com/kubernetes/kubernetes/issues/55051
  5. Field Selectors, chained selectors: https://kubernetes.io/zh/docs/concepts/overview/working-with-objects/field-selectors/#chained-selectors
  6. Using RBAC Authorization: https://kubernetes.io/zh/docs/reference/access-authn-authz/rbac/