Overview
Adding a new master node to a Kubernetes cluster often fails. This article walks through a general fix for the case where etcd is the cause.

[Figure: etcd architecture]

[Figure: etcd knowledge map]
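Symptoms
When installing a highly available Kubernetes cluster with three master nodes, joining a new master node may fail.
If the join fails because of etcd, with one of the following errors, the method below resolves it: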
Error 1:
error execution phase check-etcd: etcd cluster is not healthy: context deadline exceeded
Error 2:
error execution phase check-etcd: error syncing endpoints with etc: dial tcp 192.168.1.10:2379: connect: connection refused
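With Error 2, the message itself names the etcd endpoint that refused the connection, which tells you which node to look at first. As a minimal sketch, assuming the join output was saved to a file or is piped in, the endpoint can be pulled out like this (the function name refused_endpoint is made up for illustration, and only IPv4 host:port endpoints are handled):

```shell
#!/bin/sh
# refused_endpoint: read kubeadm join output on stdin and print the
# host:port of every etcd endpoint reported as "connection refused".
# Hypothetical helper; only matches IPv4 addresses such as 192.168.1.10:2379.
refused_endpoint() {
    sed -n 's/.*dial tcp \([0-9.]*:[0-9]*\): connect: connection refused.*/\1/p'
}
```

For example, feeding it the Error 2 line shown above prints 192.168.1.10:2379. Lines that do not contain a refused connection produce no output.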
Solution
Enter the etcd cluster and remove the faulty etcd member. Then run kubeadm reset on the node that reported the error, and run the master join command again.
# kubectl exec -it etcd-192.168.1.10 -n kube-system -- sh
/ # export ETCDCTL_API=3
/ # alias etcdctl='etcdctl --endpoints=https://192.168.1.10:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key'
/ # etcdctl member list
48d035e72d5a2c65, started, 192.168.1.10, https://192.168.1.11:2380, https://192.168.1.12:2379
9d8522a49fdf6359, started, 192.168.1.10, https://192.168.1.11:2380, https://192.168.1.12:2379
ca3435917a38658e, unstarted, 192.168.1.10, https://192.168.1.11:2380, https://192.168.1.12:2379
/ # etcdctl member remove ca3435917a38658e
Member ca3435917a38658e removed from cluster 8650138bc047cb5
/ # etcdctl member list
48d035e72d5a2c65, started, 192.168.1.10, https://192.168.1.11:2380, https://192.168.1.12:2379
9d8522a49fdf6359, started, 192.168.1.10, https://192.168.1.11:2380, https://192.168.1.12:2379
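In the session above, the member to remove is the one whose status is unstarted. When the member list is long, picking the ID by eye is error-prone, so the selection step can be sketched as a small filter. This is only an illustration: the function name unstarted_member_ids is made up, and it assumes the default comma-separated output of etcdctl member list (member ID first, status second) and that awk is available where you run it.

```shell
#!/bin/sh
# unstarted_member_ids: read `etcdctl member list` output on stdin and
# print the ID (first field) of every member whose status (second field)
# is "unstarted". Hypothetical helper; it prints IDs and removes nothing.
unstarted_member_ids() {
    awk -F', *' '$2 == "unstarted" { print $1 }'
}

# Possible use inside the etcd container, with the etcdctl alias defined:
#   etcdctl member list | unstarted_member_ids
# Then, for each printed ID, after checking it by hand:
#   etcdctl member remove <ID>
```

The filter deliberately stops at printing IDs. Removing a member changes the cluster's quorum, so it seems safer to keep the actual etcdctl member remove step manual.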
Further practice
To force-delete a Pod stuck in the Terminating state:
kubectl delete po etcd-192.168.1.12 -n kube-system --force
kubectl delete po kube-apiserver-192.168.1.12 -n kube-system --force
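If several Pods are stuck, the same delete command has to be repeated for each one. A rough sketch of generating those commands from the Pod listing follows. The function name terminating_pod_deletes is made up for illustration, and it assumes the default column order of kubectl get po -A --no-headers (NAMESPACE, NAME, READY, STATUS, ...).

```shell
#!/bin/sh
# terminating_pod_deletes: read `kubectl get po -A --no-headers` output
# on stdin and print one force-delete command per Pod whose STATUS
# column (4th field) is "Terminating".
# Hypothetical helper; it only prints commands and deletes nothing.
terminating_pod_deletes() {
    awk '$4 == "Terminating" { printf "kubectl delete po %s -n %s --force\n", $2, $1 }'
}

# Possible use:
#   kubectl get po -A --no-headers | terminating_pod_deletes
# Review the printed commands, then run the ones you want.
```

Printing the commands instead of running them leaves room to review the list first, since a force delete skips graceful shutdown.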
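Further practice: common kubectl commands
kubectl get po -o wide -A # list Pods in all namespaces
kubectl get po -o wide -n kube-system # list Pods in the kube-system namespace
kubectl logs -f --tail=100 PODID # follow a Pod's logs, starting from the last 100 lines
kubectl describe po PODID # show a Pod's details and events
kubectl get deploy -n kube-system # list Deployments in kube-system
kubectl get daemonset -n kube-system # list DaemonSets in kube-system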