Recently, pods on certain cluster nodes have been getting stuck in Terminating and could not be deleted. Check the system logs on an affected node:
journalctl -xeu kubelet
The kubelet logs contain the following error:
no space left on device

The error comes from inotify_add_watch, and the commonly reported cause is that fs.inotify.max_user_watches is set too low. Check the current value on the node:
cat /proc/sys/fs/inotify/max_user_watches
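To confirm that inotify watches are actually being exhausted, it helps to see which processes hold inotify instances. This diagnostic snippet is not from the original post; it is a sketch relying only on standard Linux /proc paths:

```shell
# Each fd symlinked to anon_inode:inotify is one inotify instance.
# Count instances per PID (run as root to see all processes);
# per-watch detail lives in /proc/<pid>/fdinfo/<fd>.
find /proc/[0-9]*/fd -lname anon_inode:inotify 2>/dev/null \
  | cut -d/ -f3 | sort | uniq -c | sort -rn | head
```

Processes near the top (often IDEs, log tailers, or kubelet itself) are the likely consumers.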
Raise the limit temporarily:
sysctl -w fs.inotify.max_user_watches=65536
After this, kubelet recovered and started normally. To keep the setting across reboots, write it to /etc/sysctl.conf:
# Add this line
fs.inotify.max_user_watches=65536
# Reload the settings
sysctl -p
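A quick check that both the live kernel value and the persisted entry are in place (assuming the steps above were run):

```shell
# Live kernel value; should print 65536 after the change
cat /proc/sys/fs/inotify/max_user_watches
# Persisted entry in /etc/sysctl.conf
grep max_user_watches /etc/sysctl.conf || echo "not persisted yet"
```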
To apply the change across multiple machines, use Ansible.
hosts.ini
[kube-all]
192.168.0.1 ansible_ssh_user=ubuntu ansible_ssh_port=22 ansible_ssh_pass="***" ansible_sudo_pass="***"
fix_watch.yaml
---
- hosts: kube-all
  become_user: root
  become: yes
  gather_facts: no
  tasks:
    - name: change watch
      lineinfile:
        path: /etc/sysctl.conf
        line: fs.inotify.max_user_watches=65536
    - name: sysctl -p
      shell: |
        sysctl -p
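As an alternative to lineinfile plus a shell task, Ansible's ansible.posix.sysctl module sets the runtime value and persists it in one idempotent task. A sketch, assuming the ansible.posix collection is installed:

```yaml
- name: set and persist fs.inotify.max_user_watches
  ansible.posix.sysctl:
    name: fs.inotify.max_user_watches
    value: "65536"
    sysctl_set: yes
    state: present
    reload: yes
```

This avoids re-running sysctl -p when the value is already correct.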
ansible.cfg
[defaults]
host_key_checking = False
any_errors_fatal = True
timeout = 30
forks = 10
[ssh_connection]
ssh_args=-F ansible_ssh_config
retries=10
ansible_ssh_config
Host *
  ForwardAgent no
  ControlMaster auto
  ControlPersist 300s
Run the playbook to finish adjusting all of the machines. The ansible.cfg above tweaks a few defaults; Ansible automatically prefers an ansible.cfg found in the current directory.
ansible-playbook -i hosts.ini fix_watch.yaml