Problem Description
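After several days of testing Spark on k8s, the Spark Driver Pod suddenly could not be initialized today.
The symptoms were as follows:
- On the client side, I submitted a test program in cluster mode. It should finish in a few seconds, but it stayed stuck in the ContainerCreating phase.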

- Next, I checked the Pod's actual status and found a `no space left on device` error.
Kent@KentsMacBookPro ~ kubectl describe pod kent-766ab96ce1498671-driver -nns1
Name:               kent-766ab96ce1498671-driver
Namespace:          ns1
Node:               10.120.237.55/10.120.237.55
Start Time:         Fri, 30 Aug 2019 14:49:57 +0800
Labels:             spark-app-selector=spark-891742f8c844486eb027aa15ae4b2849
                    spark-role=driver
                    syetem/tenant=tenant1
                    system/namespace=ns1
                    system/project-project1=true
Annotations:        <none>
Status:             Pending
IP:
Containers:
  spark-kubernetes-driver:
    Container ID:
    Image:         harbor-inner.sparkonk8s.netease.com/tenant1-project1/spark:v3-SPARK-28896
    Image ID:
    Ports:         7078/TCP, 7079/TCP, 4040/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Args:
      driver
      --properties-file
      /opt/spark/conf/spark.properties
      --class
      org.apache.spark.examples.HdfsTest
      local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0-SNAPSHOT.jar
      hdfs://hz-cluster10/user/kyuubi/hive_db/kyuubi.db/hive_tbl
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  1408Mi
    Requests:
      cpu:     1
      memory:  1408Mi
    Environment:
      SPARK_USER:                 Kent
      SPARK_DRIVER_BIND_ADDRESS:   (v1:status.podIP)
      HADOOP_CONF_DIR:            /opt/hadoop/conf
      SPARK_LOCAL_DIRS:           /var/data/spark-e6630b91-3892-41d1-9624-a955159b176a
      SPARK_CONF_DIR:             /opt/spark/conf
    Mounts:
      /etc/krb5.conf from krb5-file (rw,path="krb5.conf")
      /mnt/secrets/kerberos-keytab from kerberos-keytab (rw)
      /opt/hadoop/conf from hadoop-properties (rw)
      /opt/spark/conf from spark-conf-volume (rw)
      /var/data/spark-e6630b91-3892-41d1-9624-a955159b176a from spark-local-dir-1 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-nxhgg (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  hadoop-properties:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      hz10-hadoop-dir
    Optional:  false
  krb5-file:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kent-766ab96ce1498671-krb5-file
    Optional:  false
  kerberos-keytab:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kent-766ab96ce1498671-kerberos-keytab
    Optional:    false
  spark-local-dir-1:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  spark-conf-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kent-766ab96ce1498671-driver-conf-map
    Optional:  false
  default-token-nxhgg:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-nxhgg
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                    Age                   From                    Message
  ----     ------                    ----                  ----                    -------
  Normal   Scheduled                 37m                   default-scheduler       Successfully assigned ns1/kent-766ab96ce1498671-driver to 10.120.237.55
  Warning  FailedMount               37m                   kubelet, 10.120.237.55  MountVolume.SetUp failed for volume "spark-conf-volume" : configmaps "kent-766ab96ce1498671-driver-conf-map" not found
  Warning  FailedMount               37m                   kubelet, 10.120.237.55  MountVolume.SetUp failed for volume "kerberos-keytab" : secrets "kent-766ab96ce1498671-kerberos-keytab" not found
  Warning  FailedMount               37m                   kubelet, 10.120.237.55  MountVolume.SetUp failed for volume "krb5-file" : configmaps "kent-766ab96ce1498671-krb5-file" not found
  Warning  FailedCreatePodContainer  7m9s (x140 over 37m)  kubelet, 10.120.237.55  unable to ensure pod container exists: failed to create container for [kubepods burstable pod613a702e-caf2-11e9-95c4-6c92bf35a76a] : mkdir /sys/fs/cgroup/memory/kubepods/burstable/pod613a702e-caf2-11e9-95c4-6c92bf35a76a: no space left on device
- After some searching on Google, I found that my problem was essentially the same as the one reported in https://github.com/rootsongjc/kubernetes-handbook/issues/313.
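The `mkdir ... no space left on device` in the last event does not mean the disk is full. It means the kubelet could not create a new directory in the node's memory cgroup hierarchy. A quick, rough check on the node is the `num_cgroups` column for `memory` in `/proc/cgroups` (cgroup v1).

The sketch below parses that file. Some caveats apply:
- `memory_cgroup_count` and `WARN_THRESHOLD` are our own hypothetical names.
- The threshold value is an assumption. It is chosen only to sit below the 65535 cap commonly reported for memory cgroups on kernels of this era.
- Leaked cgroups may not all be reflected in `num_cgroups`, so treat the number as a hint, not proof.

```python
from pathlib import Path
from typing import Optional


def memory_cgroup_count(proc_cgroups_text: str) -> Optional[int]:
    """Return num_cgroups for the memory controller, or None if it is absent.

    /proc/cgroups has a '#' header line, then one whitespace-separated row
    per controller: subsys_name  hierarchy  num_cgroups  enabled
    """
    for line in proc_cgroups_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip the header and blank lines
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "memory":
            return int(fields[2])
    return None


# HYPOTHETICAL threshold (assumption): warn well before the commonly
# reported 65535 memory-cgroup cap is reached.
WARN_THRESHOLD = 60000

if __name__ == "__main__":
    proc = Path("/proc/cgroups")
    if proc.exists():
        count = memory_cgroup_count(proc.read_text())
        print("memory num_cgroups:", count)
        if count is not None and count >= WARN_THRESHOLD:
            print("warning: memory cgroup count is close to the reported limit")
```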
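Possible causes and remedies raised there:
- The Linux kernel on the kubelet host is too old: Linux version 3.10.0-862.el7.x86_64
- The K8s cluster version is too old: 1.11.9
- The problem can be worked around by disabling kmem (kernel memory accounting)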
I contacted our company's k8s kernel developers. At first, it looked as though kmem had been disabled.

In reality, though, it did not seem to have been disabled completely.

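In the end, the root cause turned out to be runc. When it starts a container, runc enables kmem accounting on the container's memory cgroup by default, and this can trigger a leak in the 3.10 kernel. Leaked memory cgroups pile up until no new ones can be created. That is why `mkdir` under `/sys/fs/cgroup/memory` fails with `no space left on device`.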
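To see which cgroups on a node actually have kmem accounting switched on, one option is to walk the cgroup v1 memory hierarchy and read each cgroup's `memory.kmem.usage_in_bytes`. The `kubepods` directory used here is the one from the failing `mkdir` in the Events above.

This is a minimal sketch under two assumptions. The first is a cgroup v1 layout. The second, which we have not verified against this kernel, is that a nonzero `memory.kmem.usage_in_bytes` means accounting was enabled for that cgroup. The function name `cgroups_with_kmem_usage` is hypothetical.

```python
import os
from pathlib import Path
from typing import List

KMEM_USAGE_FILE = "memory.kmem.usage_in_bytes"


def cgroups_with_kmem_usage(root: str) -> List[str]:
    """Return cgroup directories under root whose kmem usage is nonzero.

    ASSUMPTION: nonzero memory.kmem.usage_in_bytes means kmem accounting
    was switched on for that cgroup (cgroup v1 memory controller).
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if KMEM_USAGE_FILE not in filenames:
            continue
        try:
            usage = int(Path(dirpath, KMEM_USAGE_FILE).read_text().strip())
        except (OSError, ValueError):
            continue  # cgroup removed mid-walk, or unexpected file content
        if usage > 0:
            hits.append(dirpath)
    return sorted(hits)


if __name__ == "__main__":
    kubepods = "/sys/fs/cgroup/memory/kubepods"
    if os.path.isdir(kubepods):
        for path in cgroups_with_kmem_usage(kubepods):
            print(path)
```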
Summary
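At its root, this looks like a bug in Kubernetes or in the operating system kernel.
In practice, though, containers failing to be created for one reason or another should be expected. Spark currently has no retry mechanism for this case. There is also no automatic cleanup for these failed Pods: they have to be deleted manually on the cluster.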
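Until Spark handles this itself, stuck driver Pods could be cleaned up by a small script instead of by hand. The following is a hypothetical sketch using the official `kubernetes` Python client (assumed installed, recent version). It works as follows:
- It lists Pods with the label `spark-role=driver`, which appears in the `describe` output above.
- It keeps those still `Pending` with waiting reason `ContainerCreating` after a configurable age.
- It deletes them only when `dry_run` is false.

`PodInfo`, `stuck_pods` and `cleanup` are names we made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


@dataclass
class PodInfo:
    name: str
    namespace: str
    phase: str                     # e.g. "Pending"
    waiting_reason: Optional[str]  # e.g. "ContainerCreating"
    created_at: datetime           # timezone-aware creation time


def stuck_pods(pods: Iterable[PodInfo], now: datetime, max_age: timedelta) -> List[PodInfo]:
    """Pods still Pending/ContainerCreating for longer than max_age."""
    return [
        p for p in pods
        if p.phase == "Pending"
        and p.waiting_reason == "ContainerCreating"
        and now - p.created_at > max_age
    ]


def cleanup(namespace: str, max_age: timedelta, dry_run: bool = True) -> List[str]:
    """List (and optionally delete) stuck Spark driver Pods in a namespace."""
    # Imported lazily so stuck_pods() can be used without the client installed.
    from kubernetes import client, config

    config.load_kube_config()
    api = client.CoreV1Api()
    resp = api.list_namespaced_pod(namespace, label_selector="spark-role=driver")
    infos = []
    for pod in resp.items:
        reason = None
        for cs in pod.status.container_statuses or []:
            if cs.state and cs.state.waiting:
                reason = cs.state.waiting.reason
        infos.append(PodInfo(pod.metadata.name, namespace, pod.status.phase,
                             reason, pod.metadata.creation_timestamp))
    victims = stuck_pods(infos, datetime.now(timezone.utc), max_age)
    if not dry_run:
        for p in victims:
            api.delete_namespaced_pod(p.name, p.namespace)
    return [p.name for p in victims]
```

For example, `cleanup("ns1", timedelta(minutes=10))` only reports matching Pod names; pass `dry_run=False` to actually delete them. Deleting the Pods does not repair the node: the kmem leak on the host still has to be dealt with separately.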