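Monitoring k8s resources with Prometheus
Use Prometheus to monitor the resources in a k8s cluster, such as microservices and container resource metrics, and display them in Grafana.
Approach
An external Prometheus can connect to the apiserver to monitor metrics inside the k8s cluster (prerequisite: the corresponding exporters are already installed in the cluster).
Alternatively, deploy kube-prometheus (a monitoring stack that runs inside the cluster) and collect its data through federation.
This post takes the first route: an external Prometheus scrapes cadvisor and kube-state-metrics to obtain the k8s cluster metrics.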
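Preparation
Plugin overview
To monitor a reasonably complete set of k8s resource metrics, the cluster needs the corresponding exporters; here these are cadvisor and kube-state-metrics.
cadvisor: built into the kubelet, so nothing needs to be installed separately. It collects CPU, memory, and other metrics for the containers in the cluster.
kube-state-metrics: watches the api-server for add, delete, and update events on k8s objects and exposes their state as metrics. It is needed because the basic metrics from cadvisor alone do not cover enough dimensions.
cadvisor does not monitor k8s resource objects such as Deployment, Pod, DaemonSet, or CronJob. For example: how many replicas exist right now? What state is a Pod in (Pending or Running)? Answering these requires an additional exporter that exposes such metrics, and that exporter is kube-state-metrics.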
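To make the split between the two exporters concrete, here is a minimal sketch that sends PromQL queries to the Prometheus HTTP API once both sources are being scraped (the scrape jobs are configured later in this post). The helper name prom_query and the URL http://localhost:9090 are assumptions for illustration; the metric names are the standard ones exposed by kube-state-metrics and cadvisor.

```shell
# Sketch: run an instant PromQL query against the Prometheus HTTP API.
# prom_query is a hypothetical helper, not part of any tool.
prom_query() {
  local base_url="$1" expr="$2"
  # -G sends the URL-encoded query as a GET parameter to /api/v1/query.
  curl -fsS -G "$base_url/api/v1/query" --data-urlencode "query=$expr"
}

# Object state, answered by kube-state-metrics:
# prom_query http://localhost:9090 'kube_deployment_status_replicas'
# prom_query http://localhost:9090 'sum by (phase) (kube_pod_status_phase)'
# Container resource usage, answered by cadvisor:
# prom_query http://localhost:9090 'rate(container_cpu_usage_seconds_total[5m])'
```

The first two queries only return data if kube-state-metrics is scraped; the last one only needs cadvisor.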
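Installing and deploying kube-state-metrics
- Download the kube-state-metrics package
Note: my k8s version is v1.20.6. Check the compatibility notes on GitHub and choose the kube-state-metrics version that matches your own k8s version.
kube-state-metrics_v2.2.1
Download URL
https://github.com/starsliao/Prometheus/tree/master/kubernetes
Upload the package to the server
- Deploy kube-state-metrics
Adjust the exposed ports in service.yml as needed. After the change it looks like this: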
apiVersion: v1
kind: Service
metadata:
  # annotations:
  #   prometheus.io/scrape: 'true'
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: v2.2.1
  name: kube-state-metrics
  namespace: ops-monit
spec:
  type: NodePort
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
    nodePort: 30866
  - name: telemetry
    port: 8081
    targetPort: telemetry
    nodePort: 30867
  selector:
    app.kubernetes.io/name: kube-state-metrics
Run the following in the k8s cluster:
kubectl create namespace ops-monit
cd kube-state-metrics
kubectl apply -f .
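After applying the manifests, it can help to confirm that the NodePort actually serves metrics before touching Prometheus. The sketch below counts the sample lines whose metric name starts with kube_. The helper name count_ksm_metrics is hypothetical, and the example assumes 192.168.1.21 is a node IP reachable from where you run it.

```shell
# Sketch: count kube-state-metrics samples in Prometheus text format.
# Reads metrics on stdin; comment lines (# HELP / # TYPE) are not counted.
count_ksm_metrics() {
  # grep -c exits non-zero when the count is 0, so keep the pipeline going.
  grep -c '^kube_' || true
}

# Example usage:
# kubectl -n ops-monit get pods,svc
# curl -s http://192.168.1.21:30866/metrics | count_ksm_metrics
```

A count of 0 suggests the pod is not ready yet or the NodePort is not reachable from this host.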
Configuring Prometheus
vim prometheus.yml
Add the following jobs under scrape_configs:
  - job_name: 'k8s-cadvisor'
    scrape_interval: 60s
    scrape_timeout: 60s
    metrics_path: /metrics/cadvisor
    kubernetes_sd_configs:  # Kubernetes service discovery
    - api_server: https://192.168.1.21:6443  # apiserver address
      role: node  # node-role discovery
      namespaces:
        names:
        - ops-monit
      bearer_token_file: k8s.token
      tls_config:
        insecure_skip_verify: true
    bearer_token_file: k8s.token
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
    - source_labels: [__address__]
      regex: '(.*):10250'
      replacement: '${1}:10255'
      target_label: __address__
      action: replace
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    metric_relabel_configs:
    - source_labels: [instance]
      separator: ;
      regex: (.+)
      target_label: node
      replacement: $1
      action: replace
    - source_labels: [pod_name]
      separator: ;
      regex: (.+)
      target_label: pod
      replacement: $1
      action: replace
    - source_labels: [container_name]
      separator: ;
      regex: (.+)
      target_label: container
      replacement: $1
      action: replace
  - job_name: kube-state-metrics-1
    kubernetes_sd_configs:
    - api_server: https://192.168.1.21:6443  # apiserver address
      role: endpoints  # endpoints-role discovery
      namespaces:
        names:
        - ops-monit
      bearer_token_file: k8s.token
      tls_config:
        insecure_skip_verify: true
    bearer_token_file: k8s.token
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - separator: ;
      regex: (.*)
      target_label: __address__
      replacement: 192.168.1.21:30866
    - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
      regex: kube-state-metrics
      replacement: $1
      action: keep
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: k8s_namespace
    - source_labels: [__meta_kubernetes_service_name]
      action: replace
      target_label: k8s_sname
  - job_name: kube-state-metrics-2
    kubernetes_sd_configs:
    - api_server: https://192.168.1.21:6443  # apiserver address
      role: endpoints  # endpoints-role discovery
      namespaces:
        names:
        - ops-monit
      bearer_token_file: k8s.token
      tls_config:
        insecure_skip_verify: true
    bearer_token_file: k8s.token
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - separator: ;
      regex: (.*)
      target_label: __address__
      replacement: 192.168.1.21:30867
    - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
      regex: kube-state-metrics
      replacement: $1
      action: keep
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: k8s_namespace
    - source_labels: [__meta_kubernetes_service_name]
      action: replace
      target_label: k8s_sname
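Because a mistake in prometheus.yml can keep Prometheus from loading the new jobs, it is worth validating the file before reloading. The sketch below uses promtool check config and the /-/reload endpoint. It assumes promtool is on PATH, that Prometheus was started with --web.enable-lifecycle (otherwise the reload endpoint is disabled), and that it listens on http://localhost:9090; the function name check_and_reload is hypothetical.

```shell
# Sketch: validate the config, then ask a running Prometheus to reload it.
check_and_reload() {
  local config="${1:-prometheus.yml}" base_url="${2:-http://localhost:9090}"
  # promtool exits non-zero on an invalid config, so stop before reloading.
  promtool check config "$config" || return 1
  # Requires Prometheus to run with --web.enable-lifecycle (assumption).
  curl -fsS -X POST "$base_url/-/reload"
}

# Example usage:
# check_and_reload prometheus.yml http://localhost:9090
```

If the lifecycle endpoint is not enabled, restarting Prometheus or sending it SIGHUP reloads the configuration as well.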
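- Notes on some of the sections above
This is the part that lets the external Prometheus connect to the k8s cluster:
api_server: the apiserver address
namespaces: the namespace in which kube-state-metrics runs
bearer_token_file: the token used for authentication. I created a file named k8s.token in the same directory as prometheus.yml; its content is the token from the secret prometheus-token-wq9fd in the ops-monit namespace of the k8s cluster.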
kubernetes_sd_configs:
- api_server: https://192.168.1.21:6443  # apiserver address
  role: endpoints  # endpoints-role discovery
  namespaces:
    names:
    - ops-monit
  bearer_token_file: k8s.token
  tls_config:
    insecure_skip_verify: true
bearer_token_file: k8s.token
tls_config:
  insecure_skip_verify: true
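The token stored in a Secret is base64-encoded, so it has to be decoded before it goes into k8s.token. A minimal sketch of that step follows. The function name write_k8s_token and its argument order are hypothetical; the namespace and secret name in the example come from the text above, and the secret name will differ in your cluster.

```shell
# Sketch: export a ServiceAccount token from a Secret into a file.
write_k8s_token() {
  local namespace="$1" secret="$2" outfile="$3"
  # .data.token holds the base64-encoded token; decode it before writing.
  kubectl -n "$namespace" get secret "$secret" -o jsonpath='{.data.token}' \
    | base64 --decode > "$outfile"
  # The token is a credential: restrict it to the file owner.
  # Adjust ownership if Prometheus runs as a different user.
  chmod 600 "$outfile"
}

# Example usage, run from the directory that holds prometheus.yml:
# write_k8s_token ops-monit prometheus-token-wq9fd k8s.token
```

The ServiceAccount behind the token still needs RBAC permissions to list and watch nodes and endpoints, otherwise service discovery returns no targets.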
Configuring the dashboard
Import dashboard ID 13105 into Grafana.
For details on the configuration, see the dashboard's own description: https://grafana.com/grafana/dashboards/13105-1-k8s-for-prometheus-dashboard-20211010/
