Prometheus Operator is often described as the ultimate monitoring solution for Kubernetes clusters, but the Prometheus Operator project by itself no longer contains the complete feature set; the full solution has become kube-prometheus. Project address:
https://github.com/coreos/kube-prometheus
Installation
Download the software
#git clone https://github.com/coreos/kube-prometheus.git
List the manifest files
#cd manifests
#ls
00namespace-namespace.yaml node-exporter-clusterRole.yaml
0prometheus-operator-0alertmanagerCustomResourceDefinition.yaml node-exporter-daemonset.yaml
0prometheus-operator-0prometheusCustomResourceDefinition.yaml node-exporter-serviceAccount.yaml
0prometheus-operator-0prometheusruleCustomResourceDefinition.yaml node-exporter-serviceMonitor.yaml
0prometheus-operator-0servicemonitorCustomResourceDefinition.yaml node-exporter-service.yaml
0prometheus-operator-clusterRoleBinding.yaml prometheus-adapter-apiService.yaml
0prometheus-operator-clusterRole.yaml prometheus-adapter-clusterRoleAggregatedMetricsReader.yaml
0prometheus-operator-deployment.yaml prometheus-adapter-clusterRoleBindingDelegator.yaml
0prometheus-operator-serviceAccount.yaml prometheus-adapter-clusterRoleBinding.yaml
0prometheus-operator-serviceMonitor.yaml prometheus-adapter-clusterRoleServerResources.yaml
0prometheus-operator-service.yaml prometheus-adapter-clusterRole.yaml
alertmanager-alertmanager.yaml prometheus-adapter-configMap.yaml
alertmanager-secret.yaml prometheus-adapter-deployment.yaml
alertmanager-serviceAccount.yaml prometheus-adapter-roleBindingAuthReader.yaml
alertmanager-serviceMonitor.yaml prometheus-adapter-serviceAccount.yaml
alertmanager-service.yaml prometheus-adapter-service.yaml
grafana-dashboardDatasources.yaml prometheus-clusterRoleBinding.yaml
grafana-dashboardDefinitions.yaml prometheus-clusterRole.yaml
grafana-dashboardSources.yaml prometheus-prometheus.yaml
grafana-deployment.yaml prometheus-roleBindingConfig.yaml
grafana-serviceAccount.yaml prometheus-roleBindingSpecificNamespaces.yaml
grafana-serviceMonitor.yaml prometheus-roleConfig.yaml
grafana-service.yaml prometheus-roleSpecificNamespaces.yaml
kube-state-metrics-clusterRoleBinding.yaml prometheus-rules.yaml
kube-state-metrics-clusterRole.yaml prometheus-serviceAccount.yaml
kube-state-metrics-deployment.yaml prometheus-serviceMonitorApiserver.yaml
kube-state-metrics-roleBinding.yaml prometheus-serviceMonitorCoreDNS.yaml
kube-state-metrics-role.yaml prometheus-serviceMonitorKubeControllerManager.yaml
kube-state-metrics-serviceAccount.yaml prometheus-serviceMonitorKubelet.yaml
kube-state-metrics-serviceMonitor.yaml prometheus-serviceMonitorKubeScheduler.yaml
kube-state-metrics-service.yaml prometheus-serviceMonitor.yaml
node-exporter-clusterRoleBinding.yaml prometheus-service.yaml
Edit prometheus-serviceMonitorKubelet.yaml: change the port from https-metrics to http-metrics, and change the scheme to http
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
spec:
  endpoints:
  - port: http-metrics
    scheme: http  # many guides do not mention changing the scheme
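Since only two values change, the edit can also be scripted. A minimal sketch with sed, run here against a throwaway stand-in file so it is self-contained; in the manifests directory the target file would be prometheus-serviceMonitorKubelet.yaml:

```shell
# Sketch: apply the two edits with sed. A stand-in file is used so the
# snippet runs anywhere; replace "$f" with prometheus-serviceMonitorKubelet.yaml
# when working in the real manifests directory.
f=$(mktemp)
cat > "$f" <<'EOF'
spec:
  endpoints:
  - port: https-metrics
    scheme: https
EOF
sed -i 's/port: https-metrics/port: http-metrics/; s/scheme: https/scheme: http/' "$f"
cat "$f"
```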
alertmanager-service.yaml: add a nodePort 30093
apiVersion: v1
kind: Service
metadata:
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
spec:
  ports:
  - name: web
    port: 9093
    targetPort: web
    nodePort: 30093
  type: NodePort
  selector:
    alertmanager: main
    app: alertmanager
  sessionAffinity: ClientIP
grafana-service.yaml: add a nodePort 32000
apiVersion: v1
kind: Service
metadata:
  labels:
    app: grafana
  name: grafana
  namespace: monitoring
spec:
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 32000
  type: NodePort
  selector:
    app: grafana
prometheus-service.yaml: add a nodePort 30090
apiVersion: v1
kind: Service
metadata:
  labels:
    prometheus: k8s
  name: prometheus-k8s
  namespace: monitoring
spec:
  ports:
  - name: web
    port: 9090
    targetPort: web
    nodePort: 30090
  type: NodePort
  selector:
    app: prometheus
    prometheus: k8s
  sessionAffinity: ClientIP
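With the three NodePort services in place, each UI is reachable on any node's IP. A small convenience sketch that prints the URLs; NODE_IP is a placeholder (here one of the node addresses from the pod listing below), since any cluster node works for a NodePort:

```shell
# Print the web UI URLs exposed by the three NodePort services above.
# NODE_IP is a placeholder; substitute any node's address from your cluster.
NODE_IP=22.22.3.234
for entry in prometheus:30090 grafana:32000 alertmanager:30093; do
  name=${entry%%:*}   # service name before the colon
  port=${entry#*:}    # nodePort after the colon
  echo "$name -> http://$NODE_IP:$port"
done
```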
Create the resources. The first run may report that some resource types do not exist, so it is recommended to run the command twice
#kubectl apply -f .
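The "resource does not exist" errors happen because the CRDs created on the first pass have not finished registering when the custom resources that depend on them are applied; a second pass succeeds once the CRDs exist. The idea can be sketched as a retry loop. Here apply_once is a hypothetical stand-in for `kubectl apply -f .` that fails once, so the sketch runs without a cluster:

```shell
# Retry sketch: re-run the apply until it succeeds, because CRDs created by
# the first pass need a moment to register before CRs referencing them apply.
# apply_once is a hypothetical stand-in for 'kubectl apply -f .'; it fails on
# the first call to mimic the "resource does not exist" error.
n=0
apply_once() { n=$((n+1)); [ "$n" -ge 2 ]; }
attempt=1
until apply_once; do
  attempt=$((attempt+1))
  [ "$attempt" -gt 5 ] && break   # give up after 5 tries
  sleep 1                         # let the API server register the CRDs
done
echo "apply succeeded on attempt $attempt"
```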
View the custom resource definitions (CRDs)
#kubectl get crd | grep coreos
alertmanagers.monitoring.coreos.com 2019-06-03T09:17:48Z
prometheuses.monitoring.coreos.com 2019-06-03T09:17:48Z
prometheusrules.monitoring.coreos.com 2019-06-03T09:17:48Z
servicemonitors.monitoring.coreos.com 2019-06-03T09:17:48Z
Check the newly created pods
#kubectl -n monitoring get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
alertmanager-main-0 2/2 Running 0 16h 10.244.196.134 node01 <none> <none>
alertmanager-main-1 2/2 Running 0 15h 10.244.241.204 ingressnode02 <none> <none>
alertmanager-main-2 2/2 Running 0 15h 10.244.114.4 node05 <none> <none>
grafana-69c7b8468d-l8p2b 1/1 Running 0 16h 10.244.17.198 prometheus01 <none> <none>
kube-state-metrics-65b5ccc84-kwfgh 4/4 Running 0 15h 10.244.17.199 prometheus01 <none> <none>
node-exporter-62mkc 2/2 Running 0 16h 22.22.3.235 master02 <none> <none>
node-exporter-6bsrb 2/2 Running 0 16h 22.22.3.239 node04 <none> <none>
node-exporter-8b5h8 2/2 Running 0 16h 22.22.3.241 prometheus01 <none> <none>
node-exporter-chssb 2/2 Running 0 16h 22.22.3.243 ingressnode02 <none> <none>
node-exporter-dwqkc 2/2 Running 0 16h 22.22.3.240 node05 <none> <none>
node-exporter-kf2cr 2/2 Running 0 16h 22.22.3.242 ingressnode01 <none> <none>
node-exporter-krsm4 2/2 Running 0 16h 22.22.3.238 node03 <none> <none>
node-exporter-lv4gx 2/2 Running 0 16h 22.22.3.236 node01 <none> <none>
node-exporter-v5f9v 2/2 Running 0 16h 22.22.3.234 master01 <none> <none>
node-exporter-zgsr2 2/2 Running 0 16h 22.22.3.237 node02 <none> <none>
prometheus-adapter-6c75d8686d-gq8bn 1/1 Running 0 16h 10.244.17.197 prometheus01 <none> <none>
prometheus-k8s-0 3/3 Running 1 16h 10.244.140.68 node02 <none> <none>
prometheus-k8s-1 3/3 Running 1 16h 10.244.248.198 node04 <none> <none>
prometheus-operator-74d449f6b4-q6bjn 1/1 Running 0 16h 10.244.17.196 prometheus01 <none> <none>
Confirm that all the web UIs open correctly



Configuring Prometheus
Expand the Status menu and look at Targets: only the two scrape jobs shown in the figure have no corresponding targets, which is related to their ServiceMonitor objects

Look at prometheus-serviceMonitorKubeScheduler.yaml: its selector matches Service labels, but there is no Service labeled k8s-app=kube-scheduler in the kube-system namespace
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s
    port: http-metrics
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-scheduler
Create prometheus-kubeSchedulerService.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler  # must match the selector in the ServiceMonitor
spec:
  selector:
    component: kube-scheduler  # must match the scheduler pod's label
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
    protocol: TCP
Create the kube-scheduler Service
#kubectl apply -f prometheus-kubeSchedulerService.yaml
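What this Service establishes is a two-step label chain: the ServiceMonitor's selector must match the Service's labels, and the Service's selector must match the scheduler pod's labels. A self-contained sketch of the two lookups, with values mirroring the manifests above:

```shell
# Label chain sketch: ServiceMonitor selects Services by label, and the
# Service selects pods by label. Values are copied from the manifests above.
sm_selector="k8s-app=kube-scheduler"     # ServiceMonitor spec.selector
svc_labels="k8s-app=kube-scheduler"      # Service metadata.labels
svc_selector="component=kube-scheduler"  # Service spec.selector
pod_labels="component=kube-scheduler"    # kube-scheduler pod labels
[ "$sm_selector" = "$svc_labels" ] && echo "ServiceMonitor -> Service: match"
[ "$svc_selector" = "$pod_labels" ] && echo "Service -> pod: match"
```

If either link is broken, the target list in Prometheus stays empty, which is exactly the symptom seen under Status > Targets.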
Similarly, create prometheus-kubeControllerManagerService.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager
spec:
  selector:
    component: kube-controller-manager
  ports:
  - name: http-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP
Create the kube-controller-manager Service
#kubectl apply -f prometheus-kubeControllerManagerService.yaml
Confirm that all targets are now up

Configuring Grafana
Log in with admin/admin and change the password
The data source is already connected to Prometheus

Custom monitoring targets
Monitoring etcd as an example

Save the required etcd certificates into a Secret named etcd-certs
# kubectl -n monitoring create secret generic etcd-certs --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key --from-file=/etc/kubernetes/pki/etcd/ca.crt
secret/etcd-certs created
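What `create secret generic --from-file` produces is a Secret whose data keys are the file basenames and whose values are the base64-encoded file contents. A self-contained sketch, using a throwaway dummy file in place of the real certificate:

```shell
# Sketch of what --from-file does: data key = file basename,
# data value = base64(file contents). A dummy file stands in for the
# real /etc/kubernetes/pki/etcd/ca.crt so the snippet runs anywhere.
dir=$(mktemp -d)
printf 'dummy-ca' > "$dir/ca.crt"
value=$(base64 < "$dir/ca.crt" | tr -d '\n')
echo "data key:   ca.crt"
echo "data value: $value"
```

This is why the files later appear under their basenames (ca.crt, healthcheck-client.crt, healthcheck-client.key) inside the pod.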
Modify the Prometheus resource k8s: add the secrets field in prometheus-prometheus.yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    beta.kubernetes.io/os: linux
  replicas: 2
  secrets:
  - etcd-certs
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.7.2
Apply prometheus-prometheus.yaml
#kubectl apply -f prometheus-prometheus.yaml
Check inside the pod that the certificates were imported successfully
# kubectl -n monitoring exec -it prometheus-k8s-0 /bin/sh
Defaulting container name to prometheus.
Use 'kubectl describe pod/prometheus-k8s-0 -n monitoring' to see all of the containers in this pod.
# ls -l /etc/prometheus/secrets/etcd-certs/
total 0
lrwxrwxrwx 1 root root 13 Jun 4 09:12 ca.crt -> ..data/ca.crt
lrwxrwxrwx 1 root root 29 Jun 4 09:12 healthcheck-client.crt -> ..data/healthcheck-client.crt
lrwxrwxrwx 1 root root 29 Jun 4 09:12 healthcheck-client.key -> ..data/healthcheck-client.key
/prometheus $ cat /etc/prometheus/secrets/etcd-certs/ca.crt
-----BEGIN CERTIFICATE-----
MIIC9zCCAd+gAwIBAgIJAMiN3pOWJVGOMA0GCSqGSIb3DQEBCwUAMBIxEDAOBgNV
BAMMB2V0Y2QtY2EwHhcNMTkwNTI3MDgzNDExWhcNMzkwNTIyMDgzNDExWjASMRAw
DgYDVQQDDAdldGNkLWNhMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA
rG1xQcAwZ67XXG84PzqIIqoqnq/zM3Ru+02PELbzgiZ4MrNPte32vZuj6HK/JDDQ
nEirgnQQxQJ6OxvnDrFVwyxveNI8jrd+FRfuh2ae0NIiqkWk88O42OioACBW6cJA
hILpIcn066+E+t2vh/3TmqMduV8eY5p8VAwRT1B04fJAQVcr0sJh3JXExppbtdWL
Z0T25QTbbbZ/I6oxLMu/NkS171R5l397rSpD2ox0NV0GASoqiitffPznOHBPa1Zs
UwOlQnZlWaBM5XQHFhRQTG/Bxxhe45azmmPT3DGCpATk+/GnYDPnt4TSZiX9gZ6O
beRsGUzPDrX/LOEV/Uv+VQIDAQABo1AwTjAdBgNVHQ4EFgQUxQl8C8RdG+tU2U+T
gy901tOxUNUwHwYDVR0jBBgwFoAUxQl8C8RdG+tU2U+Tgy901tOxUNUwDAYDVR0T
BAUwAwEB/zANBgkqhkiG9w0BAQsFAAOCAQEAica5i0wN9ZuCICQOGwMcuVgadBqV
w4dOyP4EPyD2SKx3YpYREMGXOafYkrX2rWKqsCBqS9xUT34x2DQ4/KuoPY/Ee37h
pJ+/i47sq8pmiHxqQRUACyGA6SqWtcApfW62+O97qHnRtyUcCftKKLYEu3djzTJd
FOn6xPehbFzhL9H4tsiZ+kFaXqWDUbhSCAd/LeJ+dxzmOE+Rd0hsPHIyzdmWUKwe
CTkSaf9X4KPWjBUCqPzB/Td6Mz3HHg8zZo2FgkyI98a7c83rHl3aTfBJEi4LND8x
PTFwgOGNlZXa6OnUmkn/sHvoNc88EqDm/GjPI6xfLr7BSWE4jJCIwWROvg==
-----END CERTIFICATE-----
Create the ServiceMonitor etcd-k8s in prometheus-serviceMonitorEtcd.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: etcd-k8s
  name: etcd-k8s
  namespace: monitoring
spec:
  endpoints:
  - port: port
    interval: 30s
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-certs/ca.crt
      certFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.crt
      keyFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.key
      insecureSkipVerify: true
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: etcd
Apply prometheus-serviceMonitorEtcd.yaml
#kubectl apply -f prometheus-serviceMonitorEtcd.yaml
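The tlsConfig file paths follow the Operator's mount convention: every secret listed under spec.secrets of the Prometheus resource is mounted at /etc/prometheus/secrets/&lt;secret-name&gt;/&lt;key&gt;, as the earlier `ls` inside the pod showed. A quick sketch of the expected paths:

```shell
# The Operator mounts each secret named in the Prometheus resource's
# spec.secrets at /etc/prometheus/secrets/<secret-name>/<key>; the
# tlsConfig paths must use exactly this layout (note .crt, not .cert --
# the keys are the original file basenames).
secret=etcd-certs
for key in ca.crt healthcheck-client.crt healthcheck-client.key; do
  echo "/etc/prometheus/secrets/$secret/$key"
done
```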
Create the associated Service. Because etcd runs outside the cluster, the Endpoints object must be created manually. prometheus-service-etcd.yaml:
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: etcd
  name: etcd-k8s
  namespace: kube-system
spec:
  ports:
  - name: port
    port: 2379
    protocol: TCP
  type: ClusterIP
  clusterIP: None
---
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
subsets:
- addresses:
  - ip: 22.22.3.231
    nodeName: etcd01
  - ip: 22.22.3.232
    nodeName: etcd02
  - ip: 22.22.3.233
    nodeName: etcd03
  ports:
  - name: port
    port: 2379
    protocol: TCP
Apply prometheus-service-etcd.yaml
#kubectl apply -f prometheus-service-etcd.yaml

Go to https://grafana.com/dashboards and find an etcd-related dashboard, for example:
https://grafana.com/dashboards/3070

Download the JSON file and import it into Grafana, selecting prometheus as the data source

View the dashboard

- Misconfigured Prometheus or ServiceMonitor resources can leave the pods prometheus-k8s-0 and prometheus-k8s-1 unhealthy, which makes the Prometheus UI unreachable; correcting the configuration restores them.