
Run Grafana with Docker, integrated with Prometheus + node-exporter + cAdvisor, to monitor multiple nodes.
There are two nodes: one is the local machine and the other is a second server of mine. Only one Prometheus instance needs to run; the second server only needs to run a node-exporter.
Prometheus and a number of third parties have already packaged metric collection for common databases, systems, middleware and so on as individual exporters, which can be used directly in production. In this section we use the official Node Exporter to collect runtime data from a Linux system. cAdvisor monitors the resources and containers on a node in real time and collects performance data, including CPU usage, memory usage, network throughput, and filesystem usage.
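To make the exporter idea concrete: an exporter such as Node Exporter serves plain-text samples over HTTP (by default under /metrics), and Prometheus scrapes that text. The following is only a minimal sketch of reading such a payload; the function name parse_samples and the sample values are made up for illustration, and timestamps or escaped label values are not handled.

```python
def parse_samples(text):
    """Return {series: value} for a simple Prometheus text-format payload.

    Assumption: each sample line is "<name or name{labels}> <value>" with no
    trailing timestamp. Real payloads may contain more than this handles.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# HELP" / "# TYPE" comments
        # The value is the last space-separated field on the line.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


# Illustrative payload; the numbers are invented, not measured.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6
"""
```

Calling parse_samples(SAMPLE) yields one entry per series, keyed by the metric name together with its label set.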

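This article is based on docker-compose throughout; if you do not have a Docker environment yet, set one up first.
Preparing the docker-compose files
1. Write the grafana.yml file. Be sure to mount a volume, otherwise the configuration is lost when the container is recreated and everything has to be set up again.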
version: '3.1'
services:
  grafana:
    image: grafana/grafana
    container_name: grafana
    restart: always
    ports:
      - "3000:3000"
    volumes:
      - /opt/grafana:/var/lib/grafana
2. Write prometheus.yml (the compose file), which contains Prometheus + node-exporter + cAdvisor.
version: "3"
services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    hostname: prometheus
    restart: always
    volumes:
      - /opt/prometheus/config/prometheus.yml:/etc/prometheus/prometheus.yml
      - /opt/prometheus/config/node_down.yml:/etc/prometheus/node_down.yml
    ports:
      - "9090:9090"
  node-exporter:
    image: quay.io/prometheus/node-exporter
    container_name: node-exporter
    restart: always
    ports:
      - "9100:9100"
  cadvisor:
    image: google/cadvisor:latest
    container_name: cadvisor
    restart: always
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - "8080:8080"
3. On the other node, node-exporter.yml
version: '3.1'
services:
  node-exporter:
    image: quay.io/prometheus/node-exporter
    container_name: node-exporter
    restart: always
    ports:
      - "9100:9100"
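Editing the Prometheus configuration files
The Prometheus container above mounts two files: prometheus.yml and node_down.yml (both under /opt/prometheus/config on the host).
1.prometheus.yml
172.18.0.1 is the gateway address of my Docker network interface. The ports match those published in the docker-compose files above; all of them are the defaults.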
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
#alerting:
#  alertmanagers:
#    - static_configs:
#        - targets: ['172.18.0.1:9093']
rule_files:
  - "node_down.yml"
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['172.18.0.1:9090']
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['172.18.0.1:8080']
  - job_name: 'node'
    scrape_interval: 8s
    static_configs:
      - targets: ['172.18.0.1:9100','49.235.160.131:9100']
2.node_down.yml
groups:
  - name: node_down
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          user: test
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
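The rule above fires when the expression up == 0 holds for a target for one minute. Once Prometheus is running, the same expression can be evaluated by hand through its HTTP query API (/api/v1/query), which helps when checking why an alert did or did not fire. The sketch below is illustrative only: the helper names down_instances and query_down and the default base_url are assumptions, not part of the setup described here.

```python
import json
import urllib.parse
import urllib.request


def down_instances(payload):
    """Extract sorted (job, instance) pairs from a /api/v1/query response
    for the expression `up == 0`."""
    if payload.get("status") != "success":
        raise ValueError("query failed: %r" % (payload.get("error"),))
    return sorted(
        (sample["metric"].get("job", ""), sample["metric"].get("instance", ""))
        for sample in payload["data"]["result"]
    )


def query_down(base_url="http://localhost:9090"):
    """Ask Prometheus which targets are currently down.

    base_url is an assumption; use whatever address Prometheus is reachable at.
    """
    query = urllib.parse.urlencode({"query": "up == 0"})
    with urllib.request.urlopen(base_url + "/api/v1/query?" + query, timeout=5) as resp:
        return down_instances(json.load(resp))
```

An empty list from query_down() means every scraped target currently reports up == 1.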
Starting the services
# On the monitoring host:
docker-compose -f grafana.yml up -d
docker-compose -f prometheus.yml up -d
# On the other server (it only runs node-exporter):
docker-compose -f node-exporter.yml up -d
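After the containers are up, it is worth confirming that Prometheus can actually reach every target listed in scrape_configs. Prometheus exposes this through its /api/v1/targets endpoint (the same information appears in its web UI under Status, Targets). The following is a rough sketch; the helper names summarize_targets and fetch_targets and the default base_url are assumptions for illustration.

```python
import json
import urllib.request


def summarize_targets(payload):
    """Map each active target's scrape URL to its health string
    ("up", "down" or "unknown") from a /api/v1/targets response."""
    return {
        target["scrapeUrl"]: target["health"]
        for target in payload["data"]["activeTargets"]
    }


def fetch_targets(base_url="http://localhost:9090"):
    """Fetch and summarize target health.

    base_url is an assumption; use the address where Prometheus is reachable.
    """
    with urllib.request.urlopen(base_url + "/api/v1/targets", timeout=5) as resp:
        return summarize_targets(json.load(resp))
```

If a target shows as down here, the usual suspects are a wrong address in prometheus.yml or a firewall blocking the exporter port on the other server.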
Configuring the UI
After startup, open Grafana at ip:3000. On first login (admin/admin) you are asked to change the default password.
Once logged in, the first step is to add Prometheus as a data source.

The second step is to import an official template, template ID 1860; you can of course design your own dashboard instead.

Here, select the data source configured in the first step.
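The data source from the first step can also be created through Grafana's HTTP API (POST /api/datasources) instead of the UI, which is handy when the setup has to be repeated. This is a minimal sketch under stated assumptions: the helper names datasource_payload and create_datasource are hypothetical, the data source name "Prometheus" is arbitrary, and basic authentication with the admin account is assumed to be enabled.

```python
import base64
import json
import urllib.request


def datasource_payload(prometheus_url, name="Prometheus"):
    """Build the JSON body for a Prometheus data source.

    "proxy" access means the Grafana server, not the browser, talks to Prometheus.
    """
    return {
        "name": name,
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",
        "isDefault": True,
    }


def create_datasource(grafana_url, user, password, prometheus_url):
    """POST the data source to Grafana using basic auth (assumed to be enabled)."""
    body = json.dumps(datasource_payload(prometheus_url)).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        grafana_url + "/api/datasources",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as resp:
        return json.load(resp)
```

With the addresses used in this article, the call would look like create_datasource("http://ip:3000", "admin", "<your password>", "http://172.18.0.1:9090"), where ip and the password are placeholders for your own values.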

The result

Viewing information for the other server

Docker host monitoring template ID: 193 (this template can be used as-is to get a dashboard for monitoring Docker).
