Installing Alertmanager
Download: https://prometheus.io/download/
After the download finishes, upload the package to the machine that runs the Prometheus server.

Extract the alertmanager package:
tar -zxvf alertmanager-0.21.0.linux-amd64.tar.gz -C /data
mv /data/alertmanager-0.21.0.linux-amd64 /data/alertmanager
Change into the extracted alertmanager directory and edit alertmanager.yml to configure the alert notifications. The contents of alertmanager.yml are as follows:
cat alertmanager.yml
global:
  resolve_timeout: 5m # an alert is treated as resolved if it is not re-sent within 5 minutes
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1m
  receiver: 'wechat'
receivers:
- name: 'wechat'
  wechat_configs:
  - corp_id: 'XXX'
    to_party: 'XX'
    agent_id: '1000011'
    api_secret: 'secret'
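Because the route above sends every alert to the wechat receiver, one way to check the receiver without waiting for a real incident is to push a synthetic alert into Alertmanager's v2 HTTP API. The following is a minimal sketch, assuming Alertmanager has already been started with this file and listens on its default port 9093. The function names, the alert name WechatRouteTest and the base URL are illustrative assumptions, not part of the original setup.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(alertname="WechatRouteTest", instance="manual-test", minutes=5):
    """Build one alert in the shape accepted by POST /api/v2/alerts."""
    now = datetime.now(timezone.utc)
    return {
        "labels": {"alertname": alertname, "instance": instance},
        "annotations": {"summary": "Synthetic alert to check the wechat receiver"},
        "startsAt": now.isoformat(),
        # After endsAt passes, Alertmanager considers the alert resolved.
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }


def send_alerts(alerts, base_url="http://localhost:9093"):
    """POST a list of alerts to Alertmanager and return the HTTP status code."""
    req = urllib.request.Request(
        base_url + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


# Usage (hypothetical, run only once Alertmanager is up):
# send_alerts([build_test_alert()])
```

With group_wait set to 10s, the WeChat message should arrive roughly ten seconds after the call if the corp_id, agent_id and api_secret values are valid.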
Installing rabbitmq_exporter
Download: https://github.com/kbudde/rabbitmq_exporter/releases
Extract and run:
tar -zxvf rabbitmq_exporter-version
cd rabbitmq_exporter-version
RABBIT_USER=USER RABBIT_PASSWORD=PASSWORD OUTPUT_FORMAT=json PUBLISH_PORT=9099 RABBIT_URL=http://XXX:15672 nohup ./rabbitmq_exporter &
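RABBIT_USER: user name for the RabbitMQ management plugin
RABBIT_PASSWORD: password for that user
OUTPUT_FORMAT: log output format of the exporter (json here)
PUBLISH_PORT: port the exporter listens on
RABBIT_URL: URL of the RabbitMQ management plugin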
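Before pointing Prometheus at the exporter, it can help to confirm that the exporter answers on its PUBLISH_PORT and reports rabbitmq_up as 1. The sketch below fetches /metrics and reads the value out of the Prometheus text format. It assumes the exporter runs on localhost:9099 as in the command above. The helper names are hypothetical, and the parser only handles the simple sample lines needed here.

```python
import urllib.request


def parse_samples(text, metric):
    """Return the values of every sample of `metric` in Prometheus text format."""
    values = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name != metric:
            continue
        # The value follows the closing brace, or the name when there are no labels.
        rest = line.rsplit("}", 1)[1] if "}" in line else line[len(name):]
        values.append(float(rest.split()[0]))
    return values


def exporter_is_up(base_url="http://localhost:9099"):
    """Fetch /metrics and report whether every rabbitmq_up sample equals 1."""
    with urllib.request.urlopen(base_url + "/metrics", timeout=10) as resp:
        body = resp.read().decode("utf-8")
    values = parse_samples(body, "rabbitmq_up")
    return bool(values) and all(v == 1 for v in values)
```

If exporter_is_up() returns False, the usual suspects are a wrong RABBIT_URL or credentials that the management plugin rejects.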
Once rabbitmq_exporter is running, add a RabbitMQ scrape job to prometheus.yml:
  - job_name: 'RabbitMQ'
    static_configs:
    - targets: ['47.241.2.144:9099']
      labels:
        instance: RabbitMQ-47.241.2.144
    - targets: ['47.101.150.234:9099']
      labels:
        instance: RabbitMQ-47.101.150.234
Two nodes are monitored here.
Configuring alert rules
Reference the rule file in prometheus.yml:
rule_files:
  - "rule.yml"
cat /data/prometheus/rule.yml
groups:
- name: Rabbitmq
  rules:
  - alert: Rabbitmq-down
    expr: rabbitmq_up{job='RabbitMQ'} != 1
    labels:
      status: High
      team: Rabbitmq_monitor
    annotations:
      description: "Instance: {{ $labels.instance }} is Down ! ! !"
      value: '{{ $value }}'
      summary: "The host node is down"
- name: Rabbitmq disk free limit
  rules:
  - alert: Rabbitmq disk free limit status
    expr: rabbitmq_node_disk_free{job='RabbitMQ'} / 1024 / 1024 <= rabbitmq_node_disk_free_limit{job='RabbitMQ'} / 1024 / 1024 + 200
    labels:
      status: High
      team: Rabbitmq_monitor
    annotations:
      description: "Instance: {{ $labels.instance }} the rmq free disk is too low ! ! !"
      value: '{{ $value }} MB'
      summary: "The rmq free disk too low"
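The disk rule converts both metrics from bytes to MiB and fires once free space is within 200 MiB of the node's disk_free_limit. The arithmetic can be sketched as below. The byte values are illustrative numbers chosen for the example, not measurements from the monitored nodes, and the function name is hypothetical.

```python
MIB = 1024 * 1024


def disk_alert_fires(free_bytes, limit_bytes, margin_mib=200):
    """Mirror the rule: free / 1024 / 1024 <= limit / 1024 / 1024 + margin."""
    return free_bytes / MIB <= limit_bytes / MIB + margin_mib


# Illustrative numbers only: a 50 MiB disk_free_limit.
limit = 50 * MIB
print(disk_alert_fires(300 * MIB, limit))  # 300 <= 250 -> False
print(disk_alert_fires(250 * MIB, limit))  # 250 <= 250 -> True
print(disk_alert_fires(120 * MIB, limit))  # 120 <= 250 -> True
```

Because a PromQL comparison without the bool modifier keeps the left-hand sample, the {{ $value }} in the annotation is the free disk space in MiB at the time the alert fires.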
Add any other rules you need in the same way.
Complete prometheus.yml configuration
cat /data/prometheus/prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 192.168.1.178:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rule.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'RabbitMQ'
    static_configs:
    - targets: ['xx.xx.xx.xx:9099']
      labels:
        instance: RabbitMQ-47.241.2.144
    - targets: ['xx.xx.xx.xx:9099']
      labels:
        instance: RabbitMQ-47.101.150.234
  - job_name: 'Linux'
    static_configs:
    - targets: ['xx.xx.xx.xx:9100']
      labels:
        instance: Linux
  - job_name: 'alertmanager'
    static_configs:
    - targets: ['xx.xx.xx.xx:9093']
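After Prometheus reloads this file, its /api/v1/targets endpoint reports the health of every scrape target, which is a quick way to see whether the RabbitMQ, Linux and alertmanager jobs are actually being scraped. A minimal sketch follows, assuming Prometheus listens on localhost:9090 as in the 'prometheus' job above. The function names are hypothetical.

```python
import json
import urllib.request


def unhealthy_targets(payload):
    """List (job, instance, health) for active targets whose health is not 'up'."""
    bad = []
    for target in payload["data"]["activeTargets"]:
        if target["health"] != "up":
            labels = target["labels"]
            bad.append((labels.get("job"), labels.get("instance"), target["health"]))
    return bad


def fetch_targets(base_url="http://localhost:9090"):
    """Return the decoded JSON body of Prometheus' /api/v1/targets endpoint."""
    with urllib.request.urlopen(base_url + "/api/v1/targets", timeout=10) as resp:
        return json.load(resp)


# Usage (hypothetical, run only once Prometheus is up):
# print(unhealthy_targets(fetch_targets()))
```

An empty list means every configured target was scraped successfully on the last attempt.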
Finally, check the result and test the alerts.
