Monitoring RabbitMQ with Prometheus + Alertmanager and sending alerts to WeChat Work

Installing Alertmanager

Download page: https://prometheus.io/download/
Once the download finishes, upload the package to the machine where the Prometheus server runs.

Extract the alertmanager package

tar -zxvf alertmanager-0.21.0.linux-amd64.tar.gz -C /data
mv /data/alertmanager-0.21.0.linux-amd64 /data/alertmanager
Go into the extracted alertmanager directory and edit alertmanager.yml to configure the alert receiver. The contents of alertmanager.yml are as follows:
cat alertmanager.yml
global:
  resolve_timeout: 5m  # an alert is treated as resolved if it is not re-sent within 5 minutes
route:                 
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1m
  receiver: 'wechat'
receivers:
- name: 'wechat'
  wechat_configs:
  - corp_id: 'XXX'
    to_party: 'XX'
    agent_id: '1000011'
    api_secret: 'secret'
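
The steps above stop at writing alertmanager.yml; the file still has to be validated and the service started. A minimal sketch, assuming the /data/alertmanager layout produced by the mv command above and that the amtool binary from the same tarball is present. The AM_DIR variable and the start_alertmanager function name are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: validate alertmanager.yml, then start Alertmanager.
# AM_DIR is an assumed variable; it defaults to the directory used above.
AM_DIR="${AM_DIR:-/data/alertmanager}"

start_alertmanager() {
    # Stop early if the install directory does not exist.
    if [ ! -d "$AM_DIR" ]; then
        echo "directory not found: $AM_DIR" >&2
        return 1
    fi
    (
        cd "$AM_DIR" || exit 1
        # amtool check-config parses the file and reports configuration errors.
        ./amtool check-config alertmanager.yml || exit 1
        # Run in the background; by default Alertmanager listens on port 9093.
        nohup ./alertmanager --config.file=alertmanager.yml > alertmanager.log 2>&1 &
        echo "alertmanager started with pid $!"
    )
}
```

After sourcing the snippet, running start_alertmanager validates the file first, so a YAML mistake shows up on the terminal rather than in alertmanager.log.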

Installing rabbitmq_exporter

Download page: https://github.com/kbudde/rabbitmq_exporter/releases
Extract and run:

tar -zxvf rabbitmq_exporter-version
cd rabbitmq_exporter-version
RABBIT_USER=USER RABBIT_PASSWORD=PASSWORD OUTPUT_FORMAT=json PUBLISH_PORT=9099 RABBIT_URL=http://XXX:15672 nohup ./rabbitmq_exporter &

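RABBIT_USER: user name for the RabbitMQ management plugin
RABBIT_PASSWORD: password for that user
OUTPUT_FORMAT: log output format of the exporter (JSON here)
PUBLISH_PORT: port the exporter listens on
RABBIT_URL: URL of the RabbitMQ management plugin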
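
Before wiring the exporter into Prometheus, it can help to confirm that it answers and can reach RabbitMQ. The sketch below assumes the exporter is serving the Prometheus text format on /metrics at the PUBLISH_PORT chosen above, and that samples carry no trailing timestamp. The function name rabbitmq_up_value is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: read Prometheus text-format metrics on stdin and
# print the value of the first rabbitmq_up sample. Returns 1 if none is found.
rabbitmq_up_value() {
    # "# HELP" and "# TYPE" lines start with "#", so they never match.
    # The metric name is followed by a space or by "{" when it has labels.
    awk '/^rabbitmq_up[ {]/ { print $NF; found = 1; exit }
         END { if (!found) exit 1 }'
}

# Usage, on the machine running the exporter:
#   curl -s http://localhost:9099/metrics | rabbitmq_up_value
```

A printed value of 1 would mean the exporter reached the management API; anything else suggests checking RABBIT_URL and the credentials.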
Once rabbitmq_exporter is running, add a RabbitMQ job to prometheus.yml:

  - job_name: 'RabbitMQ'
    static_configs:
    - targets: ['47.241.2.144:9099']
      labels:
        instance: RabbitMQ-47.241.2.144
    - targets: ['47.101.150.234:9099']
      labels:
        instance: RabbitMQ-47.101.150.234

I am monitoring two nodes here.

Configuring alerting rules

Reference the rule file in prometheus.yml:

rule_files:
  - "rule.yml"
cat /data/prometheus/rule.yml
groups:
- name: Rabbitmq
  rules:
  - alert: Rabbitmq-down
    expr: rabbitmq_up{job='RabbitMQ'} != 1
    labels:
      status: High
      team: Rabbitmq_monitor
    annotations:
      description: "Instance: {{ $labels.instance }} is Down ! ! !"
      value: '{{ $value }}'
      summary:  "The host node is down"
- name: Rabbitmq disk free limit 
  rules:
  - alert: Rabbitmq disk free limit   status
    expr: rabbitmq_node_disk_free{job='RabbitMQ'} / 1024 / 1024  <= rabbitmq_node_disk_free_limit{job='RabbitMQ'} / 1024 / 1024 + 200
    labels:
      status: High
      team: Rabbitmq_monitor
    annotations:
      description: "Instance: {{ $labels.instance }} the rmq free disk is too low ! ! !"
      value: '{{ $value }} MB'
      summary:  "The rmq free disk too low"
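
The disk rule above converts both metrics to MB and fires once free space is within 200 MB of the configured limit. The sketch below replays that arithmetic outside Prometheus, assuming both metrics are reported in bytes (as the division by 1024 twice implies). The function name disk_alert_fires and the sample byte counts are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper mirroring the rule expression:
#   free / 1024 / 1024 <= limit / 1024 / 1024 + 200
# $1: free disk space in bytes, $2: disk free limit in bytes.
# Exit status 0 means the alert condition holds.
disk_alert_fires() {
    awk -v f="$1" -v l="$2" \
        'BEGIN { exit !((f / 1024 / 1024) <= (l / 1024 / 1024 + 200)) }'
}

# Illustrative values: with a 50 MB limit (52428800 bytes) the condition
# holds at 250 MB free (262144000 bytes) and no longer holds at 300 MB.
#   disk_alert_fires 262144000 52428800 && echo "alert"
#   disk_alert_fires 314572800 52428800 || echo "ok"
```

The rule file itself can be checked with promtool, which ships in the Prometheus tarball: promtool check rules /data/prometheus/rule.yml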

Add any other monitoring items you need in the same way.
Full prometheus.yml configuration

cat /data/prometheus/prometheus.yml 
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
       - 192.168.1.178:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rule.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'RabbitMQ'
    static_configs:
    - targets: ['xx.xx.xx.xx:9099']
      labels:
        instance: RabbitMQ-47.241.2.144
    - targets: ['xx.xx.xx.xx:9099']
      labels:
        instance: RabbitMQ-47.101.150.234
  - job_name: 'Linux'
    static_configs:
    - targets: ['xx.xx.xx.xx:9100']
      labels:
        instance: Linux
  - job_name: 'alertmanager'
    static_configs:
    - targets: ['xx.xx.xx.xx:9093']
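
After editing prometheus.yml, the new configuration only takes effect once Prometheus reloads it. A minimal sketch, assuming Prometheus and promtool live in /data/prometheus (the directory used for rule.yml above) and that the process is named prometheus. The PROM_DIR variable and the reload_prometheus function name are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: validate prometheus.yml and ask Prometheus to reload it.
# PROM_DIR is an assumed variable; it defaults to the directory used above.
PROM_DIR="${PROM_DIR:-/data/prometheus}"

reload_prometheus() {
    if [ ! -f "$PROM_DIR/prometheus.yml" ]; then
        echo "config not found: $PROM_DIR/prometheus.yml" >&2
        return 1
    fi
    # promtool check config also checks the rule files the config references.
    "$PROM_DIR/promtool" check config "$PROM_DIR/prometheus.yml" || return 1
    # Prometheus re-reads its configuration when it receives SIGHUP.
    pids=$(pgrep -x prometheus) || {
        echo "prometheus is not running" >&2
        return 1
    }
    kill -HUP $pids
}
```

Validating first means a broken file is rejected before the running server is asked to load it.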

Finally, check that the alerts arrive as expected.
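
One way to exercise the notification path without stopping a real RabbitMQ node is to post a hand-made alert to Alertmanager's v2 API. The sketch below only builds the JSON body; the alert name ManualTest, the label values, and the function name test_alert_payload are invented for this example, and the Alertmanager address is the one from the alerting section of prometheus.yml.

```shell
#!/bin/sh
# Hypothetical helper: print a JSON body for POST /api/v2/alerts.
# $1: value for the instance label (must not contain double quotes).
test_alert_payload() {
    printf '[{"labels":{"alertname":"ManualTest","job":"RabbitMQ","instance":"%s","status":"High"},"annotations":{"summary":"manual test alert"}}]\n' "$1"
}

# Usage:
#   test_alert_payload RabbitMQ-test | curl -s -XPOST \
#     -H 'Content-Type: application/json' --data-binary @- \
#     http://192.168.1.178:9093/api/v2/alerts
```

Because the route sends everything to the wechat receiver, a test alert like this should show up in WeChat Work after group_wait. Since it carries no end time, Alertmanager should mark it resolved once resolve_timeout passes without a repeat.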
