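I'm a brick: I go wherever I'm needed! As our number of projects kept growing, we kept hitting the same problem. When a service broke, customers noticed first, and the report then worked its way up, layer by layer, to the developers. That is a poor experience for users. Worse, a service could stay down for a long time before anyone noticed, and by then the logs had already been cleaned up automatically, so the bug could no longer be tracked down. To deal with this, our team set up ELK. ELK alone is not enough, though: notifying the right people promptly once a failure occurs matters just as much.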
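We followed three principles: as little development work as possible, as much functionality as possible, and as little memory as possible. With those in mind, we compared several log-based alerting systems found online. They fall roughly into the following options:
1. CAT: an open-source monitoring and alerting system from Dianping. It is powerful but relatively heavyweight. Does not meet our needs!
2. Kafka + Spark Streaming: everything has to be built by hand. Does not meet our needs!
3. Sentinl: a Kibana plugin with a friendly web UI that makes management very convenient, but it only supports sending email.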
Installation is very simple.
First, download the Sentinl package that matches your Kibana version from https://github.com/sirensolutions/sentinl/releases/tag, then install it:
./kibana-plugin install file:./sentinl-v6.0.1.zip
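Then restart Kibana. Sentinl will appear in the Kibana UI, as shown in the figure below:
Sentinl is very easy to install and use, but it only supports email alerts, and the email body cannot include the content returned by the ES query. Does not meet our needs!
4. ElastAlert: no development work is required. It supports many alert channels, including email, DingTalk, WeChat, and custom alerters, and alerts can flexibly include content queried from ES. Meets our needs!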
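1) Installation
First, download the ElastAlert source package. Many posts online say that master does not support ES 5 and that you have to switch to an es5 branch. I could not find an es5 branch, so I used the es6 branch; the ES version used in this article is 5.4.0. At the time of writing, ElastAlert only supported Python 2. Upload the downloaded package to the server and extract it: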
cd elastalert
pip install -r requirements.txt
python setup.py install
cp config.yaml.example config.yaml
Edit config.yaml:
# This is the folder that contains the rule yaml files
# Any .yaml file will be loaded as a rule
#rules_folder: example_rules
# Rule directory; it can hold multiple rule files
rules_folder: rules
# How often ElastAlert will query Elasticsearch
# The unit can be anything from weeks to seconds
run_every:
  #minutes: 1
  # Query ES every 3 seconds
  seconds: 3
# ElastAlert will buffer results from the most recent
# period of time, in case some log sources are not in real time
buffer_time:
  # Logs reach ES with some delay; this is the time range each query covers
  minutes: 15
# The Elasticsearch hostname for metadata writeback
# Note that every rule can have its own Elasticsearch host
es_host: 200.200.200.65
# The Elasticsearch port
es_port: 9200
# Connect with TLS to Elasticsearch
#use_ssl: True
# Optional basic-auth username and password for Elasticsearch
#es_username: someusername
#es_password: somepassword
# The index on es_host which is used for metadata storage
# This can be an unmapped index, but it is recommended that you run
# elastalert-create-index to set a mapping
writeback_index: elastalert_status
#writeback_index: logstash-2018.06.25
# If an alert fails for some reason, ElastAlert will retry
# sending the alert until this time period has elapsed
alert_time_limit:
  # minutes: 2
  days: 2
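What the fields above mean:
rules_folder: the directory from which rule files are loaded; defaults to example_rules
run_every: how often ElastAlert queries Elasticsearch
buffer_time: the width of the time range each query covers; defaults to 45 minutes
es_host: the Elasticsearch host address
es_port: the Elasticsearch port
use_ssl: optional; whether to connect to ES over TLS/SSL, true or false
verify_certs: optional; whether to verify TLS certificates, true or false; defaults to true
es_username: username for ES authentication
es_password: password for ES authentication
es_url_prefix: optional; a URL path prefix for the Elasticsearch endpoint (for example, when ES is served under a sub-path behind a proxy). It is not the http/https scheme, which is controlled by use_ssl
es_send_get_body_as: optional; how the query body is sent to ES; defaults to GET
writeback_index: the index ElastAlert creates in Elasticsearch to store its own status and logs
alert_time_limit: how long ElastAlert keeps retrying an alert that failed to send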
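The interplay between run_every and buffer_time can be pictured with a small model. This is a simplified sketch written for Python 3, not ElastAlert's actual scheduler. It makes two assumptions: each duration mapping in config.yaml (such as `{"seconds": 3}`) uses `datetime.timedelta` keyword names, and every run queries the window from "now minus buffer_time" up to "now". The helper names `to_timedelta` and `query_window` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_timedelta(spec):
    """Convert a duration mapping from config.yaml, e.g. {"minutes": 15},
    into a timedelta. Assumption: the keys are timedelta keyword names
    (weeks, days, hours, minutes, seconds)."""
    return timedelta(**spec)


def query_window(now, buffer_time):
    """Time range covered by a single query in this simplified model."""
    return now - buffer_time, now


# Values taken from the config.yaml above.
run_every = to_timedelta({"seconds": 3})
buffer_time = to_timedelta({"minutes": 15})

start_time = datetime(2018, 6, 25, 12, 0, 0, tzinfo=timezone.utc)
for run in range(3):
    # Each run starts run_every later than the previous one.
    start, end = query_window(start_time + run * run_every, buffer_time)
    print(start.isoformat(), "->", end.isoformat())
```

With a 3-second interval and a 15-minute buffer, consecutive windows overlap almost entirely. In this model, a log that reaches ES up to 15 minutes late still falls inside some query. The cost is that ES is queried very often.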
After editing, run elastalert-create-index. It creates the elastalert_status index in ES automatically; this index stores the result of every run of each rule.
2) Configure alert rules
ElastAlert supports 11 rule types. This article mainly covers frequency; I will add the other rule types if I use them later.
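The frequency rule type fires when at least num_events matching documents occur within timeframe; these are the two options set in the rule file below. The sketch models that idea as a sliding window over event timestamps, in Python 3. It is a simplified illustration, not ElastAlert's implementation. The class name `FrequencyCheck` is hypothetical, events are assumed to arrive in timestamp order, and the counter is assumed to reset after each alert.

```python
from collections import deque
from datetime import datetime, timedelta


class FrequencyCheck:
    """Hypothetical sliding-window model of a frequency rule."""

    def __init__(self, num_events, timeframe):
        self.num_events = num_events
        self.timeframe = timeframe
        self.times = deque()

    def add(self, ts):
        """Record one matching event (events must arrive in timestamp order).

        Returns True when at least num_events events fall inside the
        window of length timeframe that ends at ts.
        """
        self.times.append(ts)
        # Forget events older than timeframe relative to ts.
        while ts - self.times[0] > self.timeframe:
            self.times.popleft()
        if len(self.times) >= self.num_events:
            self.times.clear()  # assumption: start counting afresh after an alert
            return True
        return False


t0 = datetime(2018, 6, 25, 12, 0)
rule = FrequencyCheck(num_events=3, timeframe=timedelta(minutes=10))
print([rule.add(t0 + timedelta(minutes=m)) for m in (0, 4, 20, 22, 25)])
```

Events at minutes 0 and 4 are too far from minute 20 to count together, so only the cluster at 20, 22 and 25 reaches three events. With the settings used in this article (num_events: 1 and a 3-hour timeframe), every matching document triggers an alert on its own.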
First, copy the default example:
cp example_rules/example_frequency.yaml rules/test_frequency.yaml
For testing, pick any existing index in ES, as shown in the figure below:

The goal is to send an email whenever the value of _type is syslog. Edit test_frequency.yaml:
# Alert when the rate of events exceeds a threshold
# (Optional)
# Elasticsearch host
es_host: 200.200.200.65
# (Optional)
# Elasticsearch port
es_port: 9200
# (Optional) Connect with SSL to Elasticsearch
#use_ssl: True
# (Optional) basic-auth username and password for Elasticsearch
#es_username: someusername
#es_password: somepassword
# (Required)
# Rule name, must be unique
name: "The servers are down and you're still asleep"
# (Required)
# Type of alert.
# the frequency rule type alerts when num_events events occur within timeframe time
type: frequency
#use_strftime_index: true
# (Required)
# Index to search, wildcard supported
#index: logstash-*
index: logstash-*
# (Required, frequency specific)
# Alert when this many documents matching the query occur within a timeframe
# Trigger the alert when N matching events occur within the timeframe
num_events: 1
# (Required, frequency specific)
# num_events must occur within this amount of time to trigger an alert
timeframe:
  hours: 3
# (Required)
# A list of Elasticsearch filters used to find events
# These filters are joined with AND and nested in a filtered query
# For more info: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl.html
# Keep only documents whose _type is syslog
filter:
- term:
    _type: "syslog"
    # some_field: "some_value"
# Alternative: the same condition written as a query_string filter
# - query_string:
#     query: "_type: syslog"
# (Required)
# The alert is used when a match is found
# Send alerts by email
alert:
- "email"
# Email subject; each {} is filled, in order, with the alert_subject_args values
alert_subject: "Error {} @{}"
alert_subject_args:
- name
- "@timestamp"
# Send only the alert_text content
alert_text_type: alert_text_only
# Email body
alert_text: |
  > "Hi there, I'm the handsome Xiaoxiao"
  > Name: {}
  > Message: {}
  > Host: {} ({})
alert_text_args:
- name
- message
- port
- host
smtp_host: smtp.163.com
smtp_port: 25
# SMTP credentials file; it must contain the user and password keys
# smtp_auth_file.yaml is the credentials file described below
smtp_auth_file: /home/elk/test-zx/smtp_auth_file.yaml
email_reply_to: xxx@163.com
from_addr: xxx@163.com
# (required, email specific)
# a list of email addresses to send alerts to
email:
- "xxx@163.com"
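The subject and body templates above are filled positionally: the first {} receives the first entry of the corresponding *_args list, and so on. The Python 3 sketch below models that behavior under some assumptions, modeled loosely on how ElastAlert appears to behave. Each argument is looked up first in the matched document, then in the rule's own settings (which is how `name` resolves to the rule name). Anything still missing is shown as `<MISSING VALUE>`. The `render` helper and the sample document are hypothetical, and nested (dotted) field names are not handled.

```python
def render(template, arg_names, match, rule, missing="<MISSING VALUE>"):
    """Fill each {} in template, in order, with the value for each arg name.

    Lookup order (an assumption modelled on ElastAlert's behaviour):
    the matched document first, then the rule settings, else a placeholder.
    """
    values = []
    for name in arg_names:
        if name in match:
            values.append(match[name])
        elif name in rule:
            values.append(rule[name])
        else:
            values.append(missing)
    return template.format(*values)


rule = {"name": "test rule"}
# Hypothetical matched document.
match = {
    "@timestamp": "2018-06-25T12:00:00Z",
    "message": "disk full",
    "host": "web01",
    "port": 514,
}

print(render("Error {} @{}", ["name", "@timestamp"], match, rule))
body = "Name: {}\nMessage: {}\nHost: {} ({})"
print(render(body, ["name", "message", "port", "host"], match, rule))
```

Because the arguments are positional, the order of alert_text_args matters. With port listed before host, the line `Host: {} ({})` renders as the port followed by the host name in parentheses. Swap the two entries if you want the host name first.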
/home/elk/test-zx/smtp_auth_file.yaml holds the SMTP account and password for the mailbox:
user: "xxx@163.com"
password: "xxx"
Either of the following two commands can be used to test the rule above:
elastalert-test-rule --config config.yaml rules/test_frequency.yaml
python -m elastalert.elastalert --debug --config config.yaml --rule rules/test_frequency.yaml
Content of the triggered alert:

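For a real deployment, simply run:
python -m elastalert.elastalert --config config.yaml
Only the email alerter is covered here. ElastAlert also supports command, DingTalk, WeChat, and custom alerter extensions.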
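To double-check which documents the rule's filter will match, one option is to send an equivalent search straight to Elasticsearch. The Python 3 sketch below builds a bool query that ANDs the rule's filters with a time range on @timestamp. This is similar in spirit to what ElastAlert sends but not its exact query body. The function names, the result size of 10, and the `now-15m` range are assumptions for illustration. Only the standard library is used, and the network call is left commented out.

```python
import json
import urllib.request


def build_query(filters, start, end, ts_field="@timestamp", size=10):
    """AND the rule's filters together with a time range on ts_field."""
    return {
        "query": {
            "bool": {
                "filter": list(filters)
                + [{"range": {ts_field: {"gte": start, "lte": end}}}]
            }
        },
        "size": size,
    }


def search(es_url, index, body, timeout=10):
    """POST the query to <es_url>/<index>/_search and return the parsed JSON."""
    req = urllib.request.Request(
        "{}/{}/_search".format(es_url, index),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Same condition as the rule's term filter, over the last 15 minutes.
body = build_query([{"term": {"_type": "syslog"}}], "now-15m", "now")
print(json.dumps(body, indent=2))
# Uncomment to run against the cluster used in this article:
# result = search("http://200.200.200.65:9200", "logstash-*", body)
# print(result["hits"]["total"])
```

If this search returns hits while the rule stays silent, the problem is more likely in the alerting settings (for example, SMTP) than in the filter.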