Log Alerting with ELK

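I'm a brick, moved wherever I'm needed! As the number of projects kept growing, we kept hitting the same problem: when a service broke, customers were the first to notice, and the report then worked its way up, layer by layer, to the developers. Besides the poor user experience, a service could be down for a long time before anyone noticed, by which point the logs had already been purged automatically and the bug could no longer be traced. To deal with this, our team set up ELK. ELK alone is not enough, though: notifying the right people promptly when a failure occurs is just as important.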
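Guided by three principles (as little development work as possible, as much functionality as possible, and as little memory as possible), I compared several log-based alerting systems found online. They fall roughly into the following options: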
1. CAT: an alerting system open-sourced by Dianping. Powerful, but relatively heavyweight. Does not meet our needs!
2. Kafka + Spark Streaming: everything has to be developed in-house. Does not meet our needs!
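3. Sentinl: a Kibana plugin with a friendly web UI that makes management very convenient, but it only supports sending email.
Installation is very simple.
First, download the Sentinl package that matches your Kibana version from https://github.com/sirensolutions/sentinl/releases/tag, then install it: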
            ./kibana-plugin install file:./sentinl-v6.0.1.zip
Then restart Kibana, and Sentinl appears in the Kibana UI, as shown below:

[Figure: Sentinl in the Kibana UI]

Sentinl is very easy to install and use, but it only supports email, and the email body cannot include the content queried from ES. Does not meet our needs!

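4. ElastAlert: no development required; supports many alert channels, including email, DingTalk, WeChat, and custom alerters; can flexibly use the content queried from ES. Meets our needs!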
1) Installation
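First, download the source package. Many posts online say that master does not support ES5 and that you need to switch to an es5 branch, but I could not find one, so I used the es6 branch; the ES version in this article is 5.4.0. The ElastAlert version used here only supports Python 2. Upload the downloaded package to the server and extract it.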

  cd elastalert
  pip install -r requirements.txt
  python setup.py install
  cp config.yaml.example config.yaml

Edit config.yaml:

# This is the folder that contains the rule yaml files
# Any .yaml file will be loaded as a rule
#rules_folder: example_rules
# Rule directory; it may contain multiple rule files
rules_folder: rules

# How often ElastAlert will query Elasticsearch
# The unit can be anything from weeks to seconds
run_every:
  #minutes: 1
  # Query ES every 3 seconds
  seconds: 3

# ElastAlert will buffer results from the most recent
# period of time, in case some log sources are not in real time
buffer_time:
  # Logs reach ES with some delay; this is the time range each query covers
  minutes: 15

# The Elasticsearch hostname for metadata writeback
# Note that every rule can have its own Elasticsearch host
es_host: 200.200.200.65

# The Elasticsearch port
es_port: 9200

# Connect with TLS to Elasticsearch
#use_ssl: True

# Option basic-auth username and password for Elasticsearch
#es_username: someusername
#es_password: somepassword

# The index on es_host which is used for metadata storage
# This can be a unmapped index, but it is recommended that you run
# elastalert-create-index to set a mapping
writeback_index: elastalert_status
#writeback_index: logstash-2018.06.25

# If an alert fails for some reason, ElastAlert will retry
# sending the alert until this time period has elapsed
alert_time_limit:
#  minutes: 2
  days: 2
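To make run_every and buffer_time concrete, here is a minimal Python 3 sketch of the time window each query would cover under the settings above (a query every 3 seconds, each looking back 15 minutes). It assumes a simple "now minus buffer_time" window; ElastAlert's real scheduling also tracks where the previous run ended, so this is an illustration, not its implementation. The helper `query_window` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Values mirroring config.yaml above (assumption: fixed, simple settings).
RUN_EVERY = timedelta(seconds=3)     # run_every: seconds: 3
BUFFER_TIME = timedelta(minutes=15)  # buffer_time: minutes: 15


def query_window(now):
    """Return the (start, end) range a single query covers.

    Hypothetical, simplified helper: each run looks back BUFFER_TIME
    from `now`, so logs that reach ES late are still found by later runs.
    """
    return now - BUFFER_TIME, now


if __name__ == "__main__":
    first_run = datetime(2018, 6, 25, 12, 0, 0, tzinfo=timezone.utc)
    for i in range(3):
        start, end = query_window(first_run + i * RUN_EVERY)
        print(start.isoformat(), "->", end.isoformat())
```

With these settings, consecutive windows overlap by 14 minutes 57 seconds, so the same document is returned by many successive queries and has to be counted only once by the rule logic.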

What the fields above mean:

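rules_folder: the directory rule files are loaded from; the example config uses example_rules
run_every: how often ElastAlert queries Elasticsearch
buffer_time: the time range covered by each query (15 minutes in the config above)
es_host: the Elasticsearch host address
es_port: the corresponding Elasticsearch port
use_ssl: optional; whether to connect to ES over SSL/TLS, true or false
verify_certs: optional; whether to verify TLS certificates, true or false; defaults to true
es_username: username for ES authentication
es_password: password for ES authentication
es_url_prefix: optional; a URL path prefix for the Elasticsearch endpoint, e.g. when ES sits behind a reverse proxy under a sub-path (http vs. https is controlled by use_ssl, not by this option)
es_send_get_body_as: optional; how the query body is sent to ES; defaults to GET
writeback_index: the index ElastAlert creates in Elasticsearch to store its own status and logs
alert_time_limit: how long ElastAlert keeps retrying a failed alert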
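The connection fields above combine into the address ElastAlert talks to. A minimal Python 3 sketch, assuming the usual scheme://host:port/prefix layout; `es_base_url` is a hypothetical helper for illustration, not part of ElastAlert's API.

```python
def es_base_url(conf):
    """Build an Elasticsearch base URL from config.yaml-style settings.

    Hypothetical helper: use_ssl picks the scheme, es_url_prefix is an
    optional path prefix (e.g. ES behind a reverse proxy).
    """
    scheme = "https" if conf.get("use_ssl", False) else "http"
    url = "{}://{}:{}".format(scheme, conf["es_host"], conf["es_port"])
    prefix = conf.get("es_url_prefix", "").strip("/")
    if prefix:
        url += "/" + prefix
    return url


print(es_base_url({"es_host": "200.200.200.65", "es_port": 9200}))
# http://200.200.200.65:9200
print(es_base_url({"es_host": "es.example.com", "es_port": 443,
                   "use_ssl": True, "es_url_prefix": "elasticsearch"}))
# https://es.example.com:443/elasticsearch
```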

After editing, run elastalert-create-index. It automatically creates the elastalert_status index in ES, which stores the result of each rule run.

2) Configuring alert rules
ElastAlert supports 11 rule types. This article focuses on frequency; I will cover the others if I use them later.
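A frequency rule fires when at least num_events matching documents fall within a window of length timeframe. The following Python 3 sketch illustrates that counting idea with a deque used as a sliding window over time-ordered events; it is a simplified illustration of the technique, not ElastAlert's actual code, and `FrequencyCounter` is a hypothetical name.

```python
from collections import deque
from datetime import datetime, timedelta


class FrequencyCounter:
    """Sliding-window event counter (simplified frequency rule)."""

    def __init__(self, num_events, timeframe):
        self.num_events = num_events
        self.timeframe = timeframe
        self.window = deque()  # timestamps currently inside the window

    def add(self, ts):
        """Record one matching event; return True if an alert should fire.

        Assumes events are added in timestamp order.
        """
        self.window.append(ts)
        # Drop events older than `timeframe` relative to the newest one.
        while ts - self.window[0] > self.timeframe:
            self.window.popleft()
        if len(self.window) >= self.num_events:
            self.window.clear()  # start counting afresh after an alert
            return True
        return False


if __name__ == "__main__":
    counter = FrequencyCounter(num_events=3, timeframe=timedelta(minutes=10))
    base = datetime(2018, 6, 25, 12, 0)
    for minutes in [0, 4, 15, 18, 20]:
        print(minutes, counter.add(base + timedelta(minutes=minutes)))
```

With num_events: 1, as in the rule below, every single matching document triggers an alert, because the window reaches the threshold as soon as one event arrives.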
First, copy the default example rule:

cp example_rules/example_frequency.yaml rules/test_frequency.yaml

Pick any existing index in ES to test with, as shown below:


[Figure: sample documents from the test index]

The goal: send an email whenever a document's _type is syslog. Edit test_frequency.yaml:

# Alert when the rate of events exceeds a threshold

# (Optional)
# Elasticsearch host
es_host: 200.200.200.65

# (Optional)
# Elasticsearch port
es_port: 9200

# (Optional) Connect with SSL to Elasticsearch
#use_ssl: True

# (Optional) basic-auth username and password for Elasticsearch
#es_username: someusername
#es_password: somepassword

# (Required)
# Rule name, must be unique
name: "The servers are all down and you're still asleep"

# (Required)
# Type of alert.
# the frequency rule type alerts when num_events events occur with timeframe time
type: frequency

#use_strftime_index: true

# (Required)
# Index to search, wildcard supported
#index: logstash-*
index: logstash-*

# (Required, frequency specific)
# Alert when this many documents matching the query occur within a timeframe
# Trigger when N matching events occur within the timeframe
num_events: 1

# (Required, frequency specific)
# num_events must occur within this amount of time to trigger an alert
timeframe:
  hours: 3

# (Required)
# A list of Elasticsearch filters used for find events
# These filters are joined with AND and nested in a filtered query
# For more info: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl.html
# Keep only documents whose _type is syslog
filter:
- term:
    _type: "syslog"
#   some_field: "some_value"
#- query_string:
#    query: "_type: syslog"

# (Required)
# The alert is use when a match is found
# Send alerts via email
alert:
- "email"

# Email subject; each {} is filled, in order, from alert_subject_args
alert_subject: "Error {} @{}"
alert_subject_args:
  - name
  - "@timestamp"

# Send only the alert_text content
alert_text_type: alert_text_only

# Email body
alert_text: |
  > "Hi there, this is the handsome Xiaoxiao"
  > Name: {}
  > Message: {}
  > Host: {} ({})

alert_text_args:
  - name
  - message
  - host
  - port

smtp_host: smtp.163.com
smtp_port: 25

# SMTP credentials file; it must contain the user and password keys
# smtp_auth_file.yaml is the credentials file shown below
smtp_auth_file: /home/elk/test-zx/smtp_auth_file.yaml
email_reply_to: xxx@163.com
from_addr: xxx@163.com


# (required, email specific)
# a list of email addresses to send alerts to
email:
- "xxx@163.com"
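The rule's comments say the filter list is joined with AND. As a rough Python 3 illustration, this sketch builds such a request body as a dict, assuming ES 5.x query DSL: a bool query whose filter clause holds the rule's filters plus a @timestamp range. The exact body ElastAlert sends may differ, and `build_query` is a hypothetical helper.

```python
import json


def build_query(filters, start, end, timestamp_field="@timestamp"):
    """Wrap rule filters (ANDed) plus a time range in an ES 5.x bool query."""
    time_range = {"range": {timestamp_field: {"gt": start, "lte": end}}}
    return {
        "query": {
            "bool": {
                # Every clause in `filter` must match (logical AND).
                "filter": list(filters) + [time_range]
            }
        },
        "sort": [{timestamp_field: {"order": "asc"}}],
    }


rule_filters = [{"term": {"_type": "syslog"}}]  # same as the rule's filter
body = build_query(rule_filters, "2018-06-25T11:45:00Z", "2018-06-25T12:00:00Z")
print(json.dumps(body, indent=2))
```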

/home/elk/test-zx/smtp_auth_file.yaml holds the SMTP account and password of the sending mailbox:

user: "xxx@163.com"
password: "xxx"
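The subject and body templates use positional {} placeholders filled, in order, from alert_subject_args and alert_text_args. A minimal Python 3 sketch, assuming str.format-style substitution, a lookup that tries the matched document first and then the rule, and a hypothetical matched document. It builds the message with the standard email library but does not send it; sending would additionally use smtplib.SMTP(smtp_host, smtp_port), login() with the user and password from the auth file, and send_message().

```python
from email.mime.text import MIMEText

# Hypothetical matched document; real values come from Elasticsearch.
match = {
    "@timestamp": "2018-06-25T12:00:00Z",
    "message": "disk full",
    "host": "web-01",
    "port": 514,
}

# Subset of the rule file as a plain dict (assumption for illustration).
rule = {
    "name": "The servers are all down and you're still asleep",
    "alert_subject": "Error {} @{}",
    "alert_subject_args": ["name", "@timestamp"],
    "alert_text": "Name: {}\nMessage: {}\nHost: {} ({})\n",
    "alert_text_args": ["name", "message", "host", "port"],
}


def lookup(key):
    """Look an arg up in the match first, then in the rule (assumed order)."""
    value = match.get(key)
    if value is None:
        value = rule.get(key)
    return "<MISSING VALUE>" if value is None else value


def build_email(from_addr, reply_to, to_addrs):
    """Fill the templates positionally and wrap them in a MIME message."""
    subject = rule["alert_subject"].format(
        *[lookup(k) for k in rule["alert_subject_args"]])
    body = rule["alert_text"].format(
        *[lookup(k) for k in rule["alert_text_args"]])
    msg = MIMEText(body, "plain", "utf-8")
    msg["Subject"] = subject
    msg["From"] = from_addr
    msg["Reply-To"] = reply_to
    msg["To"] = ", ".join(to_addrs)
    return msg


if __name__ == "__main__":
    msg = build_email("xxx@163.com", "xxx@163.com", ["xxx@163.com"])
    print(msg["Subject"])
    print(msg.get_payload(decode=True).decode("utf-8"))
```

Note that the order of the args list must match the order of the placeholders: listing port before host would render the port where the host name is expected.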

You can test the rule above in either of two ways:

elastalert-test-rule --config config.yaml rules/test_frequency.yaml
python -m elastalert.elastalert --debug --config config.yaml --rule rules/test_frequency.yaml

Content of the triggered alert:


[Figure: the triggered alert]

For the actual deployment, just run:

python -m elastalert.elastalert --config config.yaml

Only one alert method is covered here; ElastAlert also supports command, DingTalk, WeChat, and custom extensions.

© Copyright belongs to the author. For reprints or content collaboration, please contact the author.
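Platform statement: the article content (including any images or videos) was uploaded and published by the author and represents only the author's own views. Jianshu is an information publishing platform and only provides information storage services.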
