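Servers run into all sorts of faults while in operation. A fairly common one is a disk that can no longer be written to and has become read-only. That breaks every service that depends on writing data and can take the business down with it. Finding the root cause usually means a long top-down investigation.
Well, when the boss hands you a task, you take it on even if you don't know how yet. I searched online for quite a while without finding a good monitoring template, so I had to build one myself.
Approach:
Since the goal is to monitor whether the disk is writable, it is enough to print 0 when a write succeeds and 1 when it fails, and then alert on the failure value.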
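- Failed attempt:
On Linux my first thought was a shell script. When I tried it, though, a failed write raised an error straight away, and I could not get the script to print the status code the way I wanted. After fiddling with it for a long time I gave up.
- Successful approach:
Besides shell, the other major scripting language, Python, was the obvious choice. Wrapping the write in try/except handles the exception cleanly and solved the problem. The script and template are posted below for reference.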
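The try/except idea can also report *why* a write failed. The sketch below is an illustration, not the author's script. The function name `probe_write` is hypothetical. It returns the errno name alongside the 0/1 status, so a read-only mount would show up as `EROFS` rather than a bare 1.

```python
import errno
import os
import tempfile


def probe_write(path):
    """Try to write a small probe file at `path` (hypothetical helper).

    Returns (0, None) on success, or (1, errno_name) on failure, for example
    (1, 'EROFS') when the filesystem is mounted read-only.
    """
    try:
        with open(path, 'w') as f:
            f.write('probe')
        return 0, None
    except (IOError, OSError) as e:
        # errno.errorcode maps numeric errno values to names such as 'EROFS';
        # .get() yields None if the exception carries no errno
        return 1, errno.errorcode.get(e.errno)


if __name__ == '__main__':
    demo_dir = tempfile.mkdtemp()
    print(probe_write(os.path.join(demo_dir, 'probe.txt')))
    # A missing parent directory stands in for a failing write in this demo
    print(probe_write(os.path.join(demo_dir, 'no_such_dir', 'probe.txt')))
```

Run directly, it prints `(0, None)` for the temporary directory and `(1, 'ENOENT')` for the missing one. Because the first element keeps the 0/1 meaning, the Zabbix item itself would not need to change if the reason were logged separately.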
1. Create the Zabbix key
vim /usr/local/zabbix/etc/zabbix_agentd.conf.d/disk.conf
UserParameter=disk.health.check,/usr/bin/python /usr/local/zabbix/scripts/disk_health_check.py
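The UserParameter line maps an item key (everything before the first comma) to the shell command the agent runs (everything after it). That key must match the `<key>` of the item in the template. The sketch below illustrates the mapping with hypothetical helpers `parse_user_parameter` and `dry_run`. `dry_run` executes the command through `/bin/sh -c`, which is assumed here to approximate how the agent invokes it. For a real test, `zabbix_agentd -t disk.health.check` asks the agent itself to evaluate the key.

```python
import subprocess


def parse_user_parameter(line):
    """Split 'UserParameter=<key>,<command>' into (key, command).

    Only the first comma separates key and command, so commas inside the
    command are kept.
    """
    prefix = 'UserParameter='
    line = line.strip()
    if not line.startswith(prefix):
        raise ValueError('not a UserParameter line: %r' % line)
    key, sep, command = line[len(prefix):].partition(',')
    if not (sep and key and command):
        raise ValueError('expected <key>,<command>: %r' % line)
    return key, command


def dry_run(line):
    """Run the command through /bin/sh and return (key, stripped stdout)."""
    key, command = parse_user_parameter(line)
    out = subprocess.check_output(['/bin/sh', '-c', command])
    return key, out.decode().strip()


if __name__ == '__main__':
    # A harmless stand-in command; the real entry would call the Python script
    print(dry_run('UserParameter=demo.check,echo 0'))
```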
2. Create the disk_health_check script (it needs execute permission)
vim /usr/local/zabbix/scripts/disk_health_check.py
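#!/usr/bin/python
# Disk read-only check: prints 0 if the disk is writable, 1 otherwise
# jipeng 2016/3/26
import time

LOG_FILE = '/usr/local/zabbix/scripts/disk_health_check.log'

try:
    old = str(time.time())
    # Write the current timestamp; on a read-only filesystem open() raises
    with open(LOG_FILE, 'w') as f:
        f.write(old)
    # Read it back; the with block above has already flushed and closed the file
    with open(LOG_FILE) as f:
        new = f.read()
    # print() with a single argument works under both Python 2 and Python 3
    print('0' if old == new else '1')
except Exception:
    # Any failure to write or read back is reported as unhealthy
    print('1')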
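One caveat with the script above is that the read-back may be served from the page cache. It therefore mainly proves that the kernel accepted the write. A variant that also calls `os.fsync()` pushes the data toward the device, so some I/O errors may surface as exceptions. This is a hypothetical sketch, not the original script, and the `check_disk` name and demo path are assumptions.

```python
import os
import tempfile
import time


def check_disk(path):
    """Return 0 if a timestamp can be written, fsync'd and read back, else 1.

    Hypothetical variant of disk_health_check.py: os.fsync() asks the kernel
    to flush the file to the storage device, so a device I/O error can be
    raised here instead of staying hidden in the page cache.
    """
    try:
        stamp = str(time.time())
        with open(path, 'w') as f:
            f.write(stamp)
            f.flush()             # move Python's buffer into the kernel
            os.fsync(f.fileno())  # ask the kernel to write it to the device
        with open(path) as f:
            return 0 if f.read() == stamp else 1
    except (IOError, OSError):
        # Read-only mounts (EROFS), I/O errors (EIO), missing paths, etc.
        return 1


if __name__ == '__main__':
    # Demo on a temporary directory; in production pass a file on the disk
    # you want to monitor, e.g. the original log path
    demo_path = os.path.join(tempfile.mkdtemp(), 'disk_health_check.log')
    print(check_disk(demo_path))
```

The demo prints `0`. The 0/1 output contract is unchanged, so the same UserParameter and template would still apply.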
3. The zbx_export_templates.xml template file
Create this XML file locally, then import it into Zabbix and it is ready to use.
<?xml version="1.0" encoding="UTF-8"?>
<zabbix_export>
<version>3.0</version>
<date>2016-11-18T05:15:07Z</date>
<groups>
<group>
<name>Template-Hardware</name>
</group>
</groups>
<templates>
<template>
<template>DiskHealth-Check</template>
<name>DiskHealth-Check</name>
<description>Disk write check: 0 = OK, 1 = error</description>
<groups>
<group>
<name>Template-Hardware</name>
</group>
</groups>
<applications>
<application>
<name>diskHealth</name>
</application>
</applications>
<items>
<item>
<name>DiskHealthCheck</name>
<type>0</type>
<snmp_community/>
<multiplier>0</multiplier>
<snmp_oid/>
<key>disk.health.check</key>
<delay>60</delay>
<history>90</history>
<trends>365</trends>
<status>0</status>
<value_type>3</value_type>
<allowed_hosts/>
<units/>
<delta>0</delta>
<snmpv3_contextname/>
<snmpv3_securityname/>
<snmpv3_securitylevel>0</snmpv3_securitylevel>
<snmpv3_authprotocol>0</snmpv3_authprotocol>
<snmpv3_authpassphrase/>
<snmpv3_privprotocol>0</snmpv3_privprotocol>
<snmpv3_privpassphrase/>
<formula>1</formula>
<delay_flex/>
<params/>
<ipmi_sensor/>
<data_type>0</data_type>
<authtype>0</authtype>
<username/>
<password/>
<publickey/>
<privatekey/>
<port/>
<description/>
<inventory_link>0</inventory_link>
<applications>
<application>
<name>diskHealth</name>
</application>
</applications>
<valuemap/>
<logtimefmt/>
</item>
</items>
<discovery_rules/>
<macros/>
<templates/>
<screens/>
</template>
</templates>
<triggers>
<trigger>
<expression>{DiskHealth-Check:disk.health.check.count(#2,1,"eq")}>1</expression>
<name>DiskHealthCheck</name>
<url/>
<status>0</status>
<priority>2</priority>
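<description>Fires when, among the last 2 values, the number of values equal to 1 is greater than 1 (i.e. equal to 2).
In other words, alert only when two consecutive checks both fail.</description>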
<type>0</type>
<dependencies/>
</trigger>
</triggers>
<graphs>
<graph>
<name>Disk health check</name>
<width>900</width>
<height>200</height>
<yaxismin>0.0000</yaxismin>
<yaxismax>100.0000</yaxismax>
<show_work_period>1</show_work_period>
<show_triggers>1</show_triggers>
<type>0</type>
<show_legend>1</show_legend>
<show_3d>0</show_3d>
<percent_left>0.0000</percent_left>
<percent_right>0.0000</percent_right>
<ymin_type_1>0</ymin_type_1>
<ymax_type_1>0</ymax_type_1>
<ymin_item_1>0</ymin_item_1>
<ymax_item_1>0</ymax_item_1>
<graph_items>
<graph_item>
<sortorder>0</sortorder>
<drawtype>3</drawtype>
<color>00C800</color>
<yaxisside>0</yaxisside>
<calc_fnc>7</calc_fnc>
<type>0</type>
<item>
<host>DiskHealth-Check</host>
<key>disk.health.check</key>
</item>
</graph_item>
</graph_items>
</graph>
</graphs>
</zabbix_export>
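The trigger expression `count(#2,1,"eq")>1` counts how many of the last two values equal 1. It fires only when that count exceeds 1, that is, when two consecutive checks both returned 1. The small Python simulation below makes this concrete. The `count_eq` and `trigger_fires` names are hypothetical, and the simulation ignores server-side details such as how Zabbix treats an item with fewer than two stored values.

```python
def count_eq(history, n, pattern):
    """Mimic Zabbix count(#n,pattern,"eq"): how many of the last n values equal pattern."""
    return sum(1 for v in history[-n:] if v == pattern)


def trigger_fires(history):
    """Mimic {DiskHealth-Check:disk.health.check.count(#2,1,"eq")}>1."""
    return count_eq(history, 2, 1) > 1


if __name__ == '__main__':
    samples = [0, 1, 0, 1, 1]
    # Evaluate the trigger after each new value arrives
    for i in range(1, len(samples) + 1):
        print(samples[:i], trigger_fires(samples[:i]))
```

For the samples `0, 1, 0, 1, 1`, only the last evaluation prints `True`. An isolated failure does not alert, but two failures in a row do, which matches the trigger description.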