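Introduction
After adopting SkyWalking, the UI gives a clear view of the traces. As developers, though, we mostly go looking for trace information only once a problem has already shown up. SkyWalking's alarm feature lets us notice problems promptly instead.
Alarming consists of two main parts:
- Alarm rules
- Webhooks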
Using alarms
Rules
- Rule name: must be unique and must end with _rule
- Metrics name: comes from the official analysis (OAL) scripts, located at skywalking/oap-server/generated-analysis/src/main/resources/official_analysis.oal (see https://github.com/apache/skywalking/blob/master/docs/en/guides/backend-oal-scripts.md)
- Include names: the names of the services, endpoints, and so on that the rule covers. Below is the snippet from the official sample:
# [Optional] Default, match all services in this metrics
include-names:
  - dubbox-provider
  - dubbox-consumer
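- Threshold: the target value, for example 1000 (ms) for a response time or 90 for a success rate
- OP: the comparison operator: > (greater than), < (less than), = (equal to)
- Period: the time window over which the rule is checked
- Count: the number of times the condition must match within the period before an alarm is triggered
- Silence period: if an alarm fires at time A, it fires only once within A + silence period. Anyone who has been bombarded by an already-known alarm will appreciate why this matters.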
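To make the interaction between Threshold, OP, Period, Count and Silence period more concrete, here is a minimal sketch. It is not SkyWalking's implementation: the class name AlarmRuleSketch, the method onMinute, and the assumption that one metric value arrives per minute are all hypothetical and only serve the illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of how one rule's settings interact; not SkyWalking code. */
class AlarmRuleSketch {
    private final long threshold;     // Threshold: target value, e.g. 1000 (ms)
    private final String op;          // OP: ">", "<" or "="
    private final int period;         // Period: window length (assumed: minutes)
    private final int count;          // Count: matches needed inside the window
    private final int silencePeriod;  // Silence period: muted ticks after firing
    private final Deque<Long> window = new ArrayDeque<>();
    private int silenceLeft = 0;

    AlarmRuleSketch(long threshold, String op, int period, int count, int silencePeriod) {
        this.threshold = threshold;
        this.op = op;
        this.period = period;
        this.count = count;
        this.silencePeriod = silencePeriod;
    }

    private boolean matches(long value) {
        switch (op) {
            case ">": return value > threshold;
            case "<": return value < threshold;
            case "=": return value == threshold;
            default: throw new IllegalArgumentException("unsupported op: " + op);
        }
    }

    /** Feed one metric value per tick; returns true when an alarm should fire. */
    boolean onMinute(long value) {
        // Keep only the most recent `period` values.
        window.addLast(value);
        if (window.size() > period) {
            window.removeFirst();
        }
        // While silenced, keep tracking values but never fire.
        if (silenceLeft > 0) {
            silenceLeft--;
            return false;
        }
        long matched = window.stream().filter(this::matches).count();
        if (matched >= count) {
            silenceLeft = silencePeriod;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Response time > 1000 ms, 3-tick window, 2 matches needed, 2 silent ticks.
        AlarmRuleSketch rule = new AlarmRuleSketch(1000, ">", 3, 2, 2);
        long[] responseTimes = {1500, 1200, 1300, 1400, 1600};
        for (long rt : responseTimes) {
            System.out.println(rt + " ms -> alarm: " + rule.onMinute(rt));
        }
    }
}
```

With these inputs the second value fires the alarm, the next two ticks are suppressed by the silence period even though the condition still holds, and the fifth value fires again.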
The official distribution also ships a set of default alarm rules, so I won't cover them in detail here.
We provided a default alarm-setting.yml in our distribution only for convenience, which including following rules
- Service average response time over 1s in last 3 minutes.
- Service success rate lower than 80% in last 2 minutes.
- Service 90% response time is over 1s in last 3 minutes
- Service Instance average response time over 1s in last 2 minutes.
- Endpoint average response time over 1s in last 2 minutes.
Webhook
An article linked above introduces Webhook. It is essentially the callback mechanism we rely on in everyday alerting.
Webhook requires the peer is a web container. The alarm message will send through HTTP post by application/json content type. The JSON format is based on List<org.apache.skywalking.oap.server.core.alarm.AlarmMessage> with following key information.
@Setter(AccessLevel.PUBLIC)
@Getter(AccessLevel.PUBLIC)
public class AlarmMessage {
    public static AlarmMessage NONE = new NoAlarm();

    private int scopeId;
    private String name;
    private int id0;
    private int id1;
    private String alarmMessage;
    private long startTime;

    private static class NoAlarm extends AlarmMessage {
    }
}
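Lombok is used here. Personally, I don't think open-source components should use Lombok: it only saves a few lines of getters and setters, and what-you-see-is-what-you-get code fits human habits better. Lombok is syntactic sugar for business development.
Back to the topic. Below is the code that sends the alarm: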
public class WebhookCallback implements AlarmCallback {
    // Excerpt: the fields remoteEndpoints, requestConfig, gson and logger are omitted here.
    @Override
    public void doAlarm(List<AlarmMessage> alarmMessage) {
        if (remoteEndpoints.size() == 0) {
            return;
        }
        CloseableHttpClient httpClient = HttpClients.custom().build();
        try {
            // Post the same JSON payload to every configured endpoint.
            remoteEndpoints.forEach(url -> {
                HttpPost post = new HttpPost(url);
                post.setConfig(requestConfig);
                post.setHeader("Accept", "application/json");
                post.setHeader("Content-type", "application/json");
                StringEntity entity = null;
                try {
                    entity = new StringEntity(gson.toJson(alarmMessage));
                    post.setEntity(entity);
                    CloseableHttpResponse httpResponse = httpClient.execute(post);
                    StatusLine statusLine = httpResponse.getStatusLine();
                    // Anything other than HTTP 200 is logged as a failure.
                    if (statusLine != null && statusLine.getStatusCode() != 200) {
                        logger.error("send alarm to " + url + " failure. Response code: " + statusLine.getStatusCode());
                    }
                } catch (UnsupportedEncodingException e) {
                    logger.error("Alarm to JSON error, " + e.getMessage(), e);
                } catch (ClientProtocolException e) {
                    logger.error("send alarm to " + url + " failure.", e);
                } catch (IOException e) {
                    logger.error("send alarm to " + url + " failure.", e);
                }
            });
        } finally {
            try {
                httpClient.close();
            } catch (IOException e) {
                logger.error(e.getMessage(), e);
            }
        }
    }
}
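On the other end, the peer only needs to accept an HTTP POST with a JSON body and answer with 200, since the sender above logs any other status code as a failure. The following is a minimal receiver sketch that uses only the JDK's built-in HttpServer (Java 9+ for readAllBytes). The class name, the /alarm path, the post helper and the sample payload values are assumptions made for illustration; only the field names come from AlarmMessage.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical webhook receiver; the /alarm path and all names are assumptions. */
class AlarmWebhookReceiverSketch {
    final List<String> received = new CopyOnWriteArrayList<>();
    private HttpServer server;

    /** Starts listening on localhost; pass 0 to let the OS pick a free port. */
    int start(int port) {
        try {
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        server.createContext("/alarm", exchange -> {
            try {
                if (!"POST".equals(exchange.getRequestMethod())) {
                    exchange.sendResponseHeaders(405, -1);
                    return;
                }
                // Store the raw JSON body; a real receiver would parse and route it.
                try (InputStream in = exchange.getRequestBody()) {
                    received.add(new String(in.readAllBytes(), StandardCharsets.UTF_8));
                }
                exchange.sendResponseHeaders(200, -1); // 200 with no response body
            } finally {
                exchange.close();
            }
        });
        server.start();
        return server.getAddress().getPort();
    }

    void stop() {
        server.stop(0);
    }

    /** Small client helper used to exercise the receiver; returns the status code. */
    static int post(String url, String json) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/json");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(json.getBytes(StandardCharsets.UTF_8));
            }
            return conn.getResponseCode();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        AlarmWebhookReceiverSketch receiver = new AlarmWebhookReceiverSketch();
        int port = receiver.start(0);
        // Illustrative payload: field names follow AlarmMessage, values are made up.
        String payload = "[{\"scopeId\":1,\"name\":\"dubbox-provider\",\"id0\":12,\"id1\":0,"
            + "\"alarmMessage\":\"illustrative message\",\"startTime\":1560524171000}]";
        int status = post("http://127.0.0.1:" + port + "/alarm", payload);
        System.out.println("status=" + status + ", received=" + receiver.received);
        receiver.stop();
    }
}
```

The handler records the body before it sends the response, so by the time the sender sees the 200 the message has already been stored.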
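WebhookCallback#doAlarm is in turn triggered by org.apache.skywalking.oap.server.core.alarm.provider.AlarmCore#start, which runs on a single-threaded scheduled executor with a 10-second initial delay and a 10-second period: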
Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {}, 10, 10, TimeUnit.SECONDS);
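The scheduling pattern in that line can be tried in isolation. The sketch below uses the same scheduleAtFixedRate call with millisecond intervals so that it finishes quickly. The class FixedRateSketch and the method runTicks are hypothetical, and the counter is only a stand-in for the real task body, which is elided in the excerpt above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo of a single-threaded fixed-rate scheduler. */
class FixedRateSketch {

    /** Runs a periodic task until it has executed at least `ticks` times. */
    static int runTicks(int ticks, long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch done = new CountDownLatch(ticks);
        AtomicInteger executed = new AtomicInteger();

        // The initial delay equals the period, mirroring the (10, 10, SECONDS) call.
        scheduler.scheduleAtFixedRate(() -> {
            try {
                executed.incrementAndGet(); // stand-in for the periodic check
            } catch (RuntimeException e) {
                // An exception escaping the task would cancel all later runs,
                // so a periodic task should handle its own failures.
            } finally {
                done.countDown();
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);

        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
        return executed.get();
    }

    public static void main(String[] args) {
        System.out.println("executions: " + runTicks(3, 20));
    }
}
```

Because there is only one worker thread, runs never overlap: if one run takes longer than the period, the next one starts late rather than concurrently.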
Result in the UI
I added a random sleep to the Dubbo provider, after which the alarm messages showed up.

Official 6.x alarm documentation
https://github.com/apache/skywalking/blob/master/docs/en/setup/backend/backend-alarm.md
