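Conclusion
Cause 1: configuration issue. The recordDataTTL and otherMetricsDataTTL settings in the configuration file do not take effect; this can be considered a bug.
Solution. Method 1: set minuteMetricsDataTTL, hourMetricsDataTTL and dayMetricsDataTTL manually. Deletion of record data uses the value configured for dayMetricsDataTTL. Method 2: modify the source code.
Cause 2: Skywalking bug. In skywalking-6.2.0, index deletion is broken when nameSpace is set; fixing it requires changing the source and recompiling.
Solution. Method 1: leave namespace empty. Method 2: modify the source code.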
Environment
Skywalking version: 6.2.0
ES instances: 4 cores * 14G, three instances, started with docker
OAPServer: a single instance, 1500M
agent nodes: i.e. JVM instances, about 50
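Troubleshooting process
1. Configuration issue
Reading the source code leads to the core logic that deletes historical ES data, the deleteHistory method. It first computes a cutoff time from the model's Downsampling setting (the model being, for example, Segment or one of the various Metrics) and from DataTTLConfig; indexes older than that cutoff need to be deleted.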

Downsampling is an enum:
public enum Downsampling {
    None(0, ""), Second(1, "second"), Minute(2, "minute"), Hour(3, "hour"), Day(4, "day"), Month(5, "month");

    private final int value;
    private final String name;

    Downsampling(int value, String name) {
        this.value = value;
        this.name = name;
    }

    public int getValue() {
        return value;
    }

    public String getName() {
        return name;
    }
}
DataTTLConfig holds the expiry settings for each data type, records and metrics:
@Setter
@Getter
public class DataTTLConfig {
    private int recordDataTTL;
    private int minuteMetricsDataTTL;
    private int hourMetricsDataTTL;
    private int dayMetricsDataTTL;
    private int monthMetricsDataTTL;
}
Back to the deleteHistory logic. The part that matters is how the cutoff time, timeBefore, is computed: it depends only on the model's Downsampling and on DataTTLConfig.
StorageTTL is implemented by ElasticsearchStorageTTL, whose job is to return the TTLCalculator that matches a given Downsampling. Take the TTLCalculator implementation EsMinuteTTLCalculator as an example: it computes the cutoff from the current time and the minuteMetricsDataTTL setting in DataTTLConfig, in days, while EsHourTTLCalculator uses hourMetricsDataTTL. Each TTLCalculator corresponds to one DataTTLConfig setting.
public class ElasticsearchStorageTTL implements StorageTTL {
    @Override public TTLCalculator calculator(Downsampling downsampling) {
        switch (downsampling) {
            case Month:
                return new MonthTTLCalculator();
            case Hour:
                return new EsHourTTLCalculator();
            case Minute:
                return new EsMinuteTTLCalculator();
            default:
                return new DayTTLCalculator();
        }
    }
}

public class EsMinuteTTLCalculator implements TTLCalculator {
    @Override public long timeBefore(DateTime currentTime, DataTTLConfig dataTTLConfig) {
        return Long.valueOf(currentTime.plusDays(0 - dataTTLConfig.getMinuteMetricsDataTTL()).toString("yyyyMMdd"));
    }
}
This also explains why recordDataTTL has no effect. The Downsampling of Record types is Second, but as shown above ElasticsearchStorageTTL has no case for Second, so Second falls through to DayTTLCalculator. DayTTLCalculator uses dayMetricsDataTTL from DataTTLConfig, so recordDataTTL is never read.
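The fall-through can be reproduced in isolation. The following is a minimal sketch, not the Skywalking code: TtlConfig is a simplified stand-in for DataTTLConfig, the Month branch is left out, and java.time replaces the Joda DateTime used in the original. The date is an arbitrary example.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class TtlFallbackSketch {
    enum Downsampling { None, Second, Minute, Hour, Day, Month }

    // Simplified stand-in for DataTTLConfig; all values are in days.
    static class TtlConfig {
        int recordDataTTL = 7;
        int minuteMetricsDataTTL = 2;
        int hourMetricsDataTTL = 2;
        int dayMetricsDataTTL = 2;
    }

    // Same shape as the switch in ElasticsearchStorageTTL (Month branch omitted
    // in this sketch): there is no case for Second, so records use the day TTL.
    static int ttlDaysFor(Downsampling downsampling, TtlConfig config) {
        switch (downsampling) {
            case Hour:
                return config.hourMetricsDataTTL;
            case Minute:
                return config.minuteMetricsDataTTL;
            default:
                return config.dayMetricsDataTTL;
        }
    }

    // Cutoff expressed as a yyyyMMdd number, as in EsMinuteTTLCalculator.timeBefore.
    static long timeBefore(LocalDate today, int ttlDays) {
        return Long.parseLong(today.minusDays(ttlDays).format(DateTimeFormatter.BASIC_ISO_DATE));
    }

    public static void main(String[] args) {
        TtlConfig config = new TtlConfig();
        config.recordDataTTL = 30; // never consulted by ttlDaysFor

        int ttl = ttlDaysFor(Downsampling.Second, config);
        System.out.println(ttl);                                         // 2
        System.out.println(timeBefore(LocalDate.of(2019, 8, 10), ttl));  // 20190808
    }
}
```

Whatever value recordDataTTL holds, the cutoff for Second is derived from dayMetricsDataTTL.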

What remains is to see where DataTTLConfig comes from. The deleteHistory code reads it from the ConfigService of CoreModule, which means it comes from the core module section of application.yml. However, at startup StorageModuleElasticsearchProvider overwrites the DataTTLConfig in CoreModule with the values from StorageModuleElasticsearchConfig.
org.apache.skywalking.oap.server.storage.plugin.elasticsearch.StorageModuleElasticsearchProvider
private void overrideCoreModuleTTLConfig() {
    ConfigService configService = getManager().find(CoreModule.NAME).provider().getService(ConfigService.class);
    configService.getDataTTLConfig().setRecordDataTTL(config.getRecordDataTTL());
    configService.getDataTTLConfig().setMinuteMetricsDataTTL(config.getMinuteMetricsDataTTL());
    configService.getDataTTLConfig().setHourMetricsDataTTL(config.getHourMetricsDataTTL());
    configService.getDataTTLConfig().setDayMetricsDataTTL(config.getDayMetricsDataTTL());
    configService.getDataTTLConfig().setMonthMetricsDataTTL(config.getMonthMetricsDataTTL());
}
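Now look at StorageModuleElasticsearchConfig, focusing on the settings related to otherMetricsDataTTL. The code below shows the author's intent: when otherMetricsDataTTL is assigned, minuteMetricsDataTTL, hourMetricsDataTTL and dayMetricsDataTTL should be assigned automatically. But at startup the config object is populated by writing each Field directly through reflection, so setOtherMetricsDataTTL is never invoked. That is why otherMetricsDataTTL has no effect even when it is set in the configuration file; the system just uses the default of 2.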
@Getter
public class StorageModuleElasticsearchConfig extends ModuleConfig {
    @Setter private int recordDataTTL = 7;
    @Setter private int minuteMetricsDataTTL = 2;
    @Setter private int hourMetricsDataTTL = 2;
    @Setter private int dayMetricsDataTTL = 2;
    private int otherMetricsDataTTL = 0;
    @Setter private int monthMetricsDataTTL = 18;

    public void setOtherMetricsDataTTL(int otherMetricsDataTTL) {
        if (otherMetricsDataTTL > 0) {
            minuteMetricsDataTTL = otherMetricsDataTTL;
            hourMetricsDataTTL = otherMetricsDataTTL;
            dayMetricsDataTTL = otherMetricsDataTTL;
        }
    }
}
The code that assigns the Config through reflection at startup:
org.apache.skywalking.oap.server.library.module.ModuleDefine
private void copyProperties(ModuleConfig dest, Properties src, String moduleName,
    String providerName) throws IllegalAccessException {
    if (dest == null) {
        return;
    }
    Enumeration<?> propertyNames = src.propertyNames();
    while (propertyNames.hasMoreElements()) {
        String propertyName = (String)propertyNames.nextElement();
        Class<? extends ModuleConfig> destClass = dest.getClass();
        try {
            Field field = getDeclaredField(destClass, propertyName);
            field.setAccessible(true);
            field.set(dest, src.get(propertyName));
        } catch (NoSuchFieldException e) {
            logger.warn(propertyName + " setting is not supported in " + providerName + " provider of " + moduleName + " module");
        }
    }
}
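The effect of writing the field directly can be seen in a small standalone sketch. Config here is a trimmed-down stand-in for StorageModuleElasticsearchConfig and setField is a hypothetical helper that does what the loop body of copyProperties does for one property; the value 45 is only an example.

```java
import java.lang.reflect.Field;

public class ReflectionBypassSketch {
    // Trimmed-down stand-in for StorageModuleElasticsearchConfig.
    static class Config {
        int dayMetricsDataTTL = 2;
        int otherMetricsDataTTL = 0;

        // Propagates the value, but only when someone actually calls it.
        void setOtherMetricsDataTTL(int value) {
            if (value > 0) {
                dayMetricsDataTTL = value;
            }
        }
    }

    // Hypothetical helper: assigns one field reflectively, bypassing any setter.
    static void setField(Object target, String name, Object value) throws ReflectiveOperationException {
        Field field = target.getClass().getDeclaredField(name);
        field.setAccessible(true);
        field.set(target, value);
    }

    public static void main(String[] args) throws Exception {
        Config viaField = new Config();
        setField(viaField, "otherMetricsDataTTL", 45);
        System.out.println(viaField.otherMetricsDataTTL); // 45
        System.out.println(viaField.dayMetricsDataTTL);   // 2, the setter never ran

        Config viaSetter = new Config();
        viaSetter.setOtherMetricsDataTTL(45);
        System.out.println(viaSetter.dayMetricsDataTTL);  // 45
    }
}
```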
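Solutions for the configuration issue
Method 1: set minuteMetricsDataTTL, hourMetricsDataTTL and dayMetricsDataTTL directly in the configuration file instead of the otherMetricsDataTTL setting that ships by default. Deletion of record data uses the value configured for dayMetricsDataTTL.
Method 2: modify the source code so that setOtherMetricsDataTTL is called explicitly.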
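One way to picture Method 2, as a sketch only: after the reflective copy has filled the raw field, pass its value through the setter once so the three derived settings are updated. Config is again a simplified stand-in, and applyOtherMetricsTtl is a hypothetical hook; in the real provider it would have to run before overrideCoreModuleTTLConfig copies the values into the core module.

```java
public class ManualSetterSketch {
    // Simplified stand-in for StorageModuleElasticsearchConfig.
    static class Config {
        int minuteMetricsDataTTL = 2;
        int hourMetricsDataTTL = 2;
        int dayMetricsDataTTL = 2;
        int otherMetricsDataTTL = 0;

        void setOtherMetricsDataTTL(int value) {
            if (value > 0) {
                minuteMetricsDataTTL = value;
                hourMetricsDataTTL = value;
                dayMetricsDataTTL = value;
            }
        }
    }

    // Hypothetical hook: re-applies the raw field value through the setter.
    static void applyOtherMetricsTtl(Config config) {
        config.setOtherMetricsDataTTL(config.otherMetricsDataTTL);
    }

    public static void main(String[] args) {
        Config config = new Config();
        config.otherMetricsDataTTL = 45; // simulates the direct field write at startup

        applyOtherMetricsTtl(config);
        System.out.println(config.minuteMetricsDataTTL); // 45
        System.out.println(config.hourMetricsDataTTL);   // 45
        System.out.println(config.dayMetricsDataTTL);    // 45
    }
}
```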
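2. The index deletion bug
This problem is easier to spot. In deleteHistory, the indexes are looked up by alias, and the delete logic is called for each one judged to be expired. The problem is in deleteIndex. As shown below, before deleting it prepends the namespace to the indexName passed in, but that indexName already contains the namespace, because it was queried directly from ES by alias. Prepending the namespace a second time produces a name that does not exist, so the index deletion fails.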
public boolean deleteIndex(String indexName) throws IOException {
    indexName = formatIndexName(indexName);
    DeleteIndexRequest request = new DeleteIndexRequest(indexName);
    DeleteIndexResponse response;
    response = client.indices().delete(request);
    logger.debug("delete {} index finished, isAcknowledged: {}", indexName, response.isAcknowledged());
    return response.isAcknowledged();
}

public String formatIndexName(String indexName) {
    if (StringUtils.isNotEmpty(namespace)) {
        return namespace + "_" + indexName;
    }
    return indexName;
}
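The double prefix is easy to reproduce without Elasticsearch. In this sketch, formatIndexName takes the namespace as a parameter instead of reading a field, and both the namespace "demo" and the index name are made-up examples, not values from a real cluster.

```java
public class NamespacePrefixSketch {
    // Same logic as formatIndexName above, with the namespace passed in explicitly.
    static String formatIndexName(String namespace, String indexName) {
        if (namespace != null && !namespace.isEmpty()) {
            return namespace + "_" + indexName;
        }
        return indexName;
    }

    public static void main(String[] args) {
        String namespace = "demo";
        // Hypothetical name as returned by ES for an alias lookup: already prefixed.
        String nameFromEs = "demo_segment-20190808";

        // deleteIndex formats the name again, so the request targets a missing index.
        System.out.println(formatIndexName(namespace, nameFromEs)); // demo_demo_segment-20190808

        // With an empty namespace the name is left unchanged.
        System.out.println(formatIndexName("", "segment-20190808")); // segment-20190808
    }
}
```

The second call shows why leaving the namespace empty avoids the bug.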
The fix is simple: add a deleteIndexWithFullIndexName method and call it directly from the history deletion code, where the index name is already complete.
public boolean deleteIndex(String indexName) throws IOException {
    String fullIndexName = formatIndexName(indexName);
    return deleteIndexWithFullIndexName(fullIndexName);
}

public boolean deleteIndexWithFullIndexName(String fullIndexName) throws IOException {
    DeleteIndexRequest request = new DeleteIndexRequest(fullIndexName);
    DeleteIndexResponse response;
    response = client.indices().delete(request);
    logger.debug("delete {} index finished, isAcknowledged: {}", fullIndexName, response.isAcknowledged());
    return response.isAcknowledged();
}

