SkyWalking: historical data in Elasticsearch is never deleted

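Conclusion

Cause 1: configuration. The recordDataTTL and otherMetricsDataTTL settings in the configuration file do not take effect; this can be considered a bug.
Solution: Method 1: set minuteMetricsDataTTL, hourMetricsDataTTL, and dayMetricsDataTTL explicitly. Deletion of record data uses the value of dayMetricsDataTTL. Method 2: modify the source code.

Cause 2: a SkyWalking bug. In skywalking-6.2.0, deleting an index is broken when nameSpace is set; fixing it requires patching the source and recompiling.
Solution: Method 1: leave the namespace empty. Method 2: modify the source code.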

Environment

SkyWalking version: 6.2.0
ES instances: three instances, each 4 cores * 14G, running in Docker
OAP Server: a single instance, 1500M
Agent nodes (that is, JVM instances): about 50

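Troubleshooting

1. Configuration problem

Reading the source code leads to the core code that deletes historical ES data, shown below. For each model (such as Segment and the various Metrics), it first computes a cutoff time from the model's Downsampling setting and the DataTTLConfig. Indexes older than that cutoff must be deleted.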


[Image: screenshot of the deleteHistory source code]

Downsampling is an enum:

public enum Downsampling {
    None(0, ""), Second(1, "second"), Minute(2, "minute"), Hour(3, "hour"), Day(4, "day"), Month(5, "month");

    private final int value;
    private final String name;

    Downsampling(int value, String name) {
        this.value = value;
        this.name = name;
    }

    public int getValue() {
        return value;
    }

    public String getName() {
        return name;
    }
}

DataTTLConfig holds the expiry time for each data type, covering both record and metrics data:

@Setter
@Getter
public class DataTTLConfig {
    private int recordDataTTL;
    private int minuteMetricsDataTTL;
    private int hourMetricsDataTTL;
    private int dayMetricsDataTTL;
    private int monthMetricsDataTTL;
}

Back to the deleteHistory logic. The key part is how the cutoff time timeBefore is computed, and it depends only on the model's Downsampling and the DataTTLConfig.
The implementation of StorageTTL is ElasticsearchStorageTTL, whose only job is to return the TTLCalculator that matches a given Downsampling. Take EsMinuteTTLCalculator as an example of a TTLCalculator implementation: it computes the cutoff from the current time and the minuteMetricsDataTTL setting of DataTTLConfig, in days, while EsHourTTLCalculator uses hourMetricsDataTTL. Each TTLCalculator therefore corresponds to one DataTTLConfig setting.

public class ElasticsearchStorageTTL implements StorageTTL {

    @Override public TTLCalculator calculator(Downsampling downsampling) {
        switch (downsampling) {
            case Month:
                return new MonthTTLCalculator();
            case Hour:
                return new EsHourTTLCalculator();
            case Minute:
                return new EsMinuteTTLCalculator();
            default:
                return new DayTTLCalculator();
        }
    }
}

public class EsMinuteTTLCalculator implements TTLCalculator {
    @Override public long timeBefore(DateTime currentTime, DataTTLConfig dataTTLConfig) {
        return Long.valueOf(currentTime.plusDays(0 - dataTTLConfig.getMinuteMetricsDataTTL()).toString("yyyyMMdd"));
    }
}
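
To make the cutoff concrete, here is a minimal sketch of the same arithmetic. It is not SkyWalking code: it uses java.time instead of Joda-Time so that it runs without dependencies, and the class name, the index names, and the assumption that every index name ends in "-yyyyMMdd" are all hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not SkyWalking source code.
class TtlCutoffSketch {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Same arithmetic as EsMinuteTTLCalculator: today minus ttlDays, rendered as a yyyyMMdd number.
    static long timeBefore(LocalDate today, int ttlDays) {
        return Long.parseLong(today.minusDays(ttlDays).format(DAY));
    }

    // Assumption: each index name ends with "-yyyyMMdd".
    static long daySuffix(String indexName) {
        return Long.parseLong(indexName.substring(indexName.lastIndexOf('-') + 1));
    }

    // Indexes whose day suffix is earlier than the cutoff are the ones to delete.
    static List<String> expired(List<String> indexes, long cutoff) {
        List<String> result = new ArrayList<>();
        for (String index : indexes) {
            if (daySuffix(index) < cutoff) {
                result.add(index);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long cutoff = timeBefore(LocalDate.of(2019, 6, 10), 2);
        List<String> indexes = Arrays.asList(
            "service_cpm_minute-20190607",
            "service_cpm_minute-20190609");
        System.out.println("cutoff = " + cutoff);
        System.out.println("expired = " + expired(indexes, cutoff));
    }
}
```

Because the cutoff is a plain yyyyMMdd number, comparing it with the day suffix of an index is an ordinary numeric comparison.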

This also explains why the recordDataTTL setting has no effect. Record-type models use the Second Downsampling, but as shown above ElasticsearchStorageTTL has no case Second, so Second falls into the default branch and returns DayTTLCalculator. DayTTLCalculator reads dayMetricsDataTTL from dataTTLConfig, so recordDataTTL is never used.

Record-type models use the Second Downsampling.
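
The fall-through can be reproduced in isolation. The sketch below copies only the shape of the switch; the enum and the class are stand-ins rather than the SkyWalking types. The returned strings name the DataTTLConfig field each branch is assumed to read, and the month mapping is a guess from the calculator's name.

```java
// Hypothetical stand-ins for the SkyWalking types; only the switch shape matches the source.
class TtlFallThroughSketch {
    enum Downsampling { None, Second, Minute, Hour, Day, Month }

    // Returns the name of the DataTTLConfig field that the selected calculator would read.
    static String ttlFieldFor(Downsampling downsampling) {
        switch (downsampling) {
            case Month:
                return "monthMetricsDataTTL"; // assumed from the name MonthTTLCalculator
            case Hour:
                return "hourMetricsDataTTL";
            case Minute:
                return "minuteMetricsDataTTL";
            default:
                // Second (used by records), Day and None all end up here.
                return "dayMetricsDataTTL";
        }
    }

    public static void main(String[] args) {
        for (Downsampling d : Downsampling.values()) {
            System.out.println(d + " -> " + ttlFieldFor(d));
        }
    }
}
```

No branch ever returns recordDataTTL, which matches the observed behaviour.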

The remaining question is where DataTTLConfig comes from. The deleteHistory code above reads it from the ConfigService of CoreModule, which means it comes from the core module section of application.yml. However, on startup StorageModuleElasticsearchProvider overwrites the DataTTLConfig in CoreModule with the values from StorageModuleElasticsearchConfig:

org.apache.skywalking.oap.server.storage.plugin.elasticsearch.StorageModuleElasticsearchProvider

private void overrideCoreModuleTTLConfig() {
        ConfigService configService = getManager().find(CoreModule.NAME).provider().getService(ConfigService.class);

        configService.getDataTTLConfig().setRecordDataTTL(config.getRecordDataTTL());
        configService.getDataTTLConfig().setMinuteMetricsDataTTL(config.getMinuteMetricsDataTTL());
        configService.getDataTTLConfig().setHourMetricsDataTTL(config.getHourMetricsDataTTL());
        configService.getDataTTLConfig().setDayMetricsDataTTL(config.getDayMetricsDataTTL());
        configService.getDataTTLConfig().setMonthMetricsDataTTL(config.getMonthMetricsDataTTL());
    }

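Now look at StorageModuleElasticsearchConfig, focusing on the settings related to otherMetricsDataTTL. The code below shows that the author intended minuteMetricsDataTTL, hourMetricsDataTTL, and dayMetricsDataTTL to be set automatically whenever otherMetricsDataTTL is assigned. But at startup the configuration is written directly into the Field through reflection, so setOtherMetricsDataTTL is never invoked. That is why an otherMetricsDataTTL value in the configuration file has no effect and the system keeps using the default of 2.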

@Getter
public class StorageModuleElasticsearchConfig extends ModuleConfig {
    @Setter private int recordDataTTL = 7;
    @Setter private int minuteMetricsDataTTL = 2;
    @Setter private int hourMetricsDataTTL = 2;
    @Setter private int dayMetricsDataTTL = 2;
    private int otherMetricsDataTTL = 0;
    @Setter private int monthMetricsDataTTL = 18;

    public void setOtherMetricsDataTTL(int otherMetricsDataTTL) {
        if (otherMetricsDataTTL > 0) {
            minuteMetricsDataTTL = otherMetricsDataTTL;
            hourMetricsDataTTL = otherMetricsDataTTL;
            dayMetricsDataTTL = otherMetricsDataTTL;
        }
    }
}

The code that assigns the Config values through reflection at startup:

org.apache.skywalking.oap.server.library.module.ModuleDefine

private void copyProperties(ModuleConfig dest, Properties src, String moduleName,
        String providerName) throws IllegalAccessException {
        if (dest == null) {
            return;
        }
        Enumeration<?> propertyNames = src.propertyNames();
        while (propertyNames.hasMoreElements()) {
            String propertyName = (String)propertyNames.nextElement();
            Class<? extends ModuleConfig> destClass = dest.getClass();

            try {
                Field field = getDeclaredField(destClass, propertyName);
                field.setAccessible(true);
                field.set(dest, src.get(propertyName));
            } catch (NoSuchFieldException e) {
                logger.warn(propertyName + " setting is not supported in " + providerName + " provider of " + moduleName + " module");
            }
        }
    }
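
The effect is easy to reproduce outside SkyWalking. The following sketch uses a trimmed-down, hypothetical copy of the config class (only two fields) and compares writing the field through reflection with calling the setter. All class and method names here are made up for the illustration.

```java
import java.lang.reflect.Field;

// Hypothetical illustration; the Config class is a reduced copy of the one shown earlier.
class ReflectionBypassSketch {
    static class Config {
        private int dayMetricsDataTTL = 2;
        private int otherMetricsDataTTL = 0;

        // Only the setter propagates the value to dayMetricsDataTTL.
        public void setOtherMetricsDataTTL(int otherMetricsDataTTL) {
            if (otherMetricsDataTTL > 0) {
                dayMetricsDataTTL = otherMetricsDataTTL;
            }
        }

        public int getDayMetricsDataTTL() { return dayMetricsDataTTL; }
        public int getOtherMetricsDataTTL() { return otherMetricsDataTTL; }
    }

    // Same idea as copyProperties: write the field directly, never call the setter.
    static Config loadViaField(int value) {
        try {
            Config config = new Config();
            Field field = Config.class.getDeclaredField("otherMetricsDataTTL");
            field.setAccessible(true);
            field.set(config, value);
            return config;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // What the author of the config class presumably expected to happen.
    static Config loadViaSetter(int value) {
        Config config = new Config();
        config.setOtherMetricsDataTTL(value);
        return config;
    }

    public static void main(String[] args) {
        System.out.println("via field:  dayMetricsDataTTL = " + loadViaField(45).getDayMetricsDataTTL());
        System.out.println("via setter: dayMetricsDataTTL = " + loadViaSetter(45).getDayMetricsDataTTL());
    }
}
```

With the field write, otherMetricsDataTTL itself holds the configured value, but dayMetricsDataTTL stays at its default, and the default is what the TTL calculators end up reading.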
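Solutions to the configuration problem

Method 1: set minuteMetricsDataTTL, hourMetricsDataTTL, and dayMetricsDataTTL directly in the configuration file instead of relying on the default otherMetricsDataTTL setting. Deletion of record data uses the value of dayMetricsDataTTL.

Method 2: modify the source code to call setOtherMetricsDataTTL explicitly.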


[Image: screenshot of the modified source code]

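2. The index deletion bug

This problem is more obvious. As seen in deleteHistory above, the indexes are looked up by alias, and those judged expired are passed to the deletion logic. The problem lies in deleteIndex. As shown below, it prepends the namespace to the incoming indexName before deleting. But the indexName passed in already contains the namespace, because it was retrieved directly from ES by alias. Prepending the namespace a second time produces a name that matches no index, so the deletion fails.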

public boolean deleteIndex(String indexName) throws IOException {
        indexName = formatIndexName(indexName);
        DeleteIndexRequest request = new DeleteIndexRequest(indexName);
        DeleteIndexResponse response;
        response = client.indices().delete(request);
        logger.debug("delete {} index finished, isAcknowledged: {}", indexName, response.isAcknowledged());
        return response.isAcknowledged();
    }

public String formatIndexName(String indexName) {
        if (StringUtils.isNotEmpty(namespace)) {
            return namespace + "_" + indexName;
        }
        return indexName;
    }
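
A short sketch of the double prefix, using the same logic as formatIndexName but with the namespace passed in as a parameter. The namespace "prod" and the index name are hypothetical values chosen for the illustration.

```java
// Hypothetical illustration of the double-prefix problem; not SkyWalking source code.
class NamespacePrefixSketch {
    // Same logic as formatIndexName, with the namespace passed in explicitly.
    static String formatIndexName(String namespace, String indexName) {
        if (namespace != null && !namespace.isEmpty()) {
            return namespace + "_" + indexName;
        }
        return indexName;
    }

    public static void main(String[] args) {
        String namespace = "prod";               // hypothetical namespace
        String fromEs = "prod_segment-20190601"; // hypothetical full name, as an alias lookup would return it
        // The name is already complete, yet deleteIndex formats it once more.
        System.out.println(formatIndexName(namespace, fromEs));
        // With an empty namespace the name passes through unchanged.
        System.out.println(formatIndexName("", "segment-20190601"));
    }
}
```

This is also why leaving the namespace empty works around the bug: without a namespace, formatIndexName returns the name unchanged.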

The fix is simple: add a deleteIndexWithFullIndexName method, and have the code that deletes expired indexes call deleteIndexWithFullIndexName directly.

public boolean deleteIndex(String indexName) throws IOException {
        String fullIndexName = formatIndexName(indexName);
        return deleteIndexWithFullIndexName(fullIndexName);
    }

    public boolean deleteIndexWithFullIndexName(String fullIndexName) throws IOException {
        DeleteIndexRequest request = new DeleteIndexRequest(fullIndexName);
        DeleteIndexResponse response;
        response = client.indices().delete(request);
        logger.debug("delete {} index finished, isAcknowledged: {}", fullIndexName, response.isAcknowledged());
        return response.isAcknowledged();
    }
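
The difference between the two entry points can be checked against an in-memory stand-in for the cluster. In this sketch a Set of names plays the role of ES, and a missing index simply yields false, whereas a real cluster would report an error instead. The class name, namespace, and index names are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model of the two delete paths; not SkyWalking source code.
class FullIndexNameDeleteSketch {
    private final String namespace;
    private final Set<String> indexes = new TreeSet<>(); // stands in for the ES cluster

    FullIndexNameDeleteSketch(String namespace, List<String> existingIndexes) {
        this.namespace = namespace;
        this.indexes.addAll(existingIndexes);
    }

    String formatIndexName(String indexName) {
        if (namespace != null && !namespace.isEmpty()) {
            return namespace + "_" + indexName;
        }
        return indexName;
    }

    // For callers that hold a logical name without the namespace.
    boolean deleteIndex(String indexName) {
        return deleteIndexWithFullIndexName(formatIndexName(indexName));
    }

    // For callers that hold the complete name, such as names returned by an alias lookup.
    boolean deleteIndexWithFullIndexName(String fullIndexName) {
        return indexes.remove(fullIndexName);
    }

    Set<String> remaining() {
        return indexes;
    }

    public static void main(String[] args) {
        FullIndexNameDeleteSketch client =
            new FullIndexNameDeleteSketch("prod", List.of("prod_segment-20190601"));
        System.out.println(client.deleteIndex("prod_segment-20190601"));                  // prefixes again, finds nothing
        System.out.println(client.deleteIndexWithFullIndexName("prod_segment-20190601")); // uses the name as-is
        System.out.println(client.remaining());
    }
}
```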