Hystrix指標窗口實現(xiàn)原理

一、引子

Hystrix是一個熔斷中間件,能夠?qū)崿F(xiàn)fast-fail并走備用方案。Hystrix基于滑動窗口判定服務(wù)失敗占比選擇性熔斷?;瑒哟翱诘膶崿F(xiàn)方案有很多種,指標計數(shù)也有很多種實現(xiàn)常見的就是AtomicInteger進行原子增減維護計數(shù),具體的方案就不探討了。

Hystrix是基于Rxjava去實現(xiàn)的,那么如何利用RxJava實現(xiàn)指標的匯聚和滑動窗口實現(xiàn)呢?當然本篇不是作為教程去介紹RxJava的使用姿勢,本篇文章主要解說Hystrix是什么一個思路完成這項功能。

二、指標數(shù)據(jù)上傳

看HystrixCommand執(zhí)行的主入口

public Observable<R> toObservable() {
    final AbstractCommand<R> _cmd = this;

    final Action0 terminateCommandCleanup = new Action0() {

        @Override
        public void call() {
            if (_cmd.commandState.compareAndSet(CommandState.OBSERVABLE_CHAIN_CREATED, CommandState.TERMINAL)) {
                handleCommandEnd(false); //user code never ran
            } else if (_cmd.commandState.compareAndSet(CommandState.USER_CODE_EXECUTED, CommandState.TERMINAL)) {
                handleCommandEnd(true); //user code did run
            }
        }
    };

    //mark the command as CANCELLED and store the latency (in addition to standard cleanup)
    final Action0 unsubscribeCommandCleanup = new Action0() {
        @Override
        public void call() {
            if (_cmd.commandState.compareAndSet(CommandState.OBSERVABLE_CHAIN_CREATED, CommandState.UNSUBSCRIBED)) {
                .......省略干擾代碼...........
                handleCommandEnd(false); //user code never ran
            } else if (_cmd.commandState.compareAndSet(CommandState.USER_CODE_EXECUTED, CommandState.UNSUBSCRIBED)) {
                .......省略干擾代碼...........
                handleCommandEnd(true); //user code did run
            }
        }
    };

   .......省略干擾代碼...........

    return Observable.defer(new Func0<Observable<R>>() {

    .......省略干擾代碼...........

            return afterCache
                    .doOnTerminate(terminateCommandCleanup) 
                    .doOnUnsubscribe(unsubscribeCommandCleanup) 
                    .doOnCompleted(fireOnCompletedHook);
        }
});

我們的主入口Observable當doOnTerminatedoOnUnsubscribe的時候觸發(fā) handleCommandEnd 方法,從字面意思就是當command執(zhí)行結(jié)束處理一些事情。

private void handleCommandEnd(boolean commandExecutionStarted) {
    ........省略干擾代碼..........
    executionResult = executionResult.markUserThreadCompletion((int) userThreadLatency);
    if (executionResultAtTimeOfCancellation == null) {
        metrics.markCommandDone(executionResult, commandKey, threadPoolKey, commandExecutionStarted);
    } else {
        metrics.markCommandDone(executionResultAtTimeOfCancellation, commandKey, threadPoolKey, commandExecutionStarted);
    }
    ........省略干擾代碼..........
}

注意看 metrics.markCommandDone,調(diào)用了HystrixCommandMetrics的markCommandDone方法,把一個executionResult傳入了進來。ExecutionResult 這是個什么鬼呢?
我們截取部分代碼瀏覽下

public class ExecutionResult {
    private final EventCounts eventCounts;
    private final Exception failedExecutionException;
    private final Exception executionException;
    private final long startTimestamp;
    private final int executionLatency; //time spent in run() method
    private final int userThreadLatency; //time elapsed between caller thread submitting request and response being visible to it
    private final boolean executionOccurred;
    private final boolean isExecutedInThread;
    private final HystrixCollapserKey collapserKey;

    private static final HystrixEventType[] ALL_EVENT_TYPES = HystrixEventType.values();
    private static final int NUM_EVENT_TYPES = ALL_EVENT_TYPES.length;
    private static final BitSet EXCEPTION_PRODUCING_EVENTS = new BitSet(NUM_EVENT_TYPES);
    private static final BitSet TERMINAL_EVENTS = new BitSet(NUM_EVENT_TYPES);

以大家聰慧的頭腦應該能夠猜測到這個類就是當前HystrixCommand的 執(zhí)行結(jié)果記錄,只不過這個結(jié)果不僅僅是結(jié)果,也包含了各種狀態(tài)以及出現(xiàn)的異常。它的身影在Hystrix執(zhí)行原理里講的各Observable里出現(xiàn),跟著HystrixCommand整個生命周期。

回到上面講,當時command執(zhí)行完畢后,調(diào)用了HystrixCommandMetrics的markCommandDone方法

void markCommandDone(ExecutionResult executionResult, HystrixCommandKey commandKey, HystrixThreadPoolKey threadPoolKey, boolean executionStarted) {
    HystrixThreadEventStream.getInstance().executionDone(executionResult, commandKey, threadPoolKey);
    if (executionStarted) {
        concurrentExecutionCount.decrementAndGet();
    }
}

最終調(diào)用量HystrixThreadEventStream. executionDone方法的HystrixThreadEventStream是ThreadLocal方式,和當前線程綁定

//HystrixThreadEventStream.threadLocalStreams
private static final ThreadLocal<HystrixThreadEventStream> threadLocalStreams = new ThreadLocal<HystrixThreadEventStream>() {
    @Override
    protected HystrixThreadEventStream initialValue() {
        return new HystrixThreadEventStream(Thread.currentThread());
    }
};

executionDone代碼如下

public void executionDone(ExecutionResult executionResult, HystrixCommandKey commandKey, HystrixThreadPoolKey threadPoolKey) {
    HystrixCommandCompletion event = HystrixCommandCompletion.from(executionResult, commandKey, threadPoolKey);
    writeOnlyCommandCompletionSubject.onNext(event);
}

這里根據(jù) executionResult, threadpoolkey,comandKey,生成 了一個HystrixCommandCompletion然后通過writeOnlyCommandCompletionSubject寫入,writeOnlyCommandCompletionSubject整個東西,我們等會再看?,F(xiàn)在思考下HystrixCommandCompletion是什么?HystrixCommandCompletion包含了 ExecutionResultHystrixRequestContext,它是一種HystrixEvent,標識著command執(zhí)行完成的一個事件,該事件是當前這個點HystrixCommand的請求信息,執(zhí)行結(jié)果,狀態(tài)等數(shù)據(jù)的載體。


從上面類圖可以看到不僅僅HystrixCommandCompletion一種還有其它的Event,這里就不一一介紹了。

writeOnlyCommandCompletionSubject onNext的時候會觸發(fā) writeCommandCompletionsToShardedStreams執(zhí)行里面的call()方法。

  private static final Action1<HystrixCommandCompletion> writeCommandCompletionsToShardedStreams = new Action1<HystrixCommandCompletion>() {
    @Override
    public void call(HystrixCommandCompletion commandCompletion) {
        HystrixCommandCompletionStream commandStream = HystrixCommandCompletionStream.getInstance(commandCompletion.getCommandKey());
        commandStream.write(commandCompletion);

        if (commandCompletion.isExecutedInThread() || commandCompletion.isResponseThreadPoolRejected()) {
            HystrixThreadPoolCompletionStream threadPoolStream = HystrixThreadPoolCompletionStream.getInstance(commandCompletion.getThreadPoolKey());
            threadPoolStream.write(commandCompletion);
        }
    }
};

這個方法的意思是,會把HystrixCommandCompletion通過HystrixCommandCompletionStream寫入,如果當前command使用的是線程池隔離策略的話 會通過 HystrixThreadPoolCompletionStream再寫一遍。HystrixCommandCompletionStreamHystrixThreadPoolCompletionStream他們兩個概念類似,我們拿著前者解釋,這個是個什么東西。
HystrixCommandCompletionStream 以commandKey為key,維護在內(nèi)存中,調(diào)用它的write的方法實則是調(diào)用內(nèi)部屬性 writeOnlySubject的方法,writeOnlySubject是一個Subject(RxJava的東西),通過SerializedSubject保證其寫入的順序性,調(diào)用其share()方法獲得一個Observable也就是readOnlyStream,讓外界能夠讀這個Subject的數(shù)據(jù)??偨Y(jié)下Subject是連接兩個Observable之間的橋梁,它有兩個泛型元素標識著進出數(shù)據(jù)類型,全部都是HystrixCommandCompletion類型

HystrixCommandCompletionStream(final HystrixCommandKey commandKey) {
        this.commandKey = commandKey;

        this.writeOnlySubject = new SerializedSubject<HystrixCommandCompletion, HystrixCommandCompletion>(PublishSubject.<HystrixCommandCompletion>create());
        this.readOnlyStream = writeOnlySubject.share();
    }

我們從源頭開始梳理,明白了這個HystrixCommandCompletion數(shù)據(jù)流是如何寫入的(其它類型的的思路一致,就不一一解釋了),那它是如何被搜集起來呢?

三、指標數(shù)據(jù)搜集

追溯至AbstractCommand初始化

protected AbstractCommand(HystrixCommandGroupKey group, HystrixCommandKey key, HystrixThreadPoolKey threadPoolKey, HystrixCircuitBreaker circuitBreaker, HystrixThreadPool threadPool,
        HystrixCommandProperties.Setter commandPropertiesDefaults, HystrixThreadPoolProperties.Setter threadPoolPropertiesDefaults,
        HystrixCommandMetrics metrics, TryableSemaphore fallbackSemaphore, TryableSemaphore executionSemaphore,
        HystrixPropertiesStrategy propertiesStrategy, HystrixCommandExecutionHook executionHook) {

    ........省略代碼........
    this.metrics = initMetrics(metrics, this.commandGroup, this.threadPoolKey, this.commandKey, this.properties);
    ........省略代碼........
}

初始化command指標

HystrixCommandMetrics(final HystrixCommandKey key, HystrixCommandGroupKey commandGroup, HystrixThreadPoolKey threadPoolKey, HystrixCommandProperties properties, HystrixEventNotifier eventNotifier) {
    super(null);
    this.key = key;
    this.group = commandGroup;
    this.threadPoolKey = threadPoolKey;
    this.properties = properties;

    healthCountsStream = HealthCountsStream.getInstance(key, properties);
    rollingCommandEventCounterStream = RollingCommandEventCounterStream.getInstance(key, properties);
    cumulativeCommandEventCounterStream = CumulativeCommandEventCounterStream.getInstance(key, properties);

    rollingCommandLatencyDistributionStream = RollingCommandLatencyDistributionStream.getInstance(key, properties);
    rollingCommandUserLatencyDistributionStream = RollingCommandUserLatencyDistributionStream.getInstance(key, properties);
    rollingCommandMaxConcurrencyStream = RollingCommandMaxConcurrencyStream.getInstance(key, properties);
}

有很多各種 XXXStream.getInstance(),這些Stream就是針對各類用途進行指標搜集,統(tǒng)計的具體實現(xiàn),下面可以看下他們的UML類圖

Hystrix幾個別Stream類圖(并非所有子類)

BucketedCounterStream實現(xiàn)了基本的桶計數(shù)器,BucketedCumulativeCounterStream基于父類實現(xiàn)了累計計數(shù),BucketedRollingCounterStream基于父類實現(xiàn)了滑動窗口計數(shù)。兩者的子類就是對特定指標的具體實現(xiàn)。

接下來分兩塊累計計數(shù)和滑動窗口計數(shù),挑選其對應的CumulativeCommandEventCounterStream和HealthCountsStream進行詳細說明。

3.1、BucketedCounterStream 基本桶的實現(xiàn)
數(shù)據(jù)采集示意圖
protected BucketedCounterStream(final HystrixEventStream<Event> inputEventStream, final int numBuckets, final int bucketSizeInMs,
                                    final Func2<Bucket, Event, Bucket> appendRawEventToBucket) {
    this.numBuckets = numBuckets;
    this.reduceBucketToSummary = new Func1<Observable<Event>, Observable<Bucket>>() {
        @Override
        public Observable<Bucket> call(Observable<Event> eventBucket) {
            return eventBucket.reduce(getEmptyBucketSummary(), appendRawEventToBucket);
        }
    };

    final List<Bucket> emptyEventCountsToStart = new ArrayList<Bucket>();
    for (int i = 0; i < numBuckets; i++) {
        emptyEventCountsToStart.add(getEmptyBucketSummary());
    }

    this.bucketedStream = Observable.defer(new Func0<Observable<Bucket>>() {
        @Override
        public Observable<Bucket> call() {
            return inputEventStream
                    .observe()
                    .window(bucketSizeInMs, TimeUnit.MILLISECONDS)
                    .flatMap(reduceBucketToSummary)                
                    .startWith(emptyEventCountsToStart);   
        }
    });
}

這里父類的構(gòu)造方法主要成三個部分分別是
I. reduceBucketToSummary 每個桶如何計算聚合的數(shù)據(jù)

appendRawEventToBucket的實現(xiàn)由其子類決定,不過大同小異,我們自行拔下代碼看下HealthCountsStream, 可以看到他用的是HystrixCommandMetrics.appendEventToBucket

public static final Func2<long[], HystrixCommandCompletion, long[]> appendEventToBucket = new Func2<long[], HystrixCommandCompletion, long[]>() {
        @Override
        public long[] call(long[] initialCountArray, HystrixCommandCompletion execution) {
            ExecutionResult.EventCounts eventCounts = execution.getEventCounts();
            for (HystrixEventType eventType: ALL_EVENT_TYPES) {
                switch (eventType) {
                    case EXCEPTION_THROWN: break; //this is just a sum of other anyway - don't do the work here
                    default:
                        initialCountArray[eventType.ordinal()] += eventCounts.getCount(eventType);
                        break;
                }
            }
            return initialCountArray;
        }
    };
}

這個方法就是將一個桶時長內(nèi)的數(shù)據(jù)進行累計計數(shù)相加。initialCountArray可以看出一個桶內(nèi)前面的n個數(shù)據(jù)流的計算結(jié)果,數(shù)組的下標就是HystrixEventType 枚舉里事件的下標值。

II. emptyEventCountsToStart 第一個桶的定義,裝逼點叫創(chuàng)世桶

III. window窗口的定義,這里第一個參數(shù)就是每個桶的時長,第二個參數(shù)時間的單位。利用RxJava的window幫我們做聚合數(shù)據(jù)。

.window(bucketSizeInMs, TimeUnit.MILLISECONDS)

Bucket 時長如何計算
每個桶的時長如何得出的?這個也是基于我們的配置得出,拿HealthCountsStream舉例子。
metrics.rollingStats.timeInMilliseconds 滑動窗口時長 默認10000ms
metrics.healthSnapshot.intervalInMilliseconds 檢測健康狀態(tài)的時間片,默認500ms 在這里對應一個bucket的時長

滑動窗口內(nèi)桶的個數(shù) = 滑動窗口時長 / bucket時長

而 CumulativeCommandEventCounterStream
metrics.rollingStats.timeInMilliseconds 滑動窗口時長 默認10000ms
metrics.rollingStats.numBuckets 滑動窗口要切的桶個數(shù)

bucket時長 = 滑動窗口時長 / 桶個數(shù)

不同職能的 XXXStream對應的算法和對應的配置也不一樣,不過都一個套路,就不一一去展示了。

inputEventStream
inputEventStream 可以認為是窗口采集的數(shù)據(jù)流,這個數(shù)據(jù)流由其子類去傳遞,大致看了下

//HealthCountsStream
private HealthCountsStream(final HystrixCommandKey commandKey, final int numBuckets, final int bucketSizeInMs,
                               Func2<long[], HystrixCommandCompletion, long[]> reduceCommandCompletion) {
    super(HystrixCommandCompletionStream.getInstance(commandKey), numBuckets, bucketSizeInMs, reduceCommandCompletion, healthCheckAccumulator);
}

//RollingThreadPoolEventCounterStream
private RollingThreadPoolEventCounterStream(HystrixThreadPoolKey threadPoolKey, int numCounterBuckets, int counterBucketSizeInMs,
                                                Func2<long[], HystrixCommandCompletion, long[]> reduceCommandCompletion,
                                                Func2<long[], long[], long[]> reduceBucket) {
    super(HystrixThreadPoolCompletionStream.getInstance(threadPoolKey), numCounterBuckets, counterBucketSizeInMs, reduceCommandCompletion, reduceBucket);
}

我們發(fā)現(xiàn)這個 inputEventStream,其實就是 HystrixCommandCompletionStream、HystrixThreadPoolCompletionStream或者其它的,我們挑其中HystrixCommandCompletionStream看下,這個就是上面第二部分指標數(shù)據(jù)上傳里講的寫數(shù)據(jù)那個stream,inputEventStream.observe()也就是 HystrixCommandCompletionStream的 readOnlyStreamSubject的只讀Observable。(這里如果沒明白可以回到第二點看下結(jié)尾的部分)

3.2、累計計數(shù)器之CumulativeCommandEventCounterStream

先看下累計計數(shù)器的父類BucketedCumulativeCounterStream

protected BucketedCumulativeCounterStream(HystrixEventStream<Event> stream, int numBuckets, int bucketSizeInMs,
                                              Func2<Bucket, Event, Bucket> reduceCommandCompletion,
                                              Func2<Output, Bucket, Output> reduceBucket) {
    super(stream, numBuckets, bucketSizeInMs, reduceCommandCompletion);

    this.sourceStream = bucketedStream
            .scan(getEmptyOutputValue(), reduceBucket)
            .skip(numBuckets)
            ........省略代碼........
            
}

bucketedStream就是3.1里的數(shù)據(jù)匯聚后的一個一個桶流,這里執(zhí)行了scan方法,scan方法的意思就是會將當前窗口內(nèi)已經(jīng)提交的數(shù)據(jù)流進行按照順序進行遍歷并執(zhí)行指定的function邏輯,scan里有兩個參數(shù)第一個參數(shù)表示上一次執(zhí)行function的結(jié)果,第二個參數(shù)就是每次遍歷要執(zhí)行的function,scan完畢后skip numBuckets個bucket,可以認為丟棄掉已經(jīng)計算過的bucket。

scan里的function是如何實現(xiàn)呢?它也是實現(xiàn)累計計數(shù)的關(guān)鍵,由子類實現(xiàn),本小節(jié)也就是CumulativeCommandEventCounterStream來實現(xiàn)

CumulativeCommandEventCounterStream newStream = new CumulativeCommandEventCounterStream(commandKey, numBuckets, bucketSizeInMs,HystrixCommandMetrics.appendEventToBucket, HystrixCommandMetrics.bucketAggregator);

發(fā)現(xiàn)調(diào)用的是 HystrixCommandMetrics.bucketAggregator,我們看下其函數(shù)體

public static final Func2<long[], long[], long[]> bucketAggregator = new Func2<long[], long[], long[]>() {
    @Override
    public long[] call(long[] cumulativeEvents, long[] bucketEventCounts) {
        for (HystrixEventType eventType: ALL_EVENT_TYPES) {
            switch (eventType) {
                case EXCEPTION_THROWN:
                    for (HystrixEventType exceptionEventType: HystrixEventType.EXCEPTION_PRODUCING_EVENT_TYPES) {
                        cumulativeEvents[eventType.ordinal()] += bucketEventCounts[exceptionEventType.ordinal()];
                    }
                    break;
                default:
                    cumulativeEvents[eventType.ordinal()] += bucketEventCounts[eventType.ordinal()];
                    break;
            }
        }
        return cumulativeEvents;
    }
};

call() 方法有兩個參數(shù)第一個參數(shù)指的之前的計算結(jié)果,第二個參數(shù)指的當前桶內(nèi)的計數(shù),方法體不難理解,就是對各個時間的count計數(shù)累加。

如此,一個command的計數(shù)就實現(xiàn)了,其它累計計數(shù)也雷同。

3.3、滑動窗口之HealthCountsStream

直接父類代碼

protected BucketedRollingCounterStream(HystrixEventStream<Event> stream, final int numBuckets, int bucketSizeInMs,
                                           final Func2<Bucket, Event, Bucket> appendRawEventToBucket,
                                           final Func2<Output, Bucket, Output> reduceBucket) {
    super(stream, numBuckets, bucketSizeInMs, appendRawEventToBucket);
    Func1<Observable<Bucket>, Observable<Output>> reduceWindowToSummary = new Func1<Observable<Bucket>, Observable<Output>>() {
        @Override
        public Observable<Output> call(Observable<Bucket> window) {
            return window.scan(getEmptyOutputValue(), reduceBucket).skip(numBuckets);
        }
    };
    this.sourceStream = bucketedStream      
            .window(numBuckets, 1)          
            .flatMap(reduceWindowToSummary) 
            ........省略代碼........
}

依然像累計計數(shù)器一樣對父級的桶流數(shù)據(jù)進行操作,這里用的是window(),第一個參數(shù)表示桶的個數(shù),第二個參數(shù)表示一次移動的個數(shù)。這里numBuckets就是我們的滑動窗口桶個數(shù)

滑動窗口

第一排我們可以認為是移動前的滑動窗口的數(shù)據(jù),在執(zhí)行完 flatMap里的function之后,滑動窗口向前移動一個桶位,那么 23 5 2 0 這個桶就被丟棄了,然后新進了最新的桶 45 6 2 0。
那么每次滑動窗口內(nèi)的數(shù)據(jù)是如何被處理呢?就是flatMap里的function做的,reduceWindowToSummary 最終被具體的子類stream實現(xiàn),我們就研究下HealthCountsStream

private static final Func2<HystrixCommandMetrics.HealthCounts, long[], HystrixCommandMetrics.HealthCounts> healthCheckAccumulator = new Func2<HystrixCommandMetrics.HealthCounts, long[], HystrixCommandMetrics.HealthCounts>() {
    @Override
    public HystrixCommandMetrics.HealthCounts call(HystrixCommandMetrics.HealthCounts healthCounts, long[] bucketEventCounts) {
        return healthCounts.plus(bucketEventCounts);
    }
};

//HystrixCommandMetrics.HealthCounts#plus
public HealthCounts plus(long[] eventTypeCounts) {
    long updatedTotalCount = totalCount;
    long updatedErrorCount = errorCount;

    long successCount = eventTypeCounts[HystrixEventType.SUCCESS.ordinal()];
    long failureCount = eventTypeCounts[HystrixEventType.FAILURE.ordinal()];
    long timeoutCount = eventTypeCounts[HystrixEventType.TIMEOUT.ordinal()];
    long threadPoolRejectedCount = eventTypeCounts[HystrixEventType.THREAD_POOL_REJECTED.ordinal()];
    long semaphoreRejectedCount = eventTypeCounts[HystrixEventType.SEMAPHORE_REJECTED.ordinal()];

    updatedTotalCount += (successCount + failureCount + timeoutCount + threadPoolRejectedCount + semaphoreRejectedCount);
    updatedErrorCount += (failureCount + timeoutCount + threadPoolRejectedCount + semaphoreRejectedCount);
    return new HealthCounts(updatedTotalCount, updatedErrorCount);
}

方法的實現(xiàn)也顯而易見,統(tǒng)計了當前滑動窗口內(nèi)成功數(shù)、失敗數(shù)、線程拒絕數(shù),超時數(shù).....

該stream的職責就是探測服務(wù)的可用性,也是Hystrix熔斷器是否生效依賴的數(shù)據(jù)源。

四、回顧

Hystrix的滑動窗口設(shè)計相對于其它可能稍微偏難理解些,其主要原因還是因為我們對RxJava的了解不夠,不過這不重要,只要耐心的多看幾遍就沒有什么問題。

本篇主要從指標數(shù)據(jù)上報到指標數(shù)據(jù)收集來逐步解開Hystrix指標搜集的神秘面紗。最后借用一大牛的圖匯總下本篇的內(nèi)容


參考文檔
官方文檔-How it works
官方文檔-configuration
Hystrix 1.5 滑動窗口實現(xiàn)原理總結(jié)


系列文章推薦
Hystrix熔斷框架介紹
Hystrix常用功能介紹
Hystrix執(zhí)行原理
Hystrix熔斷器執(zhí)行機制
Hystrix超時實現(xiàn)機制

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時請結(jié)合常識與多方信息審慎甄別。
平臺聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點,簡書系信息發(fā)布平臺,僅提供信息存儲服務(wù)。

友情鏈接更多精彩內(nèi)容