Kafka Source Code Analysis (3) The High-Throughput Core: How RecordAccumulator Accumulates Messages

Why does Kafka achieve such high throughput?

Kafka's send path works much like TCP's buffered writes. When a client calls producer.send(msg), the producer's main thread does not immediately hand the message to the network layer for delivery to the Kafka broker. Instead, it places the message into a data structure called RecordAccumulator.

RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey,
    serializedValue, headers, interceptCallback, remainingWaitMs);

Putting the record into the RecordAccumulator is only the first step. The actual network send does not even happen on the calling thread, so the whole send path is organized as an asynchronous pipeline. When the message is eventually written to the network and acknowledged by the broker, the caller is notified through a Future. To keep that asynchronous chain intact, append() returns a RecordAppendResult that carries the record's Future.
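To make the "enqueue now, complete the Future later" pattern concrete, here is a minimal, self-contained sketch in plain Java. It is not the actual Kafka code; the class name AsyncAppendSketch and its methods are made up for illustration, with CompletableFuture standing in for Kafka's FutureRecordMetadata and a background thread standing in for the Sender.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

class AsyncAppendSketch {
    // Pending (message, future) pairs, like the accumulator holding batches
    private final BlockingQueue<Map.Entry<String, CompletableFuture<String>>> pending =
            new LinkedBlockingQueue<>();

    // Like accumulator.append(...): enqueue and return a Future immediately,
    // without touching the network on the calling thread.
    CompletableFuture<String> append(String msg) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.add(new SimpleEntry<>(msg, future));
        return future;
    }

    // Like the Sender thread: drain pending records and complete each Future
    // once the (simulated) broker acknowledges them.
    void startSender() {
        Thread sender = new Thread(() -> {
            try {
                while (true) {
                    Map.Entry<String, CompletableFuture<String>> e = pending.take();
                    e.getValue().complete("acked: " + e.getKey());
                }
            } catch (InterruptedException ignored) { }
        });
        sender.setDaemon(true);
        sender.start();
    }
}
```

The caller holds the Future across the thread boundary, which is exactly why append() must return a RecordAppendResult: it is the only handle linking the user's callback to a record that now lives inside the accumulator.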

Now let's take a closer look at the RecordAccumulator data structure.

As shown in the figure below, RecordAccumulator is essentially a ConcurrentMap&lt;TopicPartition, Deque&lt;ProducerBatch&gt;&gt;: the key is a TopicPartition, and the value is a double-ended queue (Deque) whose elements are ProducerBatch objects.

(Figure: the RecordAccumulator structure)

For example, to send a message for TopicPartition(topic1:0), the logic is roughly: look up the Deque keyed by TopicPartition(topic1:0) (creating one if it does not exist), take the last ProducerBatch from that Deque, and append the message to that batch.
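The lookup-or-create step can be sketched with a tiny stand-in model. This is not the Kafka implementation: AccumulatorSketch is a made-up class where the "partition" is just a String, a "batch" is a bounded List&lt;String&gt;, and BATCH_SIZE stands in for the batch.size config. It shows the computeIfAbsent lookup plus the peekLast/addLast append the article describes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class AccumulatorSketch {
    static final int BATCH_SIZE = 3; // hypothetical per-batch capacity

    // Mirrors ConcurrentMap<TopicPartition, Deque<ProducerBatch>>
    private final ConcurrentMap<String, Deque<List<String>>> batches = new ConcurrentHashMap<>();

    void append(String partition, String msg) {
        // "getOrCreateDeque": create the partition's deque on first use
        Deque<List<String>> dq = batches.computeIfAbsent(partition, k -> new ArrayDeque<>());
        synchronized (dq) {                       // Deque itself is not thread-safe
            List<String> last = dq.peekLast();
            if (last == null || last.size() >= BATCH_SIZE) {
                last = new ArrayList<>();         // like allocating a new ProducerBatch
                dq.addLast(last);
            }
            last.add(msg);                        // append into the last batch
        }
    }

    int batchCount(String partition) {
        Deque<List<String>> dq = batches.get(partition);
        if (dq == null) return 0;
        synchronized (dq) { return dq.size(); }
    }
}
```

Appending four messages to one partition with BATCH_SIZE = 3 yields two batches: the first fills up, then a new one is created and appended at the tail, which is exactly the "take the last batch, or make a new one" flow.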

private RecordAppendResult tryAppend(long timestamp, byte[] key, byte[] value, Header[] headers,
                                        Callback callback, Deque<ProducerBatch> deque) {
    ProducerBatch last = deque.peekLast();
    if (last != null) {
        FutureRecordMetadata future = last.tryAppend(timestamp, key, value, headers, callback, time.milliseconds());
        if (future == null)
            last.closeForRecordAppends();
        else
            return new RecordAppendResult(future, deque.size() > 1 || last.isFull(), false);
    }
    return null;
}

So ProducerBatch is itself a container. As the code below shows, record payloads are appended in order to a MemoryRecordsBuilder (recordsBuilder), while each record's callback and Future are appended in the same order to a List&lt;Thunk&gt; (thunks).

public FutureRecordMetadata tryAppend(long timestamp, byte[] key, byte[] value, Header[] headers, Callback callback, long now) {
    if (!recordsBuilder.hasRoomFor(timestamp, key, value, headers)) {
        return null;
    } else {
        Long checksum = this.recordsBuilder.append(timestamp, key, value, headers);
        this.maxRecordSize = Math.max(this.maxRecordSize, AbstractRecords.estimateSizeInBytesUpperBound(magic(),
                recordsBuilder.compressionType(), key, value, headers));
        this.lastAppendTime = now;
        FutureRecordMetadata future = new FutureRecordMetadata(this.produceFuture, this.recordCount,
                                                                timestamp, checksum,
                                                                key == null ? -1 : key.length,
                                                                value == null ? -1 : value.length);
        // we have to keep every future returned to the users in case the batch needs to be
        // split to several new batches and resent.
        thunks.add(new Thunk(callback, future));
        this.recordCount++;
        return future;
    }
}
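The thunks list matters because one batch carries many records, yet each caller holds its own Future. A minimal stand-in sketch (BatchSketch is a made-up class, not Kafka's ProducerBatch) shows why append order must be preserved: when the broker acks the batch at some base offset, each record's Future is completed with baseOffset plus its position in the batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class BatchSketch {
    // Like ProducerBatch.thunks: one future per record, kept in append order
    private final List<CompletableFuture<Long>> thunks = new ArrayList<>();

    // Like tryAppend(): register the record's future so it can be completed
    // (or re-registered if the batch is split and resent) later.
    CompletableFuture<Long> tryAppend(String record) {
        CompletableFuture<Long> f = new CompletableFuture<>();
        thunks.add(f);
        return f;
    }

    // Like completing the batch after the broker ack: each record's offset is
    // the batch's base offset plus its position within the batch.
    void done(long baseOffset) {
        for (int i = 0; i < thunks.size(); i++)
            thunks.get(i).complete(baseOffset + i);
    }
}
```

This is also why the source comment says every future must be kept: if the batch is later split and resent, the accumulator still needs each original Future to hand back the right per-record result.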

That covers the path into the RecordAccumulator; the next post will look at how records are drained back out of it.

Before wrapping up, a few points worth noting. The outer Map is from the Concurrent family, so at the TopicPartition level it is safe to concurrently put, get, and remove Deques. But when multiple threads target the same TopicPartition, they operate on the same Deque, and Deque itself is not thread-safe, so any addition, removal, or modification of a specific Deque must happen under a lock:

Deque<ProducerBatch> dq = getOrCreateDeque(tp);

synchronized (dq) {
    // Need to check if producer is closed again after grabbing the dequeue lock.
    if (closed)
        throw new KafkaException("Producer closed while send in progress");

    RecordAppendResult appendResult = tryAppend(timestamp, key, value, headers, callback, dq);
    if (appendResult != null) {
        // Somebody else found us a batch, return the one we waited for! Hopefully this doesn't happen often...
        return appendResult;
    }

    MemoryRecordsBuilder recordsBuilder = recordsBuilder(buffer, maxUsableMagic);
    ProducerBatch batch = new ProducerBatch(tp, recordsBuilder, time.milliseconds());
    FutureRecordMetadata future = Utils.notNull(batch.tryAppend(timestamp, key, value, headers, callback, time.milliseconds()));

    dq.addLast(batch);
    incomplete.add(batch);

    // Don't deallocate this buffer in the finally block as it's being used in the record batch
    buffer = null;

    return new RecordAppendResult(future, dq.size() > 1 || batch.isFull(), true);
}
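The effect of that per-deque lock can be demonstrated with a small, self-contained sketch (DequeLockSketch is made up for illustration): several threads append to one shared ArrayDeque, each taking the same synchronized (dq) lock the source uses, so no append is lost even though ArrayDeque itself is not thread-safe.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class DequeLockSketch {
    // Appends perThread messages from each of `threads` threads into one shared
    // deque, locking the deque exactly as RecordAccumulator does.
    static int appendConcurrently(int threads, int perThread) throws InterruptedException {
        Deque<String> dq = new ArrayDeque<>();   // not thread-safe on its own
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            ts[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    synchronized (dq) {          // per-deque lock, as in the source
                        dq.addLast("msg");
                    }
                }
            });
            ts[t].start();
        }
        for (Thread th : ts) th.join();
        synchronized (dq) { return dq.size(); }
    }
}
```

Because the lock is per-deque rather than global, threads sending to different partitions never contend with each other; only senders to the same TopicPartition serialize, which is a big part of why the accumulator scales well across partitions.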