Learn Kafka Source Code with Me: Producer Analysis

My original blog post is at: http://flychao88.iteye.com/blog/2266611

This chapter analyzes the business logic of Kafka's Producer; both the dispatch logic and the load-balancing logic are maintained inside the Producer.

I. Overall architecture of Kafka

(Figure: Kafka's overall architecture; image reposted from the original post)

II. Producer source code analysis

class Producer[K,V](val config: ProducerConfig,
                    private val eventHandler: EventHandler[K,V])  // only for unit testing
  extends Logging {

  private val hasShutdown = new AtomicBoolean(false)

  // the queue used for asynchronous sends
  private val queue = new LinkedBlockingQueue[KeyedMessage[K,V]](config.queueBufferingMaxMessages)

  private var sync: Boolean = true

  // the background thread used for asynchronous sends
  private var producerSendThread: ProducerSendThread[K,V] = null
  private val lock = new Object()

  // config wraps the settings loaded from the properties file as a ProducerConfig;
  // decide whether sends are synchronous or asynchronous, and in the async case
  // start a background send thread
  config.producerType match {
    case "sync" =>
    case "async" =>
      sync = false
      producerSendThread = new ProducerSendThread[K,V]("ProducerSendThread-" + config.clientId,
                                                       queue,
                                                       eventHandler,
                                                       config.queueBufferingMaxMs,
                                                       config.batchNumMessages,
                                                       config.clientId)
      producerSendThread.start()
  }

  private val producerTopicStats = ProducerTopicStatsRegistry.getProducerTopicStats(config.clientId)

  KafkaMetricsReporter.startReporters(config.props)
  AppInfo.registerInfo()

  def this(config: ProducerConfig) =
    this(config,
         new DefaultEventHandler[K,V](config,
                                      Utils.createObject[Partitioner](config.partitionerClass, config.props),
                                      Utils.createObject[Encoder[V]](config.serializerClass, config.props),
                                      Utils.createObject[Encoder[K]](config.keySerializerClass, config.props),
                                      new ProducerPool(config)))

  /**
   * Sends the data, partitioned by key to the topic using either the
   * synchronous or the asynchronous producer
   * @param messages the producer data object that encapsulates the topic, key and message data
   */
  def send(messages: KeyedMessage[K,V]*) {
    lock synchronized {
      if (hasShutdown.get)
        throw new ProducerClosedException
      recordStats(messages)
      sync match {
        case true => eventHandler.handle(messages)
        case false => asyncSend(messages)
      }
    }
  }

  private def recordStats(messages: Seq[KeyedMessage[K,V]]) {
    for (message <- messages) {
      producerTopicStats.getProducerTopicStats(message.topic).messageRate.mark()
      producerTopicStats.getProducerAllTopicsStats.messageRate.mark()
    }
  }

  // asynchronous send path: put the messages on the queue and let the
  // background thread pick them up
  private def asyncSend(messages: Seq[KeyedMessage[K,V]]) {
    for (message <- messages) {
      val added = config.queueEnqueueTimeoutMs match {
        case 0 =>
          queue.offer(message)
        case _ =>
          try {
            config.queueEnqueueTimeoutMs < 0 match {
              case true =>
                queue.put(message)
                true
              case _ =>
                queue.offer(message, config.queueEnqueueTimeoutMs, TimeUnit.MILLISECONDS)
            }
          }
          catch {
            case e: InterruptedException =>
              false
          }
      }
      if(!added) {
        producerTopicStats.getProducerTopicStats(message.topic).droppedMessageRate.mark()
        producerTopicStats.getProducerAllTopicsStats.droppedMessageRate.mark()
        throw new QueueFullException("Event queue is full of unsent messages, could not send event: " + message.toString)
      } else {
        trace("Added to send queue an event: " + message.toString)
        trace("Remaining queue size: " + queue.remainingCapacity)
      }
    }
  }

  /**
   * Close API to close the producer pool connections to all Kafka brokers. Also closes
   * the zookeeper client connection if one exists
   */
  def close() = {
    lock synchronized {
      val canShutdown = hasShutdown.compareAndSet(false, true)
      if(canShutdown) {
        info("Shutting down producer")
        val startTime = System.nanoTime()
        KafkaMetricsGroup.removeAllProducerMetrics(config.clientId)
        if (producerSendThread != null)
          producerSendThread.shutdown
        eventHandler.close
        info("Producer shutdown completed in " + (System.nanoTime() - startTime) / 1000000 + " ms")
      }
    }
  }
}

Notes:

I have added comments to many of the methods in the code above. The constructor first initializes a series of members: the asynchronous message queue (queue), the sync flag marking whether the producer is synchronous, and the asynchronous send thread ProducerSendThread. The heart of the async path is the ProducerSendThread class, which takes data off the queue and has kafka.producer.EventHandler send the messages to the brokers. The amount of code here is small, but it covers a lot of ground: config.producerType decides between synchronous and asynchronous sending, and each mode is backed by its own supporting classes. We will look at both modes in detail below.
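To make the async path concrete, here is a simplified sketch of the core loop of ProducerSendThread, condensed from the 0.8 source (metrics, logging and shutdown details are abbreviated; queueTime and batchSize are the constructor arguments fed from queue.buffering.max.ms and batch.num.messages). It polls the queue and hands the accumulated batch to the EventHandler as soon as the batch is full or the linger time expires:

  private def processEvents() {
    var lastSend = SystemTime.milliseconds
    var events = new ArrayBuffer[KeyedMessage[K,V]]

    // keep draining the queue until the shutdown command arrives
    Stream.continually(queue.poll(scala.math.max(0, (lastSend + queueTime) - SystemTime.milliseconds),
                                  TimeUnit.MILLISECONDS))
          .takeWhile(item => if (item != null) item ne shutdownCommand else true)
          .foreach { currentQueueItem =>
      // a null item means poll() timed out, i.e. queue.buffering.max.ms has elapsed
      val expired = currentQueueItem == null
      if (currentQueueItem != null)
        events += currentQueueItem

      // dispatch once the batch is full or the linger time has expired
      if (events.size >= batchSize || expired) {
        tryToHandle(events)   // wraps handler.handle(events) with error handling
        lastSend = SystemTime.milliseconds
        events = new ArrayBuffer[KeyedMessage[K,V]]
      }
    }
    // flush whatever is left as a final batch
    tryToHandle(events)
  }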

1. Synchronous send

private def dispatchSerializedData(messages: Seq[KeyedMessage[K,Message]]): Seq[KeyedMessage[K, Message]] = {
  // partition the messages and collate them by broker
  val partitionedDataOpt = partitionAndCollate(messages)
  partitionedDataOpt match {
    case Some(partitionedData) =>
      val failedProduceRequests = new ArrayBuffer[KeyedMessage[K,Message]]
      try {
        for ((brokerid, messagesPerBrokerMap) <- partitionedData) {
          if (logger.isTraceEnabled)
            messagesPerBrokerMap.foreach(partitionAndEvent =>
              trace("Handling event for Topic: %s, Broker: %d, Partitions: %s".format(partitionAndEvent._1, brokerid, partitionAndEvent._2)))
          val messageSetPerBroker = groupMessagesToSet(messagesPerBrokerMap)
          val failedTopicPartitions = send(brokerid, messageSetPerBroker)
          failedTopicPartitions.foreach(topicPartition => {
            messagesPerBrokerMap.get(topicPartition) match {
              case Some(data) => failedProduceRequests.appendAll(data)
              case None => // nothing
            }
          })
        }
      } catch {
        case t: Throwable => error("Failed to send messages", t)
      }
      failedProduceRequests
    case None => // all produce requests failed
      messages
  }
}

Notes:

This method conveys two important pieces of information. The first is partitionAndCollate, which works out the topic, partition, and broker for each message; it is important and is analyzed below. The other important method is groupMessagesToSet, which applies the configured compression settings to the data being sent; a sketch follows.
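For reference, groupMessagesToSet looks roughly like the sketch below (condensed from the 0.8 source). It turns each partition's Seq[KeyedMessage] into a ByteBufferMessageSet, applying the codec from compression.codec either to all topics or, when compressed.topics is non-empty, only to the topics listed there:

  private def groupMessagesToSet(messagesPerTopicAndPartition: collection.mutable.Map[TopicAndPartition, Seq[KeyedMessage[K,Message]]]) = {
    messagesPerTopicAndPartition.map { case (topicAndPartition, messages) =>
      val rawMessages = messages.map(_.message)
      val messageSet = config.compressionCodec match {
        // compression disabled globally
        case NoCompressionCodec =>
          new ByteBufferMessageSet(NoCompressionCodec, rawMessages: _*)
        case codec =>
          // an empty compressed.topics list means "compress every topic"
          if (config.compressedTopics.isEmpty || config.compressedTopics.contains(topicAndPartition.topic))
            new ByteBufferMessageSet(codec, rawMessages: _*)
          else
            new ByteBufferMessageSet(NoCompressionCodec, rawMessages: _*)
      }
      (topicAndPartition, messageSet)
    }
  }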

Before we dig into the partitionAndCollate method, let's first look at the following class structure:

TopicMetadata --> PartitionMetadata

case class PartitionMetadata(partitionId: Int,
                             val leader: Option[Broker],
                             replicas: Seq[Broker],
                             isr: Seq[Broker] = Seq.empty,
                             errorCode: Short = ErrorMapping.NoError)

In other words, the topic metadata contains the partition metadata, and each partition's metadata contains the partitionId, the leader (which broker hosts the leader partition), the replicas (which brokers hold the replica partitions), the isr (in-sync replica set), and so on.
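For completeness, the enclosing TopicMetadata class is declared along these lines in the same source (a sketch; serialization helpers omitted):

  case class TopicMetadata(topic: String,
                           partitionsMetadata: Seq[PartitionMetadata],
                           errorCode: Short = ErrorMapping.NoError)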

def partitionAndCollate(messages: Seq[KeyedMessage[K,Message]]): Option[Map[Int, collection.mutable.Map[TopicAndPartition, Seq[KeyedMessage[K,Message]]]]] = {
  val ret = new HashMap[Int, collection.mutable.Map[TopicAndPartition, Seq[KeyedMessage[K,Message]]]]
  try {
    for (message <- messages) {
      // fetch the list of partitions for this topic
      val topicPartitionsList = getPartitionListForTopic(message)
      // hash the partition key to decide which partition the message goes to
      val partitionIndex = getPartition(message.topic, message.partitionKey, topicPartitionsList)
      val brokerPartition = topicPartitionsList(partitionIndex)
      // postpone the failure until the send operation, so that requests for other brokers are handled correctly
      val leaderBrokerId = brokerPartition.leaderBrokerIdOpt.getOrElse(-1)
      var dataPerBroker: HashMap[TopicAndPartition, Seq[KeyedMessage[K,Message]]] = null
      ret.get(leaderBrokerId) match {
        case Some(element) =>
          dataPerBroker = element.asInstanceOf[HashMap[TopicAndPartition, Seq[KeyedMessage[K,Message]]]]
        case None =>
          dataPerBroker = new HashMap[TopicAndPartition, Seq[KeyedMessage[K,Message]]]
          ret.put(leaderBrokerId, dataPerBroker)
      }
      val topicAndPartition = TopicAndPartition(message.topic, brokerPartition.partitionId)
      var dataPerTopicPartition: ArrayBuffer[KeyedMessage[K,Message]] = null
      dataPerBroker.get(topicAndPartition) match {
        case Some(element) =>
          dataPerTopicPartition = element.asInstanceOf[ArrayBuffer[KeyedMessage[K,Message]]]
        case None =>
          dataPerTopicPartition = new ArrayBuffer[KeyedMessage[K,Message]]
          dataPerBroker.put(topicAndPartition, dataPerTopicPartition)
      }
      dataPerTopicPartition.append(message)
    }
    Some(ret)
  } catch {    // Swallow recoverable exceptions and return None so that they can be retried.
    case ute: UnknownTopicOrPartitionException => warn("Failed to collate messages by topic,partition due to: " + ute.getMessage); None
    case lnae: LeaderNotAvailableException => warn("Failed to collate messages by topic,partition due to: " + lnae.getMessage); None
    case oe: Throwable => error("Failed to collate messages by topic, partition due to: " + oe.getMessage); None
  }
}

Notes:

dispatchSerializedData calls partitionAndCollate to group the messages by topic and partition; the messages are collated into dataPerBroker (one map per target broker), and a separate SyncProducer.send then batch-sends each broker's messages. SyncProducer wraps the NIO network operations; a sketch of the per-broker send follows.
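The per-broker send invoked from dispatchSerializedData is, in rough outline, the sketch below (condensed; response parsing and retries are abbreviated). It builds a ProducerRequest for one broker, looks up that broker's cached SyncProducer in the ProducerPool, and returns the topic-partitions that failed:

  private def send(brokerId: Int,
                   messagesPerTopic: collection.mutable.Map[TopicAndPartition, ByteBufferMessageSet]): Seq[TopicAndPartition] = {
    if (brokerId < 0) {
      // no known leader for these partitions; report them all as failed
      messagesPerTopic.keys.toSeq
    } else if (messagesPerTopic.nonEmpty) {
      val producerRequest = new ProducerRequest(correlationId.getAndIncrement, config.clientId,
                                                config.requestRequiredAcks, config.requestTimeoutMs,
                                                messagesPerTopic)
      try {
        // the ProducerPool caches one SyncProducer (a blocking NIO connection) per broker
        val syncProducer = producerPool.getProducer(brokerId)
        val response = syncProducer.send(producerRequest)
        // ... inspect response.status here and collect the partitions that returned an error
        Seq.empty[TopicAndPartition]
      } catch {
        case t: Throwable => messagesPerTopic.keys.toSeq
      }
    } else
      Seq.empty[TopicAndPartition]
  }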

The main job of partitionAndCollate is to find the leaderBrokerId of every partition (i.e. which broker hosts the leader of that partitionId), build a HashMap[Int, collection.mutable.Map[TopicAndPartition, Seq[KeyedMessage[K,Message]]]], group and assemble the messages by brokerId, and so prepare the data for the per-broker SyncProducer sends. The hash-based partition choice is sketched below.
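The "hash the partition key" step above is delegated to the Partitioner configured via partitioner.class. The default implementation is essentially the following (a sketch of kafka.producer.DefaultPartitioner in the 0.8 source); note that when a message carries no partition key, getPartition instead picks a random partition whose leader is available:

  class DefaultPartitioner(props: VerifiableProperties = null) extends Partitioner {
    // map an arbitrary key onto one of the topic's partitions by hashing
    def partition(key: Any, numPartitions: Int): Int =
      Utils.abs(key.hashCode) % numPartitions
  }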

Now let's step into getPartitionListForTopic and see what this method actually does.

private def getPartitionListForTopic(m: KeyedMessage[K,Message]): Seq[PartitionAndLeader] = {
  val topicPartitionsList = brokerPartitionInfo.getBrokerPartitionInfo(m.topic, correlationId.getAndIncrement)
  debug("Broker partitions registered for topic: %s are %s"
    .format(m.topic, topicPartitionsList.map(p => p.partitionId).mkString(",")))
  val totalNumPartitions = topicPartitionsList.length
  if(totalNumPartitions == 0)
    throw new NoBrokersForPartitionException("Partition key = " + m.key)
  topicPartitionsList
}

Notes: There is not much to this method itself; the interesting part is getBrokerPartitionInfo. The KeyedMessage argument is the message we want to send, and the return value is a Seq[PartitionAndLeader].

def getBrokerPartitionInfo(topic: String, correlationId: Int): Seq[PartitionAndLeader] = {
  debug("Getting broker partition info for topic %s".format(topic))
  // check if the cache has metadata for this topic
  val topicMetadata = topicPartitionInfo.get(topic)
  val metadata: TopicMetadata =
    topicMetadata match {
      case Some(m) => m
      case None =>
        // refresh the topic metadata cache
        updateInfo(Set(topic), correlationId)
        val topicMetadata = topicPartitionInfo.get(topic)
        topicMetadata match {
          case Some(m) => m
          case None => throw new KafkaException("Failed to fetch topic metadata for topic: " + topic)
        }
    }
  val partitionMetadata = metadata.partitionsMetadata
  if(partitionMetadata.size == 0) {
    if(metadata.errorCode != ErrorMapping.NoError) {
      throw new KafkaException(ErrorMapping.exceptionFor(metadata.errorCode))
    } else {
      throw new KafkaException("Topic metadata %s has empty partition metadata and no error code".format(metadata))
    }
  }
  partitionMetadata.map { m =>
    m.leader match {
      case Some(leader) =>
        debug("Partition [%s,%d] has leader %d".format(topic, m.partitionId, leader.id))
        new PartitionAndLeader(topic, m.partitionId, Some(leader.id))
      case None =>
        debug("Partition [%s,%d] does not have a leader yet".format(topic, m.partitionId))
        new PartitionAndLeader(topic, m.partitionId, None)
    }
  }.sortWith((s, t) => s.partitionId < t.partitionId)
}

Notes:

This method is important. First look at the topicPartitionInfo object: it is a HashMap[String, TopicMetadata] whose key is the topic name and whose value is that topic's metadata.

The method looks the topic up in this map and pattern-matches on the result: if metadata is cached (Some(m)), it is assigned to metadata and used directly; if not (None), updateInfo connects to the brokers over NIO and refreshes the topic metadata, as sketched below.
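A simplified sketch of that refresh, BrokerPartitionInfo.updateInfo, condensed from the 0.8 source (error handling abbreviated): it fetches fresh TopicMetadata from the brokers and updates both the metadata cache and the producer connection pool:

  def updateInfo(topics: Set[String], correlationId: Int) {
    // issue a TopicMetadataRequest to one of the configured brokers
    val topicMetadataResponse = ClientUtils.fetchTopicMetadata(topics, brokers, producerConfig, correlationId)
    val topicsMetadata = topicMetadataResponse.topicsMetadata
    topicsMetadata.foreach { tmd =>
      if (tmd.errorCode == ErrorMapping.NoError)
        topicPartitionInfo.put(tmd.topic, tmd)   // refresh the cache
      else
        warn("Error while fetching metadata [%s] for topic [%s]".format(tmd, tmd.topic))
    }
    // make sure the pool has a SyncProducer for every broker in the response
    producerPool.updateProducer(topicsMetadata)
  }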

The end-to-end flow is summarized in a flow chart in the original post (image not reproduced here).

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書(shū)系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

  • Spark Streaming在企業(yè)級(jí)使用中,一般會(huì)使用no receiver的方式讀取數(shù)據(jù),對(duì)應(yīng)kafka中的D...
    海納百川_spark閱讀 1,537評(píng)論 3 3
  • Spring Cloud為開(kāi)發(fā)人員提供了快速構(gòu)建分布式系統(tǒng)中一些常見(jiàn)模式的工具(例如配置管理,服務(wù)發(fā)現(xiàn),斷路器,智...
    卡卡羅2017閱讀 136,534評(píng)論 19 139
  • kafka的定義:是一個(gè)分布式消息系統(tǒng),由LinkedIn使用Scala編寫(xiě),用作LinkedIn的活動(dòng)流(Act...
    時(shí)待吾閱讀 5,537評(píng)論 1 15
  • Kafka入門(mén)經(jīng)典教程-Kafka-about云開(kāi)發(fā) http://www.aboutyun.com/threa...
    葡萄喃喃囈語(yǔ)閱讀 10,981評(píng)論 4 54
  • 本文轉(zhuǎn)載自http://dataunion.org/?p=9307 背景介紹Kafka簡(jiǎn)介Kafka是一種分布式的...
    Bottle丶Fish閱讀 5,583評(píng)論 0 34

友情鏈接更多精彩內(nèi)容