聊聊kafka consumer offset lag increase異常

本文主要解析一下遇到的一個(gè)kafka consumer offset lag不斷增大的異常。

查看consumer消費(fèi)情況

Group       Topic        Pid    Offset     logSize         Lag             Owner
demo-group demo-topic     0    9678273         9858394         180121          xxx-service-dpqpc-1510557406684-e2171bd6-0
demo-group demo-topic     1    9689443         9873522         184079          xxx-service-dpqpc-1510557406684-e2171bd6-1
demo-group demo-topic     2    9676875         9855874         178999          xxx-service-q7vch-1510557399475-b1d7d22c-0
demo-group demo-topic     3    9683393         9864518         181125          xxx-service-q7vch-1510557399475-b1d7d22c-1

發(fā)現(xiàn)消費(fèi)者的offset與logSize差距太大,lag值都過(guò)10w了。

正常的情況

Group           Topic         Pid   Offset          logSize         Lag             Owner
demo-group      demo-topic    0     9860587         9860587         0               demo-group_tomcat2-1512984437115-fc1ee57b-0
demo-group      demo-topic    1     9875814         9875814         0               demo-group_tomcat2-1512984437115-fc1ee57b-0
demo-group      demo-topic    2     9858213         9858214         1               demo-group_tomcat2-1512984437115-fc1ee57b-1
demo-group      demo-topic    3     9866744         9866744         0               demo-group_tomcat2-1512984437115-fc1ee57b-2

像這種lag差距比較少的,是正常的。

查看topic的partition

    Topic:demo-topic    PartitionCount:4    ReplicationFactor:2 Configs:
    Topic: demo-topic   Partition: 0    Leader: 3   Replicas: 3,4   Isr: 4,3
    Topic: demo-topic   Partition: 1    Leader: 4   Replicas: 4,1   Isr: 1,4
    Topic: demo-topic   Partition: 2    Leader: 1   Replicas: 1,2   Isr: 1,2
    Topic: demo-topic   Partition: 3    Leader: 2   Replicas: 2,3   Isr: 2,3

topic是4個(gè)分區(qū),因此4個(gè)consumer來(lái)消費(fèi)是正常的。
問(wèn)題可能是消費(fèi)者消費(fèi)速度太慢,或者消費(fèi)者消費(fèi)異常。

排查

jstack -l pid

2017-12-27 04:06:23
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.66-b17 mixed mode):

"Attach Listener" #12286 daemon prio=9 os_prio=0 tid=0x00007f2920001000 nid=0x3087 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"ConsumerFetcherThread-xxx-service-dpqpc-1510557406684-e2171bd6-0-3" #9263 prio=5 os_prio=0 tid=0x00007f287400d800 nid=0x2440 waiting on condition [0x00007f285e6eb000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000007048874b0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:350)
    at kafka.consumer.PartitionTopicInfo.enqueue(PartitionTopicInfo.scala:60)
    at kafka.consumer.ConsumerFetcherThread.processPartitionData(ConsumerFetcherThread.scala:49)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(AbstractFetcherThread.scala:128)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(AbstractFetcherThread.scala:109)
    at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply$mcV$sp(AbstractFetcherThread.scala:109)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(AbstractFetcherThread.scala:109)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(AbstractFetcherThread.scala:109)
    at kafka.utils.Utils$.inLock(Utils.scala:535)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:108)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:86)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)

"ConsumerFetcherThread-xxx-service-dpqpc-1510557406684-e2171bd6-0-4" #9262 prio=5 os_prio=0 tid=0x00007f28740c2800 nid=0x243f waiting on condition [0x00007f291950d000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000007048086d8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:350)
    at kafka.consumer.PartitionTopicInfo.enqueue(PartitionTopicInfo.scala:60)
    at kafka.consumer.ConsumerFetcherThread.processPartitionData(ConsumerFetcherThread.scala:49)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(AbstractFetcherThread.scala:128)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(AbstractFetcherThread.scala:109)
    at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply$mcV$sp(AbstractFetcherThread.scala:109)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(AbstractFetcherThread.scala:109)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(AbstractFetcherThread.scala:109)
    at kafka.utils.Utils$.inLock(Utils.scala:535)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:108)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:86)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)

"xxx-service-dpqpc-1510557406684-e2171bd6-leader-finder-thread" #9261 prio=5 os_prio=0 tid=0x0000000002302800 nid=0x243e waiting on condition [0x00007f28bd1df000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x0000000703d06518> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:61)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)

"consume-2" #62 prio=5 os_prio=0 tid=0x00007f28f8e86000 nid=0x51 waiting on condition [0x00007f28bd3e1000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x000000070440cd38> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

"consume-1" #61 prio=5 os_prio=0 tid=0x00007f28f8e84800 nid=0x50 waiting on condition [0x00007f28bd4e2000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x000000070440cd38> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

"xxx-service-dpqpc-1510557406684-e2171bd6_watcher_executor" #59 prio=5 os_prio=0 tid=0x00007f28fb685800 nid=0x4e waiting on condition [0x00007f28bd8e4000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000007048878d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
    at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(ZookeeperConsumerConnector.scala:544)

上面consume-1以及consume-2是具體的消費(fèi)kafka的業(yè)務(wù)線(xiàn)程

error log

2017-12-16 12:53:34.257  INFO 7 --- [0.2.84.118:2181] k.consumer.ZookeeperConsumerConnector    : [xxx-service-q7vch-1510557399475-b1d7d22c], begin rebalancing consumer xxx-service-q7vch-1510557399475-b1d7d22c try #1
2017-12-16 12:53:34.348  INFO 7 --- [0.2.84.118:2181] kafka.consumer.ConsumerFetcherManager    : [ConsumerFetcherManager-1510557399586] Stopping leader finder thread
2017-12-16 12:53:34.348  INFO 7 --- [0.2.84.118:2181] kafka.consumer.ConsumerFetcherManager    : [ConsumerFetcherManager-1510557399586] Stopping all fetchers
2017-12-16 12:53:34.348  INFO 7 --- [0.2.84.118:2181] kafka.consumer.ConsumerFetcherManager    : [ConsumerFetcherManager-1510557399586] All connections stopped
2017-12-16 12:53:34.348  INFO 7 --- [0.2.84.118:2181] k.consumer.ZookeeperConsumerConnector    : [xxx-service-q7vch-1510557399475-b1d7d22c], Cleared all relevant queues for this fetcher
2017-12-16 12:53:34.348  INFO 7 --- [0.2.84.118:2181] k.consumer.ZookeeperConsumerConnector    : [xxx-service-q7vch-1510557399475-b1d7d22c], Cleared the data chunks in all the consumer message iterators
2017-12-16 12:53:34.348  INFO 7 --- [0.2.84.118:2181] k.consumer.ZookeeperConsumerConnector    : [xxx-service-q7vch-1510557399475-b1d7d22c], Committing all offsets after clearing the fetcher queues

日志幾乎沒(méi)有消費(fèi)到消息的痕跡,但是lag確又有那么多。

一開(kāi)始看異常日志,找到這個(gè),加上上面的jstack,看到ConsumerFetcherThread一直blocking在PartitionTopicInfo.enqueue,有點(diǎn)懷疑是rebalance引起的死鎖或阻塞。之前jstack忘記加-l,無(wú)法看到死鎖信息。網(wǎng)上查了一下,看到ConsumerFetcherThread deadlock?有提到類(lèi)似的問(wèn)題,不過(guò)看是14年的帖子的,kafka0.8.2.2版本應(yīng)該是有修復(fù)了才對(duì)。緊接著看到

The fetchers are blocked on the queue since it is full, is your consumer

iterator stopped and hence not getting more data from it?

有點(diǎn)開(kāi)始懷疑是否是自己的業(yè)務(wù)線(xiàn)程沒(méi)有捕獲異常掛了,因而就沒(méi)有消費(fèi)了。重啟了下程序,看log,刷刷的消費(fèi)消息。再jstack對(duì)比一下

"ConsumerFetcherThread-xxx-376jt-1514353818187-b37be1c0-0-3" #81 prio=5 os_prio=0 tid=0x00007fe39c004000 nid=0x63 waiting on condition [0x00007fe3931f4000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000007822ac4e0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:350)
    at kafka.consumer.PartitionTopicInfo.enqueue(PartitionTopicInfo.scala:60)
    at kafka.consumer.ConsumerFetcherThread.processPartitionData(ConsumerFetcherThread.scala:49)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(AbstractFetcherThread.scala:128)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(AbstractFetcherThread.scala:109)
    at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply$mcV$sp(AbstractFetcherThread.scala:109)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(AbstractFetcherThread.scala:109)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(AbstractFetcherThread.scala:109)
    at kafka.utils.Utils$.inLock(Utils.scala:535)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:108)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:86)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)

"ConsumerFetcherThread-xxx-376jt-1514353818187-b37be1c0-0-4" #80 prio=5 os_prio=0 tid=0x00007fe39c003000 nid=0x62 waiting on condition [0x00007fe3926ea000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000007821c9a68> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:350)
    at kafka.consumer.PartitionTopicInfo.enqueue(PartitionTopicInfo.scala:60)
    at kafka.consumer.ConsumerFetcherThread.processPartitionData(ConsumerFetcherThread.scala:49)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(AbstractFetcherThread.scala:128)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1$$anonfun$apply$mcV$sp$2.apply(AbstractFetcherThread.scala:109)
    at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply$mcV$sp(AbstractFetcherThread.scala:109)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(AbstractFetcherThread.scala:109)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$1.apply(AbstractFetcherThread.scala:109)
    at kafka.utils.Utils$.inLock(Utils.scala:535)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:108)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:86)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)

"xxx-376jt-1514353818187-b37be1c0-leader-finder-thread" #79 prio=5 os_prio=0 tid=0x0000000001f5a000 nid=0x61 waiting on condition [0x00007fe3920e7000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x0000000782154c30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:61)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)

"consume-2" #62 prio=5 os_prio=0 tid=0x00007fe48da13800 nid=0x51 runnable [0x00007fe392ff1000]
   java.lang.Thread.State: RUNNABLE
    //......
    at org.springframework.data.mongodb.core.MongoTemplate.executeFindMultiInternal(MongoTemplate.java:1948)
    at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1768)
    at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:1751)
    at org.springframework.data.mongodb.core.MongoTemplate.find(MongoTemplate.java:625)
    at org.springframework.data.mongodb.core.MongoTemplate.findOne(MongoTemplate.java:590)
    at org.springframework.data.mongodb.core.MongoTemplate.findOne(MongoTemplate.java:582)
    at com.xxx.consumer.KafkaStreamProcessor.process(KafkaStreamProcessor.java:37)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
    at org.springframework.aop.interceptor.AsyncExecutionInterceptor$1.call(AsyncExecutionInterceptor.java:115)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

"consume-1" #61 prio=5 os_prio=0 tid=0x00007fe48e310000 nid=0x50 runnable [0x00007fe3930f2000]
   java.lang.Thread.State: RUNNABLE
    //...
    at org.springframework.data.mongodb.core.MongoTemplate$12.doInCollection(MongoTemplate.java:1157)
    at org.springframework.data.mongodb.core.MongoTemplate$12.doInCollection(MongoTemplate.java:1137)
    at org.springframework.data.mongodb.core.MongoTemplate.execute(MongoTemplate.java:463)
    at org.springframework.data.mongodb.core.MongoTemplate.doUpdate(MongoTemplate.java:1137)
    at org.springframework.data.mongodb.core.MongoTemplate.upsert(MongoTemplate.java:1099)
    at com.xxx.consumer.KafkaStreamProcessor.process(KafkaStreamProcessor.java:37)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
    at org.springframework.aop.interceptor.AsyncExecutionInterceptor$1.call(AsyncExecutionInterceptor.java:115)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

"xxx-376jt-1514353818187-b37be1c0_watcher_executor" #59 prio=5 os_prio=0 tid=0x00007fe48fe7c000 nid=0x4e waiting on condition [0x00007fe3934f5000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x0000000782155248> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
    at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(ZookeeperConsumerConnector.scala:544)

對(duì)比一下,發(fā)現(xiàn)原來(lái)懷疑的ConsumerFetcherThread一直blocking在PartitionTopicInfo.enqueue在重啟之后還是存在,因此可能是正常的。

在對(duì)比下consume-1與consume-2,發(fā)現(xiàn)了問(wèn)題,有問(wèn)題的線(xiàn)程堆棧沒(méi)有看到自己的業(yè)務(wù)方法,而重啟之后發(fā)現(xiàn)了業(yè)務(wù)方法。因此問(wèn)題的原因漸漸明朗,就是因?yàn)闆](méi)有catch異常導(dǎo)致。

業(yè)務(wù)方法

原來(lái)的業(yè)務(wù)方法大致如下

@Async
public void process(KafkaStream<byte[], byte[]> stream){
    ConsumerIterator<byte[], byte[]> it = stream.iterator();
    while (it.hasNext()) {
        System.out.println(Thread.currentThread().getName()+":"+new String(it.next().message()));
    }
}

這里有個(gè)疑問(wèn)就是線(xiàn)程異常沒(méi)有catch的話(huà),理論上再次new的線(xiàn)程,id應(yīng)該遞增才對(duì),但是通過(guò)實(shí)驗(yàn)發(fā)現(xiàn),走async的,拋異常之后,線(xiàn)程id都不變。

spring-core-4.3.13.RELEASE-sources.jar!/org/springframework/util/CustomizableThreadCreator.java

public class CustomizableThreadCreator implements Serializable {
    private final AtomicInteger threadCount = new AtomicInteger(0);
    /**
     * Template method for the creation of a new {@link Thread}.
     * <p>The default implementation creates a new Thread for the given
     * {@link Runnable}, applying an appropriate thread name.
     * @param runnable the Runnable to execute
     * @see #nextThreadName()
     */
    public Thread createThread(Runnable runnable) {
        Thread thread = new Thread(getThreadGroup(), runnable, nextThreadName());
        thread.setPriority(getThreadPriority());
        thread.setDaemon(isDaemon());
        return thread;
    }

    /**
     * Return the thread name to use for a newly created {@link Thread}.
     * <p>The default implementation returns the specified thread name prefix
     * with an increasing thread count appended: e.g. "SimpleAsyncTaskExecutor-0".
     * @see #getThreadNamePrefix()
     */
    protected String nextThreadName() {
        return getThreadNamePrefix() + this.threadCount.incrementAndGet();
    }
    //...
}

這里的threadCount沒(méi)有看到調(diào)用decrement方法,因此如果線(xiàn)程異常掛掉,則理論上新補(bǔ)充的線(xiàn)程id應(yīng)該是遞增的。

/Library/Java/JavaVirtualMachines/jdk1.8.0_71.jdk/Contents/Home/src.zip!/java/util/concurrent/ThreadPoolExecutor.java

final void runWorker(Worker w) {
        Thread wt = Thread.currentThread();
        Runnable task = w.firstTask;
        w.firstTask = null;
        w.unlock(); // allow interrupts
        boolean completedAbruptly = true;
        try {
            while (task != null || (task = getTask()) != null) {
                w.lock();
                // If pool is stopping, ensure thread is interrupted;
                // if not, ensure thread is not interrupted.  This
                // requires a recheck in second case to deal with
                // shutdownNow race while clearing interrupt
                if ((runStateAtLeast(ctl.get(), STOP) ||
                     (Thread.interrupted() &&
                      runStateAtLeast(ctl.get(), STOP))) &&
                    !wt.isInterrupted())
                    wt.interrupt();
                try {
                    beforeExecute(wt, task);
                    Throwable thrown = null;
                    try {
                        task.run();
                    } catch (RuntimeException x) {
                        thrown = x; throw x;
                    } catch (Error x) {
                        thrown = x; throw x;
                    } catch (Throwable x) {
                        thrown = x; throw new Error(x);
                    } finally {
                        afterExecute(task, thrown);
                    }
                } finally {
                    task = null;
                    w.completedTasks++;
                    w.unlock();
                }
            }
            completedAbruptly = false;
        } finally {
            processWorkerExit(w, completedAbruptly);
        }
    }
    
private void processWorkerExit(Worker w, boolean completedAbruptly) {
        if (completedAbruptly) // If abrupt, then workerCount wasn't adjusted
            decrementWorkerCount();

        final ReentrantLock mainLock = this.mainLock;
        mainLock.lock();
        try {
            completedTaskCount += w.completedTasks;
            workers.remove(w);
        } finally {
            mainLock.unlock();
        }

        tryTerminate();

        int c = ctl.get();
        if (runStateLessThan(c, STOP)) {
            if (!completedAbruptly) {
                int min = allowCoreThreadTimeOut ? 0 : corePoolSize;
                if (min == 0 && ! workQueue.isEmpty())
                    min = 1;
                if (workerCountOf(c) >= min)
                    return; // replacement not needed
            }
            addWorker(null, false);
        }
    }    

調(diào)試發(fā)現(xiàn)completedAbruptly都是false,因而業(yè)務(wù)線(xiàn)程沒(méi)有拋異常,這豈不是矛盾了。突然想起async注解的攔截,漸漸豁然開(kāi)朗。

AsyncExecutionInterceptor

spring-aop-4.3.13.RELEASE-sources.jar!/org/springframework/aop/interceptor/AsyncExecutionInterceptor.java

@Override
    public Object invoke(final MethodInvocation invocation) throws Throwable {
        Class<?> targetClass = (invocation.getThis() != null ? AopUtils.getTargetClass(invocation.getThis()) : null);
        Method specificMethod = ClassUtils.getMostSpecificMethod(invocation.getMethod(), targetClass);
        final Method userDeclaredMethod = BridgeMethodResolver.findBridgedMethod(specificMethod);

        AsyncTaskExecutor executor = determineAsyncExecutor(userDeclaredMethod);
        if (executor == null) {
            throw new IllegalStateException(
                    "No executor specified and no default executor set on AsyncExecutionInterceptor either");
        }

        Callable<Object> task = new Callable<Object>() {
            @Override
            public Object call() throws Exception {
                try {
                    Object result = invocation.proceed();
                    if (result instanceof Future) {
                        return ((Future<?>) result).get();
                    }
                }
                catch (ExecutionException ex) {
                    handleError(ex.getCause(), userDeclaredMethod, invocation.getArguments());
                }
                catch (Throwable ex) {
                    handleError(ex, userDeclaredMethod, invocation.getArguments());
                }
                return null;
            }
        };

        return doSubmit(task, executor, invocation.getMethod().getReturnType());
    }

async注解通過(guò)AsyncExecutionInterceptor攔截,然后包了一層,處理了異常,因此線(xiàn)程池里頭是沒(méi)有異常的。

小結(jié)

  • 使用kafka消費(fèi)數(shù)據(jù)的時(shí)候,需要對(duì)offset的lag值進(jìn)行實(shí)時(shí)監(jiān)控,以確認(rèn)消費(fèi)速度是否ok
  • 調(diào)用KafkaStream的iterator消費(fèi)線(xiàn)程必須catch住異常,否則拋出了異常,就停止消費(fèi)了。

doc

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀(guān)點(diǎn),簡(jiǎn)書(shū)系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

  • 姓名:周小蓬 16019110037 轉(zhuǎn)載自:http://blog.csdn.net/YChenFeng/art...
    aeytifiw閱讀 34,897評(píng)論 13 425
  • kafka的定義:是一個(gè)分布式消息系統(tǒng),由LinkedIn使用Scala編寫(xiě),用作LinkedIn的活動(dòng)流(Act...
    時(shí)待吾閱讀 5,538評(píng)論 1 15
  • Spring Cloud為開(kāi)發(fā)人員提供了快速構(gòu)建分布式系統(tǒng)中一些常見(jiàn)模式的工具(例如配置管理,服務(wù)發(fā)現(xiàn),斷路器,智...
    卡卡羅2017閱讀 136,544評(píng)論 19 139
  • Kafka入門(mén)經(jīng)典教程-Kafka-about云開(kāi)發(fā) http://www.aboutyun.com/threa...
    葡萄喃喃囈語(yǔ)閱讀 10,981評(píng)論 4 54
  • 本文轉(zhuǎn)載自http://dataunion.org/?p=9307 背景介紹Kafka簡(jiǎn)介Kafka是一種分布式的...
    Bottle丶Fish閱讀 5,584評(píng)論 0 34

友情鏈接更多精彩內(nèi)容