Spark job occasionally has a task hang for a long time, making the whole stage slow

Symptom

When a large number of Spark jobs are submitted, an individual task occasionally hangs for a while, and the total duration of its stage becomes abnormally long.

Possible cause

NodeManager Full GC

Analysis

Sampled job: Job 836

Abnormal stage: Stage 2249 -> hung task: Task 8

Corresponding Executor log:

...
INFO | [Executor task launch worker-78] | Running task 8.0 in stage 2249.0 (TID 222920) | org.apache.spark.Logging$class.logInfo(Logging.scala:59)
ERROR | [shuffle-client-1] | Connection is dead; please adjust spark.network.timeout if this is wrong | org.apache.spark.network.server.TransportChannelHandler.userEventTriggered(TransportChannelHandler.java:128)
ERROR | [shuffle-client-1] | Still have 2 requests outstanding when connection from /10.12.122.244:27337 is closed | org.apache.spark.network.server.TransportChannelHandler.channelUnregistered(TransportChannelHandler.java:102)
INFO | [shuffle-client-1] | Retrying fetch (1/3) for 1 outstanding blocks after 5000 ms | org.apache.spark.network.shuffle.RetryingBlockFetcher.initiateRetry(RetryingBlockFetcher.java:163)
ERROR | [shuffle-client-1] | Failed while starting block fetches | org.apache.spark.network.shuffle.OneForOneBlockFetcher$1.onFailure(OneForOneBlockFetcher.java:151)
java.io.IOException: Connection from /10.12.122.244:27337 closed
    at org.apache.spark.network.server.TransportChannelHandler.channelUnregistered(TransportChannelHandler.java:104)
    at org.apache.spark.network.server.TransportChannelHandler.channelUnregistered(TransportChannelHandler.java:94)
    ...
INFO | [shuffle-client-1] | Retrying fetch (1/3) for 1 outstanding blocks after 5000 ms | org.apache.spark.network.shuffle.RetryingBlockFetcher.initiateRetry(RetryingBlockFetcher.java:163)
...
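To find which remote endpoint the shuffle client keeps losing, the "Connection from /host:port closed" messages in the Executor log can be counted per endpoint. The following is a minimal sketch, assuming the Executor log is available as plain-text lines in the format shown above; the function name closed_endpoints and the regex are hypothetical illustrations, not part of Spark.

```python
import re
from collections import Counter

# Hypothetical pattern: matches both "Connection from /10.12.122.244:27337 closed"
# and "... when connection from /10.12.122.244:27337 is closed" (case-insensitive).
CONN_FROM_RE = re.compile(r"connection from /([\d.]+):(\d+)", re.IGNORECASE)


def closed_endpoints(lines):
    """Count how often each remote (host, port) appears in connection-closed messages."""
    counts = Counter()
    for line in lines:
        # Only consider lines that report a closed connection.
        if "closed" not in line:
            continue
        m = CONN_FROM_RE.search(line)
        if m:
            counts[(m.group(1), int(m.group(2)))] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage: python closed_endpoints.py executor.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for (host, port), n in closed_endpoints(f).most_common():
            print(f"{host}:{port}\t{n}")
```

An endpoint that dominates the counts is the first candidate to inspect on the remote host, which is how 10.12.122.244:27337 is examined next.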

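Checking port 27337 on host 10.12.122.244 showed that it belongs to the NodeManager. This is consistent with Spark's external shuffle service, which on YARN runs as an auxiliary service inside the NodeManager process, so the Executor's shuffle fetches are served by the NodeManager JVM. The NodeManager's memory was exhausted, and its GC log showed frequent, long Full GC pauses. During these pauses the NodeManager cannot respond to the Executor's fetch requests, so the connection is declared dead (see the spark.network.timeout message above), the fetch is retried, and the task hangs in the meantime.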
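How long a single unresponsive fetch can block a task is bounded roughly by the timeout and retry settings that appear in the log. The sketch below is a rough estimate only: it assumes every attempt (the first one plus each retry) waits a full spark.network.timeout before failing, and it ignores connection-setup time and the stage resubmission that follows if all retries fail. The defaults used (spark.network.timeout = 120s, spark.shuffle.io.maxRetries = 3, spark.shuffle.io.retryWait = 5s) are Spark's documented defaults and match the "(1/3) ... after 5000 ms" lines above; the function name is hypothetical.

```python
def worst_case_fetch_stall_s(network_timeout_s=120.0, max_retries=3, retry_wait_s=5.0):
    """Rough upper bound (seconds) on how long one shuffle fetch can block
    when the remote shuffle service never answers.

    Assumption: each attempt times out after a full network timeout,
    and consecutive attempts are separated by retry_wait_s.
    """
    attempts = 1 + max_retries
    return attempts * network_timeout_s + max_retries * retry_wait_s


if __name__ == "__main__":
    print(worst_case_fetch_stall_s())
```

With the defaults this gives 4 * 120 + 3 * 5 = 495 seconds, which is why a NodeManager stuck in repeated Full GCs can stretch a single task far beyond its normal runtime.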

Solution

Increase the NodeManager heap size to fit the memory footprint of the workload.
