hive mapjoin MapJoinMemoryExhaustionException

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"aid":252511110,"property":"{\"aid\":252511110,\"alvl\":0,\"avn\":0,\"avdn\":0,\"avpn\":0,\"avcn\":0,\"avsn\":0,\"avti\":0,\"avtp\":0}","dt":"20190226"}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)
    at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141)
Caused by: org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionException: 2019-02-28 04:08:29 Processing rows: 200000 Hashtable size: 199999 Memory usage: 7528720336 percentage: 0.60
    at org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionHandler.checkMemoryStatus(MapJoinMemoryExhaustionHandler.java:99)
    at org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.processOp(HashTableSinkOperator.java:249)
    at org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.processOp(SparkHashTableSinkOperator.java:79)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:97)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
    ... 17 more

19/02/28 04:08:36 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown

Cause:

MapJoinMemoryExhaustionHandler.java contains the following code:

public void checkMemoryStatus(long tableContainerSize, long numRows)
    throws MapJoinMemoryExhaustionException {
  // Heap currently in use, as a fraction of the maximum heap size.
  long usedMemory = memoryMXBean.getHeapMemoryUsage().getUsed();
  double percentage = (double) usedMemory / (double) maxHeapSize;
  // This is the message that shows up in the stack trace above.
  String msg = Utilities.now() + "\tProcessing rows:\t" + numRows + "\tHashtable size:\t"
      + tableContainerSize + "\tMemory usage:\t" + usedMemory + "\tpercentage:\t" + percentageNumberFormat.format(percentage);
  console.printInfo(msg);
  // Abort the hash table build once usage exceeds the configured threshold.
  if (percentage > maxMemoryUsage) {
    throw new MapJoinMemoryExhaustionException(msg);
  }
}
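The check above can be reproduced outside Hive to see how the logged percentage comes about. The following is a minimal sketch, not Hive's implementation: the class name HeapUsageCheckSketch, the helper names, the fallback to the committed heap size, and the 0.55 threshold are all assumptions made for illustration. Only the used / max ratio and the strict "greater than" comparison are taken from the method quoted above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapUsageCheckSketch {

    /** Fraction of the maximum heap in use, computed as used / max. */
    static double usedFraction(long usedBytes, long maxHeapBytes) {
        return (double) usedBytes / (double) maxHeapBytes;
    }

    /** Same comparison as the quoted method: strictly greater than the threshold. */
    static boolean exceedsThreshold(long usedBytes, long maxHeapBytes, double maxMemoryUsage) {
        return usedFraction(usedBytes, maxHeapBytes) > maxMemoryUsage;
    }

    public static void main(String[] args) {
        MemoryMXBean memoryMXBean = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memoryMXBean.getHeapMemoryUsage();
        long used = heap.getUsed();
        // getMax() returns -1 when the maximum is undefined; this sketch then
        // falls back to the committed size (an assumption, not Hive's behavior).
        long max = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted();

        // Hypothetical threshold, chosen only for illustration.
        double threshold = 0.55;
        System.out.printf("used=%d max=%d fraction=%.2f exceeds=%b%n",
                used, max, usedFraction(used, max), exceedsThreshold(used, max, threshold));
    }
}
```

Note that getUsed() reports the heap occupied at the moment of the call, including objects that are no longer reachable but have not been collected yet, so the ratio can cross the threshold even when part of that memory could still be reclaimed.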

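Final solution:

Since Hive 0.11, hive.auto.convert.join defaults to true, which means a join is converted to a map join automatically whenever the conditions are met. Setting hive.auto.convert.join=false for the failing query keeps it off the map join path (this appears to be the fix implied here). For how the map join parameters are evaluated, see: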

https://yq.aliyun.com/articles/64306

References:

hive-15221: https://issues.apache.org/jira/browse/HIVE-15221?spm=a2c4e.11153940.blogcont64306.15.eeb3541edXrhsd

https://yq.aliyun.com/articles/64306

https://stackoverflow.com/questions/22977790/hive-query-execution-error-return-code-3-from-mapredlocaltask?spm=a2c4e.11153940.blogcont64306.13.eeb3541edXrhsd

https://yq.aliyun.com/articles/476771?spm=a2c4e.11153940.blogcont64306.28.eeb3541edXrhsd
