Problem Description
A routine job suddenly failed today, and the cause was not obvious. The Hive SQL compiles into several MapReduce jobs; the earlier ones all succeeded, but the last MR job failed before it even launched, so the MapReduce job monitoring page was useless for troubleshooting. The Hive session log eventually revealed the following error:
submitJob failed java.io.IOException: Max block location exceeded for split:
StorageEngineClient.CombineFormatStorageFileInputFormat:Paths:
/user/.../default/attempt_1542378275159_154173895_r_001706_0.1552539467243:0+37738
/user/.../default/attempt_1542378275159_154173895_r_001845_0.1552539513594:0+38046
/user/.../default/attempt_1542378275159_154173895_r_000491_0.1552539063059:0+38698
... (tens of thousands of similar lines omitted)
splitsize: 108807 maxsize: 100000
at org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:610)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:568)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:417)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1279)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1276)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1743)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1276)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:564)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:559)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1743)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:559)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:550)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:1241)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:75)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:64)
Problem Analysis
Analyzing the error message above: the line
submitJob failed java.io.IOException: Max block location exceeded for split
shows that the job failed with an IOException related to the input splits. Next, the lines
StorageEngineClient.CombineFormatStorageFileInputFormat:Paths:
/user/.../default/attempt_1542378275159_154173895_r_001706_0.1552539467243:0+37738
/user/.../default/attempt_1542378275159_154173895_r_001845_0.1552539513594:0+38046
/user/.../default/attempt_1542378275159_154173895_r_000491_0.1552539063059:0+38698
show that the error occurred between the Hive SQL's MapReduce jobs: the SQL compiles into 3 MR jobs in total, yet execution ended as soon as the 2nd MR job completed. The printed paths also tell us where, since Hive stores a query's intermediate data in the default folder under the Hive table's directory.
splitsize: 108807 maxsize: 100000
This tells us that the intermediate stage produced a huge number of files, so the combined split carried more block locations (splitsize: 108807) than allowed (maxsize: 100000), which raised the IOException.
In short: the Hive job's intermediate data consisted of a large number of small files, the split's block-location count exceeded maxsize, and the job failed.
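The custom StorageEngineClient.CombineFormatStorageFileInputFormat is not public code, but the mechanism can be illustrated with Hadoop's generic CombineFileSplit. A minimal sketch follows; the file count, sizes, paths, and DataNode host names are all made up to mirror the log above:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit;

public class SplitLocationDemo {
  public static void main(String[] args) throws Exception {
    int numFiles = 108807;               // mirrors "splitsize: 108807" in the log
    Path[] files = new Path[numFiles];
    long[] starts = new long[numFiles];
    long[] lengths = new long[numFiles];
    String[] locations = new String[numFiles];
    for (int i = 0; i < numFiles; i++) {
      files[i] = new Path("/warehouse/t/default/part_" + i); // hypothetical paths
      starts[i] = 0L;
      lengths[i] = 38_000L;              // each intermediate file is only ~37 KB
      locations[i] = "dn-" + (i % 500);  // hypothetical DataNode host names
    }
    // One combined split over all the small files: its location list grows with
    // the number of combined blocks, and that list's length is exactly what
    // JobSplitWriter later compares against maxBlockLocations.
    CombineFileSplit split = new CombineFileSplit(files, starts, lengths, locations);
    System.out.println("locations in split: " + split.getLocations().length); // 108807
  }
}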
Problem Resolution
With the cause understood, the fix followed. The relevant check in the Hadoop source (JobSplitWriter) reads:
String[] locations = split.getLocations();
if (locations.length > maxBlockLocations) {
  LOG.warn("Max block location exceeded for split: "
      + split + " splitsize: " + locations.length +
      " maxsize: " + maxBlockLocations);
  locations = Arrays.copyOf(locations, maxBlockLocations);
}
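One wrinkle: the snippet above only warns and truncates, yet our stack trace shows an IOException thrown from writeOldSplits. Judging from the trace, the Hadoop release on this cluster still has the pre-MAPREDUCE-5186 behavior, where the same check reads roughly (reconstructed, not copied from this cluster's source):

// Pre-MAPREDUCE-5186 variant: throws instead of truncating, producing
// exactly the "Max block location exceeded" IOException seen above.
if (locations.length > maxBlockLocations) {
  throw new IOException("Max block location exceeded for split: "
      + split + " splitsize: " + locations.length +
      " maxsize: " + maxBlockLocations);
}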
Either way, maxBlockLocations is clearly 100000 on this cluster, and its value comes from:
int maxBlockLocations = conf.getInt(MRConfig.MAX_BLOCK_LOCATIONS_KEY,
    MRConfig.MAX_BLOCK_LOCATIONS_DEFAULT);
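For reference, the two constants resolve as follows in the Hadoop 2.x line (later releases may ship a different default):

// org.apache.hadoop.mapreduce.MRConfig (Hadoop 2.x):
public static final String MAX_BLOCK_LOCATIONS_KEY = "mapreduce.job.max.split.locations";
public static final int MAX_BLOCK_LOCATIONS_DEFAULT = 10;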
So the limit is controlled by the parameter mapreduce.job.max.split.locations, which defaults to 10 when not configured. The final fix was:
set mapreduce.job.max.split.locations=200000;
Raising maxBlockLocations solved the problem: afterwards, all 3 MR jobs that the Hive SQL compiles into ran successfully.
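For jobs submitted directly through the MapReduce API rather than Hive, the same override can be applied on the job configuration. A minimal sketch, where the class and job names are made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SubmitWithMoreSplitLocations {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Same effect as `set mapreduce.job.max.split.locations=200000;` in the Hive session
    conf.setInt("mapreduce.job.max.split.locations", 200000);
    Job job = Job.getInstance(conf, "demo-job");
    // ... configure input format, mapper/reducer, and paths as usual, then:
    // job.waitForCompletion(true);
  }
}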
Problem Discussion
The problem was solved, but it left me with a question: why does Hadoop impose maxBlockLocations in the first place?
Searching online, I found this question:
what's the recommended value of mapreduce.job.max.split.locations ?
which has a good answer:
This configuration has been around since MR v1. It serves as an upper limit on the number of DN locations for a job split, intended to protect the JobTracker from being overloaded by jobs with huge numbers of split locations. For YARN in Hadoop 2 this concern is lessened, as we have a per-job AM instead of the JT. However, it still impacts the RM, since the RM may see heavy requests from an AM trying to obtain many localities for the split. When this limit is hit, the location list is truncated to the given limit, sacrificing a bit of data locality but removing the risk of hitting an RM bottleneck.
Depending on your job's priority (I believe it is a per-job configuration now), you can leave it at the default (for lower or normal priority jobs) or increase it to a larger number. Increasing this value beyond the number of DNs has the same effect as setting it to the number of DNs.
The discussion on the Hadoop issue MAPREDUCE-5186 also explains the general rationale behind this parameter.