1. Environment:
CentOS 6.8 (Linux hadoop101 2.6.32-642.el6.x86_64 #1 SMP Tue May 10 17:27:01 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux), java version "1.8.0_144", Hadoop 2.7.2.
2. Problem description and analysis:
Running the MapReduce wordcount example on YARN in Hadoop pseudo-distributed mode failed with the following error:
18/01/31 23:48:30 INFO mapreduce.Job: Job job_1517413416343_0001 running in uber mode : false
18/01/31 23:48:30 INFO mapreduce.Job:  map 0% reduce 0%
18/01/31 23:48:35 INFO mapreduce.Job: Task Id : attempt_1517413416343_0001_m_000000_0, Status : FAILED
Exception from container-launch.
Container id: container_1517413416343_0001_01_000002
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 1
(The task was retried at 23:48:41 as attempt_1517413416343_0001_m_000000_1 in container_1517413416343_0001_01_000003, and again at 23:48:47 as attempt_1517413416343_0001_m_000000_2 in container_1517413416343_0001_01_000004; both retries failed with the identical ExitCodeException stack trace.)
The first guess was that container creation itself was failing. Searching online turned up many suggestions, none of which solved the problem. The real cause only became apparent when inspecting the container logs:
[lyh@hadoop101 userlogs]$ more application_1517413416343_0001/container_1517413416343_0001_01_000003/stderr
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000f4d80000, 105381888, 0) failed; error='無法分配內(nèi)存' ("Cannot allocate memory") (errno=12)
[lyh@hadoop101 userlogs]$ more application_1517413416343_0001/container_1517413416343_0001_01_000003/stdout
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 105381888 bytes for committing reserved memory.
The JVM was clearly failing to allocate memory, so the next step was to check memory and disk usage with free -h and df -h:
[AAA@BBB hsperfdata_AAA]$ free -h
             total       used       free     shared    buffers     cached
Mem:          1.9G       1.4G       541M       256K        30M        63M
-/+ buffers/cache:       1.3G       636M
Swap:           0B         0B         0B
[AAA@BBB hsperfdata_AAA]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        15G  3.4G   11G  25% /
tmpfs           996M     0  996M   0% /dev/shm
/dev/sda1       190M   39M  142M  22% /boot
/dev/sda3       2.0G  3.0M  1.9G   1% /swap
Only 1.4G of the 1.9G of physical memory was in use, leaving more than 500M free, so it was puzzling that memory allocation still failed. After posting the details to Stack Overflow, the answer pointed at the swap partition. Looking back at the free and df output, Swap shows 0B used and 0B total, and cat /etc/fstab revealed that /dev/sda3 was of type ext4 rather than swap, which means the swap partition was never actually enabled. Exactly why a missing swap partition causes allocation failures while physical memory is still free deserves further study.
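As a quick sketch, the swap state can be confirmed from the command line before touching any partitions (these are standard procps/util-linux tools; the blkid line shows what the affected host reported and is left commented out since /dev/sda3 is specific to that machine):

```shell
# Report whether any swap is configured, based on the Swap: line of free.
free -m | awk '/^Swap:/ {print ($2 == 0 ? "no swap configured" : "swap present")}'

# Look for a swap entry in fstab; print a note if none exists.
grep -E '[[:space:]]swap[[:space:]]' /etc/fstab || echo "no swap entry in /etc/fstab"

# On the affected host, this showed TYPE="ext4" instead of TYPE="swap":
#   blkid /dev/sda3
```

Either check immediately reveals the situation described above: zero swap and no usable swap entry in /etc/fstab.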
3. Fix:
Approach: change /dev/sda3 to the swap partition type, then enable it as swap and reboot. The whole process requires root privileges.
步驟:
3.1. Change the /dev/sda3 partition type to swap:
Run fdisk /dev/sda, then enter p (print the partition table), t (change a partition's type), 3 (select partition 3), L (list the type codes), 82 (the code for Linux swap), and finally w (write the changes).
3.2. Enable swap on /dev/sda3:
Because /dev/sda3 was created during OS installation and is already mounted, it must be unmounted before it can be enabled as swap.
Unmount it: umount /dev/sda3
Format it as swap: mkswap /dev/sda3
Enable it: swapon /dev/sda3
Edit /etc/fstab: comment out the original UUID line for the /dev/sda3 partition and append the line below, then save and exit (this step is essential, otherwise the system will report errors on reboot):
/dev/sda3       swap                    swap    defaults        0 0
3.3. Reboot, then run free and df again: the swap partition is now active.
3.4. Re-run the Hadoop job: the problem is solved.
4. Additional notes:
Swap was never enabled here because, when the /swap partition was configured during Linux installation, the filesystem type was left at the default ext4 instead of being set to swap. Alternatively, the problem can be solved without repartitioning by creating a swap file and enabling it as swap.
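The swap-file alternative just mentioned can be sketched as follows. The file path and size are illustrative assumptions; creating and formatting the file needs no special hardware, while enabling it and editing fstab require root:

```shell
# Create a file of zeros to act as swap backing. 16 MiB is a demo size;
# a real swap file would typically be on the order of 1-2x RAM.
dd if=/dev/zero of=/tmp/demo_swapfile bs=1M count=16 2>/dev/null

# Swap files must not be readable by other users.
chmod 600 /tmp/demo_swapfile

# Write the swap signature into the file.
mkswap /tmp/demo_swapfile

# As root, you would then enable it and make it permanent:
#   swapon /tmp/demo_swapfile
#   echo '/tmp/demo_swapfile swap swap defaults 0 0' >> /etc/fstab
```

This achieves the same end result as sections 3.1-3.2 without changing the partition table, at the cost of slightly more indirection through the filesystem.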