1. The file ruozedata.md
Upload:
-bash-4.2$ hdfs dfs -mkdir /blockrecover
-bash-4.2$ echo "www.ruozedata.com" > ruozedata.md
-bash-4.2$ hdfs dfs -put ruozedata.md /blockrecover
-bash-4.2$ hdfs dfs -ls /blockrecover
Found 1 items
-rw-r--r-- 3 hdfs supergroup 18 2019-03-03 14:42 /blockrecover/ruozedata.md
-bash-4.2$
Verify: health status
-bash-4.2$ hdfs fsck /
Connecting to namenode via http://yws76:50070/fsck?ugi=hdfs&path=%2F
FSCK started by hdfs (auth:SIMPLE) from /192.168.0.76 for path / at Sun Mar 03 14:44:44 CST 2019
...............................................................................Status: HEALTHY
Total size: 50194618424 B
Total dirs: 354
Total files: 1079
Total symlinks: 0
Total blocks (validated): 992 (avg. block size 50599413 B)
Minimally replicated blocks: 992 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Sun Mar 03 14:44:45 CST 2019 in 76 milliseconds
The filesystem under path '/' is HEALTHY
-bash-4.2$
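The summary fields printed by fsck can also be read by a script instead of by eye. The following is a minimal sketch, assuming the summary layout shown above (a field name, a colon, then the value). The helper name fsck_field is hypothetical and not part of HDFS.

```shell
#!/usr/bin/env bash
# fsck_field NAME: read `hdfs fsck` output on stdin and print the first
# value of the summary line that starts with NAME.
# (fsck_field is a hypothetical helper, not part of HDFS.)
fsck_field() {
  awk -v name="$1" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)            # summary lines may be indented
      if (index(line, name ":") == 1) {   # line begins with "NAME:"
        rest = substr(line, length(name) + 2)
        sub(/^[ \t]+/, "", rest)          # drop spaces or tabs after the colon
        split(rest, parts, /[ \t]+/)
        print parts[1]
        exit
      }
    }'
}

# Usage against a live cluster (requires the hdfs client):
#   hdfs fsck / | fsck_field "Under-replicated blocks"
#   hdfs fsck / | fsck_field "Corrupt blocks"
```

A non-zero value for either field is a reason to look at the per-file lines that fsck prints above the summary.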
2. Delete one replica of one of the file's blocks directly on a DN node (3 replicas)
Delete the block file and its meta file:
[root@yws87 subdir135]# rm -rf blk_1075808214 blk_1075808214_2068515.meta
Restart HDFS right away to simulate the damage, then check with fsck:
-bash-4.2$ hdfs fsck /
Connecting to namenode via http://yws77:50070/fsck?ugi=hdfs&path=%2F
FSCK started by hdfs (auth:SIMPLE) from /192.168.0.76 for path / at Sun Mar 03 16:02:04 CST 2019
.
/blockrecover/ruozedata.md: Under replicated BP-1513979236-192.168.0.76-1514982530341:blk_1075808214_2068515. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s), 0 decommissioning replica(s).
...............................................................................Status: HEALTHY
Total size: 50194618424 B
Total dirs: 354
Total files: 1079
Total symlinks: 0
Total blocks (validated): 992 (avg. block size 50599413 B)
Minimally replicated blocks: 992 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 1 (0.10080645 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.998992
Corrupt blocks: 0
Missing replicas: 1 (0.033602152 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Sun Mar 03 16:02:04 CST 2019 in 148 milliseconds
The filesystem under path '/' is HEALTHY
-bash-4.2$
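To see which replicas of the block are still present, fsck can be run on the single file with -files -blocks -locations, and the block files can be searched for on each DN's local disk. The sketch below wraps the local search in a hypothetical helper, find_block_files. The data directory /dfs/dn in the usage comment is an assumption; the real path is whatever dfs.datanode.data.dir points to on the node.

```shell
#!/usr/bin/env bash
# find_block_files DATA_DIR BLOCK_ID: list the block file and its .meta file
# for one block under a DataNode data directory.
# (hypothetical helper; DATA_DIR should be a path from dfs.datanode.data.dir)
find_block_files() {
  find "$1" -type f \( -name "blk_${2}" -o -name "blk_${2}_*.meta" \) | sort
}

# Usage (requires the hdfs client and access to the DN's disk):
#   hdfs fsck /blockrecover/ruozedata.md -files -blocks -locations
#   find_block_files /dfs/dn 1075808214    # /dfs/dn is an assumed data dir
```

On the node where the replica was removed, the helper prints nothing; on the other two nodes it prints the block file and the meta file.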
3. Manual repair with hdfs debug
-bash-4.2$ hdfs |grep debug
No output at all mentions a debug subcommand!
So the hdfs command help does not list debug, but the hdfs debug command does exist. Keep this in mind.
Repair command:
-bash-4.2$ hdfs debug recoverLease -path /blockrecover/ruozedata.md -retries 10
recoverLease SUCCEEDED on /blockrecover/ruozedata.md
-bash-4.2$
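Checking directly on the DN node, the block file and meta file have been restored (the first listing is before the recovery, the second after):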
[root@yws87 subdir135]# ll
total 8
-rw-r--r-- 1 hdfs hdfs 56 Mar 3 14:28 blk_1075808202
-rw-r--r-- 1 hdfs hdfs 11 Mar 3 14:28 blk_1075808202_2068503.meta
[root@yws87 subdir135]# ll
total 24
-rw-r--r-- 1 hdfs hdfs 56 Mar 3 14:28 blk_1075808202
-rw-r--r-- 1 hdfs hdfs 11 Mar 3 14:28 blk_1075808202_2068503.meta
-rw-r--r-- 1 hdfs hdfs 18 Mar 3 15:23 blk_1075808214
-rw-r--r-- 1 hdfs hdfs 11 Mar 3 15:23 blk_1075808214_2068515.meta
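4. Automatic repair
After a block is damaged, the DN does not notice the damage until it runs a directoryscan.
The directoryscan runs every 6 hours by default (the value is in seconds):
dfs.datanode.directoryscan.interval : 21600
The block is not recovered until the DN sends a blockreport to the NN.
The blockreport is also sent every 6 hours by default (the value is in milliseconds):
dfs.blockreport.intervalMsec : 21600000
Only after the NN receives the blockreport does it start the recovery.
For details, see the parameter documentation for the HDFS version used in production (CDH5.12.0): http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.12.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml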
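The two intervals use different units, which is easy to misread. As a small worked check, the sketch below converts the two default values quoted above into hours; the variable names are only illustrative.

```shell
#!/usr/bin/env bash
# Convert the two default intervals quoted above into hours.
directoryscan_seconds=21600      # dfs.datanode.directoryscan.interval (seconds)
blockreport_msec=21600000        # dfs.blockreport.intervalMsec (milliseconds)

directoryscan_hours=$(( directoryscan_seconds / 3600 ))
blockreport_hours=$(( blockreport_msec / 1000 / 3600 ))

echo "directoryscan every ${directoryscan_hours}h"
echo "blockreport every ${blockreport_hours}h"

# On a node with the hdfs client, the values that the local configuration
# resolves to can be read with:
#   hdfs getconf -confKey dfs.datanode.directoryscan.interval
#   hdfs getconf -confKey dfs.blockreport.intervalMsec
```

Both lines report 6h. Note that hdfs getconf reads the configuration visible to the client where it runs, which may differ from what a running DN was started with.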
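Summary:
In production I generally prefer the manual repair, on the condition that the damaged block is deleted by hand first.
Remember: delete the damaged block file and its meta file on the DN, not the HDFS file.
Alternatively, download the file with get, delete it from HDFS, and then upload it again.
Remember not to run hdfs fsck / -delete for this: it deletes the corrupted files themselves, so the data would be lost. Use it only if losing the data does not matter or you are confident the data can be reloaded into HDFS from another source.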
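The download, delete, and re-upload alternative from the summary could be scripted roughly as follows. This is a sketch under assumptions, not a tested production procedure: reupload_file is a hypothetical helper, and it only proceeds to the delete step if the download succeeded and produced a non-empty local file.

```shell
#!/usr/bin/env bash
# reupload_file HDFS_PATH LOCAL_COPY: download, delete, then upload again.
# (hypothetical helper; it stops at the first failing step, so the HDFS
#  file is not removed unless a non-empty local copy exists)
reupload_file() {
  local hdfs_path=$1
  local local_copy=$2
  hdfs dfs -get "$hdfs_path" "$local_copy" || return 1
  [ -s "$local_copy" ] || return 1      # refuse to continue with an empty copy
  hdfs dfs -rm "$hdfs_path" || return 1
  hdfs dfs -put "$local_copy" "$hdfs_path"
}

# Usage (requires the hdfs client; /tmp/ruozedata.md is an assumed local path):
#   reupload_file /blockrecover/ruozedata.md /tmp/ruozedata.md
```

Chaining the steps with early returns keeps the order safe: the delete never runs when the get fails, which is the case where the file's blocks cannot be read at all.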
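Questions to think about:
How can you determine where a file's damaged blocks are located, and what methods are there?
These two parameters do not show up when searching the CDH configuration, so how can they be changed and made to take effect?
Block scanning: https://blog.cloudera.com/blog/2016/12/hdfs-datanode-scanners-and-disk-checker-explained/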