Preface
I've wanted to write an article about scrub for a long time. Quite a while back I ran a test to see how much impact scrub really has, and what I saw was very high disk read utilization: once deep-scrub started there was a flood of reads, and the front end could hit slow requests. A simple way to deal with it at the time was to just turn scrub off, but of course with it off you can no longer detect object inconsistencies at the bottom layer.
Whether to enable scrub in production is a matter of opinion, and I won't debate it here; personally I think both choices have their own justification. This article is about how to keep scrub under control if you do want it enabled.
Recently I saw a discussion along roughly these lines in a Ceph chat group:
scrub is a pit
if you have lots of small files you absolutely must turn scrub off
once a single PG holds enough files, enabling scrub immediately causes slow requests
this problem can't be solved
Is that view wrong? Under ordinary conditions it is indeed what you see. But can we try to solve the problem, or at least mitigate it? Let's give it a try.
Some tracing of scrub
The tracing below doesn't involve reading any code; we'll just watch configuration and logs to see what scrub is actually doing.
Environment preparation
For ease of observation, my environment uses a pool with a single PG. I put 100 objects into that PG and then ran a deep-scrub on it. deep-scrub puts more pressure on the disk than a plain scrub, so deep-scrub is what this article mainly observes.
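For reference, a minimal sketch of that setup; the pool name rbd, the tiny object size, and the object names a1..a100 are my assumptions matching the traces below:

```
# pool with a single PG, filled with 100 small objects
ceph osd pool create rbd 1 1               # pg_num = pgp_num = 1 (skip if the pool exists)
dd if=/dev/zero of=/tmp/obj bs=1K count=1  # tiny test payload
for i in $(seq 1 100); do
    rados -p rbd put a$i /tmp/obj          # objects a1..a100, as seen in the traces below
done
ceph pg deep-scrub 1.0                     # manually trigger the deep-scrub on pg 1.0
```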
Monitoring access to the PG directory
I used inotifywait; I wanted to see exactly which requests the objects inside the PG receive during a deep-scrub:
```
inotifywait -m 1.0_head
```

The output:

```
1.0_head/ OPEN,ISDIR
1.0_head/ ACCESS,ISDIR
1.0_head/ ACCESS,ISDIR
1.0_head/ CLOSE_NOWRITE,CLOSE,ISDIR
1.0_head/ OPEN,ISDIR
1.0_head/ ACCESS,ISDIR
1.0_head/ ACCESS,ISDIR
1.0_head/ CLOSE_NOWRITE,CLOSE,ISDIR
1.0_head/ OPEN a16__head_8FA46F40__1
1.0_head/ ACCESS a16__head_8FA46F40__1
1.0_head/ OPEN a39__head_621FD720__1
1.0_head/ ACCESS a39__head_621FD720__1
1.0_head/ OPEN a30__head_655287E0__1
1.0_head/ ACCESS a30__head_655287E0__1
1.0_head/ OPEN a91__head_B02EE3D0__1
1.0_head/ ACCESS a91__head_B02EE3D0__1
1.0_head/ OPEN a33__head_9E9E3E30__1
1.0_head/ ACCESS a33__head_9E9E3E30__1
1.0_head/ OPEN a92__head_6AFC6B30__1
1.0_head/ ACCESS a92__head_6AFC6B30__1
1.0_head/ OPEN a22__head_AC48AAB0__1
1.0_head/ ACCESS a22__head_AC48AAB0__1
1.0_head/ OPEN a42__head_76B90AC8__1
1.0_head/ ACCESS a42__head_76B90AC8__1
1.0_head/ OPEN a5__head_E5A1A728__1
1.0_head/ ACCESS a5__head_E5A1A728__1
1.0_head/ OPEN a34__head_4D9ABA68__1
1.0_head/ ACCESS a34__head_4D9ABA68__1
1.0_head/ OPEN a69__head_7AF2B6E8__1
1.0_head/ ACCESS a69__head_7AF2B6E8__1
1.0_head/ OPEN a95__head_BD3695B8__1
1.0_head/ ACCESS a95__head_BD3695B8__1
1.0_head/ OPEN a67__head_6BCD37B8__1
1.0_head/ ACCESS a67__head_6BCD37B8__1
1.0_head/ OPEN a10__head_F0F08AF8__1
1.0_head/ ACCESS a10__head_F0F08AF8__1
1.0_head/ OPEN a3__head_88EF0BF8__1
1.0_head/ ACCESS a3__head_88EF0BF8__1
1.0_head/ OPEN a82__head_721BC094__1
1.0_head/ ACCESS a82__head_721BC094__1
1.0_head/ OPEN a48__head_27A729D4__1
1.0_head/ ACCESS a48__head_27A729D4__1
1.0_head/ OPEN a36__head_F63E6AF4__1
1.0_head/ ACCESS a36__head_F63E6AF4__1
1.0_head/ OPEN a29__head_F06D540C__1
1.0_head/ ACCESS a29__head_F06D540C__1
1.0_head/ OPEN a31__head_AC83164C__1
1.0_head/ ACCESS a31__head_AC83164C__1
1.0_head/ OPEN a59__head_884F9B6C__1
1.0_head/ ACCESS a59__head_884F9B6C__1
1.0_head/ OPEN a58__head_06954F6C__1
1.0_head/ ACCESS a58__head_06954F6C__1
1.0_head/ OPEN a55__head_2A42E61C__1
1.0_head/ ACCESS a55__head_2A42E61C__1
1.0_head/ OPEN a90__head_1B88FEDC__1
1.0_head/ ACCESS a90__head_1B88FEDC__1
1.0_head/ OPEN,ISDIR
1.0_head/ ACCESS,ISDIR
1.0_head/ ACCESS,ISDIR
1.0_head/ CLOSE_NOWRITE,CLOSE,ISDIR
1.0_head/ OPEN,ISDIR
1.0_head/ ACCESS,ISDIR
1.0_head/ ACCESS,ISDIR
1.0_head/ CLOSE_NOWRITE,CLOSE,ISDIR
1.0_head/ OPEN a100__head_C29E0C42__1
1.0_head/ ACCESS a100__head_C29E0C42__1
1.0_head/ OPEN a15__head_87123BE2__1
1.0_head/ ACCESS a15__head_87123BE2__1
1.0_head/ OPEN a23__head_AABFFB92__1
1.0_head/ ACCESS a23__head_AABFFB92__1
1.0_head/ OPEN a41__head_4EA9A5D2__1
1.0_head/ ACCESS a41__head_4EA9A5D2__1
1.0_head/ OPEN a85__head_83760D72__1
1.0_head/ ACCESS a85__head_83760D72__1
1.0_head/ OPEN a72__head_8A105D72__1
1.0_head/ ACCESS a72__head_8A105D72__1
1.0_head/ OPEN a60__head_5536480A__1
1.0_head/ ACCESS a60__head_5536480A__1
1.0_head/ OPEN a73__head_F1819D0A__1
1.0_head/ ACCESS a73__head_F1819D0A__1
1.0_head/ OPEN a78__head_6929D12A__1
1.0_head/ ACCESS a78__head_6929D12A__1
1.0_head/ OPEN a57__head_2C43153A__1
1.0_head/ ACCESS a57__head_2C43153A__1
1.0_head/ OPEN a1__head_51903B7A__1
1.0_head/ ACCESS a1__head_51903B7A__1
1.0_head/ OPEN a12__head_14D7ABC6__1
1.0_head/ ACCESS a12__head_14D7ABC6__1
1.0_head/ OPEN a63__head_9490B166__1
1.0_head/ ACCESS a63__head_9490B166__1
1.0_head/ OPEN a53__head_DF95B716__1
1.0_head/ ACCESS a53__head_DF95B716__1
1.0_head/ OPEN a13__head_E09E0896__1
1.0_head/ ACCESS a13__head_E09E0896__1
1.0_head/ OPEN a27__head_7ED31896__1
1.0_head/ ACCESS a27__head_7ED31896__1
1.0_head/ OPEN a43__head_7052A656__1
1.0_head/ ACCESS a43__head_7052A656__1
1.0_head/ OPEN a28__head_E6257CD6__1
1.0_head/ ACCESS a28__head_E6257CD6__1
1.0_head/ OPEN a35__head_ACABD736__1
1.0_head/ ACCESS a35__head_ACABD736__1
1.0_head/ OPEN a54__head_B9482876__1
1.0_head/ CLOSE_WRITE,CLOSE a12__head_14D7ABC6__1
1.0_head/ ACCESS a54__head_B9482876__1
1.0_head/ OPEN a4__head_F12ACA76__1
1.0_head/ CLOSE_WRITE,CLOSE a63__head_9490B166__1
1.0_head/ ACCESS a4__head_F12ACA76__1
1.0_head/ OPEN a84__head_B033038E__1
1.0_head/ ACCESS a84__head_B033038E__1
1.0_head/ OPEN a19__head_D6A64F9E__1
1.0_head/ ACCESS a19__head_D6A64F9E__1
1.0_head/ OPEN a93__head_F54E757E__1
1.0_head/ ACCESS a93__head_F54E757E__1
1.0_head/ OPEN a7__head_1F08F77E__1
1.0_head/ ACCESS a7__head_1F08F77E__1
1.0_head/ OPEN,ISDIR
1.0_head/ ACCESS,ISDIR
1.0_head/ ACCESS,ISDIR
1.0_head/ CLOSE_NOWRITE,CLOSE,ISDIR
1.0_head/ OPEN,ISDIR
1.0_head/ ACCESS,ISDIR
1.0_head/ ACCESS,ISDIR
1.0_head/ CLOSE_NOWRITE,CLOSE,ISDIR
1.0_head/ OPEN a9__head_635C6201__1
1.0_head/ ACCESS a9__head_635C6201__1
1.0_head/ OPEN a11__head_12780121__1
1.0_head/ ACCESS a11__head_12780121__1
1.0_head/ OPEN a50__head_5E524321__1
1.0_head/ ACCESS a50__head_5E524321__1
1.0_head/ OPEN a75__head_27E1CB21__1
1.0_head/ ACCESS a75__head_27E1CB21__1
1.0_head/ OPEN a21__head_69ACD1A1__1
1.0_head/ ACCESS a21__head_69ACD1A1__1
1.0_head/ OPEN a25__head_698E7751__1
1.0_head/ ACCESS a25__head_698E7751__1
1.0_head/ OPEN a44__head_57E29949__1
1.0_head/ ACCESS a44__head_57E29949__1
1.0_head/ OPEN a66__head_944E79C9__1
1.0_head/ ACCESS a66__head_944E79C9__1
1.0_head/ OPEN a52__head_DAC6BF29__1
1.0_head/ ACCESS a52__head_DAC6BF29__1
1.0_head/ OPEN a14__head_295EA1A9__1
1.0_head/ ACCESS a14__head_295EA1A9__1
1.0_head/ OPEN a70__head_62941259__1
1.0_head/ ACCESS a70__head_62941259__1
1.0_head/ OPEN a18__head_53B48959__1
1.0_head/ ACCESS a18__head_53B48959__1
1.0_head/ OPEN a17__head_7D103759__1
1.0_head/ ACCESS a17__head_7D103759__1
1.0_head/ OPEN a6__head_9505BEF9__1
1.0_head/ ACCESS a6__head_9505BEF9__1
1.0_head/ OPEN a77__head_88A7CC25__1
1.0_head/ ACCESS a77__head_88A7CC25__1
1.0_head/ OPEN a37__head_141AFE65__1
1.0_head/ ACCESS a37__head_141AFE65__1
1.0_head/ OPEN a74__head_90DAAD15__1
1.0_head/ ACCESS a74__head_90DAAD15__1
1.0_head/ OPEN a32__head_B7957195__1
1.0_head/ ACCESS a32__head_B7957195__1
1.0_head/ OPEN a45__head_CCCFB5D5__1
1.0_head/ ACCESS a45__head_CCCFB5D5__1
1.0_head/ OPEN a24__head_3B937275__1
1.0_head/ ACCESS a24__head_3B937275__1
1.0_head/ OPEN a26__head_2AB240F5__1
1.0_head/ ACCESS a26__head_2AB240F5__1
1.0_head/ OPEN a89__head_8E387EF5__1
1.0_head/ ACCESS a89__head_8E387EF5__1
1.0_head/ OPEN a80__head_6FEFE78D__1
1.0_head/ ACCESS a80__head_6FEFE78D__1
1.0_head/ OPEN a51__head_0BCC72CD__1
1.0_head/ ACCESS a51__head_0BCC72CD__1
1.0_head/ OPEN a71__head_88F4796D__1
1.0_head/ ACCESS a71__head_88F4796D__1
1.0_head/ OPEN,ISDIR
1.0_head/ ACCESS,ISDIR
1.0_head/ ACCESS,ISDIR
1.0_head/ CLOSE_NOWRITE,CLOSE,ISDIR
1.0_head/ OPEN,ISDIR
1.0_head/ ACCESS,ISDIR
1.0_head/ ACCESS,ISDIR
1.0_head/ CLOSE_NOWRITE,CLOSE,ISDIR
1.0_head/ OPEN a88__head_B0A64FED__1
1.0_head/ ACCESS a88__head_B0A64FED__1
1.0_head/ OPEN a8__head_F885EA9D__1
1.0_head/ ACCESS a8__head_F885EA9D__1
1.0_head/ OPEN a83__head_1322679D__1
1.0_head/ ACCESS a83__head_1322679D__1
1.0_head/ OPEN a76__head_B8285A7D__1
1.0_head/ ACCESS a76__head_B8285A7D__1
1.0_head/ OPEN a94__head_D3BBB683__1
1.0_head/ ACCESS a94__head_D3BBB683__1
1.0_head/ OPEN a46__head_E2C6C983__1
1.0_head/ ACCESS a46__head_E2C6C983__1
1.0_head/ OPEN a56__head_A1E888C3__1
1.0_head/ ACCESS a56__head_A1E888C3__1
1.0_head/ OPEN a99__head_DD3B45C3__1
1.0_head/ ACCESS a99__head_DD3B45C3__1
1.0_head/ OPEN a79__head_AC19FC13__1
1.0_head/ ACCESS a79__head_AC19FC13__1
1.0_head/ OPEN a81__head_BC0AFFF3__1
1.0_head/ ACCESS a81__head_BC0AFFF3__1
1.0_head/ OPEN a64__head_C042B84B__1
1.0_head/ ACCESS a64__head_C042B84B__1
1.0_head/ OPEN a97__head_29054B4B__1
1.0_head/ ACCESS a97__head_29054B4B__1
1.0_head/ OPEN a96__head_BAAC0DCB__1
1.0_head/ ACCESS a96__head_BAAC0DCB__1
1.0_head/ OPEN a62__head_84A40AAB__1
1.0_head/ ACCESS a62__head_84A40AAB__1
1.0_head/ OPEN a98__head_C15FD53B__1
1.0_head/ ACCESS a98__head_C15FD53B__1
1.0_head/ OPEN a87__head_12F9237B__1
1.0_head/ ACCESS a87__head_12F9237B__1
1.0_head/ OPEN a2__head_E2983C17__1
1.0_head/ ACCESS a2__head_E2983C17__1
1.0_head/ OPEN a20__head_7E477A77__1
1.0_head/ ACCESS a20__head_7E477A77__1
1.0_head/ OPEN a49__head_3ADEC577__1
1.0_head/ ACCESS a49__head_3ADEC577__1
1.0_head/ OPEN a61__head_C860ABF7__1
1.0_head/ ACCESS a61__head_C860ABF7__1
1.0_head/ OPEN a68__head_BC5C8F8F__1
1.0_head/ ACCESS a68__head_BC5C8F8F__1
1.0_head/ OPEN a38__head_78AE322F__1
1.0_head/ ACCESS a38__head_78AE322F__1
1.0_head/ OPEN a65__head_7EE57AEF__1
1.0_head/ ACCESS a65__head_7EE57AEF__1
1.0_head/ OPEN a47__head_B6C48D1F__1
1.0_head/ ACCESS a47__head_B6C48D1F__1
1.0_head/ OPEN a86__head_7FB2C85F__1
1.0_head/ ACCESS a86__head_7FB2C85F__1
1.0_head/ OPEN,ISDIR
1.0_head/ ACCESS,ISDIR
1.0_head/ ACCESS,ISDIR
1.0_head/ CLOSE_NOWRITE,CLOSE,ISDIR
1.0_head/ OPEN,ISDIR
1.0_head/ ACCESS,ISDIR
1.0_head/ ACCESS,ISDIR
1.0_head/ CLOSE_NOWRITE,CLOSE,ISDIR
1.0_head/ OPEN a40__head_5F0404DF__1
1.0_head/ ACCESS a40__head_5F0404DF__1
```
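Those OPEN,ISDIR / CLOSE_NOWRITE bursts between object accesses are directory scans, and they delimit groups of objects. As a small sketch, you can count the objects between consecutive directory scans from the same inotifywait stream (the event patterns are taken from the output above):

```
inotifywait -m 1.0_head 2>/dev/null | awk '
    /CLOSE_NOWRITE,CLOSE,ISDIR/ { if (n) print "chunk of", n, "objects"; n = 0 }  # directory scan = boundary
    /ACCESS a/                  { n++ }                                           # one ACCESS event per object file
'
```

Counting the trace above this way gives groups of roughly 25 objects each.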
After setting debug_osd = 20 on osd.0, we can watch the chunky-scrub related log lines:
```
[root@lab8106 ceph]# cat ceph-osd.0.log |grep chunky:1|grep handle_replica_op
2017-08-18 23:50:40.262448 7f2ac583c700 10 osd.0 26 handle_replica_op replica scrub(pg: 1.0,from:0'0,to:22'2696,epoch:26,start:1:00000000::::head,end:1:42307943:::a100:0,chunky:1,deep:1,seed:4294967295,version:6) v6 epoch 26
2017-08-18 23:50:40.294637 7f2ac583c700 10 osd.0 26 handle_replica_op replica scrub(pg: 1.0,from:0'0,to:22'2694,epoch:26,start:1:42307943:::a100:0,end:1:80463ac6:::a9:0,chunky:1,deep:1,seed:4294967295,version:6) v6 epoch 26
2017-08-18 23:50:40.320986 7f2ac583c700 10 osd.0 26 handle_replica_op replica scrub(pg: 1.0,from:0'0,to:22'2690,epoch:26,start:1:80463ac6:::a9:0,end:1:b7f2650d:::a88:0,chunky:1,deep:1,seed:4294967295,version:6) v6 epoch 26
2017-08-18 23:50:40.337646 7f2ac583c700 10 osd.0 26 handle_replica_op replica scrub(pg: 1.0,from:0'0,to:22'2700,epoch:26,start:1:b7f2650d:::a88:0,end:1:fb2020fa:::a40:0,chunky:1,deep:1,seed:4294967295,version:6) v6 epoch 26
2017-08-18 23:50:40.373227 7f2ac583c700 10 osd.0 26 handle_replica_op replica scrub(pg: 1.0,from:0'0,to:22'2636,epoch:26,start:1:fb2020fa:::a40:0,end:MAX,chunky:1,deep:1,seed:4294967295,version:6) v6 epoch 26
```
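Each replica scrub message carries the chunk's start and end boundary explicitly. To pull just the boundaries out of the same log, a quick sketch:

```
grep handle_replica_op ceph-osd.0.log | grep chunky:1 | grep -o 'start:[^,]*,end:[^,]*'
```

Five messages, five chunks: the whole keyspace from 1:00000000::::head to MAX is covered in roughly 25-object steps.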
The key part is each message's start/end boundary: the chunk boundaries fall on objects a100, a9, a88, and a40 (the original figure highlighting this is omitted).
Now let's see where those boundary objects sit in the file-access trace above (the number prefix is each object's ordinal position among the ACCESS events):
```
25:1.0_head/ ACCESS a100__head_C29E0C42__1
50:1.0_head/ ACCESS a9__head_635C6201__1
75:1.0_head/ ACCESS a88__head_B0A64FED__1
100:1.0_head/ ACCESS a40__head_5F0404DF__1
```
Quite regular, isn't it: the boundaries fall exactly at positions 25, 50, 75, and 100. This is Ceph's notion of a chunk. While scrubbing a chunk, Ceph takes a lock on it, something you can find referenced in many places. And that is where slow requests come from: it is not necessarily that your disk is slow, but that the lock prevents the read from being served.
osd scrub chunk min
Description: The minimal number of object store chunks to scrub during single operation. Ceph blocks writes to single chunk during scrub.
Type: 32-bit Integer
Default: 5
The documentation says scrub blocks writes to the chunk, but mentions nothing about reads being locked out. So let's verify: does deep-scrub actually cause slow requests on reads?
Same environment, 100 objects, but now each object is resized to 100 MB, and the chunk size is set to 100 objects, so the entire PG is treated as one big chunk. Then we read an object with rados and see what happens.
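A sketch of rebuilding the objects at the new size (path and loop are assumptions; this overwrites the a1..a100 objects used above):

```
dd if=/dev/zero of=/tmp/100M bs=1M count=100   # one 100 MB source file
for i in $(seq 1 100); do
    rados -p rbd put a$i /tmp/100M             # overwrite each object at 100 MB
done
```

And the chunk settings for this run: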
```
osd_scrub_chunk_min = 100
osd_scrub_chunk_max = 100
```
Monitor with ceph -w:
```
2017-08-19 00:19:26.045032 mon.0 [INF] pgmap v377: 1 pgs: 1 active+clean+scrubbing+deep; 10000 MB data, 30103 MB used, 793 GB / 822 GB avail
2017-08-19 00:19:17.540413 osd.0 [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.398705 secs
2017-08-19 00:19:17.540456 osd.0 [WRN] slow request 30.398705 seconds old, received at 2017-08-19 00:18:47.141483: replica scrub(pg: 1.0,from:0'0,to:26'5200,epoch:32,start:1:00000000::::head,end:MAX,chunky:1,deep:1,seed:4294967295,version:6) currently reached_pg
```
Right as the deep-scrub started I began a get of object a40 (rados -p rbd get a40 a40); it simply hung without returning. As long as the objects in a PG don't change, the scrub order over that PG is stable, so I deliberately picked the very last object in the scrub order for the get, and it still produced a slow request. That confirms the inference above: while a chunk is being scrubbed, read requests to the objects in that chunk are blocked as well. Now let's set the scrub chunk to 1 and see what happens.
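As an aside, while the get hangs you can inspect the blocked op through the OSD admin socket; dump_ops_in_flight is the standard command, though the exact event text varies across releases:

```
# on the node hosting osd.0, while the rados get is stuck
ceph daemon osd.0 dump_ops_in_flight   # shows each op's age and the event it is waiting on
```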
Change the parameters to:
```
osd_scrub_chunk_min = 1
osd_scrub_chunk_max = 1
```

```
watch -n 1 'rados -p rbd get a9 a1'
watch -n 1 'rados -p rbd get a9 a2'
watch -n 1 'rados -p rbd get a9 a3'
watch -n 1 'rados -p rbd get a9 a4'
watch -n 1 'rados -p rbd get a9 a5'
```
Five loops all getting a9 at the same time, continuously.
Then run the deep-scrub again: this time, no slow requests appeared.
Another important parameter
Now look at osd_scrub_sleep = 0:
osd scrub sleep
Description: Time to sleep before scrubbing next group of chunks. Increasing this value will slow down whole scrub operation while client operations will be less impacted.
Type: Float
Default: 0
Note the additional concept of a scrub group here; analyzing the data below, a group turns out to be 3 chunks.
Let's set:
osd_scrub_sleep = 5
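As a side note, these scrub knobs can also be applied at runtime without restarting the OSD; injectargs is the standard mechanism, and osd.0 is the OSD under test here:

```
ceph tell osd.0 injectargs '--osd_scrub_sleep 5'
```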
Then run the deep-scrub again and look at the log:
```
cat /var/log/ceph/ceph-osd.0.log |grep be_deep_scrub|awk '{print $1,$2,$28}'|less
2017-08-19 00:48:37.930455 1:02f625f1:::a16:head
2017-08-19 00:48:38.477271 1:02f625f1:::a16:head
2017-08-19 00:48:38.477367 1:04ebf846:::a39:head
2017-08-19 00:48:39.023952 1:04ebf846:::a39:head
2017-08-19 00:48:39.024084 1:07e14aa6:::a30:head
2017-08-19 00:48:39.572683 1:07e14aa6:::a30:head
2017-08-19 00:48:44.989551 1:0bc7740d:::a91:head
2017-08-19 00:48:45.556758 1:0bc7740d:::a91:head
2017-08-19 00:48:45.556857 1:0c7c7979:::a33:head
2017-08-19 00:48:46.109657 1:0c7c7979:::a33:head
2017-08-19 00:48:46.109768 1:0cd63f56:::a92:head
2017-08-19 00:48:46.657849 1:0cd63f56:::a92:head
2017-08-19 00:48:52.084712 1:0d551235:::a22:head
2017-08-19 00:48:52.614345 1:0d551235:::a22:head
2017-08-19 00:48:52.614458 1:13509d6e:::a42:head
2017-08-19 00:48:53.158826 1:13509d6e:::a42:head
2017-08-19 00:48:53.158916 1:14e585a7:::a5:head
```
You can see roughly one object deep-scrubbed per second, and after every 3 objects the scrub stops for 5 seconds.
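To see those pauses numerically, a small sketch that prints the interval between consecutive be_deep_scrub lines (same field layout as the command above):

```
cat /var/log/ceph/ceph-osd.0.log | grep be_deep_scrub | awk '{
    split($2, t, ":")
    s = t[1]*3600 + t[2]*60 + t[3]        # timestamp in seconds, fractional part kept
    if (prev) printf "%.2f\n", s - prev   # gap since the previous log line
    prev = s
}'
```

On the excerpt above this prints gaps of about half a second inside a group and about 5.4 seconds between groups.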
Default scrub versus the modified settings
Let's compare the before and after. We'll simulate a PG holding 10,000 small objects; the test files are all 1 KB (you can repeat this with your own file-size profile).
Even in a massive-object scenario, roughly 10,000 objects in a single PG is already on the high side, so a 10,000-object deep-scrub is what we'll simulate.
```
cat /var/log/ceph/ceph-osd.0.log | grep be_deep_scrub | awk '{print $1,substr($2,1,8),$28}' | uniq | awk '{a[$1" "$2]++} END {for (j in a) print j,a[j] | "sort -k 1"}'
```
The script above tallies how many objects are scrubbed each second (be_deep_scrub logs each object twice, hence the uniq):
```
2017-08-19 01:23:33 184
2017-08-19 01:23:34 236
2017-08-19 01:23:35 261
2017-08-19 01:23:36 263
2017-08-19 01:23:37 229
2017-08-19 01:23:38 289
2017-08-19 01:23:39 236
2017-08-19 01:23:40 258
2017-08-19 01:23:41 276
2017-08-19 01:23:42 238
2017-08-19 01:23:43 224
2017-08-19 01:23:44 282
2017-08-19 01:23:45 254
2017-08-19 01:23:46 258
2017-08-19 01:23:47 261
2017-08-19 01:23:48 233
2017-08-19 01:23:49 300
2017-08-19 01:23:50 243
2017-08-19 01:23:51 257
2017-08-19 01:23:52 252
2017-08-19 01:23:53 246
2017-08-19 01:23:54 313
2017-08-19 01:23:55 252
2017-08-19 01:23:56 276
2017-08-19 01:23:57 245
2017-08-19 01:23:58 256
2017-08-19 01:23:59 307
2017-08-19 01:24:00 276
2017-08-19 01:24:01 310
2017-08-19 01:24:02 220
2017-08-19 01:24:03 250
2017-08-19 01:24:04 313
2017-08-19 01:24:05 265
2017-08-19 01:24:06 304
2017-08-19 01:24:07 262
2017-08-19 01:24:08 308
2017-08-19 01:24:09 263
2017-08-19 01:24:10 293
2017-08-19 01:24:11 42
```
So about 300 objects are scanned per second, and the whole PG is finished in roughly 40 seconds, with the default chunk of 25 objects.
Here is an analogy: on a 40 m stretch of road, a car drives forward at 1 m/s while people cross back and forth. If only one or two people are crossing, it's probably fine; but once 40 people are weaving through that stretch, you can imagine how likely collisions become.
Or, if the same file is requested 40 times in a row, that is 40 people crossing the road at the same spot over and over; the odds of getting hit are very high.
Having followed all of the above, you should now have a good idea of how to handle this.
Let's see what happens with everything set to 1:
```
osd_scrub_chunk_min = 1
osd_scrub_chunk_max = 1
osd_scrub_sleep = 3
```
Shrinking the chunk here is like shortening the vehicle in the analogy above: the 25-meter truck turns into a 1-meter bicycle.
```
[root@lab8106 ceph]# cat /var/log/ceph/ceph-osd.0.log |grep be_deep_scrub|awk '{print $1,$2,$28}'
2017-08-19 16:12:21.927440 1:0000b488:::a5471:head
2017-08-19 16:12:21.931914 1:0000b488:::a5471:head
2017-08-19 16:12:21.932039 1:000fbbcb:::a5667:head
2017-08-19 16:12:21.933568 1:000fbbcb:::a5667:head
2017-08-19 16:12:21.933646 1:00134ebd:::a1903:head
2017-08-19 16:12:21.934972 1:00134ebd:::a1903:head
2017-08-19 16:12:24.960697 1:0018f641:::a2028:head
2017-08-19 16:12:24.966653 1:0018f641:::a2028:head
2017-08-19 16:12:24.966733 1:00197a21:::a1463:head
2017-08-19 16:12:24.967085 1:00197a21:::a1463:head
2017-08-19 16:12:24.967162 1:001cb17d:::a1703:head
2017-08-19 16:12:24.967492 1:001cb17d:::a1703:head
2017-08-19 16:12:27.972252 1:002d911c:::a1585:head
2017-08-19 16:12:27.976621 1:002d911c:::a1585:head
2017-08-19 16:12:27.976740 1:00301acf:::a6131:head
2017-08-19 16:12:27.977097 1:00301acf:::a6131:head
2017-08-19 16:12:27.977181 1:0039a0a8:::a1840:head
2017-08-19 16:12:27.979053 1:0039a0a8:::a1840:head
2017-08-19 16:12:30.983556 1:00484881:::a8781:head
2017-08-19 16:12:30.989098 1:00484881:::a8781:head
2017-08-19 16:12:30.989181 1:004f234f:::a4402:head
2017-08-19 16:12:30.989531 1:004f234f:::a4402:head
2017-08-19 16:12:30.989626 1:00531b36:::a5251:head
2017-08-19 16:12:30.989954 1:00531b36:::a5251:head
2017-08-19 16:12:33.994419 1:00584c30:::a3374:head
2017-08-19 16:12:34.001296 1:00584c30:::a3374:head
2017-08-19 16:12:34.001378 1:005d6aa5:::a2115:head
2017-08-19 16:12:34.002174 1:005d6aa5:::a2115:head
2017-08-19 16:12:34.002287 1:005e0dfd:::a9945:head
2017-08-19 16:12:34.002686 1:005e0dfd:::a9945:head
2017-08-19 16:12:37.005645 1:006320f9:::a5207:head
2017-08-19 16:12:37.011498 1:006320f9:::a5207:head
2017-08-19 16:12:37.011655 1:006d32b4:::a7517:head
2017-08-19 16:12:37.011998 1:006d32b4:::a7517:head
2017-08-19 16:12:37.012111 1:006dae55:::a4702:head
2017-08-19 16:12:37.012442 1:006dae55:::a4702:head
```
This excerpt from the log shows the scrub handling 3 objects in about one second, then resting 3 s before the next group; the rate is now throttled very low. But wait: in the scrub-sleep test earlier it looked like 1 object per second, so why 3 objects per second here? It depends on object size: the bigger the object, the longer its scrub takes. The objects in this test are 1 KB, essentially as small as it gets, so 3 of them are scanned per second, after which the configured sleep delays the next group.
In this environment the defaults scrub about 300 objects per second, sweeping a 25-object locking window in which nothing can be read or written. After the change, 3 objects are scrubbed per second behind a 1-object locking window; the number of objects locked per unit of time drops to a very low level. If you run production and still want scrub enabled, it is worth trying a smaller chunk and a larger sleep.
The only cost is scan speed; if you want the scan to finish faster, tune the sleep parameter to control the rate. I won't belabor that here.
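To make the change persistent, the corresponding ceph.conf section would look like this (values from the test above; tune them to your own hardware):

```
[osd]
osd_scrub_chunk_min = 1
osd_scrub_chunk_max = 1
osd_scrub_sleep = 3       # seconds to pause between chunk groups
```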
This article has covered the impact of running deep-scrub on a single PG. By default, a scrub is triggered automatically once the maximum interval elapses. My suggestion is not to rely on the built-in timers: analyze the scrub timestamps and object counts yourself, work out a schedule, and then, for example, deep-scrub a fixed number of PGs each night; once a full round completes, you get a self-defined quiet period with no scrubbing at all, so a full round every month or two is perfectly possible. I will cover that in a separate article.
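As a very rough sketch of that idea (the PG batch below is hypothetical, and choosing PGs by their oldest deep-scrub stamp is left to your own analysis):

```
# push the automatic deep-scrub interval far out, e.g. 4 weeks,
# so effectively only manually requested deep-scrubs run
ceph tell osd.* injectargs '--osd_deep_scrub_interval 2419200'

# nightly cron job: deep-scrub tonight's batch of PGs
for pg in 1.0 1.1 1.2; do    # replace with the batch you computed
    ceph pg deep-scrub $pg
done
```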
Summary
For scrub, you need to know when it will fire, how much load it puts on your OSDs once it does, how many objects it scans per second, and how to reduce the impact. Those questions are the origin of this article. Many of these problems can be solved through parameters; the key is knowing what those parameters are actually doing.