How to Safely Remove OSDs from a Ceph (Luminous) Cluster


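At work I needed to take a storage server out of our Ceph cluster so it could be repurposed. The cluster would still have enough capacity after the server was gone, so the plan was feasible. However, the cluster had been running for a long time and every server held a lot of data, so removing one without losing any data is not as simple as it looks.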

1. OSD Layout

First, let's look at the OSD layout:

$ ceph osd tree
ID CLASS WEIGHT    TYPE NAME      STATUS REWEIGHT PRI-AFF 
-1       265.25757 root default                           
-5       132.62878     host osd7                          
24   hdd   5.52620         osd.24     up  1.00000 1.00000 
25   hdd   5.52620         osd.25     up  1.00000 1.00000 
26   hdd   5.52620         osd.26     up  1.00000 1.00000 
27   hdd   5.52620         osd.27     up  1.00000 1.00000 
28   hdd   5.52620         osd.28     up  1.00000 1.00000 
29   hdd   5.52620         osd.29     up  1.00000 1.00000 
30   hdd   5.52620         osd.30     up  1.00000 1.00000 
31   hdd   5.52620         osd.31     up  1.00000 1.00000 
32   hdd   5.52620         osd.32     up  1.00000 1.00000 
33   hdd   5.52620         osd.33     up  1.00000 1.00000 
34   hdd   5.52620         osd.34     up  1.00000 1.00000 
35   hdd   5.52620         osd.35     up  1.00000 1.00000 
36   hdd   5.52620         osd.36     up  1.00000 1.00000 
37   hdd   5.52620         osd.37     up  1.00000 1.00000 
38   hdd   5.52620         osd.38     up  1.00000 1.00000 
39   hdd   5.52620         osd.39     up  1.00000 1.00000 
40   hdd   5.52620         osd.40     up  1.00000 1.00000 
41   hdd   5.52620         osd.41     up  1.00000 1.00000 
42   hdd   5.52620         osd.42     up  1.00000 1.00000 
43   hdd   5.52620         osd.43     up  1.00000 1.00000 
44   hdd   5.52620         osd.44     up  1.00000 1.00000 
45   hdd   5.52620         osd.45     up  1.00000 1.00000 
46   hdd   5.52620         osd.46     up  1.00000 1.00000 
47   hdd   5.52620         osd.47     up  1.00000 1.00000 
-3       132.62878     host osd8                          
 0   hdd   5.52620         osd.0      up  1.00000 1.00000 
 1   hdd   5.52620         osd.1      up  1.00000 1.00000 
 2   hdd   5.52620         osd.2      up  1.00000 1.00000 
 3   hdd   5.52620         osd.3      up  1.00000 1.00000 
 4   hdd   5.52620         osd.4      up  1.00000 1.00000 
 5   hdd   5.52620         osd.5      up  1.00000 1.00000 
 6   hdd   5.52620         osd.6      up  1.00000 1.00000 
 7   hdd   5.52620         osd.7      up  1.00000 1.00000 
 8   hdd   5.52620         osd.8      up  1.00000 1.00000 
 9   hdd   5.52620         osd.9      up  1.00000 1.00000 
10   hdd   5.52620         osd.10     up  1.00000 1.00000 
11   hdd   5.52620         osd.11     up  1.00000 1.00000 
12   hdd   5.52620         osd.12     up  1.00000 1.00000 
13   hdd   5.52620         osd.13     up  1.00000 1.00000 
14   hdd   5.52620         osd.14     up  1.00000 1.00000 
15   hdd   5.52620         osd.15     up  1.00000 1.00000 
16   hdd   5.52620         osd.16     up  1.00000 1.00000 
17   hdd   5.52620         osd.17     up  1.00000 1.00000 
18   hdd   5.52620         osd.18     up  1.00000 1.00000 
19   hdd   5.52620         osd.19     up  1.00000 1.00000 
20   hdd   5.52620         osd.20     up  1.00000 1.00000 
21   hdd   5.52620         osd.21     up  1.00000 1.00000 
22   hdd   5.52620         osd.22     up  1.00000 1.00000 
23   hdd   5.52620         osd.23     up  1.00000 1.00000 

There are two servers with 48 OSDs in total. Removing host osd8 means deleting all 24 OSDs on it (osd.0 to osd.23).
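Rather than copying the 24 IDs by hand, they can be pulled out of the ceph osd tree output. The following is a minimal sketch, assuming the plain-text layout shown above (a "host <name>" bucket line followed by its osd.N lines); osds_under_host is a hypothetical helper name, not part of Ceph.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the numeric IDs of the OSDs under one CRUSH
# host bucket. Reads the plain-text output of `ceph osd tree` on stdin.
osds_under_host() {
    awk -v target="$1" '
        {
            for (i = 1; i <= NF; i++) {
                if ($i == "root" || $i == "host") {
                    # A new bucket starts here; only remember host names.
                    current = ($i == "host") ? $(i + 1) : ""
                } else if ($i ~ /^osd\.[0-9]+$/ && current == target) {
                    # Strip the "osd." prefix and print the bare ID.
                    print substr($i, 5)
                }
            }
        }'
}

# Usage (on a node with an admin keyring):
#   ceph osd tree | osds_under_host osd8
```

Column layouts can shift between releases, so ceph osd tree -f json is the sturdier input if you have a JSON tool at hand.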

2. Removing a Single OSD

Using osd.0 as an example, here is the procedure for removing an OSD:

2.1 Mark the OSD out

First, mark the OSD as out:

$ ceph osd out 0
marked out osd.0. 

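At this stage Ceph automatically migrates the data held by the out OSD to the other healthy OSDs. After running the command, use ceph -w to follow the migration. It is finished when the misplaced-object count reaches zero and the log reports "Health check cleared: OBJECT_MISPLACED".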
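Instead of watching ceph -w by eye, the wait can be scripted. A minimal sketch, assuming bash: wait_until and migration_done are hypothetical helpers, and the check simply looks for the OBJECT_MISPLACED, PG_DEGRADED and PG_AVAILABILITY health codes that appear in the log above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-run a check every <interval> seconds until it
# succeeds. Usage: wait_until <interval-seconds> <command> [args...]
wait_until() {
    local interval=$1
    shift
    until "$@"; do
        sleep "$interval"
    done
}

# Hypothetical check: succeeds once `ceph health detail` no longer lists the
# data-movement health codes seen in the log above. It is cluster-wide, so
# recovery unrelated to this OSD also keeps it waiting.
migration_done() {
    local out
    out=$(ceph health detail) || return 1   # a failed query counts as "not done"
    ! grep -qE 'OBJECT_MISPLACED|PG_DEGRADED|PG_AVAILABILITY' <<<"$out"
}

# Usage:
#   ceph osd out 0
#   wait_until 30 migration_done
```

Treating a failed ceph query as "not done" errs on the side of waiting longer rather than stopping an OSD too early.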

$ ceph -w  
  cluster:
    id:     063ed8d6-fc89-4fcb-8811-ff23915983e7
    health: HEALTH_ERR
            12408/606262 objects misplaced (2.047%)
            6 scrub errors
            Reduced data availability: 2 pgs peering
            Possible data damage: 5 pgs inconsistent
            application not enabled on 7 pool(s)
 
  services:
    mon: 3 daemons, quorum dell1,dell2,dell3
    mgr: dell1(active)
    mds: cephfs-1/1/1 up  {0=dell1=up:active}, 2 up:standby
    osd: 48 osds: 48 up, 47 in; 44 remapped pgs
    rgw: 3 daemons active
 
  data:
    pools:   22 pools, 1816 pgs
    objects: 296k objects, 963 GB
    usage:   5222 GB used, 254 TB / 259 TB avail
    pgs:     0.220% pgs not active
             12408/606262 objects misplaced (2.047%)
             1763 active+clean
             29   active+remapped+backfill_wait
             14   active+remapped+backfilling
             5    active+clean+inconsistent
             3    peering
             1    active+recovery_wait
             1    activating+remapped
 
  io:
    client:   59450 kB/s rd, 4419 MB/s wr, 1095 op/s rd, 2848 op/s wr
    recovery: 253 MB/s, 210 keys/s, 123 objects/s
 

2018-07-05 14:21:07.867104 mon.dell1 [WRN] Health check failed: Degraded data redundancy: 7/605732 objects degraded (0.001%), 1 pg degraded (PG_DEGRADED)
2018-07-05 14:21:12.252395 mon.dell1 [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 7/605732 objects degraded (0.001%), 1 pg degraded)
2018-07-05 14:21:13.510741 mon.dell1 [WRN] Health check update: 12269/606262 objects misplaced (2.024%) (OBJECT_MISPLACED)
2018-07-05 14:21:13.510797 mon.dell1 [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 2 pgs peering)
2018-07-05 14:21:19.488864 mon.dell1 [WRN] Health check update: 11553/606262 objects misplaced (1.906%) (OBJECT_MISPLACED)
2018-07-05 14:21:25.502619 mon.dell1 [WRN] Health check update: 10504/606262 objects misplaced (1.733%) (OBJECT_MISPLACED)
2018-07-05 14:21:31.745600 mon.dell1 [WRN] Health check update: 10091/606262 objects misplaced (1.664%) (OBJECT_MISPLACED)
2018-07-05 14:21:36.779666 mon.dell1 [WRN] Health check update: 9309/606262 objects misplaced (1.535%) (OBJECT_MISPLACED)
2018-07-05 14:21:41.779947 mon.dell1 [WRN] Health check update: 8580/606262 objects misplaced (1.415%) (OBJECT_MISPLACED)
2018-07-05 14:21:46.816584 mon.dell1 [WRN] Health check update: 8215/606262 objects misplaced (1.355%) (OBJECT_MISPLACED)
2018-07-05 14:21:51.817014 mon.dell1 [WRN] Health check update: 7331/606262 objects misplaced (1.209%) (OBJECT_MISPLACED)
2018-07-05 14:21:56.817406 mon.dell1 [WRN] Health check update: 6929/606262 objects misplaced (1.143%) (OBJECT_MISPLACED)
2018-07-05 14:22:01.817820 mon.dell1 [WRN] Health check update: 6426/606262 objects misplaced (1.060%) (OBJECT_MISPLACED)
2018-07-05 14:22:06.818188 mon.dell1 [WRN] Health check update: 5787/606262 objects misplaced (0.955%) (OBJECT_MISPLACED)
2018-07-05 14:22:11.818606 mon.dell1 [WRN] Health check update: 5429/606262 objects misplaced (0.895%) (OBJECT_MISPLACED)
2018-07-05 14:22:16.818981 mon.dell1 [WRN] Health check update: 5165/606262 objects misplaced (0.852%) (OBJECT_MISPLACED)
2018-07-05 14:22:20.303513 osd.35 [ERR] 13.2ad missing primary copy of 13:b56abc11:::d9593962-fa39-406f-bc35-7e4fcac1be9f.44307.2__shadow_121_1530008810116503747%2fplatform-cms.rar.2~vf7AEOlGNYM1ggI6IhV-iu22oDDXcvS.5_1:head, will try copies on 0
2018-07-05 14:22:21.819353 mon.dell1 [WRN] Health check update: 4866/606262 objects misplaced (0.803%) (OBJECT_MISPLACED)
2018-07-05 14:22:26.819657 mon.dell1 [WRN] Health check update: 4586/606262 objects misplaced (0.756%) (OBJECT_MISPLACED)
2018-07-05 14:22:31.819983 mon.dell1 [WRN] Health check update: 4323/606262 objects misplaced (0.713%) (OBJECT_MISPLACED)
2018-07-05 14:22:36.820335 mon.dell1 [WRN] Health check update: 4113/606262 objects misplaced (0.678%) (OBJECT_MISPLACED)
2018-07-05 14:22:41.820676 mon.dell1 [WRN] Health check update: 3949/606262 objects misplaced (0.651%) (OBJECT_MISPLACED)
2018-07-05 14:22:46.821040 mon.dell1 [WRN] Health check update: 3788/606262 objects misplaced (0.625%) (OBJECT_MISPLACED)
2018-07-05 14:22:51.821395 mon.dell1 [WRN] Health check update: 3665/606262 objects misplaced (0.605%) (OBJECT_MISPLACED)
2018-07-05 14:22:56.821692 mon.dell1 [WRN] Health check update: 3440/606262 objects misplaced (0.567%) (OBJECT_MISPLACED)
2018-07-05 14:23:01.821999 mon.dell1 [WRN] Health check update: 3170/606266 objects misplaced (0.523%) (OBJECT_MISPLACED)
2018-07-05 14:23:06.822355 mon.dell1 [WRN] Health check update: 2956/606266 objects misplaced (0.488%) (OBJECT_MISPLACED)
2018-07-05 14:23:11.822752 mon.dell1 [WRN] Health check update: 2747/606270 objects misplaced (0.453%) (OBJECT_MISPLACED)
2018-07-05 14:23:16.823168 mon.dell1 [WRN] Health check update: 2615/606270 objects misplaced (0.431%) (OBJECT_MISPLACED)
2018-07-05 14:23:21.823523 mon.dell1 [WRN] Health check update: 2512/606270 objects misplaced (0.414%) (OBJECT_MISPLACED)
2018-07-05 14:23:26.823878 mon.dell1 [WRN] Health check update: 2409/606270 objects misplaced (0.397%) (OBJECT_MISPLACED)
2018-07-05 14:23:31.824214 mon.dell1 [WRN] Health check update: 2299/606270 objects misplaced (0.379%) (OBJECT_MISPLACED)
2018-07-05 14:23:36.824596 mon.dell1 [WRN] Health check update: 2194/606270 objects misplaced (0.362%) (OBJECT_MISPLACED)
2018-07-05 14:23:41.825037 mon.dell1 [WRN] Health check update: 2101/606270 objects misplaced (0.347%) (OBJECT_MISPLACED)
2018-07-05 14:23:46.825390 mon.dell1 [WRN] Health check update: 1939/606270 objects misplaced (0.320%) (OBJECT_MISPLACED)
2018-07-05 14:23:51.825725 mon.dell1 [WRN] Health check update: 1777/606270 objects misplaced (0.293%) (OBJECT_MISPLACED)
2018-07-05 14:23:56.826087 mon.dell1 [WRN] Health check update: 1612/606270 objects misplaced (0.266%) (OBJECT_MISPLACED)
2018-07-05 14:24:01.826439 mon.dell1 [WRN] Health check update: 1444/606270 objects misplaced (0.238%) (OBJECT_MISPLACED)
2018-07-05 14:24:06.826755 mon.dell1 [WRN] Health check update: 1315/606270 objects misplaced (0.217%) (OBJECT_MISPLACED)
2018-07-05 14:24:11.828343 mon.dell1 [WRN] Health check update: 1264/606270 objects misplaced (0.208%) (OBJECT_MISPLACED)
2018-07-05 14:24:16.828638 mon.dell1 [WRN] Health check update: 1214/606270 objects misplaced (0.200%) (OBJECT_MISPLACED)
2018-07-05 14:24:21.886644 mon.dell1 [WRN] Health check update: 1161/606270 objects misplaced (0.191%) (OBJECT_MISPLACED)
2018-07-05 14:24:26.887027 mon.dell1 [WRN] Health check update: 1110/606270 objects misplaced (0.183%) (OBJECT_MISPLACED)
2018-07-05 14:24:32.287725 mon.dell1 [WRN] Health check update: 1069/606270 objects misplaced (0.176%) (OBJECT_MISPLACED)
2018-07-05 14:24:39.839578 mon.dell1 [WRN] Health check update: 960/606270 objects misplaced (0.158%) (OBJECT_MISPLACED)
2018-07-05 14:24:45.851276 mon.dell1 [WRN] Health check update: 905/606272 objects misplaced (0.149%) (OBJECT_MISPLACED)
2018-07-05 14:24:51.911053 mon.dell1 [WRN] Health check update: 849/606272 objects misplaced (0.140%) (OBJECT_MISPLACED)
2018-07-05 14:24:57.960803 mon.dell1 [WRN] Health check update: 784/606272 objects misplaced (0.129%) (OBJECT_MISPLACED)
2018-07-05 14:25:05.887641 mon.dell1 [WRN] Health check update: 688/606272 objects misplaced (0.113%) (OBJECT_MISPLACED)
2018-07-05 14:25:11.945922 mon.dell1 [WRN] Health check update: 631/606272 objects misplaced (0.104%) (OBJECT_MISPLACED)
2018-07-05 14:25:16.946267 mon.dell1 [WRN] Health check update: 570/606272 objects misplaced (0.094%) (OBJECT_MISPLACED)
2018-07-05 14:25:21.993994 mon.dell1 [WRN] Health check update: 528/606272 objects misplaced (0.087%) (OBJECT_MISPLACED)
2018-07-05 14:25:26.994417 mon.dell1 [WRN] Health check update: 468/606272 objects misplaced (0.077%) (OBJECT_MISPLACED)
2018-07-05 14:25:31.994789 mon.dell1 [WRN] Health check update: 411/606272 objects misplaced (0.068%) (OBJECT_MISPLACED)
2018-07-05 14:25:36.995192 mon.dell1 [WRN] Health check update: 353/606272 objects misplaced (0.058%) (OBJECT_MISPLACED)
2018-07-05 14:25:42.009567 mon.dell1 [WRN] Health check update: 293/606272 objects misplaced (0.048%) (OBJECT_MISPLACED)
2018-07-05 14:25:47.009879 mon.dell1 [WRN] Health check update: 241/606272 objects misplaced (0.040%) (OBJECT_MISPLACED)
2018-07-05 14:25:52.010822 mon.dell1 [WRN] Health check update: 187/606272 objects misplaced (0.031%) (OBJECT_MISPLACED)
2018-07-05 14:25:57.011182 mon.dell1 [WRN] Health check update: 133/606272 objects misplaced (0.022%) (OBJECT_MISPLACED)
2018-07-05 14:26:02.035637 mon.dell1 [WRN] Health check update: 78/606272 objects misplaced (0.013%) (OBJECT_MISPLACED)
2018-07-05 14:26:07.035965 mon.dell1 [WRN] Health check update: 22/606272 objects misplaced (0.004%) (OBJECT_MISPLACED)
2018-07-05 14:26:12.011546 mon.dell1 [INF] Health check cleared: OBJECT_MISPLACED (was: 22/606272 objects misplaced (0.004%))

2.2 Repair PGs

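The job is not over once the migration ends, though. As the following command shows, five PGs are still unhealthy after the migration and need to be repaired (the same scrub errors were already reported in the ceph -w output above).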

$ ceph health detail
HEALTH_ERR 6 scrub errors; Possible data damage: 5 pgs inconsistent
OSD_SCRUB_ERRORS 6 scrub errors
PG_DAMAGED Possible data damage: 5 pgs inconsistent
    pg 13.cd is active+clean+inconsistent, acting [20,35]
    pg 13.244 is active+clean+inconsistent, acting [35,22]
    pg 13.270 is active+clean+inconsistent, acting [35,14]
    pg 13.308 is active+clean+inconsistent, acting [35,17]
    pg 13.34f is active+clean+inconsistent, acting [11,35]

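Run ceph pg repair on each of them. If a PG is still flagged as inconsistent afterwards, run a scrub on it so Ceph checks it again. Repeat for all five PGs (13.cd is shown as an example):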

$ ceph pg repair 13.cd
$ ceph pg scrub 13.cd
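With five PGs the commands can be typed by hand; for a longer list, a sketch like the following collects the PG IDs and repairs each one. It assumes bash and the ceph health detail line format shown above; inconsistent_pgs is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the IDs of inconsistent PGs from
# `ceph health detail` output read on stdin, e.g. from a line like
#   "    pg 13.cd is active+clean+inconsistent, acting [20,35]"
inconsistent_pgs() {
    awk '$1 == "pg" && $3 == "is" && $4 ~ /inconsistent/ { print $2 }'
}

# Usage: ask Ceph to repair every PG that is currently flagged inconsistent.
#   ceph health detail | inconsistent_pgs | while read -r pg; do
#       ceph pg repair "$pg"
#   done
```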

2.3 Stop the OSD daemon

At this point the data migration is complete, but the OSD daemon is still running: its REWEIGHT is now 0, yet its STATUS is still up.

 0   hdd   5.52620         osd.0      up        0 1.00000

Next, log in to the OSD server and stop the daemon:

$ ssh osd8
$ sudo systemctl stop ceph-osd@0

The OSD now shows as down:

 0   hdd   5.52620         osd.0    down        0 1.00000 
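To confirm the state without reading the whole tree, a small parser can pull out one OSD's STATUS column. This is a sketch assuming the ceph osd tree layout shown above and passwordless sudo on osd8 for the remote stop; osd_state is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the STATUS column (up/down) of one OSD, parsed
# from `ceph osd tree` text on stdin; prints nothing if the OSD is absent.
osd_state() {
    awk -v name="osd.$1" '
        {
            for (i = 1; i < NF; i++) {
                # Exact match, so osd.1 does not pick up osd.10.
                if ($i == name) { print $(i + 1); exit }
            }
        }'
}

# Usage: stop the daemon remotely, then check how the cluster sees it.
#   ssh osd8 sudo systemctl stop ceph-osd@0
#   ceph osd tree | osd_state 0
```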

2.4 Remove the OSD

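Finally, run the purge command. It removes the OSD from the CRUSH map, deletes its auth key and removes it from the OSD map, which completes the removal of a single OSD.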

$ ceph osd purge 0 --yes-i-really-mean-it
purged osd.0
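Repeating steps 2.1 to 2.4 by hand for 24 OSDs is tedious, so the loop can be scripted. The following is a minimal sketch, not a tested production tool: it assumes bash on an admin node, passwordless ssh and sudo to osd8, systemd units named ceph-osd@<id>, and that the PG repair from section 2.2 is still checked manually. run, wait_for_recovery, wait_for_down and remove_osds are hypothetical names; DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical driver: repeat section 2 (out, wait, stop, purge) for each
# OSD ID given. Assumes an admin node, passwordless ssh + sudo on the OSD
# host, and systemd units named ceph-osd@<id>. DRY_RUN=1 only prints.

DRY_RUN=${DRY_RUN:-0}

# Print the command (dry run) or execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Keep waiting while the health query fails or still reports data movement.
wait_for_recovery() {
    [ "$DRY_RUN" = 1 ] && return 0
    local out
    until out=$(ceph health detail) &&
          ! grep -qE 'OBJECT_MISPLACED|PG_DEGRADED|PG_AVAILABILITY' <<<"$out"; do
        sleep 30
    done
}

# Poll until `ceph osd tree` lists osd.<id> as down; purge is assumed to
# refuse an OSD that is still up.
wait_for_down() {
    [ "$DRY_RUN" = 1 ] && return 0
    until ceph osd tree | awk -v n="osd.$1" '
        { for (i = 1; i < NF; i++) if ($i == n && $(i + 1) == "down") found = 1 }
        END { exit !found }'; do
        sleep 5
    done
}

# Usage: remove_osds <host> <id>...
remove_osds() {
    local host=$1 id
    shift
    for id in "$@"; do
        run ceph osd out "$id"
        wait_for_recovery
        # Section 2.2 (inconsistent PGs) is left as a manual check.
        run ssh "$host" sudo systemctl stop "ceph-osd@$id"
        wait_for_down "$id"
        run ceph osd purge "$id" --yes-i-really-mean-it
    done
}

# Example: DRY_RUN=1 remove_osds osd8 0 1 2
```

Like the text above, this drains one OSD at a time. With a host-level CRUSH rule, data drained from one osd8 OSD will likely land on the remaining osd8 OSDs and have to move again later; marking all of them out together moves the data in one pass, at the cost of a heavier recovery load while it runs.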

One last thing: if /etc/ceph/ceph.conf contains a section for this OSD (for example [osd.0]), remember to delete it as well.
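Editing the file by hand is fine for one OSD; for 24 of them a filter helps. A minimal sketch, assuming the usual [osd.N] section headers in ceph.conf; strip_osd_section is a hypothetical helper, and the file should be backed up first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a ceph.conf (read on stdin) with the [osd.<id>]
# section removed: its header line and every line up to the next [section].
strip_osd_section() {
    awk -v hdr="[osd.$1]" '
        /^[ \t]*\[/ {
            # Any section header ends the skipped block; re-test for ours.
            line = $0
            gsub(/[ \t]/, "", line)
            skip = (line == hdr)
        }
        !skip { print }'
}

# Usage (keep a backup; repeat on every node that has a copy of the file):
#   sudo cp /etc/ceph/ceph.conf /etc/ceph/ceph.conf.bak
#   strip_osd_section 0 < /etc/ceph/ceph.conf.bak | sudo tee /etc/ceph/ceph.conf >/dev/null
```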

