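Reference links
Chinese documentation: http://docs.ceph.org.cn/ (relatively outdated)
English documentation: https://docs.ceph.com/en/latest/
Package repository mirrors:
https://mirrors.aliyun.com/ceph/ # Alibaba Cloud mirror
http://mirrors.163.com/ceph/ # NetEase mirror
https://mirrors.tuna.tsinghua.edu.cn/ceph/ # Tsinghua University mirror
I. Introduction to Ceph
Ceph is an open-source distributed storage system.
Ceph provides three types of storage: object storage, block devices, and file storage.
1.1 Ceph components
A Ceph storage cluster requires at least one Ceph Monitor, one Ceph Manager, and one Ceph OSD (Object Storage Daemon). A Ceph Metadata Server is also required when running Ceph File System clients.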

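Monitors: A Ceph Monitor (ceph-mon) maintains maps of the cluster state, including the monitor map, manager map, OSD map, MDS map, and CRUSH map. These maps are critical cluster state that the Ceph daemons need in order to coordinate with each other. Monitors are also responsible for managing authentication between daemons and clients. At least three Monitors are normally required for redundancy and high availability.
Managers: A Ceph Manager daemon (ceph-mgr) keeps track of runtime metrics and the current state of the Ceph cluster, including storage utilization, current performance metrics, and system load. The Ceph Manager daemons also host Python-based modules that manage and expose cluster information, including the web-based Ceph Dashboard and the REST API. At least two Managers are normally required for high availability.
Ceph OSDs: A Ceph OSD (ceph-osd) stores data and handles data replication, recovery, and rebalancing. It also provides monitoring information to the Ceph Monitors and Managers by checking the heartbeats of other Ceph OSD daemons. At least three Ceph OSDs are normally required for redundancy and high availability.
MDSs: A Ceph Metadata Server (MDS, ceph-mds) stores metadata for the Ceph File System (Ceph block devices and Ceph object storage do not use the MDS). Ceph Metadata Servers allow POSIX file system users to run basic commands (such as ls and find) without placing a heavy load on the Ceph storage cluster.
Ceph stores data as objects in logical storage pools. Using the CRUSH algorithm, Ceph calculates which placement group (PG) should contain an object, and then which OSDs should store that PG. The CRUSH algorithm is what allows a Ceph storage cluster to scale, rebalance, and recover dynamically.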
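1.2 Ceph cluster architecture
A Ceph storage cluster provides object, block, and file storage. It is highly reliable, easy to manage, and open source. Ceph scales extremely well, serving thousands of clients that access petabytes to exabytes of data. Ceph nodes build on commodity hardware and intelligent daemons, and a Ceph storage cluster accommodates a large number of nodes that communicate with each other to replicate and redistribute data dynamically.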

1.3 Ceph data read/write flow

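The client writes the object to the primary OSD of the target PG. The primary OSD then uses its copy of the CRUSH map to find the second and third OSDs that hold the replicas and copies the object to the corresponding PG on those OSDs (one OSD per replica). Once the data has been stored successfully, it acknowledges the write to the client.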

Ceph supports the notion of "pools", which are logical partitions for storing objects.

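Each pool has a number of PGs, which CRUSH maps dynamically to OSDs. When a Ceph client stores an object, CRUSH maps that object to a PG.
Storage flow:
1. Compute the file-to-object mapping (oid):
Suppose file is the file the client wants to read or write. Its object ID is oid (object id) = ino + ono.
ino: inode number (INO), the serial number of the file's metadata and the file's unique ID.
ono: object number (ONO), the index of one of the objects produced by splitting the file. By default a file is split into 4 MiB objects.
2. Use a hash to find the PG in the pool that the object belongs to
Object -> PG mapping: hash(oid) & mask -> pgid
3. Use CRUSH to map the PG to its OSDs
PG -> OSD mapping: [CRUSH(pgid) -> (osd1, osd2, osd3)]
4. The primary OSD of the PG writes the object to disk
5. The primary OSD replicates the data to the replica OSDs and waits for their acknowledgements
6. The primary OSD reports write completion to the client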
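Steps 1 and 2 of the flow above can be sketched in shell. This is only an illustration under stated assumptions: cksum (a CRC) stands in for Ceph's real object hash, the "<ino hex>.<ono hex>" name format, the function names, and the sample inode number are hypothetical, and pg_num is assumed to be a power of two so that the mask is pg_num - 1.

```shell
# Sketch of steps 1-2: file offset -> object name -> PG index.
# Assumptions: 4 MiB objects, pg_num is a power of two, and cksum
# is only a stand-in for the hash Ceph actually uses.

OBJECT_SIZE=$((4 * 1024 * 1024))

# ono: index of the object that contains a given byte offset of the file.
object_number() {
    echo $(( $1 / OBJECT_SIZE ))
}

# oid: inode number plus object number, written here as "<ino hex>.<ono hex>".
object_id() {
    printf '%x.%08x\n' "$1" "$2"
}

# pgid: hash(oid) & mask, where mask = pg_num - 1.
pg_index() {
    hash=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
    echo $(( hash & ($2 - 1) ))
}

ono=$(object_number $((9 * 1024 * 1024)))   # byte offset 9 MiB falls in object 2
oid=$(object_id 1099511627776 "$ono")       # hypothetical inode number 0x10000000000
echo "oid=$oid pg=$(pg_index "$oid" 32)"
```

Step 3 (PG to OSDs) is not sketched because it depends on the CRUSH map of a live cluster; `ceph osd map`, used later in this document, reports the real result.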
II. Quickly deploying a Ceph storage cluster with ceph-deploy
ceph-deploy is a way to deploy Ceph that relies only on SSH access to the servers, sudo, and some Python. It does not require servers, databases, or anything like that.
Ceph recommends two 10 GbE NICs (two networks): the public network for client access and the cluster network for cluster management and data replication.
Note: For simplicity this lab uses a single NIC (shared by the public and cluster networks). The hosts are Alibaba Cloud ECS instances whose image is already tuned (package mirrors, time synchronization, openssh, root login allowed, and so on).
The cluster servers run Ubuntu 18, as follows:
1. One ceph-deploy server, used to deploy the Ceph cluster
172.26.128.89/172.26.128.89
2. Three Monitor servers, running ceph-mon for the cluster
172.26.128.90/172.26.128.90
172.26.128.91/172.26.128.91
172.26.128.92/172.26.128.92
3. Two Manager servers, running ceph-mgr
172.26.128.93/172.26.128.93
172.26.128.94/172.26.128.94
4. Four OSD servers, running ceph-osd as the storage servers, each with two or more disks
172.26.128.95/172.26.128.95
172.26.128.96/172.26.128.96
172.26.128.97/172.26.128.97
172.26.128.98/172.26.128.98
Disk layout on each storage server: /dev/vdb /dev/vdc # 20G each
5. Two client test servers, ceph-client (one CentOS and one Ubuntu)
172.26.128.99/172.26.128.99
172.26.128.100/172.26.128.100
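2.1 Base environment setup
The base environment setup consists of:
1. Time synchronization (server clocks must be in sync; Alibaba Cloud ECS has its own time synchronization, so this lab skips it)
2. Disabling SELinux and the firewall (in production, decide according to your situation; the Alibaba Cloud Ubuntu image ships with both disabled, so this lab skips it)
3. Changing hostnames
4. Configuring name resolution (the lab uses /etc/hosts entries; production should use DNS)
5. Setting up package repositories
6. Adding a regular user with sudo privileges
2.1.1 Changing hostnames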
IP: 172.26.128.89 HostName: ceph-deploy
IP: 172.26.128.90 HostName: ceph-mon1
IP: 172.26.128.91 HostName: ceph-mon2
IP: 172.26.128.92 HostName: ceph-mon3
IP: 172.26.128.93 HostName: ceph-mgr1
IP: 172.26.128.94 HostName: ceph-mgr2
IP: 172.26.128.95 HostName: ceph-node1
IP: 172.26.128.96 HostName: ceph-node2
IP: 172.26.128.97 HostName: ceph-node3
IP: 172.26.128.98 HostName: ceph-node4
Change the hostname of each server according to the IP-to-hostname mapping above. On the ceph-deploy server, for example:
hostname ceph-deploy
echo ceph-deploy > /etc/hostname
2.1.2 Adding name resolution
This lab uses /etc/hosts entries; production should use DNS.
cat >> /etc/hosts<< EOF
# ceph host
172.26.128.89 ceph-deploy
172.26.128.90 ceph-mon1
172.26.128.91 ceph-mon2
172.26.128.92 ceph-mon3
172.26.128.93 ceph-mgr1
172.26.128.94 ceph-mgr2
172.26.128.95 ceph-node1
172.26.128.96 ceph-node2
172.26.128.97 ceph-node3
172.26.128.98 ceph-node4
# ceph host
EOF
2.1.3 Setting up package repositories
Configure the Ceph repository (Alibaba Cloud mirror) on all servers
wget -q -O- 'https://mirrors.aliyun.com/ceph/keys/release.asc' | sudo apt-key add -
ceph_stable_release=pacific
echo deb https://mirrors.aliyun.com/ceph/debian-$ceph_stable_release/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list
sudo apt update -y
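2.1.4 Adding a regular user for deploying Ceph (cephyd)
Newer versions of ceph-deploy support the --username option for specifying a user that has passwordless sudo (this includes root, which is not recommended). When you use ceph-deploy --username {username}, the specified user must be able to connect to the Ceph nodes over SSH without a password, because ceph-deploy does not prompt for passwords.
It is recommended to create a dedicated user for ceph-deploy on all Ceph nodes in the cluster (for example cephuser or cephadmin) to manage the Ceph cluster.
Note: Do not use "ceph" as the user name. Starting with the Infernalis release, the user name "ceph" is reserved for the Ceph daemons. If a "ceph" user already exists on the Ceph nodes, it must be removed before upgrading.
This lab creates the cephyd user on the ceph-deploy node, the ceph-osd nodes, the ceph-mon nodes, and the ceph-mgr nodes.
1. Create the new user on each Ceph node.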
groupadd -r -g 2022 cephyd && useradd -r -m -s /bin/bash -u 2022 -g 2022 cephyd && echo cephyd:7QR59*TAI | chpasswd
2. Make sure the newly created user has sudo privileges on each Ceph node
echo "cephyd ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/cephyd
sudo chmod 0440 /etc/sudoers.d/cephyd
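2.1.5 Allowing passwordless SSH from the ceph-deploy server to every cluster node
Because ceph-deploy does not prompt for passwords, you must generate an SSH key on the ceph-deploy node and distribute its public key to every node in the Ceph cluster. ceph-deploy will attempt to generate SSH key pairs for the initial monitors.
Generate the SSH key pair as the deployment user, not with sudo or as root. When prompted with "Enter passphrase", just press Enter to leave the passphrase empty:
On the ceph-deploy server:
1. Switch to the cephyd user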
root@ceph-deploy:~# su - cephyd
2. Generate the key pair
cephyd@ceph-deploy:~$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/cephyd/.ssh/id_rsa):
Created directory '/home/cephyd/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/cephyd/.ssh/id_rsa.
Your public key has been saved in /home/cephyd/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:t+nL8wT6ndIiEsQDPZIPlxjL/1vCXA1oHhs49F03svM cephyd@ceph-deploy
The key's randomart image is:
+---[RSA 2048]----+
| .=.. o o |
| .*o=o o . + . |
| oBo.* o o |
| .=+ + o o |
| ...S + . E |
| .+ + + |
| .* +.. |
| . .B+o.. |
| ...=*+ |
+----[SHA256]-----+
3. Distribute the public key to each managed node:
cephyd@ceph-deploy:~$ ssh-copy-id cephyd@172.26.128.89
cephyd@ceph-deploy:~$ ssh-copy-id cephyd@172.26.128.90
cephyd@ceph-deploy:~$ ssh-copy-id cephyd@172.26.128.91
cephyd@ceph-deploy:~$ ssh-copy-id cephyd@172.26.128.92
cephyd@ceph-deploy:~$ ssh-copy-id cephyd@172.26.128.93
cephyd@ceph-deploy:~$ ssh-copy-id cephyd@172.26.128.94
cephyd@ceph-deploy:~$ ssh-copy-id cephyd@172.26.128.95
cephyd@ceph-deploy:~$ ssh-copy-id cephyd@172.26.128.96
cephyd@ceph-deploy:~$ ssh-copy-id cephyd@172.26.128.97
cephyd@ceph-deploy:~$ ssh-copy-id cephyd@172.26.128.98
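The ten ssh-copy-id commands above follow one pattern, so they could also be generated with a loop. The following is a dry-run sketch that only prints the commands; the run and copy_keys helper names and the variables are assumptions made for illustration.

```shell
# Dry-run sketch: print one ssh-copy-id command per node (hosts .89 to .98).
# To execute the commands for real, change run() to: run() { "$@"; }
DEPLOY_USER=cephyd
SUBNET=172.26.128

run() { echo "$@"; }

copy_keys() {
    i=89
    while [ "$i" -le 98 ]; do
        run ssh-copy-id "${DEPLOY_USER}@${SUBNET}.${i}"
        i=$((i + 1))
    done
}

copy_keys
```

Each real invocation still asks once for the cephyd password on the target node, since that is the last time password authentication is needed.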
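4. Edit ~/.ssh/config on the ceph-deploy admin node so that ceph-deploy logs in to the Ceph nodes as the user you created, without having to pass --username {username} on every run. This also simplifies ssh and scp usage. Replace {username} with the user name you created (cephyd in this lab).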
cephyd@ceph-deploy:~$ cat >> ~/.ssh/config<< EOF
######## ceph #######
HOST ceph-deploy
HostName 172.26.128.89
User cephyd
PORT 22
HOST ceph-mon1
HostName 172.26.128.90
User cephyd
PORT 22
HOST ceph-mon2
HostName 172.26.128.91
User cephyd
PORT 22
HOST ceph-mon3
HostName 172.26.128.92
User cephyd
PORT 22
HOST ceph-mgr1
HostName 172.26.128.93
User cephyd
PORT 22
HOST ceph-mgr2
HostName 172.26.128.94
User cephyd
PORT 22
HOST ceph-node1
HostName 172.26.128.95
User cephyd
PORT 22
HOST ceph-node2
HostName 172.26.128.96
User cephyd
PORT 22
HOST ceph-node3
HostName 172.26.128.97
User cephyd
PORT 22
HOST ceph-node4
HostName 172.26.128.98
User cephyd
PORT 22
HOST ceph-client1
HostName 172.26.128.99
User cephyd
PORT 22
HOST ceph-client2
HostName 172.26.128.100
User cephyd
PORT 22
######## ceph #######
EOF
5. Test passwordless login
cephyd@ceph-deploy:~$ ssh ceph-mon1 'date'
Sun Aug 15 16:14:29 CST 2021
cephyd@ceph-deploy:~$ ssh cephyd@ceph-mon1 'date'
Sun Aug 15 16:14:40 CST 2021
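2.2 Cluster deployment
Use ceph-deploy from the admin node to build a Ceph storage cluster.
The cluster initially contains one Monitor, one Manager, and the OSD daemons on the four storage nodes. Once it reaches the active+clean state, it is expanded in section 2.3 with two more Ceph Monitors and a second Manager.
First create a directory (ceph-cluster) on the admin node to hold the configuration files and keys that ceph-deploy generates.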
cephyd@ceph-deploy:~$ mkdir ceph-cluster
cephyd@ceph-deploy:~$ cd ceph-cluster/
cephyd@ceph-deploy:~/ceph-cluster$ pwd
/home/cephyd/ceph-cluster
2.2.1 Creating the cluster
1. Install the Ceph deployment tool
sudo apt-get install ceph-deploy
2. Create the cluster
cephyd@ceph-deploy:~/ceph-cluster$ ceph-deploy new --cluster-network 172.26.128.0/20 --public-network 172.26.128.0/20 ceph-mon1
3. Verify
There should be a Ceph configuration file, a monitor keyring, and a log file
cephyd@ceph-deploy:~/ceph-cluster$ ls
ceph.conf ceph-deploy-ceph.log ceph.mon.keyring
cephyd@ceph-deploy:~/ceph-cluster$ cat ceph.conf
[global]
fsid = 003cb89b-8812-4172-a327-6a774c687c6c
public_network = 172.26.128.0/20
cluster_network = 172.26.128.0/20
mon_initial_members = ceph-mon1
mon_host = 172.26.128.90
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
4. Install Ceph
cephyd@ceph-deploy:~/ceph-cluster$ ceph-deploy install --no-adjust-repos --nogpgcheck ceph-deploy ceph-mon1 ceph-mon2 ceph-mon3 ceph-mgr1 ceph-mgr2 ceph-node1 ceph-node2 ceph-node3 ceph-node4
5. Set up the initial monitor(s) and gather all keys:
cephyd@ceph-deploy:~/ceph-cluster$ ceph-deploy mon create-initial
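6. Verify
The ceph-mon service has been installed and started automatically on the specified mon node, and the working directory on the ceph-deploy node now contains keyring files for services such as ceph.bootstrap-mds/mgr/osd/rgw. These bootstrap files carry the highest privileges on the Ceph cluster, so keep them safe.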
cephyd@ceph-deploy:~/ceph-cluster$ ls
ceph.bootstrap-mds.keyring ceph.bootstrap-osd.keyring ceph.client.admin.keyring ceph-deploy-ceph.log
ceph.bootstrap-mgr.keyring ceph.bootstrap-rgw.keyring ceph.conf ceph.mon.keyring
cephyd@ceph-deploy:~/ceph-cluster$ ssh ceph-mon1 'ps auxf | grep ceph-mon | grep -v grep'
ceph 27405 0.0 1.0 480388 40024 ? Ssl 17:01 0:00 /usr/bin/ceph-mon -f --cluster ceph --id ceph-mon1 --setuser ceph --setgroup ceph
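7. Distribute the admin key
From the ceph-deploy node, copy the configuration file and the admin keyring to every node that needs to run ceph management commands. This way you do not have to specify the ceph-mon address and the ceph.client.admin.keyring file each time you manage the cluster with the ceph command. The ceph-mon nodes also need the cluster configuration and keyring files synchronized.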
cephyd@ceph-deploy:~/ceph-cluster$ ceph-deploy admin ceph-deploy ceph-node1 ceph-node2 ceph-node3 ceph-node4
8. Verify the keys
cephyd@ceph-deploy:~/ceph-cluster$ ssh ceph-node1 'ls -l /etc/ceph/'
total 12
-rw------- 1 root root 151 Aug 15 17:12 ceph.client.admin.keyring
-rw-r--r-- 1 root root 267 Aug 15 17:12 ceph.conf
-rw-r--r-- 1 root root 92 Jul 8 22:17 rbdmap
-rw------- 1 root root 0 Aug 15 17:10 tmpkzzYdt
9. For security, the keyring file is owned by user root and group root by default. If the ceph user also needs to run ceph commands, grant it access to the file:
cephyd@ceph-deploy:~/ceph-cluster$ ssh ceph-node1 'sudo apt install acl -y && sudo setfacl -m u:ceph:rw /etc/ceph/ceph.client.admin.keyring'
cephyd@ceph-deploy:~/ceph-cluster$ ssh ceph-node2 'sudo apt install acl -y && sudo setfacl -m u:ceph:rw /etc/ceph/ceph.client.admin.keyring'
cephyd@ceph-deploy:~/ceph-cluster$ ssh ceph-node3 'sudo apt install acl -y && sudo setfacl -m u:ceph:rw /etc/ceph/ceph.client.admin.keyring'
cephyd@ceph-deploy:~/ceph-cluster$ ssh ceph-node4 'sudo apt install acl -y && sudo setfacl -m u:ceph:rw /etc/ceph/ceph.client.admin.keyring'
10. Create the Manager server
cephyd@ceph-deploy:~/ceph-cluster$ ceph-deploy mgr create ceph-mgr1
cephyd@ceph-deploy:~/ceph-cluster$ sudo ceph -s
cluster:
id: 003cb89b-8812-4172-a327-6a774c687c6c
health: HEALTH_WARN
mon is allowing insecure global_id reclaim
OSD count 0 < osd_pool_default_size 3
services:
mon: 1 daemons, quorum ceph-mon1 (age 49m)
mgr: ceph-mgr1(active, since 8m)
osd: 0 osds: 0 up, 0 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 0 B used, 0 B / 0 B avail
pgs:
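11. Resolve the "insecure global_id reclaim" HEALTH_WARN warning (the OSD count warning clears once the OSDs are added in the following steps)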
cephyd@ceph-deploy:~/ceph-cluster$ sudo ceph config set mon auth_allow_insecure_global_id_reclaim false
cephyd@ceph-deploy:~/ceph-cluster$ sudo ceph -s
cluster:
id: 003cb89b-8812-4172-a327-6a774c687c6c
health: HEALTH_WARN
OSD count 0 < osd_pool_default_size 3
services:
mon: 1 daemons, quorum ceph-mon1 (age 51m)
mgr: ceph-mgr1(active, since 10m)
osd: 0 osds: 0 up, 0 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 0 B used, 0 B / 0 B avail
pgs:
12. List the disks on the storage nodes
cephyd@ceph-deploy:~/ceph-cluster$ ceph-deploy disk list ceph-node1
cephyd@ceph-deploy:~/ceph-cluster$ ceph-deploy disk list ceph-node2
cephyd@ceph-deploy:~/ceph-cluster$ ceph-deploy disk list ceph-node3
cephyd@ceph-deploy:~/ceph-cluster$ ceph-deploy disk list ceph-node4
13. Zap (wipe) the disks
cephyd@ceph-deploy:~/ceph-cluster$ ceph-deploy disk zap ceph-node1 /dev/vdb /dev/vdc
cephyd@ceph-deploy:~/ceph-cluster$ ceph-deploy disk zap ceph-node2 /dev/vdb /dev/vdc
cephyd@ceph-deploy:~/ceph-cluster$ ceph-deploy disk zap ceph-node3 /dev/vdb /dev/vdc
cephyd@ceph-deploy:~/ceph-cluster$ ceph-deploy disk zap ceph-node4 /dev/vdb /dev/vdc
14. Add the OSDs
cephyd@ceph-deploy:~/ceph-cluster$ ceph-deploy osd create ceph-node1 --data /dev/vdb
cephyd@ceph-deploy:~/ceph-cluster$ ceph-deploy osd create ceph-node1 --data /dev/vdc
cephyd@ceph-deploy:~/ceph-cluster$ ceph-deploy osd create ceph-node2 --data /dev/vdb
cephyd@ceph-deploy:~/ceph-cluster$ ceph-deploy osd create ceph-node2 --data /dev/vdc
cephyd@ceph-deploy:~/ceph-cluster$ ceph-deploy osd create ceph-node3 --data /dev/vdb
cephyd@ceph-deploy:~/ceph-cluster$ ceph-deploy osd create ceph-node3 --data /dev/vdc
cephyd@ceph-deploy:~/ceph-cluster$ ceph-deploy osd create ceph-node4 --data /dev/vdb
cephyd@ceph-deploy:~/ceph-cluster$ ceph-deploy osd create ceph-node4 --data /dev/vdc
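Since every storage node has the same two data disks, the eight osd create commands above could be generated with a nested loop. This is a dry-run sketch that only prints the commands; the run and create_osds helper names are assumptions made for illustration.

```shell
# Dry-run sketch: one "ceph-deploy osd create" per node and data disk.
# To execute the commands for real, change run() to: run() { "$@"; }
NODES="ceph-node1 ceph-node2 ceph-node3 ceph-node4"
DISKS="/dev/vdb /dev/vdc"

run() { echo "$@"; }

create_osds() {
    for node in $NODES; do
        for disk in $DISKS; do
            run ceph-deploy osd create "$node" --data "$disk"
        done
    done
}

create_osds
```

Creating the OSDs node by node, as in this order, keeps the OSD IDs grouped per host (0 and 1 on ceph-node1, 2 and 3 on ceph-node2, and so on), which matches the process listings in the verification step.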
15. Verify
cephyd@ceph-deploy:~/ceph-cluster$ sudo ceph health
HEALTH_OK
cephyd@ceph-deploy:~/ceph-cluster$ sudo ceph -s
cluster:
id: 003cb89b-8812-4172-a327-6a774c687c6c
health: HEALTH_OK
services:
mon: 1 daemons, quorum ceph-mon1 (age 3h)
mgr: ceph-mgr1(active, since 3h)
osd: 6 osds: 6 up (since 74s), 6 in (since 82s)
data:
pools: 1 pools, 1 pgs
objects: 0 objects, 0 B
usage: 32 MiB used, 120 GiB / 120 GiB avail
pgs: 1 active+clean
cephyd@ceph-deploy:~/ceph-cluster$ ssh ceph-node1 'ps auxf | grep ceph-osd | grep -v grep'
ceph 31901 0.2 1.5 1032196 59156 ? Ssl 20:32 0:01 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
ceph 33478 0.2 1.5 1030084 58904 ? Ssl 20:36 0:00 /usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph
cephyd@ceph-deploy:~/ceph-cluster$ ssh ceph-node2 'ps auxf | grep ceph-osd | grep -v grep'
ceph 31364 0.3 1.4 1030080 57068 ? Ssl 20:40 0:00 /usr/bin/ceph-osd -f --cluster ceph --id 2 --setuser ceph --setgroup ceph
ceph 32955 0.3 1.5 1030084 58716 ? Ssl 20:41 0:00 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
cephyd@ceph-deploy:~/ceph-cluster$ ssh ceph-node3 'ps auxf | grep ceph-osd | grep -v grep'
ceph 31417 0.3 1.4 1030088 57200 ? Ssl 20:41 0:00 /usr/bin/ceph-osd -f --cluster ceph --id 4 --setuser ceph --setgroup ceph
ceph 33012 0.3 1.4 998340 56368 ? Ssl 20:41 0:00 /usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph
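2.2.2 Testing the cluster
To store an object in the Ceph storage cluster, a client must:
- Set an object name
- Specify a pool
The Ceph client retrieves the latest cluster map, and the CRUSH algorithm calculates how to map the object to a PG and then how to assign the PG to OSDs dynamically. To locate an object, all you need is the object name and the pool name, for example: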
ceph osd map {poolname} {object-name}
To test reading and writing data in the cluster, first create a test pool named ydpool with 32 PGs.
cephyd@ceph-deploy:~$ sudo ceph osd pool create ydpool 32 32
pool 'ydpool' created
cephyd@ceph-deploy:~$ sudo ceph pg ls-by-pool ydpool | awk '{print $1,$2,$15}'
cephyd@ceph-deploy:~$ sudo ceph osd pool ls
device_health_metrics
ydpool
cephyd@ceph-deploy:~$ sudo rados lspools
device_health_metrics
ydpool
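The Ceph environment does not yet have block devices or a file system deployed, nor an object storage client, but the rados command can access Ceph object storage directly:
2.2.2.1 Uploading a file
1. Upload the file
Upload the lastlog file to ydpool with the object ID msg1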
cephyd@ceph-deploy:~$ sudo rados put msg1 lastlog --pool=ydpool
2. List the objects
cephyd@ceph-deploy:~$ sudo rados ls --pool=ydpool
msg1
cephyd@ceph-deploy:~$ sudo rados -p ydpool ls
msg1
3. Show object information
The ceph osd map command shows exactly where an object is placed in a pool:
cephyd@ceph-deploy:~$ sudo ceph osd map ydpool msg1
osdmap e48 pool 'ydpool' (2) object 'msg1' -> pg 2.c833d430 (2.10) -> up ([7,4,0], p7) acting ([7,4,0], p7)
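This means the object name hashes to c833d430 in the pool with ID 2 and lands in PG 2.10, that is, the PG with ID 10 in pool 2. The up set is OSDs 7, 4, and 0 with OSD 7 as the primary, and the acting set is the same three OSDs. Three OSDs means the data is stored as three replicas; Ceph's CRUSH algorithm calculates which OSDs in the PG hold them.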
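The relationship between the hash c833d430 and the PG ID 10 in the output above can be checked by hand. The sketch below parses that output line with sed and applies the hash & mask rule from section 1.3. It assumes pg_num is a power of two (ydpool was created with 32 PGs) and that PG IDs are printed in hexadecimal; the variable names are chosen for illustration.

```shell
# Parse one line of "ceph osd map" output and recompute the PG from the hash.
line="osdmap e48 pool 'ydpool' (2) object 'msg1' -> pg 2.c833d430 (2.10) -> up ([7,4,0], p7) acting ([7,4,0], p7)"

# Object hash: the hex digits after "pg <pool-id>."
obj_hash=$(echo "$line" | sed -n 's/.* pg [0-9]*\.\([0-9a-f]*\) .*/\1/p')
# PG ID as printed in parentheses, e.g. 2.10
pgid=$(echo "$line" | sed -n 's/.* pg [^ ]* (\([^)]*\)).*/\1/p')
# Primary OSD of the acting set: the number after "p"
primary=$(echo "$line" | sed -n 's/.*acting (.*, p\([0-9]*\)).*/\1/p')

pg_num=32
# mask = pg_num - 1; the result is printed in hex to match the PG ID format
pg_hex=$(printf '%x' $(( 0x$obj_hash & ($pg_num - 1) )))

echo "hash=$obj_hash pgid=$pgid primary=osd.$primary computed_pg=$pg_hex"
```

The low byte of the hash is 0x30, and 0x30 & 0x1f is 0x10, which is why the object lands in PG 10 (decimal 16) of pool 2.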
2.2.2.2 Downloading a file
cephyd@ceph-deploy:~$ sudo rados get msg1 --pool=ydpool /tmp/yd.txt
cephyd@ceph-deploy:~$ ll -h /tmp/yd.txt
-rw-r--r-- 1 root root 18M Aug 15 21:21 /tmp/yd.txt
2.2.2.3 Modifying a file
cephyd@ceph-deploy:~$ sudo rados put msg1 lastlog --pool=ydpool
cephyd@ceph-deploy:~$ sudo rados get msg1 --pool=ydpool /tmp/yd2.txt
2.2.2.4 Deleting a file
cephyd@ceph-deploy:~$ sudo rados rm msg1 --pool=ydpool
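2.3 Expanding the Ceph cluster for high availability
This mainly means adding Monitor and Manager nodes to make the cluster highly available.
2.3.1 Adding ceph-mon nodes
ceph-mon has built-in leader election for high availability, and the number of monitor nodes is usually odd.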
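Why an odd number of monitors is the usual choice follows from majority quorum: more than half of the monitors must be reachable. A small arithmetic sketch, with function names chosen for illustration:

```shell
# Majority quorum: more than half of the monitors must be up.
quorum_size() { echo $(( $1 / 2 + 1 )); }
# How many monitors can fail while a majority still remains.
tolerated_failures() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 4 5; do
    echo "mons=$n quorum=$(quorum_size "$n") can_lose=$(tolerated_failures "$n")"
done
```

Going from three to four monitors raises the quorum size from 2 to 3 but still tolerates only one failure, so the extra even-numbered monitor adds no fault tolerance.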
Add the nodes:
cephyd@ceph-deploy:~/ceph-cluster$ ceph-deploy mon add ceph-mon2
cephyd@ceph-deploy:~/ceph-cluster$ ceph-deploy mon add ceph-mon3
Verify
cephyd@ceph-deploy:~/ceph-cluster$ sudo ceph -s
cluster:
id: 003cb89b-8812-4172-a327-6a774c687c6c
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 7s)
mgr: ceph-mgr1(active, since 4h)
osd: 8 osds: 8 up (since 65m), 8 in (since 65m)
data:
pools: 2 pools, 33 pgs
objects: 0 objects, 0 B
usage: 251 MiB used, 160 GiB / 160 GiB avail
pgs: 33 active+clean
cephyd@ceph-deploy:~/ceph-cluster$ sudo ceph quorum_status
cephyd@ceph-deploy:~/ceph-cluster$ sudo ceph quorum_status --format json-pretty
2.3.2 Adding ceph-mgr nodes
cephyd@ceph-deploy:~/ceph-cluster$ ceph-deploy mgr create ceph-mgr2
cephyd@ceph-deploy:~/ceph-cluster$ ceph-deploy admin ceph-mgr2
Verify
cephyd@ceph-deploy:~/ceph-cluster$ sudo ceph -s
cluster:
id: 003cb89b-8812-4172-a327-6a774c687c6c
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 5m)
mgr: ceph-mgr1(active, since 4h), standbys: ceph-mgr2
osd: 8 osds: 8 up (since 71m), 8 in (since 71m)
data:
pools: 2 pools, 33 pgs
objects: 0 objects, 0 B
usage: 251 MiB used, 160 GiB / 160 GiB avail
pgs: 33 active+clean
2.3.3 Removing an OSD from RADOS
An OSD in a Ceph cluster is a dedicated daemon process running on a ceph-osd node, and it corresponds to one physical disk device. When an OSD device fails, or an administrator needs to remove a specific OSD for management reasons, first stop the related daemon and then remove the OSD.
For Luminous and later releases, the commands are:
- Mark the device out: ceph osd out {osd-num}
- Stop the daemon: sudo systemctl stop ceph-osd@{osd-num}
- Remove the device: ceph osd purge {osd-num} --yes-i-really-mean-it
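The three commands must run in this order, and the stop command has to run on the host that owns the OSD. The following dry-run sketch wraps them in one function that only prints the commands; the remove_osd and run helper names are assumptions made for illustration.

```shell
# Dry-run sketch: print the removal sequence for one OSD.
# To execute the commands for real, change run() to: run() { "$@"; }
run() { echo "$@"; }

# remove_osd <osd-num> <host>: mark out, stop the daemon on its host, purge.
remove_osd() {
    osd_num=$1
    osd_host=$2
    run ceph osd out "$osd_num"
    run ssh "$osd_host" "sudo systemctl stop ceph-osd@${osd_num}"
    run ceph osd purge "$osd_num" --yes-i-really-mean-it
}

remove_osd 7 ceph-node4
```

On a cluster that holds real data it is generally safer to wait until `ceph -s` reports all PGs active+clean again after marking the OSD out, before stopping and purging it.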
Hands-on:
1. List the OSDs
cephyd@ceph-deploy:~/ceph-cluster$ sudo ceph osd ls
0
1
2
3
4
5
6
7
2. Show the placement of msg1
cephyd@ceph-deploy:~/ceph-cluster$ sudo ceph osd map ydpool msg1
osdmap e48 pool 'ydpool' (2) object 'msg1' -> pg 2.c833d430 (2.10) -> up ([7,4,0], p7) acting ([7,4,0], p7)
3. After osd 7 is removed, msg1 is remapped and still has three replicas
cephyd@ceph-deploy:~/ceph-cluster$ sudo ceph osd out 7
marked out osd.7.
cephyd@ceph-deploy:~/ceph-cluster$ ssh ceph-node4 'sudo systemctl stop ceph-osd@7'
cephyd@ceph-deploy:~/ceph-cluster$ sudo ceph osd purge 7 --yes-i-really-mean-it
purged osd.7
cephyd@ceph-deploy:~/ceph-cluster$ sudo ceph osd ls
0
1
2
3
4
5
6
cephyd@ceph-deploy:~/ceph-cluster$ sudo ceph osd map ydpool msg1
osdmap e53 pool 'ydpool' (2) object 'msg1' -> pg 2.c833d430 (2.10) -> up ([6,1,4], p6) acting ([6,1,4], p6)
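2.4 Getting started with RBD block devices
A block is a sequence of bytes (for example, a 512-byte block of data).
Ceph block devices are thin-provisioned, resizable, and store data striped across multiple OSDs in the cluster. They leverage RADOS capabilities such as snapshots, replication, and consistency. Ceph's RADOS Block Devices (RBD) interact with OSDs using kernel modules or the librbd library.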

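RBD (RADOS Block Device) is one form of block storage. RBD interacts with OSDs through the librbd library and provides a high-performance, highly scalable storage backend for virtualization technologies such as KVM and for cloud platforms such as OpenStack and CloudStack, which rely on libvirt and QEMU to integrate with RBD. A client that uses librbd can use the RADOS cluster as a block device. However, a pool used for RBD must first have the rbd application enabled and be initialized.
2.4.1 Creating the RBD pool
Create a pool named ydrbd1, enable the rbd application on it, and initialize it:
1. Create the pool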
cephyd@ceph-deploy:~/ceph-cluster$ sudo ceph osd pool create ydrbd1 64 64
pool 'ydrbd1' created
2. Enable the rbd application
cephyd@ceph-deploy:~/ceph-cluster$ sudo ceph osd pool application enable ydrbd1 rbd
enabled application 'rbd' on pool 'ydrbd1'
3. Initialize the pool
cephyd@ceph-deploy:~/ceph-cluster$ sudo rbd pool init -p ydrbd1
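2.4.2 Creating images
An RBD pool cannot be used as a block device directly. You first create images in it as needed and then use an image as the block device. The rbd command can create, list, and delete images, and it also handles management operations such as cloning images, creating snapshots, rolling an image back to a snapshot, and listing snapshots.
Create images named ydimg1 and ydimg2:
1. Create the images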
cephyd@ceph-deploy:~/ceph-cluster$ sudo rbd create ydimg1 --size 5G --pool ydrbd1
cephyd@ceph-deploy:~/ceph-cluster$ sudo rbd create ydimg2 --size 3G --pool ydrbd1 --image-format 2 --image-feature layering
cephyd@ceph-deploy:~/ceph-cluster$ sudo rbd ls --pool ydrbd1
ydimg1
ydimg2
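# ydimg2 is used in the following steps. Systems with an older kernel, such as CentOS, cannot map an image that has all features enabled, so only layering is turned on here; the other features require a newer kernel
2. Show image information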
cephyd@ceph-deploy:~/ceph-cluster$ sudo rbd --image ydimg1 --pool ydrbd1 info
rbd image 'ydimg1':
size 5 GiB in 1280 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: 12d8b31313b9
block_name_prefix: rbd_data.12d8b31313b9
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
op_features:
flags:
create_timestamp: Mon Aug 16 12:19:41 2021
access_timestamp: Mon Aug 16 12:19:41 2021
modify_timestamp: Mon Aug 16 12:19:41 2021
cephyd@ceph-deploy:~/ceph-cluster$ sudo rbd --image ydimg2 --pool ydrbd1 info
rbd image 'ydimg2':
size 3 GiB in 768 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: 37b4e008baeb
block_name_prefix: rbd_data.37b4e008baeb
format: 2
features: layering
op_features:
flags:
create_timestamp: Mon Aug 16 12:20:15 2021
access_timestamp: Mon Aug 16 12:20:15 2021
modify_timestamp: Mon Aug 16 12:20:15 2021
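The "size ... in N objects" and "order 22 (4 MiB objects)" fields in the output above are related by simple arithmetic: the order is the base-2 logarithm of the object size, and the object count is the image size divided by the object size. A sketch with illustrative function names:

```shell
# order N means each backing object is 2^N bytes; order 22 -> 4 MiB.
object_size_bytes() { echo $(( 1 << $1 )); }

# object_count <size-in-GiB> <order>: number of objects backing the image.
object_count() {
    size_bytes=$(( $1 * 1024 * 1024 * 1024 ))
    echo $(( $size_bytes / $(object_size_bytes "$2") ))
}

echo "object size: $(object_size_bytes 22) bytes"
echo "ydimg1 (5 GiB): $(object_count 5 22) objects"
echo "ydimg2 (3 GiB): $(object_count 3 22) objects"
```

Because images are thin-provisioned, these are upper bounds on the number of objects; an object only consumes space in the pool once data has been written to it.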
2.4.3 Using block storage from a client
1. Check the Ceph status
cephyd@ceph-deploy:~/ceph-cluster$ sudo ceph df
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 120 GiB 120 GiB 192 MiB 192 MiB 0.16
TOTAL 120 GiB 120 GiB 192 MiB 192 MiB 0.16
--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
device_health_metrics 1 1 0 B 8 0 B 0 38 GiB
ydpool 2 32 11 MiB 1 34 MiB 0.03 38 GiB
ydrbd1 3 64 405 B 7 48 KiB 0 38 GiB
2. Install Ceph and sync the authentication files
cephyd@ceph-deploy:~/ceph-cluster$ ceph-deploy install --no-adjust-repos --nogpgcheck ceph-client1
cephyd@ceph-deploy:~/ceph-cluster$ ceph-deploy admin ceph-client1
cephyd@ceph-deploy:~/ceph-cluster$ ssh ceph-client1 'sudo apt install acl -y && sudo setfacl -m u:ceph:rw /etc/ceph/ceph.client.admin.keyring'
3. Map the image on the client
cephyd@ceph-deploy:~/ceph-cluster$ ssh ceph-client1
cephyd@ceph-client1:~$ sudo rbd -p ydrbd1 map ydimg2
/dev/rbd0
cephyd@ceph-client1:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
rbd0 251:0 0 3G 0 disk
rbd1 251:16 0 3G 0 disk
vda 252:0 0 40G 0 disk
└─vda1 252:1 0 40G 0 part /
cephyd@ceph-client1:~$ sudo fdisk -l /dev/rbd0
4. Format and mount
cephyd@ceph-client1:~$ sudo mkfs.ext4 -m0 /dev/rbd0
cephyd@ceph-client1:~$ sudo mkdir /data
cephyd@ceph-client1:~$ sudo mount /dev/rbd0 /data/
cephyd@ceph-client1:~$ sudo cp /var/log/lastlog /data
cephyd@ceph-client1:~$ sudo df -h
Filesystem Size Used Avail Use% Mounted on
udev 922M 0 922M 0% /dev
tmpfs 189M 2.9M 187M 2% /run
/dev/vda1 40G 4.2G 34G 12% /
tmpfs 945M 0 945M 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 945M 0 945M 0% /sys/fs/cgroup
tmpfs 189M 0 189M 0% /run/user/2022
/dev/rbd0 2.9G 9.1M 2.9G 1% /data
5. Verify
cephyd@ceph-client1:~$ sudo dd if=/dev/zero of=/data/ceph-test-file bs=1MB count=300
300+0 records in
300+0 records out
300000000 bytes (300 MB, 286 MiB) copied, 0.492302 s, 609 MB/s
cephyd@ceph-client1:~$ df -h
cephyd@ceph-client1:~$ ll -h /data/
total 287M
drwxr-xr-x 3 root root 4.0K Aug 16 12:43 ./
drwxr-xr-x 23 root root 4.0K Aug 16 12:41 ../
-rw-r--r-- 1 root root 287M Aug 16 12:43 ceph-test-file
-rw-r--r-- 1 root root 18M Aug 16 12:41 lastlog
drwx------ 2 root root 16K Aug 16 12:40 lost+found/
cephyd@ceph-client1:~$ sudo ceph df
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 120 GiB 118 GiB 2.4 GiB 2.4 GiB 2.01
TOTAL 120 GiB 118 GiB 2.4 GiB 2.4 GiB 2.01
--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
device_health_metrics 1 1 0 B 8 0 B 0 37 GiB
ydpool 2 32 11 MiB 1 34 MiB 0.03 37 GiB
ydrbd1 3 64 352 MiB 104 1.0 GiB 0.92 37 GiB