MongoDB WiredTiger Engine Tuning Tips
- Tuning the Cache Size
The most important WiredTiger tuning parameter is the cache size. By default, MongoDB 3.x reserves 50% of available physical memory (60% in 3.2) as the data cache. Although the default works for most applications, it is well worth tuning this value to find the best setting for your particular workload. The cache must be large enough to hold the application's entire working set.
Beyond this cache, MongoDB needs additional memory for operations such as aggregation, sorting, and connection management. Make sure enough memory is left over for these, otherwise the mongod process risks being killed by the OOM killer.
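Large sorts and aggregations that would exceed their in-memory limits can also be allowed to spill to disk rather than demanding more RAM; a minimal sketch, using a hypothetical orders collection and total field:
db.orders.aggregate(
    [ { $sort: { total: -1 } } ],   // large sort that may not fit in memory
    { allowDiskUse: true }          // let the stage spill to temporary files on disk
)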
To tune this parameter, first understand how the cache is used under the default configuration. The following command returns the cache statistics:
db.serverStatus().wiredTiger.cache
{
"tracked dirty bytes in the cache" : 409861,
"tracked bytes belonging to internal pages in the cache" : 738956332,
"bytes currently in the cache" : 25769360777,
"tracked bytes belonging to leaf pages in the cache" : 31473298388,
"maximum bytes configured" : 32212254720,
"tracked bytes belonging to overflow pages in the cache" : 0,
"bytes read into cache" : 29628550664,
"bytes written from cache" : 34634778285,
"pages evicted by application threads" : 0,
"checkpoint blocked page eviction" : 102,
"unmodified pages evicted" : 333277,
"page split during eviction deepened the tree" : 0,
"modified pages evicted" : 437117,
"pages selected for eviction unable to be evicted" : 44825,
"pages evicted because they exceeded the in-memory maximum" : 74,
"pages evicted because they had chains of deleted items" : 33725,
"failed eviction of pages that exceeded the in-memory maximum" : 1518,
"hazard pointer blocked page eviction" : 34814,
"internal pages evicted" : 21623,
"maximum page size at eviction" : 10486876,
"eviction server candidate queue empty when topping up" : 8235,
"eviction server candidate queue not empty when topping up" : 3020,
"eviction server evicting pages" : 191708,
"eviction server populating queue, but not evicting pages" : 2996,
"eviction server unable to reach eviction goal" : 0,
"pages split during eviction" : 8821,
"pages walked for eviction" : 157970002,
"eviction worker thread evicting pages" : 563015,
"in-memory page splits" : 52,
"percentage overhead" : 8,
"tracked dirty pages in the cache" : 9,
"pages currently held in the cache" : 1499798,
"pages read into cache" : 2260232,
"pages written from cache" : 3018846
}
The first number to look at is the percentage of dirty data in the `cache`. If this percentage is high, increasing the `cache` size is very likely to improve performance. If the application is read-heavy, also watch `bytes read into cache`; if this metric is high, a larger `cache` is very likely to improve read performance.
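A minimal mongo shell sketch for pulling these ratios out of serverStatus (the stat names follow the sample output above and may differ slightly between WiredTiger versions):
var c = db.serverStatus().wiredTiger.cache;
var dirty = c["tracked dirty bytes in the cache"];
var used  = c["bytes currently in the cache"];
var max   = c["maximum bytes configured"];
print("dirty % of cached bytes: " + (100 * dirty / used).toFixed(3));   // ~0.002% in the sample above
print("cache fill %           : " + (100 * used / max).toFixed(1));     // ~80% in the sample above
print("bytes read into cache  : " + c["bytes read into cache"]);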
- Adjusting WiredTiger Dynamically
db.adminCommand( { "setParameter": 1, "wiredTigerEngineRuntimeConfig": "cache_size=xxG"})
- Controlling Read/Write Tickets
WiredTiger uses tickets to control how many read/write operations the storage engine processes concurrently. The default is 128 for each, which works well in most cases. If the number of available tickets frequently drops to 0, subsequent operations are queued until a ticket is freed. For example, if read tickets keep dropping, the system probably has many long-running operations (such as unindexed queries). Third-party tools can help identify these slow operations. You can raise or lower the ticket counts depending on your workload and the observed performance impact.
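Besides third-party tools, a quick mongo shell sketch can list operations that have been running longer than a few seconds (the 5-second threshold is just an example):
db.currentOp({ active: true, secs_running: { $gte: 5 } }).inprog.forEach(function (op) {
    print(op.opid + "  " + op.op + "  " + op.ns + "  " + op.secs_running + "s");
});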
Run the following command to check ticket usage:
db.serverStatus().wiredTiger.concurrentTransactions
{
"write" : {
"out" : 0,
"available" : 128,
"totalTickets" : 128
},
"read" : {
"out" : 3,
"available" : 128,
"totalTickets" : 128
}
}
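A small sketch that flags when available tickets run low; the threshold of 16 is arbitrary:
var t = db.serverStatus().wiredTiger.concurrentTransactions;
["read", "write"].forEach(function (k) {
    if (t[k].available < 16) {
        print(k + " tickets nearly exhausted: " + t[k].out + "/" + t[k].totalTickets + " in use");
    }
});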
- Tickets can be adjusted dynamically:
db.adminCommand( { setParameter: 1, wiredTigerConcurrentReadTransactions: xx } )
db.adminCommand( { setParameter: 1, wiredTigerConcurrentWriteTransactions: xx } )
An Unexpected MongoDB Crash
- When mongos exited, its log contained the entries shown below, ending with the error "Failed to mlock: Cannot allocate memory". Analysing the logs of the shard primaries at the time of the failure also showed that each of them had multiple connections running long queries.
2017-12-14T18:01:43.047+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-19-0] Successfully connected to 192.168.1.100:27017, took 291ms (10 connections now open to 192.168.1.100:27017)
2017-12-14T18:01:43.048+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-19-0] Connecting to 192.168.1.200:27017
2017-12-14T18:01:43.048+0800 I ASIO [NetworkInterfaceASIO-TaskExecutorPool-19-0] Connecting to 192.168.1.200:27017
2017-12-14T18:01:43.050+0800 F - [NetworkInterfaceASIO-TaskExecutorPool-22-0] Failed to mlock: Cannot allocate memory
2017-12-14T18:01:43.050+0800 I - [NetworkInterfaceASIO-TaskExecutorPool-22-0] Fatal Assertion 28832 at src/mongo/base/secure_allocator.cpp 246
2017-12-14T18:01:43.050+0800 I - [NetworkInterfaceASIO-TaskExecutorPool-22-0] …
2017-12-14T18:01:43.054+0800 F - [NetworkInterfaceASIO-TaskExecutorPool-22-0] Got signal: 6 (Aborted).
- Fault Analysis and Reproduction
The initial analysis was that a large number of slow queries kept connections between mongos and mongod occupied for a long time, so the connection pool could not reuse them and new connections had to be created; each new connection requires a certain amount of locked (mlocked) memory. Once the locked memory mongos needed exceeded the max locked memory limit, the process aborted. As the ulimit output below shows, the default max locked memory is 64 KB (the limit applies to the whole process, and is only enforced for ordinary users, not for root).
[mongodb@mongodb4 ~]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 63717
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 59999
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 59999
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
- Solutions
1. Raise the max locked memory limit. The analysis above showed that mongos tried to allocate more locked memory than the operating system allowed, so raising the limit works around the problem; before the bug was fixed, this was also the official recommendation. Adjust it with, for example, ulimit -l 96, which raises the limit to 96 KB. There is no official guidance on the exact value. Reading the source code showed that each connection uses 60 bytes of locked memory, although locked memory is also used elsewhere; so if every connection runs into a slow query, 64 KB theoretically supports at most 1092 connections (see the rough check sketched after the bug description below).
2. Upgrade to 3.4.6 or later. At the time, the mongos and mongod in production were 3.4.4; we upgraded both to 3.4.9, repeated the same test, and could no longer reproduce the failure.
The MongoDB issue tracker covers this problem as SERVER-28997. It is a bug in version 3.4.4, which happened to be exactly the version we were running in production, and it was fixed in 3.4.6. The official description is roughly:
Each SASL SCRAM-SHA-1 client conversation pulls a SCRAMSecrets from the cache, and SCRAMSecrets allocate secure (mlocked) storage in their default constructor so that they can be populated. Instead, the client conversation and the cache should hold shared_ptrs to a single SCRAMSecrets instance, so secure storage is not allocated anew for every conversation.
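A rough back-of-the-envelope check of the connection figure quoted in solution 1, assuming (as the source analysis above suggests) about 60 bytes of locked memory per connection and nothing else drawing on the allowance:
var maxLockedBytes = 64 * 1024;           // default "max locked memory" from the ulimit output above
var bytesPerConnection = 60;              // per-connection secure allocation found in the source analysis
print(Math.floor(maxLockedBytes / bytesPerConnection));   // ≈ 1092 connections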
A Production Shard Configuration File
logpath=/home/mongodb/log/shard11.log
pidfilepath=/home/mongodb/shard11.pid
logappend=true
bind_ip=192.168.128.10
port=27017
fork=true
replSet=shard1
dbpath=/home/mongodb/data/shard11
oplogSize=10000
noprealloc=true
shardsvr=true
directoryperdb=true
storageEngine=wiredTiger
wiredTigerCacheSizeGB=3
syncdelay=30
wiredTigerCollectionBlockCompressor=snappy
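After restarting a shard with this file, a quick mongo shell check confirms that the 3 GB cache limit from wiredTigerCacheSizeGB actually took effect (a minimal sketch):
var cache = db.serverStatus().wiredTiger.cache;
print((cache["maximum bytes configured"] / (1024 * 1024 * 1024)) + " GB configured");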
MongoDB Log Rotation Script
- mongos_log.sh: you can add the mongos and config server instances to it as well, then register it as a Linux cron job so the logs are rotated on schedule every day (an alternative single-shell sketch follows the script below).
[mongodb@mongodb4 ~]$ more mongos_log.sh
/home/mongodb/mongodb_3.2.12/bin/mongo 192.168.128.19:27021/admin <<!
db.runCommand( {logRotate:1});
!
/home/mongodb/mongodb_3.2.12/bin/mongo 192.168.128.18:27022/admin <<!
db.runCommand( {logRotate:1});
!
/home/mongodb/mongodb_3.2.12/bin/mongo 192.168.128.19:27022/admin <<!
db.runCommand( {logRotate:1});
!
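The same rotation can also be driven from a single mongo shell script instead of one heredoc per instance; a minimal sketch, with the host list copied from the script above:
["192.168.128.19:27021", "192.168.128.18:27022", "192.168.128.19:27022"].forEach(function (host) {
    var conn = new Mongo(host);                                   // connect to each instance in turn
    printjson(conn.getDB("admin").runCommand({ logRotate: 1 }));  // rotate its log file
});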
Choosing the Wrong MongoDB Shard Key
- Current Deployment
- The MongoDB cluster consists of three servers hosting three replica-set shards; each server runs five processes: mongos, a config server, and three mongod instances (a primary, a secondary, and an arbiter).
- Scenario
Recently one of the production MongoDB servers showed an unusually high load, with the primary of a single shard consuming a lot of CPU. Analysis with mongotop and mongostat showed that one replica set was being hit far more often than the other two, which in turn delayed data ingestion.
- Cause
- 是因?yàn)楫?dāng)初創(chuàng)建表時(shí),使用的start_time來作為片鍵使用的是hash,也就導(dǎo)致入庫hash到某個(gè)集合時(shí),頻繁操作入庫都是在這個(gè)shard集群上面,也就導(dǎo)致服務(wù)器負(fù)載壓力特別高
- Solution
- Re-shard on a compound key built from multiple fields: export the data, drop the collection, create the new shard key, and re-import the data (see the sketch below).
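A minimal sketch of re-creating the collection with a compound shard key; the namespace mydb.events and the extra key field device_id are hypothetical and should be replaced with fields that spread the insert load across shards:
sh.enableSharding("mydb")                                             // enable sharding on the database if needed
sh.shardCollection("mydb.events", { device_id: 1, start_time: 1 })   // range-based compound shard key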