MySQL MGR Failover and Node Management


Source: 陶老師運(yùn)維筆記 (WeChat official account)


The MySQL Group Replication (MGR) framework gives MySQL automatic primary/secondary switchover and failure-recovery capabilities.

This article covers node management and failover:

  • 1) Theory behind MGR node management and failover.
  • 2) Hands-on tests: how a single-primary MGR cluster recovers after the master fails, and how to add/remove cluster nodes.

1. Test Environment

1.1 Environment Layout

Role IP Port server-id
DB-1 192.110.103.41 3106 103413106
DB-2 192.110.103.42 3106 103423106
DB-3 192.110.103.43 3106 103433106

Note: port 3306 was already in use on these machines, so port 3106 is used instead.

1.2 Installing MGR

See the MGR installation steps in the previous article: MySQL 5.7 MGR Installation and Introduction.

2. MGR Overview

2.1 What MGR Is

MySQL Group Replication (MGR) is a feature introduced in MySQL 5.7 that provides a highly available, highly scalable, and highly reliable (strongly consistent) MySQL cluster service.
An MGR cluster consists of multiple instance nodes. A transaction can only commit after more than half of the nodes agree to it, and every node maintains a database state machine, which keeps transactions consistent across nodes.

2.2 Single-Primary vs. Multi-Primary Mode

MySQL Group Replication can be configured in two working modes: single-primary and multi-primary.
In brief:

  • Single-primary mode: one master node is automatically elected from the MySQL nodes in the replication group. Only the master accepts writes; the other nodes are automatically set read only. When the master fails, a new master is elected; once elected, it is made writable and the other slaves point to the new master.
  • Multi-primary mode: any node in the replication group can accept writes, so there is no notion of master and slave. As long as not too many nodes fail at once, the multi-primary group stays available.

MySQL Group Replication uses the Paxos distributed algorithm for coordination between nodes. Because of this, a majority of group members must be online to reach quorum and agree on any decision.

A majority means N/2+1 nodes (N being the current number of nodes in the group); for example, a 5-node group needs 3 nodes to form a majority. The number of failures each group size can tolerate is shown in the table below:

Group size Majority Failure tolerance
1 1 0
2 2 0
3 2 1
4 3 1
5 3 2
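
The table above follows directly from the majority formula; a quick sketch reproducing it with plain shell arithmetic (not part of MGR itself):

```shell
# For each group size N, majority = floor(N/2) + 1,
# and the tolerated failure count is N - majority.
for n in 1 2 3 4 5; do
    majority=$(( n / 2 + 1 ))
    tolerance=$(( n - majority ))
    echo "$n $majority $tolerance"
done
```

The output matches the table row by row.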

2.3 Configuration Notes

[mysqld]
datadir=/data
socket=/data/mysql.sock

server-id=100                      # required
gtid_mode=on                       # required
enforce_gtid_consistency=on        # required
log-bin=/data/master-bin           # required
binlog_format=row                  # required
binlog_checksum=none               # required
master_info_repository=TABLE       # required
relay_log_info_repository=TABLE    # required
relay_log=/data/relay-log          # required; the default value is used if omitted
log_slave_updates=ON               # required
sync-binlog=1                      # recommended
log-error=/data/error.log
pid-file=/data/mysqld.pid

transaction_write_set_extraction=XXHASH64         # required
loose-group_replication_group_name="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"  # required
loose-group_replication_start_on_boot=off        # recommended: OFF
loose-group_replication_member_weight = 40   # optional; supported since MySQL 5.7.20
loose-group_replication_local_address = "192.110.103.41:31061"  # required, as is the next line
loose-group_replication_group_seeds = "192.110.103.41:31061,192.110.103.42:31061,192.110.103.43:31061"
loose-group_replication_bootstrap_group = OFF
loose-group_replication_single_primary_mode = FALSE # FALSE = multi-primary
loose-group_replication_enforce_update_everywhere_checks=ON # ON = multi-primary

A walk through the options above:

  • 1). Group Replication is built on GTIDs, so gtid_mode and enforce_gtid_consistency must be enabled.
  • 2). log_bin and binlog_format must be set so that information can be collected from the binary log while guaranteeing data consistency.
  • 3). Due to a design limitation in how MySQL checksums replication events, Group Replication cannot verify them, so binlog_checksum=none is required.
  • 4). Group Replication writes master and relay log metadata into the mysql.slave_master_info and mysql.slave_relay_log_info tables.
  • 5). Every node in the group keeps a full copy of the data; this is a shared-nothing model. log_slave_updates must therefore be enabled on every node so that a new node can pick any member as a donor for asynchronous replication.
  • 6). sync_binlog=1 ensures every transaction commit flushes the binlog to disk immediately, so no log is lost on failure.
  • 7). The last six lines configure the Group Replication plugin. The loose- prefix means MySQL continues to start normally even if the plugin fails to load; the prefix is optional.
  • 8). transaction_write_set_extraction=XXHASH64 makes write sets hashed with the XXHASH64 algorithm. A write set uniquely identifies the rows a transaction modifies and is used later to detect conflicts between concurrent transactions touching the same row. It is derived from the primary key, so every table used with Group Replication must have a primary key.
  • 9). group_replication_group_name names the replication group and must be a valid UUID. On Linux, the uuidgen tool can generate one.
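
For point 9, a group-name UUID can be generated and sanity-checked like this (a sketch; /proc/sys/kernel/random/uuid is a Linux fallback for when uuidgen is unavailable):

```shell
# Generate a candidate value for loose-group_replication_group_name.
uuid=$(uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid)

# A valid UUID is five hyphen-separated hex groups: 8-4-4-4-12.
echo "$uuid" | grep -Eiq '^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$' \
    && echo "loose-group_replication_group_name=\"$uuid\""
```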

3. Group Replication Administration

3.1 Views and View Changes

MGR manages membership on the basis of group views (Group View, "view" for short). A view is the membership state of the group over a stretch of time: if no member joins or leaves during that stretch, the whole continuous period forms one view; once membership changes, the view changes with it. MGR uses a View ID to track view changes and order views in time.

select * from performance_schema.replication_group_member_stats\G
*************************** 1. row ***************************
                      CHANNEL_NAME: group_replication_applier
                           VIEW_ID: 15718289704352993:39
                         MEMBER_ID: 509810ee-f3d7-11e9-a7d5-a0369fac2de4
       COUNT_TRANSACTIONS_IN_QUEUE: 0
        COUNT_TRANSACTIONS_CHECKED: 10
          COUNT_CONFLICTS_DETECTED: 0
COUNT_TRANSACTIONS_ROWS_VALIDATING: 0
TRANSACTIONS_COMMITTED_ALL_MEMBERS: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-60:1000003-1000006:2000003-2000006
    LAST_CONFLICT_FREE_TRANSACTION: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:58

A group's View ID consists of a timestamp prefix plus a sequence number.

  • Prefix: generated when the group is bootstrapped, from the timestamp at that moment; it does not change for the lifetime of the group.
  • Sequence: the first view of the group is numbered 1, and every subsequent member join or leave increments it by one.
#viewid: 15718289704352993:39
select from_unixtime(FLOOR(15718289704352993/10000000),'%Y-%m-%d %H:%i:%s');
+----------------------------------------------------------------------+
| from_unixtime(FLOOR(15718289704352993/10000000),'%Y-%m-%d %H:%i:%s') |
+----------------------------------------------------------------------+
| 2019-10-23 19:09:30                                                  |
+----------------------------------------------------------------------+
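
The same decomposition can be done with shell arithmetic; the prefix divided by 10^7 yields a Unix timestamp (a sketch using the view ID from the output above):

```shell
viewid="15718289704352993:39"
prefix=${viewid%%:*}            # timestamp part, before the colon
seq=${viewid##*:}               # sequence part, after the colon
epoch=$(( prefix / 10000000 ))  # strip the extra precision digits
echo "epoch=$epoch seq=$seq"    # epoch 1571828970 -> 2019-10-23 (UTC)
```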

3.2 The Node Join Process

The process is illustrated with NetEase Cloud's MGR diagram:

(figure: MGR node join process)

When a node asks to join the group, it first establishes TCP connections to the group's seed members listed in its group_replication_group_seeds parameter (Gcs_xcom_control::do_join()). The seed member checks its group_replication_ip_whitelist to decide whether the new node may join; by default MGR does not restrict joining IPs. Once the connection is up, the new node sends a request to join the group.
On receiving the request, the seed member broadcasts a view-change message to every node in the group, including the one applying to join, as shown in the upper right of the figure; each node then starts a view change. Every node broadcasts a state-exchange message carrying its current state and metadata, as shown in the lower left of the figure. After sending its own message, each node collects those broadcast by the others and merges the member information into the membership list it maintains locally.

Completing the view change is only the first step of joining the group. It means the new member can receive messages that reach consensus through the Paxos protocol, but it cannot yet be set ONLINE to serve clients: the new member still has to synchronize data and build a correct data version (recovery_module->start_recovery) before it can apply Paxos messages and serve normal user traffic.

3.3 Group Replication Administration Statements

Commonly used statements for operating Group Replication.

SELECT * FROM performance_schema.replication_group_members;
#Find the master
SELECT ta.* ,tb.MEMBER_HOST,tb.MEMBER_PORT,tb.MEMBER_STATE FROM performance_schema.global_status ta,performance_schema.replication_group_members tb  WHERE ta.VARIABLE_NAME='group_replication_primary_member' and ta.VARIABLE_VALUE=tb.MEMBER_ID;
SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME='group_replication_primary_member';
SHOW STATUS LIKE 'group_replication_primary_member';

start group_replication;
stop group_replication;

To stop Group Replication on one member of the group, run the stop group_replication statement on that node.

Be careful, though: before running it, make sure the node is no longer serving MySQL traffic; otherwise new data might still be written to it (for example when stopping the primary), or stale data might be read from it.

So to restart the entire group safely, the best method is to first stop the MySQL instances of all non-primary nodes (not merely their Group Replication), then stop the primary's MySQL instance. Restart in reverse order: bring the primary back first, bootstrap the group on it and start its Group Replication, then rejoin each slave node to the group.

In Group Replication, a node can leave the group in two ways: voluntarily or involuntarily.

Voluntary leave:

Run the stop group_replication; statement.

  • 1). Executing this statement means the node leaves voluntarily. It triggers an automatic view reconfiguration, which is replicated to all group members; the node only leaves once a majority of nodes have agreed on the new view configuration.
  • 2). A voluntary leave does not cost quorum votes. No matter how many nodes leave voluntarily, the group is never blocked by failing the majority requirement.
  • 3). For example, in a 5-node group, after node A leaves voluntarily the group size becomes 4; the group behaves as if node A had never been there.

Involuntary leave:

Every departure other than the voluntary case above is involuntary: a node crash, a network outage, and so on.

1). When a node leaves involuntarily, the failure-detection mechanism notices and reports the problem to the group. This triggers an automatic view reconfiguration, which again requires a majority to agree on the new view.

2). An involuntary leave does not change the group size: for example, a 5-node group stays size 5 after involuntary exits; the departed nodes are simply no longer marked ONLINE.

3). Involuntary leaves do cost quorum votes. When too many nodes leave involuntarily and the remaining members no longer form a majority, the group blocks.

4). For example, in a 5-node group, after node A leaves involuntarily the group size is still 5, with node A marked unreachable (or a similar state) in the new view. If two more nodes then leave involuntarily, only 2 ONLINE nodes remain, the majority requirement fails, and the group blocks.
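
The blocking rule in these examples boils down to a quorum check over the ONLINE member count; a sketch (hypothetical helper names, not MGR commands):

```shell
# A group of size n needs floor(n/2)+1 votes to decide.
majority() { echo $(( $1 / 2 + 1 )); }

# Given group size $1 and ONLINE member count $2, is the group still unblocked?
has_quorum() {
    if [ "$2" -ge "$(majority "$1")" ]; then echo yes; else echo no; fi
}

has_quorum 5 4   # one involuntary exit    -> yes
has_quorum 5 2   # three involuntary exits -> no, the group blocks
```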

4. MGR Failure Recovery

First, a quick look at the data replication channels MGR uses. While MGR runs normally, i.e. all group members are online, nodes exchange data through the Paxos protocol; remote transactions that pass certification are written to the group_replication_applier relay log and replayed by the replication threads of the group_replication_applier channel.

When a node joins the group, a second replication channel, group_replication_recovery, comes into play; it is a conventional master-slave asynchronous replication channel.

The join process can be divided into two stages: recovering the data of previous views, and applying the transactions cached during the current view. The first stage further splits into local recovery and global recovery.

4.1 View Data Recovery

1) Local recovery stage

Step 1: initialize failure recovery, including the recovery thread and the group membership information.

Step 2: start the group_replication_applier replication channel; a node that is not brand new replays its portion of relay log here.

2) Global recovery

Step 3: after the local relay log replay finishes, recovery enters its third step, state transfer: through the group_replication_recovery channel, the node pulls from other online group members the data it is missing from previous views. Unlike a conventional master-slave setup, in MGR this channel only needs an account and password, configured like: CHANGE MASTER TO MASTER_USER='rpl_user', MASTER_PASSWORD='rpl_pass' FOR CHANNEL 'group_replication_recovery'. The channel therefore works GTID-based in MASTER_AUTO_POSITION mode. The recovery thread picks a random online node as donor and calls the request_dump function in rpl_slave.cc to establish replication with it, passing along the recovering node's gtid_executed set. The donor walks its binlog files in reverse order, using the Previous-GTIDs at the start of each file to find the first transaction not contained in gtid_executed, and starts streaming data from that transaction.

4.2 Applying Cached Transactions and Finishing Recovery

3) Applying cached transactions

Step 4: wake the certification-queue processing thread that was blocked in step 2, so the node gradually catches up with the rest of the group and can then be set ONLINE.

Step 5: once the condition set by group_replication_recovery_complete_at is met, send a Recovery_message::RECOVERY_END_MESSAGE to notify every node in the group to mark this node online.

Step 6: if at any point during recovery an unrecoverable situation or error occurs, the node is first put into ERROR state; once its replication threads are confirmed stopped, it sends a leave message and exits the group, rather than staying in the group in ERROR state.

Step 7: whether or not an error occurred, reset the recovery-related parameters and destroy the recovery thread.

5. MGR Node Management (Adding and Removing Nodes)

Everything up to here has been MGR theory; if you are short on time, skip it and follow the hands-on steps from this section onward.

5.1 Removing a Node from the Group

Removing a node from the group is simple: execute the stop group_replication statement.

select * from performance_schema.replication_group_member_stats\G
select * from performance_schema.replication_group_members;
-- stop
stop group_replication;
select * from performance_schema.replication_group_members;

5.2 Adding a Node to the MGR Cluster

5.2.1 Rejoining a Node After a Short Outage

If an existing MGR node was shut down and its service has since been restored, a start group_replication is usually all that is needed:

show global variables like '%seed%';
show global variables like '%group_repli%';

-- start Group Replication
set global group_replication_enforce_update_everywhere_checks=OFF;
set global group_replication_single_primary_mode=ON;
START GROUP_REPLICATION;
select * from performance_schema.replication_group_members;

5.2.2 Adding a Brand-New Node

1) Install the MySQL software

Install the MySQL software; see the official documentation for the detailed steps.

#Create an empty DB instance directory tree
mkdir -p /data1/mysql_3106/{data,etc,logs,tmp}
vim  /data1/mysql_3106/etc/my.cnf

# Group Replication parameters
plugin_load_add='group_replication.so'
transaction_write_set_extraction=XXHASH64
loose-group_replication_group_name = "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
loose-group_replication_start_on_boot = OFF
loose-group_replication_local_address = "192.110.103.42:31061"
loose-group_replication_group_seeds = "192.110.103.41:31061,192.110.103.42:31061,192.110.103.43:31061"
loose-group_replication_bootstrap_group = OFF
loose-group_replication_single_primary_mode = TRUE # FALSE = multi-primary
loose-group_replication_enforce_update_everywhere_checks= FALSE # ON = multi-primary

2) Back up the master's data

#Grant privileges
#set GLOBAL super_read_only=0;
#grant all on *.* to backup@'127.0.0.1' identified by 'backup';
#Take the backup
backup_path=/data1/mysqlbackup/`date +%Y%m%d`/mysql_3106
mkdir -p $backup_path
innobackupex --defaults-file=/data1/mysql_3106/etc/my.cnf -H 127.0.0.1 -P 3106 --user=backup --password=backup --parallel=4 --throttle=4 --no-timestamp --slave-info   $backup_path/ >> $backup_path/mybackup.log 2>&1 &
#Apply the logs
innobackupex --defaults-file=/data1/mysql_3106/etc/my.cnf --apply-log $backup_path
cp /data1/mysql_3106/etc/my.cnf $backup_path/

3) Transfer the backup data

scp -pr $backup_path 192.110.103.42:/data1/mysql_3106_backup
Or transfer with nc. On the receiving (slave) side:
nc -l 1234 | tar xzvf -
#On the sending side
cd $backup_path/
tar czvf - mysql_3106 | nc 192.110.103.42 1234

4) Restore the backup onto the slave

#Create an empty DB instance directory tree
#mkdir -p /data1/mysql_3106/{data,etc,logs,tmp}

Edit the configuration file:

cp /data1/mysql_3106_backup/my.cnf /data1/mysql_3106/etc/my.cnf 
#
vim /data1/mysql_3106/etc/my.cnf  #server_id etc.
server-id=103423106
loose-group_replication_start_on_boot = OFF
loose-group_replication_local_address = "192.110.103.42:31061"
loose-group_replication_group_seeds = "192.110.103.41:31061,192.110.103.42:31061,192.110.103.43:31061"
loose-group_replication_bootstrap_group = OFF
loose-group_replication_single_primary_mode = TRUE # FALSE = multi-primary
loose-group_replication_enforce_update_everywhere_checks= FALSE # ON = multi-primary
report_host=192.110.103.42
report_port=3106

Restore the backup data:

#General form:
innobackupex [--defaults-file=MY.CNF] --copy-back  [--defaults-group=GROUP-NAME] BACKUP-DIR
innobackupex --defaults-file=/data1/mysql_3106/etc/my.cnf --copy-back  /data1/mysql_3106_backup/
#Fix ownership
chown -R mysql:mysql /data1/mysql_3106

5) Start MySQL

#Start MySQL
su - mysql -c "/usr/local/mysql/bin/mysqld_safe --defaults-file=/data1/mysql_3106/etc/my.cnf &"

6) Install the MGR plugin

If the MGR plugin is not yet installed, install it now; otherwise skip this step.

# Install the MGR plugin
INSTALL PLUGIN group_replication SONAME 'group_replication.so';
SHOW PLUGINS;

7) Start MGR on the new node and join the cluster

The core steps for joining the MGR cluster:

cat xtrabackup_info   # obtain the GTID value
RESET SLAVE ALL;
RESET MASTER;  
#
set global gtid_purged='xx';
START GROUP_REPLICATION;
select * from performance_schema.replication_group_members;

1. Create the replication account on the new slave (optional)

If the new slave was built from an xtrabackup backup of the MGR cluster, it already has this replication account, so this step can be skipped.

SET SQL_LOG_BIN=0;
CREATE USER repl@'192.%.%.%' IDENTIFIED WITH 'mysql_native_password' BY 'repl';
GRANT SUPER,REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO repl@'192.%.%.%';
FLUSH PRIVILEGES;
SET SQL_LOG_BIN=1;

2. Find the last GTID executed on the master at backup time

xtrabackup backs up the data set but not the binlog, so you must first obtain the ID of the last transaction included in the backup and explicitly make the slave skip those transactions; otherwise the slave would replicate those binlogs from the master again and re-execute them, applying the data twice.

The value can be read from the xtrabackup_info file in the backup directory. If the backup was not made with xtrabackup, you can instead use show global variables like "gtid_executed"; captured on the master just before the backup; it represents the transactions the master has already executed. Note that it is this value, not gtid_purged.

Inspect xtrabackup_info:

cat xtrabackup_info 

uuid = b30fde6b-223e-11ea-bbbe-a0369fac2de4
tool_command = --defaults-file=/data1/mysql_3106/etc/my.cnf -H 127.0.0.1 -P 3106 --user=backup --password=... --parallel=4 --throttle=4 --no-timestamp --slave-info /data1/mysqlbackup/20191219/mysql_3106/
ibbackup_version = 2.4.8
server_version = 5.7.23-log
binlog_pos = filename 'mysql-bin.000008', position '1313', GTID of the last change 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-54:1000003-1000006:2000003-2000006'
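
The GTID set can be pulled out of that binlog_pos line with standard text tools (a sketch assuming the xtrabackup_info format shown above; the /tmp path is only for demonstration):

```shell
# Recreate the relevant line of xtrabackup_info for demonstration.
cat > /tmp/xtrabackup_info <<'EOF'
binlog_pos = filename 'mysql-bin.000008', position '1313', GTID of the last change 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-54:1000003-1000006:2000003-2000006'
EOF

# Capture the text between the final pair of single quotes.
gtid=$(sed -n "s/^binlog_pos.*GTID of the last change '\(.*\)'.*/\1/p" /tmp/xtrabackup_info)
echo "set global gtid_purged='$gtid';"
```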

3. Run change master on the new slave

On the new slave, reset master, then change master toward the primary.

-- run on the new slave
show global variables like '%gtid%';
show master status;
RESET SLAVE ALL;
RESET MASTER;  -- clears the gtid_executed and gtid_purged variables
show global variables like '%gtid%';
+---------------------------------------------------+---------+
| Variable_name                                     | Value   |
+---------------------------------------------------+---------+
| binlog_gtid_simple_recovery                       | ON      |
| enforce_gtid_consistency                          | ON      |
| group_replication_allow_local_disjoint_gtids_join | OFF     |
| group_replication_gtid_assignment_block_size      | 1000000 |
| gtid_executed                                     |         |
| gtid_executed_compression_period                  | 1000    |
| gtid_mode                                         | ON      |
| gtid_owned                                        |         |
| gtid_purged                                       |         |
| session_track_gtids                               | OFF     |
+---------------------------------------------------+---------+
10 rows in set (0.01 sec)

Set gtid_purged to the GTID value from xtrabackup_info (a critical step; without it the join fails), then change master:

-- set gtid_purged to the value from xtrabackup_info; this step is critical.
set global gtid_purged='aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-54:1000003-1000006:2000003-2000006';
show global variables like '%gtid%' ;


-- change master 
CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='repl' FOR CHANNEL 'group_replication_recovery';

4. Start Group Replication (single-primary mode)

-- start Group Replication:
-- check the current state
select * from performance_schema.replication_group_members;
select * from performance_schema.replication_group_member_stats\G
show global variables like 'group_replication%';
show global variables like 'group_replication_single%';

-- configure single-primary mode
SET GLOBAL group_replication_bootstrap_group=OFF;
set global group_replication_enforce_update_everywhere_checks=OFF;
set global group_replication_single_primary_mode=ON;
set global group_replication_allow_local_disjoint_gtids_join=ON; 


-- start the group
START GROUP_REPLICATION;

-- check the status
select * from performance_schema.replication_group_members;

Note: it is best not to modify any data on a node before it joins the cluster; otherwise you will hit the error below.

START GROUP_REPLICATION;  
[ERROR] Plugin group_replication reported: 'This member has more executed transactions than those present in the group. Local transactions: c3c274ff-c63e-11e7-b339-00163e0c0288:1-4 Group transactions: 2e6bfa69-0439-41c9-add7-795a9acfd499:1-10,
c5898a84-c63e-11e7-bc8b-00163e0af475:1-4'

#Workaround:
set global group_replication_allow_local_disjoint_gtids_join=ON;  #allow joining with disjoint local GTIDs

5.2.3 Repairing and Rejoining a Node with Inconsistent Data

If a newly added node, or a recovering node, was out of service for so long that the master's logs have already been purged, START GROUP_REPLICATION; alone cannot recover it. You have to take a backup from the MGR cluster yourself: MGR has no SST or IST concept; it "synchronizes and catches up" purely through GTIDs and binlogs.

Note: for a node whose replication broke because of inconsistent data, the repair procedure can be exactly the same as the brand-new-node procedure in the previous section.

1. [Symptom]

192.110.103.42:3106 was down for so long that the master's logs in the MGR had been purged, leaving the node stuck in RECOVERING or ERROR state, unable to join the MGR.

Explanation: a recovering MGR node asks the surviving cluster whether its missing binlogs can still be supplied. If they can, they are streamed over normally and the node catches up; if the logs the node needs no longer exist, the node cannot rejoin the cluster.

root@192.110.103.42 : (none) > show variables like 'group_replication_group_seeds';
+-------------------------------+-------------------------------------------------------------+
| Variable_name                 | Value                                                       |
+-------------------------------+-------------------------------------------------------------+
| group_replication_group_seeds | 192.110.103.41:31061,192.110.103.42:31061,192.110.103.43:31061 |
+-------------------------------+-------------------------------------------------------------+
1 row in set (0.00 sec)

root@192.110.103.42 : (none) > start group_replication;
Query OK, 0 rows affected (3.35 sec)

root@192.110.103.42 : (none) > select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST   | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| group_replication_applier | 509810ee-f3d7-11e9-a7d5-a0369fac2de4 | 192.110.103.41 |        3106 | ONLINE       |
| group_replication_applier | 53e462be-f3d7-11e9-9125-a0369fa6cce4 | 192.110.103.42 |        3106 | RECOVERING   |
| group_replication_applier | ee4a9cec-f3d5-11e9-9ded-a0369fa6cd30 | 192.110.103.43 |        3106 | ONLINE       |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
3 rows in set (0.00 sec)

root@192.110.103.42 : (none) > select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST   | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| group_replication_applier | 53e462be-f3d7-11e9-9125-a0369fa6cce4 | 192.110.103.42 |        3106 | ERROR        |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
1 row in set (0.00 sec)

Error log:

The log says: master has purged binary logs containing GTIDs that the slave requires.

2019-10-24T08:15:11.015032Z 198 [ERROR] Error reading packet from server for channel 'group_replication_recovery': The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but t
he master has purged binary logs containing GTIDs that the slave requires. (server_errno=1236)
2019-10-24T08:15:11.015060Z 198 [ERROR] Slave I/O for channel 'group_replication_recovery': Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CH
ANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.', Error_code: 1236
2019-10-24T08:15:11.015076Z 198 [Note] Slave I/O thread exiting for channel 'group_replication_recovery', read up to log 'FIRST', position 4
2019-10-24T08:15:11.015131Z 196 [Note] Plugin group_replication reported: 'Terminating existing group replication donor connection and purging the corresponding logs.'
2019-10-24T08:15:11.015164Z 199 [Note] Error reading relay log event for channel 'group_replication_recovery': slave SQL thread was killed
2019-10-24T08:15:11.015530Z 199 [Note] Slave SQL thread for channel 'group_replication_recovery' exiting, replication stopped in log 'FIRST' at position 0
2019-10-24T08:15:11.017239Z 196 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='192.110.103.41', master_port= 3106, master_log_file='', m
aster_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2019-10-24T08:15:11.019165Z 196 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 2/10'

2. [Fix]

In this situation you could rebuild the slave with xtrabackup, but this section uses mysqldump instead:

1) Install a new DB instance

If the slave can still start normally, this step can be skipped; you only need to clear the local GTID info and import the mysqldump data.

a) Install a brand-new MySQL instance (omitted).

b) Or keep the current slave and wipe its existing data.

2) Build the slave with mysqldump

Take a full data backup with mysqldump.

mysqldump backup:

/usr/local/mysql/bin/mysqldump -h 127.0.0.1 -P3106 --all-databases --default-character-set=utf8 -R -q --triggers --master-data=2 --single-transaction > mysql3106_online.sql

Transfer the data to the slave:

scp -pr  mysql3106_online.sql root@192.110.103.42:/data1/ 
Or with nc:
#receiving (slave) side
nc -l 1234 | tar xzvf -
#sending side
tar czvf - mysql3106_online.sql | nc 192.110.103.42 1234

Clear the local GTID info:
Stop the replication threads, disable read-only, and clear the local GTID info.

-- run on the slave
show variables like 'group_replication_group_seeds';
STOP GROUP_REPLICATION;
RESET SLAVE ALL;
set global super_read_only=0;
show master logs;
reset master;
show master logs;
show global variables like '%gtid%';

Import the backup data:

#run on the slave
mysql -h 127.0.0.1 -P3106 -uroot -pxxx --default-character-set=utf8 < mysql3106_online.sql

3) Install the MGR plugin

-- Install the MGR plugin
INSTALL PLUGIN group_replication SONAME 'group_replication.so';
SHOW PLUGINS;

4) Start MGR recovery:

show global variables like '%gtid%';
set global group_replication_enforce_update_everywhere_checks=OFF;
set global group_replication_single_primary_mode=ON;
START GROUP_REPLICATION;

5) Result:

Because it is a freshly rebuilt slave, it can synchronize and catch up normally, so START GROUP_REPLICATION; joins it to the MGR successfully.

START GROUP_REPLICATION;
select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST   | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| group_replication_applier | 509810ee-f3d7-11e9-a7d5-a0369fac2de4 | 192.110.103.41 |        3106 | ONLINE       |
| group_replication_applier | 53e462be-f3d7-11e9-9125-a0369fa6cce4 | 192.110.103.42 |        3106 | ONLINE       |
| group_replication_applier | ee4a9cec-f3d5-11e9-9ded-a0369fa6cd30 | 192.110.103.43 |        3106 | ONLINE       |
+---------------------------+--------------------------------------+---------------+-------------+--------------+

6. Failover Testing

6.1 Current State

The cluster is currently in single-primary mode:

select * from performance_schema.global_variables  where VARIABLE_NAME 
 in ('group_replication_single_primary_mode','group_replication_enforce_update_everywhere_checks');
+----------------------------------------------------+----------------+
| VARIABLE_NAME                                      | VARIABLE_VALUE |
+----------------------------------------------------+----------------+
| group_replication_enforce_update_everywhere_checks | OFF            |
| group_replication_single_primary_mode              | ON             |
+----------------------------------------------------+----------------+
2 rows in set (0.00 sec)
 select * from performance_schema.global_variables  where VARIABLE_NAME like '%read_only';
+-----------------------+----------------+
| VARIABLE_NAME         | VARIABLE_VALUE |
+-----------------------+----------------+
| innodb_read_only      | OFF            |
| read_only             | OFF            |
| super_read_only       | OFF            |
+-----------------------+----------------+
 > SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST   | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| group_replication_applier | 509810ee-f3d7-11e9-a7d5-a0369fac2de4 | 192.110.103.41 |        3106 | ONLINE       |
| group_replication_applier | 53e462be-f3d7-11e9-9125-a0369fa6cce4 | 192.110.103.42 |        3106 | ONLINE       |
| group_replication_applier | ee4a9cec-f3d5-11e9-9ded-a0369fa6cd30 | 192.110.103.43 |        3106 | ONLINE       |
+---------------------------+--------------------------------------+---------------+-------------+--------------+

Current master:

Only a single-primary group needs to look up the primary node; a multi-primary group has no master/slave concept, so there is nothing to look up.

SELECT * FROM performance_schema.global_status WHERE VARIABLE_NAME='group_replication_primary_member';
+----------------------------------+--------------------------------------+
| VARIABLE_NAME                    | VARIABLE_VALUE                       |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 509810ee-f3d7-11e9-a7d5-a0369fac2de4 |
+----------------------------------+--------------------------------------+
#Find the master
 SELECT ta.* ,tb.MEMBER_HOST,tb.MEMBER_PORT,tb.MEMBER_STATE FROM performance_schema.global_status ta,performance_schema.replication_group_members tb  WHERE ta.VARIABLE_NAME='group_replication_primary_member' and ta.VARIABLE_VALUE=tb.MEMBER_ID;
+----------------------------------+--------------------------------------+---------------+-------------+--------------+
| VARIABLE_NAME                    | VARIABLE_VALUE                       | MEMBER_HOST   | MEMBER_PORT | MEMBER_STATE |
+----------------------------------+--------------------------------------+---------------+-------------+--------------+
| group_replication_primary_member | 509810ee-f3d7-11e9-a7d5-a0369fac2de4 | 192.110.103.41 |        3106 | ONLINE       |
+----------------------------------+--------------------------------------+---------------+-------------+--------------+
1 row in set (0.00 sec)

6.2 Primary Failure

1) Stopping the master cleanly

Stop the master with the kill command (a plain kill sends SIGTERM, which shuts mysqld down cleanly).

#Kill mysql on the master machine 192.110.103.41.
$ps aux|grep 3106
mysql    122456  0.0  0.0 106252  1444 ?        S    Oct23   0:00 /bin/sh /usr/local/mysql-5.7.23/bin/mysqld_safe --defaults-file=/data1/mysql_3106/etc/my.cnf
mysql    123471  0.1  0.8 11575792 1069584 ?    Sl   Oct23   1:45 /usr/local/mysql-5.7.23/bin/mysqld --defaults-file=/data1/mysql_3106/etc/my.cnf --basedir=/usr/local/mysql-5.7.23 --datadir=/data1/mysql_3106/data --plugin-dir=/usr/local/mysql-5.7.23/lib/plugin --log-error=/data1/mysql_3106/logs/mysqld.err --open-files-limit=8192 --pid-file=/data1/mysql_3106/tmp/mysql.pid --socket=/data1/mysql_3106/tmp/mysql.sock --port=3106
$kill 122456 123471 ; tail -f /data1/mysql_3106/logs/mysqld.err 

Logs:

Log of the old DB master:

$tail -f /data1/mysql_3106/logs/mysqld.err 
2019-10-24T03:10:32.746843Z 0 [Warning] /usr/local/mysql-5.7.23/bin/mysqld: Forcing close of thread 31  user: 'root'

2019-10-24T03:10:32.746873Z 0 [Note] Plugin group_replication reported: 'Plugin 'group_replication' is stopping.'
2019-10-24T03:10:32.746901Z 0 [Note] Plugin group_replication reported: 'Going to wait for view modification'
2019-10-24T03:10:35.797258Z 0 [Note] Plugin group_replication reported: 'Group membership changed: This member has left the group.'
2019-10-24T03:10:40.799923Z 0 [Note] Plugin group_replication reported: 'auto_increment_increment is reset to 1'
2019-10-24T03:10:40.799954Z 0 [Note] Plugin group_replication reported: 'auto_increment_offset is reset to 1'
2019-10-24T03:10:40.800110Z 7 [Note] Error reading relay log event for channel 'group_replication_applier': slave SQL thread was killed
2019-10-24T03:10:40.800431Z 7 [Note] Slave SQL thread for channel 'group_replication_applier' exiting, replication stopped in log 'FIRST' at position 65
2019-10-24T03:10:40.800652Z 4 [Note] Plugin group_replication reported: 'The group replication applier thread was killed'
2019-10-24T03:10:40.800787Z 0 [Note] Plugin group_replication reported: 'Plugin 'group_replication' has been stopped.'
2019-10-24T03:10:40.800799Z 0 [Note] Event Scheduler: Purging the queue. 0 events
2019-10-24T03:10:40.801278Z 0 [Note] Binlog end
2019-10-24T03:10:40.802272Z 0 [Note] Shutting down plugin 'group_replication'
2019-10-24T03:10:40.802322Z 0 [Note] Plugin group_replication reported: 'All Group Replication server observers have been successfully unregistered'
...
2019-10-24T03:10:42.804477Z 0 [Note] Shutting down plugin 'binlog'
2019-10-24T03:10:42.805238Z 0 [Note] /usr/local/mysql-5.7.23/bin/mysqld: Shutdown complete

2019-10-24T03:10:42.814933Z mysqld_safe mysqld from pid file /data1/mysql_3106/tmp/mysql.pid ended

Log of the new DB master:

$tail -n 30 /data1/mysql_3106/logs/mysqld.err
2019-10-23T11:11:00.671705Z 0 [Note] Plugin group_replication reported: 'XCom protocol version: 3'
2019-10-23T11:11:00.671736Z 0 [Note] Plugin group_replication reported: 'XCom initialized and ready to accept incoming connections on port 31061'
2019-10-23T11:11:05.400823Z 2 [Note] Plugin group_replication reported: 'This server is working as secondary member with primary member address 192.110.103.41:3106.'
2019-10-23T11:11:05.401138Z 20 [Note] Plugin group_replication reported: 'Establishing group recovery connection with a possible donor. Attempt 1/10'
2019-10-23T11:11:05.401143Z 0 [Note] Plugin group_replication reported: 'Group membership changed to 192.110.103.41:3106, 192.110.103.42:3106 on view 15718289704352993:2.'
2019-10-23T11:11:05.402757Z 20 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='192.110.103.41', master_port= 3106, master_log_file='', master_log_pos= 4, master_bind=''.
2019-10-23T11:11:05.404717Z 20 [Note] Plugin group_replication reported: 'Establishing connection to a group replication recovery donor 509810ee-f3d7-11e9-a7d5-a0369fac2de4 at 192.110.103.41 port: 3106.'
2019-10-23T11:11:05.404998Z 22 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2019-10-23T11:11:05.406423Z 22 [Note] Slave I/O thread for channel 'group_replication_recovery': connected to master 'repl@192.110.103.41:3106',replication started in log 'FIRST' at position 4
2019-10-23T11:11:05.442349Z 23 [Note] Slave SQL thread for channel 'group_replication_recovery' initialized, starting replication in log 'FIRST' at position 0, relay log './mysql-relay-bin-group_replication_recovery.000001' position: 4
2019-10-23T11:11:05.461483Z 20 [Note] Plugin group_replication reported: 'Terminating existing group replication donor connection and purging the corresponding logs.'
2019-10-23T11:11:05.461910Z 23 [Note] Slave SQL thread for channel 'group_replication_recovery' exiting, replication stopped in log 'mysql-bin.000002' at position 934
2019-10-23T11:11:05.462119Z 22 [Note] Slave I/O thread killed while reading event for channel 'group_replication_recovery'
2019-10-23T11:11:05.462143Z 22 [Note] Slave I/O thread exiting for channel 'group_replication_recovery', read up to log 'mysql-bin.000002', position 934
2019-10-23T11:11:05.523357Z 20 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='192.110.103.41', master_port= 3106, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2019-10-23T11:11:05.526137Z 0 [Note] Plugin group_replication reported: 'This server was declared online within the replication group'
2019-10-23T11:15:33.426684Z 0 [Note] Plugin group_replication reported: 'Members joined the group: 192.110.103.43:3106'
2019-10-23T11:15:33.426832Z 0 [Note] Plugin group_replication reported: 'Group membership changed to 192.110.103.41:3106, 192.110.103.42:3106, 192.110.103.43:3106 on view 15718289704352993:3.'
2019-10-23T11:15:34.094942Z 0 [Note] Plugin group_replication reported: 'The member with address 192.110.103.43:3106 was declared online within the replication group'
2019-10-24T03:10:32.839967Z 0 [Warning] Plugin group_replication reported: 'Members removed from the group: 192.110.103.41:3106'
2019-10-24T03:10:32.839985Z 0 [Note] Plugin group_replication reported: 'Primary server with address 192.110.103.41:3106 left the group. Electing new Primary.'
2019-10-24T03:10:32.840052Z 0 [Note] Plugin group_replication reported: 'A new primary with address 192.110.103.42:3106 was elected, enabling conflict detection until the new primary applies all relay logs.'
2019-10-24T03:10:32.840086Z 41 [Note] Plugin group_replication reported: 'This server is working as primary member.'
2019-10-24T03:10:32.840107Z 0 [Note] Plugin group_replication reported: 'Group membership changed to 192.110.103.42:3106, 192.110.103.43:3106 on view 15718289704352993:4.'
2019-10-24T03:12:01.677869Z 4 [Note] Plugin group_replication reported: 'Primary had applied all relay logs, disabled conflict detection'

Automatic failover:

The logs above show that the automatic failover completed successfully: 192.110.103.42 was elected as the new primary.

root@192.110.103.42 : (none) > select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST   | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| group_replication_applier | 53e462be-f3d7-11e9-9125-a0369fa6cce4 | 192.110.103.42 |        3106 | ONLINE       |
| group_replication_applier | ee4a9cec-f3d5-11e9-9ded-a0369fa6cd30 | 192.110.103.43 |        3106 | ONLINE       |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
2 rows in set (0.00 sec)

root@192.110.103.42 : (none) > SELECT ta.* ,tb.MEMBER_HOST,tb.MEMBER_PORT,tb.MEMBER_STATE FROM performance_schema.global_status ta,performance_schema.replication_group_members tb  WHERE ta.VARIABLE_NAME='group_replication_primary_member' and ta.VARIABLE_VALUE=tb.MEMBER_ID;
+----------------------------------+--------------------------------------+---------------+-------------+--------------+
| VARIABLE_NAME                    | VARIABLE_VALUE                       | MEMBER_HOST   | MEMBER_PORT | MEMBER_STATE |
+----------------------------------+--------------------------------------+---------------+-------------+--------------+
| group_replication_primary_member | 53e462be-f3d7-11e9-9125-a0369fa6cce4 | 192.110.103.42 |        3106 | ONLINE       |
+----------------------------------+--------------------------------------+---------------+-------------+--------------+
1 row in set (0.00 sec)
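The election events can also be pulled out of the error log programmatically. Below is a minimal sketch (the helper name and regex are ours, not part of MGR) that scans `mysqld.err` lines for the "A new primary ... was elected" message emitted by the group_replication plugin:

```python
import re

# Matches the election message seen in mysqld.err, e.g.
# "... 'A new primary with address 192.110.103.42:3106 was elected, ...'"
ELECTED_RE = re.compile(
    r"A new primary with address (\d+\.\d+\.\d+\.\d+:\d+) was elected")

def last_elected_primary(log_lines):
    """Return the most recently elected primary address, or None if no
    election message is present in the given log lines."""
    primary = None
    for line in log_lines:
        m = ELECTED_RE.search(line)
        if m:
            primary = m.group(1)
    return primary

# One of the log lines shown above:
log = [
    "2019-10-24T03:10:32.840052Z 0 [Note] Plugin group_replication reported: "
    "'A new primary with address 192.110.103.42:3106 was elected, enabling "
    "conflict detection until the new primary applies all relay logs.'",
]
print(last_elected_primary(log))  # 192.110.103.42:3106
```

This is handy for alerting scripts that tail the error log and need to know when the primary role has moved.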

Rejoining the recovered old master to the cluster:

Once the old master (192.110.103.41) is back up, run START GROUP_REPLICATION on it:

> select @@group_replication_bootstrap_group;
+-------------------------------------+
| @@group_replication_bootstrap_group |
+-------------------------------------+
|                                   0 |
+-------------------------------------+
> START GROUP_REPLICATION;
> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST   | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| group_replication_applier | 509810ee-f3d7-11e9-a7d5-a0369fac2de4 | 192.110.103.41 |        3106 | ONLINE       |
| group_replication_applier | 53e462be-f3d7-11e9-9125-a0369fa6cce4 | 192.110.103.42 |        3106 | ONLINE       |
| group_replication_applier | ee4a9cec-f3d5-11e9-9ded-a0369fa6cd30 | 192.110.103.43 |        3106 | ONLINE       |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
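After START GROUP_REPLICATION, confirming that every member reports ONLINE can be scripted before declaring the rejoin successful. A minimal sketch, assuming the rows have already been fetched from `performance_schema.replication_group_members` (the helper name is ours):

```python
def all_members_online(members):
    """members: list of (member_host, member_port, member_state) tuples,
    one per row of performance_schema.replication_group_members.
    Returns True only if the group is non-empty and every member is ONLINE."""
    return len(members) > 0 and all(
        state == "ONLINE" for _host, _port, state in members)

# The three rows from the query above, after the old master rejoined:
members = [
    ("192.110.103.41", 3106, "ONLINE"),
    ("192.110.103.42", 3106, "ONLINE"),
    ("192.110.103.43", 3106, "ONLINE"),
]
print(all_members_online(members))  # True
```

A member that is still catching up shows RECOVERING rather than ONLINE, so this check also guards against treating a half-recovered node as healthy.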

2) Abnormal master shutdown

Now kill -9 the master node, 192.110.103.42. The logs show that a new primary is elected automatically and the primary role moves to 192.110.103.41.

# Run kill -9 on the master node, i.e. on 192.110.103.42.
$kill -9 mysql_pid

# Log on host 192.110.103.41
$tail -f /data1/mysql_3106/logs/mysqld.err
2019-10-24T06:17:31.473849Z 0 [Warning] Plugin group_replication reported: 'Member with address 192.110.103.42:3106 has become unreachable.'
2019-10-24T06:17:32.479299Z 0 [Warning] Plugin group_replication reported: 'Members removed from the group: 192.110.103.42:3106'
2019-10-24T06:17:32.479323Z 0 [Note] Plugin group_replication reported: 'Primary server with address 192.110.103.42:3106 left the group. Electing new Primary.'
2019-10-24T06:17:32.479395Z 0 [Note] Plugin group_replication reported: 'A new primary with address 192.110.103.41:3106 was elected, enabling conflict detection until the new primary applies all relay logs.'
2019-10-24T06:17:32.479439Z 37 [Note] Plugin group_replication reported: 'This server is working as primary member.'
2019-10-24T06:17:32.479465Z 0 [Note] Plugin group_replication reported: 'Group membership changed to 192.110.103.41:3106, 192.110.103.43:3106 on view 15718289704352993:6.'
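The group survives this abrupt failure because the two remaining members still form a majority (N/2+1, per the table in section 2.2). A quick check of that arithmetic (helper names are ours):

```python
def majority(n):
    """Minimum number of online members needed for quorum
    in an n-member group: N/2 + 1 with integer division."""
    return n // 2 + 1

def fault_tolerance(n):
    """How many members may fail while the group keeps quorum."""
    return n - majority(n)

# Reproduces the group-size table from section 2.2.
for n in (1, 2, 3, 4, 5):
    print(n, majority(n), fault_tolerance(n))

# A 3-node group tolerates 1 failure: after kill -9 on one member,
# the remaining 2 members satisfy majority(3) == 2, so the group
# keeps quorum and can elect a new primary.
```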

root@192.110.103.41 : (none) >  select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST   | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| group_replication_applier | 509810ee-f3d7-11e9-a7d5-a0369fac2de4 | 192.110.103.41 |        3106 | ONLINE       |
| group_replication_applier | ee4a9cec-f3d5-11e9-9ded-a0369fa6cd30 | 192.110.103.43 |        3106 | ONLINE       |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
2 rows in set (0.00 sec)

root@192.110.103.41 : (none) > SELECT ta.* ,tb.MEMBER_HOST,tb.MEMBER_PORT,tb.MEMBER_STATE FROM performance_schema.global_status ta,performance_schema.replication_group_members tb  WHERE ta.VARIABLE_NAME='group_replication_primary_member' and ta.VARIABLE_VALUE=tb.MEMBER_ID;
+----------------------------------+--------------------------------------+---------------+-------------+--------------+
| VARIABLE_NAME                    | VARIABLE_VALUE                       | MEMBER_HOST   | MEMBER_PORT | MEMBER_STATE |
+----------------------------------+--------------------------------------+---------------+-------------+--------------+
| group_replication_primary_member | 509810ee-f3d7-11e9-a7d5-a0369fac2de4 | 192.110.103.41 |        3106 | ONLINE       |
+----------------------------------+--------------------------------------+---------------+-------------+--------------+
1 row in set (0.00 sec)
