一、眾所周知,MySQL在RR隔離級(jí)別下,會(huì)出現(xiàn)幻讀的問題。
出現(xiàn)幻讀的前提條件:
?Innodb存儲(chǔ)引擎,在RR隔離級(jí)別下,并且使用了當(dāng)前讀;
出現(xiàn)幻讀的表現(xiàn):
?一個(gè)事務(wù)在前后兩次查詢同一范圍數(shù)據(jù)的時(shí)候(當(dāng)前讀),后一次查詢看到了前一次查詢沒有看到的行。兩點(diǎn)需要說明:
1 、在可重復(fù)讀隔離級(jí)別下,普通查詢是快照讀,不會(huì)看到其他事務(wù)插入的數(shù)據(jù)?;米x只在當(dāng)前讀才會(huì)出現(xiàn);
2 、幻讀專指新插入的行。當(dāng)前讀的作用就是能讀到其他事務(wù)已經(jīng)提交的新插入的記錄。
二、幻讀帶來的影響是會(huì)導(dǎo)致主從之間數(shù)據(jù)不一致,是很嚴(yán)重的問題。
三、如何解決幻讀的問題:
?產(chǎn)生幻讀的原因是:mysql的record lock(行鎖)只能鎖住行,但是新插入記錄這個(gè)動(dòng)作,是更新了記錄之間的“間隙”。因此,為了解決幻讀,innodb引入了新的鎖,也就是間隙鎖(Gap Lock)。顧名思義,間隙鎖,鎖的就是兩個(gè)值之間的空隙。
四、以上介紹了間隙鎖,它能幫我們解決了幻讀的問題,但同時(shí)也會(huì)給我們帶來一些“困擾”。接下來結(jié)合一個(gè)線上的故障case,講解一下間隙鎖,給我們帶來了什么樣的困擾:
1、故障現(xiàn)象描述:
用戶報(bào)障,一個(gè)insert的SQL語句被堵住了,從DBbrain看到的現(xiàn)象update的SQL持有鎖,導(dǎo)致insert語句無法寫入:

2、分析:
1)、從上圖可以看到,insert語句處于鎖等待狀態(tài):LOCK WAIT。Dbbrain顯示持有鎖的事務(wù)為:“UPDATE t_push_task SET status=9 WHERE push_id=1384715944290652160 AND access_id=1500015064 limit 1” 以及
“UPDATE t_push_task SET status=9 WHERE push_id=1384729752719482880 AND access_id=1500015064 limit 1”
insert語句為:
INSERT INTO t_push_task SET status=0, access_id=1600007315, type=8, push_req='?????(:??[{"Ids":["zhihuishu_class_204117682"],"IdsStru":null,"InnerOperator":1,"OuterOperator":1,"IsNot":false,"TagType":"xg_user_define"}]J?00:4895P?Z?????????\n??{"alert":{"title":"通知","subtitle":"","body":"劉婷班 ......
2)、是否這兩個(gè)update語句有性能問題呢?

從explain執(zhí)行計(jì)劃分析,該SQL掃描的行數(shù)只有一行,并且是走主鍵索引掃描,所以u(píng)pdate語句并沒有性能問題;
3)、那為什么update語句會(huì)把insert語句堵住呢?
我們知道,insert插入記錄,更新的是記錄之間的“間隙”。那么是否有可能是由于間隙鎖的原因,導(dǎo)致insert無法插入呢?
4)、我們來看看表結(jié)構(gòu),如下:
CREATE TABLE t_push_task (
push_id bigint(20) unsigned NOT NULL AUTO_INCREMENT,
group_id varchar(256) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
status int(8) NOT NULL,
access_id bigint(11) NOT NULL,
type bigint(20) NOT NULL,
target_list mediumtext COLLATE utf8mb4_unicode_ci,
push_req blob,
create_time datetime DEFAULT CURRENT_TIMESTAMP,
push_time datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
push_node varchar(256) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
expire_sec int(11) NOT NULL DEFAULT '259200',
start_time datetime DEFAULT NULL,
finish_time datetime DEFAULT NULL,
source int(8) NOT NULL,
msg_type int(8) NOT NULL,
msg_status int(8) NOT NULL DEFAULT '0',
push_content mediumtext COLLATE utf8mb4_unicode_ci,
last_modify_time datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
global_push_type varchar(32) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
upload_id bigint(20) unsigned NOT NULL DEFAULT '0',
already_send_num bigint(20) unsigned NOT NULL DEFAULT '0',
queue_id varchar(256) COLLATE utf8mb4_unicode_ci DEFAULT '',
collapse_id int(8) NOT NULL DEFAULT '0' COMMENT '????push_id',
expect_send_num bigint(20) unsigned DEFAULT '0',
current_index int(11) DEFAULT '-1' COMMENT '?????????',
PRIMARY KEY (push_id,push_time),
KEY idx_status_create (access_id,status,create_time),
KEY idx_status_push (access_id,status,start_time),
KEY idx_status_push_queenid (queue_id,status,push_time),
KEY idx_source_type_content_push (access_id,create_time,source,msg_type,push_content(512)),
KEY idx_push_time (push_time),
KEY idx_id_pushtime_type_msgtype_source (access_id,push_time,type,msg_type,source),
KEY idx_id_type_status_pushtime (access_id,type,status,push_time),
KEY idx_id_collapseid (access_id,collapse_id),
KEY id_idx_status (push_id,access_id,status)
) ENGINE=InnoDB AUTO_INCREMENT=500534759 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
/*!50100 PARTITION BY RANGE (to_days(push_time))
(PARTITION p201904 VALUES LESS THAN (737545) ENGINE = InnoDB,
PARTITION p201905 VALUES LESS THAN (737576) ENGINE = InnoDB,
PARTITION p201906 VALUES LESS THAN (737606) ENGINE = InnoDB,
PARTITION p201907 VALUES LESS THAN (737637) ENGINE = InnoDB,
PARTITION p201908 VALUES LESS THAN (737668) ENGINE = InnoDB,
PARTITION p201909 VALUES LESS THAN (737698) ENGINE = InnoDB,
PARTITION p201910 VALUES LESS THAN (737729) ENGINE = InnoDB,
PARTITION p201911 VALUES LESS THAN (737759) ENGINE = InnoDB,
PARTITION p201912 VALUES LESS THAN (737790) ENGINE = InnoDB,
PARTITION p202001 VALUES LESS THAN (737821) ENGINE = InnoDB,
PARTITION p202002 VALUES LESS THAN (737850) ENGINE = InnoDB,
PARTITION p202003 VALUES LESS THAN (737881) ENGINE = InnoDB,
PARTITION p202004 VALUES LESS THAN (737911) ENGINE = InnoDB,
PARTITION p202005 VALUES LESS THAN (737942) ENGINE = InnoDB,
PARTITION p202006 VALUES LESS THAN (737972) ENGINE = InnoDB,
PARTITION p202007 VALUES LESS THAN (738003) ENGINE = InnoDB,
PARTITION p202008 VALUES LESS THAN (738034) ENGINE = InnoDB,
PARTITION p202009 VALUES LESS THAN (738064) ENGINE = InnoDB,
PARTITION p202010 VALUES LESS THAN (738095) ENGINE = InnoDB,
PARTITION p202011 VALUES LESS THAN (738125) ENGINE = InnoDB,
PARTITION p202012 VALUES LESS THAN (738156) ENGINE = InnoDB,
PARTITION p202101 VALUES LESS THAN (738187) ENGINE = InnoDB,
PARTITION p202102 VALUES LESS THAN (738215) ENGINE = InnoDB,
PARTITION p202103 VALUES LESS THAN (738246) ENGINE = InnoDB,
PARTITION p202104 VALUES LESS THAN (738276) ENGINE = InnoDB,
PARTITION p202105 VALUES LESS THAN (738307) ENGINE = InnoDB,
PARTITION p202106 VALUES LESS THAN (738337) ENGINE = InnoDB,
PARTITION p202107 VALUES LESS THAN (738368) ENGINE = InnoDB,
PARTITION p202108 VALUES LESS THAN (738399) ENGINE = InnoDB,
PARTITION p202109 VALUES LESS THAN (738429) ENGINE = InnoDB,
PARTITION p202110 VALUES LESS THAN (738460) ENGINE = InnoDB,
PARTITION p202111 VALUES LESS THAN (738490) ENGINE = InnoDB,
PARTITION p202112 VALUES LESS THAN (738521) ENGINE = InnoDB,
PARTITION p2022 VALUES LESS THAN (741443) ENGINE = InnoDB) */
從表結(jié)構(gòu)可以知道,該表主鍵為:PRIMARY KEY (push_id,push_time)。而update語句的where條件為:push_id=1384729752719482880 AND access_id=1500015064,故update語句走的是主鍵索引的前綴索引,長(zhǎng)度為8個(gè)字節(jié)(push_id類型為bigint,長(zhǎng)度為8個(gè)字節(jié);push_time類型為datetime,長(zhǎng)度也是8個(gè)字節(jié));
由于push_id是自增字段auto_increment,故insert語句插入的時(shí)候,push_id會(huì)進(jìn)行自增,會(huì)在當(dāng)前最大max(push_id)的基礎(chǔ)上,自增加1;
5)、用戶提供的線索:
用戶反饋,由于業(yè)務(wù)邏輯設(shè)計(jì)不合理的原因,其update語句中的push_id實(shí)際上在表中是并不存在的,并且遠(yuǎn)遠(yuǎn)大于當(dāng)前表中的最大push_id,當(dāng)前表中最大push_id為:

這就很奇怪了,update語句where 條件的push_id=1384715944290652160在表當(dāng)中根本就不存在,也就是說該update語句實(shí)際上沒生效。update語句和insert語句風(fēng)馬牛不相及,為啥update會(huì)阻塞insert呢???
6)、測(cè)試實(shí)例實(shí)驗(yàn)一把,看看是否會(huì)出現(xiàn)同樣的情況?
a、創(chuàng)建表、寫入記錄
MySQL [(none)]> use test;
MySQL [test]> CREATE TABLE t ( id int(11) NOT NULL, c int(11) DEFAULT NULL, d int(11) DEFAULT NULL, PRIMARY KEY (id), KEY c (c)) ENGINE=InnoDB;
Query OK, 0 rows affected (0.00 sec)
MySQL [test]> show create table t\G;
*************************** 1. row ***************************
Table: t
Create Table: CREATE TABLE t (
id int(11) NOT NULL,
c int(11) DEFAULT NULL,
d int(11) DEFAULT NULL,
PRIMARY KEY (id),
KEY c (c)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
insert into t values(0,0,0),(5,5,5),(10,10,10),(15,15,15),(20,20,20),(25,25,25);
MySQL [test]> select * from t;
+----+------+------+
| id | c | d |
+----+------+------+
| 0 | 0 | 0 |
| 5 | 5 | 5 |
| 10 | 10 | 10 |
| 15 | 15 | 15 |
| 20 | 20 | 20 |
| 25 | 25 | 25 |
+----+------+------+
6 rows in set (0.00 sec)
b、開啟兩個(gè)會(huì)話,分別是session A、session B
session A:
MySQL [test]> begin;
Query OK, 0 rows affected (0.00 sec)
MySQL [test]> update t set d=99 where id=100;
Query OK, 0 rows affected (0.00 sec)
Rows matched: 0 Changed: 0 Warnings: 0
session A更新了表中不存在的id,并且比當(dāng)前max(id)還要大。此時(shí)session A的事務(wù)還未提交;
session B:
MySQL [(none)]> use test;
MySQL [test]> insert into t values(26,26,26);
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
session B想要插入比當(dāng)前比當(dāng)前max(id)=25要大的一條記錄26,結(jié)果發(fā)現(xiàn)被堵住了,產(chǎn)生了鎖等待;
實(shí)驗(yàn)結(jié)果現(xiàn)象和上述用戶的故障case是一樣的,為什么會(huì)出現(xiàn)這種情況呢?
7)、next-key lock(record lock + gap lock)加鎖原則:
原則 1:加鎖的基本單位是 next-key lock。next-key lock 是前開后閉區(qū)間;
原則 2:查找過程中訪問到的對(duì)象才會(huì)加鎖;
優(yōu)化 1:索引上的等值查詢,給唯一索引加鎖的時(shí)候,next-key lock 退化為行鎖;
優(yōu)化 2:索引上的等值查詢,向右遍歷時(shí)且最后一個(gè)值不滿足等值條件的時(shí)候,next-key lock 退化為間隙鎖;
一個(gè) bug:唯一索引上的范圍查詢會(huì)訪問到不滿足條件的第一個(gè)值為止;
由于用戶的表當(dāng)中并沒有push_id=1384715944290652160或者push_id=1384729752719482880,用上述加鎖規(guī)則來判斷:
a、加鎖的單位是next-key lock,故“UPDATE t_push_task SET status=9 WHERE push_id=1384715944290652160 AND access_id=1500015064 limit 1”的加鎖范圍為:
(max(push_id),+∞];
b、根據(jù)優(yōu)化2原則:update語句是一個(gè)等值查詢(push_id=1384715944290652160),向右遍歷比1384715944290652160大的是+∞,而+∞不滿足查詢條件,故next-key lock退化成間隙鎖,鎖住的范圍為(max(push_id),+∞);
c、由于insert語句寫入,push_id會(huì)自增,比當(dāng)前最大的max(push_id)自增加1,所有落入了加鎖范圍(max(push_id),+∞)內(nèi),導(dǎo)致用戶的insert語句被阻塞。
d、即:如果索引沒有命中,會(huì)有間隙鎖, 向左掃描掃到第一個(gè)比給定參數(shù)小的值, 向右掃描掃描到第一個(gè)比給定參數(shù)大的值, 然后以此為界,構(gòu)建一個(gè)區(qū)間, 鎖住整個(gè)區(qū)間內(nèi)的數(shù)據(jù)
由此就能解釋,為什么insert會(huì)被update阻塞了。將原因同步給用戶之后,用戶承諾會(huì)優(yōu)化業(yè)務(wù)代碼,規(guī)避間隙鎖的問題。
8)、思考:
a、上述case是更新了一個(gè)比表中max(push_id)還要大的push_id,那如果更新的push_id剛好是表中存在的一條記錄,比如更新的是當(dāng)前的max(push_id)=500536561,那么會(huì)如何加鎖呢?
根據(jù)加鎖原則優(yōu)化1:索引上的等值查詢,給唯一索引加鎖的時(shí)候,next-key lock 退化為行鎖。此時(shí)只會(huì)對(duì)push_id=500536561加鎖;
b、如果更新的push_id在表中不存在,但是比max(push_id)要小,例如更新的push_id=447689301,介于447689203 和 448344009之間。此時(shí)會(huì)如何加鎖呢?

根據(jù)原則1,加鎖單位是next-key lock,故加鎖的范圍是(447689203,448344009];
根據(jù)優(yōu)化2,這是一個(gè)等值查詢,push_id=447689301,向右遍歷且最后一個(gè)值不滿足等值條件的時(shí)候,next-key lock 退化為間隙鎖,故最終加鎖的范圍為(447689203,448344009)。