MySQL 提供 MyISAM 、 InnoDB 、 memory(heap) 等多種存儲(chǔ)引擎。每種存儲(chǔ)引擎對(duì)于索引的支持以及實(shí)現(xiàn)都不盡相同,
本文主要討論 InnoDB 引擎相關(guān)的索引應(yīng)用。
為何使用索引
索引用于快速找出在某個(gè)列中有一特定值的行。在查詢(xún)時(shí)如果沒(méi)有應(yīng)用索引,MySQL 將不得不掃描表以找出符合條件的數(shù)據(jù)項(xiàng),我們知道,IO操作是非常耗時(shí)的。
建立索引的幾個(gè)原則
盡量使用唯一索引,對(duì)于有唯一值的列索引效果最好,對(duì)于像性別只有很少的值的列索引效果就不明顯。
索引長(zhǎng)度盡量短,這樣做有幾個(gè)好處,首先短的索引可以節(jié)省索引空間,也會(huì)使查找的速度得到提升。對(duì)于較短的鍵值,索引高速緩存中的塊能容納更多的鍵值,MySQL也可以在內(nèi)存中容納更多的值。
太長(zhǎng)的列,可以選擇只建立部分索引
更新非常頻繁的數(shù)據(jù)不適宜建索引
利用最左前綴原則,比如建立了一個(gè)聯(lián)合索引(a,b,c),那么其實(shí)我們可利用的索引就有(a), (a,b), (a,b,c)
不要過(guò)多創(chuàng)建索引,首先過(guò)多的索引會(huì)占用更多的空間,而且每次增、刪、改操作都會(huì)重建索引,而且過(guò)多索引也會(huì)增加之后的優(yōu)化復(fù)雜度
盡量擴(kuò)展索引,比如現(xiàn)有索引(a),現(xiàn)在我又要對(duì)(a,b)進(jìn)行索引,不需要再建一個(gè)索引(a,b)
請(qǐng)注意,一次查詢(xún)是不能應(yīng)用多個(gè)索引的
<,<=,=,>,>=,BETWEEN,IN 可用到索引,<>,not in ,!= 則不行
like "xxxx%" 是可以用到索引的,like "%xxxx" 則不行(like "%xxx%" 同理)
NULL會(huì)使索引的效果大打折扣
淺談聯(lián)合索引
首先因?yàn)?InnoDB 的數(shù)據(jù)文件本身要按主鍵聚集,所以InnoDB要求表必須有主鍵(MyISAM可以沒(méi)有),如果沒(méi)有顯式指定,則MySQL系統(tǒng)會(huì)自動(dòng)選擇一個(gè)可以唯一標(biāo)識(shí)數(shù)據(jù)記錄的列作為主鍵,如果不存在這種列,則MySQL自動(dòng)為InnoDB表生成一個(gè)隱含字段作為主鍵。主鍵對(duì)于 InnoDB 的索引結(jié)構(gòu)是十分重要的。
InnoDB 引擎的索引是使用 B+樹(shù) 實(shí)現(xiàn)索引結(jié)構(gòu)的,當(dāng)我們建立一個(gè)聯(lián)合索引(a, b, c)時(shí),B+樹(shù)將按照從左至右來(lái)建立搜索樹(shù),然后檢索時(shí)B+樹(shù)將先比較 a 然后再其基礎(chǔ)上比較 b 和 c 。不難看出如果我們的搜索條件中只有 a 和 c ,將不能使用完整的(a, b, c)索引,如果我們的搜索條件中沒(méi)有 a 那么這條查詢(xún)將不會(huì)用上索引,這其實(shí)就是最左前綴特性。
** 接下來(lái)我們來(lái)看下聯(lián)合索引應(yīng)用時(shí)的幾種情況: **
desc users;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| rname | varchar(50) | NO | | NULL | |
| rdesc | varchar(50) | NO | | NULL | |
| age | int(11) | NO | MUL | NULL | |
| card | int(5) | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+
show index from users;
+-------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| users | 0 | PRIMARY | 1 | id | A | 15 | NULL | NULL | | BTREE | | |
| users | 1 | age_name_desc | 1 | age | A | 15 | NULL | NULL | | BTREE | | |
| users | 1 | age_name_desc | 2 | rname | A | 15 | 10 | NULL | | BTREE | | |
| users | 1 | age_name_desc | 3 | rdesc | A | 15 | 15 | NULL | | BTREE | | |
+-------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
- 全列匹配
explain select * from users where age = 11 and rname = "asd" and rdesc = "asd";
+----+-------------+-------+------+---------------+---------------+---------+-------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+---------------+---------+-------------------+------+-------------+
| 1 | SIMPLE | users | ref | age_name_desc | age_name_desc | 83 | const,const,const | 1 | Using where |
+----+-------------+-------+------+---------------+---------------+---------+-------------------+------+-------------+
以上 explain 的結(jié)果可以看出,當(dāng)對(duì)索引中所有列進(jìn)行精確匹配的時(shí)候,可以用到完整索引。
explain select * from users where rname = "asd" and age = 11 and rdesc = "asd";
+----+-------------+-------+------+---------------+---------------+---------+-------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+---------------+---------+-------------------+------+-------------+
| 1 | SIMPLE | users | ref | age_name_desc | age_name_desc | 83 | const,const,const | 1 | Using where |
+----+-------------+-------+------+---------------+---------------+---------+-------------------+------+-------------+
按照B+樹(shù)的結(jié)構(gòu)聯(lián)合索引本是對(duì)順序十分敏感的,但是從以上結(jié)果可看出調(diào)整順序并沒(méi)有影響到索引的選用,這是因?yàn)镸ySQL的查詢(xún)優(yōu)化器會(huì)自動(dòng)調(diào)整where子句的條件順序以使用適合的索引。
- 部分匹配(左前綴)
explain select * from users where age = 11;
+----+-------------+-------+------+---------------+---------------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+---------------+---------+-------+------+-------+
| 1 | SIMPLE | users | ref | age_name_desc | age_name_desc | 4 | const | 1 | NULL |
+----+-------------+-------+------+---------------+---------------+---------+-------+------+-------+
從上述結(jié)果可看出當(dāng)精確匹配最左前綴的列時(shí),是可以用到索引的,但是 key_len = 4,只用到了第一列前綴索引。
explain select * from users where age = 11 and rdesc = "asd";
+----+-------------+-------+------+---------------+---------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+---------------+---------+-------+------+-------------+
| 1 | SIMPLE | users | ref | age_name_desc | age_name_desc | 4 | const | 1 | Using where |
+----+-------------+-------+------+---------------+---------------+---------+-------+------+-------------+
上述查詢(xún)?nèi)笔Я藃name列,可以看出當(dāng)缺失中間列時(shí)將不能使用完整的聯(lián)合索引。查詢(xún)只用到了最左部分索引,而rdesc由于沒(méi)有rname無(wú)法和左前綴銜接。
explain select * from users where rname = "asd";
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | users | ALL | NULL | NULL | NULL | NULL | 15 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
上述查詢(xún)由于查詢(xún)條件不是最左前綴,將不能使用聯(lián)合索引age_name_desc,建立聯(lián)合索引時(shí)順序是很重要的,必須在建索引前考慮清楚。
- 非精確查詢(xún)
explain select * from users where age = 11 and rname like 'sad%';
+----+-------------+-------+-------+---------------+---------------+---------+------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------------+---------+------+------+------------------------------------+
| 1 | SIMPLE | users | range | age_name_desc | age_name_desc | 36 | NULL | 1 | Using index condition; Using where |
+----+-------------+-------+-------+---------------+---------------+---------+------+------+------------------------------------+
explain select * from users where age = 11 and rname like 'sad%' and rdesc = 'asd';
+----+-------------+-------+-------+---------------+---------------+---------+------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------------+---------+------+------+------------------------------------+
| 1 | SIMPLE | users | range | age_name_desc | age_name_desc | 83 | NULL | 1 | Using index condition; Using where |
+----+-------------+-------+-------+---------------+---------------+---------+------+------+------------------------------------+
可以看出like查詢(xún)當(dāng)通配符'%'不出現(xiàn)在開(kāi)頭時(shí),是可以應(yīng)用到索引的。
explain select * from users where age = 11 and rname like '%sad%';
+----+-------------+-------+------+---------------+---------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+---------------+---------+-------+------+-------------+
| 1 | SIMPLE | users | ref | age_name_desc | age_name_desc | 4 | const | 1 | Using where |
+----+-------------+-------+------+---------------+---------------+---------+-------+------+-------------+
通配符'%'出現(xiàn)在開(kāi)頭則不行
接下來(lái)我們看看范圍查詢(xún)
explain select * from users where age = 61 and rname < 'fasd';
+----+-------------+-------+-------+---------------+---------------+---------+------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------------+---------+------+------+------------------------------------+
| 1 | SIMPLE | users | range | age_name_desc | age_name_desc | 36 | NULL | 1 | Using index condition; Using where |
+----+-------------+-------+-------+---------------+---------------+---------+------+------+------------------------------------+
- 函數(shù)和表達(dá)式
explain select * from users where (age + 1) = 33;
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | users | ALL | NULL | NULL | NULL | NULL | 15 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
當(dāng)查詢(xún)條件中出現(xiàn)函數(shù)或表達(dá)式時(shí),將不能應(yīng)用索引。寫(xiě)查詢(xún)語(yǔ)句時(shí)要盡量避免表達(dá)式和函數(shù)的出現(xiàn)。
指導(dǎo)性建議
建議在選擇性高的列上建立索引,所謂索引的選擇性,是指不重復(fù)的索引值與表記錄數(shù)的比值:
Index Selectivity = Cardinality / Count
顯然選擇性的取值范圍為(0, 1],選擇性越高的索引價(jià)值越大。
比例越大我們掃描的記錄數(shù)越少,唯一鍵的區(qū)分度是1,而一些狀態(tài)、性別字段可能在大數(shù)據(jù)面前區(qū)分度就趨近于0
select count(distinct(rname))/count(*) as selectivity from users;
+-------------+
| selectivity |
+-------------+
| 0.9333 |
+-------------+
有一種與索引選擇性有關(guān)的索引優(yōu)化策略叫做前綴索引,就是用列的前綴代替整個(gè)列作為索引key,當(dāng)前綴長(zhǎng)度合適時(shí),可以做到既使得前綴索引的選擇性接近全列索引,同時(shí)因?yàn)樗饕齥ey變短而減少了索引文件的大小和維護(hù)開(kāi)銷(xiāo)。
select count(distinct(left(rname, 3)))/count(*) as selectivity from users;
+-------------+
| selectivity |
+-------------+
| 0.7333 |
+-------------+
select count(distinct(left(rname, 5)))/count(*) as selectivity from users;
+-------------+
| selectivity |
+-------------+
| 0.9333 |
+-------------+
可以看到,當(dāng)把前綴取到5時(shí)selectivity值就和完整的selectivity值一樣了,這樣可以大幅度減小索引所占用的空間,而且相應(yīng)的查詢(xún)速度也會(huì)有一定提升。