一個SQL優(yōu)化技巧
我們系統(tǒng)中存在大量類似如下查詢:
SELECT p.*, di.name AS district, q.name AS scene_name, (
SELECT user_name
FROM fw_user
WHERE fw_user_id = u.parent_id
) AS parent_phone, split_part(p.location, ',', 1) AS province
, split_part(p.location, ',', 2) AS city
FROM jd_patient p
LEFT JOIN fw_user u ON u.fw_user_id = p.fw_user_id
LEFT JOIN jd_district di ON p.jd_district_id = di.jd_district_id
LEFT JOIN jd_qrcode q ON p.jd_qrcode_id = q.jd_qrcode_id
WHERE 1 = 1
AND p.fw_user_id IS NOT NULL
ORDER BY p.create_date DESC, p.location
LIMIT 10 OFFSET 0;
這些查詢在數(shù)據(jù)量小的時候,不會出現(xiàn)效率瓶頸,但是數(shù)據(jù)量一旦增大,就會顯著地降低查詢速度。
使用explain查看其執(zhí)行速度:
QUERY PLAN
------------------------------------------------------------------------------------------
-----------------
Limit (cost=17975306.40..17975306.43 rows=10 width=2008)
-> Sort (cost=17975306.40..17980273.18 rows=1986709 width=2008)
Sort Key: p.create_date, p.location
-> Hash Left Join (cost=570.27..17932374.34 rows=1986709 width=2008)
Hash Cond: (p.jd_qrcode_id = q.jd_qrcode_id)
-> Hash Left Join (cost=2.26..1120474.10 rows=1986709 width=1994)
Hash Cond: (p.jd_district_id = di.jd_district_id)
-> Nested Loop Left Join (cost=0.42..1093343.76 rows=1986709 width=
1856)
-> Seq Scan on jd_patient p (cost=0.00..107821.94 rows=198670
9 width=1840)
Filter: (fw_user_id IS NOT NULL)
-> Index Scan using pk_fw_user on fw_user u (cost=0.42..0.49
rows=1 width=32)
Index Cond: (fw_user_id = p.fw_user_id)
-> Hash (cost=1.37..1.37 rows=37 width=154)
-> Seq Scan on jd_district di (cost=0.00..1.37 rows=37 width=
154)
-> Hash (cost=427.45..427.45 rows=11245 width=30)
-> Seq Scan on jd_qrcode q (cost=0.00..427.45 rows=11245 width=30)
SubPlan 1
-> Index Scan using pk_fw_user on fw_user (cost=0.42..8.44 rows=1 width
=12)
Index Cond: (fw_user_id = u.parent_id)
(19 rows)
優(yōu)化方法
上面的語句查詢了大量數(shù)據(jù)(包括數(shù)據(jù)條數(shù)、以及p.* ),然后使用臨時表order by,但最終又只返回了10條數(shù)據(jù)。
在結(jié)果集大的時候,這個臨時表會非常大,內(nèi)存放不下,于是全部拷貝到磁盤,導(dǎo)致IO飆升,效率急劇下降。
優(yōu)化的基本策略是盡早縮小結(jié)果范圍,思路是拆分sql,將排序操作和查詢所有數(shù)據(jù)的操作分開。
具體來看,考慮到實際上需要的數(shù)據(jù)只有10條,我們可以先找到這10條數(shù)據(jù)的jd_patient_id:
SELECT p.jd_patient_id
FROM jd_patient p
LEFT JOIN fw_user u ON u.fw_user_id = p.fw_user_id
LEFT JOIN jd_district di ON p.jd_district_id = di.jd_district_id
LEFT JOIN jd_qrcode q ON p.jd_qrcode_id = q.jd_qrcode_id
WHERE 1 = 1
AND p.fw_user_id IS NOT NULL
ORDER BY p.create_date DESC, p.location
LIMIT 10 OFFSET 0;
然后在這10條數(shù)據(jù)中聯(lián)合查詢各字段的值:
SELECT p.name, di.name AS district, q.name AS scene_name, (
SELECT user_name
FROM fw_user
WHERE fw_user_id = u.parent_id
) AS parent_phone, split_part(p.location, ',', 1) AS province
, split_part(p.location, ',', 2) AS city
FROM jd_patient p
LEFT JOIN fw_user u ON u.fw_user_id = p.fw_user_id
LEFT JOIN jd_district di ON p.jd_district_id = di.jd_district_id
LEFT JOIN jd_qrcode q ON p.jd_qrcode_id = q.jd_qrcode_id
WHERE p.jd_patient_id IN ( xxxxx )
)
ORDER BY p.create_date DESC, p.location
于是優(yōu)化后的sql如下:
SELECT p.name, di.name AS district, q.name AS scene_name, (
SELECT user_name
FROM fw_user
WHERE fw_user_id = u.parent_id
) AS parent_phone, split_part(p.location, ',', 1) AS province
, split_part(p.location, ',', 2) AS city
FROM jd_patient p
LEFT JOIN fw_user u ON u.fw_user_id = p.fw_user_id
LEFT JOIN jd_district di ON p.jd_district_id = di.jd_district_id
LEFT JOIN jd_qrcode q ON p.jd_qrcode_id = q.jd_qrcode_id
WHERE p.jd_patient_id IN (
SELECT p.jd_patient_id
FROM jd_patient p
LEFT JOIN fw_user u ON u.fw_user_id = p.fw_user_id
LEFT JOIN jd_district di ON p.jd_district_id = di.jd_district_id
LEFT JOIN jd_qrcode q ON p.jd_qrcode_id = q.jd_qrcode_id
WHERE 1 = 1
AND p.fw_user_id IS NOT NULL
ORDER BY p.create_date DESC, p.location
LIMIT 10 OFFSET 0
)
ORDER BY p.create_date DESC, p.location
這樣便避免了產(chǎn)生大臨時表。
最終性能提高了兩個數(shù)量級(17975306.40 --> 150933.10):
QUERY PLAN
------------------------------------------------------------------------------------------
--------------------------------
Sort (cost=150933.10..150933.12 rows=10 width=201)
Sort Key: p.create_date, p.location
-> Nested Loop Left Join (cost=150755.44..150932.93 rows=10 width=201)
-> Nested Loop Left Join (cost=150755.15..150845.33 rows=10 width=203)
-> Nested Loop Left Join (cost=150755.01..150843.65 rows=10 width=81)
-> Nested Loop (cost=150754.59..150838.86 rows=10 width=81)
-> HashAggregate (cost=150754.16..150754.26 rows=10 width=16)
Group Key: p_1.jd_patient_id
-> Limit (cost=150754.01..150754.03 rows=10 width=86)
-> Sort (cost=150754.01..155720.78 rows=1986709 w
idth=86)
Sort Key: p_1.create_date, p_1.location
-> Seq Scan on jd_patient p_1 (cost=0.00..1
07821.94 rows=1986709 width=86)
Filter: (fw_user_id IS NOT NULL)
-> Index Scan using pk_jd_patient on jd_patient p (cost=0.43.
.8.45 rows=1 width=97)
Index Cond: (jd_patient_id = p_1.jd_patient_id)
-> Index Scan using pk_fw_user on fw_user u (cost=0.42..0.47 rows=1
width=32)
Index Cond: (fw_user_id = p.fw_user_id)
-> Index Scan using pk_jd_district on jd_district di (cost=0.14..0.16 row
s=1 width=154)
Index Cond: (p.jd_district_id = jd_district_id)
-> Index Scan using pk_jd_qrcode on jd_qrcode q (cost=0.29..0.30 rows=1 width=3
0)
Index Cond: (p.jd_qrcode_id = jd_qrcode_id)
SubPlan 1
-> Index Scan using pk_fw_user on fw_user (cost=0.42..8.44 rows=1 width=12)
Index Cond: (fw_user_id = u.parent_id)
(24 rows)