ClickHouse 21.7.3.14-2(九) 語法優(yōu)化規(guī)則

ClickHouse 的 SQL 優(yōu)化規(guī)則是基于 RBO(Rule Based Optimization),下面是一些優(yōu)化規(guī)則

上傳測(cè)試用例

將官方提供的測(cè)試集 visits_v1.tar 和 hits_v1.tar下載并上傳到虛擬機(jī),解壓到 clickhouse 數(shù)據(jù)路徑下。數(shù)據(jù)下載地址 ,當(dāng)然官方也可以在線的方式,不需要下載就測(cè)試sql語句查詢。在線地址

# 解壓到 clickhouse 數(shù)據(jù)路徑
sudo tar -xvf hits_v1.tar -C /var/lib/clickhouse
sudo tar -xvf visits_v1.tar -C /var/lib/clickhouse
# 修改所屬用戶
sudo chown -R clickhouse:clickhouse /var/lib/clickhouse/data/datasets
sudo chown -R clickhouse:clickhouse /var/lib/clickhouse/metadata/datasets

重啟 clickhouse-server,執(zhí)行查詢

clickhouse-client --query "SELECT COUNT(*) FROM datasets.hits_v1"
clickhouse-client --query "SELECT COUNT(*) FROM datasets.visits_v1"

注意:官方的 tar 包,包含了建庫、建表語句、數(shù)據(jù)內(nèi)容,這種方式不需要手動(dòng)建庫、建表,最方便。hits_v1 表有 130 多個(gè)字段,880 多萬條數(shù)據(jù),visits_v1 表有 180 多個(gè)字段,160 多萬條數(shù)據(jù)。

COUNT 優(yōu)化

在調(diào)用 count 函數(shù)時(shí),如果使用的是 count() 或者 count(*),且沒有 where 條件,則會(huì)直接使用 system.tables 的 total_rows。以下語句可以使用 explain plan select.. 來看變化

#  默認(rèn)會(huì)查詢 count 文件直接拿到數(shù)據(jù)
select count() from hits_v1
# 會(huì)轉(zhuǎn)變成 count()
select count(*) from hits_v1
# 會(huì)轉(zhuǎn)變成 count()
select count(1) from hits_v1
# 只要不是寫具體的字段不會(huì)觸發(fā)沖洗計(jì)算
select count(UserID) from hits_v1

消除子查詢重復(fù)字段或無用字段

# 實(shí)際查詢語句
EXPLAIN SYNTAX SELECT
 a.UserID,
 b.VisitID,
 a.URL,
 b.UserID
 FROM
 hits_v1 AS a
 LEFT JOIN (
 SELECT
 UserID,
 UserID as HaHa,
 VisitID
 FROM visits_v1) AS b
 USING (UserID)
 limit 3;

# 返回優(yōu)化語句
SELECT
 UserID,
 VisitID,
 URL,
 b.UserID
FROM hits_v1 AS a
ALL LEFT JOIN
(
 SELECT
 UserID,
 VisitID
 FROM visits_v1
) AS b USING (UserID)
LIMIT 3

謂詞下推

當(dāng) group by 有 having 子句,但是沒有 with cube、with rollup 或者 with totals 修飾的時(shí)候,having 過濾會(huì)下推到 where 提前過濾。例如下面的查詢,having name 變成了 where name,在 group by 之前過濾。

# 查詢語句
EXPLAIN SYNTAX SELECT UserID FROM hits_v1 GROUP BY UserID HAVING UserID ='8585742290196126178';

# 返回優(yōu)化語句
SELECT UserID FROM hits_v1 WHERE UserID = '8585742290196126178' GROUP BY UserID
# 查詢語句
EXPLAIN SYNTAX SELECT * FROM ( SELECT UserID FROM visits_v1 ) WHERE UserID = '8585742290196126178'

# 返回優(yōu)化后的語句
SELECT UserID FROM ( SELECT UserID FROM visits_v1 WHERE UserID = '8585742290196126178' ) WHERE UserID = '8585742290196126178'
# 查詢語句
EXPLAIN SYNTAX SELECT * FROM ( 
  SELECT * FROM ( SELECT UserID FROM visits_v1)  
    UNION ALL 
  SELECT *   FROM ( SELECT UserID FROM visits_v1) 
) WHERE UserID = '8585742290196126178'

# 優(yōu)化后的
SELECT UserID FROM ( 
  SELECT UserID FROM ( 
    SELECT UserID FROM visits_v1 WHERE UserID = '8585742290196126178' 
  ) WHERE UserID = '8585742290196126178' 
    UNION ALL 
  SELECT UserID FROM ( 
    SELECT UserID FROM visits_v1 WHERE UserID = '8585742290196126178'  
  ) WHERE UserID = '8585742290196126178' 
) WHERE UserID = '8585742290196126178'

聚合計(jì)算外推

聚合函數(shù)內(nèi)的計(jì)算,會(huì)外推

# 查詢語句
EXPLAIN SYNTAX SELECT sum(UserID * 2) FROM visits_v1

# 優(yōu)化后
SELECT sum(UserID) * 2 FROM visits_v1

聚合函數(shù)消除

如果對(duì)聚合鍵,也就是 group by key 使用 min、max、any 聚合函數(shù),則將函數(shù)消除

# 查詢語句
EXPLAIN SYNTAX SELECT sum(UserID * 2), max(VisitID), max(UserID) FROM visits_v1 GROUP BY UserID

# 優(yōu)化后
SELECT sum(UserID) * 2, max(VisitID), UserID FROM visits_v1 GROUP BY UserID

刪除重復(fù)的 order by key

# 查詢語句
EXPLAIN SYNTAX SELECT * FROM visits_v1 ORDER BY  UserID ASC, UserID ASC, VisitID ASC,VisitID ASC

# 優(yōu)化后
select …… FROM visits_v1 ORDER BY UserID ASC,VisitID ASC

刪除重復(fù)的 limit by key

# 查詢語句
EXPLAIN SYNTAX SELECT * FROM visits_v1 LIMIT 3 BY VisitID, VisitID LIMIT 10

# 返回優(yōu)化后的語句:
select …… FROM visits_v1 LIMIT 3 BY VisitID LIMIT 10

刪除重復(fù)的 USING Key

# 查詢語句
EXPLAIN SYNTAX SELECT a.UserID, a.UserID, b.VisitID, a.URL, b.UserID FROM hits_v1 AS a 
LEFT JOIN visits_v1 AS b USING (UserID, UserID)

# 返回優(yōu)化后的語句:
SELECT  UserID, UserID, VisitID, URL, b.UserID FROM hits_v1 AS a ALL LEFT JOIN visits_v1 AS b USING (UserID)

標(biāo)量替換

如果子查詢只返回一行數(shù)據(jù),在被引用的時(shí)候用標(biāo)量替換,例如下面語句中的 total_disk_usage 字段

# 查詢語句
EXPLAIN SYNTAX WITH (  SELECT sum(bytes) FROM system.parts WHERE active ) AS total_disk_usage SELECT
 (sum(bytes) / total_disk_usage) * 100 AS table_disk_usage, 
table FROM system.parts GROUP BY table ORDER BY table_disk_usage DESC LIMIT 10;

# 返回優(yōu)化后的語句
WITH CAST(0, 'UInt64') AS total_disk_usage SELECT  (sum(bytes) / total_disk_usage) * 100 AS table_disk_usage,
table FROM system.parts GROUP BY table ORDER BY table_disk_usage DESC LIMIT 10

三元運(yùn)算優(yōu)化

如果開啟了 optimize_if_chain_to_multiif 參數(shù),三元運(yùn)算符會(huì)被替換成 multiIf 函數(shù)

# 查詢語句
EXPLAIN SYNTAX SELECT number = 1 ? 'hello' : (number = 2 ? 'world' : 'atguigu') FROM numbers(10) 
settings optimize_if_chain_to_multiif = 1;

# 返回優(yōu)化后的語句:
SELECT multiIf(number = 1, 'hello', number = 2, 'world', 'atguigu') FROM numbers(10)
最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容