openGauss Kernel Analysis (2): Execution of a Simple Query

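SQL parsing consists of three main stages:

Lexical analysis: splits the SQL statement entered by the user into a sequence of tokens and identifies keywords, identifiers, constants, and so on.

Syntax analysis: the parser checks whether the token sequence produced by the lexer conforms to the SQL grammar rules.

Semantic analysis: a logical stage of SQL parsing whose main task is to perform context-dependent checks on a statement that is already syntactically correct. At this stage the validity of table names, operators, types, and other elements is checked, and semantic ambiguities are detected.

In openGauss, pg_parse_query calls raw_parser to perform lexical and syntax analysis on the SQL command entered by the user, and the resulting parse trees are appended to the list parsetree_list. Once syntax analysis is complete, parse_analyze is called on each parse tree (parsetree) in parsetree_list to perform semantic analysis. Depending on the type of SQL command, it dispatches to the corresponding entry function and finally produces a query tree.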


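Lexical Analysis

openGauss uses flex for lexical analysis. flex compiles a predefined lexer specification file and generates the lexer code from it. The specification file is scan.l; following the SQL standard, it defines and recognizes the keywords, identifiers, operators, constants, and terminators of the SQL language. kwlist.h defines a large number of keywords in alphabetical order, so that a keyword can be looked up by binary search. When scan.l processes an "identifier", it first matches the word against the keyword list: if the word matches a keyword it is treated as a keyword, and only otherwise as an identifier. In other words, keywords take precedence. The result of lexical analysis for "select a, b from item" is shown below.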

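Name | Token type | Content | Description
Keyword | keyword | SELECT, FROM | Words such as SELECT, FROM, and WHERE; case-insensitive
Identifier | IDENT | a, b, item | User-defined names, constant names, variable names, and procedure names; case-insensitive unless enclosed in double quotes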
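The keyword-first rule described above can be sketched in C as follows. This is a minimal illustration, not the openGauss lexer: the three-entry table stands in for kwlist.h, and the names kKeywords, fold_case, is_keyword, and classify_word are hypothetical helpers introduced only for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token classes; a real lexer returns bison token codes. */
typedef enum { TOK_KEYWORD, TOK_IDENT } TokenClass;

/* Illustrative subset of a keyword table, kept in alphabetical order
 * so that binary search is valid. */
static const char *const kKeywords[] = { "from", "select", "where" };
static const size_t kNumKeywords = sizeof(kKeywords) / sizeof(kKeywords[0]);

/* Lower-case word into buf so that the lookup is case-insensitive. */
static void fold_case(const char *word, char *buf, size_t buflen)
{
    size_t i = 0;

    assert(buflen > 0);
    for (; word[i] != '\0' && i + 1 < buflen; i++)
        buf[i] = (char) tolower((unsigned char) word[i]);
    buf[i] = '\0';
}

/* Binary search over the sorted table; returns 1 if word is a keyword. */
static int is_keyword(const char *word)
{
    char   folded[64];
    size_t lo = 0;
    size_t hi = kNumKeywords;

    if (strlen(word) >= sizeof(folded))
        return 0;               /* longer than any keyword in this sketch */
    fold_case(word, folded, sizeof(folded));

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int    cmp = strcmp(folded, kKeywords[mid]);

        if (cmp == 0)
            return 1;
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return 0;
}

/* Keywords take precedence: a word is an identifier only if the
 * keyword lookup fails. */
static TokenClass classify_word(const char *word)
{
    return is_keyword(word) ? TOK_KEYWORD : TOK_IDENT;
}
```

Under these assumptions, classify_word("SELECT") and classify_word("from") yield TOK_KEYWORD, while classify_word("item") yields TOK_IDENT, matching the table above.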

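Syntax Analysis

openGauss defines gram.y, a grammar file that bison can process. For the different kinds of SQL statement it defines a series of structs that represent a statement (these structs are usually named with the suffix Stmt) and hold the result of syntax analysis. Taking a SELECT query as an example, the corresponding statement struct is as follows.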

typedef struct SelectStmt
{
    NodeTag     type;
    List       *distinctClause; /* NULL, list of DISTINCT ON exprs, or
                                 * lcons(NIL,NIL) for all (SELECT DISTINCT) */
    IntoClause *intoClause;     /* target for SELECT INTO */
    List       *targetList;     /* the target list (of ResTarget) */
    List       *fromClause;     /* the FROM clause */
    Node       *whereClause;    /* WHERE qualification */
    List       *groupClause;    /* GROUP BY clauses */
    Node       *havingClause;   /* HAVING conditional-expression */
    List       *windowClause;   /* WINDOW window_name AS (...), ... */
    WithClause *withClause;     /* WITH clause */
    List       *valuesLists;    /* untransformed list of expression lists */
    List       *sortClause;     /* sort clause (a list of SortBy's) */
    Node       *limitOffset;    /* # of result tuples to skip */
    Node       *limitCount;     /* # of result tuples to return */
    ……
} SelectStmt;

This struct can be viewed as a multiway tree in which each leaf node represents one syntactic construct of the SELECT statement. In gram.y it corresponds to the SelectStmt rule.


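The simple_select grammar rule shows that a simple query consists of the following clauses: distinctClause, which removes duplicate rows; the target attributes targetList; the SELECT INTO clause intoClause; the FROM clause fromClause; the WHERE clause whereClause; the GROUP BY clause groupClause; the HAVING clause havingClause; the window clause windowClause; and the plan_hint clause. When the simple_select rule matches, a statement struct is created and each clause is assigned to the corresponding field. For simple_select, the target attributes, the FROM clause, and the WHERE clause are the most important parts. SelectStmt refers to other node types through these fields: for example, targetList is a list of ResTarget nodes, sortClause is a list of SortBy nodes, and intoClause and withClause point to IntoClause and WithClause structs.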


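The statement "select a, b from item" is used below to illustrate how a simple SELECT is parsed. The function exec_simple_query calls pg_parse_query to perform the parsing, and the resulting parse tree list contains a single element.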


(gdb) p *parsetree_list

$47 = {type = T_List, length = 1, head = 0x7f5ff986c8f0, tail = 0x7f5ff986c8f0}

The node in the list has type T_SelectStmt:

(gdb) p *(Node *)(parsetree_list->head.data->ptr_value)

$45 = {type = T_SelectStmt}

Inspecting the SelectStmt struct shows that targetList and fromClause are non-NULL:

(gdb) set $stmt = (SelectStmt *)(parsetree_list->head.data->ptr_value)

(gdb) p *$stmt

$50 = {type = T_SelectStmt, distinctClause = 0x0, intoClause = 0x0, targetList = 0x7f5ffa43d588, fromClause = 0x7f5ff986c888, startWithClause = 0x0, whereClause = 0x0, groupClause = 0x0, havingClause = 0x0, windowClause = 0x0, withClause = 0x0, valuesLists = 0x0, sortClause = 0x0, limitOffset = 0x0, limitCount = 0x0, lockingClause = 0x0, hintState = 0x0, op = SETOP_NONE, all = false, larg = 0x0, rarg = 0x0, hasPlus = false}

The targetList of the SelectStmt contains two ResTarget nodes:

(gdb) p *($stmt->targetList)

$55 = {type = T_List, length = 2, head = 0x7f5ffa43d540, tail = 0x7f5ffa43d800}

(gdb) p *(Node *)($stmt->targetList->head.data->ptr_value)

$57 = {type = T_ResTarget}

(gdb) set $restarget1 = (ResTarget *)($stmt->targetList->head.data->ptr_value)

(gdb) p *$restarget1

$60 = {type = T_ResTarget, name = 0x0, indirection = 0x0, val = 0x7f5ffa43d378, location = 7}

(gdb) p *$restarget1->val

$63 = {type = T_ColumnRef}

(gdb) p *(ColumnRef *)$restarget1->val

$64 = {type = T_ColumnRef, fields = 0x7f5ffa43d470, prior = false, indnum = 0, location = 7}

(gdb) p *((ColumnRef *)$restarget1->val)->fields

$66 = {type = T_List, length = 1, head = 0x7f5ffa43d428, tail = 0x7f5ffa43d428}

(gdb) p *(Node *)(((ColumnRef *)$restarget1->val)->fields)->head.data->ptr_value

$67 = {type = T_String}

(gdb) p *(Value *)(((ColumnRef *)$restarget1->val)->fields)->head.data->ptr_value

$77 = {type = T_String, val = {ival = 140050197369648, str = 0x7f5ffa43d330 "a"}}

(gdb) set $restarget2 = (ResTarget *)($stmt->targetList->tail.data->ptr_value)

(gdb) p *$restarget2

$89 = {type = T_ResTarget, name = 0x0, indirection = 0x0, val = 0x7f5ffa43d638, location = 10}

(gdb) p *$restarget2->val

$90 = {type = T_ColumnRef}

(gdb) p *(ColumnRef *)$restarget2->val

$91 = {type = T_ColumnRef, fields = 0x7f5ffa43d730, prior = false, indnum = 0, location = 10}

(gdb) p *((ColumnRef *)$restarget2->val)->fields

$92 = {type = T_List, length = 1, head = 0x7f5ffa43d6e8, tail = 0x7f5ffa43d6e8}

(gdb) p *(Node *)(((ColumnRef *)$restarget2->val)->fields)->head.data->ptr_value

$93 = {type = T_String}

(gdb) p *(Value *)(((ColumnRef *)$restarget2->val)->fields)->head.data->ptr_value

$94 = {type = T_String, val = {ival = 140050197370352, str = 0x7f5ffa43d5f0 "b"}}

The fromClause of the SelectStmt contains one RangeVar:

(gdb) p *$stmt->fromClause

$102 = {type = T_List, length = 1, head = 0x7f5ffa43dfe0, tail = 0x7f5ffa43dfe0}

(gdb) set $fromclause = (RangeVar *)($stmt->fromClause->head.data->ptr_value)

(gdb) p *$fromclause

$103 = {type = T_RangeVar, catalogname = 0x0, schemaname = 0x0, relname = 0x7f5ffa43d848 "item", partitionname = 0x0, subpartitionname = 0x0, inhOpt = INH_DEFAULT, relpersistence = 112 'p', alias = 0x0, location = 17, ispartition = false, issubpartition = false, partitionKeyValuesList = 0x0, isbucket = false, buckets = 0x0, length = 0, foreignOid = 0, withVerExpr = false}

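Putting these results together gives the parse tree: a SelectStmt whose targetList holds two ResTarget nodes, each wrapping a ColumnRef (for a and b respectively), and whose fromClause holds a single RangeVar for the relation item.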
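To make the shape of this tree concrete, the sketch below rebuilds it by hand in C. The types ending in Lite are simplified stand-ins invented for this example and are not the openGauss definitions (the real nodes use List and Node pointers); only the column names, the relation name, and the location values 7, 10, and 17 are taken from the gdb output above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ITEMS 8

/* Simplified stand-ins for ColumnRef, ResTarget, and RangeVar. */
typedef struct ColumnRefLite { const char *name; int location; } ColumnRefLite;
typedef struct ResTargetLite { ColumnRefLite val; } ResTargetLite;
typedef struct RangeVarLite  { const char *relname; int location; } RangeVarLite;

/* Simplified stand-in for SelectStmt: fixed arrays replace List pointers. */
typedef struct SelectStmtLite {
    ResTargetLite targetList[MAX_ITEMS];
    size_t        targetCount;
    RangeVarLite  fromClause[MAX_ITEMS];
    size_t        fromCount;
    int           hasWhere;     /* 0 mirrors whereClause = 0x0 */
} SelectStmtLite;

/* Append one target column (a ResTarget wrapping a ColumnRef). */
static void add_target(SelectStmtLite *stmt, const char *col, int location)
{
    assert(stmt->targetCount < MAX_ITEMS);
    stmt->targetList[stmt->targetCount].val.name = col;
    stmt->targetList[stmt->targetCount].val.location = location;
    stmt->targetCount++;
}

/* Append one FROM item (a RangeVar). */
static void add_from(SelectStmtLite *stmt, const char *rel, int location)
{
    assert(stmt->fromCount < MAX_ITEMS);
    stmt->fromClause[stmt->fromCount].relname = rel;
    stmt->fromClause[stmt->fromCount].location = location;
    stmt->fromCount++;
}

/* Build the tree for "select a, b from item". */
static SelectStmtLite build_example_tree(void)
{
    SelectStmtLite stmt;

    memset(&stmt, 0, sizeof(stmt));
    add_target(&stmt, "a", 7);
    add_target(&stmt, "b", 10);
    add_from(&stmt, "item", 17);
    return stmt;
}

/* A location is the byte offset of the token in the query text. */
static int location_matches(const char *query, const char *name, int location)
{
    return strncmp(query + location, name, strlen(name)) == 0;
}
```

The location_matches helper shows how the location fields can be read: offsets 7, 10, and 17 are the positions of a, b, and item in the query string.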


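Semantic Analysis

After lexical and syntax analysis, parse_analyze selects a transformation function according to the type of the parse tree. For a SELECT statement it calls transformSelectStmt, which rewrites the parse tree into a query tree.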


(gdb) p *result

$3 = {type = T_Query, commandType = CMD_SELECT, querySource = QSRC_ORIGINAL, queryId = 0, canSetTag = false, utilityStmt = 0x0, resultRelation = 0, hasAggs = false, hasWindowFuncs = false,

  hasSubLinks = false, hasDistinctOn = false, hasRecursive = false, hasModifyingCTE = false, hasForUpdate = false, hasRowSecurity = false, hasSynonyms = false, cteList = 0x0, rtable = 0x7f5ff5eb8c88,

  jointree = 0x7f5ff5eb9310, targetList = 0x7f5ff5eb9110,…}

(gdb) p *result->targetList

$13 = {type = T_List, length = 2, head = 0x7f5ff5eb90c8, tail = 0x7f5ff5eb92c8}

(gdb) p *(Node *)(result->targetList->head.data->ptr_value)

$8 = {type = T_TargetEntry}

(gdb) p *(TargetEntry*)(result->targetList->head.data->ptr_value)

$9 = {xpr = {type = T_TargetEntry, selec = 0}, expr = 0x7f5ff636ff48, resno = 1, resname = 0x7f5ff5caf330 "a", ressortgroupref = 0, resorigtbl = 24576, resorigcol = 1, resjunk = false}

(gdb) p *(TargetEntry*)(result->targetList->tail.data->ptr_value)

$10 = {xpr = {type = T_TargetEntry, selec = 0}, expr = 0x7f5ff5eb9178, resno = 2, resname = 0x7f5ff5caf5f0 "b", ressortgroupref = 0, resorigtbl = 24576, resorigcol = 2, resjunk = false}

(gdb)

(gdb) p *result->rtable

$14 = {type = T_List, length = 1, head = 0x7f5ff5eb8c40, tail = 0x7f5ff5eb8c40}

(gdb) p *(Node *)(result->rtable->head.data->ptr_value)

$15 = {type = T_RangeTblEntry}

(gdb) p *(RangeTblEntry*)(result->rtable->head.data->ptr_value)

$16 = {type = T_RangeTblEntry, rtekind = RTE_RELATION, relname = 0x7f5ff636efb0 "item", partAttrNum = 0x0, relid = 24576, partitionOid = 0, isContainPartition = false, subpartitionOid = 0……}

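The resulting query tree is a Query node with commandType CMD_SELECT. Its targetList holds two TargetEntry nodes, for a (resno 1) and b (resno 2), both with resorigtbl 24576, and its rtable holds a single RangeTblEntry of kind RTE_RELATION for the relation item (relid 24576).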
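The key difference from the parse tree is that names have been resolved against the catalog: each target column now carries the OID of its source table and its column number. The C sketch below illustrates that resolution step with a toy in-memory catalog. It is an assumption-laden illustration, not the logic of transformSelectStmt: ToyRelation, ToyTargetEntry, lookup_relation, and resolve_column are hypothetical names, and only the values 24576, 1, and 2 come from the gdb output above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int OidLite;   /* stand-in for the Oid type */

/* Toy catalog entry: a relation name, its OID, and its column names. */
typedef struct ToyRelation {
    const char        *relname;
    OidLite            relid;
    const char *const *columns;
    size_t             ncolumns;
} ToyRelation;

/* Reduced version of TargetEntry with only the fields shown in gdb. */
typedef struct ToyTargetEntry {
    int         resno;          /* position in the output list */
    const char *resname;        /* output column name */
    OidLite     resorigtbl;     /* OID of the source table */
    int         resorigcol;     /* column number in the source table */
} ToyTargetEntry;

static const char *const kItemColumns[] = { "a", "b" };
static const ToyRelation kCatalog[] = {
    { "item", 24576, kItemColumns, 2 },
};

/* Find a relation by name; NULL means the table does not exist,
 * which semantic analysis would report as an error. */
static const ToyRelation *lookup_relation(const char *relname)
{
    size_t n = sizeof(kCatalog) / sizeof(kCatalog[0]);

    for (size_t i = 0; i < n; i++)
        if (strcmp(kCatalog[i].relname, relname) == 0)
            return &kCatalog[i];
    return NULL;
}

/* Resolve one column name against rel. Returns 1 and fills *out on
 * success, 0 if the column does not exist in the relation. */
static int resolve_column(const ToyRelation *rel, const char *col,
                          int resno, ToyTargetEntry *out)
{
    assert(rel != NULL && out != NULL);
    for (size_t i = 0; i < rel->ncolumns; i++) {
        if (strcmp(rel->columns[i], col) == 0) {
            out->resno = resno;
            out->resname = col;
            out->resorigtbl = rel->relid;
            out->resorigcol = (int) i + 1;  /* column numbers start at 1 */
            return 1;
        }
    }
    return 0;
}
```

With this toy catalog, resolving b against item fills in resorigtbl = 24576 and resorigcol = 2, as in the TargetEntry printed above, while an unknown column or table is rejected. The sketch compares names exactly and does not model case folding, schemas, or type checking.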


Once lexical, syntax, and semantic analysis are complete, SQL parsing is finished and the SQL engine proceeds to query optimization.
