tpch用法

一、初識(shí)tpch

1.tpch簡(jiǎn)介

TPCH是國(guó)際事務(wù)處理性能委員會(huì)(TPC,Transaction Processing Performance Council)于 1994 年指定的標(biāo)準(zhǔn),是一款面向商品零售業(yè)務(wù)的決策支持系統(tǒng)測(cè)試基準(zhǔn),TPCH包含8張表,22個(gè)查詢,遵循SQL92。
tpc事務(wù)處理性能委員會(huì)( Transaction ProcessingPerformance Council )
tpc-h 模擬決策支持系統(tǒng)中的數(shù)據(jù)庫(kù)操作,測(cè)試數(shù)據(jù)庫(kù)系統(tǒng)復(fù)雜查詢的響應(yīng)時(shí)間,以每小時(shí)執(zhí)行的查詢數(shù)(TPC-H QphH@Siz)作為度量指標(biāo).

tpc-h規(guī)范:
Query: 22條sql
Modify: RF1, RF2
3個(gè)測(cè)試:數(shù)據(jù)裝載、順序執(zhí)行一次query+modify、模擬生產(chǎn)

模擬生產(chǎn):在22個(gè)查詢執(zhí)行的同時(shí),還有一對(duì)更新操作RF1 和RF2 并發(fā)地執(zhí)行,RF1向Order 表和Lineitem 表中插入原行數(shù)的0.1%的新行,模擬新銷(xiāo)售業(yè)務(wù)的數(shù)據(jù)加入到數(shù)據(jù)庫(kù)中;RF2 從Order 表和Lineitem表中刪除等量與RF1 增加的數(shù)據(jù),模擬舊的銷(xiāo)售數(shù)據(jù)被淘汰。

2.tpch關(guān)系圖

TPC-H包括8張表(表上有些約束等需要滿足,參見(jiàn)TPC-H規(guī)范),如下:
PART:表示零件的信息,如表4-1所示。
SUPPLIER:表示供貨商的信息,如表4-2所示。
PARTSUPP:表示供貨商的零件的信息,如表4-3所示。
CUSTOMER:表示消費(fèi)者的信息,如表4-4所示。
ORDERS:表示訂單的信息,如表4-5所示。
LINEITEM:表示在線商品的信息,如表4-6所示。
NATION:表示國(guó)家的信息,如表4-7所示。
REGION:表示地區(qū)的信息,如表4-8所示。
這8張表之間的關(guān)系,如圖4-1所示或圖4-2所示。


image.png

二、安裝TPCH工具

1.安裝工具下載:

1.TPC-H工具下載網(wǎng)站:http://www.tpc.org/tpch/
2.鏈接:https://pan.baidu.com/s/1VuB-reZ_1EiWT8cj_wrGhw 
提取碼:rqgg

2.解壓安裝包

先在windows下解壓得到zip壓縮包,再上傳服務(wù)器。
unzip tpc-h-tool-3.0.0.zip -d  /root

3.編輯makefile.suite文件

cd /root/TPC-H_Tools_v3.0.0/dbgen
vi makefile.suite 
CC      = gcc
DATABASE= POSTGRESQL
MACHINE = LINUX
WORKLOAD = TPCH

4.編輯tpcd.h,追加如下配置

vi tpcd.h
#ifdef POSTGRESQL
#define GEN_QUERY_PLAN  "EXPLAIN PLAN"
#define START_TRAN      "SET TRANSACTION"
#define END_TRAN        "COMMIT;"
#define SET_OUTPUT      ""
#define SET_ROWCOUNT    "LIMIT %d\n"
#define SET_DBASE      ""
#endif

5.執(zhí)行編譯安裝

make -f makefile.suite

6.生成裸數(shù)據(jù)

dbgen參數(shù)詳解
-v 詳細(xì)信息
-s 表示生成G數(shù)據(jù)
-1 SF,Scale Factor ,數(shù)據(jù)庫(kù)的比例因子。TPC-H標(biāo)準(zhǔn)規(guī)定,測(cè)試數(shù)據(jù)庫(kù)的比例因子必須從下列固定值中選擇:1,10,30,100,1000,3000,10000 (相當(dāng)于1GB,10GB,30GB,100GB,1000GB,3000GB,10000GB)。數(shù)據(jù)庫(kù)的大小缺省定義為1(例如:SF=1;近似于1GB)。
-S 切分?jǐn)?shù)據(jù)
-f 覆蓋之前的文件
更多參數(shù)可使用 ./dbgen -help 查看

./dbgen -s xxx ---這里的xxx表示想要生成的數(shù)據(jù)量(單位GB)
注:這里的-s 參數(shù)指定生成測(cè)試數(shù)據(jù)的倉(cāng)庫(kù)數(shù),建議設(shè)置100以上,這里為了方便,選取較小的值,真實(shí)業(yè)務(wù)場(chǎng)景測(cè)試可設(shè)置1000以上。
生成的數(shù)據(jù)文件存放在/root/TPC-H_Tools_v3.0.0/dbgen目錄下,以tbl結(jié)尾,一共8個(gè)文件,對(duì)應(yīng)的是8個(gè)表。

三、列存測(cè)試

初始化數(shù)據(jù)庫(kù):

create database tpch;
--create tablespace tpch location '/data/tpch';
--alter database tpch set tablespace tpch;

1.創(chuàng)建測(cè)試表

CREATE TABLE NATION  ( N_NATIONKEY  INTEGER NOT NULL,
                            N_NAME       CHAR(25) NOT NULL,
                            N_REGIONKEY  INTEGER NOT NULL,
                            N_COMMENT    VARCHAR(152),
                            N_NULL    VARCHAR(10)
                            )with (ORIENTATION = COLUMN);
CREATE TABLE REGION  ( R_REGIONKEY  INTEGER NOT NULL,
                            R_NAME       CHAR(25) NOT NULL,
                            R_COMMENT    VARCHAR(152),
                            R_NULL VARCHAR(10)  
                            )with (ORIENTATION = COLUMN);
CREATE TABLE PART  ( P_PARTKEY     INTEGER NOT NULL,
                          P_NAME        VARCHAR(55) NOT NULL,
                          P_MFGR        CHAR(25) NOT NULL,
                          P_BRAND       CHAR(10) NOT NULL,
                          P_TYPE        VARCHAR(25) NOT NULL,
                          P_SIZE        INTEGER NOT NULL,
                          P_CONTAINER   CHAR(10) NOT NULL,
                          P_RETAILPRICE DECIMAL(15,2) NOT NULL,
                          P_COMMENT     VARCHAR(23) NOT NULL,
                          P_NULL VARCHAR(10) 
                          )with (ORIENTATION = COLUMN);
                        
CREATE TABLE SUPPLIER ( S_SUPPKEY     INTEGER NOT NULL,
                             S_NAME        CHAR(25) NOT NULL,
                             S_ADDRESS     VARCHAR(40) NOT NULL,
                             S_NATIONKEY   INTEGER NOT NULL,
                             S_PHONE       CHAR(15) NOT NULL,
                             S_ACCTBAL     DECIMAL(15,2) NOT NULL,
                             S_COMMENT     VARCHAR(101) NOT NULL,
                             S_NULL VARCHAR(10)         
                             )with (ORIENTATION = COLUMN);
CREATE TABLE PARTSUPP ( PS_PARTKEY     INTEGER NOT NULL,
                             PS_SUPPKEY     INTEGER NOT NULL,
                             PS_AVAILQTY    INTEGER NOT NULL,
                             PS_SUPPLYCOST  DECIMAL(15,2)  NOT NULL,
                             PS_COMMENT     VARCHAR(199) NOT NULL,
                             PS_NULL  VARCHAR(10)   
                             )with (ORIENTATION = COLUMN);
CREATE TABLE CUSTOMER ( C_CUSTKEY     INTEGER NOT NULL,
                             C_NAME        VARCHAR(25) NOT NULL,
                             C_ADDRESS     VARCHAR(40) NOT NULL,
                             C_NATIONKEY   INTEGER NOT NULL,
                             C_PHONE       CHAR(15) NOT NULL,
                             C_ACCTBAL     DECIMAL(15,2)   NOT NULL,
                             C_MKTSEGMENT  CHAR(10) NOT NULL,
                             C_COMMENT     VARCHAR(117) NOT NULL,
                             C_NULL  VARCHAR(10)        
                             )with (ORIENTATION = COLUMN);
CREATE TABLE ORDERS  ( O_ORDERKEY       INTEGER NOT NULL,
                           O_CUSTKEY        INTEGER NOT NULL,
                           O_ORDERSTATUS    CHAR(1) NOT NULL,
                           O_TOTALPRICE     DECIMAL(15,2) NOT NULL,
                           O_ORDERDATE      DATE NOT NULL,
                           O_ORDERPRIORITY  CHAR(15) NOT NULL,  
                           O_CLERK          CHAR(15) NOT NULL, 
                           O_SHIPPRIORITY   INTEGER NOT NULL,
                           O_COMMENT        VARCHAR(79) NOT NULL,
                           O_NULL   VARCHAR(10)       
                           )with (ORIENTATION = COLUMN);
CREATE TABLE LINEITEM ( L_ORDERKEY    INTEGER NOT NULL,
                             L_PARTKEY     INTEGER NOT NULL,
                             L_SUPPKEY     INTEGER NOT NULL,
                             L_LINENUMBER  INTEGER NOT NULL,
                             L_QUANTITY    DECIMAL(15,2) NOT NULL,
                             L_EXTENDEDPRICE  DECIMAL(15,2) NOT NULL,
                             L_DISCOUNT    DECIMAL(15,2) NOT NULL,
                             L_TAX         DECIMAL(15,2) NOT NULL,
                             L_RETURNFLAG  CHAR(1) NOT NULL,
                             L_LINESTATUS  CHAR(1) NOT NULL,
                             L_SHIPDATE    DATE NOT NULL,
                             L_COMMITDATE  DATE NOT NULL,
                             L_RECEIPTDATE DATE NOT NULL,
                             L_SHIPINSTRUCT CHAR(25) NOT NULL,
                             L_SHIPMODE     CHAR(10) NOT NULL,
                             L_COMMENT      VARCHAR(44) NOT NULL,
                             L_NULL  VARCHAR(10)  
                             )with (ORIENTATION = COLUMN);

2.創(chuàng)建約束

ALTER TABLE LINEITEM ADD PARTIAL CLUSTER KEY (L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE);
ALTER TABLE ORDERS ADD PARTIAL CLUSTER KEY (O_ORDERDATE);

3.導(dǎo)入數(shù)據(jù)前,上面生成文件傳至數(shù)據(jù)庫(kù)服務(wù)器上

從tpch安裝服務(wù)器把文件傳至數(shù)據(jù)庫(kù)服務(wù)器上
scp *.tbl root@10.0.0.151:/tmp    
給文件授權(quán)
chmod -R 777 /tmp/*.tbl

4.導(dǎo)入數(shù)據(jù)

copy customer from '/tmp/customer.tbl' DELIMITERS '|';
copy lineitem from '/tmp/lineitem.tbl' DELIMITERS '|';
copy nation from '/tmp/nation.tbl' DELIMITERS '|';
copy orders from '/tmp/orders.tbl' DELIMITERS '|';
copy partsupp from '/tmp/partsupp.tbl' DELIMITERS '|';
copy part from '/tmp/part.tbl' DELIMITERS '|';
copy region from '/tmp/region.tbl' DELIMITERS '|';
copy supplier from '/tmp/supplier.tbl' DELIMITERS '|';
vacuum ANALYZE ;

四、執(zhí)行22條測(cè)試sql

Q1:定價(jià)匯總報(bào)表查詢
定價(jià)匯總報(bào)表查詢是分組、排序、聚集操作并存的單表查詢操作,報(bào)告已開(kāi)票、發(fā)貨和退回的業(yè)務(wù)量; 為給定日期發(fā)貨的所有訂單項(xiàng)提供匯總定價(jià)報(bào)告,按l_returnflag和l_linestatus分組并排序,列出了擴(kuò)展價(jià)格、折扣擴(kuò)展價(jià)格、折扣擴(kuò)展價(jià)格加稅、平均數(shù)量、平均擴(kuò)展價(jià)格、平均折扣、各個(gè)分組的總行數(shù)。

Q1語(yǔ)句的特點(diǎn)是:帶有分組、排序、聚集操作并存的單表查詢操作。這個(gè)查詢會(huì)導(dǎo)致表上的數(shù)據(jù)有95%到97%行被讀取到。

--語(yǔ)句詳解

select 
    l_returnflag, //返回標(biāo)志
    l_linestatus, 
    sum(l_quantity) as sum_qty, //總的數(shù)量
    sum(l_extendedprice) as sum_base_price, //聚集函數(shù)操作
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, 
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, 
    avg(l_quantity) as avg_qty, 
    avg(l_extendedprice) as avg_price, 
    avg(l_discount) as avg_disc, 
    count(*) as count_order //每個(gè)分組所包含的行數(shù)
from 
    lineitem
where 
    l_shipdate <= date'1998-12-01' - interval '90' day //時(shí)間段是隨機(jī)生成的
group by //分組操作
    l_returnflag, 
    l_linestatus
order by //排序操作
    l_returnflag, 
    l_linestatus;

示例sql:

select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    lineitem
where
    l_shipdate <= date '1998-12-01' - interval '103' day
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus
LIMIT 1;

Q2:最低成本供應(yīng)商查詢
最低成本供應(yīng)商查詢是帶有排序、聚集操作、子查詢并存的多表查詢操作,查找應(yīng)該選擇那個(gè)供應(yīng)商來(lái)為給定地區(qū)的給定零件下訂單;在給定的區(qū)域中,針對(duì)特定類(lèi)型和尺寸的每個(gè)零件,找到能夠以最低成本供應(yīng)的供應(yīng)商。如果該地區(qū)的多個(gè)供應(yīng)商以相同的(最低)成本供貨。查詢列出了供應(yīng)商的賬戶余額、名稱(chēng)和國(guó)家,零部件編號(hào)、制造商,供應(yīng)商的地址、電話號(hào)碼和評(píng)論信息。

Q2語(yǔ)句的特點(diǎn)是:帶有排序、聚集操作、子查詢并存的多表查詢操作。查詢語(yǔ)句沒(méi)有從語(yǔ)法上限制返回多少條元組,但是TPC-H標(biāo)準(zhǔn)規(guī)定,查詢結(jié)果只返回前100行(通常依賴(lài)于應(yīng)用程序?qū)崿F(xiàn))。

--語(yǔ)句詳解

select
    s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment 
  /*查詢供應(yīng)者的帳戶余額、名字、國(guó)家、零件的號(hào)碼、生產(chǎn)者、供應(yīng)者的地址、電話號(hào)碼、備注信息 */
from
    part, supplier, partsupp, nation, region //五表連接
where
    p_partkey = ps_partkey
    and s_suppkey = ps_suppkey
    and p_size = [SIZE] //指定大小,在區(qū)間[1, 50]內(nèi)隨機(jī)選擇
    and p_type like '%[TYPE]' //指定類(lèi)型,在TPC-H標(biāo)準(zhǔn)指定的范圍內(nèi)隨機(jī)選擇
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = '[REGION]' //指定地區(qū),在TPC-H標(biāo)準(zhǔn)指定的范圍內(nèi)隨機(jī)選擇
    and ps_supplycost = ( //子查詢
        select
            min(ps_supplycost) //聚集函數(shù)
        from
            partsupp, supplier, nation, region //與父查詢的表有重疊
        where
            p_partkey = ps_partkey
            and s_suppkey = ps_suppkey
            and s_nationkey = n_nationkey
            and n_regionkey = r_regionkey
            and r_name = '[REGION]'
    )
order by //排序
    s_acctbal desc,
    n_name,
    s_name,
    p_partkey;

示例sql:

select
    s_acctbal,
    s_name,
    n_name,
    p_partkey,
    p_mfgr,
    s_address,
    s_phone,
    s_comment
from
    part,
    supplier,
    partsupp,
    nation,
    region,
    (
        select
            ps_partkey,
            min(ps_supplycost) as min
        from
            partsupp,
            supplier,
            nation,
            region
        where
            s_suppkey = ps_suppkey
            and s_nationkey = n_nationkey
            and n_regionkey = r_regionkey
            and r_name = 'EUROPE'
        group by
            ps_partkey
    ) temp
where
    p_partkey = partsupp.ps_partkey
    and s_suppkey = ps_suppkey
    and p_size = 47
    and p_type like '%BRASS'
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'EUROPE'
    and ps_supplycost = temp.min
    and p_partkey = temp.ps_partkey
order by
    s_acctbal desc,
    n_name,
    s_name,
    p_partkey
LIMIT 100;

Q3:運(yùn)輸優(yōu)先級(jí)查詢
運(yùn)輸優(yōu)先級(jí)查詢是帶有分組、排序、聚集操作并存的三表查詢操作,檢索價(jià)值最高的未發(fā)貨訂單;在給定日期尚未發(fā)貨的訂單中收入最大的訂單的運(yùn)輸優(yōu)先級(jí)和潛在收入(l_extendedprice * (1-l_discount)的總和),訂單按照收入的降序排序。

Q3語(yǔ)句的特點(diǎn)是:帶有分組、排序、聚集操作并存的三表查詢操作。查詢語(yǔ)句沒(méi)有從語(yǔ)法上限制返回多少條元組,但是TPC-H標(biāo)準(zhǔn)規(guī)定,查詢結(jié)果只返回前10行(通常依賴(lài)于應(yīng)用程序?qū)崿F(xiàn))。

--語(yǔ)句詳解

select
    l_orderkey,
    sum(l_extendedprice*(1-l_discount)) as revenue, //潛在的收入,聚集操作
    o_orderdate,
    o_shippriority
from
    customer, orders, lineitem //三表連接
where
    c_mktsegment = '[SEGMENT]' //在TPC-H標(biāo)準(zhǔn)指定的范圍內(nèi)隨機(jī)選擇
    and c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate < date '[DATE]' //指定日期段,在在[1995-03-01, 1995-03-31]中隨機(jī)選擇
    and l_shipdate > date '[DATE]'
group by //分組操作
    l_orderkey, //訂單標(biāo)識(shí)
    o_orderdate, //訂單日期
    o_shippriority //運(yùn)輸優(yōu)先級(jí)
order by //排序操作
    revenue desc, //降序排序,把潛在最大收入列在前面
    o_orderdate;

示例sql:

select
    l_orderkey,
    sum(l_extendedprice * (1 - l_discount)) as revenue,
    o_orderdate,
    o_shippriority
from
    customer,
    orders,
    lineitem
where
    c_mktsegment = 'BUILDING'
    and c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate < date '1995-03-20'
    and l_shipdate > date '1995-03-20'
group by
    l_orderkey,
    o_orderdate,
    o_shippriority
order by
    revenue desc,
    o_orderdate
LIMIT 10;

Q4:訂單優(yōu)先級(jí)查詢
訂單優(yōu)先級(jí)查詢是帶有分組、排序、聚集操作、子查詢并存的單表查詢操作,確定訂單優(yōu)先系統(tǒng)的工作情況并評(píng)估客戶滿意度;在給定季度中的訂購(gòu)數(shù)量,其中客戶在承諾日志之后收到至少一個(gè)訂單項(xiàng),列出了按優(yōu)先級(jí)升序排序的每個(gè)訂單優(yōu)先級(jí)的此類(lèi)訂單的計(jì)數(shù)。

Q4語(yǔ)句的特點(diǎn)是:帶有分組、排序、聚集操作、子查詢并存的單表查詢操作。子查詢是相關(guān)子查詢。

--語(yǔ)句詳解

select
    o_orderpriority, //訂單優(yōu)先級(jí)
    count(*) as order_count //訂單優(yōu)先級(jí)計(jì)數(shù)
from orders //單表查詢
where
    o_orderdate >= date '[DATE]'
    and o_orderdate < date '[DATE]' + interval '3' month //指定訂單的時(shí)間段+三個(gè)月,DATE是在1993年1月和1997年10月之間隨機(jī)選擇的一個(gè)月的第一天
    and exists ( //子查詢
        select
            *
        from
            lineitem
        where
            l_orderkey = o_orderkey
            and l_commitdate < l_receiptdate
    )
group by //按訂單優(yōu)先級(jí)分組
    o_orderpriority
order by //按訂單優(yōu)先級(jí)排序
    o_orderpriority;

示例sql:

select
    o_orderpriority,
    count(*) as order_count
from
    orders
where
    o_orderdate >= date '1995-12-01'
    and o_orderdate < date '1995-12-01' + interval '3' month
    and exists 
    (
        select
            *
        from
            lineitem
        where
            l_orderkey = o_orderkey
            and l_commitdate < l_receiptdate 
        limit 1
    )
group by
    o_orderpriority
order by
    o_orderpriority
LIMIT 1;

Q5:本地供應(yīng)商收入量查詢
本地供應(yīng)商收入量查詢是帶有分組、排序、聚集操作、子查詢并存的多表連接查詢操作,列出了通過(guò)本地供應(yīng)商完成的收入量;為一個(gè)地區(qū)的每個(gè)國(guó)家列出了由訂單項(xiàng)交易產(chǎn)生的收入量,以確定是否在給定區(qū)域建立本地配送中心。

Q5語(yǔ)句的特點(diǎn)是:帶有分組、排序、聚集操作、子查詢并存的多表連接查詢操作。

--語(yǔ)句詳解

select
    n_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue //聚集操作
from
    customer,orders,lineitem,supplier,nation,region //六表連接
where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and l_suppkey = s_suppkey
    and c_nationkey = s_nationkey
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = '[REGION]' //指定地區(qū),在TPC-H標(biāo)準(zhǔn)指定的范圍內(nèi)隨機(jī)選擇
    and o_orderdate >= date '[DATE]' //DATE是從1993年到1997年中隨機(jī)選擇的一年的1月1日
    and o_orderdate < date '[DATE]' + interval '1' year
group by //按名字分組
    n_name
order by //按收入降序排序,注意分組和排序子句不同
    revenue desc;

示例sql:

select
    n_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue
from
    customer,
    orders,
    lineitem,
    supplier,
    nation,
    region
where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and l_suppkey = s_suppkey
    and c_nationkey = s_nationkey
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'AMERICA'
    and o_orderdate >= date '1997-01-01'
    and o_orderdate < date '1997-01-01' + interval '1' year
group by
    n_name
order by
    revenue desc
LIMIT 1;

Q6:預(yù)測(cè)收入變化查詢
預(yù)測(cè)收入變化查詢是帶有聚集操作的單表查詢操作,量化了在給定年份中給定百分比范圍內(nèi)消除某些公司范圍內(nèi)的折扣所導(dǎo)致的收入增長(zhǎng)量,用于尋找增加收入的方法;在給定的年份發(fā)貨的所有訂單項(xiàng),折扣介于“DISCOUNT-0.01”和“DISCOUNT+0.01”之間,如果l_quantity小于quantity的訂單的折扣被消除后總收入增加的數(shù)量。

Q6語(yǔ)句的特點(diǎn)是:帶有聚集操作的單表查詢操作。查詢語(yǔ)句使用了BETWEEN-AND操作符,有的數(shù)據(jù)庫(kù)可以對(duì)BETWEEN-AND進(jìn)行優(yōu)化。

--語(yǔ)句詳解

select
    sum(l_extendedprice*l_discount) as revenue //潛在的收入增加量
from
    lineitem //單表查詢
where
    l_shipdate >= date '[DATE]' //DATE是從[1993, 1997]中隨機(jī)選擇的一年的1月1日
    and l_shipdate < date '[DATE]' + interval '1' year //一年內(nèi)
    and l_discount between [DISCOUNT] - 0.01 and [DISCOUNT] + 0.01 //between
    and l_quantity < [QUANTITY]; // QUANTITY在區(qū)間[24, 25]中隨機(jī)選擇

示例sql:

select
    sum(l_extendedprice * l_discount) as revenue
from
    lineitem
where
    l_shipdate >= date '1997-01-01'
    and l_shipdate < date '1997-01-01' + interval '1' year
    and l_discount between 0.07 - 0.01 and 0.07 + 0.01
    and l_quantity < 24
LIMIT 1;

Q7:批量出貨查詢
批量出貨查詢是帶有分組、排序、聚集、子查詢操作并存的多表查詢操作,確定在某些國(guó)家直接運(yùn)輸?shù)呢浳锏膬r(jià)值以幫助重新談判運(yùn)輸合同;對(duì)于兩個(gè)給定的國(guó)家,在1995年到1996年期間,從其中一個(gè)國(guó)家的供應(yīng)商想另一個(gè)國(guó)家的客戶運(yùn)送零件的訂單項(xiàng)獲得的總折扣收入。

Q7語(yǔ)句的特點(diǎn)是:帶有分組、排序、聚集、子查詢操作并存的多表查詢操作。子查詢的父層查詢不存在其他查詢對(duì)象,是格式相對(duì)簡(jiǎn)單的子查詢。

--語(yǔ)句詳解

select
    supp_nation, //供貨商國(guó)家
    cust_nation, //顧客國(guó)家
    l_year, sum(volume) as revenue //年度、年度的貨運(yùn)收入
from ( //子查詢
    select
        n1.n_name as supp_nation,
        n2.n_name as cust_nation,
        extract(year from l_shipdate) as l_year,
        l_extendedprice * (1 - l_discount) as volume
    from
        supplier,lineitem,orders,customer,nation n1,nation n2 //六表連接
    where
        s_suppkey = l_suppkey
        and o_orderkey = l_orderkey
        and c_custkey = o_custkey
        and s_nationkey = n1.n_nationkey
        and c_nationkey = n2.n_nationkey
        and ( // NATION2和NATION1的值不同,表示查詢的是跨國(guó)的貨運(yùn)情況
            (n1.n_name = '[NATION1]' and n2.n_name = '[NATION2]')
            or (n1.n_name = '[NATION2]' and n2.n_name = '[NATION1]')
        )
        and l_shipdate between date '1995-01-01' and date '1996-12-31'
    ) as shipping
group by
    supp_nation,
    cust_nation,
    l_year
order by
    supp_nation,
    cust_nation,
    l_year;

示例sql:

select
    supp_nation,
    cust_nation,
    l_year,
    sum(volume) as revenue
from
(
    select
        n1.n_name as supp_nation,
        n2.n_name as cust_nation,
        extract(year from l_shipdate) as l_year,
        l_extendedprice * (1 - l_discount) as volume
    from
        supplier,
        lineitem,
        orders,
        customer,
        nation n1,
        nation n2
    where
        s_suppkey = l_suppkey
        and o_orderkey = l_orderkey
        and c_custkey = o_custkey
        and s_nationkey = n1.n_nationkey
        and c_nationkey = n2.n_nationkey
        and 
        (
            (n1.n_name = 'CHINA' and n2.n_name = 'INDONESIA')
            or (n1.n_name = 'INDONESIA' and n2.n_name = 'CHINA')
        )
        and l_shipdate between date '1995-01-01' and date '1996-12-31'
) as shipping
group by
    supp_nation,
    cust_nation,
    l_year
order by
    supp_nation,
    cust_nation,
    l_year
LIMIT 1;

Q8:全國(guó)市場(chǎng)份額查詢
全國(guó)市場(chǎng)份額查詢帶有分組、排序、聚集、子查詢操作并存的查詢操作,確定給定區(qū)域內(nèi)給定國(guó)家的市場(chǎng)份額在兩年內(nèi)對(duì)于給定零件類(lèi)型的變化情況; 給定區(qū)域內(nèi)給定國(guó)家的市場(chǎng)份額被定義為收入的一部分,即 l_extendedprice * (1-l_discount)的總和。

Q8語(yǔ)句的特點(diǎn)是:帶有分組、排序、聚集、子查詢操作并存的查詢操作。子查詢的父層查詢不存在其他查詢對(duì)象,是格式相對(duì)簡(jiǎn)單的子查詢,但子查詢自身是多表連接的查詢。

--語(yǔ)句詳解:

select
    o_year, //年份
    sum(case
    when nation = '[NATION]'//指定國(guó)家,在TPC-H標(biāo)準(zhǔn)指定的范圍內(nèi)隨機(jī)選擇
    then volume
    else 0
    end) / sum(volume) as mkt_share //市場(chǎng)份額:特定種類(lèi)的產(chǎn)品收入的百分比;聚集操作
from ( //子查詢
    select
        extract(year from o_orderdate) as o_year, //分解出年份
        l_extendedprice * (1-l_discount) as volume, //特定種類(lèi)的產(chǎn)品收入
        n2.n_name as nation
    from
        part,supplier,lineitem,orders,customer,nation n1,nation n2,region //八表連接
    where
        p_partkey = l_partkey
        and s_suppkey = l_suppkey
        and l_orderkey = o_orderkey
        and o_custkey = c_custkey
        and c_nationkey = n1.n_nationkey
        and n1.n_regionkey = r_regionkey
        and r_name = '[REGION]' //指定地區(qū),在TPC-H標(biāo)準(zhǔn)指定的范圍內(nèi)隨機(jī)選擇
        and s_nationkey = n2.n_nationkey
        and o_orderdate between date '1995-01-01' and date '1996-12-31' //只查95、96年的情況
        and p_type = '[TYPE]' //指定零件類(lèi)型,在TPC-H標(biāo)準(zhǔn)指定的范圍內(nèi)隨機(jī)選擇
) as all_nations
group by //按年分組
    o_year
order by //按年排序
    o_year;

示例sql:

select
    o_year,
    sum(case
    when nation = 'INDONESIA' then volume
    else 0
    end)   / sum(volume) as mkt_share
from
(
    select
        extract(year from o_orderdate) as o_year,
        l_extendedprice * (1 - l_discount) as volume,
        n2.n_name as nation
        from
        part,
        supplier,
        lineitem,
        orders,
        customer,
        nation n1,
        nation n2,
        region
    where
        p_partkey = l_partkey
        and s_suppkey = l_suppkey
        and l_orderkey = o_orderkey
        and o_custkey = c_custkey
        and c_nationkey = n1.n_nationkey
        and n1.n_regionkey = r_regionkey
        and r_name = 'ASIA'
        and s_nationkey = n2.n_nationkey
        and o_orderdate between date '1995-01-01' and date '1996-12-31'
        and p_type = 'PROMO POLISHED TIN'
) as all_nations
group by
    o_year
order by
    o_year
LIMIT 1;

Q9:產(chǎn)品類(lèi)型利潤(rùn)度量查詢
產(chǎn)品類(lèi)型利潤(rùn)度量查詢是帶有分組、排序、聚集、子查詢操作并存的查詢操作,確定按供應(yīng)商國(guó)家和年份細(xì)分的給定零件系列的利潤(rùn)。

Q9語(yǔ)句的特點(diǎn)是:帶有分組、排序、聚集、子查詢操作并存的查詢操作。子查詢的父層查詢不存在其他查詢對(duì)象,是格式相對(duì)簡(jiǎn)單的子查詢,但子查詢自身是多表連接的查詢。子查詢中使用了LIKE操作符,有的查詢優(yōu)化器不支持對(duì)LIKE操作符進(jìn)行優(yōu)化。

--語(yǔ)句詳解

select
    nation,
    o_year,
    sum(amount) as sum_profit //每個(gè)國(guó)家每一年所有被定購(gòu)的零件在一年中的總利潤(rùn)
from(
  select
        n_name as nation, //國(guó)家
        extract(year from o_orderdate) as o_year, //取出年份
        l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount //利潤(rùn)
    from
        part,supplier,lineitem,partsupp,orders,nation //六表連接
    where
        s_suppkey = l_suppkey
        and ps_suppkey = l_suppkey
        and ps_partkey = l_partkey
        and p_partkey = l_partkey
        and o_orderkey = l_orderkey
        and s_nationkey = n_nationkey
        and p_name like '%[COLOR]%' //LIKE操作,查詢優(yōu)化器可能進(jìn)行優(yōu)化
) as profit
group by //按國(guó)家和年份分組
    nation,
    o_year
order by //按國(guó)家和年份排序,年份大者靠前
    nation,
    o_year desc;

示例sql:

select
    nation,
    o_year,
    sum(amount) as sum_profit
from
(
    select
        n_name as nation,
        extract(year from o_orderdate) as o_year,
        l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
    from
        part,
        supplier,
        lineitem,
        partsupp,
        orders,
        nation
    where
        s_suppkey = l_suppkey
        and ps_suppkey = l_suppkey
        and ps_partkey = l_partkey
        and p_partkey = l_partkey
        and o_orderkey = l_orderkey
        and s_nationkey = n_nationkey
        and p_name like '%thistle%'
) as profit
group by
    nation,
    o_year
order by
    nation,
    o_year desc
LIMIT 1;

Q10:退貨報(bào)告查詢
退貨報(bào)告查詢是帶有分組、排序、聚集操作并存的多表連接查詢操作,識(shí)別可能對(duì)運(yùn)送給他們的零件有問(wèn)題的客戶;根據(jù)對(duì)給定季度收入損失的影響,找出排名靠前的已退回零件的客戶。

Q10語(yǔ)句的特點(diǎn)是:帶有分組、排序、聚集操作并存的多表連接查詢操作。查詢語(yǔ)句沒(méi)有從語(yǔ)法上限制返回多少條元組,但是TPC-H標(biāo)準(zhǔn)規(guī)定,查詢結(jié)果只返回前10行(通常依賴(lài)于應(yīng)用程序?qū)崿F(xiàn))。

--語(yǔ)句詳解:

select
    c_custkey, c_name, //客戶信息
    sum(l_extendedprice * (1 - l_discount)) as revenue, //收入損失
    c_acctbal,
    n_name, c_address, c_phone, c_comment //國(guó)家、地址、電話、意見(jiàn)信息等
from
    customer, orders, lineitem, nation
where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate >= date '[DATE]' // DATE是位于1993年一月到1994年十二月中任一月的一號(hào)
    and o_orderdate < date '[DATE]' + interval '3' month //3個(gè)月內(nèi)
    and l_returnflag = 'R' //貨物被回退
    and c_nationkey = n_nationkey
group by
    c_custkey,
    c_name,
    c_acctbal,
    c_phone,
    n_name,
    c_address,
    c_comment
order by
    revenue desc;

示例sql:

select
    c_custkey,
    c_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue,
    c_acctbal,
    n_name,
    c_address,
    c_phone,
    c_comment
from
    customer,
    orders,
    lineitem,
    nation
where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate >= date '1994-06-01'
    and o_orderdate < date '1994-06-01' + interval '3' month
    and l_returnflag = 'R'
    and c_nationkey = n_nationkey
group by
    c_custkey,
    c_name,
    c_acctbal,
    c_phone,
    n_name,
    c_address,
    c_comment
order by
    revenue desc
LIMIT 20;

Q11:庫(kù)存價(jià)值查詢
庫(kù)存價(jià)值查詢是帶有分組、排序、聚集、子查詢操作并存的多表連接查詢操作,查找給定國(guó)家/地區(qū)供應(yīng)商的庫(kù)存價(jià)值; 掃描給定國(guó)家的供應(yīng)商的可用庫(kù)存來(lái)查找占有所有可用零件總價(jià)值的重要百分比的所有零件。

Q11語(yǔ)句的特點(diǎn)是:帶有分組、排序、聚集、子查詢操作并存的多表連接查詢操作。子查詢位于分組操作的HAVING條件中。

--語(yǔ)句詳解

select
    ps_partkey,
    sum(ps_supplycost * ps_availqty) as value //聚集操作,商品的總價(jià)值
from
    partsupp, supplier, nation
where
    ps_suppkey = s_suppkey
    and s_nationkey = n_nationkey
    and n_name = '[NATION]'
group by
    ps_partkey 
having //帶有HAVING子句的分組操作
    sum(ps_supplycost * ps_availqty) > ( //HAVING子句中包括有子查詢
        select
            sum(ps_supplycost * ps_availqty) * [FRACTION] //子查詢中存在聚集操作;FRACTION為0.0001/SF1
        from
            partsupp, supplier, nation //與父查詢的表連接一致
        where //與父查詢的WHEWR條件一致
            ps_suppkey = s_suppkey
            and s_nationkey = n_nationkey
            and n_name = '[NATION]' //指定國(guó)家
    )
order by //按商品的價(jià)值降序排序
    value desc;

示例sql:

select
    ps_partkey,
    sum(ps_supplycost * ps_availqty) as value
from
    partsupp,
    supplier,
    nation
where
    ps_suppkey = s_suppkey
    and s_nationkey = n_nationkey
    and n_name = 'ALGERIA'
group by
    ps_partkey having
    sum(ps_supplycost * ps_availqty) > 
    (
        select
            sum(ps_supplycost * ps_availqty) * 0.0001000000
        from
            partsupp,
            supplier,
            nation
        where
            ps_suppkey = s_suppkey
            and s_nationkey = n_nationkey
            and n_name = 'ALGERIA'
    )
order by
    value desc
LIMIT 1;

Q12:運(yùn)送方式和訂單優(yōu)先級(jí)查詢
運(yùn)送方式和訂單優(yōu)先級(jí)查詢是帶有分組、排序、聚集操作并存的兩表連接查詢操作,確定選擇較便宜的運(yùn)送方式是否會(huì)對(duì)關(guān)鍵優(yōu)先級(jí)訂單產(chǎn)生負(fù)面影響,導(dǎo)致消費(fèi)者更多的在合同日期之后收到貨物。

Q12語(yǔ)句的特點(diǎn)是:帶有分組、排序、聚集操作并存的兩表連接查詢操作。

--語(yǔ)句詳解

select
    l_shipmode,
    sum(//聚集操作
        case when 
      o_orderpriority ='1-URGENT' //OR運(yùn)算,二者滿足其一即可,選出URGENT或HIGH的
            or o_orderpriority ='2-HIGH'
        then 1
    else 0
        end) as high_line_count,
    sum(
    case when 
        o_orderpriority <> '1-URGENT' //AND運(yùn)算,二者都不滿足,非URGENT非HIGH的
            and o_orderpriority <> '2-HIGH'
        then 1
        else 0
        end) as low_line_count
from
    orders,lineitem
where
    o_orderkey = l_orderkey
    and l_shipmode in ('[SHIPMODE1]', '[SHIPMODE2]') 
  /* 指定貨運(yùn)模式的類(lèi)型,在TPC-H標(biāo)準(zhǔn)指定的范圍內(nèi)隨機(jī)選擇,SHIPMODE2必須有別于SHIPMODE1 */
    and l_commitdate < l_receiptdate
    and l_shipdate < l_commitdate
    and l_receiptdate >= date '[DATE]' //從1993年到1997年中任一年的一月一號(hào)
    and l_receiptdate < date '[DATE]' + interval '1' year //1年之內(nèi)
group by //分組操作
    l_shipmode
order by //排序操作
    l_shipmode;

示例sql:

select
    l_shipmode,
    sum(case when o_orderpriority = '1-URGENT' or o_orderpriority = '2-HIGH' then 1 else 0 end) as high_line_count,
    sum(case when o_orderpriority <> '1-URGENT' and o_orderpriority <> '2-HIGH' then 1 else 0 end) as low_line_count
from
    orders,
    lineitem
where
    o_orderkey = l_orderkey
    and l_shipmode in ('RAIL', 'SHIP')
    and l_commitdate < l_receiptdate
    and l_shipdate < l_commitdate
    and l_receiptdate >= date '1994-01-01'
    and l_receiptdate < date '1994-01-01' + interval '1' year
group by
    l_shipmode
order by
    l_shipmode
LIMIT 1;

Q13:客戶分布查詢
客戶分布查詢是帶有分組、排序、聚集、子查詢、左外連接操作并存的查詢操作;根究客戶的訂單數(shù)量確定客戶的分布,計(jì)算并報(bào)告有多少客戶沒(méi)有訂單,有多個(gè)客戶有1/2/3 等個(gè)訂單。

Q13語(yǔ)句的特點(diǎn)是:帶有分組、排序、聚集、子查詢、左外連接操作并存的查詢操作。

--語(yǔ)句詳解

select
    c_count, count(*) as custdist //聚集操作,統(tǒng)計(jì)每個(gè)組的個(gè)數(shù)
from (//子查詢
    select
        c_custkey,
        count(o_orderkey)
    from
        customer left outer join orders 
  on //子查詢中包括左外連接操作
        c_custkey = o_custkey
        and o_comment not like ‘%[WORD1]%[WORD2]%’ //LIKE操作
            //WORD1 為以下四個(gè)可能值中任意一個(gè):special、pending、unusual、express
            //WORD2 為以下四個(gè)可能值中任意一個(gè):packages、requests、accounts、deposits
    group by //子查詢中的分組操作
        c_custkey
) as c_orders (c_custkey, c_count)
group by //分組操作
    c_count
order by //排序操作
    custdist desc, //從大到小降序排序
    c_count desc;
TPC-H標(biāo)準(zhǔn)定義了Q13語(yǔ)句等價(jià)的變形SQL,與上述查詢語(yǔ)句格式上不相同,上述語(yǔ)句使用子查詢作為查詢的對(duì)象,變形的SQL把子查詢部分變?yōu)橐晥D,然后基于視圖做查詢,這種做法的意義在于有些數(shù)據(jù)庫(kù)不支持如上語(yǔ)法,但存在等價(jià)的其他語(yǔ)法,如MySQL就不支持如上語(yǔ)法,需要使用如下等價(jià)形式。

create view orders_per_cust:s (custkey, ordercount) as //創(chuàng)建視圖,相當(dāng)與標(biāo)準(zhǔn)Q13的子查詢內(nèi)容
select
c_custkey,
count(o_orderkey)
from
customer left outer join orders on
c_custkey = o_custkey
and o_comment not like '%:1%:2%'
group by
c_custkey;

select
ordercount,
count(*) as custdist
from
orders_per_cust:s //對(duì)視圖進(jìn)行查詢
group by
ordercount
order by
custdist desc,
ordercount desc;
drop view orders_per_cust:s;

示例sql:

select
    c_count,
    count(*) as custdist
from
(
    select
        c_custkey,
        count(o_orderkey)
    from
        customer left outer join orders on
        c_custkey = o_custkey
        and o_comment not like '%express%packages%'
    group by
        c_custkey
) as c_orders (c_custkey, c_count)
group by
    c_count
order by
    custdist desc,
    c_count desc
LIMIT 1;

Q14:促銷(xiāo)效果查詢
促銷(xiāo)效果查詢是帶有分組、排序、聚集、子查詢、左外連接操作并存的查詢操作,監(jiān)控市場(chǎng)對(duì)促銷(xiāo)的反應(yīng);確定在給定年份和月份的收入中有多少百分比來(lái)之與促銷(xiāo)部分。

Q14語(yǔ)句的特點(diǎn)是:帶有分組、排序、聚集、子查詢、左外連接操作并存的查詢操作。

語(yǔ)句詳解:

select
    100.00 * sum(
    case when p_type like 'PROMO%' //促銷(xiāo)零件
        then l_extendedprice*(1-l_discount) //某一特定時(shí)間的收入
        else 0
        end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
    lineitem, part
where
    l_partkey = p_partkey
    and l_shipdate >= date '[DATE]' // DATE是從1993年到1997年中任一年的任一月的一號(hào)
    and l_shipdate < date '[DATE]' + interval '1' month;

示例sql:

select
    100.00 * sum(case when p_type like 'PROMO%' then l_extendedprice * (1 - l_discount) else 0 end)  / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
    lineitem,
    part
where
    l_partkey = p_partkey
    and l_shipdate >= date '1994-02-01'
    and l_shipdate < date '1994-02-01' + interval '1' month
LIMIT 1;

Q15:頂級(jí)供應(yīng)商查詢
頂級(jí)供應(yīng)商查詢是帶有分排序、聚集、聚集子查詢操作并存的普通表與視圖的連接操作,確定頂級(jí)供應(yīng)商以對(duì)其進(jìn)行獎(jiǎng)勵(lì)、給予更多業(yè)務(wù)或獲得特殊認(rèn)可;在給定的年費(fèi)的給定季度對(duì)總出貨量貢獻(xiàn)最大的供應(yīng)商。

Q15語(yǔ)句的特點(diǎn)是:帶有分排序、聚集、聚集子查詢操作并存的普通表與視圖的連接操作。

--語(yǔ)句詳解:

create view revenue[STREAM_ID](supplier_no, total_revenue) as //創(chuàng)建復(fù)雜視圖(帶有分組操作)
select
    l_suppkey,
    sum(l_extendedprice * (1 - l_discount)) //獲取供貨商為公司帶來(lái)的總利潤(rùn)
from
    lineitem
where
    l_shipdate >= date '[DATE]' //DATE 是從1993年一月到1997年十月中任一月的一號(hào)
    and l_shipdate < date '[DATE]' + interval '3' month //3個(gè)月內(nèi)
group by //分組鍵與查詢對(duì)象之一相同
    l_suppkey;


//查詢語(yǔ)句
select
    s_suppkey,
    s_name,
    s_address,
    s_phone,
    total_revenue
from
    supplier,revenue[STREAM_ID] //普通表與復(fù)雜視圖進(jìn)行連接操作
where
    s_suppkey = supplier_no
    and total_revenue = (//聚集子查詢
        select
            max(total_revenue)
        from
            revenue[STREAM_ID] //聚集子查詢從視圖獲得數(shù)據(jù)
    )
order by
    s_suppkey;
  
//刪除視圖
drop view revenue[STREAM_ID];

TPC-H標(biāo)準(zhǔn)定義了Q15語(yǔ)句等價(jià)的變形SQL,使用了WITH子句,然后用WITH的對(duì)象與表進(jìn)行連接。變形SQL的語(yǔ)句如下:

WITH revenue (supplier_no, total_revenue) as (
    SELECT
        l_suppkey,
        SUM(l_extendedprice * (1-l_discount))
    FROM
        lineitem
    WHERE
        l_shipdate >= date ':1'
        AND l_shipdate < date ':1' + interval '3' month
    GROUP BY
        l_suppkey
)

SELECT
    s_suppkey,
    s_name,
    s_address,
    s_phone,
    total_revenue
FROM
    supplier,
    revenue
WHERE
    s_suppkey = supplier_no
    AND total_revenue = (
        SELECT
            MAX(total_revenue)
        FROM
            revenue
    )
ORDER BY
    s_suppkey;

示例sql:

create or replace view revenue0 (supplier_no, total_revenue) as
select
    l_suppkey,
    sum(l_extendedprice * (1 - l_discount))
from
    lineitem
where
    l_shipdate >= date '1994-02-01'
    and l_shipdate < date '1994-02-01' + interval '3' month
group by
    l_suppkey;
select
    s_suppkey,
    s_name,
    s_address,
    s_phone,
    total_revenue
from
    supplier,
    revenue0
where
    s_suppkey = supplier_no
    and total_revenue = 
    (
        select
            max(total_revenue)
        from
            revenue0
    )
order by
    s_suppkey
LIMIT 1;

Q16:零部件/供貨商關(guān)系查詢
零部件/供貨商關(guān)系查詢是帶有分組、排序、聚集、去重、NOT IN子查詢操作并存的兩表連接操作;找出有多少供應(yīng)商可以提供具有給定屬性的零件。

Q16語(yǔ)句的特點(diǎn)是:帶有分組、排序、聚集、去重、NOT IN子查詢操作并存的兩表連接操作。

--語(yǔ)句詳解

select
    p_brand,
    p_type,
    p_size,
    count(distinct ps_suppkey) as supplier_cnt //聚集、去重操作
from
    partsupp,
    part
where
    p_partkey = ps_partkey
    and p_brand <> '[BRAND]'
        // BRAND=Brand#MN ,M和N是兩個(gè)字母,代表兩個(gè)數(shù)值,相互獨(dú)立,取值在1到5之間
    and p_type not like '[TYPE]%' //消費(fèi)者不感興趣的類(lèi)型和尺寸
    and p_size in ([SIZE1], [SIZE2], [SIZE3], [SIZE4], [SIZE5], [SIZE6], [SIZE7], [SIZE8])
        //TYPEX是在1到50之間任意選擇的一組八個(gè)不同的值
    and ps_suppkey not in ( //NOT IN子查詢,消費(fèi)者排除某些供貨商
        select
            s_suppkey
        from
            supplier
        where
            s_comment like '%Customer%Complaints%'
    )
group by //分組操作
    p_brand,
    p_type,
    p_size
order by //排序操作
    supplier_cnt desc, //按數(shù)量降序排列,按品牌、種類(lèi)、尺寸升序排列
    p_brand,
    p_type,
    p_size;

示例sql:

select
    p_brand,
    p_type,
    p_size,
    count(distinct ps_suppkey) as supplier_cnt
from
    partsupp,
    part
where
     p_partkey = ps_partkey
     and p_brand <> 'Brand#55'
     and p_type not like 'MEDIUM PLATED%'
     and p_size in (35, 17, 43, 49, 19, 41, 30, 47)
     and ps_suppkey not in 
     (
         select
             s_suppkey
         from
             supplier
         where
             s_comment like '%Customer%Complaints%'
    )
group by
    p_brand,
    p_type,
    p_size
order by
    supplier_cnt desc,
    p_brand,
    p_type,
    p_size
LIMIT 1;

Q17:小額訂單收入查詢
小額訂單收入查詢是帶有聚集、聚集子查詢操作并存的兩表連接操作;確定如果不在為少量特定零件完成訂單,平均每年將損失多少收入,通過(guò)集中在更大的出貨量上來(lái)減少管理費(fèi)用;給定品牌和給定容器類(lèi)型的零部件,確定在7年內(nèi)數(shù)據(jù)庫(kù)中所有訂單訂購(gòu)的此類(lèi)零件的平均訂單數(shù)量,如果不再接收低于20%平均水平的訂單,平均每年的總收入損失多少。

Q17語(yǔ)句的特點(diǎn)是:帶有聚集、聚集子查詢操作并存的兩表連接操作。

--語(yǔ)句詳解

select
    sum(l_extendedprice) / 7.0 as avg_yearly //聚集操作
from
    lineitem, part
where
    p_partkey = l_partkey
    and p_brand = '[BRAND]' /*指定品牌。 BRAND=’Brand#MN’ ,M和N是兩個(gè)字母,代表兩個(gè)數(shù)值,相互獨(dú)立,取值在1到5之間 */
    and p_container = '[CONTAINER]' //指定包裝類(lèi)型。在TPC-H標(biāo)準(zhǔn)指定的范圍內(nèi)隨機(jī)選擇
    and l_quantity < ( //聚集子查詢
        select
            0.2 * avg(l_quantity)
        from
            lineitem
        where
            l_partkey = p_partkey
);

示例sql:

select
    sum(l_extendedprice) / 7.0 as avg_yearly
from
   lineitem,
   part,
   (SELECT l_partkey AS agg_partkey, 0.2 * avg(l_quantity) AS avg_quantity FROM lineitem GROUP BY l_partkey) part_agg
where
    p_partkey = l_partkey
    and agg_partkey = l_partkey
    and p_brand = 'Brand#53'
    and p_container = 'JUMBO JAR'
    and l_quantity < avg_quantity    
LIMIT 1;

Q18:大批量客戶查詢
大批量客戶查詢是帶有分組、排序、聚集、IN子查詢操作并存的三表連接操作,根據(jù)客戶的大批量訂單對(duì)客戶進(jìn)行排名。

Q18語(yǔ)句的特點(diǎn)是:帶有分組、排序、聚集、IN子查詢操作并存的三表連接操作。查詢語(yǔ)句沒(méi)有從語(yǔ)法上限制返回多少條元組,但是TPC-H標(biāo)準(zhǔn)規(guī)定,查詢結(jié)果只返回前100行(通常依賴(lài)于應(yīng)用程序?qū)崿F(xiàn))。

--語(yǔ)句詳解

select
    c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, //基本信息
    sum(l_quantity) //訂貨總數(shù)
from
    customer, orders, lineitem
where
    o_orderkey in ( //帶有分組操作的IN子查詢
        select
            l_orderkey
        from
            lineitem
        group by 
            l_orderkey 
    having
            sum(l_quantity) > [QUANTITY] // QUANTITY是位于312到315之間的任意值
    )
    and c_custkey = o_custkey
    and o_orderkey = l_orderkey 
group by
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice
order by
    o_totalprice desc,
    o_orderdate;

示例sql:

select
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice,
    sum(l_quantity)
from
    customer,
    orders,
    lineitem
where
    o_orderkey in 
    (
        select
            l_orderkey
        from
            lineitem
        group by
            l_orderkey having
            sum(l_quantity) > 313
    )
    and c_custkey = o_custkey
    and o_orderkey = l_orderkey
group by
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice
order by
    o_totalprice desc,
    o_orderdate
LIMIT 100;

Q19:折扣收入查詢
折扣收入查詢是帶有分組、排序、聚集、IN子查詢操作并存的三表連接操作,歸因于以特定方式處理的選定零部件的總折扣收入。本查詢是用數(shù)據(jù)挖掘工具產(chǎn)生格式化代碼的一個(gè)例子;對(duì)一些空運(yùn)或人工運(yùn)輸零件三個(gè)不同種類(lèi)的所有訂單的總折扣收入。零件的選擇考慮特定品牌、包裝和尺寸范圍。

Q19語(yǔ)句的特點(diǎn)是:帶有分組、排序、聚集、IN子查詢操作并存的三表連接操作。

--語(yǔ)句詳解

select
    sum(l_extendedprice * (1 - l_discount) ) as revenue
from
    lineitem, part
where (
        p_partkey = l_partkey
        and p_brand = ‘[BRAND1]’ /*特定品牌。BRAND1、BRAND2、BRAND3=‘Brand#MN’,M和N是兩個(gè)字母,代表兩個(gè)數(shù)值,相互獨(dú)立,取值在1到5之間 */
        and p_container in ( ‘SM CASE’, ‘SM BOX’, ‘SM PACK’, ‘SM PKG’) //包裝范圍
        and l_quantity >= [QUANTITY1] and l_quantity <= [QUANTITY1] + 10 /* QUANTITY1 是1到10之間的任意取值 */
        and p_size between 1 and 5 //尺寸范圍
        and l_shipmode in (‘AIR’, ‘AIR REG’) //運(yùn)輸模式,如下帶有陰影的粗體表示的條件是相同的,存在條件化簡(jiǎn)的可能
        and l_shipinstruct = ‘DELIVER IN PERSON’
    ) or (
        p_partkey = l_partkey
        and p_brand = ‘[BRAND2]’
        and p_container in (‘MED BAG’, ‘MED BOX’, ‘MED PKG’, ‘MED PACK’)
        and l_quantity >= [QUANTITY2] and l_quantity <= [QUANTITY2] + 10 /* QUANTITY2 是10到20之間的任意取值 */
        and p_size between 1 and 10
    and l_shipmode in (‘AIR’, ‘AIR REG’)
        and l_shipinstruct = ‘DELIVER IN PERSON’
    ) or (
    p_partkey = l_partkey
    and p_brand = ‘[BRAND3]’
        and p_container in ( ‘LG CASE’, ‘LG BOX’, ‘LG PACK’, ‘LG PKG’)
        and l_quantity >= [QUANTITY3] and l_quantity <= [QUANTITY3] + 10 /* QUANTITY3 是20到30之間的任意取值 */
        and p_size between 1 and 15
        and l_shipmode in (‘AIR’, ‘AIR REG’)
        and l_shipinstruct = ‘DELIVER IN PERSON’
    );

示例sql:

select
    sum(l_extendedprice* (1 - l_discount)) as revenue
from
    lineitem,
    part
where
    (
         p_partkey = l_partkey
         and p_brand = 'Brand#42'
         and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
         and l_quantity >= 6 and l_quantity <= 6 + 10
         and p_size between 1 and 5
         and l_shipmode in ('AIR', 'AIR REG')
         and l_shipinstruct = 'DELIVER IN PERSON'
    )
    or
    (
         p_partkey = l_partkey
         and p_brand = 'Brand#55'
         and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
         and l_quantity >= 20 and l_quantity <= 20 + 10
         and p_size between 1 and 10
         and l_shipmode in ('AIR', 'AIR REG')
         and l_shipinstruct = 'DELIVER IN PERSON'
    )
    or
    (
         p_partkey = l_partkey
         and p_brand = 'Brand#24'
         and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
         and l_quantity >= 23 and l_quantity <= 23 + 10
         and p_size between 1 and 15
         and l_shipmode in ('AIR', 'AIR REG')
         and l_shipinstruct = 'DELIVER IN PERSON'
    )
LIMIT 1;

Q20:潛在零部件促銷(xiāo)查詢
Q20語(yǔ)句查詢確定在某一年內(nèi),找出指定國(guó)家的能對(duì)某一零件商品提供更有競(jìng)爭(zhēng)力價(jià)格的供貨貨。所謂更有競(jìng)爭(zhēng)力的供貨商,是指那些零件有過(guò)剩的供貨商,超過(guò)供或商在某一年中貨運(yùn)給定國(guó)的某一零件的50%則為過(guò)剩。
潛在零部件促銷(xiāo)查詢是帶有排序、聚集、IN子查詢、普通子查詢操作并存的兩表連接操作,識(shí)別特定國(guó)家的供應(yīng)商,他們選擇的零部件可能是促銷(xiāo)報(bào)價(jià)的候選者;可識(shí)別提供給定零件過(guò)多的供應(yīng)商,過(guò)多被定義為超過(guò)50%的零部件。

Q20語(yǔ)句的特點(diǎn)是:帶有排序、聚集、IN子查詢、普通子查詢操作并存的兩表連接操作。

--語(yǔ)句詳解

select
    s_name, s_address
from
    supplier, nation
where
    s_suppkey in ( //第一層的IN子查詢
        select
            ps_suppkey
        from
            partsupp
        where
            ps_partkey in ( //第二層嵌套的IN子查詢
                select
                    p_partkey
                from
                    part
                where
                    p_name like '[COLOR]%' //COLOR為產(chǎn)生P_NAME的值的列表中的任意值
            )
            and ps_availqty > (//第二層嵌套的子查詢
                select
                    0.5 * sum(l_quantity) //聚集子查詢
                from
                    lineitem
                where
                    l_partkey = ps_partkey
                    and l_suppkey = ps_suppkey
                    and l_shipdate >= date('[DATE]') // DATE為在1993年至1997年的任一年的一月一號(hào)
                    and l_shipdate < date('[DATE]') + interval ‘1’ year //1年內(nèi)
            )
    )
  and s_nationkey = n_nationkey
  and n_name = '[NATION]' //TPC-H標(biāo)準(zhǔn)定義的任意值
order by
    s_name;

示例sql:

select s_name,s_address
from supplier,nation
where
    s_suppkey in 
    (select ps_suppkey
        from partsupp,
            (
                select
                    l_partkey agg_partkey,
                    l_suppkey agg_suppkey,
                    0.5 * sum(l_quantity) AS agg_quantity
                from
                    lineitem
                where
                    l_shipdate >= date '1997-01-01'
                    and l_shipdate < date '1997-01-01' + interval '1' year
                group by
                    l_partkey,
                    l_suppkey
            ) agg_lineitem
        where
            agg_partkey = ps_partkey
            and agg_suppkey = ps_suppkey
            and ps_partkey in 
            (
                select
                    p_partkey
                from
                    part
                where
                    p_name like 'frosted%'
            )
            and ps_availqty > agg_quantity
    )
    and s_nationkey = n_nationkey
    and n_name = 'UNITED STATES'
order by
    s_name
LIMIT 1;

Q21:供應(yīng)商留單等待查詢
供應(yīng)商留單等待查詢是帶有分組、排序、聚集、EXISTS子查詢、NOT EXISTS子查詢操作并存的四表連接操作,標(biāo)識(shí)無(wú)法及時(shí)發(fā)貨的供應(yīng)商。

Q21語(yǔ)句的特點(diǎn)是:帶有分組、排序、聚集、EXISTS子查詢、NOT EXISTS子查詢操作并存的四表連接操作。查詢語(yǔ)句沒(méi)有從語(yǔ)法上限制返回多少條元組,但是TPC-H標(biāo)準(zhǔn)規(guī)定,查詢結(jié)果只返回前100行(通常依賴(lài)于應(yīng)用程序?qū)崿F(xiàn))。

--語(yǔ)句詳解

select
    s_name, count(*) as numwait
from
    supplier, lineitem l1, orders, nation
where
    s_suppkey = l1.l_suppkey
    and o_orderkey = l1.l_orderkey
    and o_orderstatus = 'F'
    and l1.l_receiptdate > l1.l_commitdate
    and exists ( //EXISTS子查詢
        select
            *
        from
            lineitem l2
        where
            l2.l_orderkey = l1.l_orderkey
            and l2.l_suppkey <> l1.l_suppkey
    )
    and not exists ( //NOT EXISTS子查詢
        select
            *
        from
            lineitem l3
        where
            l3.l_orderkey = l1.l_orderkey
            and l3.l_suppkey <> l1.l_suppkey
            and l3.l_receiptdate > l3.l_commitdate
    )
    and s_nationkey = n_nationkey
    and n_name = '[NATION]' //TPC-H標(biāo)準(zhǔn)定義的任意值
group by
    s_name
order by
    numwait desc,
    s_name;

示例sql:

select
    s_name,
    count(*) as numwait
from
    supplier,
    lineitem l1,
    orders,
    nation
where
    s_suppkey = l1.l_suppkey
    and o_orderkey = l1.l_orderkey
    and o_orderstatus = 'F'
    and l1.l_receiptdate > l1.l_commitdate
    and exists 
    (
        select
            *
        from
            lineitem l2
        where
            l2.l_orderkey = l1.l_orderkey
            and l2.l_suppkey <> l1.l_suppkey
    )
    and not exists 
    (
        select
            *
        from
            lineitem l3
        where
            l3.l_orderkey = l1.l_orderkey
            and l3.l_suppkey <> l1.l_suppkey
            and l3.l_receiptdate > l3.l_commitdate
    )
    and s_nationkey = n_nationkey
    and n_name = 'VIETNAM'
group by
    s_name
order by
    numwait desc,
    s_name
LIMIT 100;

Q22:全球銷(xiāo)售機(jī)會(huì)查詢
全球銷(xiāo)售機(jī)會(huì)查詢是帶有分組、排序、聚集、EXISTS子查詢、NOT EXISTS子查詢操作并存的四表連接操作,確定有可能進(jìn)行購(gòu)買(mǎi)的客戶所在的地理位置,計(jì)算特定國(guó)家代碼范圍內(nèi)有多少客戶7年未下訂單,但他們的“正”賬戶余額高于平均水平。

Q22語(yǔ)句的特點(diǎn)是:帶有分組、排序、聚集、EXISTS子查詢、NOT EXISTS子查詢操作并存的四表連接操作。

--語(yǔ)句詳解

select
    cntrycode,
    count(*) as numcust,
    sum(c_acctbal) as totacctbal
from ( //第一層子查詢
    select
        substring(c_phone from 1 for 2) as cntrycode,
        c_acctbal
    from
        customer
    where
        // I1…I7是在TPC-H中定義國(guó)家代碼的可能值中不重復(fù)的任意值
        substring(c_phone from 1 for 2) in ('[I1]','[I2]','[I3]','[I4]','[I5]','[I6]','[I7]')
        and c_acctbal > (//第二層聚集子查詢
            select
                avg(c_acctbal)
            from
                customer
            where
                c_acctbal > 0.00
                and substr (c_phone from 1 for 2) in ('[I1]','[I2]','[I3]','[I4]','[I5]','[I6]','[I7]')
        )
        and not exists (//第二層NOT EXISTS子查詢
            select
                *
            from
                orders
            where
                o_custkey = c_custkey
        )
    ) as custsale
group by
    cntrycode
order by
    cntrycode;

示例sql:

select
    cntrycode,
    count(*) as numcust,
    sum(c_acctbal) as totacctbal
from
(
    select
        substring(c_phone from 1 for 2) as cntrycode,
        c_acctbal
    from
        customer
    where
        substring(c_phone from 1 for 2) in
        ('34', '18', '30', '24', '33', '22', '29')
        and c_acctbal > 
        (
            select
                avg(c_acctbal)
            from
                customer
            where
                c_acctbal > 0.00
                and substring(c_phone from 1 for 2) in
                ('34', '18', '30', '24', '33', '22', '29')
        )
        and not exists 
        (
            select
                *
            from
                orders
            where
                o_custkey = c_custkey
        )
) as custsale
group by
    cntrycode
order by
    cntrycode limit 1;

五、測(cè)試結(jié)果舉例


image.png

六、測(cè)試總結(jié)

1.行存模式,執(zhí)行計(jì)劃均不走索引,獲取超過(guò)80%以上的記錄,走的是全表掃描
2.列存模式tpch查詢性能優(yōu)于行存。
3.列存不支持主外鍵約束。delta表是系統(tǒng)表,不支持變更,即不支持建立主外鍵
4.列存cluster key約束需要在導(dǎo)入數(shù)據(jù)前創(chuàng)建,否則只會(huì)影響新數(shù)據(jù),不會(huì)影響存量數(shù)據(jù)
5.cluster key帶排序功能,會(huì)稍微提升22條sql的查詢性能
6.列存可以創(chuàng)建btree索引,適用于局部很小范圍查詢或等值查詢,不適用于這些sql
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書(shū)系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容