681. 【數(shù)據(jù)庫評測】Cloudwave 4.0 集群版(4節(jié)點) VS Starrocks 3.0 集群版(4節(jié)點)

一、評測結(jié)果


評測結(jié)論1:4臺64核256g阿里云服務(wù)器組成的4節(jié)點集群,hadoop3.2.2 作為分布式存儲,Cloudwave4.0在 SSB1000g 國際標準測試集下,整體性能優(yōu)于Starrocks3.0近0.36倍。
評測結(jié)論2:在多表聯(lián)合join場景下,Cloudwave4.0版本,耗時幾乎等于零

[附]13條標準測試SQL測試結(jié)果表:

數(shù)據(jù)庫 數(shù)據(jù)集 響應(yīng)時間(s) CPU 最大占用率 存儲壓縮比 數(shù)據(jù)導入時間
Cloudwave4.0 ssb1000 7.602 90%(5763%/6400%) 59%(360g/606g) 58分鐘
Starrocks3.0 ssb1000 10.397 66.6%(4266%/6400%) 169%(1024g/606g) 112分鐘

[附]2條拓展測試SQL測試結(jié)果表

數(shù)據(jù)庫 數(shù)據(jù)集 拓展SQL1響應(yīng)時間(s) 拓展SQL1 CPU 最大占用率 拓展SQL2響應(yīng)時間(s) 拓展SQL2 CPU 最大占用率
Cloudwave4.0 ssb1000 0.012 0.0935%(6%/6400%) 0.014 0.118%(7.6%/6400%)
Starrocks3.0 ssb1000 2.79 78.7%(5037%/6400%) 4.8 90.5%(5797%/6400%)

二、評測環(huán)境

  • 硬件環(huán)境:4臺 64核256g 云服務(wù)器(組成4節(jié)點的集群),essd pl1 高效云盤
  • 軟件環(huán)境:jdk19(Cloudwave4.0官方推薦版本,官方基于jdk19版本里頭的的vector api,實現(xiàn)全面向量化引擎)、jdk8(starrocks安裝推薦jdk版本,主要用于fe,亦可少踩坑)、mysql8(作為starrocks的客戶端)、hadoop 3.2.2(作為cloudwave 和 starrocks 共同的分布式存儲,副本數(shù)=2)
  • 軟件版本:Cloudwave 4.0(最新版在2023年5月份發(fā)版),Starrocks 3.0(最新版在2023年4月份發(fā)版)
  • 評測數(shù)據(jù)集:ssb1000
表名 行數(shù) 說明
lineorder 60 億 SSB 商品訂單表
customer 3000 萬 SSB 客戶表
part 200 萬 SSB 零部件表
supplier 200 萬 SSB 供應(yīng)商表
dates 2556 日期表
硬件環(huán)境
jdk版本
mysql版本
hadoop版本
Starrocks版本
Starrocks版本

三、評測方法

  • 執(zhí)行19輪測試腳本,每輪執(zhí)行13條標準測試sql,去除第1輪的測試數(shù)據(jù)(由于IO原因,第1次查詢兩邊的性能均受IO影響,本測試主要測數(shù)據(jù)庫引擎的算法在同等計算資源的條件下的優(yōu)劣,因此去除第一輪測試數(shù)據(jù)),將余下的18輪測試數(shù)據(jù)做平均,獲得每條sql的平均耗時;
  • 觀察最大CPU占用
  • 觀察存儲壓縮比
  • 觀察數(shù)據(jù)加載時間

多表聯(lián)合join測試

  • 測試方法:執(zhí)行19輪SQL測試腳本,每輪執(zhí)行1條多表聯(lián)合join拓展測試sql,去除第1輪的測試數(shù)據(jù)(由于IO原因,第1次查詢兩邊的性能均受IO影響,本測試主要測數(shù)據(jù)庫引擎的算法在同等計算資源的條件下的優(yōu)劣,因此去除第一輪測試數(shù)據(jù)),將余下的18輪測試數(shù)據(jù)做平均,獲得sql的平均耗時
  • 觀察最大CPU占用
  • 統(tǒng)計耗時
  • 多表聯(lián)合join拓展測試SQL1:select count(*) from lineorder,customer where lo_custkey = c_custkey;
  • 多表聯(lián)合join拓展測試SQL2:select count(*) from lineorder,customer,supplier where lo_custkey = c_custkey and lo_suppkey = s_suppkey;

四、開始測試[cloudwave]

  1. 查看為hadoop準備的存儲空間


./sync_scripts.sh 'df -h' | grep home
  1. 格式化hadoop


hdfs namenode -format
  1. 啟動hdfs,并查看服務(wù)狀態(tài)


start-dfs.sh 
./sync_scripts.sh 'jps'
  1. 創(chuàng)建ssb1000數(shù)據(jù)上傳目錄


hdfs dfs -mkdir /cloudwave
hdfs dfs -mkdir /cloudwave/uploads
hdfs dfs -put ssb1000 /cloudwave/uploads/

  1. 檢查數(shù)據(jù)上傳結(jié)果


  • 可以看到ssb1000的數(shù)據(jù),占用606GB的存儲空間
hdfs dfs -du -h /
du -sh /home/cloudwave/ssb-poc-0.9.3/ssb-poc/output/data_dir/ssb1000
  1. 啟動cloudwave數(shù)據(jù)庫,并導入ssb1000數(shù)據(jù)


  • 啟動數(shù)據(jù)庫
./start-all-server.sh
  • 導入數(shù)據(jù)
./cplus_go.bin -s 'loaddata ssb1000'
  • 可以看到3493秒導入完成,也就是58分鐘。
  • 上圖通過hdfs命令,可以看到cloudwave做了數(shù)據(jù)壓縮,ssb1000數(shù)據(jù)的原始大小是606G,導入cloudwave數(shù)據(jù)庫之后,壓縮到了360g(圖中的720G 表示hdfs兩個數(shù)據(jù)副本的總大?。?,壓縮比為59%。
  1. [cloudwave]開始測試13條標準測試SQL


  • 執(zhí)行測試腳本./test_ssb.sh,七鏡觀察到cloudwave 的4節(jié)點集群測ssb1000 CPU最大占用是90%(5763%/6400%)
  • 執(zhí)行分析腳本./analysis.sh cloudwave "$(ls n*txt)" +,可以看到13條標準測試SQL的合計時間平均是7.6s
  1. [cloudwave] 開始測試2條多表聯(lián)合joinSQL



  • 執(zhí)行測試腳本 ./test_ex.sh,七鏡觀察到cloudwave的4節(jié)點集群測ssb1000 的拓展SQL1的CPU最大占用是0.0935%(6%/6400%)
  • 執(zhí)行分析腳本./analysis.sh cloudwave "$(ls n*txt)" +,可以看到拓展SQL1耗時是12ms。
  • 將sql_ex.sql里的sql換成拓展SQL2,執(zhí)行測試腳本 ./test_ex.sh,七鏡觀察到cloudwave的4節(jié)點集群測ssb1000 的拓展SQL2的CPU最大占用是0.118%(7.6%/6400%)
  • 執(zhí)行分析腳本./analysis.sh cloudwave "$(ls n*txt)" +,可以看到拓展SQL2耗時是14ms。

五、對比測試

  1. 清空hdfs


  2. [starrocks] 啟動 starrocks3.0 fe


./fe/bin/start_fe.sh --daemon
  1. [starrocks] 添加starrocks3.0 be


mysql -uroot -h127.0.0.1 -P9030
ALTER SYSTEM ADD BACKEND "172.17.161.33:9050"; 
ALTER SYSTEM ADD BACKEND "172.17.161.32:9050"; 
ALTER SYSTEM ADD BACKEND "172.17.161.31:9050"; 
ALTER SYSTEM ADD BACKEND "172.17.161.30:9050"; 
  1. [starrocks] 啟動 starrocks 3.0 be


./sync_scripts.sh "cd $(pwd)/be/bin && ./start_be.sh --daemon &&ps -ef | grep starrocks_be"
  1. [starrocks] 驗證集群狀態(tài),4個節(jié)點的 Alive=true 即可。



  1. [starrocks] 創(chuàng)建表


  1. [starrocks] 開始導入數(shù)據(jù),ssb1000導入時間是


  • 如上圖所示,8點58分開始執(zhí)行的導入命令。
date && ./bin/stream_load.sh data_dir/ssb30 && date
  • 如上圖所示,導入過程中,發(fā)現(xiàn)在我設(shè)置的hdfs副本數(shù)默認=2的配置下,starrocks自己把自己建的文件副本數(shù)改成了3。
  • 如上圖所示,10點50分導入結(jié)束,總計耗時112分鐘。
  1. [starrocks] 查看ssb1000 壓縮比,ssb1000數(shù)據(jù)的原始大小是606G,導入starrocks數(shù)據(jù)庫之后,神奇的發(fā)現(xiàn),占用了1T的分布式存儲(壓縮呢???)。
  1. [starrocks] 開始測試
  • 執(zhí)行測試腳本./test_ssb.sh,七鏡觀察到 starrocks 的4節(jié)點集群測ssb1000 CPU最大占用是4266%/6400%


  1. [starrocks]分析測試結(jié)果
  • 執(zhí)行分析腳本./analysis.sh starrocks "$(ls n*txt)" +,去掉第一輪查詢(42.57s)的平均時間是10.39秒


  1. [starrocks] 開始測試2條多表聯(lián)合joinSQL


  • 執(zhí)行測試腳本 ./test_ex.sh,七鏡觀察到starrocks的4節(jié)點集群測ssb1000 的拓展SQL1的CPU最大占用是78.7%(5037%/6400%)
  • 執(zhí)行分析腳本./analysis.sh starrocks "$(ls n*txt)" +,可以看到拓展SQL1耗時是2.79s。
  • 將sql_ex.sql里的sql換成拓展SQL2,執(zhí)行測試腳本 ./test_ex.sh,七鏡觀察到starrocks的4節(jié)點集群測ssb1000 的拓展SQL2的CPU最大占用是90.5%(5797%/6400%)
  • 執(zhí)行分析腳本./analysis.sh starrocks "$(ls n*txt)" +,可以看到拓展SQL2耗時是4.8s。

五、附加

  1. Cloudwave 測試腳本
#!/bin/bash
# Program:
#       test ssb
# History:
# 2023/03/17    junfenghe.cloud@qq.com  version:0.0.1

rm -rf ./n*txt
for ((i=1; i<20; i++))
do

    cat sql_ssb.sql |./cplus.sh > n${i}.txt

done

  1. Starrocks 測試腳本
#!/bin/bash
# Program:
#       test ssb
# History:
# 2023/03/17    junfenghe.cloud@qq.com  version:0.0.1

rm -rf ./n*txt
for ((i=1; i<20; i++))
do

    cat sql_ssb.sql | mysql -uroot -P 9030 -h 127.0.0.1 -v -vv -vvv >n${i}.txt

done

  1. 分析腳本
#!/bin/bash
#Program:
#       analysis cloudwave/starrocks logs of base compute
#History:
#2023/02/20     junfenghe.cloud@qq.com  version:0.0.1

path=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/sbin:/usr/local/bin:~/bin
export path



suff="(s)#####"

if [ -z "${1}" ]
then

        echo "Please input database'name"
        exit -1

fi

if [ -z "$2" ]
then

        echo "Please input times of scanner"
        exit -f
fi

if [ -n "${3}" ]
then
        suff=${3}
fi

for current in ${2}
do
        result_time=""

        if [ "${1}" == "starrocks" ]
        then
            for time in $( cat ${current} | grep sec  | awk -F '('  '{print $2}' | awk -F ' ' '{print $1}' )
            do
                result_time="${result_time}${time}${suff}"
            done
        elif [ "${1}" == "cloudwave" ]
        then
            for time in $( cat ${current} | grep Elapsed | awk '{print $2}'| sed 's/:/*60+/g'| sed 's/+00\*60//g ; s/+0\*60//g ; s/^0\*60+//g' )
            do
                result_time="${result_time}${time}${suff}"
            done
        fi

        echo ${result_time%${suff}*}

done


exit 0
  1. sql_ssb.sql
use ssb1000;
select sum(lo_revenue) as revenue from lineorder,dates where lo_orderdate = d_datekey and d_year = 1993 and lo_discount between 1 and 3 and lo_quantity < 25;
select sum(lo_revenue) as revenue from lineorder,dates where lo_orderdate = d_datekey and d_yearmonthnum = 199401 and lo_discount between 4 and 6 and lo_quantity between 26 and 35;
select sum(lo_revenue) as revenue from lineorder,dates where lo_orderdate = d_datekey and d_weeknuminyear = 6 and d_year = 1994 and lo_discount between 5 and 7 and lo_quantity between 26 and 35;
select sum(lo_revenue) as lo_revenue, d_year, p_brand from lineorder ,dates,part,supplier where lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_category = 'MFGR#12' and s_region = 'AMERICA' group by d_year, p_brand order by d_year, p_brand;
select sum(lo_revenue) as lo_revenue, d_year, p_brand from lineorder,dates,part,supplier where lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_brand between 'MFGR#2221' and 'MFGR#2228' and s_region = 'ASIA' group by d_year, p_brand order by d_year, p_brand;
select sum(lo_revenue) as lo_revenue, d_year, p_brand from lineorder,dates,part,supplier where lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_brand = 'MFGR#2239' and s_region = 'EUROPE' group by d_year, p_brand order by d_year, p_brand;
select c_nation, s_nation, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and c_region = 'ASIA' and s_region = 'ASIA'and d_year >= 1992 and d_year <= 1997 group by c_nation, s_nation, d_year order by d_year asc, lo_revenue desc;
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and  c_nation = 'UNITED STATES' and s_nation = 'UNITED STATES' and d_year >= 1992 and d_year <= 1997 group by c_city, s_city, d_year order by d_year asc, lo_revenue desc;
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and (c_city='UNITED KI1' or c_city='UNITED KI5') and (s_city='UNITED KI1' or s_city='UNITED KI5') and d_year >= 1992 and d_year <= 1997 group by c_city, s_city, d_year order by d_year asc, lo_revenue desc;
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and (c_city='UNITED KI1' or c_city='UNITED KI5') and (s_city='UNITED KI1' or s_city='UNITED KI5') and d_yearmonth  = 'Dec1997' group by c_city, s_city, d_year order by d_year asc, lo_revenue desc;
select d_year, c_nation, sum(lo_revenue) - sum(lo_supplycost) as profit from lineorder,dates,customer,supplier,part where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and lo_partkey = p_partkey and c_region = 'AMERICA' and s_region = 'AMERICA' and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2') group by d_year, c_nation order by d_year, c_nation;
select d_year, s_nation, p_category, sum(lo_revenue) - sum(lo_supplycost) as profit from lineorder,dates,customer,supplier,part where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and lo_partkey = p_partkey and c_region = 'AMERICA'and s_region = 'AMERICA' and (d_year = 1997 or d_year = 1998) and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2') group by d_year, s_nation, p_category order by d_year, s_nation, p_category;
select d_year, s_city, p_brand, sum(lo_revenue) - sum(lo_supplycost) as profit from lineorder,dates,customer,supplier,part where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and lo_partkey = p_partkey and c_region = 'AMERICA'and s_nation = 'UNITED STATES' and (d_year = 1997 or d_year = 1998) and p_category = 'MFGR#14' group by d_year, s_city, p_brand order by d_year, s_city, p_brand;
最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時請結(jié)合常識與多方信息審慎甄別。
平臺聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點,簡書系信息發(fā)布平臺,僅提供信息存儲服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容