一、評測結(jié)果

評測結(jié)論1:4臺64核256g阿里云服務(wù)器組成的4節(jié)點集群,hadoop3.2.2 作為分布式存儲,Cloudwave4.0在 SSB1000g 國際標準測試集下,整體性能優(yōu)于Starrocks3.0近0.36倍。
評測結(jié)論2:在多表聯(lián)合join場景下,Cloudwave4.0版本,耗時幾乎等于零
[附]13條標準測試SQL測試結(jié)果表:
| 數(shù)據(jù)庫 | 數(shù)據(jù)集 | 響應(yīng)時間(s) | CPU 最大占用率 | 存儲壓縮比 | 數(shù)據(jù)導入時間 |
|---|---|---|---|---|---|
| Cloudwave4.0 | ssb1000 | 7.602 | 90%(5763%/6400%) | 59%(360g/606g) | 58分鐘 |
| Starrocks3.0 | ssb1000 | 10.397 | 66.6%(4266%/6400%) | 169%(1024g/606g) | 112分鐘 |
[附]2條拓展測試SQL測試結(jié)果表
| 數(shù)據(jù)庫 | 數(shù)據(jù)集 | 拓展SQL1響應(yīng)時間(s) | 拓展SQL1 CPU 最大占用率 | 拓展SQL2響應(yīng)時間(s) | 拓展SQL2 CPU 最大占用率 |
|---|---|---|---|---|---|
| Cloudwave4.0 | ssb1000 | 0.012 | 0.0935%(6%/6400%) | 0.014 | 0.118%(7.6%/6400%) |
| Starrocks3.0 | ssb1000 | 2.79 | 78.7%(5037%/6400%) | 4.8 | 90.5%(5797%/6400%) |
二、評測環(huán)境
- 硬件環(huán)境:4臺 64核256g 云服務(wù)器(組成4節(jié)點的集群),essd pl1 高效云盤
- 軟件環(huán)境:jdk19(Cloudwave4.0官方推薦版本,官方基于jdk19版本里頭的的vector api,實現(xiàn)全面向量化引擎)、jdk8(starrocks安裝推薦jdk版本,主要用于fe,亦可少踩坑)、mysql8(作為starrocks的客戶端)、hadoop 3.2.2(作為cloudwave 和 starrocks 共同的分布式存儲,副本數(shù)=2)
- 軟件版本:Cloudwave 4.0(最新版在2023年5月份發(fā)版),Starrocks 3.0(最新版在2023年4月份發(fā)版)
- 評測數(shù)據(jù)集:ssb1000
| 表名 | 行數(shù) | 說明 |
|---|---|---|
| lineorder | 60 億 | SSB 商品訂單表 |
| customer | 3000 萬 | SSB 客戶表 |
| part | 200 萬 | SSB 零部件表 |
| supplier | 200 萬 | SSB 供應(yīng)商表 |
| dates | 2556 | 日期表 |

硬件環(huán)境

jdk版本

mysql版本

hadoop版本

Starrocks版本

Starrocks版本
三、評測方法
- 執(zhí)行19輪測試腳本,每輪執(zhí)行13條標準測試sql,去除第1輪的測試數(shù)據(jù)(由于IO原因,第1次查詢兩邊的性能均受IO影響,本測試主要測數(shù)據(jù)庫引擎的算法在同等計算資源的條件下的優(yōu)劣,因此去除第一輪測試數(shù)據(jù)),將余下的18輪測試數(shù)據(jù)做平均,獲得每條sql的平均耗時;
- 觀察最大CPU占用
- 觀察存儲壓縮比
- 觀察數(shù)據(jù)加載時間
多表聯(lián)合join測試
- 測試方法:執(zhí)行19輪SQL測試腳本,每輪執(zhí)行1條多表聯(lián)合join拓展測試sql,去除第1輪的測試數(shù)據(jù)(由于IO原因,第1次查詢兩邊的性能均受IO影響,本測試主要測數(shù)據(jù)庫引擎的算法在同等計算資源的條件下的優(yōu)劣,因此去除第一輪測試數(shù)據(jù)),將余下的18輪測試數(shù)據(jù)做平均,獲得sql的平均耗時
- 觀察最大CPU占用
- 統(tǒng)計耗時
- 多表聯(lián)合join拓展測試SQL1:select count(*) from lineorder,customer where lo_custkey = c_custkey;
- 多表聯(lián)合join拓展測試SQL2:select count(*) from lineorder,customer,supplier where lo_custkey = c_custkey and lo_suppkey = s_suppkey;
四、開始測試[cloudwave]
-
查看為hadoop準備的存儲空間
./sync_scripts.sh 'df -h' | grep home
-
格式化hadoop
hdfs namenode -format
-
啟動hdfs,并查看服務(wù)狀態(tài)
start-dfs.sh
./sync_scripts.sh 'jps'
-
創(chuàng)建ssb1000數(shù)據(jù)上傳目錄
hdfs dfs -mkdir /cloudwave
hdfs dfs -mkdir /cloudwave/uploads
hdfs dfs -put ssb1000 /cloudwave/uploads/
-
檢查數(shù)據(jù)上傳結(jié)果
- 可以看到ssb1000的數(shù)據(jù),占用606GB的存儲空間
hdfs dfs -du -h /
du -sh /home/cloudwave/ssb-poc-0.9.3/ssb-poc/output/data_dir/ssb1000
-
啟動cloudwave數(shù)據(jù)庫,并導入ssb1000數(shù)據(jù)
- 啟動數(shù)據(jù)庫
./start-all-server.sh

- 導入數(shù)據(jù)
./cplus_go.bin -s 'loaddata ssb1000'

- 可以看到3493秒導入完成,也就是58分鐘。

- 上圖通過hdfs命令,可以看到cloudwave做了數(shù)據(jù)壓縮,ssb1000數(shù)據(jù)的原始大小是606G,導入cloudwave數(shù)據(jù)庫之后,壓縮到了360g(圖中的720G 表示hdfs兩個數(shù)據(jù)副本的總大?。?,壓縮比為59%。
-
[cloudwave]開始測試13條標準測試SQL
- 執(zhí)行測試腳本
./test_ssb.sh,七鏡觀察到cloudwave 的4節(jié)點集群測ssb1000 CPU最大占用是90%(5763%/6400%)

- 執(zhí)行分析腳本
./analysis.sh cloudwave "$(ls n*txt)" +,可以看到13條標準測試SQL的合計時間平均是7.6s
-
[cloudwave] 開始測試2條多表聯(lián)合joinSQL
- 執(zhí)行測試腳本
./test_ex.sh,七鏡觀察到cloudwave的4節(jié)點集群測ssb1000 的拓展SQL1的CPU最大占用是0.0935%(6%/6400%)

- 執(zhí)行分析腳本
./analysis.sh cloudwave "$(ls n*txt)" +,可以看到拓展SQL1耗時是12ms。

- 將sql_ex.sql里的sql換成拓展SQL2,執(zhí)行測試腳本
./test_ex.sh,七鏡觀察到cloudwave的4節(jié)點集群測ssb1000 的拓展SQL2的CPU最大占用是0.118%(7.6%/6400%)

- 執(zhí)行分析腳本
./analysis.sh cloudwave "$(ls n*txt)" +,可以看到拓展SQL2耗時是14ms。
五、對比測試
-
清空hdfs
-
[starrocks] 啟動 starrocks3.0 fe
./fe/bin/start_fe.sh --daemon
-
[starrocks] 添加starrocks3.0 be
mysql -uroot -h127.0.0.1 -P9030
ALTER SYSTEM ADD BACKEND "172.17.161.33:9050";
ALTER SYSTEM ADD BACKEND "172.17.161.32:9050";
ALTER SYSTEM ADD BACKEND "172.17.161.31:9050";
ALTER SYSTEM ADD BACKEND "172.17.161.30:9050";
-
[starrocks] 啟動 starrocks 3.0 be
./sync_scripts.sh "cd $(pwd)/be/bin && ./start_be.sh --daemon &&ps -ef | grep starrocks_be"
-
[starrocks] 驗證集群狀態(tài),4個節(jié)點的 Alive=true 即可。
-
[starrocks] 創(chuàng)建表
-
[starrocks] 開始導入數(shù)據(jù),ssb1000導入時間是
- 如上圖所示,8點58分開始執(zhí)行的導入命令。
date && ./bin/stream_load.sh data_dir/ssb30 && date

- 如上圖所示,導入過程中,發(fā)現(xiàn)在我設(shè)置的hdfs副本數(shù)默認=2的配置下,starrocks自己把自己建的文件副本數(shù)改成了3。

- 如上圖所示,10點50分導入結(jié)束,總計耗時112分鐘。

- [starrocks] 查看ssb1000 壓縮比,ssb1000數(shù)據(jù)的原始大小是606G,導入starrocks數(shù)據(jù)庫之后,神奇的發(fā)現(xiàn),占用了1T的分布式存儲(壓縮呢???)。

- [starrocks] 開始測試
-
執(zhí)行測試腳本./test_ssb.sh,七鏡觀察到 starrocks 的4節(jié)點集群測ssb1000 CPU最大占用是4266%/6400%
- [starrocks]分析測試結(jié)果
-
執(zhí)行分析腳本./analysis.sh starrocks "$(ls n*txt)" +,去掉第一輪查詢(42.57s)的平均時間是10.39秒
-
[starrocks] 開始測試2條多表聯(lián)合joinSQL
- 執(zhí)行測試腳本
./test_ex.sh,七鏡觀察到starrocks的4節(jié)點集群測ssb1000 的拓展SQL1的CPU最大占用是78.7%(5037%/6400%)

- 執(zhí)行分析腳本
./analysis.sh starrocks "$(ls n*txt)" +,可以看到拓展SQL1耗時是2.79s。

- 將sql_ex.sql里的sql換成拓展SQL2,執(zhí)行測試腳本
./test_ex.sh,七鏡觀察到starrocks的4節(jié)點集群測ssb1000 的拓展SQL2的CPU最大占用是90.5%(5797%/6400%)

- 執(zhí)行分析腳本
./analysis.sh starrocks "$(ls n*txt)" +,可以看到拓展SQL2耗時是4.8s。
五、附加
- Cloudwave 測試腳本
#!/bin/bash
# Program:
# test ssb
# History:
# 2023/03/17 junfenghe.cloud@qq.com version:0.0.1
rm -rf ./n*txt
for ((i=1; i<20; i++))
do
cat sql_ssb.sql |./cplus.sh > n${i}.txt
done
- Starrocks 測試腳本
#!/bin/bash
# Program:
# test ssb
# History:
# 2023/03/17 junfenghe.cloud@qq.com version:0.0.1
rm -rf ./n*txt
for ((i=1; i<20; i++))
do
cat sql_ssb.sql | mysql -uroot -P 9030 -h 127.0.0.1 -v -vv -vvv >n${i}.txt
done
- 分析腳本
#!/bin/bash
#Program:
# analysis cloudwave/starrocks logs of base compute
#History:
#2023/02/20 junfenghe.cloud@qq.com version:0.0.1
path=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/sbin:/usr/local/bin:~/bin
export path
suff="(s)#####"
if [ -z "${1}" ]
then
echo "Please input database'name"
exit -1
fi
if [ -z "$2" ]
then
echo "Please input times of scanner"
exit -f
fi
if [ -n "${3}" ]
then
suff=${3}
fi
for current in ${2}
do
result_time=""
if [ "${1}" == "starrocks" ]
then
for time in $( cat ${current} | grep sec | awk -F '(' '{print $2}' | awk -F ' ' '{print $1}' )
do
result_time="${result_time}${time}${suff}"
done
elif [ "${1}" == "cloudwave" ]
then
for time in $( cat ${current} | grep Elapsed | awk '{print $2}'| sed 's/:/*60+/g'| sed 's/+00\*60//g ; s/+0\*60//g ; s/^0\*60+//g' )
do
result_time="${result_time}${time}${suff}"
done
fi
echo ${result_time%${suff}*}
done
exit 0
- sql_ssb.sql
use ssb1000;
select sum(lo_revenue) as revenue from lineorder,dates where lo_orderdate = d_datekey and d_year = 1993 and lo_discount between 1 and 3 and lo_quantity < 25;
select sum(lo_revenue) as revenue from lineorder,dates where lo_orderdate = d_datekey and d_yearmonthnum = 199401 and lo_discount between 4 and 6 and lo_quantity between 26 and 35;
select sum(lo_revenue) as revenue from lineorder,dates where lo_orderdate = d_datekey and d_weeknuminyear = 6 and d_year = 1994 and lo_discount between 5 and 7 and lo_quantity between 26 and 35;
select sum(lo_revenue) as lo_revenue, d_year, p_brand from lineorder ,dates,part,supplier where lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_category = 'MFGR#12' and s_region = 'AMERICA' group by d_year, p_brand order by d_year, p_brand;
select sum(lo_revenue) as lo_revenue, d_year, p_brand from lineorder,dates,part,supplier where lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_brand between 'MFGR#2221' and 'MFGR#2228' and s_region = 'ASIA' group by d_year, p_brand order by d_year, p_brand;
select sum(lo_revenue) as lo_revenue, d_year, p_brand from lineorder,dates,part,supplier where lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_brand = 'MFGR#2239' and s_region = 'EUROPE' group by d_year, p_brand order by d_year, p_brand;
select c_nation, s_nation, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and c_region = 'ASIA' and s_region = 'ASIA'and d_year >= 1992 and d_year <= 1997 group by c_nation, s_nation, d_year order by d_year asc, lo_revenue desc;
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and c_nation = 'UNITED STATES' and s_nation = 'UNITED STATES' and d_year >= 1992 and d_year <= 1997 group by c_city, s_city, d_year order by d_year asc, lo_revenue desc;
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and (c_city='UNITED KI1' or c_city='UNITED KI5') and (s_city='UNITED KI1' or s_city='UNITED KI5') and d_year >= 1992 and d_year <= 1997 group by c_city, s_city, d_year order by d_year asc, lo_revenue desc;
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and (c_city='UNITED KI1' or c_city='UNITED KI5') and (s_city='UNITED KI1' or s_city='UNITED KI5') and d_yearmonth = 'Dec1997' group by c_city, s_city, d_year order by d_year asc, lo_revenue desc;
select d_year, c_nation, sum(lo_revenue) - sum(lo_supplycost) as profit from lineorder,dates,customer,supplier,part where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and lo_partkey = p_partkey and c_region = 'AMERICA' and s_region = 'AMERICA' and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2') group by d_year, c_nation order by d_year, c_nation;
select d_year, s_nation, p_category, sum(lo_revenue) - sum(lo_supplycost) as profit from lineorder,dates,customer,supplier,part where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and lo_partkey = p_partkey and c_region = 'AMERICA'and s_region = 'AMERICA' and (d_year = 1997 or d_year = 1998) and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2') group by d_year, s_nation, p_category order by d_year, s_nation, p_category;
select d_year, s_city, p_brand, sum(lo_revenue) - sum(lo_supplycost) as profit from lineorder,dates,customer,supplier,part where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and lo_partkey = p_partkey and c_region = 'AMERICA'and s_nation = 'UNITED STATES' and (d_year = 1997 or d_year = 1998) and p_category = 'MFGR#14' group by d_year, s_city, p_brand order by d_year, s_city, p_brand;



















