
This test report is provided by the PandaDB development team.
Date: March 31, 2021
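1. Test Overview
PandaDB is a system for the unified management of large-scale heterogeneous data, built on the property graph model. To guide further development, we took Neo4j as the reference. Neo4j is currently the most mature and widely used graph database and the performance yardstick for single-machine graph queries. We measured the difference in single-machine graph query performance between PandaDB and Neo4j.
For this test we used the dataset and part of the workload of LDBC, the internationally accepted benchmark for graph databases.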
2. Test Environment
Table 1: Test environment
| Environment | Configuration |
|---|---|
| Hardware | Single physical test machine: dual Intel Xeon Scalable Gold 6230R CPUs, 384 GB DDR4 memory, 220 TB RAID 5 HDD |
| Software | Operating system: CentOS 7.8 (64-bit); JDK version: 1.8 |
| Software versions under test | PandaDB: v0.3.0.210331; Neo4j: v3.5.6 Community |
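3. Test Workload
The test is based on the data and workload of the LDBC benchmark. The test data contains 17 billion edges and 2.5 billion nodes.
First run git clone https://github.com/ldbc/ldbc_snb_datagen, then generate the test data and import it into PandaDB.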
(1) Generating the test data
Edit the params.ini file in the root directory of ldbc_snb_datagen and set generator.scaleFactor to 1000. Then run the command:
tools/run.py --cores 24 --memory 100g ./target/ldbc_snb_datagen-0.4.0-SNAPSHOT-jar-with-dependencies.jar params.ini
The generated data is about 1.3 TB in size.
(2) Importing the test data
The test data was imported into Neo4j and PandaDB separately; the import commands are listed in the Appendix.
Neo4j import time: 1d 5h 40m 49s 176ms.
PandaDB import time: 21h 19m 13s 107ms.
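The two import times are easier to compare once converted to a common unit. The following is a minimal Python sketch, not part of the original test harness; the helper name parse_duration_ms is hypothetical and assumes durations are written in the "1d 5h 40m 49s 176ms" form used in this report.

```python
import re

# Milliseconds per unit for the duration format used in this report.
UNIT_MS = {"d": 86_400_000, "h": 3_600_000, "m": 60_000, "s": 1_000, "ms": 1}


def parse_duration_ms(text: str) -> int:
    """Convert a string such as '21h 19m 13s 107ms' to milliseconds."""
    total = 0
    # "ms" is listed first so that it is not mistaken for minutes.
    for value, unit in re.findall(r"(\d+)\s*(ms|d|h|m|s)\b", text):
        total += int(value) * UNIT_MS[unit]
    return total


# Import times as reported above.
neo4j_ms = parse_duration_ms("1d 5h 40m 49s 176ms")
pandadb_ms = parse_duration_ms("21h 19m 13s 107ms")

print(f"Neo4j import:   {neo4j_ms / 3_600_000:.2f} h")
print(f"PandaDB import: {pandadb_ms / 3_600_000:.2f} h")
print(f"Neo4j / PandaDB import time: {neo4j_ms / pandadb_ms:.2f}")
```

With the reported figures the ratio comes out at roughly 1.39, that is, the Neo4j import took about 39% longer than the PandaDB import.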
(3) Data indexes
:person("id")
:post("id")
:comment("id")
:person("firstName")
(4) Data volume
PandaDB occupies 2.4 TB on disk; Neo4j occupies 1.8 TB.
4. Test Queries
Table 2: Test workload used in this report (Cypher queries)
| No. | Query | Corresponding LDBC query | What is tested |
|---|---|---|---|
| C1 | MATCH (n:person{firstName:"%s"}) RETURN n | | Filter nodes by a non-unique property |
| C2 | MATCH (m:comment {id: "%s"}) RETURN m.creationDate AS messageCreationDate, m.content as content | interactive-short4 | Filter nodes by a unique property |
| C3 | MATCH (n:person {id:"%s"})-[r:knows]-(friend:person{lastName:"Sharma"}) RETURN id(friend) | interactive-short3 | One-hop relationship, returns id |
| C4 | MATCH (n:person{id:"%s"})-[r:knows]-(friend) RETURN friend.id AS personId, friend.firstName AS firstName, friend.lastName AS lastName, r.creationDate AS friendshipCreationDate | interactive-short3 | One-hop relationship, returns node data |
| C5 | MATCH (n:person {id:"%s"})-[:isLocatedIn]->(p:place) RETURN n.firstName AS firstName, n.lastName AS lastName, n.birthday AS birthday, n.locationIP AS locationIP, n.browserUsed AS browserUsed, p.id AS cityId, n.gender AS gender, n.creationDate AS creationDate | interactive-short1 | One-hop relationship, returns node data |
| C6 | MATCH (m:comment{id:"%s"})-[:hasCreator]->(p:person) RETURN p.id AS personId, p.firstName AS firstName, p.lastName AS lastName | interactive-short5 | One-hop relationship, returns node data |
| C7 | MATCH (n:person {id:"%s"})-[:knows]-> () -[:knows]->(m:person{gender:"male"}) RETURN id(m) | | Two-hop relationship, with property filters on the first and last nodes |
| C8 | MATCH (n:person {id:"%s"})-[:knows]-> () -[:knows]->(m:person) RETURN m.firstName AS firstName, m.lastName AS lastName, m.birthday AS birthday, m.locationIP AS locationIP, m.browserUsed AS browserUsed | | Two-hop relationship, returns properties |
| C9 | MATCH (:person {id:"%s"})<-[:hasCreator]-(m)-[:replyOf]->(p:post)-[:hasCreator]->(c) RETURN m.id AS messageId, m.creationDate AS messageCreationDate, p.id AS originalPostId, c.id AS originalPostAuthorId, c.firstName AS originalPostAuthorFirstName, c.lastName AS originalPostAuthorLastName | interactive-short2 | Three-hop relationship |
| C10 | MATCH (m:comment{id:"%s"})-[:replyOf]->(p:post)<-[:containerOf]-(f:forum)-[:hasModerator]->(mod:person) RETURN f.id AS forumId, f.title AS forumTitle, mod.id AS moderatorId, mod.firstName AS moderatorFirstName, mod.lastName AS moderatorLastName | interactive-short6 | Three-hop relationship |
| C11 | MATCH (m:post{id:"%s"})<-[:replyOf]-(c:comment)-[:hasCreator]->(p:person) RETURN c.id AS commentId, c.content AS commentContent, c.creationDate AS commentCreationDate, p.id AS replyAuthorId, p.firstName AS replyAuthorFirstName, p.lastName AS replyAuthorLastName | interactive-short7 (first half) | Two-hop relationship |
| C12 | MATCH (m:post{id:"%s"})-[:hasCreator]->(a:person)-[r:knows]-(p) RETURN m.id AS postId, m.language as postLanguage, p.id AS replyAuthorId, p.firstName AS replyAuthorFirstName, p.lastName AS replyAuthorLastName | interactive-short7 (second half) | Two-hop relationship |
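Each query in Table 2 is a template in which "%s" stands for a concrete property value. The report does not say how the placeholder was filled during the test, so the Python sketch below is only one plausible reading, assuming plain printf-style substitution. The names QUERY_TEMPLATES and render_query and the id "1234" are hypothetical and not taken from the test harness or the test data.

```python
# Two of the query templates from Table 2; "%s" marks the value placeholder.
QUERY_TEMPLATES = {
    "C2": 'MATCH (m:comment {id: "%s"}) RETURN m.creationDate AS '
          'messageCreationDate, m.content as content',
    "C6": 'MATCH (m:comment{id:"%s"})-[:hasCreator]->(p:person) RETURN '
          'p.id AS personId, p.firstName AS firstName, p.lastName AS lastName',
}


def render_query(query_id: str, value: str) -> str:
    """Substitute one value into the named template (assumed printf-style)."""
    # Reject characters that would break out of the quoted Cypher string.
    if '"' in value or "\\" in value:
        raise ValueError("value must not contain quotes or backslashes")
    return QUERY_TEMPLATES[query_id] % value


# The id below is a made-up example, not a value from the test data.
print(render_query("C2", "1234"))
```

Outside a benchmark script, passing the value as a Cypher query parameter is usually preferable to string substitution, because it avoids quoting problems altogether.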
5. Test Results
Table 3: Test results (ms)
| Query | Neo4j query time | PandaDB query time | Speedup[1] (PandaDB relative to Neo4j) |
|---|---|---|---|
| C1 | 998 | 1,125 | 0.89 |
| C2 | 154 | 54 | 2.85 |
| C3 | 7,381 | 1,197 | 6.17 |
| C4 | 1,261 | 473 | 2.67 |
| C5 | 68 | 109 | 0.62 |
| C6 | 139 | 126 | 1.10 |
| C7 | 2,218 | 486 | 4.56 |
| C8 | 3,275 | 2,447 | 1.34 |
| C9 | 37,793 | 27,743 | 1.36 |
| C10 | 164 | 169 | 0.97 |
| C11 | 117 | 107 | 1.09 |
| C12 | 2,232 | 212 | 10.53 |


Appendix: Test data import commands
The import commands are shown below. Replace <data-dir> with the actual path where the data is stored.
(1) Neo4j data import command
nohup neo4j-community-3.5.6/bin/neo4j-admin import --database graph1000.db \
  --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/tag-output.csv \
  --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/comment-output.csv \
  --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/tagclass-output.csv \
  --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/person-output.csv \
  --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/forum-output.csv \
  --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/post-output.csv \
  --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/organisation-output.csv \
  --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/place-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/organisation_isLocatedIn_place-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_knows_person-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/post_hasCreator_person-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/tagclass_isSubclassOf_tagclass-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_studyAt_organisation-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/forum_hasTag_tag-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/comment_replyOf_comment-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_likes_comment-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/forum_hasMember_person-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_workAt_organisation-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/comment_hasCreator_person-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_likes_post-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/place_isPartOf_place-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/post_hasTag_tag-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/comment_isLocatedIn_place-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/comment_hasTag_tag-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/tag_hasType_tagclass-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/forum_hasModerator_person-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/comment_replyOf_post-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_isLocatedIn_place-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/post_isLocatedIn_place-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/forum_containerOf_post-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_hasInterest_tag-output.csv \
  --delimiter "|" --array-delimiter ";" > neo4j-import-0303.log 2>&1 &
(2) PandaDB data import command
nohup java -jar pandadb-importer-v0.3.jar --db-path=<data-dir>/panda-server/ldbc-1000.0302.db \
  --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/tag-output.csv \
  --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/comment-output.csv \
  --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/tagclass-output.csv \
  --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/person-output.csv \
  --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/forum-output.csv \
  --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/post-output.csv \
  --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/organisation-output.csv \
  --nodes=<data-dir>/ldbc/ldbc-out/ldbc-1000/nodes/place-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/organisation_isLocatedIn_place-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_knows_person-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/post_hasCreator_person-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/tagclass_isSubclassOf_tagclass-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_studyAt_organisation-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/forum_hasTag_tag-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/comment_replyOf_comment-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_likes_comment-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/forum_hasMember_person-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_workAt_organisation-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/comment_hasCreator_person-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_likes_post-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/place_isPartOf_place-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/post_hasTag_tag-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/comment_isLocatedIn_place-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/comment_hasTag_tag-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/tag_hasType_tagclass-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/forum_hasModerator_person-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/comment_replyOf_post-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_isLocatedIn_place-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/post_isLocatedIn_place-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/forum_containerOf_post-output.csv \
  --relationships=<data-dir>/ldbc/ldbc-out/ldbc-1000/relations/person_hasInterest_tag-output.csv \
  --delimeter="|" --array-delimeter=";" > 1000-0302.log 2>&1 &
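[1] Note: speedup = Neo4j query time / PandaDB query time. A larger value indicates a greater performance advantage for PandaDB; a value of 1 means the two systems have the same query performance.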
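The speedup column of Table 3 can be reproduced directly from the two timing columns using the formula in note [1]. The Python sketch below is an illustration only; the names RESULTS_MS and speedup are hypothetical, and the timings are copied from Table 3.

```python
# Query times in milliseconds, copied from Table 3: (Neo4j, PandaDB).
RESULTS_MS = {
    "C1": (998, 1125),
    "C2": (154, 54),
    "C3": (7381, 1197),
    "C4": (1261, 473),
    "C5": (68, 109),
    "C6": (139, 126),
    "C7": (2218, 486),
    "C8": (3275, 2447),
    "C9": (37793, 27743),
    "C10": (164, 169),
    "C11": (117, 107),
    "C12": (2232, 212),
}


def speedup(neo4j_ms: float, pandadb_ms: float) -> float:
    """Speedup as defined in note [1]: Neo4j time divided by PandaDB time."""
    return neo4j_ms / pandadb_ms


for query_id, (neo4j_ms, pandadb_ms) in RESULTS_MS.items():
    print(f"{query_id}: {speedup(neo4j_ms, pandadb_ms):.2f}")

# Queries on which PandaDB was faster than Neo4j (speedup above 1).
faster = [q for q, (n, p) in RESULTS_MS.items() if speedup(n, p) > 1]
print(f"PandaDB faster on {len(faster)} of {len(RESULTS_MS)} queries")
```

By this definition PandaDB is faster on 9 of the 12 queries; on C1, C5 and C10 the speedup is below 1, meaning Neo4j was faster there.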