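Lei Zou (Peking University): RDF Data Management (Lecture 3)

These notes are from the third lecture of the knowledge graph lecture series given by Professor Lei Zou of Peking University.

RDF Graph Data Management

Video: https://www.bilibili.com/video/BV1yg4y1v7aD

Outline: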

  • RDF Database Systems
    • Background
    • Existing Solutions
    • Graph-Based Approaches
  • Distributed System
    • Existing Solutions
    • gStore: a distributed SPARQL query engine

RDF and Semantic Web

  • RDF (Resource Description Framework) is a language for the conceptual modeling of information about web resources
  • A building block of the Semantic Web
    • Facilitates exchange of information
    • Search engines can retrieve more relevant information
    • Facilitates data integration (mashups)
  • Machine-understandable
    • Machines can understand the information on the web and the interrelationships among the resources it describes

1. RDF Database Systems

RDF Introduction

  • Everything is a uniquely named resource
  • Namespaces can be used to scope the names
  • Properties of resources can be defined
  • Relationships with other resources can be defined
  • Resources can be contributed by different people / groups and can be located anywhere in the web
    • Integrated web "database"

RDF Data Model

RDF data is essentially a set of triples, which can be represented as a graph.
RDF Graph
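
To make the "set of triples as a graph" view concrete, here is a minimal sketch in Python. The `ex:` names and the `build_graph` helper are made up for illustration and do not come from the lecture; each triple (subject, predicate, object) becomes a labeled edge from subject to object.

```python
from collections import defaultdict

# Hypothetical example data: each triple is (subject, predicate, object).
triples = {
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Alice", "ex:worksAt", "ex:PKU"),
    ("ex:Bob", "ex:worksAt", "ex:PKU"),
}

def build_graph(triples):
    """Turn a set of triples into an adjacency map:
    subject -> list of (predicate, object) outgoing edges."""
    graph = defaultdict(list)
    # Sort so the edge order is deterministic (sets are unordered).
    for s, p, o in sorted(triples):
        graph[s].append((p, o))
    return dict(graph)

graph = build_graph(triples)
for subject, edges in graph.items():
    for predicate, obj in edges:
        print(f"{subject} --{predicate}--> {obj}")
```

Vertices that only ever appear as objects (here `ex:PKU`) have no outgoing edges, so they do not get their own key in this simple adjacency map.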

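RDF Query Model

Traditional storage and query approach:

  • Store the triples in a relational database
  • Translate each SPARQL query into an SQL query and execute it
  • The drawback is that query performance becomes very poor on large datasets, since every additional triple pattern in the query adds another self-join on the triples table.
  • Existing optimizations:
    • Property Tables
    • Binary Tables
    • Exhaustive Indexing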
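
The relational approach can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumed names: the single `triples(s, p, o)` table, the sample data, and the hand-written SQL translation are not taken from the lecture. It shows why a SPARQL query with three triple patterns turns into three scans of the same table joined together.

```python
import sqlite3

# Assumed schema: one big table holding every triple.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("Alice", "knows", "Bob"),
        ("Alice", "worksAt", "PKU"),
        ("Bob", "worksAt", "PKU"),
        ("Carol", "worksAt", "MIT"),
    ],
)

# SPARQL (illustrative):
#   SELECT ?x ?y WHERE { ?x knows ?y . ?x worksAt ?org . ?y worksAt ?org }
# A hand translation to SQL: one alias of the table per triple pattern,
# with shared variables turned into join conditions.
sql = """
SELECT t1.s AS x, t1.o AS y
FROM triples AS t1
JOIN triples AS t2 ON t2.s = t1.s AND t2.p = 'worksAt'
JOIN triples AS t3 ON t3.s = t1.o AND t3.p = 'worksAt' AND t3.o = t2.o
WHERE t1.p = 'knows'
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

Each extra triple pattern adds one more alias of `triples` to the join, which is the cost that property tables, binary tables, and exhaustive indexing try to reduce.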

Graph-Based Approaches

gStore

  • We work directly on the RDF graph and the SPARQL query graph (that is, SPARQL queries are answered with graph techniques)
    • Answering a SPARQL query = subgraph matching
    • Subgraph matching is computationally expensive
  • Use a signature-based encoding of each entity and class vertex to speed up matching
  • Filter-and-evaluate
    • Use a pruning algorithm that may produce false positives (but no false negatives) to obtain a set of candidates, then evaluate those candidates in detail
  • We develop an index over the data signature graph for efficient pruning
    Core idea: answer SPARQL queries by subgraph matching
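
The filter-and-evaluate idea can be illustrated with a small Python sketch. This is a simplified stand-in and not gStore's actual encoding or index: the 64-bit width, the CRC32 hash, the helper names, and the restriction to a single query variable with constant neighbors are all assumptions made for the example. Each vertex gets a bit-string signature built from its adjacent (edge label, neighbor) pairs; a data vertex can only match the query vertex if its signature covers every bit of the query signature.

```python
import zlib

SIG_BITS = 64  # assumed signature width, chosen only for this sketch

def edge_bit(label, neighbor):
    """Hash one (edge label, neighbor) pair to a single bit position."""
    return 1 << (zlib.crc32(f"{label}|{neighbor}".encode()) % SIG_BITS)

def signature(edges):
    """OR together the bits of all adjacent edges of a vertex."""
    sig = 0
    for label, neighbor in edges:
        sig |= edge_bit(label, neighbor)
    return sig

# Hypothetical data graph: vertex -> outgoing (label, neighbor) edges.
data = {
    "Alice": [("worksAt", "PKU"), ("knows", "Bob")],
    "Bob": [("worksAt", "PKU")],
    "Carol": [("worksAt", "MIT"), ("knows", "Bob")],
}
# Query: find ?x such that ?x worksAt PKU and ?x knows Bob.
query = [("worksAt", "PKU"), ("knows", "Bob")]

def filter_candidates(data_sigs, q_sig):
    """Filter step: cheap bitwise test. Hash collisions can let wrong
    vertices through (false positives), but a true match is never dropped."""
    return [v for v, sig in data_sigs.items() if (sig & q_sig) == q_sig]

def evaluate(data, candidates, query):
    """Evaluate step: exact check of every query edge on each candidate."""
    return [v for v in candidates if set(query) <= set(data[v])]

data_sigs = {v: signature(edges) for v, edges in data.items()}
candidates = filter_candidates(data_sigs, signature(query))
answers = evaluate(data, candidates, query)
print(answers)
```

The exact check only runs on the vertices that survive the bitwise filter, which is where the speed-up comes from when the data graph is large. A full SPARQL query has several variables connected by edges, so the real evaluation step is a subgraph match rather than this per-vertex check.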

Subgraph Isomorphism

  • Edge-Join Strategy
  • Vertex-Join Strategy

2. Distributed System

RDF Data Volumes

  • RDF data volumes are growing fast
    • The Linked Data cloud currently consists of 325 datasets with more than 25 billion triples
    • Its size almost doubles every year
  • Linking Open Data cloud diagram

Large RDF Datasets

  • RDF datasets keep getting larger
  • Examples: YAGO, Freebase, DBpedia
  • Massive volumes of RDF data are growing beyond the capacity of RDF database systems operating on a single machine.

gStore: a distributed SPARQL query engine

System Framework: Overview