DDIA Ch6

Partition

split of DB to multiple nodes

Parition key-value

Partitioning by Key Range

just like encyclopedia, you assign range of key to a node

Partitioning by hash

using the hash of the key for partitioning we lose a nice property of key-range partitioning: the ability to do efficient range queries (good for OLAP)

Range queries on primary key are not supported by Riak, Couchbase, or Voldemort

所以這些DB不適合做analytics? 應(yīng)該可以用他們做OLTP, 然后在轉(zhuǎn)到data warehouse 的時候轉(zhuǎn)成別的DB(適合range queries的DB) 進(jìn)行存儲

Partitioning Secondary indexes by document

![[DDIA-Figure-6-4.png]]
這張圖很好的概括了partitioning by document 什么意思,就是每次把相關(guān)的secondary index 按照document來區(qū)分,然后查找的時候要從所有的partition 里面找,這個找的過程也叫scatter/gather

scatter/gather is prone to tail latency amplification

Partitioning by term

![[DDIA-figure-6-5.png]]
partitioning by term 就是說把所有color 從 a-z分開(partition), 圖中這個例子是a-r開始的都被分到了partition 0 里面,s-z 分到了partition1 里面

term partition對于document partition的好處在于不用scatter/gather 了,

a client only needs to make a request to the partition containing the term that it wants. However, downside of a global index is that writes are slower and more complicated

term partition也叫g(shù)lobal index,寫入操作會比較麻煩,因為你要計算這個term需要分配到那個partition,所以一個寫入操作會分到不同的node (本身寫入操作需要存儲的node,以及存 global index 的node。

Massively Parallel Processing(MPP)

MPP is different from simple queries that read or write a single key. Massively parallel processing often used for analytics, usually in relational database products, are much more sophisticated in the types of queries they support.

大數(shù)據(jù)吞吐,通常是relational DB 支持的query

A typical data warehouse query contains several join, filtering grouping, and aggregation operations. The MPP query optimizer breaks this complex query into a number of execution stages and partitions, many of which can be excuted in parallel on different nodes of the database cluster

就是因為data warehouse 的query通常很復(fù)雜(因為他要用不同的filter,join 來分析數(shù)據(jù)),所以MPP query optimizer 通常會把這些復(fù)雜的query拆分成不同階段的執(zhí)行命令和對應(yīng)的partition,然后并行執(zhí)行,executed in parallel on different nodes of the database cluster

總結(jié)

partition 是在dataset很大的時候必須做的事情,能夠增加scalable的必要步驟,通常有2種partition方式

partition的目的是把query合理的分配到不同的機器上,避免hotspot,不然你partition的意義就沒了,為了避免hotspot,我們就需要選擇partition scheme

(scheme:mac字典上的定義:a large-scale systematic plan or arrangement for attaining a particular object or putting a particular idea into effect
mac上字典的同義詞:Plan, game plan )

有了scheme之后,你就要選擇如何balance了,因為dataset會增加或者減少,增加的時候超過一個threshold,那就需要rebalance,或者你有的node需要維護(hù),那就要移出當(dāng)前的cluster,然后做rebalance,或者你要增加一個node,這時候也要rebalance

這一章討論了兩種partitioning scheme?

  • partition by key range
    Where key are sorted, and a partition owns all the keys from some minimum up to some maximum. Sorting has the advantage that efficient range queries are possible, but there is a risk of hot spots

    In this approach, partitions are typically rebalanced dynamically by splitting the range into two subranges when a partition gets too big

  • Hash partitioning, where a hash function is applied to each key, and a partition owns a range of hashes. This method destroys the ordering of keys, making range queries inefficient, but may distribute load more evenly

    When partitioning by hash, it is common to create a fixed number of partitions in advance, to assign several partitions to each node, and to move entire parti‐ tions from one node to another when nodes are added or removed. Dynamic partitioning can also be used.

LSM 跟Btree也有這個問題吧?就是你要選擇使用key range,還是hash

Hybrid approaches are also possible, for example with a compound key: using one part of the key to identify the partition and another part for the sort order.

這一章還討論了secondary indexes. secondary index 也需要partition, 也有兩種方案來partition

  • Document-partitioned indexes (local indexes). 這個方案優(yōu)點在于寫入的時候只需要寫入一個node,因為每個partition存的就是自己的secondary index,但是讀取的時候需要scatter/gather
  • Term-partitioned indexes (global indexes), 這個方案是把secondary index按照分類存到一個node里面,比如color從a-r放到node0 里面, color 從s-z開頭的顏色放到node1里面,參考本文圖片[[#Partitioning by term]] 這種方法好處就是讀取的時候,一個index就存了所有有關(guān)這個secondary index的所有primary key,但寫入的時候就需要把有關(guān)這個secondary index相關(guān)的partition全部更新

所以總結(jié)一下,document 是寫入方便,讀取麻煩,term是讀取方便,寫入麻煩

最后這篇文章還討論了routing的方法,Zookeeper, 不同的routing method(client aware, query all nodes, Zookeeper-authoritative node )
一般zookeeper就是存儲所有關(guān)于哪個 partition 在哪里的key-value pair,包括IP地址等信息都存在里面,所以是authoritative node,然后有query router 會subscribe zookeeper 的信息,zookeeper 每次更新都會通知subscriber

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時請結(jié)合常識與多方信息審慎甄別。
平臺聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點,簡書系信息發(fā)布平臺,僅提供信息存儲服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容