1. A BRIEF HISTORY OF DBMSs AND THE RISE OF NEWSQL
(1) the older DBMSs from the 1980-1990s
(2) the NoSQL DBMSs from the 2000s
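With the growth of the Internet, the volume of data that enterprises need to store has kept increasing, while online services must remain available 24 hours a day without interruption.
Enterprise solutions evolved through the following stages, (a) to (c):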
(a) Scale up the DBMS simply by moving it to a more powerful machine.
Drawbacks:
(1) As the data volume keeps growing, this approach soon hits a bottleneck again;
(2) Migrating data from the old server to the new one often requires downtime, during which the application cannot serve requests.
(b) To address the problems in (a), enterprises turned to middleware that shards the data of a single-node DBMS across a cluster of inexpensive commodity machines. To the application, the middleware logically looks like a single-node DBMS. When the application issues a query, the middleware redirects and/or rewrites the request to one or more nodes in the cluster, combines the results returned by those nodes, and sends them back to the application.
Drawbacks:
Middleware performed acceptably for simple queries such as reading or updating a single record, but early middleware could not handle queries that update multiple records in one transaction or that join records from multiple tables.
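(c) Some enterprises abandoned middleware and built their own distributed DBMSs. Besides the limitation described in (b), three main reasons drove this: (1) traditional DBMSs were seen as favoring consistency and correctness at the expense of availability and performance, a trade-off deemed unnecessary for applications that must be online 24 hours a day and support a large number of concurrent operations; (2) using a full-featured DBMS such as MySQL was considered unnecessary for these applications; (3) the relational model was not considered the best way to represent application data, and SQL was seen as overkill for simple look-up queries.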
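The middleware approach in (b) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not how any particular product works: `ShardingMiddleware`, its methods, and the use of Python dicts as stand-ins for single-node DBMSs are all hypothetical, and rows are routed by hashing the key modulo the number of nodes. Single-record reads and writes go to exactly one node; a predicate query is sent to every node and the partial results are merged. The `transfer` method shows why multi-record transactions were hard: its two writes may land on different nodes, and nothing here makes them atomic.

```python
class ShardingMiddleware:
    """Hypothetical middleware: presents N single-node stores as one logical DBMS."""

    def __init__(self, num_nodes):
        # Each "node" is a plain dict standing in for a single-node DBMS.
        self.nodes = [{} for _ in range(num_nodes)]

    def _node_for(self, key):
        # Route by hash partitioning on the sharding key.
        return self.nodes[hash(key) % len(self.nodes)]

    def put(self, key, row):
        # Single-record write: redirected to exactly one node.
        self._node_for(key)[key] = row

    def get(self, key):
        # Single-record read: redirected to exactly one node.
        return self._node_for(key).get(key)

    def select_where(self, predicate):
        # Query not keyed on the sharding key: sent to every node (scatter),
        # then the partial results are concatenated (gather).
        results = []
        for node in self.nodes:
            results.extend(row for row in node.values() if predicate(row))
        return results

    def transfer(self, src, dst, amount):
        # Two writes that may land on different nodes. Nothing here makes them
        # atomic: a failure between the two puts would leave only one applied.
        a, b = self.get(src), self.get(dst)
        self.put(src, {**a, "balance": a["balance"] - amount})
        self.put(dst, {**b, "balance": b["balance"] + amount})


mw = ShardingMiddleware(num_nodes=3)
for i in range(6):
    mw.put(i, {"id": i, "balance": 100})
print(mw.get(4))  # {'id': 4, 'balance': 100}
print(len(mw.select_where(lambda r: r["balance"] >= 100)))  # 6
mw.transfer(0, 1, 30)  # keys 0 and 1 live on different nodes
print(mw.get(0), mw.get(1))  # {'id': 0, 'balance': 70} {'id': 1, 'balance': 130}
```

Hash partitioning keeps single-key lookups cheap, but any query that is not keyed on the sharding key has to fan out to every node.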
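These problems gave rise to NoSQL. NoSQL systems are mainly characterized by giving up strong transactional guarantees and the relational model (both were thought to limit the scalability and high availability that web-based applications need from a DBMS), and instead adopting eventual consistency and alternative data models (e.g., key/value, graphs, documents).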
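A minimal sketch of eventual consistency on a key/value store, assuming a last-writer-wins (LWW) conflict policy with caller-supplied, unique timestamps. LWW is only one common choice, used here for illustration; the `Replica` class and its methods are hypothetical. Each replica accepts writes locally without coordination, so replicas can temporarily disagree; once they exchange state (`sync_from`), they converge to the same value.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical key/value replica; each value carries its write timestamp."""
    name: str
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)

    def write(self, key, value, ts):
        # Accept the write locally, without coordinating with other replicas.
        self._apply(key, ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def _apply(self, key, ts, value):
        # Last-writer-wins: keep whichever version has the larger timestamp.
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def sync_from(self, other):
        # Anti-entropy: merge every entry held by a peer replica.
        for key, (ts, value) in other.data.items():
            self._apply(key, ts, value)


a, b = Replica("A"), Replica("B")
a.write("cart", ["book"], ts=1)
b.write("cart", ["pen"], ts=2)
print(a.read("cart"), b.read("cart"))  # ['book'] ['pen']  (diverged)
a.sync_from(b)
b.sync_from(a)
print(a.read("cart"), b.read("cart"))  # ['pen'] ['pen']  (converged)
```

Convergence comes at a cost: under LWW the earlier write (`["book"]`) is silently discarded, which is the kind of anomaly that financial or order-processing applications cannot tolerate.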
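The advantage of NoSQL is that developers can focus on building the application instead of worrying about how to scale the DBMS. The drawback is that systems such as those handling finances or orders cannot give up strong transactional guarantees and strong consistency, so developers must spend considerable effort writing application code to handle data consistency and transactions themselves.
This kind of requirement gave rise to NewSQL.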
(3) the NewSQL DBMSs from the 2010s
The paper defines NewSQL as follows:
Our definition of NewSQL is that they are a class of modern relational DBMSs that seek to provide the same scalable performance of NoSQL for OLTP read-write workloads while still maintaining ACID guarantees for transactions. In other words, these systems want to achieve the same scalability of NoSQL DBMSs from the 2000s, but still keep the relational model (with SQL) and transaction support of the legacy DBMSs from the 1970–80s. This enables applications to execute a large number of concurrent transactions to ingest new information and modify the state of the database using SQL (instead of a proprietary API).
The paper notes that the OLAP (on-line analytical processing) data warehouses from the 2000s should not be classified as NewSQL. OLAP DBMSs are focused on executing complex read-only queries that take a long time to process large data sets. NewSQL workloads, in contrast, (1) execute read-write transactions, (2) touch a small subset of data using index lookups, and (3) are repetitive (executing the same queries with different inputs).
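The workload shape described above can be illustrated with Python's built-in `sqlite3` module. SQLite is a single-node embedded DBMS, not a NewSQL system; it is used here only as an assumption to show what an OLTP-style transaction looks like next to an OLAP-style query. The `accounts` table and the `transfer` function are hypothetical. Each `transfer` is a short ACID read-write transaction that touches two rows through primary-key lookups and is run repeatedly with different inputs; the final query scans the whole table, as an OLAP query would.

```python
import sqlite3

# isolation_level=None: the sqlite3 module opens no implicit transactions,
# so BEGIN / COMMIT / ROLLBACK below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(i, 100) for i in range(1000)])


def transfer(conn, src, dst, amount):
    """OLTP-style: a short read-write transaction touching two rows by primary key."""
    conn.execute("BEGIN")
    try:
        row = conn.execute("SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
        if row is None or row[0] < amount:
            raise ValueError("unknown account or insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # undo any partial update
        raise


# Repetitive: the same parameterized statements, different inputs each time.
for src, dst in [(1, 2), (3, 4), (5, 6)]:
    transfer(conn, src, dst, 10)

# OLAP-style, for contrast: a read-only aggregate over every row.
total, avg = conn.execute("SELECT SUM(balance), AVG(balance) FROM accounts").fetchone()
print(total, avg)  # 100000 100.0
```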
2. CATEGORIZATION
Based on how the systems are implemented, the paper divides NewSQL systems into three categories:
(1) New Architectures
(2) Transparent Sharding Middleware
(3) Database-as-a-Service
3. THE STATE OF THE ART (what is novel in these NewSQL systems)
Main Memory Storage
Partitioning / Sharding
Concurrency Control
Secondary Indexes
Replication
Crash Recovery