Original article: https://github.com/facebook/rocksdb/wiki/Basic-Operations
Basic operations
The <code>rocksdb</code> library provides a persistent key-value store. Keys and values are arbitrary byte arrays. Keys are ordered within the store according to a user-specified comparator function.
Opening A Database
A <code>rocksdb</code> database has a name that corresponds to a file system directory. All of the contents of the database are stored in this directory. The following example shows how to open a database, creating it if necessary:
#include <cassert>
#include "rocksdb/db.h"
rocksdb::DB* db;
rocksdb::Options options;
options.create_if_missing = true;
rocksdb::Status status = rocksdb::DB::Open(options, "/tmp/testdb", &db);
assert(status.ok());
...
To raise an error if the database already exists, add the following line before the <code>rocksdb::DB::Open</code> call:
options.error_if_exists = true;
If you are porting code from <code>leveldb</code> to <code>rocksdb</code>, you can convert your <code>leveldb::Options</code> object to a <code>rocksdb::Options</code> object using <code>rocksdb::LevelDBOptions</code>, which has the same functionality as <code>leveldb::Options</code>:
#include "rocksdb/utilities/leveldb_options.h"
rocksdb::LevelDBOptions leveldb_options;
leveldb_options.option1 = value1;
leveldb_options.option2 = value2;
...
rocksdb::Options options = rocksdb::ConvertOptions(leveldb_options);
RocksDB Options
You can set option fields explicitly in code, as shown above. Alternatively, you can set them through a string-to-string map or an option string. See [[Option String and Option Map]].
Some options can be changed dynamically while the DB is running. For example:
rocksdb::Status s;
s = db->SetOptions({{"write_buffer_size", "131072"}});
assert(s.ok());
s = db->SetDBOptions({{"max_background_flushes", "2"}});
assert(s.ok());
RocksDB automatically keeps the options used in the database in OPTIONS-xxxx files under the DB directory. Users can preserve option values across DB restarts by extracting options from these option files. See [[RocksDB Options File]].
Status
You may have noticed the <code>rocksdb::Status</code> type above. Values of this type are returned by most functions in <code>rocksdb</code> that may encounter an error. You can check whether such a result is ok, and also print an associated error message:
#include <iostream>
rocksdb::Status s = ...;
if (!s.ok()) std::cerr << s.ToString() << std::endl;
Closing A Database
When you are done with a database, there are two ways to close it gracefully:
- Simply delete the database object. This releases all the resources that were held while the database was open. However, if an error is encountered while releasing a resource, for example when closing the info_log file, that error is lost.
- Call <code>DB::Close()</code>, followed by deleting the database object. <code>DB::Close()</code> returns a <code>Status</code>, which can be examined to determine whether there were any errors. Regardless of errors, <code>DB::Close()</code> releases all resources and is irreversible.
Example:
... open the db as described above ...
... do something with db ...
delete db;
Or
... open the db as described above ...
... do something with db ...
Status s = db->Close();
... log status ...
delete db;
Reads
The database provides <code>Put</code>, <code>Delete</code>, <code>Get</code>, and <code>MultiGet</code> methods to modify and query the database. For example, the following code moves the value stored under key1 to key2.
std::string value;
rocksdb::Status s = db->Get(rocksdb::ReadOptions(), key1, &value);
if (s.ok()) s = db->Put(rocksdb::WriteOptions(), key2, value);
if (s.ok()) s = db->Delete(rocksdb::WriteOptions(), key1);
Currently, the size of a value must be smaller than 4 GB.
RocksDB also supports [[Single Delete]], which is useful in some special cases.
Each <code>Get</code> results in at least one memcpy from the source to the value string. If the source is in the block cache, you can avoid the extra copy by using a <code>PinnableSlice</code>.
PinnableSlice pinnable_val;
rocksdb::Status s = db->Get(rocksdb::ReadOptions(), key1, &pinnable_val);
The source is released once pinnable_val is destructed or ::Reset is invoked on it. Read more here: http://rocksdb.org/blog/2017/08/24/pinnableslice.html
When reading multiple keys from the database, <code>MultiGet</code> can be used. There are two variations of <code>MultiGet</code>: (1) read multiple keys from a single column family in a more performant manner, i.e., it can be faster than calling <code>Get</code> in a loop, and (2) read keys across multiple column families, consistent with each other.
For example,
std::vector<Slice> keys;
std::vector<PinnableSlice> values;
std::vector<Status> statuses;
for ... {
keys.emplace_back(key);
}
values.resize(keys.size());
statuses.resize(keys.size());
db->MultiGet(ReadOptions(), cf, keys.size(), keys.data(), values.data(), statuses.data());
To avoid the overhead of memory allocations, the keys, values and statuses above can be of type std::array on the stack, or any other type that provides contiguous storage.
Or
std::vector<ColumnFamilyHandle*> column_families;
std::vector<Slice> keys;
std::vector<std::string> values;
for ... {
keys.emplace_back(key);
column_families.emplace_back(column_family);
}
values.resize(keys.size());
std::vector<Status> statuses = db->MultiGet(ReadOptions(), column_families, keys, &values);
For a more in-depth discussion of performance benefits of using MultiGet, see [[MultiGet Performance]].
Writes
Atomic Updates
Note that if the process dies after the Put of key2 but before the Delete of key1, the same value may be left stored under multiple keys. Such problems can be avoided by using the <code>WriteBatch</code> class to atomically apply a set of updates:
#include "rocksdb/write_batch.h"
...
std::string value;
rocksdb::Status s = db->Get(rocksdb::ReadOptions(), key1, &value);
if (s.ok()) {
rocksdb::WriteBatch batch;
batch.Delete(key1);
batch.Put(key2, value);
s = db->Write(rocksdb::WriteOptions(), &batch);
}
The <code>WriteBatch</code> holds a sequence of edits to be made to the database, and these edits within the batch are applied in order. Note that we called <code>Delete</code> before <code>Put</code> so that if <code>key1</code> is identical to <code>key2</code>, we do not end up erroneously dropping the value entirely.
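The ordering point can be illustrated without RocksDB at all. The following is a minimal sketch that uses a <code>std::map</code> as a stand-in for the database; <code>Edit</code>, <code>ApplyInOrder</code> and <code>Move</code> are hypothetical names invented for this illustration and are not part of the RocksDB API. It only shows why recording the delete before the put matters when both keys are the same.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for one edit in a batch. This is not the RocksDB API.
struct Edit {
  bool is_delete;     // true: remove `key`; false: store `value` under `key`
  std::string key;
  std::string value;  // ignored for deletes
};

using Store = std::map<std::string, std::string>;

// Apply the edits to the store in the order they were recorded.
void ApplyInOrder(Store& store, const std::vector<Edit>& batch) {
  for (const Edit& e : batch) {
    if (e.is_delete) {
      store.erase(e.key);
    } else {
      store[e.key] = e.value;
    }
  }
}

// Move the value under `from` to `to`. `delete_first` selects the edit order.
// Returns false if `from` is not present.
bool Move(Store& store, const std::string& from, const std::string& to,
          bool delete_first) {
  const Store::const_iterator it = store.find(from);
  if (it == store.end()) return false;
  const std::string value = it->second;
  const Edit del = {true, from, ""};
  const Edit put = {false, to, value};
  std::vector<Edit> batch;
  if (delete_first) {
    batch.push_back(del);
    batch.push_back(put);
  } else {
    batch.push_back(put);
    batch.push_back(del);
  }
  ApplyInOrder(store, batch);
  return true;
}
```

With <code>delete_first</code> set, moving a key onto itself leaves the value in place; with the opposite order, the final delete removes it.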
Apart from its atomicity benefits, <code>WriteBatch</code> may also be used to speed up bulk updates by placing lots of individual mutations into the same batch.
Synchronous Writes
By default, each write to <code>rocksdb</code> is asynchronous: it returns after pushing the write from the process into the operating system. The transfer from operating system memory to the underlying persistent storage happens asynchronously. The <code>sync</code> flag can be turned on for a particular write to make the write operation not return until the data being written has been pushed all the way to persistent storage. (On POSIX systems, this is implemented by calling either <code>fsync(...)</code> or <code>fdatasync(...)</code> or <code>msync(..., MS_SYNC)</code> before the write operation returns.)
rocksdb::WriteOptions write_options;
write_options.sync = true;
db->Put(write_options, ...);
Non-sync Writes
With non-sync writes, RocksDB only buffers the WAL write in the OS buffer or in an internal buffer (when <code>options.manual_wal_flush = true</code>). Non-sync writes are often much faster than synchronous writes. The downside is that a crash of the machine may cause the last few updates to be lost. Note that a crash of just the writing process (i.e., not a reboot) will not cause any loss since, even when <code>sync</code> is false, an update is pushed from the process memory into the operating system before it is considered done.
Non-sync writes can often be used safely. For example, when loading a large amount of data into the database you can handle lost updates by restarting the bulk load after a crash. A hybrid scheme is also possible, where <code>DB::SyncWAL()</code> is called by a separate thread.
We also provide a way to completely disable the Write Ahead Log for a particular write. If you set <code>write_options.disableWAL</code> to true, the write will not go to the log at all and may be lost in the event of a process crash.
RocksDB by default uses <code>fdatasync()</code> to sync files, which might be faster than <code>fsync()</code> in certain cases. If you want to use <code>fsync()</code>, you can set <code>Options::use_fsync</code> to true. You should set this to true on filesystems like ext3 that can lose files after a reboot.
Advanced
For more information about write performance optimizations and factors influencing performance, see [[Pipelined Write]] and [[Write Stalls]].
Concurrency
A database may only be opened by one process at a time. The <code>rocksdb</code> implementation acquires a lock from the operating system to prevent misuse. Within a single process, the same <code>rocksdb::DB</code> object may be safely shared by multiple concurrent threads. That is, different threads may write into the database, fetch iterators, or call <code>Get</code> on the same database without any external synchronization (the rocksdb implementation automatically does the required synchronization). However, other objects (like Iterator and WriteBatch) may require external synchronization. If two threads share such an object, they must protect access to it using their own locking protocol. More details are available in the public header files.
Merge operators
Merge operators provide efficient support for read-modify-write operations.
More on the interface and implementation can be found in:
- [[Merge Operator | Merge-Operator]]
- [[Merge Operator Implementation | Merge-Operator-Implementation]]
- Get Merge Operands
Iteration
The following example demonstrates how to print all (key, value) pairs in a database.
rocksdb::Iterator* it = db->NewIterator(rocksdb::ReadOptions());
for (it->SeekToFirst(); it->Valid(); it->Next()) {
  std::cout << it->key().ToString() << ": " << it->value().ToString() << std::endl;
}
assert(it->status().ok()); // Check for any errors found during the scan
delete it;
The following variation shows how to process just the keys in the range <code>[start, limit)</code>:
for (it->Seek(start);
it->Valid() && it->key().ToString() < limit;
it->Next()) {
...
}
assert(it->status().ok()); // Check for any errors found during the scan
You can also process entries in reverse order. (Caveat: reverse iteration may be somewhat slower than forward iteration.)
for (it->SeekToLast(); it->Valid(); it->Prev()) {
...
}
assert(it->status().ok()); // Check for any errors found during the scan
This example processes entries in the range <code>(limit, start]</code> in reverse order, starting from one specific key:
for (it->SeekForPrev(start);
it->Valid() && it->key().ToString() > limit;
it->Prev()) {
...
}
assert(it->status().ok()); // Check for any errors found during the scan
See [[SeekForPrev]].
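To make the two range conventions concrete, here is a small sketch that mimics them on a <code>std::map</code> instead of a RocksDB iterator. <code>ForwardRange</code> and <code>ReverseRange</code> are hypothetical helper names; the mapping of <code>lower_bound</code> to <code>Seek</code> and of "step back from <code>upper_bound</code>" to <code>SeekForPrev</code> is an analogy for illustration, not a description of how RocksDB implements them.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Store = std::map<std::string, std::string>;

// Keys in [start, limit), ascending. lower_bound(start) plays the role of
// Seek(start): it lands on the first key >= start.
std::vector<std::string> ForwardRange(const Store& db, const std::string& start,
                                      const std::string& limit) {
  std::vector<std::string> out;
  for (Store::const_iterator it = db.lower_bound(start);
       it != db.end() && it->first < limit; ++it) {
    out.push_back(it->first);
  }
  return out;
}

// Keys in (limit, start], descending. upper_bound(start) is the first key
// > start, so stepping back once lands on the last key <= start, which is
// the position SeekForPrev(start) describes.
std::vector<std::string> ReverseRange(const Store& db, const std::string& start,
                                      const std::string& limit) {
  std::vector<std::string> out;
  Store::const_iterator it = db.upper_bound(start);
  while (it != db.begin()) {
    --it;
    if (!(it->first > limit)) break;  // reached the exclusive lower end
    out.push_back(it->first);
  }
  return out;
}
```

Note that the forward range includes <code>start</code> and excludes <code>limit</code>, while the reverse range also includes <code>start</code> (the higher key) and excludes <code>limit</code> (the lower key).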
For an explanation of error handling, different iterating options and best practices, see [[Iterator]].
For implementation details, see Iterator's Implementation.
Snapshots
Snapshots provide consistent read-only views over the entire state of the key-value store. <code>ReadOptions::snapshot</code> may be non-NULL to indicate that a read should operate on a particular version of the DB state.
If <code>ReadOptions::snapshot</code> is NULL, the read will operate on an implicit snapshot of the current state.
Snapshots are created by the <code>DB::GetSnapshot()</code> method:
rocksdb::ReadOptions options;
options.snapshot = db->GetSnapshot();
... apply some updates to db ...
rocksdb::Iterator* iter = db->NewIterator(options);
... read using iter to view the state when the snapshot was created ...
delete iter;
db->ReleaseSnapshot(options.snapshot);
Note that when a snapshot is no longer needed, it should be released using the <code>DB::ReleaseSnapshot</code> interface. This allows the implementation to get rid of state that was being maintained just to support reading as of that snapshot.
Slice
The return values of the <code>it->key()</code> and <code>it->value()</code> calls above are instances of the <code>rocksdb::Slice</code> type. <code>Slice</code> is a simple structure that contains a length and a pointer to an external byte array. Returning a <code>Slice</code> is a cheaper alternative to returning a <code>std::string</code> since we do not need to copy potentially large keys and values. In addition, <code>rocksdb</code> methods do not return null-terminated C-style strings since <code>rocksdb</code> keys and values are allowed to contain '\0' bytes.
C++ strings and null-terminated C-style strings can be easily converted to a Slice:
rocksdb::Slice s1 = "hello";
std::string str("world");
rocksdb::Slice s2 = str;
A Slice can be easily converted back to a C++ string:
std::string str = s1.ToString();
assert(str == std::string("hello"));
Be careful when using Slices: it is up to the caller to ensure that the external byte array into which the Slice points remains live while the Slice is in use. For example, the following is buggy:
rocksdb::Slice slice;
if (...) {
std::string str = ...;
slice = str;
}
Use(slice);
When execution leaves the <code>if</code> block, <code>str</code> is destroyed and the backing storage for <code>slice</code> disappears, so <code>Use(slice)</code> reads memory that is no longer valid.
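One way to avoid this kind of bug is to declare the backing string in a scope that outlives every use of the view. The sketch below shows the idea with a tiny stand-in type; <code>ByteView</code>, <code>ViewOf</code> and <code>PickAndCopy</code> are hypothetical names for this illustration and only mimic the "pointer plus length, no ownership" shape described above. It also shows that embedded '\0' bytes survive the round trip.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Minimal stand-in for a slice: a pointer and a length, with no ownership.
struct ByteView {
  const char* data;
  std::size_t size;
  std::string ToString() const { return std::string(data, size); }
};

// Build a view over a string that the caller keeps alive.
ByteView ViewOf(const std::string& s) {
  ByteView v = {s.data(), s.size()};
  return v;
}

std::string PickAndCopy(bool use_long) {
  // The backing string lives in the outer scope, so it outlives the view.
  std::string backing;
  ByteView view = ViewOf(backing);
  if (use_long) {
    backing = std::string("hello\0world", 11);  // contains an embedded '\0'
    view = ViewOf(backing);  // re-point the view after changing the string
  }
  // Copy the bytes out while the backing storage is still valid.
  return view.ToString();
}
```

The same discipline applies to any non-owning view: either keep the owner alive for as long as the view is used, or copy the bytes out before the owner goes away.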
Transactions
RocksDB supports multi-operation transactions. See [[Transactions]].
Comparators
The preceding examples used the default ordering function for keys, which orders bytes lexicographically. You can, however, supply a custom comparator when opening a database. For example, suppose each database key consists of two numbers and we should sort by the first number, breaking ties by the second number. First, define a proper subclass of <code>rocksdb::Comparator</code> that expresses these rules:
class TwoPartComparator : public rocksdb::Comparator {
 public:
  // Three-way comparison function:
  //   if a < b: negative result
  //   if a > b: positive result
  //   else: zero result
  int Compare(const rocksdb::Slice& a, const rocksdb::Slice& b) const {
    int a1, a2, b1, b2;
    ParseKey(a, &a1, &a2);
    ParseKey(b, &b1, &b2);
    if (a1 < b1) return -1;
    if (a1 > b1) return +1;
    if (a2 < b2) return -1;
    if (a2 > b2) return +1;
    return 0;
  }

  // Ignore the following methods for now:
  const char* Name() const { return "TwoPartComparator"; }
  void FindShortestSeparator(std::string*, const rocksdb::Slice&) const { }
  void FindShortSuccessor(std::string*) const { }
};
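The class above calls <code>ParseKey</code> without defining it. As a sketch only, the following shows one possible version under an assumed key format of two decimal integers separated by a colon, such as "12:7". Both the format and the helper names (<code>ParseKey</code> taking a <code>std::string</code>, <code>CompareTwoPart</code>) are assumptions for illustration; the original does not specify how keys are encoded, and a real comparator would operate on <code>rocksdb::Slice</code>.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical key format: "<first>:<second>", e.g. "12:7".
// Returns false if the separator is missing.
bool ParseKey(const std::string& key, int* first, int* second) {
  const std::string::size_type sep = key.find(':');
  if (sep == std::string::npos) return false;
  *first = std::atoi(key.substr(0, sep).c_str());
  *second = std::atoi(key.substr(sep + 1).c_str());
  return true;
}

// Same three-way rule as the comparator above, on plain strings:
// order by the first number, break ties by the second.
int CompareTwoPart(const std::string& a, const std::string& b) {
  int a1 = 0, a2 = 0, b1 = 0, b2 = 0;
  ParseKey(a, &a1, &a2);
  ParseKey(b, &b1, &b2);
  if (a1 < b1) return -1;
  if (a1 > b1) return +1;
  if (a2 < b2) return -1;
  if (a2 > b2) return +1;
  return 0;
}
```

Under this format the custom order differs from the default bytewise order: "2:10" sorts before "10:1" numerically, but after it when compared byte by byte.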
Now create a database using this custom comparator:
TwoPartComparator cmp;
rocksdb::DB* db;
rocksdb::Options options;
options.create_if_missing = true;
options.comparator = &cmp;
rocksdb::Status status = rocksdb::DB::Open(options, "/tmp/testdb", &db);
...
Column Families
[[Column Families]] provide a way to logically partition the database. Users can perform atomic writes of multiple keys across multiple column families and read a consistent view from them.
Bulk Load
You can use [[Creating and Ingesting SST files]] to bulk load a large amount of data directly into the DB with minimal impact on live traffic.
Backup and Checkpoint
Backup allows users to create periodic incremental backups in a remote file system (such as HDFS or S3) and recover from any of them.
[[Checkpoints]] provide the ability to take a snapshot of a running RocksDB database in a separate directory. Files are hardlinked, rather than copied, if possible, so it is a relatively lightweight operation.
I/O
By default, RocksDB's I/O goes through the operating system's page cache. Setting a [[Rate Limiter]] can limit the rate at which RocksDB issues file writes, to make room for read I/Os.
Users can also choose to bypass the operating system's page cache by using Direct I/O.
See [[IO]] for more details.
Backwards compatibility
The result of the comparator's <code>Name</code> method is attached to the database when it is created, and is checked on every subsequent database open. If the name changes, the <code>rocksdb::DB::Open</code> call will fail. Therefore, change the name if and only if the new key format and comparison function are incompatible with existing databases, and it is acceptable to discard the contents of all existing databases.
You can, however, still evolve your key format gradually over time with a little pre-planning. For example, you could store a version number at the end of each key (one byte should suffice for most uses).
When you wish to switch to a new key format (e.g., adding an optional third part to the keys processed by <code>TwoPartComparator</code>):
(a) keep the same comparator name
(b) increment the version number for new keys
(c) change the comparator function so that it uses the version numbers found in the keys to decide how to interpret them.
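Steps (a) to (c) can be sketched as follows. This is an illustration under invented assumptions, not a prescribed scheme: keys are assumed to be colon-separated decimal numbers followed by a single trailing version character, '1' for two-part keys and '2' for keys with a third part. The names <code>Parts</code>, <code>ParseVersioned</code> and <code>CompareVersioned</code> are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

struct Parts {
  int first;
  int second;
  int third;     // 0 when the key has no third part
  char version;  // trailing version byte of the key
};

// Decode a key of the assumed form "<a>:<b>" + '1' or "<a>:<b>:<c>" + '2'.
Parts ParseVersioned(const std::string& key) {
  Parts p = {0, 0, 0, '\0'};
  if (key.empty()) return p;
  p.version = key[key.size() - 1];
  std::istringstream body(key.substr(0, key.size() - 1));
  std::vector<int> nums;
  std::string field;
  while (std::getline(body, field, ':')) {
    nums.push_back(std::atoi(field.c_str()));
  }
  if (nums.size() > 0) p.first = nums[0];
  if (nums.size() > 1) p.second = nums[1];
  // Only version-2 keys are interpreted as having a third part.
  if (p.version == '2' && nums.size() > 2) p.third = nums[2];
  return p;
}

int Cmp(int x, int y) { return (x < y) ? -1 : (x > y) ? 1 : 0; }

// One comparison function that understands both key versions.
int CompareVersioned(const std::string& a, const std::string& b) {
  const Parts pa = ParseVersioned(a);
  const Parts pb = ParseVersioned(b);
  int r = Cmp(pa.first, pb.first);
  if (r != 0) return r;
  r = Cmp(pa.second, pb.second);
  if (r != 0) return r;
  r = Cmp(pa.third, pb.third);
  if (r != 0) return r;
  // Tie-break on the version so byte-different keys never compare equal.
  return Cmp(pa.version, pb.version);
}
```

The final tie-break on the version byte is a design choice in this sketch: without it, an old key and a new key with a zero third part would compare as equal even though their bytes differ.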
MemTable and Table factories
By default, we keep the data in memory in a skiplist memtable and the data on disk in a table format described here: <a >RocksDB Table Format</a>.
Since one of the goals of RocksDB is to make different parts of the system easily pluggable, we support different implementations of both the memtable and the table format. You can supply your own memtable factory by setting <code>Options::memtable_factory</code> and your own table factory by setting <code>Options::table_factory</code>. For available memtable factories, please refer to <code>rocksdb/memtablerep.h</code> and for table factories to <code>rocksdb/table.h</code>. Both features are in active development, so be wary of API changes that might break your application going forward.
You can also read more about memtables here and [[here|MemTable]].
Performance
Start with [[Setup Options and Basic Tuning]]. For more information about RocksDB performance, see the "Performance" section in the sidebar on the right.
Block size
<code>rocksdb</code> groups adjacent keys together into the same block, and such a block is the unit of transfer to and from persistent storage. The default block size is approximately 4096 uncompressed bytes. Applications that mostly do bulk scans over the contents of the database may wish to increase this size. Applications that do a lot of point reads of small values may wish to switch to a smaller block size if performance measurements indicate an improvement. There isn't much benefit in using blocks smaller than one kilobyte, or larger than a few megabytes. Also note that compression will be more effective with larger block sizes. To change the block size, use <code>BlockBasedTableOptions::block_size</code>.
Write buffer
<code>Options::write_buffer_size</code> specifies the amount of data to build up in memory before converting to a sorted on-disk file. Larger values increase performance, especially during bulk loads. Up to max_write_buffer_number write buffers may be held in memory at the same time, so you may wish to adjust this parameter to control memory usage. Also, a larger write buffer will result in a longer recovery time the next time the database is opened.
A related option is <code>Options::max_write_buffer_number</code>, which is the maximum number of write buffers that are built up in memory. The default is 2, so that when one write buffer is being flushed to storage, new writes can continue to the other write buffer. The flush operation is executed in a [[Thread Pool]].
<code>Options::min_write_buffer_number_to_merge</code> is the minimum number of write buffers that will be merged together before writing to storage. If set to 1, then all write buffers are flushed to L0 as individual files, and this increases read amplification because a get request has to check all of these files. Also, an in-memory merge may result in writing less data to storage if there are duplicate records in each of these individual write buffers. Default: 1
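As a rough illustration of the memory trade-off described above, the worst-case memory held in write buffers can be estimated as the product of the two settings. The helper below is a back-of-the-envelope sketch with a hypothetical name, <code>MaxMemtableBytes</code>; it assumes a single column family and ignores allocator and index overhead, so treat the result as an approximate upper bound rather than an exact figure. The 64 MB value used in the usage note is only an example.

```cpp
#include <cassert>
#include <cstdint>

// Approximate upper bound on memory held in write buffers for one column
// family: every one of the allowed buffers is full at the same time.
std::uint64_t MaxMemtableBytes(std::uint64_t write_buffer_size,
                               std::uint64_t max_write_buffer_number) {
  return write_buffer_size * max_write_buffer_number;
}
```

For example, a 64 MB write buffer with max_write_buffer_number set to 2 gives an estimate of 128 MB, and doubling both settings quadruples it.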
Compression
Each block is individually compressed before being written to persistent storage. Compression is on by default since the default compression method is very fast, and is automatically disabled for incompressible data. In rare cases, applications may want to disable compression entirely, but should only do so if benchmarks show a performance improvement:
rocksdb::Options options;
options.compression = rocksdb::kNoCompression;
... rocksdb::DB::Open(options, name, ...) ....
[[Dictionary Compression]] is also available.
Cache
The contents of the database are stored in a set of files in the filesystem and each file stores a sequence of compressed blocks. If <code>BlockBasedTableOptions::block_cache</code> is non-NULL, it is used to cache frequently used uncompressed block contents. We use the operating system's file cache to cache our raw data, which is compressed. So the file cache acts as a cache for compressed data.
#include "rocksdb/cache.h"
rocksdb::BlockBasedTableOptions table_options;
table_options.block_cache = rocksdb::NewLRUCache(100 * 1048576); // 100MB uncompressed cache
rocksdb::Options options;
options.table_factory.reset(rocksdb::NewBlockBasedTableFactory(table_options));
rocksdb::DB* db;
rocksdb::DB::Open(options, name, &db);
... use the db ...
delete db;
When performing a bulk read, the application may wish to disable caching so that the data processed by the bulk read does not end up displacing most of the cached contents. A per-iterator option can be used to achieve this:
rocksdb::ReadOptions options;
options.fill_cache = false;
rocksdb::Iterator* it = db->NewIterator(options);
for (it->SeekToFirst(); it->Valid(); it->Next()) {
...
}
You can also disable the block cache by setting <code>table_options.no_block_cache</code> to true.
See [[Block Cache]] for more details.
Key Layout
Note that the unit of disk transfer and caching is a block. Adjacent keys (according to the database sort order) will usually be placed in the same block. Therefore the application can improve its performance by placing keys that are accessed together near each other and placing infrequently used keys in a separate region of the key space.
For example, suppose we are implementing a simple file system on top of <code>rocksdb</code>. The types of entries we might wish to store are:
filename -> permission-bits, length, list of file_block_ids
file_block_id -> data
We might want to prefix <code>filename</code> keys with one character (say '/') and the <code>file_block_id</code> keys with a different character (say '0') so that scans over just the metadata do not force us to fetch and cache bulky file contents.
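The effect of the prefixes can be seen with an ordered map standing in for the key space. In this sketch, <code>ScanPrefix</code> is a hypothetical helper and the concrete key strings are made up for illustration. Because '/' sorts immediately before '0' in byte order, all metadata keys form one contiguous run and a metadata scan never visits the block entries.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Store = std::map<std::string, std::string>;

// Return the keys that start with `prefix`, visiting only the contiguous
// run of such keys in sort order.
std::vector<std::string> ScanPrefix(const Store& store,
                                    const std::string& prefix) {
  std::vector<std::string> keys;
  for (Store::const_iterator it = store.lower_bound(prefix);
       it != store.end() &&
       it->first.compare(0, prefix.size(), prefix) == 0;
       ++it) {
    keys.push_back(it->first);
  }
  return keys;
}
```

A scan with prefix "/" returns only the filename entries, and a scan with prefix "0" returns only the block entries, so each kind of scan touches one region of the key space.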
Filters
Because of the way <code>rocksdb</code> data is organized on disk, a single <code>Get()</code> call may involve multiple reads from disk. The optional <code>FilterPolicy</code> mechanism can be used to reduce the number of disk reads substantially.
rocksdb::Options options;
rocksdb::BlockBasedTableOptions bbto;
bbto.filter_policy.reset(rocksdb::NewBloomFilterPolicy(
10 /* bits_per_key */,
false /* use_block_based_builder*/));
options.table_factory.reset(rocksdb::NewBlockBasedTableFactory(bbto));
rocksdb::DB* db;
rocksdb::DB::Open(options, "/tmp/testdb", &db);
... use the database ...
delete db;  // the filter policy is held by a shared_ptr in bbto and is released automatically
The preceding code associates a [[Bloom Filter | RocksDB-Bloom-Filter]] based filtering policy with the database. Bloom filter based filtering relies on keeping some number of bits of data in memory per key (in this case 10 bits per key, since that is the argument we passed to <code>NewBloomFilterPolicy</code>). This filter will reduce the number of unnecessary disk reads needed for <code>Get()</code> calls by a factor of approximately 100. Increasing the bits per key will lead to a larger reduction at the cost of more memory usage. We recommend that applications whose working set does not fit in memory and that do a lot of random reads set a filter policy.
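The "factor of approximately 100" can be sanity-checked with the textbook Bloom filter estimate, where the false positive rate for m/n bits per key and k probes is roughly (1 - e^(-k / (m/n)))^k. The sketch below evaluates that formula; <code>BloomFalsePositiveRate</code> is a hypothetical helper, and the choice of 6 probes for 10 bits per key is an assumption based on the usual k ≈ (m/n) · ln 2 rule of thumb. RocksDB's actual filter implementations may behave differently.

```cpp
#include <cassert>
#include <cmath>

// Textbook estimate of a Bloom filter's false positive rate, given the
// number of bits stored per key and the number of hash probes per lookup.
double BloomFalsePositiveRate(double bits_per_key, int num_probes) {
  const double fraction_set =
      1.0 - std::exp(-static_cast<double>(num_probes) / bits_per_key);
  return std::pow(fraction_set, num_probes);
}
```

With 10 bits per key and 6 probes this estimate comes out a little under 1%, which is consistent with filtering out roughly 99 of every 100 unnecessary reads; more bits per key lower the rate further.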
If you are using a custom comparator, you should ensure that the filter policy you are using is compatible with your comparator. For example, consider a comparator that ignores trailing spaces when comparing keys. <code>NewBloomFilterPolicy</code> must not be used with such a comparator. Instead, the application should provide a custom filter policy that also ignores trailing spaces.
For example:
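#include <string>
#include <vector>
#include "rocksdb/filter_policy.h"
#include "rocksdb/slice.h"
// Written against the FilterPolicy interface with CreateFilter()/KeyMayMatch();
// check rocksdb/filter_policy.h for the interface your version exposes.
// RemoveTrailingSpaces() is an application-supplied helper, not part of rocksdb.
class CustomFilterPolicy : public rocksdb::FilterPolicy {
 private:
  const rocksdb::FilterPolicy* builtin_policy_;
 public:
  CustomFilterPolicy() : builtin_policy_(rocksdb::NewBloomFilterPolicy(10, false)) { }
  ~CustomFilterPolicy() { delete builtin_policy_; }
  const char* Name() const { return "IgnoreTrailingSpacesFilter"; }
  void CreateFilter(const rocksdb::Slice* keys, int n, std::string* dst) const {
    // Use builtin bloom filter code after removing trailing spaces
    std::vector<rocksdb::Slice> trimmed(n);
    for (int i = 0; i < n; i++) {
      trimmed[i] = RemoveTrailingSpaces(keys[i]);
    }
    // Pass a pointer to the first trimmed key, not to element i.
    builtin_policy_->CreateFilter(trimmed.data(), n, dst);
  }
  bool KeyMayMatch(const rocksdb::Slice& key, const rocksdb::Slice& filter) const {
    // Use builtin bloom filter code after removing trailing spaces
    return builtin_policy_->KeyMayMatch(RemoveTrailingSpaces(key), filter);
  }
};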
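The class above calls <code>RemoveTrailingSpaces</code> without defining it. A minimal sketch of such a helper follows. It assumes that only the ASCII space character counts as a trailing space, and it uses <code>std::string_view</code> as a stand-in for <code>rocksdb::Slice</code> so that it compiles without the rocksdb headers.

```cpp
#include <string_view>

// Hypothetical helper: returns a view of `key` without its trailing spaces.
// The view refers to the caller's buffer, so no bytes are copied.
std::string_view RemoveTrailingSpaces(std::string_view key) {
  while (!key.empty() && key.back() == ' ') {
    key.remove_suffix(1);
  }
  return key;
}
```

Because the result is a non-owning view, the original key must stay alive while the trimmed view is in use. In the filter policy above that holds for the duration of each <code>CreateFilter</code> and <code>KeyMayMatch</code> call.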
Advanced applications may provide a filter policy that does not use a bloom filter but summarizes a set of keys with some other mechanism. See <code>rocksdb/filter_policy.h</code> for details.
高級應(yīng)用程序可能提供不使用bloom過濾器的過濾策略,但使用其他一些機制來匯總一組鍵。請參見rocksdb/filter_policy.h。
Checksums
<code>rocksdb</code> associates checksums with all data it stores in the file system. There are two separate controls provided over how aggressively these checksums are verified:
Rocksdb將校驗和與存儲在文件系統(tǒng)中的所有數(shù)據(jù)關(guān)聯(lián)起來。對于這些校驗和的驗證力度有兩種不同的控制:
<ul>
<li>
<code>ReadOptions::verify_checksums</code> forces checksum verification of all data that is read from the file system on behalf of a particular read. This is on by default.
ReadOptions::verify_checksum強制對代表特定讀操作從文件系統(tǒng)讀取的所有數(shù)據(jù)進行校驗和驗證。這是默認(rèn)開啟的。
<li> <code>Options::paranoid_checks</code> may be set to true before opening a database to make the database implementation raise an error as soon as it detects an internal corruption. Depending on which portion of the database has been corrupted, the error may be raised when the database is opened, or later by another database operation. By default, paranoid checking is on.
Options::paranoid_checks可以在打開數(shù)據(jù)庫之前設(shè)置為true,以使數(shù)據(jù)庫實現(xiàn)在檢測到內(nèi)部損壞時立即引發(fā)錯誤。根據(jù)數(shù)據(jù)庫的哪個部分已損壞,該錯誤可能在打開數(shù)據(jù)庫時引發(fā),或者稍后由另一個數(shù)據(jù)庫操作引發(fā)。默認(rèn)情況下,偏執(zhí)檢查是打開的。
</ul>
Checksum verification can also be manually triggered by calling DB::VerifyChecksum(). This API walks through all the SST files in all levels for all column families, and for each SST file, verifies the checksum embedded in the metadata and data blocks. At present, it is only supported for the BlockBasedTable format. The files are verified serially, so the API call may take a significant amount of time to finish. This API can be useful for proactive verification of data integrity in a distributed system, for example, where a new replica can be created if the database is found to be corrupt.
校驗和校驗也可以通過調(diào)用DB::VerifyChecksum()來手動觸發(fā)。這個API遍歷所有列族的所有級別的所有SST文件,并對每個SST文件驗證嵌入元數(shù)據(jù)和數(shù)據(jù)塊中的校驗和。目前,它只支持BlockBasedTable格式。這些文件是串行驗證的,因此API調(diào)用可能要花很長時間才能完成。這個API對于分布式系統(tǒng)中的數(shù)據(jù)完整性的主動驗證非常有用,例如,在分布式系統(tǒng)中,如果發(fā)現(xiàn)數(shù)據(jù)庫損壞,可以創(chuàng)建一個新的副本。
If a database is corrupted (perhaps it cannot be opened when paranoid checking is turned on), the <code>rocksdb::RepairDB</code> function may be used to recover as much of the data as possible.
如果數(shù)據(jù)庫損壞了(可能在啟用偏執(zhí)檢查時無法打開),可以使用rocksdb::RepairDB函數(shù)來恢復(fù)盡可能多的數(shù)據(jù)。
Compaction
RocksDB keeps rewriting existing data files. This is to clean stale versions of keys, and to keep the data structure optimal for reads.
RocksDB一直在重寫現(xiàn)有的數(shù)據(jù)文件。這是為了清除過時的鍵版本,并保持?jǐn)?shù)據(jù)結(jié)構(gòu)的最佳讀取。
The information about compaction has been moved to [[Compaction]]. Users do not need to know the internals of compaction before operating RocksDB.
關(guān)于壓縮的信息已移動到“壓縮”。用戶在運行RocksDB之前不需要了解內(nèi)部壓縮。
Approximate Sizes
The <code>GetApproximateSizes</code> method can be used to get the approximate number of bytes of file system space used by one or more key ranges.
GetApproximateSizes方法可用于獲取一個或多個鍵范圍使用的文件系統(tǒng)空間的大約字節(jié)數(shù)。
rocksdb::Range ranges[2];
ranges[0] = rocksdb::Range("a", "c");
ranges[1] = rocksdb::Range("x", "z");
uint64_t sizes[2];
db->GetApproximateSizes(ranges, 2, sizes);
The preceding call will set <code>sizes[0]</code> to the approximate number of bytes of file system space used by the key range <code>[a..c)</code> and <code>sizes[1]</code> to the approximate number of bytes used by the key range <code>[x..z)</code>.
前面的調(diào)用將設(shè)置sizes[0]為鍵范圍[a..c]所使用的文件系統(tǒng)空間的大約字節(jié)數(shù),設(shè)置sizes[1]為鍵范圍[x..z]所使用的大約字節(jié)數(shù)。
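The ranges are half-open: the start key is included and the limit key is excluded. The toy model below illustrates that convention on an in-memory <code>std::map</code>. It is not how rocksdb computes its estimate, and the function name is hypothetical.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Toy model, not rocksdb's implementation: sums key and value bytes for the
// keys in the half-open range [start, limit) of an in-memory ordered map.
uint64_t BytesInRange(const std::map<std::string, std::string>& kv,
                      const std::string& start, const std::string& limit) {
  if (start >= limit) {
    return 0;  // empty or inverted range
  }
  uint64_t total = 0;
  // lower_bound(limit) is the first key >= limit, so the limit key itself
  // is never counted.
  const auto end = kv.lower_bound(limit);
  for (auto it = kv.lower_bound(start); it != end; ++it) {
    total += it->first.size() + it->second.size();
  }
  return total;
}
```

In this model a key equal to <code>"c"</code> contributes nothing to the range <code>[a..c)</code>, which mirrors the exclusive upper bound described above.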
Environment
All file operations (and other operating system calls) issued by the <code>rocksdb</code> implementation are routed through a <code>rocksdb::Env</code> object. Sophisticated clients may wish to provide their own <code>Env</code> implementation to get better control. For example, an application may introduce artificial delays in the file IO paths to limit the impact of <code>rocksdb</code> on other activities in the system.
由rocksdb實現(xiàn)發(fā)出的所有文件操作(以及其他操作系統(tǒng)調(diào)用)都通過一個rocksdb::Env對象進行路由。成熟的客戶可能希望提供他們自己的Env實現(xiàn)以獲得更好的控制。例如,應(yīng)用程序可能會在文件IO路徑中引入人為延遲,以限制rocksdb對系統(tǒng)中其他活動的影響。
#include "rocksdb/db.h"
#include "rocksdb/env.h"
class SlowEnv : public rocksdb::Env {
  // ... implementation of the Env interface ...
};
SlowEnv env;
rocksdb::Options options;
options.env = &env;  // env must outlive the database
rocksdb::Status s = rocksdb::DB::Open(options, ...);
Porting
<code>rocksdb</code> may be ported to a new platform by providing platform specific implementations of the types/methods/functions exported by <code>rocksdb/port/port.h</code>. See <code>rocksdb/port/port_example.h</code> for more details.
通過提供Rocksdb /port/port.h導(dǎo)出的類型/方法/函數(shù)的特定平臺實現(xiàn),Rocksdb可以被移植到一個新的平臺上。詳見rocksdb/port/port_example.h。
In addition, the new platform may need a new default <code>rocksdb::Env</code> implementation. See <code>rocksdb/util/env_posix.h</code> for an example.
此外,新平臺可能需要一個新的默認(rèn)rocksdb::Env實現(xiàn)。示例請參見rocksdb/util/env_posix.h。
Manageability
To be able to efficiently tune your application, it is always helpful if you have access to usage statistics. You can collect those statistics by setting <code>Options::table_properties_collectors</code> or <code>Options::statistics</code>. For more information, refer to <code>rocksdb/table_properties.h</code> and <code>rocksdb/statistics.h</code>. These should not add significant overhead to your application and we recommend exporting them to other monitoring tools. See [[Statistics]]. You can also profile single requests using [[Perf Context and IO Stats Context]]. Users can register [[EventListener]] for callbacks for some internal events.
為了能夠有效地調(diào)優(yōu)應(yīng)用程序,能夠訪問使用統(tǒng)計數(shù)據(jù)總是很有幫助的。您可以通過設(shè)置<code>Options::table_properties_collectors</code>或<code>Options::statistics</code>來收集這些統(tǒng)計信息。更多信息,請參考<code>rocksdb/table_properties.h</code>和<code>rocksdb/statistics.h</code>。這些不會給您的應(yīng)用程序增加很大的開銷,我們建議將它們導(dǎo)出到其他監(jiān)視工具中。[[Statistics]]。你也可以使用[[Perf Context and IO Stats Context]]來分析單個請求。用戶可以為一些內(nèi)部事件的回調(diào)注冊[[EventListener]]。
Purging WAL files
By default, old write-ahead logs are deleted automatically when they fall out of scope and the application no longer needs them. There are options that enable the user to archive the logs and then delete them lazily, either in TTL fashion or based on a size limit.
默認(rèn)情況下,當(dāng)舊的預(yù)寫日志超出范圍且應(yīng)用程序不再需要它們時,將自動刪除它們。有一些選項允許用戶對日志進行歸檔,然后根據(jù)TTL方式或大小限制惰性地刪除它們。
The options are <code>Options::WAL_ttl_seconds</code> and <code>Options::WAL_size_limit_MB</code>. Here is how they can be used:
設(shè)置項為:options::WAL_ttl_seconds和options::WAL_size_limit_MB。下面是它們的用法:
<ul>
<li>
If both are set to 0, logs will be deleted as soon as possible and will never get into the archive.
如果兩者都設(shè)置為0,則日志將被盡快刪除,并且永遠(yuǎn)不會進入存檔。
<li>
If <code>WAL_ttl_seconds</code> is 0 and <code>WAL_size_limit_MB</code> is not 0, WAL files will be checked every 10 minutes, and if the total size is greater than <code>WAL_size_limit_MB</code>, they will be deleted starting with the earliest until the size limit is met. All empty files will be deleted.
如果WAL_ttl_seconds為0且WAL_size_limit_MB不為0,則WAL文件將每10分鐘檢查一次,如果總大小大于WAL_size_limit_MB,則從最早的文件開始刪除,直到滿足size_limit。所有空文件將被刪除。
<li>
If <code>WAL_ttl_seconds</code> is not 0 and <code>WAL_size_limit_MB</code> is 0, then WAL files will be checked every <code>WAL_ttl_seconds / 2</code> seconds and those that are older than <code>WAL_ttl_seconds</code> will be deleted.
如果WAL_ttl_seconds不為0且WAL_size_limit_MB為0,那么每WAL_ttl_seconds / 2都會檢查WAL文件,并且那些比WAL_ttl_seconds早的文件將被刪除。
<li>
If both are not 0, WAL files will be checked every 10 minutes and both checks will be performed, with the TTL check first.
如果兩者都不為0,則WAL文件將每10分鐘檢查一次,兩次檢查都將首先執(zhí)行ttl。
</ul>
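The four cases above can be summarized as a small decision function. This is a sketch of the documented rules only. The struct and function names are hypothetical, and nothing here is a rocksdb type or reflects how the library implements the purge internally.

```cpp
#include <cstdint>

// Hypothetical summary of the four documented cases; not a rocksdb type.
struct WalPurgePlan {
  bool archive_enabled;             // false: logs are deleted as soon as possible
  uint64_t check_interval_seconds;  // 0 when no periodic check is described
  bool ttl_check;                   // delete archived files older than the TTL
  bool size_check;                  // delete oldest files until under the limit
};

WalPurgePlan DescribeWalPurge(uint64_t wal_ttl_seconds,
                              uint64_t wal_size_limit_mb) {
  const uint64_t kTenMinutes = 10 * 60;
  if (wal_ttl_seconds == 0 && wal_size_limit_mb == 0) {
    return {false, 0, false, false};
  }
  if (wal_ttl_seconds == 0) {
    // Size limit only: checked every 10 minutes.
    return {true, kTenMinutes, false, true};
  }
  if (wal_size_limit_mb == 0) {
    // TTL only: checked every WAL_ttl_seconds / 2.
    return {true, wal_ttl_seconds / 2, true, false};
  }
  // Both set: checked every 10 minutes, TTL check first, then size check.
  return {true, kTenMinutes, true, true};
}
```

For example, <code>DescribeWalPurge(3600, 0)</code> describes a TTL-only configuration that is checked every 1800 seconds.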
Other Information
To set up RocksDB options:
設(shè)置RocksDB選項:
- Set Up Options And Basic Tuning
- Some detailed Tuning Guide
Details about the <code>rocksdb</code> implementation may be found in the following documents:
關(guān)于rocksdb實現(xiàn)的詳細(xì)信息可以在以下文檔中找到:
- RocksDB Overview and Architecture
- Format of an immutable Table file
- <a href="log_format.txt">Format of a log file</a>