Original source: https://github.com/facebook/rocksdb/wiki/RocksDB-FAQ
(Translated with Youdao plus manual edits.)
Building RocksDB
Q: What is the absolute minimum version of gcc that we need to build RocksDB?
A: 4.8.
Q: What is RocksDB's latest stable release?
A: All the releases in https://github.com/facebook/rocksdb/releases are stable. For RocksJava, stable releases are available in https://oss.sonatype.org/#nexus-search;quick~rocksdb.
Basic Read/Write
Q: Are basic operations Put(), Write(), Get() and NewIterator() thread safe?
A: Yes.
Q: Can I write to RocksDB using multiple processes?
A: No. At most one process may open the database for writing. However, other processes can open it as a secondary instance, or in read-only mode if no process writes to the database.
Q: Does RocksDB support multi-process read access?
A: Yes. You can open the database as a secondary instance using DB::OpenAsSecondary(). RocksDB also supports multi-process read-only access when no process writes to the database; this is done by opening the database with DB::OpenForReadOnly().
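The two read paths above can be sketched as follows. This is a minimal sketch assuming the RocksDB C++ headers are available; the paths are made-up placeholders.

```cpp
#include <string>
#include "rocksdb/db.h"

// Hypothetical paths; any number of processes may open the DB like this
// as long as no process has it open for writing.
rocksdb::Status OpenReaders(rocksdb::DB** readonly_db,
                            rocksdb::DB** secondary_db) {
  rocksdb::Options options;
  rocksdb::Status s =
      rocksdb::DB::OpenForReadOnly(options, "/tmp/primary_db", readonly_db);
  if (!s.ok()) return s;
  // A secondary instance can additionally catch up with the primary's
  // new writes; it needs its own scratch directory.
  s = rocksdb::DB::OpenAsSecondary(options, "/tmp/primary_db",
                                   "/tmp/secondary_scratch", secondary_db);
  if (s.ok()) {
    (*secondary_db)->TryCatchUpWithPrimary();  // replay new WAL entries
  }
  return s;
}
```

Unlike a read-only instance, a secondary instance can observe new data written by the primary after opening, by calling TryCatchUpWithPrimary() periodically.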
Q: Is it safe to close RocksDB while another thread is issuing read, write or manual compaction requests?
A: No. The users of RocksDB need to make sure all functions have finished before they close RocksDB. You can speed up the waiting by calling CancelAllBackgroundWork().
Q: What's the maximum key and value sizes supported?
A: In general, RocksDB is not designed for large keys. The maximum recommended sizes for key and value are 8MB and 3GB respectively.
Q: What's the fastest way to load data into RocksDB?
A: A fast way to directly insert data into the DB:
1. use a single writer thread and insert in sorted order
2. batch hundreds of keys into one write batch
3. use vector memtable
4. make sure options.max_background_flushes is at least 4
5. before inserting the data, disable automatic compaction, and set options.level0_file_num_compaction_trigger, options.level0_slowdown_writes_trigger and options.level0_stop_writes_trigger to very large values; after inserting all the data, issue a manual compaction

Steps 3-5 will be done automatically if you call Options::PrepareForBulkLoad() on your options.

If you can pre-process the data offline before inserting, there is a faster way: sort the data, generate SST files with non-overlapping ranges in parallel, and bulk load the SST files. See https://github.com/facebook/rocksdb/wiki/Creating-and-Ingesting-SST-files
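A minimal sketch of the bulk-load setup, assuming the RocksDB C++ headers are available:

```cpp
#include "rocksdb/db.h"
#include "rocksdb/options.h"

rocksdb::Options BulkLoadOptions() {
  rocksdb::Options options;
  // PrepareForBulkLoad() applies steps 3-5: vector memtable, enough
  // background flushes, and automatic compaction disabled.
  options.PrepareForBulkLoad();
  return options;
}

// After all data is inserted, issue the manual full compaction, e.g.:
//   db->CompactRange(rocksdb::CompactRangeOptions(), nullptr, nullptr);
```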
Q: What is the correct way to delete the DB? Can I simply call DestroyDB() on a live DB?
A: Closing the DB and then destroying it is the correct way. Calling DestroyDB() on a live DB is undefined behavior.
Q: What is the difference between DestroyDB() and directly deleting the DB directory manually?
A: The major difference is that DestroyDB() will take care of the case where the RocksDB database is stored in multiple directories. For instance, a single DB can be configured to store its data in multiple directories by specifying different paths to DBOptions::db_paths, DBOptions::db_log_dir, and DBOptions::wal_dir.
Q: Any better way to dump key-value pairs generated by a map-reduce job into RocksDB?
A: A better way is to use SstFileWriter, which allows you to directly create RocksDB SST files and add them to a RocksDB database. However, if you're adding SST files to an existing RocksDB database, their key ranges must not overlap with the database. See https://github.com/facebook/rocksdb/wiki/Creating-and-Ingesting-SST-files
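A minimal sketch of the SstFileWriter flow, assuming the RocksDB C++ headers are available; the file path and keys are made-up placeholders.

```cpp
#include "rocksdb/db.h"
#include "rocksdb/sst_file_writer.h"

// Keys must be added in the comparator's order, and the file's key range
// must not overlap existing data when ingesting into a live DB.
rocksdb::Status WriteAndIngest(rocksdb::DB* db,
                               const rocksdb::Options& options) {
  rocksdb::SstFileWriter writer(rocksdb::EnvOptions(), options);
  rocksdb::Status s = writer.Open("/tmp/batch1.sst");
  if (!s.ok()) return s;
  s = writer.Put("key1", "value1");
  if (s.ok()) s = writer.Put("key2", "value2");
  if (s.ok()) s = writer.Finish();  // seals the SST file
  if (!s.ok()) return s;
  return db->IngestExternalFile({"/tmp/batch1.sst"},
                                rocksdb::IngestExternalFileOptions());
}
```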
Q: Is it safe to read from or write to RocksDB inside a compaction filter callback?
A: It is safe to read, but not always safe to write, to RocksDB inside a compaction filter callback, because a write might cause a deadlock when the write-stop condition is triggered.
Q: Does RocksDB hold SST files and memtables for a snapshot?
A: No. See https://github.com/facebook/rocksdb/wiki/RocksDB-Overview#gets-iterators-and-snapshots for how snapshots work.
Q: With DBWithTTL, is there a time bound for the expired keys to be removed?
A: DBWithTTL itself does not provide an upper time bound. Expired keys are removed when they are part of any compaction, but there is no guarantee when such a compaction will start. For instance, if a certain key range is never updated, compaction is less likely to apply to that key range. For leveled compaction, you can enforce a limit using the periodic compaction feature. That feature currently has a limitation: if the write rate is so slow that memtable flush is never triggered, periodic compaction won't be triggered either.
Q: If I delete a column family, and I didn't yet delete the column family handle, can I still use it to access the data?
A: Yes. DropColumnFamily() only marks the specified column family as dropped; it will not actually be removed until its reference count goes to zero.
Q: Why does RocksDB issue reads from the disk when I only make write requests?
A: Such IO reads come from compactions. A RocksDB compaction reads from one or more SST files, performs a merge-sort-like operation, generates new SST files, and deletes the old SST files it took as input.
Q: Is block_size before compression, or after?
A: block_size is the size before compression.
Q: After using options.prefix_extractor, I sometimes see wrong results. What's wrong?
A: There are limitations in options.prefix_extractor. If prefix iterating is used, Prev() and SeekToLast() are not supported, and many operations don't support SeekToFirst() either. A common mistake is to seek the last key of a prefix by calling Seek() followed by Prev(); this is, however, not supported. Currently there is no way to find the last key of a prefix with prefix iterating. Also, you can't continue iterating keys after passing the end of the prefix you seek to. In places where those operations are needed, you can set ReadOptions.total_order_seek = true to disable prefix iterating.
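A minimal sketch of the workaround, assuming the RocksDB C++ headers are available:

```cpp
#include <memory>
#include "rocksdb/db.h"
#include "rocksdb/options.h"

// With a prefix extractor configured, total_order_seek gives you a
// normal, totally ordered iterator that supports SeekToLast()/Prev().
void ScanWholeKeySpace(rocksdb::DB* db) {
  rocksdb::ReadOptions read_options;
  read_options.total_order_seek = true;  // disable prefix iterating
  std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(read_options));
  for (it->SeekToFirst(); it->Valid(); it->Next()) {
    // use it->key() / it->value()
  }
}
```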
Q: If Put() or Write() is called with WriteOptions.sync=true, does it mean all previous writes are persistent too?
A: Yes, but only for all previous writes with WriteOptions.disableWAL=false.
Q: I disabled the write-ahead log and rely on DB::Flush() to persist the data. It works well for a single column family. Can I do the same if I have multiple column families?
A: Yes. Set options.atomic_flush = true to enable atomic flush across multiple column families.
Q: What's the best way to delete a range of keys?
A: See https://github.com/facebook/rocksdb/wiki/DeleteRange .
Q: What are column families used for?
A: The most common reasons for using column families are:
- to use different compaction settings, comparators, compression types, merge operators, or compaction filters for different parts of the data
- to drop a column family in order to delete its data
- to store metadata in one column family and the data in another
Q: What's the difference between storing data in multiple column families and in multiple RocksDB databases?
A: The main differences are backup, atomic writes, and write performance. The advantage of using multiple databases: a database is the unit of backup or checkpoint, and it's easier to copy a database to another host than a column family. Advantages of using multiple column families:
- write batches are atomic across multiple column families in one database; you can't achieve this with multiple RocksDB databases
- if you issue sync writes to the WAL, too many databases may hurt performance
Q: Is RocksDB really "lockless" in reads?
A: Reads might hold a mutex in the following situations:
- accessing the sharded block cache
- accessing the table cache if options.max_open_files != -1
- if a read happens just after a flush or compaction finishes, it may briefly hold the global mutex to fetch the latest metadata of the LSM tree
- the memory allocators RocksDB relies on (e.g. jemalloc) may sometimes hold locks; these locks are held only rarely, or at fine granularity
Q: If I update multiple keys, should I issue multiple Put() calls, or put them in one write batch and issue Write()?
A: Using a WriteBatch to batch more keys usually performs better than individual Put() calls.
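A minimal sketch, assuming the RocksDB C++ headers are available; the key names are made-up placeholders.

```cpp
#include "rocksdb/db.h"
#include "rocksdb/write_batch.h"

// Group several updates into one WriteBatch: one Write() call, one WAL
// append, and atomicity across all entries in the batch.
rocksdb::Status UpdateKeys(rocksdb::DB* db) {
  rocksdb::WriteBatch batch;
  batch.Put("key1", "value1");
  batch.Put("key2", "value2");
  batch.Delete("obsolete_key");
  return db->Write(rocksdb::WriteOptions(), &batch);
}
```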
Q: What's the best practice to iterate all the keys?
A: If it's a small or read-only database, just create an iterator and iterate all the keys. Otherwise, consider recreating the iterator once in a while, because an iterator prevents the resources it pins from being released. If you need to read from a consistent view, create a snapshot and iterate using it.
Q: I have different key spaces. Should I separate them using prefixes, or use different column families?
A: If each key space is reasonably large, it's a good idea to put them in different column families. If each can be small, you should consider packing multiple key spaces into one column family, to avoid the trouble of maintaining too many column families.
Q: Is the performance of the iterator's Next() the same as Prev()?
A: The performance of reversed iteration is usually much worse than forward iteration, for several reasons:
- delta encoding in data blocks is friendlier to Next()
- the skip list used in the memtable is single-direction, so each Prev() is another binary search
- the internal key order is optimized for Next()
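The first point can be illustrated without RocksDB at all. The sketch below delta-encodes a sorted run of keys the way a data block conceptually does (shared-prefix length plus suffix): forward decoding is a cheap sequential pass, while recovering an arbitrary earlier entry, as Prev() must, means replaying from the start of the block. This is a simplified stand-alone model, not RocksDB's actual block format.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Delta-encode a sorted run of keys: each entry stores how many leading
// bytes it shares with the previous key, plus the remaining suffix.
std::vector<std::pair<size_t, std::string>>
DeltaEncode(const std::vector<std::string>& keys) {
  std::vector<std::pair<size_t, std::string>> out;
  std::string prev;
  for (const auto& k : keys) {
    size_t shared = 0;
    while (shared < prev.size() && shared < k.size() &&
           prev[shared] == k[shared]) {
      ++shared;
    }
    out.emplace_back(shared, k.substr(shared));
    prev = k;
  }
  return out;
}

// Forward iteration rebuilds each key from the previous one in O(1)
// amortized work, which is why Next() is cheap. Recovering entry i alone
// (as Prev() must) requires replaying entries 0..i: O(i) work per step.
std::string DecodeAt(const std::vector<std::pair<size_t, std::string>>& enc,
                     size_t i) {
  std::string cur;
  for (size_t j = 0; j <= i; ++j) {
    cur = cur.substr(0, enc[j].first) + enc[j].second;
  }
  return cur;
}
```

(RocksDB mitigates the replay cost with periodic restart points inside each block, but the asymmetry between Next() and Prev() remains.)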
Q: If I want to retrieve 10 keys from RocksDB, is it better to batch them and use MultiGet() versus issuing 10 individual Get() calls?
A: There are potential performance benefits in using MultiGet(). See https://github.com/facebook/rocksdb/wiki/MultiGet-Performance .
Q: If I have multiple column families and call a DB function without a column family handle, what will the result be?
A: It will operate only on the default column family.
Q: Can I reuse ReadOptions, WriteOptions, etc. across multiple threads?
A: As long as they are const, you are free to reuse them.
Feature Support
Q: Can I cancel a specific compaction?
A: No, you can't cancel one specific compaction.
Q: Can I close the DB when a manual compaction is in progress?
A: No, it's not safe to do that. However, you can call CancelAllBackgroundWork(db, true) from another thread to abort the running compactions, so that you can close the DB sooner. Since 6.5, you can also speed this up using DB::DisableManualCompaction().
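A minimal sketch of the shutdown pattern, assuming the RocksDB C++ headers are available; error handling is elided for brevity.

```cpp
#include "rocksdb/db.h"
#include "rocksdb/convenience.h"

// Call from a thread other than the one running the manual compaction:
// abort the background work, then close and free the DB.
void AbortAndClose(rocksdb::DB* db) {
  // wait=true blocks until the background work has actually stopped.
  rocksdb::CancelAllBackgroundWork(db, /*wait=*/true);
  db->Close();  // check the returned Status in real code
  delete db;
}
```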
Q: Is it safe to directly copy an open RocksDB instance?
A: No, unless the RocksDB instance is opened in read-only mode.
Q: Does RocksDB support replication?
A: No, RocksDB does not directly support replication. However, it offers some APIs that can be used as building blocks to support replication. For instance, GetUpdatesSince() allows developers to iterate through all updates since a specific point in time. See https://github.com/facebook/rocksdb/wiki/Replication-Helpers
Q: Does RocksDB support group commit?
A: Yes. Multiple write requests issued by multiple threads may be grouped together. One of the threads writes the WAL log for all of those write requests in a single write, and fsyncs once if configured.
Q: Is it possible to scan/iterate over keys only? If so, is that more efficient than loading keys and values?
A: No, it is usually not more efficient. RocksDB's values are normally stored inline with keys. When a user iterates over the keys, the values are already loaded in memory, so skipping the value won't save much. In BlobDB, keys and large values are stored separately, so it may be beneficial to iterate over keys only, but that is not supported yet. We may add the support in the future.
Q: Is the transaction object thread-safe?
A: No, it's not. You can't issue multiple operations to the same transaction concurrently. (Of course, you can execute multiple transactions in parallel, which is the point of the feature.)
Q: After the iterator moves away from a key/value, is the memory pointed to by that key/value still kept?
A: No, it can be freed, unless you set ReadOptions.pin_data = true and your setting supports this feature.
Q: Can I programmatically read data from an SST file?
A: We don't support it right now. But you can dump the data using sst_dump. Since version 6.5, you can do it using SstFileReader.
Q: RocksDB repair: when can I use it? Best practices?
A: Check https://github.com/facebook/rocksdb/wiki/RocksDB-Repairer
Configuration and Tuning
Q: What's the default value of the block cache?
A: 8MB. That's too low for most use cases, so you likely need to set your own value.
Q: Are bloom filter blocks of SST files always loaded to memory, or can they be loaded from disk?
A: The behavior is configurable. When BlockBasedTableOptions::cache_index_and_filter_blocks is set to true, bloom filters and index blocks are loaded into an LRU cache only when related Get() requests are issued. When cache_index_and_filter_blocks is set to false, RocksDB will try to keep the index blocks and bloom filters in memory for up to DBOptions::max_open_files SST files.
Q: Is it safe to configure different prefix extractors for different column families?
A: Yes.
Q: Can I change the prefix extractor?
A: No. Once you've specified a prefix extractor, you cannot change it. However, you can disable it by specifying a null value.
Q: How to configure RocksDB to use multiple disks?
A: You can create a single filesystem (ext3, xfs, etc.) across multiple disks, and then run RocksDB on that single file system.
Some tips when using disks:
- if RAID is used, use a larger RAID stripe size (64KB is too small; 1MB would be excellent)
- consider enabling compaction read-ahead by setting ColumnFamilyOptions::compaction_readahead_size to at least 2MB
- if the workload is write-heavy, have enough compaction threads to keep the disks busy
- consider enabling async write-behind for compaction
Q: Can I open RocksDB with a different compression type and still read old data?
A: Yes. Since RocksDB stores the compression information in each SST file and performs decompression accordingly, you can change the compression type and the DB will still be able to read existing files. In addition, you can specify a different compression for the last level by setting ColumnFamilyOptions::bottommost_compression.
Q: Can I put WAL files and SST files in different directories? How about information logs?
A: Yes. WAL files can be placed in a separate directory by specifying DBOptions::wal_dir, and information logs can be written to a separate directory via DBOptions::db_log_dir.
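A minimal configuration sketch, assuming the RocksDB C++ headers are available; the directory paths are made-up placeholders.

```cpp
#include "rocksdb/options.h"

rocksdb::Options SplitDirectoriesOptions() {
  rocksdb::Options options;
  options.wal_dir = "/wal_disk/mydb_wal";      // WAL files go here
  options.db_log_dir = "/log_disk/mydb_logs";  // info LOG files go here
  return options;
}
```

Placing the WAL on a separate (e.g. faster) device is a common reason to split these directories.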
Q: If I use non-default comparators or merge operators, can I still use the ldb tool?
A: You cannot use the regular ldb tool in this case. However, you can build a custom ldb tool by passing your own options to rocksdb::LDBTool::Run(argc, argv, options) and compiling it.
Q: What will happen if I open RocksDB with a different compaction style?
A: When opening a RocksDB database with a different compaction style or compaction settings, one of the following will happen:
- the database will refuse to open if the new configuration is incompatible with the current LSM layout
- if the new configuration is compatible with the current LSM layout, RocksDB will continue and open the database; however, in order to make the new options take full effect, a full compaction might be required

Consider using the migration helper function OptionChangeMigration(), which will compact the files to satisfy the new compaction style if needed.
Q: Does RocksDB have columns? If it doesn't have columns, why are there column families?
A: No, RocksDB doesn't have columns. See https://github.com/facebook/rocksdb/wiki/Column-Families for what a column family is.
Q: How can I estimate the space that would be reclaimed if I issue a full manual compaction?
A: There is no easy way to predict it accurately, especially when there is a compaction filter. If the database size is steady, the DB property rocksdb.estimate-live-data-size is the best estimate.
Q: What's the difference between a snapshot, a checkpoint and a backup?
A: A snapshot is a logical concept. Users can query data through the program interface, but underlying compactions still rewrite existing files.
A checkpoint creates a physical mirror of all the database files using the same Env. This operation is very cheap if the file system supports hard links for creating mirrored files.
A backup can move the physical database files to another Env (like HDFS). The backup engine also supports incremental copy between different backups.
Q: Which compression type should I use?
A: Start with LZ4 (or Snappy, if LZ4 is not available) for all levels for good performance. If you want to further reduce data size, try ZStandard (or Zlib, if ZStandard is not available) in the bottommost level. See https://github.com/facebook/rocksdb/wiki/Setup-Options-and-Basic-Tuning#compression
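A minimal configuration sketch of that recommendation, assuming the RocksDB C++ headers and the LZ4/ZSTD libraries are available in your build:

```cpp
#include "rocksdb/options.h"

rocksdb::Options CompressionDefaults() {
  rocksdb::Options options;
  options.compression = rocksdb::kLZ4Compression;   // all levels
  options.bottommost_compression = rocksdb::kZSTD;  // last level only
  return options;
}
```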
Q: Is compaction needed if no key is deleted or overwritten?
A: Even if there is no need to clear out-of-date data, compaction is needed to ensure read performance.
Q: After a write with options.disableWAL=true, if I write another record with options.sync=true, will that persist the previous write too?
A: No. After the program crashes, writes with options.disableWAL=true will be lost if they were not flushed to SST files.
Q: What is options.target_file_size_multiplier useful for?
A: It's a rarely used feature. For example, you can use it to reduce the number of SST files.
Q: I observed bursts of write I/O. How can I eliminate them?
A: Try the rate limiter: see https://github.com/facebook/rocksdb/wiki/Rate-Limiter
Q: Can I change the compaction filter without reopening the DB?
A: It's not supported. However, you can achieve it by implementing a CompactionFilterFactory that returns different compaction filters.
Q: How many column families can a single DB support?
A: Users should be able to run at least thousands of column families without seeing any error. However, too many column families don't usually perform well. We don't recommend using more than a few hundred column families.
Q: Can I reuse DBOptions or ColumnFamilyOptions to open multiple DBs or column families?
A: Yes. Internally, RocksDB always makes a copy of those options, so you can freely change them and reuse the objects.
Portability
Q: Can I run RocksDB and store the data on HDFS?
A: Yes. By using the Env returned by NewHdfsEnv(), RocksDB will store data on HDFS. However, file locking is currently not supported in the HDFS Env.
Q: Does RocksJava support all the features?
A: We are working toward making RocksJava feature compatible. However, you're more than welcome to submit a pull request if you find something missing.
Backup
Q: Can I preserve a "snapshot" of RocksDB and later roll back the DB state to it?
A: Yes, via the BackupEngine or [[Checkpoints]].
Q: Does BackupableDB create a point-in-time snapshot of the database?
A: Yes, when BackupOptions::backup_log_files = true, or when flush_before_backup = true is passed when calling CreateNewBackup().
Q: Does the backup process affect accesses to the database in the meantime?
A: No, you can keep reading and writing to the database at the same time.
Q: How can I configure RocksDB to back up to HDFS?
A: Use BackupableDB and set backup_env to the return value of NewHdfsEnv().
Failure Handling
Q: Does RocksDB throw exceptions?
A: No, RocksDB returns rocksdb::Status to indicate any error. However, RocksDB does not catch exceptions thrown by the STL or other dependencies. For instance, it's possible to see std::bad_alloc when memory allocation fails, or similar exceptions in other situations.
Q: How does RocksDB handle read or write I/O errors?
A: If the I/O error happens in a foreground operation such as Get() or Write(), RocksDB will return a rocksdb::IOError status. If the error happens in a background thread and options.paranoid_checks=true, the DB will switch to read-only mode. All writes will then be rejected with a status code representing the background error.
Q: How to distinguish the type of exceptions thrown by RocksJava?
A: RocksJava throws RocksDBException for every RocksDB-related exception.
Failure Recovery
Q: If my process crashes, can it corrupt the database?
A: No, but data in the un-flushed memtables might be lost if the [[Write Ahead Log]] (WAL) is disabled.
Q: If my machine crashes and reboots, will RocksDB preserve the data?
A: Data is synced when you issue a sync write (a write with WriteOptions.sync=true), call DB::SyncWAL(), or when memtables are flushed.
Q: How to know the number of keys stored in a RocksDB database?
A: Use GetIntProperty(cf_handle, "rocksdb.estimate-num-keys") to obtain an estimated number of keys stored in a column family, or use GetAggregatedIntProperty("rocksdb.estimate-num-keys", &num_keys) to obtain an estimated number of keys stored in the whole RocksDB database.
Q: Why can GetIntProperty only return an estimated number of keys in a RocksDB database?
A: Obtaining an accurate number of keys in an LSM database like RocksDB is a challenging problem, because duplicate keys and deletion entries (i.e., tombstones) would require a full compaction to resolve before an accurate count could be obtained. In addition, if the RocksDB database contains merge operators, the estimated number of keys becomes even less accurate.
Resource Management
Q: How much resource does an iterator hold, and when will these resources be released?
A: Iterators hold both data blocks and memtables in memory. The resources each iterator holds are:
- the data blocks that the iterator is currently pointing to; see https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB#blocks-pinned-by-iterators
- the memtables that existed when the iterator was created, even after the memtables have been flushed
- all the SST files on disk that existed when the iterator was created, even if they have been compacted

These resources are released when the iterator is deleted.
Q: How to estimate the total size of index and filter blocks in a DB?
A: For an offline DB, "sst_dump --show_properties --command=none" will show you the index and filter size for a specific SST file; you can sum them up for the whole DB. For a running DB, you can fetch the DB property kAggregatedTableProperties, or call DB::GetPropertiesOfAllTables() and sum up the index and filter block sizes of the individual files.
Q: Can RocksDB tell us the total number of keys in the database? Or the total number of keys within a range?
A: RocksDB can estimate the number of keys through the DB property "rocksdb.estimate-num-keys". Note that this estimate can be far off when there are merge operators, overwrites of existing keys, or deletions of non-existing keys.
The best way to estimate the total number of keys within a range is to first estimate the size of the range by calling DB::GetApproximateSizes(), and then estimate the number of keys from that.
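The second step is simple arithmetic. The helper below is hypothetical (not a RocksDB API): it turns the byte size returned by DB::GetApproximateSizes() into a rough key count, given an average key+value size you have observed for your workload. Both inputs are estimates, so the result is only a ballpark figure.

```cpp
#include <cstdint>

// Hypothetical helper: approximate_range_bytes comes from
// DB::GetApproximateSizes(); avg_key_value_bytes is the average
// key+value size observed for the workload.
uint64_t EstimateKeysInRange(uint64_t approximate_range_bytes,
                             uint64_t avg_key_value_bytes) {
  if (avg_key_value_bytes == 0) return 0;  // avoid division by zero
  return approximate_range_bytes / avg_key_value_bytes;
}
```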
Others
Q: Who is using RocksDB?
A: https://github.com/facebook/rocksdb/blob/main/USERS.md
Q: How should I implement multiple data shards/partitions?
A: You can use one RocksDB database per shard/partition. Multiple RocksDB instances can run as separate processes or within a single process. When multiple RocksDB instances are used within a single process, some resources (like the thread pool, block cache, rate limiter, etc.) can be shared between those instances. See https://github.com/facebook/rocksdb/wiki/RocksDB-Overview#support-for-multiple-embedded-databases-in-the-same-process
Q: DB operations fail because of out-of-space. How can I unblock myself?
A: First free up some space. The DB will automatically start accepting operations once enough free space is available. The only exception is if 2PC is enabled and the WAL sync fails (in that case, the DB needs to be reopened). See [[Background Error Handling]] for more details.