Storage Format

Overview (0.9.0)

Data in Druid is stored in a custom column format known as a segment. Segments are composed of different types of columns. Column.java and the classes that extend it are a great place to start looking into the storage format.

Basic classes

ValueType

An enum with four possible values:

  1. Float
  2. Long
  3. String
  4. Complex

IndexedInts

It has three main methods:

int size();
int get(int index);
void fill(int index, int[] toFill);

The main implementations are:

  1. EmptyIndexedInts
  2. IntBufferIndexedInts
  3. ListBasedIndexedInts
  4. VSizeIndexedInts

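size() returns the number of elements; for the buffer-backed implementations this is the number of elements remaining in the buffer.
get(index) returns the element at position index.
fill(index, toFill) is meant to populate the supplied array, but none of the implementations currently support it.
ListBasedIndexedInts is backed by a List<Integer>.
As the class names suggest, some of the implementations use Java NIO buffers to work with native memory.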
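To make the difference between the list-backed and buffer-backed variants concrete, here is a minimal sketch. The names SketchIndexedInts, ListBackedInts and BufferBackedInts are hypothetical stand-ins, not Druid classes, and the buffer handling is an assumption about how such an implementation could look.

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the IndexedInts interface described above.
interface SketchIndexedInts {
    int size();
    int get(int index);
    void fill(int index, int[] toFill);
}

// List-backed variant, in the spirit of ListBasedIndexedInts.
final class ListBackedInts implements SketchIndexedInts {
    private final List<Integer> values;

    ListBackedInts(List<Integer> values) { this.values = values; }

    @Override public int size() { return values.size(); }
    @Override public int get(int index) { return values.get(index); }
    @Override public void fill(int index, int[] toFill) {
        throw new UnsupportedOperationException("fill not supported");
    }
}

// Buffer-backed variant, in the spirit of IntBufferIndexedInts.
final class BufferBackedInts implements SketchIndexedInts {
    private final IntBuffer buffer;

    BufferBackedInts(int[] values) {
        // A direct buffer is allocated outside the Java heap (native memory).
        buffer = ByteBuffer.allocateDirect(values.length * Integer.BYTES).asIntBuffer();
        buffer.put(values);
        buffer.flip(); // position = 0, limit = number of ints written
    }

    // Number of elements between the current position and the limit.
    @Override public int size() { return buffer.remaining(); }
    // Absolute read relative to the current position; does not move the position.
    @Override public int get(int index) { return buffer.get(buffer.position() + index); }
    @Override public void fill(int index, int[] toFill) {
        throw new UnsupportedOperationException("fill not supported");
    }
}

class IndexedIntsDemo {
    public static void main(String[] args) {
        SketchIndexedInts fromList = new ListBackedInts(Arrays.asList(3, 1, 4));
        SketchIndexedInts fromBuffer = new BufferBackedInts(new int[] {3, 1, 4});
        for (int i = 0; i < fromList.size(); i++) {
            System.out.println(fromList.get(i) + " " + fromBuffer.get(i));
        }
    }
}
```

Both variants expose the same read-only view, so callers do not need to know whether the ints live on the heap or in a native buffer.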

ColumnCapabilities

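Fields:

private ValueType type = null;
private boolean dictionaryEncoded = false;  // whether the column is dictionary encoded
private boolean runLengthEncoded = false;  // whether the column is run-length encoded; run-length encoding is not actually implemented and can be ignored
private boolean hasInvertedIndexes = false;  // whether the column has inverted indexes
private boolean hasSpatialIndexes = false;  // whether the column has spatial indexes
private boolean hasMultipleValues = false;  // whether the column has multiple values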

DictionaryEncodedColumn

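Basic methods:

public int length();  // total length of a dictionary-encoded column
public boolean hasMultipleValues();  // whether rows can hold multiple values
public int getSingleValueRow(int rowNum);  // get the single value of a row (as a dictionary id)
public IndexedInts getMultiValueRow(int rowNum);  // get the multiple values of a row
public String lookupName(int id);  // look up the value for a dictionary id; note that null and empty are both converted to null
public int lookupId(String name);  // look up the dictionary id for a value
public int getCardinality();  // get the cardinality, i.e. the dictionary length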

The only implementation is SimpleDictionaryEncodedColumn, which has three fields:

private final IndexedInts column;
private final IndexedMultivalue<IndexedInts> multiValueColumn;
private final CachingIndexed<String> cachedLookups;

The interesting one is cachedLookups, which stores the dictionary.
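As an illustration of how these methods fit together, the following is a minimal single-value sketch of dictionary encoding. SketchDictionaryColumn is a hypothetical class, not the Druid implementation; it assumes non-null values and a sorted dictionary, and it keeps everything in plain arrays.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical single-value dictionary-encoded column (no nulls, no multi-value rows).
final class SketchDictionaryColumn {
    private final String[] dictionary; // sorted distinct values; array index == dictionary id
    private final int[] rows;          // one dictionary id per row

    SketchDictionaryColumn(String[] rawValues) {
        // Build the sorted dictionary of distinct values.
        dictionary = new TreeSet<>(Arrays.asList(rawValues)).toArray(new String[0]);
        // Replace every raw value with its dictionary id.
        rows = new int[rawValues.length];
        for (int i = 0; i < rawValues.length; i++) {
            rows[i] = Arrays.binarySearch(dictionary, rawValues[i]);
        }
    }

    int length() { return rows.length; }
    int getSingleValueRow(int rowNum) { return rows[rowNum]; }
    String lookupName(int id) { return dictionary[id]; }
    // Returns a negative number when the value is not in the dictionary.
    int lookupId(String name) { return Arrays.binarySearch(dictionary, name); }
    int getCardinality() { return dictionary.length; }
}

class DictionaryColumnDemo {
    public static void main(String[] args) {
        SketchDictionaryColumn column =
            new SketchDictionaryColumn(new String[] {"us", "cn", "us", "de"});
        for (int row = 0; row < column.length(); row++) {
            int id = column.getSingleValueRow(row);
            System.out.println(row + " -> id " + id + " -> " + column.lookupName(id));
        }
    }
}
```

Storing small integer ids per row and the strings only once is what keeps repeated values cheap; the row data never needs the strings until lookupName is called.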

CachingIndexed

The concrete class that implements the dictionary. It implements the Indexed interface; the other main implementations are:

  1. GenericIndexed
  2. ArrayIndexed
  3. BufferIndexed
  4. ListIndexed
  5. VSizeIndexed

CachingIndexed wraps a given GenericIndexed and uses an LRU map, SizedLRUMap<Integer, T>, to store cachedValues.
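The wrapping idea can be sketched with the JDK's LinkedHashMap in access order. SketchCachingLookup is a hypothetical class; it bounds the cache by entry count for simplicity, which may differ from how SizedLRUMap decides what to evict, and it takes any IntFunction as the delegate instead of a GenericIndexed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical LRU-caching wrapper around an index-based lookup.
final class SketchCachingLookup<T> {
    private final IntFunction<T> delegate;  // the wrapped (slow) lookup
    private final Map<Integer, T> cache;
    int misses = 0;                         // counts calls that reached the delegate

    SketchCachingLookup(IntFunction<T> delegate, final int maxEntries) {
        this.delegate = delegate;
        // accessOrder = true keeps the least recently used entry first.
        this.cache = new LinkedHashMap<Integer, T>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, T> eldest) {
                return size() > maxEntries;
            }
        };
    }

    T get(int index) {
        T cached = cache.get(index);
        if (cached != null) {
            return cached;
        }
        misses++;
        T value = delegate.apply(index);
        if (value != null) {
            cache.put(index, value); // may evict the least recently used entry
        }
        return value;
    }
}

class CachingLookupDemo {
    public static void main(String[] args) {
        String[] dictionary = {"cn", "de", "us"};
        SketchCachingLookup<String> lookup =
            new SketchCachingLookup<>(i -> dictionary[i], 2);
        lookup.get(0);
        lookup.get(0); // served from the cache
        System.out.println("misses: " + lookup.misses);
    }
}
```

A cache like this pays off when deserializing a value from the underlying buffer is noticeably more expensive than a map lookup, for example when decoding UTF-8 strings.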

GenericIndexed

A generic, flat storage mechanism. Use static methods fromArray() or fromIterable() to construct. If input is sorted, supports binary search index lookups. If input is not sorted, only supports array-like index lookups.
V1 Storage Format:

  • byte 1: version (0x1)
  • byte 2 == 0x1 => allowReverseLookup
  • bytes 3-6 => numBytesUsed
  • bytes 7-10 => numElements
  • bytes 10-((numElements * 4) + 10): integers representing 'end' offsets of byte serialized values
  • bytes ((numElements * 4) + 10)-(numBytesUsed + 2): 4-byte integer representing length of value, followed by bytes for value
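The layout above can be exercised with a small writer and reader over a ByteBuffer. This is a sketch under stated assumptions, not the Druid code: SketchGenericIndexedV1 is a hypothetical class, positions are 0-based, the 'end' offsets are taken to be relative to the start of the values region and to include the 4-byte length prefixes, and numBytesUsed is taken to count everything that follows it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical writer/reader for the V1 layout listed above (strings only).
final class SketchGenericIndexedV1 {
    private static final int HEADER = 10; // version + flag + numBytesUsed + numElements

    static ByteBuffer write(String[] values, boolean sorted) {
        byte[][] encoded = new byte[values.length][];
        int valueBytes = 0;
        for (int i = 0; i < values.length; i++) {
            encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
            valueBytes += Integer.BYTES + encoded[i].length;
        }
        // Assumption: counts numElements, the offset table and the values region.
        int numBytesUsed = Integer.BYTES + values.length * Integer.BYTES + valueBytes;
        ByteBuffer buf = ByteBuffer.allocate(2 + Integer.BYTES + numBytesUsed);
        buf.put((byte) 0x1);                   // version
        buf.put((byte) (sorted ? 0x1 : 0x0));  // allowReverseLookup
        buf.putInt(numBytesUsed);
        buf.putInt(values.length);             // numElements
        int end = 0;
        for (byte[] e : encoded) {             // table of 'end' offsets
            end += Integer.BYTES + e.length;
            buf.putInt(end);
        }
        for (byte[] e : encoded) {             // length-prefixed values
            buf.putInt(e.length);
            buf.put(e);
        }
        buf.flip();
        return buf;
    }

    // Array-like lookup by index; uses absolute reads only.
    static String get(ByteBuffer buf, int index) {
        int numElements = buf.getInt(6);
        if (index < 0 || index >= numElements) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int valuesStart = HEADER + numElements * Integer.BYTES;
        int start = (index == 0) ? 0 : buf.getInt(HEADER + (index - 1) * Integer.BYTES);
        int length = buf.getInt(valuesStart + start);
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) {
            out[i] = buf.get(valuesStart + start + Integer.BYTES + i);
        }
        return new String(out, StandardCharsets.UTF_8);
    }

    // Binary search; only valid when the input was sorted (allowReverseLookup).
    static int indexOf(ByteBuffer buf, String value) {
        if (buf.get(1) != 0x1) {
            throw new UnsupportedOperationException("reverse lookup not allowed");
        }
        int lo = 0;
        int hi = buf.getInt(6) - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = get(buf, mid).compareTo(value);
            if (cmp == 0) return mid;
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return -(lo + 1); // not found
    }
}

class GenericIndexedDemo {
    public static void main(String[] args) {
        ByteBuffer buf = SketchGenericIndexedV1.write(
            new String[] {"apple", "banana", "cherry"}, true);
        System.out.println(SketchGenericIndexedV1.get(buf, 1));
        System.out.println(SketchGenericIndexedV1.indexOf(buf, "cherry"));
    }
}
```

Because each value can be located from the offset table alone, get() is a constant-time read, and a sorted input additionally allows indexOf() to binary-search without materializing the whole dictionary.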

Fields:

private final ByteBuffer theBuffer;  // the underlying ByteBuffer storage
private final ObjectStrategy<T> strategy;
private final boolean allowReverseLookup;
private final int size;  // the int read at theBuffer's current position (numElements)
private final int valuesOffset;
private final BufferIndexed bufferIndexed;  // inner class, BufferIndexed

Column class

An interface; see the implementing class for details.

SimpleColumn class

Fields:

private final ColumnCapabilities capabilities;
private final Supplier<DictionaryEncodedColumn> dictionaryEncodedColumn;
private final Supplier<RunLengthColumn> runLengthColumn;
private final Supplier<GenericColumn> genericColumn;
private final Supplier<ComplexColumn> complexColumn;
private final Supplier<BitmapIndex> bitmapIndex;
private final Supplier<SpatialIndex> spatialIndex;
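One plausible reading of this design is that each representation is held behind a supplier so that it is only produced when a caller asks for it. The sketch below illustrates that idea only: SketchColumn is a hypothetical class, it uses java.util.function.Supplier rather than whichever Supplier type Druid uses, and plain arrays stand in for the real column types.

```java
import java.util.function.Supplier;

// Hypothetical column holder: capabilities plus optional, lazily supplied representations.
final class SketchColumn {
    private final boolean dictionaryEncoded;                  // stand-in for ColumnCapabilities
    private final Supplier<String[]> dictionaryEncodedColumn; // null when not applicable
    private final Supplier<long[]> genericColumn;             // null when not applicable

    SketchColumn(boolean dictionaryEncoded,
                 Supplier<String[]> dictionaryEncodedColumn,
                 Supplier<long[]> genericColumn) {
        this.dictionaryEncoded = dictionaryEncoded;
        this.dictionaryEncodedColumn = dictionaryEncodedColumn;
        this.genericColumn = genericColumn;
    }

    boolean isDictionaryEncoded() { return dictionaryEncoded; }

    // Returns null when this column has no dictionary-encoded representation.
    String[] getDictionaryEncoding() {
        return dictionaryEncodedColumn == null ? null : dictionaryEncodedColumn.get();
    }

    // Returns null when this column has no generic representation.
    long[] getGenericColumn() {
        return genericColumn == null ? null : genericColumn.get();
    }
}

class SketchColumnDemo {
    public static void main(String[] args) {
        int[] loads = {0}; // counts how often the supplier is invoked
        SketchColumn column = new SketchColumn(
            true,
            () -> { loads[0]++; return new String[] {"cn", "us"}; },
            null);
        System.out.println("loads before access: " + loads[0]);
        System.out.println("dictionary size: " + column.getDictionaryEncoding().length);
        System.out.println("loads after access: " + loads[0]);
    }
}
```

A single holder class can then describe every column type: callers check the capabilities first and only touch the suppliers that apply.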
