Document overview (0.9.0)
Data in Druid is stored in a custom column format known as a segment. Segments are composed of different types of columns. Column.java and the classes that extend it are a good place to start looking into the storage format.
Basic classes
ValueType
An enum with four possible values:
- Float
- Long
- String
- Complex
IndexedInts
It has three main methods:
int size();
int get(int index);
void fill(int index, int[] toFill);
The main implementations are:
- EmptyIndexedInts
- IntBufferIndexedInts
- ListBasedIndexedInts
- VSizeIndexedInts
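size() returns how many elements in the buffer remain to be read or written;
get(index) reads the element at position index in the buffer;
fill() is meant to fill the buffer with data from the corresponding channel; none of the implementations currently support it.
Of these, ListBasedIndexedInts is backed by a List<Integer>.
As the list shows, some of the implementations use Java NIO to work with native memory.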
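The contrast between a list-backed and a buffer-backed implementation can be sketched roughly as follows. This is a minimal illustration, not Druid's code: SketchIndexedInts, ListBackedInts and BufferBackedInts are hypothetical names that only mirror the three methods listed above, and the direct ByteBuffer stands in for native memory.

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.util.List;

// Hypothetical stand-in for the three-method interface described above.
interface SketchIndexedInts {
    int size();
    int get(int index);
    void fill(int index, int[] toFill);
}

// On-heap variant backed by a List<Integer>.
final class ListBackedInts implements SketchIndexedInts {
    private final List<Integer> values;

    ListBackedInts(List<Integer> values) {
        this.values = values;
    }

    @Override public int size() { return values.size(); }
    @Override public int get(int index) { return values.get(index); }
    @Override public void fill(int index, int[] toFill) {
        // Mirrors the note above: fill is not supported.
        throw new UnsupportedOperationException("fill not supported");
    }
}

// Off-heap variant: an IntBuffer view over a direct ByteBuffer (Java NIO).
final class BufferBackedInts implements SketchIndexedInts {
    private final IntBuffer buffer;

    BufferBackedInts(int[] values) {
        IntBuffer buf = ByteBuffer.allocateDirect(values.length * Integer.BYTES).asIntBuffer();
        buf.put(values);
        buf.flip(); // position = 0, limit = number of ints written
        this.buffer = buf;
    }

    // Number of elements between the buffer's position and its limit.
    @Override public int size() { return buffer.remaining(); }
    // Absolute read relative to the current position; does not move the position.
    @Override public int get(int index) { return buffer.get(buffer.position() + index); }
    @Override public void fill(int index, int[] toFill) {
        throw new UnsupportedOperationException("fill not supported");
    }
}
```

Both variants answer size() and get(index) identically for the same data; only where the ints live differs.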
ColumnCapabilities
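Fields:
private ValueType type = null;
private boolean dictionaryEncoded = false; // whether the column is dictionary encoded
private boolean runLengthEncoded = false; // whether the column is run-length encoded; run-length encoding is only a placeholder and can be ignored
private boolean hasInvertedIndexes = false; // whether the column has an inverted index
private boolean hasSpatialIndexes = false; // whether the column has a spatial index
private boolean hasMultipleValues = false; // whether the column holds multiple values per row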
DictionaryEncodedColumn
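Core methods:
public int length(); // total number of rows in the dictionary-encoded column
public boolean hasMultipleValues(); // whether the column holds multiple values per row
public int getSingleValueRow(int rowNum); // get the single value (dictionary id) of a row
public IndexedInts getMultiValueRow(int rowNum); // get the multiple values (dictionary ids) of a row
public String lookupName(int id); // look up the value for a dictionary id; note that null and the empty string are both converted to null
public int lookupId(String name); // look up the dictionary id for a value
public int getCardinality(); // get the cardinality, i.e. the size of the dictionary
The only implementation is SimpleDictionaryEncodedColumn, which has three fields: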
private final IndexedInts column;
private final IndexedMultivalue<IndexedInts> multiValueColumn;
private final CachingIndexed<String> cachedLookups;
The interesting one is cachedLookups, which stores the dictionary.
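To make the relationship between rows, ids and the dictionary concrete, here is a small single-value sketch. SketchDictionaryColumn is a hypothetical class, not Druid's SimpleDictionaryEncodedColumn; it assumes a sorted dictionary of distinct non-null strings held in a plain array, and it ignores multi-value rows and null handling.

```java
import java.util.Arrays;

// Hypothetical, simplified single-value dictionary-encoded column.
final class SketchDictionaryColumn {
    private final String[] dictionary; // sorted distinct values; the array index is the id
    private final int[] rows;          // one dictionary id per row

    SketchDictionaryColumn(String[] rawValues) {
        dictionary = Arrays.stream(rawValues).distinct().sorted().toArray(String[]::new);
        rows = new int[rawValues.length];
        for (int i = 0; i < rawValues.length; i++) {
            // Every raw value is present in the dictionary, so the result is >= 0.
            rows[i] = Arrays.binarySearch(dictionary, rawValues[i]);
        }
    }

    int length() { return rows.length; }

    int getSingleValueRow(int rowNum) { return rows[rowNum]; }

    String lookupName(int id) { return dictionary[id]; }

    // Returns a negative number when the value is not in the dictionary.
    int lookupId(String name) { return Arrays.binarySearch(dictionary, name); }

    int getCardinality() { return dictionary.length; }
}
```

For the input {"b", "a", "b", "c"} the dictionary is [a, b, c] and the rows hold the ids [1, 0, 1, 2], so each distinct string is stored once no matter how many rows repeat it.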
CachingIndexed
The concrete class that holds the dictionary. It implements the Indexed interface; the other main implementations of that interface are:
- GenericIndexed
- ArrayIndexed
- BufferIndexed
- ListIndexed
- VSizeIndexed
CachingIndexed wraps a given GenericIndexed and uses an LRU map, SizedLRUMap<Integer, T>, to store cachedValues.
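The wrap-and-cache idea can be sketched with the JDK's LinkedHashMap in access order. This is an assumption-laden illustration rather than Druid's implementation: SketchCachingLookup is a hypothetical class, the IntFunction stands in for the wrapped GenericIndexed lookup, and the cache is bounded by entry count purely to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical LRU-caching wrapper around an index-to-value lookup.
final class SketchCachingLookup<T> {
    private final IntFunction<T> delegate;      // stands in for the wrapped lookup
    private final Map<Integer, T> cachedValues; // least recently used entry is evicted first
    private int misses = 0;

    SketchCachingLookup(IntFunction<T> delegate, final int maxEntries) {
        this.delegate = delegate;
        // accessOrder = true makes iteration order follow recency of access.
        this.cachedValues = new LinkedHashMap<Integer, T>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, T> eldest) {
                return size() > maxEntries;
            }
        };
    }

    T get(int index) {
        T cached = cachedValues.get(index);
        if (cached != null) {
            return cached;
        }
        misses++;
        T value = delegate.apply(index);
        if (value != null) {
            cachedValues.put(index, value); // null results are not cached in this sketch
        }
        return value;
    }

    // Number of lookups that had to go to the delegate.
    int misses() { return misses; }
}
```

Repeated lookups of a hot id are served from the map, while ids that fall out of the recency window are fetched from the delegate again.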
GenericIndexed
A generic, flat storage mechanism. Use static methods fromArray() or fromIterable() to construct. If input is sorted, supports binary search index lookups. If input is not sorted, only supports array-like index lookups.
V1 Storage Format:
- byte 1: version (0x1)
- byte 2 == 0x1 => allowReverseLookup
- bytes 3-6 => numBytesUsed
- bytes 7-10 => numElements
- bytes 10-((numElements * 4) + 10): integers representing 'end' offsets of byte serialized values
- bytes ((numElements * 4) + 10)-(numBytesUsed + 2): 4-byte integer representing length of value, followed by bytes for value
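The layout above can be exercised with a plain ByteBuffer. The sketch below is a hypothetical re-creation written only from the byte listing, not Druid's GenericIndexed: SketchFlatStrings is an invented name, values are UTF-8 strings, numBytesUsed is assumed to count every byte that follows that field, and each 'end' offset is assumed to be measured from the start of the values region and to include the 4-byte length prefixes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical writer/reader for the V1 layout listed above.
final class SketchFlatStrings {
    static ByteBuffer write(String[] values, boolean allowReverseLookup) {
        byte[][] encoded = new byte[values.length][];
        int valueBytes = 0;
        for (int i = 0; i < values.length; i++) {
            encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
            valueBytes += Integer.BYTES + encoded[i].length;
        }
        // Assumption: numBytesUsed covers numElements, the offsets and the values.
        int numBytesUsed = Integer.BYTES + values.length * Integer.BYTES + valueBytes;
        ByteBuffer out = ByteBuffer.allocate(2 + Integer.BYTES + numBytesUsed);
        out.put((byte) 0x1);                              // version
        out.put((byte) (allowReverseLookup ? 0x1 : 0x0)); // allowReverseLookup flag
        out.putInt(numBytesUsed);
        out.putInt(values.length);                        // numElements
        int end = 0;
        for (byte[] bytes : encoded) {                    // 'end' offset of each value
            end += Integer.BYTES + bytes.length;
            out.putInt(end);
        }
        for (byte[] bytes : encoded) {                    // length prefix, then the bytes
            out.putInt(bytes.length);
            out.put(bytes);
        }
        out.flip();
        return out;
    }

    // Array-like lookup by index; works whether or not the input was sorted.
    static String get(ByteBuffer buf, int index) {
        int numElements = buf.getInt(6);
        if (index < 0 || index >= numElements) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int offsetsStart = 10;
        int valuesStart = offsetsStart + numElements * Integer.BYTES;
        // A value starts where the previous one ends.
        int start = index == 0 ? 0 : buf.getInt(offsetsStart + (index - 1) * Integer.BYTES);
        byte[] bytes = new byte[buf.getInt(valuesStart + start)];
        ByteBuffer view = buf.duplicate();                // leave the caller's position alone
        view.position(valuesStart + start + Integer.BYTES);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Binary search; only meaningful when the values were written in sorted order.
    static int indexOf(ByteBuffer buf, String value) {
        if (buf.get(1) != 0x1) {
            throw new UnsupportedOperationException("reverse lookup not allowed");
        }
        int lo = 0;
        int hi = buf.getInt(6) - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = get(buf, mid).compareTo(value);
            if (cmp == 0) return mid;
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return -(lo + 1); // not found
    }
}
```

Because every value's start can be computed from the offset table, get(index) needs no scan, and a sorted input additionally allows the value-to-index lookup by binary search.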
Fields:
private final ByteBuffer theBuffer; // the backing ByteBuffer
private final ObjectStrategy<T> strategy;
private final boolean allowReverseLookup;
private final int size; // the int read from theBuffer at its current position
private final int valuesOffset;
private final BufferIndexed bufferIndexed; // the inner class BufferIndexed
Column class
An interface; see the implementing class for details.
SimpleColumn class
Fields:
private final ColumnCapabilities capabilities;
private final Supplier<DictionaryEncodedColumn> dictionaryEncodedColumn;
private final Supplier<RunLengthColumn> runLengthColumn;
private final Supplier<GenericColumn> genericColumn;
private final Supplier<ComplexColumn> complexColumn;
private final Supplier<BitmapIndex> bitmapIndex;
private final Supplier<SpatialIndex> spatialIndex;