1. Time-Series Database (TSDB)
1.1. Time-series data
- Usually arrives in time order (at regular or irregular intervals)
- Large data volume
- Time-sensitive: the newer the data, the more valuable it is

1.2. Time-series databases
[Figure: history of time-series databases]
[Figure: DB-Engines ranking]


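2. Introduction to InfluxDB
2.1. Advantages
- Ease of use: SQL-like query statements
- Full-featured
- TICK ecosystem
- High data compression ratio
- High read/write performance
2.2. Which companies use it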

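2.3. Evolution and Thoughts
Evolution
- Before 0.9.0: LSM-tree storage based on LevelDB
- 0.9.0 to 0.9.4: B+tree storage based on BoltDB
- 0.9.5 to 1.2: in-house WAL + TSM file design
- 1.3 to present: in-house WAL + TSM file + TSI file design
Some thoughts
- After downsampling, time-series data is deleted in large batches, and deletes are too expensive in LevelDB's LSM-tree
- A single machine storing a large amount of data must not hold too many file handles, but LevelDB produces a large number of small files over time
- Data storage needs hot backups, but LevelDB supports only cold backups
- Write throughput must keep up in big-data scenarios, but write throughput in BoltDB's B+tree becomes the bottleneck
- Storage needs good compression, but BoltDB does not support compression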
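The compression requirement above can be illustrated with a small sketch. It shows, under simplifying assumptions, why regularly spaced timestamps compress well: delta encoding turns them into a run of identical small numbers, which run-length encoding then collapses. The function names and sample values are made up for illustration, and this is not InfluxDB's actual TSM encoder.

```python
from itertools import groupby


def delta_encode(timestamps):
    """Return the first timestamp followed by successive differences."""
    if not timestamps:
        return []
    deltas = [timestamps[0]]
    for prev, cur in zip(timestamps, timestamps[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas):
    """Inverse of delta_encode: rebuild the original timestamps."""
    out = []
    total = 0
    for i, d in enumerate(deltas):
        total = d if i == 0 else total + d
        out.append(total)
    return out


def run_length_encode(values):
    """Collapse runs of equal values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


# Six points at a regular 10-second interval (made-up Unix seconds).
timestamps = [1500000000 + 10 * i for i in range(6)]
deltas = delta_encode(timestamps)            # [1500000000, 10, 10, 10, 10, 10]
compact = run_length_encode(deltas[1:])      # [(10, 5)]
```

Six 10-digit timestamps reduce to one start value plus a single (delta, count) pair. Irregular intervals would produce more pairs and compress less well.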
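2.4. Goal: efficient writes, high compression ratio
- Schemaless design, which makes it easy to manage discontinuous data. It also means some database features are not supported, for example cross-table joins
- Duplicate data cannot be stored, and in rare cases data may be overwritten
- Deletes and updates are restricted in order to improve query and write performance
Delete: data cannot be deleted by tag alone, a timestamp filter must be supplied
Update: UPDATE is not supported, but a point can be overwritten by inserting a point with the same measurement, tag set, and timestamp
- Storage compression ratio of up to 10%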
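The "update by re-inserting the same timestamp" behaviour can be modelled in a few lines. This sketch assumes a point is identified by its measurement, tag set, and timestamp, and that a second write with the same identity merges its fields into the existing point, with new values winning. The `store` dictionary, the `write_point` helper, and the `store` tag are hypothetical. It models the semantics only, not InfluxDB's storage.

```python
def write_point(store, measurement, tags, fields, timestamp):
    """Insert a point; a point with the same identity is merged, new fields win."""
    # Identity of a point: measurement + sorted tag set + timestamp.
    key = (measurement, tuple(sorted(tags.items())), timestamp)
    store.setdefault(key, {}).update(fields)
    return store


store = {}
write_point(store, "orders", {"store": "a"}, {"phone": 10, "website": 30}, 1000)
# Same measurement, tags and timestamp: acts as an "update" of the phone field.
write_point(store, "orders", {"store": "a"}, {"phone": 12}, 1000)
# A different timestamp creates a separate point.
write_point(store, "orders", {"store": "a"}, {"phone": 7}, 1010)
```

After these three writes the store holds two points, and the point at timestamp 1000 carries `phone=12` together with the untouched `website=30`.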
2.5. Terms

- database: a collection of measurements
- measurement: the metric object, that is, a data-source object. Each measurement can have one or more metric values
- tag: the same concept as tags in most time-series databases. Tags can usually identify a data source uniquely. Each tag's key and value must both be strings
- field: the concrete metric value recorded by the data source. Each kind of metric is called a "field", and the metric value is the "value" of that field
- timestamp: the timestamp of the data. In InfluxDB, timestamps can in theory be precise to the nanosecond (ns)
- series: the combination of a retention policy, a measurement, and a tag set
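The terms above map onto InfluxDB's line protocol, where one text line carries a measurement, an optional tag set, a field set, and an optional timestamp. The sketch below builds such a line. It is deliberately simplified: it does not escape spaces or commas in names, and the `to_line_protocol` helper and the `region` tag are hypothetical, chosen only for illustration.

```python
def format_field_value(value):
    """Render a field value: booleans, integers (i suffix), floats, quoted strings."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Build 'measurement,tag=v field=v timestamp' (no escaping of names)."""
    if not fields:
        raise ValueError("a point needs at least one field")
    # Tag values are always strings; tags are sorted by key for a stable series key.
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={format_field_value(v)}" for k, v in fields.items())
    line = f"{measurement}{tag_part} {field_part}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


line = to_line_protocol(
    "orders",
    {"region": "east"},
    {"phone": 10, "website": 30},
    1500000000000000000,
)
# line == "orders,region=east phone=10i,website=30i 1500000000000000000"
```

Every distinct combination of measurement and tag set (within a retention policy) forms its own series, so adding a second `region` value here would create a second series.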

2.6. Functions

2.7. Continuous Queries and Retention Policies
- Continuous Query (CQ): a query that runs automatically and periodically inside the database
- Retention Policy (RP): part of InfluxDB's data structure that describes how long InfluxDB keeps data. A single database can have several RPs, but each RP is unique within its database
2.8. Test Case
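Data
We track the number of food orders a restaurant receives by phone and through its website at 10-second intervals. The data is stored in the food_data database, in the measurement orders, with the fields phone and website, as shown in the figure.

Problem
Suppose that over the long run we only care about the average number of orders placed by phone and through the website in each 30-minute interval. We want to use RPs and CQs to meet the following requirements:
- Automatically aggregate the 10-second data into 30-minute data
- Automatically delete raw 10-second data older than two hours
- Automatically delete 30-minute data older than 52 weeks
Answer
- Prepare the database and the retention policies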
CREATE DATABASE "food_data"
CREATE RETENTION POLICY "two_hours" ON "food_data" DURATION 2h REPLICATION 1 DEFAULT
CREATE RETENTION POLICY "a_year" ON "food_data" DURATION 52w REPLICATION 1
- Create the Continuous Query
CREATE CONTINUOUS QUERY "cq_30m" ON "food_data" BEGIN
SELECT mean("website") AS "mean_website",mean("phone") AS "mean_phone"
INTO "a_year"."downsampled_orders"
FROM "orders"
GROUP BY time(30m)
END
- Write data
INSERT orders phone=10,website=30 ...
- Resulting data
orders holds the raw 10-second data, retained for 2 hours
downsampled_orders holds the 30-minute aggregated data, retained for 52 weeks
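To make the effect of cq_30m and the two retention policies concrete, here is a small offline simulation. It is a sketch under stated assumptions: timestamps are plain seconds, windows are aligned to multiples of 30 minutes, and retention is modelled by filtering individual points, which simplifies how InfluxDB actually expires data. The `downsample` and `apply_retention` helpers and the sample values are made up.

```python
from collections import defaultdict
from statistics import mean

WINDOW = 30 * 60  # 30 minutes, in seconds


def downsample(points, window=WINDOW):
    """Average phone and website per window, like GROUP BY time(30m) with mean()."""
    buckets = defaultdict(list)
    for ts, phone, website in points:
        # Align each point to the start of its window.
        buckets[ts - ts % window].append((phone, website))
    return {
        start: {
            "mean_phone": mean(p for p, _ in rows),
            "mean_website": mean(w for _, w in rows),
        }
        for start, rows in sorted(buckets.items())
    }


def apply_retention(points, now, duration):
    """Keep only points newer than now - duration (simplified retention)."""
    return [p for p in points if p[0] >= now - duration]


# One hour of made-up 10-second samples: (timestamp, phone, website).
raw = [(t, 10, 30) for t in range(0, 3600, 10)]
raw[0] = (0, 190, 30)  # one busy sample in the first window

downsampled = downsample(raw)
# With a 2-hour retention evaluated at t = 9000, the first half hour is dropped.
kept = apply_retention(raw, now=9000, duration=2 * 3600)
```

The 360 raw points collapse into two 30-minute rows, which is the trade the test case makes: short-lived detail in orders, long-lived averages in downsampled_orders.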
2.9. Hardware sizing guidelines
- Low load recommendations: CPU 2-4 cores, RAM 2-4 GB, IOPS 500
- Moderate load recommendations: CPU 4-6 cores, RAM 8-32 GB, IOPS 500-1000
- High load recommendations: CPU 8+ cores, RAM 32+ GB, IOPS 1000+

3. The eco-system for InfluxDB
- Telegraf, Time-Series Data Collector
- InfluxDB, Time-Series Data Storage
- Chronograf, Time-Series Data Visualization
- Kapacitor, Time-Series Data Processing

3.1. Telegraf
Telegraf is the open source server agent to help you collect metrics from your stacks, sensors and systems.
- monitoring the host filesystem
- monitoring docker containers
- supporting 200+ inputs, such as mysql, redis, mongodb, nginx, kubernetes.
3.2. StatsD
A network daemon that runs on the Node.js platform and listens for statistics, like counters and timers, sent over UDP or TCP and sends aggregates to one or more pluggable backend services (e.g., Graphite).
- Counter, e.g. user.logins:10|c // user.logins += 10
- Timer, e.g. foo:100|ms // foo took 100 ms
- Gauge, e.g. age:+1|g // age += 1
- Set, e.g. user:1|s user:2|s user:1|s // 2 unique users
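The four metric types above can be sketched as a tiny aggregator that parses `name:value|type` lines. This is a minimal illustration, not the StatsD daemon: it ignores sample rates, does no network I/O, starts gauges at 0, and the `aggregate` helper is hypothetical.

```python
def aggregate(lines):
    """Aggregate StatsD-style lines into counters, timers, gauges and sets."""
    counters, timers, gauges, sets = {}, {}, {}, {}
    for line in lines:
        name, rest = line.split(":", 1)
        # Anything after the type (such as a sample rate) is ignored here.
        value, metric_type = rest.split("|")[:2]
        if metric_type == "c":
            counters[name] = counters.get(name, 0) + float(value)
        elif metric_type == "ms":
            timers.setdefault(name, []).append(float(value))
        elif metric_type == "g":
            if value[0] in "+-":
                # A signed value adjusts the gauge instead of replacing it.
                gauges[name] = gauges.get(name, 0) + float(value)
            else:
                gauges[name] = float(value)
        elif metric_type == "s":
            sets.setdefault(name, set()).add(value)
        else:
            raise ValueError(f"unknown metric type: {metric_type}")
    return counters, timers, gauges, sets


counters, timers, gauges, sets = aggregate([
    "user.logins:10|c",
    "foo:100|ms",
    "age:+1|g",
    "user:1|s",
    "user:2|s",
    "user:1|s",
])
```

With the sample lines from the list, the `user` set ends up with two members, because the repeated `user:1|s` is counted once.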
3.3. Chronograf
Chronograf is the user interface and administrative component of the InfluxDB 1.x platform.


3.4. Grafana
The open platform for beautiful analytics and monitoring.
4. References
- InfluxDB official documentation: https://docs.influxdata.com/platform/introduction
- InfluxDB Chinese documentation: https://jasper-zhang1.gitbooks.io/influxdb/content/
- An analysis of time-series database technology: http://hbasefly.com/category/%e6%97%b6%e5%ba%8f%e6%95%b0%e6%8d%ae%e5%ba%93/
- How to write a time-series database: https://fabxc.org/tsdb/
