時(shí)序數(shù)據(jù)庫(kù)InfluxDB使用詳解

InfluxDB是一個(gè)開源的時(shí)序數(shù)據(jù)庫(kù),使用GO語(yǔ)言開發(fā),特別適合用于處理和分析資源監(jiān)控?cái)?shù)據(jù)這種時(shí)序相關(guān)數(shù)據(jù)。而InfluxDB自帶的各種特殊函數(shù)如求標(biāo)準(zhǔn)差,隨機(jī)取樣數(shù)據(jù),統(tǒng)計(jì)數(shù)據(jù)變化比等,使數(shù)據(jù)統(tǒng)計(jì)和實(shí)時(shí)分析變得十分方便。在我們的容器資源監(jiān)控系統(tǒng)中,就采用了InfluxDB存儲(chǔ)cadvisor的監(jiān)控?cái)?shù)據(jù)。本文對(duì)InfluxDB的基本概念和一些特色功能做一個(gè)詳細(xì)介紹,內(nèi)容主要是翻譯整理自官網(wǎng)文檔,如有錯(cuò)漏,請(qǐng)指正。

1 安裝配置

這里說(shuō)一下使用docker容器運(yùn)行influxdb的步驟,物理機(jī)安裝請(qǐng)參照官方文檔。拉取鏡像文件后運(yùn)行即可,當(dāng)前最新版本是1.3.5。啟動(dòng)容器時(shí)設(shè)置掛載的數(shù)據(jù)目錄和開放端口。InfluxDB的操作語(yǔ)法InfluxQL與SQL基本一致,也提供了一個(gè)類似mysql-client的名為influx的CLI。InfluxDB本身是支持分布式部署多副本存儲(chǔ)的,本文介紹都是針對(duì)的單節(jié)點(diǎn)單副本。

# docker pull influxdb
# docker run -idt --name influxdb -p 8086:8086 -v /Users/ssj/influxdb:/var/lib/influxdb influxdb
f216e9be15bff545befecb30d1d275552026216a939cc20c042b17419e3bde31
# docker exec -it influxdb /bin/bash 
root@f216e9be15bf:/# influx
Connected to http://localhost:8086 version 1.3.5
InfluxDB shell version: 1.3.5
> create database cadvisor  ## 創(chuàng)建數(shù)據(jù)庫(kù)cadvisor
> show databases           
name: databases
name
----
_internal
cadvisor
> CREATE USER testuser WITH PASSWORD 'testpwd' ## 創(chuàng)建用戶和設(shè)置密碼
> GRANT ALL PRIVILEGES ON cadvisor TO testuser ## 授權(quán)數(shù)據(jù)庫(kù)給指定用戶
> CREATE RETENTION POLICY "cadvisor_retention" ON "cadvisor" DURATION 30d REPLICATION 1 DEFAULT ## 創(chuàng)建默認(rèn)的數(shù)據(jù)保留策略,設(shè)置保存時(shí)間30天,副本為1

2 重要概念

influxdb里面有一些重要概念:database,timestamp,field key, field value, field set,tag key,tag value,tag set,measurement, retention policy ,series,point。結(jié)合下面的例子數(shù)據(jù)來(lái)說(shuō)明這幾個(gè)概念:

name: census
-————————————
time                     butterflies     honeybees     location   scientist
2015-08-18T00:00:00Z      12                23           1         langstroth
2015-08-18T00:00:00Z      1                 30           1         perpetua
2015-08-18T00:06:00Z      11                28           1         langstroth
2015-08-18T00:06:00Z      3                 28           1         perpetua
2015-08-18T05:54:00Z      2                 11           2         langstroth
2015-08-18T06:00:00Z      1                 10           2         langstroth
2015-08-18T06:06:00Z      8                 23           2         perpetua
2015-08-18T06:12:00Z      7                 22           2         perpetua

timestamp

既然是時(shí)間序列數(shù)據(jù)庫(kù),influxdb的數(shù)據(jù)都有一列名為time的列,里面存儲(chǔ)UTC時(shí)間戳。

field key,field value,field set

butterflies和honeybees兩列數(shù)據(jù)稱為字段(fields),influxdb的字段由field key和field value組成。其中butterflies和honeybees為field key,它們?yōu)閟tring類型,用于存儲(chǔ)元數(shù)據(jù)。

而butterflies這一列的數(shù)據(jù)12-7為butterflies的field value,同理,honeybees這一列的23-22為honeybees的field value。field value可以為string,float,integer或boolean類型。field value通常都是與時(shí)間關(guān)聯(lián)的。

field key和field value對(duì)組成的集合稱之為field set。如下:

butterflies = 12 honeybees = 23
butterflies = 1 honeybees = 30
butterflies = 11 honeybees = 28
butterflies = 3 honeybees = 28
butterflies = 2 honeybees = 11
butterflies = 1 honeybees = 10
butterflies = 8 honeybees = 23
butterflies = 7 honeybees = 22

在influxdb中,字段必須存在。注意,字段是沒(méi)有索引的。如果使用字段作為查詢條件,會(huì)掃描符合查詢條件的所有字段值,性能不及tag。類比一下,fields相當(dāng)于SQL的沒(méi)有索引的列。

tag key,tag value,tag set

location和scientist這兩列稱為標(biāo)簽(tags),標(biāo)簽由tag key和tag value組成。location這個(gè)tag key有兩個(gè)tag value:1和2,scientist有兩個(gè)tag value:langstroth和perpetua。tag key和tag value對(duì)組成了tag set,示例中的tag set如下:

location = 1, scientist = langstroth
location = 2, scientist = langstroth
location = 1, scientist = perpetua
location = 2, scientist = perpetua

tags是可選的,但是強(qiáng)烈建議你用上它,因?yàn)閠ag是有索引的,tags相當(dāng)于SQL中的有索引的列。tag value只能是string類型 如果你的常用場(chǎng)景是根據(jù)butterflies和honeybees來(lái)查詢,那么你可以將這兩個(gè)列設(shè)置為tag,而其他兩列設(shè)置為field,tag和field依據(jù)具體查詢需求來(lái)定。

measurement

measurement是fields,tags以及time列的容器,measurement的名字用于描述存儲(chǔ)在其中的字段數(shù)據(jù),類似mysql的表名。如上面例子中的measurement為census。measurement相當(dāng)于SQL中的表,本文中我在部分地方會(huì)用表來(lái)指代measurement。

retention policy

retention policy指數(shù)據(jù)保留策略,示例數(shù)據(jù)中的retention policy為默認(rèn)的autogen。它表示數(shù)據(jù)一直保留永不過(guò)期,副本數(shù)量為1。你也可以指定數(shù)據(jù)的保留時(shí)間,如30天。

series

series是共享同一個(gè)retention policy,measurement以及tag set的數(shù)據(jù)集合。示例中數(shù)據(jù)有4個(gè)series,如下:

Arbitrary series number Retention policy Measurement Tag set
series 1 autogen census location = 1,scientist = langstroth
series 2 autogen census location = 2,scientist = langstroth
series 3 autogen census location = 1,scientist = perpetua
series 4 autogen census location = 2,scientist = perpetua

point

point則是同一個(gè)series中具有相同時(shí)間的field set,points相當(dāng)于SQL中的數(shù)據(jù)行。如下面就是一個(gè)point:

name: census
-----------------
time                  butterflies    honeybees   location    scientist
2015-08-18T00:00:00Z       1            30           1        perpetua

database

上面提到的結(jié)構(gòu)都存儲(chǔ)在數(shù)據(jù)庫(kù)中,示例的數(shù)據(jù)庫(kù)為my_database。一個(gè)數(shù)據(jù)庫(kù)可以有多個(gè)measurement,retention policy, continuous queries以及user。influxdb是一個(gè)無(wú)模式的數(shù)據(jù)庫(kù),可以很容易的添加新的measurement,tags,fields等。而它的操作卻和傳統(tǒng)的數(shù)據(jù)庫(kù)一樣,可以使用類SQL語(yǔ)言查詢和修改數(shù)據(jù)。

influxdb不是一個(gè)完整的CRUD數(shù)據(jù)庫(kù),它更像是一個(gè)CR-ud數(shù)據(jù)庫(kù)。它優(yōu)先考慮的是增加和讀取數(shù)據(jù)而不是更新和刪除數(shù)據(jù)的性能,而且它阻止了某些更新和刪除行為使得創(chuàng)建和讀取數(shù)據(jù)更加高效。

3 特色函數(shù)

influxdb函數(shù)分為聚合函數(shù),選擇函數(shù),轉(zhuǎn)換函數(shù),預(yù)測(cè)函數(shù)等。除了與普通數(shù)據(jù)庫(kù)一樣提供了基本操作函數(shù)外,還提供了一些特色函數(shù)以方便數(shù)據(jù)統(tǒng)計(jì)計(jì)算,下面會(huì)一一介紹其中一些常用的特色函數(shù)。

  • 聚合函數(shù):FILL(), INTEGRAL()SPREAD(), STDDEV(),MEAN(), MEDIAN()等。
  • 選擇函數(shù): SAMPLE(), PERCENTILE(), FIRST(), LAST(), TOP(), BOTTOM()等。
  • 轉(zhuǎn)換函數(shù): DERIVATIVE(), DIFFERENCE()等。
  • 預(yù)測(cè)函數(shù):HOLT_WINTERS()。

先從官網(wǎng)導(dǎo)入測(cè)試數(shù)據(jù)(注:這里測(cè)試用的版本是1.3.1,最新版本是1.3.5):

$ curl https://s3.amazonaws.com/noaa.water-database/NOAA_data.txt -o NOAA_data.txt
$ influx -import -path=NOAA_data.txt -precision=s -database=NOAA_water_database
$ influx -precision rfc3339 -database NOAA_water_database
Connected to http://localhost:8086 version 1.3.1
InfluxDB shell 1.3.1
> show measurements
name: measurements
name
----
average_temperature
distincts
h2o_feet
h2o_pH
h2o_quality
h2o_temperature

> show series from h2o_feet;
key
---
h2o_feet,location=coyote_creek
h2o_feet,location=santa_monica

下面的例子都以官方示例數(shù)據(jù)庫(kù)來(lái)測(cè)試,這里只用部分?jǐn)?shù)據(jù)以方便觀察。measurement為h2o_feet,tag key為location,field key有level descriptionwater_level兩個(gè)。

> SELECT * FROM "h2o_feet" WHERE time >= '2015-08-17T23:48:00Z' AND time <= '2015-08-18T00:30:00Z'
name: h2o_feet
time                 level description    location     water_level
----                 -----------------    --------     -----------
2015-08-18T00:00:00Z between 6 and 9 feet coyote_creek 8.12
2015-08-18T00:00:00Z below 3 feet         santa_monica 2.064
2015-08-18T00:06:00Z between 6 and 9 feet coyote_creek 8.005
2015-08-18T00:06:00Z below 3 feet         santa_monica 2.116
2015-08-18T00:12:00Z between 6 and 9 feet coyote_creek 7.887
2015-08-18T00:12:00Z below 3 feet         santa_monica 2.028
2015-08-18T00:18:00Z between 6 and 9 feet coyote_creek 7.762
2015-08-18T00:18:00Z below 3 feet         santa_monica 2.126
2015-08-18T00:24:00Z between 6 and 9 feet coyote_creek 7.635
2015-08-18T00:24:00Z below 3 feet         santa_monica 2.041
2015-08-18T00:30:00Z between 6 and 9 feet coyote_creek 7.5
2015-08-18T00:30:00Z below 3 feet         santa_monica 2.051

GROUP BY,F(xiàn)ILL()

如下語(yǔ)句中GROUP BY time(12m),* 表示以每12分鐘和tag(location)分組(如果是GROUP BY time(12m)則表示僅每12分鐘分組,GROUP BY 參數(shù)只能是time和tag)。然后fill(200)表示如果這個(gè)時(shí)間段沒(méi)有數(shù)據(jù),以200填充,mean(field_key)求該范圍內(nèi)數(shù)據(jù)的平均值(注意:這是依據(jù)series來(lái)計(jì)算。其他還有SUM求和,MEDIAN求中位數(shù))。LIMIT 7表示限制返回的point(記錄數(shù))最多為7條,而SLIMIT 1則是限制返回的series為1個(gè)。

注意這里的時(shí)間區(qū)間,起始時(shí)間為整點(diǎn)前包含這個(gè)區(qū)間第一個(gè)12m的時(shí)間,比如這里為 2015-08-17T:23:48:00Z,第一條為 2015-08-17T23:48:00Z <= t < 2015-08-18T00:00:00Z這個(gè)區(qū)間的location=coyote_creekwater_level的平均值,這里沒(méi)有數(shù)據(jù),于是填充的200。第二條為 2015-08-18T00:00:00Z <= t < 2015-08-18T00:12:00Z區(qū)間的location=coyote_creekwater_level平均值,這里為 (8.12+8.005)/ 2 = 8.0625,其他以此類推。

GROUP BY time(10m)則表示以10分鐘分組,起始時(shí)間為包含這個(gè)區(qū)間的第一個(gè)10m的時(shí)間,即 2015-08-17T23:40:00Z。默認(rèn)返回的是第一個(gè)series,如果要計(jì)算另外那個(gè)series,可以在SQL語(yǔ)句后面加上 SOFFSET 1

那如果時(shí)間小于數(shù)據(jù)本身采集的時(shí)間間隔呢,比如GROUP BY time(10s)呢?這樣的話,就會(huì)按10s取一個(gè)點(diǎn),沒(méi)有數(shù)值的為空或者FILL填充,對(duì)應(yīng)時(shí)間點(diǎn)有數(shù)據(jù)則保持不變。

## GROUP BY time(12m)
> SELECT mean("water_level") FROM "h2o_feet" WHERE time >= '2015-08-17T23:48:00Z' AND time <= '2015-08-18T00:30:00Z' GROUP BY time(12m),* fill(200) LIMIT 7 SLIMIT 1
name: h2o_feet
tags: location=coyote_creek
time                 mean
----                 ----
2015-08-17T23:48:00Z 200
2015-08-18T00:00:00Z 8.0625
2015-08-18T00:12:00Z 7.8245
2015-08-18T00:24:00Z 7.5675

## GROUP BY time(10m),SOFFSET設(shè)置為1
> SELECT mean("water_level") FROM "h2o_feet" WHERE time >= '2015-08-17T23:48:00Z' AND time <= '2015-08-18T00:30:00Z' GROUP BY time(10m),* fill(200) LIMIT 7 SLIMIT 1 SOFFSET 1
name: h2o_feet
tags: location=santa_monica
time                 mean
----                 ----
2015-08-17T23:40:00Z 200
2015-08-17T23:50:00Z 200
2015-08-18T00:00:00Z 2.09
2015-08-18T00:10:00Z 2.077
2015-08-18T00:20:00Z 2.041
2015-08-18T00:30:00Z 2.051

INTEGRAL(field_key, unit)

計(jì)算數(shù)值字段值覆蓋的曲面的面積值并得到面積之和。測(cè)試數(shù)據(jù)如下:

> SELECT "water_level" FROM "h2o_feet" WHERE "location" = 'santa_monica' AND time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:30:00Z'

name: h2o_feet
time                   water_level
----                   -----------
2015-08-18T00:00:00Z   2.064
2015-08-18T00:06:00Z   2.116
2015-08-18T00:12:00Z   2.028
2015-08-18T00:18:00Z   2.126
2015-08-18T00:24:00Z   2.041
2015-08-18T00:30:00Z   2.051

使用INTERGRAL計(jì)算面積。注意,這個(gè)面積就是這些點(diǎn)連接起來(lái)后與時(shí)間圍成的不規(guī)則圖形的面積,注意unit默認(rèn)是以1秒計(jì)算,所以下面語(yǔ)句計(jì)算結(jié)果為3732.66=2.028*1800+分割出來(lái)的梯形和三角形面積。如果unit改為1分,則結(jié)果為3732.66/60 = 62.211。unit為2分,則結(jié)果為3732.66/120 = 31.1055。以此類推。

# unit為默認(rèn)的1秒
> SELECT INTEGRAL("water_level") FROM "h2o_feet" WHERE "location" = 'santa_monica' AND time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:30:00Z'
name: h2o_feet
time                 integral
----                 --------
1970-01-01T00:00:00Z 3732.66

# unit為1分
> SELECT INTEGRAL("water_level", 1m) FROM "h2o_feet" WHERE "location" = 'santa_monica' AND time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:30:00Z'
name: h2o_feet
time                 integral
----                 --------
1970-01-01T00:00:00Z 62.211

SPREAD(field_key)

計(jì)算數(shù)值字段的最大值和最小值的差值。

> SELECT SPREAD("water_level") FROM "h2o_feet" WHERE time >= '2015-08-17T23:48:00Z' AND time <= '2015-08-18T00:30:00Z' GROUP BY time(12m),* fill(18) LIMIT 3 SLIMIT 1 SOFFSET 1
name: h2o_feet
tags: location=santa_monica
time                 spread
----                 ------
2015-08-17T23:48:00Z 18
2015-08-18T00:00:00Z 0.052000000000000046
2015-08-18T00:12:00Z 0.09799999999999986

STDDEV(field_key)

計(jì)算字段的標(biāo)準(zhǔn)差。influxdb用的是貝塞爾修正的標(biāo)準(zhǔn)差計(jì)算公式 ,如下:

  • mean=(v1+v2+...+vn)/n;
  • stddev = math.sqrt(
    ((v1-mean)2 + (v2-mean)2 + ...+(vn-mean)2)/(n-1)
    )
> SELECT STDDEV("water_level") FROM "h2o_feet" WHERE time >= '2015-08-17T23:48:00Z' AND time <= '2015-08-18T00:30:00Z' GROUP BY time(12m),* fill(18) SLIMIT 1;
name: h2o_feet
tags: location=coyote_creek
time                 stddev
----                 ------
2015-08-17T23:48:00Z 18
2015-08-18T00:00:00Z 0.08131727983645186
2015-08-18T00:12:00Z 0.08838834764831845
2015-08-18T00:24:00Z 0.09545941546018377

PERCENTILE(field_key, N)

選取某個(gè)字段中大于N%的這個(gè)字段值。

如果一共有4條記錄,N為10,則10%*4=0.4,四舍五入為0,則查詢結(jié)果為空。N為20,則 20% * 4 = 0.8,四舍五入為1,選取的是4個(gè)數(shù)中最小的數(shù)。如果N為40,40% * 4 = 1.6,四舍五入為2,則選取的是4個(gè)數(shù)中第二小的數(shù)。由此可以看出N=100時(shí),就跟MAX(field_key)是一樣的,而當(dāng)N=50時(shí),與MEDIAN(field_key)在字段值為奇數(shù)個(gè)時(shí)是一樣的。

> SELECT PERCENTILE("water_level",20) FROM "h2o_feet" WHERE time >= '2015-08-17T23:48:00Z' AND time <= '2015-08-18T00:30:00Z' GROUP BY time(12m)
name: h2o_feet
time                 percentile
----                 ----------
2015-08-17T23:48:00Z 
2015-08-18T00:00:00Z 2.064
2015-08-18T00:12:00Z 2.028
2015-08-18T00:24:00Z 2.041

> SELECT PERCENTILE("water_level",40) FROM "h2o_feet" WHERE time >= '2015-08-17T23:48:00Z' AND time <= '2015-08-18T00:30:00Z' GROUP BY time(12m)
name: h2o_feet
time                 percentile
----                 ----------
2015-08-17T23:48:00Z 
2015-08-18T00:00:00Z 2.116
2015-08-18T00:12:00Z 2.126
2015-08-18T00:24:00Z 2.051

SAMPLE(field_key, N)

隨機(jī)返回field key的N個(gè)值。如果語(yǔ)句中有GROUP BY time(),則每組數(shù)據(jù)隨機(jī)返回N個(gè)值。

> SELECT SAMPLE("water_level",2) FROM "h2o_feet" WHERE time >= '2015-08-17T23:48:00Z' AND time <= '2015-08-18T00:30:00Z';
name: h2o_feet
time                 sample
----                 ------
2015-08-18T00:00:00Z 2.064
2015-08-18T00:12:00Z 2.028

> SELECT SAMPLE("water_level",2) FROM "h2o_feet" WHERE time >= '2015-08-17T23:48:00Z' AND time <= '2015-08-18T00:30:00Z' GROUP BY time(12m);
name: h2o_feet
time                 sample
----                 ------
2015-08-18T00:06:00Z 2.116
2015-08-18T00:06:00Z 8.005
2015-08-18T00:12:00Z 7.887
2015-08-18T00:18:00Z 7.762
2015-08-18T00:24:00Z 7.635
2015-08-18T00:30:00Z 2.051

CUMULATIVE_SUM(field_key)

計(jì)算字段值的遞增和。

> SELECT CUMULATIVE_SUM("water_level") FROM "h2o_feet" WHERE time >= '2015-08-17T23:48:00Z' AND time <= '2015-08-18T00:30:00Z';
name: h2o_feet
time                 cumulative_sum
----                 --------------
2015-08-18T00:00:00Z 8.12
2015-08-18T00:00:00Z 10.184
2015-08-18T00:06:00Z 18.189
2015-08-18T00:06:00Z 20.305
2015-08-18T00:12:00Z 28.192
2015-08-18T00:12:00Z 30.22
2015-08-18T00:18:00Z 37.982
2015-08-18T00:18:00Z 40.108
2015-08-18T00:24:00Z 47.742999999999995
2015-08-18T00:24:00Z 49.78399999999999
2015-08-18T00:30:00Z 57.28399999999999
2015-08-18T00:30:00Z 59.334999999999994

DERIVATIVE(field_key, unit) 和 NON_NEGATIVE_DERIVATIVE(field_key, unit)

計(jì)算字段值的變化比。unit默認(rèn)為1s,即計(jì)算的是1秒內(nèi)的變化比。

如下面的第一個(gè)數(shù)據(jù)計(jì)算方法是 (2.116-2.064)/(6*60) = 0.00014..,其他計(jì)算方式同理。雖然原始數(shù)據(jù)是6m收集一次,但是這里的變化比默認(rèn)是按秒來(lái)計(jì)算的。如果要按6m計(jì)算,則設(shè)置unit為6m即可。

> SELECT DERIVATIVE("water_level") FROM "h2o_feet" WHERE "location" = 'santa_monica' AND time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:30:00Z'
name: h2o_feet
time                 derivative
----                 ----------
2015-08-18T00:06:00Z 0.00014444444444444457
2015-08-18T00:12:00Z -0.00024444444444444465
2015-08-18T00:18:00Z 0.0002722222222222218
2015-08-18T00:24:00Z -0.000236111111111111
2015-08-18T00:30:00Z 0.00002777777777777842

> SELECT DERIVATIVE("water_level", 6m) FROM "h2o_feet" WHERE "location" = 'santa_monica' AND time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:30:00Z'
name: h2o_feet
time                 derivative
----                 ----------
2015-08-18T00:06:00Z 0.052000000000000046
2015-08-18T00:12:00Z -0.08800000000000008
2015-08-18T00:18:00Z 0.09799999999999986
2015-08-18T00:24:00Z -0.08499999999999996
2015-08-18T00:30:00Z 0.010000000000000231

而DERIVATIVE結(jié)合GROUP BY time,以及mean可以構(gòu)造更加復(fù)雜的查詢,如下所示:

> SELECT DERIVATIVE(mean("water_level"), 6m) FROM "h2o_feet" WHERE time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:30:00Z' group by time(12m), *
name: h2o_feet
tags: location=coyote_creek
time                 derivative
----                 ----------
2015-08-18T00:12:00Z -0.11900000000000022
2015-08-18T00:24:00Z -0.12849999999999984

name: h2o_feet
tags: location=santa_monica
time                 derivative
----                 ----------
2015-08-18T00:12:00Z -0.00649999999999995
2015-08-18T00:24:00Z -0.015499999999999847

這個(gè)計(jì)算其實(shí)是先根據(jù)GROUP BY time求平均值,然后對(duì)這個(gè)平均值再做變化比的計(jì)算。因?yàn)閿?shù)據(jù)是按12分鐘分組的,而變化比的unit是6分鐘,所以差值除以2(12/6)才得到變化比。如第一個(gè)值是 (7.8245-8.0625)/2 = -0.1190。

> SELECT mean("water_level") FROM "h2o_feet" WHERE time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:30:00Z' group by time(12m), *
name: h2o_feet
tags: location=coyote_creek
time                 mean
----                 ----
2015-08-18T00:00:00Z 8.0625
2015-08-18T00:12:00Z 7.8245
2015-08-18T00:24:00Z 7.5675

name: h2o_feet
tags: location=santa_monica
time                 mean
----                 ----
2015-08-18T00:00:00Z 2.09
2015-08-18T00:12:00Z 2.077
2015-08-18T00:24:00Z 2.0460000000000003

NON_NEGATIVE_DERIVATIVEDERIVATIVE不同的是它只返回的是非負(fù)的變化比:

> SELECT DERIVATIVE(mean("water_level"), 6m) FROM "h2o_feet" WHERE location='santa_monica' AND time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:30:00Z' group by time(6m), *
name: h2o_feet
tags: location=santa_monica
time                 derivative
----                 ----------
2015-08-18T00:06:00Z 0.052000000000000046
2015-08-18T00:12:00Z -0.08800000000000008
2015-08-18T00:18:00Z 0.09799999999999986
2015-08-18T00:24:00Z -0.08499999999999996
2015-08-18T00:30:00Z 0.010000000000000231

> SELECT NON_NEGATIVE_DERIVATIVE(mean("water_level"), 6m) FROM "h2o_feet" WHERE location='santa_monica' AND time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:30:00Z' group by time(6m), *
name: h2o_feet
tags: location=santa_monica
time                 non_negative_derivative
----                 -----------------------
2015-08-18T00:06:00Z 0.052000000000000046
2015-08-18T00:18:00Z 0.09799999999999986
2015-08-18T00:30:00Z 0.010000000000000231

4 連續(xù)查詢

4.1 基本語(yǔ)法

連續(xù)查詢(CONTINUOUS QUERY,簡(jiǎn)寫為CQ)是指定時(shí)自動(dòng)在實(shí)時(shí)數(shù)據(jù)上進(jìn)行的InfluxQL查詢,查詢結(jié)果可以存儲(chǔ)到指定的measurement中?;菊Z(yǔ)法格式如下:

CREATE CONTINUOUS QUERY <cq_name> ON <database_name>
BEGIN
  <cq_query>
END

cq_query格式:
SELECT <function[s]> INTO <destination_measurement> FROM <measurement> [WHERE <stuff>] GROUP BY time(<interval>)[,<tag_key[s]>]

CQ操作的是實(shí)時(shí)數(shù)據(jù),它使用本地服務(wù)器的時(shí)間戳、GROUP BY time()時(shí)間間隔以及InfluxDB預(yù)先設(shè)置好的時(shí)間范圍來(lái)確定什么時(shí)候開始查詢以及查詢覆蓋的時(shí)間范圍。注意CQ語(yǔ)句里面的WHERE條件是沒(méi)有時(shí)間范圍的,因?yàn)镃Q會(huì)根據(jù)GROUP BY time()自動(dòng)確定時(shí)間范圍。

CQ執(zhí)行的時(shí)間間隔和GROUP BY time()的時(shí)間間隔一樣,它在InfluxDB預(yù)先設(shè)置的時(shí)間范圍的起始時(shí)刻執(zhí)行。如果GROUP BY time(1h),則單次查詢的時(shí)間范圍為 now()-GROUP BY time(1h)now(),也就是說(shuō),如果當(dāng)前時(shí)間為17點(diǎn),這次查詢的時(shí)間范圍為 16:00到16:59.99999。

下面看幾個(gè)示例,示例數(shù)據(jù)如下,這是數(shù)據(jù)庫(kù)transportation中名為bus_data的measurement,每15分鐘統(tǒng)計(jì)一次乘客數(shù)和投訴數(shù)。數(shù)據(jù)文件bus_data.txt如下:

# DDL
CREATE DATABASE transportation

# DML
# CONTEXT-DATABASE: transportation 

bus_data,complaints=9 passengers=5 1472367600
bus_data,complaints=9 passengers=8 1472368500
bus_data,complaints=9 passengers=8 1472369400
bus_data,complaints=9 passengers=7 1472370300
bus_data,complaints=9 passengers=8 1472371200
bus_data,complaints=7 passengers=15 1472372100
bus_data,complaints=7 passengers=15 1472373000
bus_data,complaints=7 passengers=17 1472373900
bus_data,complaints=7 passengers=20 1472374800

導(dǎo)入數(shù)據(jù),命令如下:

root@f216e9be15bf:/# influx -import -path=bus_data.txt -precision=s
root@f216e9be15bf:/# influx -precision=rfc3339 -database=transportation
Connected to http://localhost:8086 version 1.3.5
InfluxDB shell version: 1.3.5
> select * from bus_data
name: bus_data
time                 complaints passengers
----                 ---------- ----------
2016-08-28T07:00:00Z 9          5
2016-08-28T07:15:00Z 9          8
2016-08-28T07:30:00Z 9          8
2016-08-28T07:45:00Z 9          7
2016-08-28T08:00:00Z 9          8
2016-08-28T08:15:00Z 7          15
2016-08-28T08:30:00Z 7          15
2016-08-28T08:45:00Z 7          17
2016-08-28T09:00:00Z 7          20

示例1 自動(dòng)縮小取樣存儲(chǔ)到新的measurement中

對(duì)單個(gè)字段自動(dòng)縮小取樣并存儲(chǔ)到新的measurement中。

CREATE CONTINUOUS QUERY "cq_basic" ON "transportation"
BEGIN
  SELECT mean("passengers") INTO "average_passengers" FROM "bus_data" GROUP BY time(1h)
END

這個(gè)CQ的意思就是對(duì)bus_data每小時(shí)自動(dòng)計(jì)算取樣數(shù)據(jù)的平均乘客數(shù)并存儲(chǔ)到 average_passengers中。那么在2016-08-28這天早上會(huì)執(zhí)行如下流程:

At 8:00 cq_basic 執(zhí)行查詢,查詢時(shí)間范圍 time >= '7:00' AND time < '08:00'.
cq_basic寫入一條記錄到 average_passengers:
name: average_passengers
------------------------
time                   mean
2016-08-28T07:00:00Z   7
At 9:00 cq_basic 執(zhí)行查詢,查詢時(shí)間范圍 time >= '8:00' AND time < '9:00'.
cq_basic寫入一條記錄到 average_passengers:
name: average_passengers
------------------------
time                   mean
2016-08-28T08:00:00Z   13.75

# Results
> SELECT * FROM "average_passengers"
name: average_passengers
------------------------
time                   mean
2016-08-28T07:00:00Z   7
2016-08-28T08:00:00Z   13.75

示例2 自動(dòng)縮小取樣并存儲(chǔ)到新的保留策略(Retention Policy)中

CREATE CONTINUOUS QUERY "cq_basic_rp" ON "transportation"
BEGIN
  SELECT mean("passengers") INTO "transportation"."three_weeks"."average_passengers" FROM "bus_data" GROUP BY time(1h)
END

與示例1類似,不同的是保留的策略不是autogen,而是改成了three_weeks(創(chuàng)建保留策略語(yǔ)法 CREATE RETENTION POLICY "three_weeks" ON "transportation" DURATION 3w REPLICATION 1)。

> SELECT * FROM "transportation"."three_weeks"."average_passengers"
name: average_passengers
------------------------
time                   mean
2016-08-28T07:00:00Z   7
2016-08-28T08:00:00Z   13.75

示例3 使用后向引用(backreferencing)自動(dòng)縮小取樣并存儲(chǔ)到新的數(shù)據(jù)庫(kù)中

CREATE CONTINUOUS QUERY "cq_basic_br" ON "transportation"
BEGIN
  SELECT mean(*) INTO "downsampled_transportation"."autogen".:MEASUREMENT FROM /.*/ GROUP BY time(30m),*
END

使用后向引用語(yǔ)法自動(dòng)縮小取樣并存儲(chǔ)到新的數(shù)據(jù)庫(kù)中。語(yǔ)法 :MEASUREMENT 用來(lái)指代后面的表,而 /.*/則是分別查詢所有的表。這句CQ的含義就是每30分鐘自動(dòng)查詢transportation的所有表(這里只有bus_data一個(gè)表),并將30分鐘內(nèi)數(shù)字字段(passengers和complaints)求平均值存儲(chǔ)到新的數(shù)據(jù)庫(kù) downsampled_transportation中。

最終結(jié)果如下:

> SELECT * FROM "downsampled_transportation."autogen"."bus_data"
name: bus_data
--------------
time                   mean_complaints   mean_passengers
2016-08-28T07:00:00Z   9                 6.5
2016-08-28T07:30:00Z   9                 7.5
2016-08-28T08:00:00Z   8                 11.5
2016-08-28T08:30:00Z   7                 16

示例4 自動(dòng)縮小取樣以及配置CQ的時(shí)間范圍

CREATE CONTINUOUS QUERY "cq_basic_offset" ON "transportation"
BEGIN
  SELECT mean("passengers") INTO "average_passengers" FROM "bus_data" GROUP BY time(1h,15m)
END

與前面幾個(gè)示例不同的是,這里的GROUP BY time(1h, 15m)指定了一個(gè)時(shí)間偏移,也就是說(shuō) cq_basic_offset執(zhí)行的時(shí)間不再是整點(diǎn),而是往后偏移15分鐘。執(zhí)行流程如下:

At 8:15 cq_basic_offset 執(zhí)行查詢的時(shí)間范圍 time >= '7:15' AND time < '8:15'.
name: average_passengers
------------------------
time                   mean
2016-08-28T07:15:00Z   7.75
At 9:15 cq_basic_offset 執(zhí)行查詢的時(shí)間范圍 time >= '8:15' AND time < '9:15'.
name: average_passengers
------------------------
time                   mean
2016-08-28T08:15:00Z   16.75

最終結(jié)果:

> SELECT * FROM "average_passengers"
name: average_passengers
------------------------
time                   mean
2016-08-28T07:15:00Z   7.75
2016-08-28T08:15:00Z   16.75

4.2 高級(jí)語(yǔ)法

InfluxDB連續(xù)查詢的高級(jí)語(yǔ)法如下:

CREATE CONTINUOUS QUERY <cq_name> ON <database_name>
RESAMPLE EVERY <interval> FOR <interval>
BEGIN
  <cq_query>
END

與基本語(yǔ)法不同的是,多了RESAMPLE關(guān)鍵字。高級(jí)語(yǔ)法里CQ的執(zhí)行時(shí)間和查詢時(shí)間范圍則與RESAMPLE里面的兩個(gè)interval有關(guān)系。

高級(jí)語(yǔ)法中CQ以EVERY interval的時(shí)間間隔執(zhí)行,執(zhí)行時(shí)查詢的時(shí)間范圍則是FOR interval來(lái)確定。如果FOR interval為2h,當(dāng)前時(shí)間為17:00,則查詢的時(shí)間范圍為15:00-16:59.999999。RESAMPLE的EVERY和FOR兩個(gè)關(guān)鍵字可以只有一個(gè)。

示例的數(shù)據(jù)表如下,比之前的多了幾條記錄為了示例3和示例4的測(cè)試:

name: bus_data
--------------
time                   passengers
2016-08-28T06:30:00Z   2
2016-08-28T06:45:00Z   4
2016-08-28T07:00:00Z   5
2016-08-28T07:15:00Z   8
2016-08-28T07:30:00Z   8
2016-08-28T07:45:00Z   7
2016-08-28T08:00:00Z   8
2016-08-28T08:15:00Z   15
2016-08-28T08:30:00Z   15
2016-08-28T08:45:00Z   17
2016-08-28T09:00:00Z   20

示例1 只配置執(zhí)行時(shí)間間隔

CREATE CONTINUOUS QUERY "cq_advanced_every" ON "transportation"
RESAMPLE EVERY 30m
BEGIN
  SELECT mean("passengers") INTO "average_passengers" FROM "bus_data" GROUP BY time(1h)
END

這里配置了30分鐘執(zhí)行一次CQ,沒(méi)有指定FOR interval,于是查詢的時(shí)間范圍還是GROUP BY time(1h)指定的一個(gè)小時(shí),執(zhí)行流程如下:

At 8:00, cq_advanced_every 執(zhí)行時(shí)間范圍 time >= '7:00' AND time < '8:00'.
name: average_passengers
------------------------
time                   mean
2016-08-28T07:00:00Z   7
At 8:30, cq_advanced_every 執(zhí)行時(shí)間范圍 time >= '8:00' AND time < '9:00'.
name: average_passengers
------------------------
time                   mean
2016-08-28T08:00:00Z   12.6667
At 9:00, cq_advanced_every 執(zhí)行時(shí)間范圍 time >= '8:00' AND time < '9:00'.
name: average_passengers
------------------------
time                   mean
2016-08-28T08:00:00Z   13.75

需要注意的是,這里的 8點(diǎn)到9點(diǎn)這個(gè)區(qū)間執(zhí)行了兩次,第一次執(zhí)行時(shí)時(shí)8:30,平均值是 (8+15+15)/ 3 = 12.6667,而第二次執(zhí)行時(shí)間是9:00,平均值是 (8+15+15+17) / 4=13.75,而且最后第二個(gè)結(jié)果覆蓋了第一個(gè)結(jié)果。InfluxDB如何處理重復(fù)的記錄可以參見(jiàn)這個(gè)文檔。

最終結(jié)果:

> SELECT * FROM "average_passengers"
name: average_passengers
------------------------
time                   mean
2016-08-28T07:00:00Z   7
2016-08-28T08:00:00Z   13.75

示例2 只配置查詢時(shí)間范圍

CREATE CONTINUOUS QUERY "cq_advanced_for" ON "transportation"
RESAMPLE FOR 1h
BEGIN
  SELECT mean("passengers") INTO "average_passengers" FROM "bus_data" GROUP BY time(30m)
END

只配置了時(shí)間范圍,而沒(méi)有配置EVERY interval。這樣,執(zhí)行的時(shí)間間隔與GROUP BY time(30m)一樣為30分鐘,而查詢的時(shí)間范圍為1小時(shí),由于是按30分鐘分組,所以每次會(huì)寫入兩條記錄。執(zhí)行流程如下:

At 8:00 cq_advanced_for 查詢時(shí)間范圍:time >= '7:00' AND time < '8:00'.
寫入兩條記錄。
name: average_passengers
------------------------
time                   mean
2016-08-28T07:00:00Z   6.5
2016-08-28T07:30:00Z   7.5
At 8:30 cq_advanced_for 查詢時(shí)間范圍:time >= '7:30' AND time < '8:30'.
寫入兩條記錄。
name: average_passengers
------------------------
time                   mean
2016-08-28T07:30:00Z   7.5
2016-08-28T08:00:00Z   11.5
At 9:00 cq_advanced_for 查詢時(shí)間范圍:time >= '8:00' AND time < '9:00'.
寫入兩條記錄。
name: average_passengers
------------------------
time                   mean
2016-08-28T08:00:00Z   11.5
2016-08-28T08:30:00Z   16

需要注意的是,cq_advanced_for每次寫入了兩條記錄,重復(fù)的記錄會(huì)被覆蓋。

最終結(jié)果:

> SELECT * FROM "average_passengers"
name: average_passengers
------------------------
time                   mean
2016-08-28T07:00:00Z   6.5
2016-08-28T07:30:00Z   7.5
2016-08-28T08:00:00Z   11.5
2016-08-28T08:30:00Z   16

示例3 同時(shí)配置執(zhí)行時(shí)間間隔和查詢時(shí)間范圍

CREATE CONTINUOUS QUERY "cq_advanced_every_for" ON "transportation"
RESAMPLE EVERY 1h FOR 90m
BEGIN
  SELECT mean("passengers") INTO "average_passengers" FROM "bus_data" GROUP BY time(30m)
END

這里配置了執(zhí)行間隔為1小時(shí),而查詢范圍90分鐘,最后分組是30分鐘,每次插入了三條記錄。執(zhí)行流程如下:

At 8:00 cq_advanced_every_for 查詢時(shí)間范圍 time >= '6:30' AND time < '8:00'.
插入三條記錄
name: average_passengers
------------------------
time                   mean
2016-08-28T06:30:00Z   3
2016-08-28T07:00:00Z   6.5
2016-08-28T07:30:00Z   7.5
At 9:00 cq_advanced_every_for 查詢時(shí)間范圍 time >= '7:30' AND time < '9:00'.
插入三條記錄
name: average_passengers
------------------------
time                   mean
2016-08-28T07:30:00Z   7.5
2016-08-28T08:00:00Z   11.5
2016-08-28T08:30:00Z   16

最終結(jié)果:

> SELECT * FROM "average_passengers"
name: average_passengers
------------------------
time                   mean
2016-08-28T06:30:00Z   3
2016-08-28T07:00:00Z   6.5
2016-08-28T07:30:00Z   7.5
2016-08-28T08:00:00Z   11.5
2016-08-28T08:30:00Z   16

示例4 配置查詢時(shí)間范圍和FILL填充

CREATE CONTINUOUS QUERY "cq_advanced_for_fill" ON "transportation"
RESAMPLE FOR 2h
BEGIN
  SELECT mean("passengers") INTO "average_passengers" FROM "bus_data" GROUP BY time(1h) fill(1000)
END

在前面值配置查詢時(shí)間范圍的基礎(chǔ)上,加上FILL填充空的記錄。執(zhí)行流程如下:

At 6:00, cq_advanced_for_fill 查詢時(shí)間范圍:time >= '4:00' AND time < '6:00',沒(méi)有數(shù)據(jù),不填充。

At 7:00, cq_advanced_for_fill 查詢時(shí)間范圍:time >= '5:00' AND time < '7:00'. 寫入兩條記錄,沒(méi)有數(shù)據(jù)的時(shí)間點(diǎn)填充1000。
------------------------
time                   mean
2016-08-28T05:00:00Z   1000          <------ fill(1000)
2016-08-28T06:00:00Z   3             <------ average of 2 and 4

[…] At 11:00, cq_advanced_for_fill 查詢時(shí)間范圍:time >= '9:00' AND time < '11:00'.寫入兩條記錄,沒(méi)有數(shù)據(jù)的點(diǎn)填充1000。
name: average_passengers
------------------------
2016-08-28T09:00:00Z   20            <------ average of 20
2016-08-28T10:00:00Z   1000          <------ fill(1000)     

At 12:00, cq_advanced_for_fill 查詢時(shí)間范圍:time >= '10:00' AND time < '12:00'。沒(méi)有數(shù)據(jù),不填充。

最終結(jié)果:

> SELECT * FROM "average_passengers"
name: average_passengers
------------------------
time                   mean
2016-08-28T05:00:00Z   1000
2016-08-28T06:00:00Z   3
2016-08-28T07:00:00Z   7
2016-08-28T08:00:00Z   13.75
2016-08-28T09:00:00Z   20
2016-08-28T10:00:00Z   1000

5 參考資料

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

  • 什么是InfluxDB? InfluxDB介紹 InfluxDB是一款用Go語(yǔ)言編寫的開源分布式時(shí)序、事件和指標(biāo)數(shù)...
    BilyLuo閱讀 3,391評(píng)論 0 18
  • 隨著線上服務(wù)的全面docker化,對(duì)docker容器的監(jiān)控就很重要了。SA的監(jiān)控系統(tǒng)是物理機(jī)的監(jiān)控,在一個(gè)物理機(jī)跑...
    __七把刀__閱讀 14,359評(píng)論 3 22
  • Spring Cloud為開發(fā)人員提供了快速構(gòu)建分布式系統(tǒng)中一些常見(jiàn)模式的工具(例如配置管理,服務(wù)發(fā)現(xiàn),斷路器,智...
    卡卡羅2017閱讀 136,694評(píng)論 19 139
  • 早上7點(diǎn)鐘,我們打車來(lái)到了廈門市兒童醫(yī)院。掛了急診號(hào)交了錢,護(hù)士給鄭兒量了體溫39度,吃了退燒藥。 等待叫號(hào)就診,...
    簡(jiǎn)寧思靜閱讀 242評(píng)論 0 0
  • 今天的課程非常重要 講的是高手成交法則 蘇格拉底成交法和顧問(wèn)式成交法 想想以前做元器件產(chǎn)品經(jīng)理時(shí) 也用到過(guò)一部分 ...
    秋秋_d241閱讀 227評(píng)論 0 0

友情鏈接更多精彩內(nèi)容