寫在前面
時(shí)序數(shù)據(jù)場(chǎng)景: 物聯(lián)網(wǎng)(IoT)、監(jiān)控系統(tǒng)、應(yīng)用日志、用戶行為記錄等場(chǎng)景
傳統(tǒng)方法: btree索引
- 建測(cè)試表
drop table if exists tbl;
create unlogged table tbl( -- 為了方便測(cè)試使用unlogged table
sid int, -- 傳感器ID
v1 float, -- v1維度的值
v2 float, -- v2維度的值
v3 float, -- v3維度的值
v4 float, -- v4維度的值
v5 float, -- v5維度的值
ts timestamp -- 記錄上報(bào)時(shí)間戳
);
create index on tbl using btree (sid,ts);
寫場(chǎng)景
- 壓測(cè)腳本 --
假設(shè)有1萬個(gè)傳感器
-- t1.sql
\set sid random(1,10000)
insert into tbl values (:sid,random()*100,random()*100,random()*100,random()*100,random()*100, now());
- 壓測(cè)
pgbench -M prepared -n -r -P 1 -f ./t1.sql -c 10 -j 10 -T 120
transaction type: ./t1.sql
scaling factor: 1
query mode: prepared
number of clients: 10
number of threads: 10
duration: 120 s
number of transactions actually processed: 14539863
latency average = 0.082 ms
latency stddev = 0.300 ms
initial connection time = 14.393 ms
tps = 121175.788589 (without initial connection time)
statement latencies in milliseconds:
0.000 \set sid random(1,10000)
0.082 insert into tbl values (:sid, random()*100,random()*100,random()*100,random()*100,random()*100, now());
讀場(chǎng)景
- 壓測(cè)腳本 --
假設(shè)一次讀取約1萬條數(shù)據(jù)進(jìn)行聚合
select min(ts), max(ts) from (
select * from tbl where sid=1 order by ts limit 10000
) t;
min | max
----------------------------+----------------------------
2023-09-01 06:46:33.643741 | 2023-09-01 06:48:33.542128
(1 row)
- 壓測(cè)
pgbench -M prepared -n -r -P 1 -f ./t2.sql -c 10 -j 10 -T 120
transaction type: ./t2.sql
scaling factor: 1
query mode: prepared
number of clients: 10
number of threads: 10
duration: 120 s
number of transactions actually processed: 77681
latency average = 15.446 ms
latency stddev = 6.741 ms
initial connection time = 22.071 ms
tps = 647.375584 (without initial connection time)
statement latencies in milliseconds:
0.001 \set sid random(1,10000)
15.452 select count(*), min(v1), max(v2), min(v3), max(v4), avg(v5) from tbl where sid=:sid and ts between '2023-09-01 06:46:33.643741
存儲(chǔ)占用
-
\dt+-> 1172MB -
di+-> 620MB -
count-> 14539863
新方法: BRIN索引
- 建表->
只修改索引
create index on tbl1 using brin (sid,ts);
重復(fù)實(shí)驗(yàn)步驟, 對(duì)照
| 場(chǎng)景 | 技術(shù) | 索引占用空間 | 寫入性能 | 按時(shí)間段查詢分析性能(每次分析約1萬條) |
|---|---|---|---|---|
| IOT | 傳統(tǒng)方法使用Btree索引 14539863 條記錄 | 620 MB | 121175 TPS | 647 TPS |
| IOT | 新方法使用BRIN索引 21043922 條記錄 | 120 KB | 175385 TPS | 2592 TPS |
寫在最后
- 合適的業(yè)務(wù)場(chǎng)景選擇合適的存儲(chǔ)方案
- 多進(jìn)行數(shù)據(jù)實(shí)驗(yàn), 積累技術(shù)預(yù)判力
除了 BRIN索引, 還是有更多方案:
- 根據(jù)場(chǎng)景, 基于 sid 分區(qū)
create index on tbl1 using btree (sid,ts);
cluster tbl1 using tbl1_sid_ts_idx1;
drop index tbl1_sid_ts_idx1;
- 使用 pg 的 timescale 擴(kuò)展, 阿里云支持, 也有單獨(dú)的云服務(wù)
- 使用TS類DB, 各廠商都有, 比如開源的 influxdb