I have recently been evaluating options for a big data monitoring platform and trying out several open-source solutions. Today I'll walk through the end-to-end deployment of a monitoring platform built on Telegraf + InfluxDB + Grafana. The article begins with a brief introduction to the TICK stack and then covers installing and deploying each component used here. I hope it helps anyone who is evaluating big data monitoring platforms or is interested in monitoring systems.
Data produced by a monitoring platform like this is typically time-series data, so it is best stored in a time-series database. Mainstream options today include InfluxDB, OpenTSDB, Graphite, and TimescaleDB. Of these, InfluxDB is widely used in the monitoring space, and a complete open-source solution is built around it: the TICK Stack, shown in the figure below:

TICK Stack is an integrated solution from InfluxData that covers collection, storage, visualization, and monitoring/alerting. It consists of four core components:
- Telegraf: Time-Series Data Collector
- InfluxDB: Time-Series Data Storage
- Chronograf: Time-Series Data Visualization
- Kapacitor: Time-Series Data Processing
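Here we use Telegraf and InfluxDB from the TICK Stack together with Grafana, another popular data visualization component (the Telegraf + InfluxDB + Grafana combination mentioned above), to monitor the basic metrics of our big data platform, including but not limited to CPU, memory, network, disk, and disk I/O. The sections below cover the installation and deployment of each component.
I. InfluxDB
InfluxDB is one of the most widely used open-source time-series databases for IoT monitoring, DevOps monitoring, and similar areas, and it is the core component of the TICK Stack.
Advantages: written in Go, with no third-party dependencies.
1 Install InfluxDB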
# wget https://dl.influxdata.com/influxdb/releases/influxdb-1.7.7.x86_64.rpm
# yum install -y influxdb-1.7.7.x86_64.rpm
2 Start InfluxDB
# systemctl start influxdb
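Before moving on, it can help to confirm that the service is actually accepting connections. The sketch below polls the InfluxDB 1.x `/ping` endpoint, which answers with HTTP 204 when the server is up. The function name `wait_for_influxdb`, the default URL, and the retry count are illustrative assumptions rather than part of the original setup, and the sketch assumes `curl` is installed.

```shell
#!/bin/sh
# Poll InfluxDB's /ping endpoint until it returns HTTP 204 or we give up.
# $1: base URL (assumed default: http://localhost:8086)
# $2: maximum number of attempts, one second apart (assumed default: 10)
wait_for_influxdb() {
    url=${1:-http://localhost:8086}
    attempts=${2:-10}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -w prints only the HTTP status code; the body is discarded
        code=$(curl -s -o /dev/null -w '%{http_code}' "$url/ping" 2>/dev/null) || true
        if [ "$code" = "204" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Usage, run after `systemctl start influxdb`:
#   wait_for_influxdb && echo "InfluxDB is up"
#   systemctl enable influxdb   # optional: also start the service at boot
```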
3 Work with InfluxDB
The following session creates a database named "telegraf", a regular user named "telegraf", and an administrator user named "admin":
# influx
Connected to http://localhost:8086 version 1.7.7
InfluxDB shell version: 1.7.7
> create database telegraf
> show databases
name: databases
name
----
_internal
telegraf
> create user "admin" with password 'admin' with all privileges
> create user "telegraf" with password 'telegraf'
> show users;
user admin
---- -----
telegraf false
admin true
> exit
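Note that the `telegraf` user created above has no privileges on any database. That goes unnoticed here because InfluxDB 1.x ships with authentication disabled (`auth-enabled = false` under `[http]`), but once authentication is turned on the user would need at least write access to the `telegraf` database. Below is a minimal sketch of the same setup in scriptable form with the grant added. The helper name `influxql_setup` and its arguments are illustrative assumptions.

```shell
#!/bin/sh
# Print the InfluxQL statements that create a database and a user
# allowed to write to it, one statement per line.
# $1: database name, $2: user name, $3: password (all illustrative)
influxql_setup() {
    db=$1
    user=$2
    pass=$3
    printf 'CREATE DATABASE "%s"\n' "$db"
    printf "CREATE USER \"%s\" WITH PASSWORD '%s'\n" "$user" "$pass"
    printf 'GRANT WRITE ON "%s" TO "%s"\n' "$db" "$user"
}

# Usage (needs a running InfluxDB; run each statement through the CLI):
#   influxql_setup telegraf telegraf telegraf | while read -r stmt; do
#       influx -execute "$stmt" </dev/null
#   done
```

With authentication enabled, a non-admin user cannot create databases, which is the case the `skip_database_creation` option in the Telegraf configuration is meant for.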
4 Review the InfluxDB configuration
# more /etc/influxdb/influxdb.conf
...
[data]
# The directory where the TSM storage engine stores TSM files.
dir = "/var/lib/influxdb/data"
# The directory where the TSM storage engine stores WAL files.
wal-dir = "/var/lib/influxdb/wal"
...
II. Telegraf
Telegraf is a lightweight, plugin-driven agent for collecting metrics from systems and services. It supports many input and output plugins. On the input side it can read operating-system metrics directly, pull metrics from third-party APIs, and even receive metrics via statsd and Kafka. On the output side it can send the collected metrics to a variety of data stores, services, and message queues, including InfluxDB, Graphite, OpenTSDB, Datadog, Librato, Kafka, and MQTT.
Advantages: written in Go, with no third-party dependencies.
1 Install Telegraf
# wget https://dl.influxdata.com/telegraf/releases/telegraf-1.11.2-1.x86_64.rpm
# yum install -y telegraf-1.11.2-1.x86_64.rpm
2 Configure Telegraf; here we edit the outputs.influxdb section
# vi /etc/telegraf/telegraf.conf
[[outputs.influxdb]]
## The full HTTP or UDP URL for your InfluxDB instance.
##
## Multiple URLs can be specified for a single cluster, only ONE of the
## urls will be written to each interval.
# urls = ["unix:///var/run/influxdb.sock"]
# urls = ["udp://127.0.0.1:8089"]
urls = ["http://127.0.0.1:8086"]
## The target database for metrics; will be created as needed.
## For UDP url endpoint database needs to be configured on server side.
database = "telegraf"
## The value of this tag will be used to determine the database. If this
## tag is not set the 'database' option is used as the default.
# database_tag = ""
## If true, no CREATE DATABASE queries will be sent. Set to true when using
## Telegraf with a user without permissions to create databases or when the
## database already exists.
# skip_database_creation = false
## Name of existing retention policy to write to. Empty string writes to
## the default retention policy. Only takes effect when using HTTP.
# retention_policy = ""
## Write consistency (clusters only), can be: "any", "one", "quorum", "all".
## Only takes effect when using HTTP.
# write_consistency = "any"
## Timeout for HTTP messages.
timeout = "5s"
## HTTP Basic Auth
username = "telegraf"
password = "telegraf"
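With this output configured, Telegraf sends its metrics to InfluxDB's HTTP write endpoint as line-protocol text, where each point has the shape `<measurement>,<tags> <fields> <timestamp>`. The sketch below builds one such point by hand so the format is easier to see. The helper name `line_protocol_point` and the `demo_metric` measurement are made up for illustration, and the sketch does no escaping of spaces or commas inside values.

```shell
#!/bin/sh
# Build one InfluxDB line-protocol point:
#   <measurement>,<tag_set> <field_set> <timestamp>
# $1: measurement
# $2: comma-separated tags (key=value)
# $3: comma-separated fields (key=value)
# $4: timestamp in nanoseconds since the epoch
line_protocol_point() {
    printf '%s,%s %s %s\n' "$1" "$2" "$3" "$4"
}

# Usage: write a hand-made point to the database Telegraf uses.
# "demo_metric" and its tag/field are hypothetical names.
#   point=$(line_protocol_point demo_metric host=node1 value=0.64 "$(date +%s)000000000")
#   curl -i -XPOST 'http://127.0.0.1:8086/write?db=telegraf' \
#        -u telegraf:telegraf --data-binary "$point"
```

A successful write returns HTTP 204 with an empty body.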
3 Start Telegraf
# systemctl start telegraf
4 Check the data in InfluxDB
# influx
> use telegraf
Using database telegraf
> show measurements
name: measurements
name
----
cpu
disk
diskio
kernel
mem
processes
swap
system
> exit
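Seeing the measurements confirms that Telegraf is writing, and a quick query shows the values themselves. The sketch below builds an InfluxQL query for average idle CPU, assuming Telegraf's default cpu input, which reports a `usage_idle` field and tags the aggregate row with `cpu=cpu-total`. The helper name `cpu_idle_query` and its parameters are illustrative.

```shell
#!/bin/sh
# Build an InfluxQL query for the average idle CPU over a recent window.
# $1: look-back window, e.g. 5m
# $2: GROUP BY interval, e.g. 1m
cpu_idle_query() {
    printf "SELECT mean(\"usage_idle\") FROM \"cpu\" WHERE \"cpu\" = 'cpu-total' AND time > now() - %s GROUP BY time(%s)\n" "$1" "$2"
}

# Usage (needs a running InfluxDB with Telegraf data):
#   influx -database telegraf -execute "$(cpu_idle_query 5m 1m)"
```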
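Note: InfluxDB no longer ships its built-in web admin interface (it was disabled by default and then removed entirely in 1.3), so trying to reach it the old way returns "404 page not found". If you want a web UI for InfluxDB, use a separate tool such as Chronograf, or fall back to the web interface of an older InfluxDB version.
III. Grafana
Grafana is a popular open-source visualization component that supports many data sources, including stores commonly used for time-series data such as InfluxDB, OpenTSDB, Graphite, Prometheus, and Elasticsearch, as well as relational databases such as MySQL and PostgreSQL.
Advantages: backend written in Go, with built-in user management, alerting, and other features.
1 Install Grafana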
# wget https://dl.grafana.com/oss/release/grafana-6.2.5-1.x86_64.rpm
# yum install -y grafana-6.2.5-1.x86_64.rpm
2 Start Grafana
# systemctl start grafana-server
3 Access Grafana
Grafana listens on HTTP port 3000 by default, and the default administrator credentials are admin/admin, so you only need to open http://IP:3000. On first login you are prompted to change the password. The home page looks like this:


4 Review the Grafana configuration
# more /etc/grafana/grafana.ini
...
[paths]
# Path to where grafana can store temp files, sessions, and the sqlite3 db (if that is used)
;data = /var/lib/grafana
# Temporary files in `data` directory older than given duration will be removed
;temp_data_lifetime = 24h
# Directory where grafana can store logs
;logs = /var/log/grafana
...
# The http port to use
;http_port = 3000
...
5 Configure Grafana to access InfluxDB in the UI
After logging in to Grafana, first add a data source: Data Sources --> Add data source, and choose InfluxDB as the type. Then create a dashboard: Dashboards --> Manage --> New dashboard. After a little configuration of the panels, the data is displayed. The UI is straightforward, so I won't go into detail; explore the Grafana interface further on your own.
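The same data source can also be created without the UI through Grafana's HTTP API (`POST /api/datasources`). The sketch below only builds the JSON body; the helper name `grafana_influx_datasource` and the data source name are illustrative. The field names follow the Grafana 6.x data source API as I understand it, and newer Grafana versions expect the password under `secureJsonData` instead, so treat the payload as an assumption to check against your version.

```shell
#!/bin/sh
# Print the JSON body for Grafana's "create data source" API call.
# $1: data source name    $2: InfluxDB URL    $3: database
# $4: InfluxDB user       $5: InfluxDB password
grafana_influx_datasource() {
    printf '{"name":"%s","type":"influxdb","access":"proxy","url":"%s","database":"%s","user":"%s","password":"%s"}\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Usage (replace admin:admin with the password chosen at first login):
#   grafana_influx_datasource InfluxDB-telegraf http://127.0.0.1:8086 telegraf telegraf telegraf |
#       curl -s -X POST -H 'Content-Type: application/json' -u admin:admin \
#            -d @- http://localhost:3000/api/datasources
```

Here `"access":"proxy"` means the Grafana server, not the browser, connects to InfluxDB, which is why a loopback URL works when both run on the same host.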



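At this point we have demonstrated installing, deploying, and using the relevant components, and successfully displayed the collected metrics. This article introduced the TICK Stack and showed how to build a monitoring platform based on Telegraf + InfluxDB + Grafana.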