Overview
To run EFK in production we need to prepare the corresponding resources: memory, persistent storage, pinning the ES pods to fixed nodes, and so on. For node pinning, follow the official documentation and configure it with taints. Most customers have no storage like Ceph RBD, usually only NAS, but NAS cannot satisfy Elasticsearch: neither its file system semantics nor its performance are adequate. ES also expects a StorageClass, so from a performance standpoint local-volume is the most suitable choice:
Using NFS storage as a volume or a persistent volume (or via NAS such as Gluster) is not supported for Elasticsearch storage, as Lucene relies on file system behavior that NFS does not supply. Data corruption and other problems can occur.
Deploying the local-volume StorageClass
- Create the local-storage project
oc new-project local-storage
- Install the Local Storage Operator
Operators → OperatorHub → Local Storage Operator → click Install → select the local-storage namespace → click Subscribe.
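If you prefer the CLI to the console, the same install can be expressed as an OperatorGroup plus a Subscription, roughly as sketched below (the channel value is an assumption; match it to your cluster version):
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: local-storage-operatorgroup
  namespace: local-storage
spec:
  targetNamespaces:
    - local-storage
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: local-storage-operator
  namespace: local-storage
spec:
  channel: "4.4"                # assumption: use the channel matching your OCP version
  name: local-storage-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace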
- Check the pod status
# oc -n local-storage get pods
NAME READY STATUS RESTARTS AGE
local-storage-operator-7cd4799b4b-6bzg4 1/1 Running 0 12h
- Add a disk to each of the 3 ES nodes (mine is sdb, 50G; 200G is recommended), then create localvolume.yaml:
The nodeSelector picks the ES nodes; storageClassDevices specifies the disk device, file system, and StorageClass name.
apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
name: "local-disks"
namespace: "local-storage"
spec:
nodeSelector:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- worker02.ocp44.cluster1.com
- worker03.ocp44.cluster1.com
- worker04.ocp44.cluster1.com
storageClassDevices:
- storageClassName: "local-sc"
volumeMode: Filesystem
fsType: xfs
devicePaths:
- /dev/sdb
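Before creating it, you can confirm the new disk is actually visible on each node (worker02 is one of the nodes above; adjust if your device name differs):
# oc debug node/worker02.ocp44.cluster1.com -- chroot /host lsblk /dev/sdb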
- Create it
oc create -f localvolume.yaml
- Check the pods
# oc get all -n local-storage
NAME READY STATUS RESTARTS AGE
pod/local-disks-local-diskmaker-7p448 1/1 Running 0 43m
pod/local-disks-local-diskmaker-grkjx 1/1 Running 0 43m
pod/local-disks-local-diskmaker-lmknj 1/1 Running 0 43m
pod/local-disks-local-provisioner-5s9nk 1/1 Running 0 43m
pod/local-disks-local-provisioner-hv42l 1/1 Running 0 43m
pod/local-disks-local-provisioner-tzlkt 1/1 Running 0 43m
pod/local-storage-operator-7cd4799b4b-6bzg4 1/1 Running 0 12h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/local-storage-operator ClusterIP 172.30.93.34 <none> 60000/TCP 12h
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/local-disks-local-diskmaker 3 3 3 3 3 <none> 11h
daemonset.apps/local-disks-local-provisioner 3 3 3 3 3 <none> 11h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/local-storage-operator 1/1 1 1 12h
NAME DESIRED CURRENT READY AGE
replicaset.apps/local-storage-operator-7cd4799b4b 1 1 1 12h
- Check the PVs
# oc get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
local-pv-2337578c 50Gi RWO Delete Available local-sc 4m42s
local-pv-77162aba 50Gi RWO Delete Available local-sc 4m38s
local-pv-cc7b7951 50Gi RWO Delete Available local-sc 4m46s
- PV contents
# oc get pv local-pv-2337578c -oyaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: local-volume-provisioner-worker02.ocp44.cluster1.com-e1f9a639-6872-43d7-b53c-d6255b3d7976
  creationTimestamp: "2020-05-25T15:29:46Z"
  finalizers:
    - kubernetes.io/pv-protection
  labels:
    storage.openshift.com/local-volume-owner-name: local-disks
    storage.openshift.com/local-volume-owner-namespace: local-storage
  name: local-pv-2337578c
  resourceVersion: "5661501"
  selfLink: /api/v1/persistentvolumes/local-pv-2337578c
  uid: 7f72ebb4-7212-4f0f-9f1a-d0af103ed70e
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 50Gi
  local:
    fsType: xfs
    path: /mnt/local-storage/local-sc/sdb
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker02.ocp44.cluster1.com
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-sc
  volumeMode: Filesystem
status:
  phase: Available
- Check the StorageClass
# oc get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
local-sc kubernetes.io/no-provisioner Delete WaitForFirstConsumer false 11h
- StorageClass contents
# oc get sc local-sc -oyaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2020-05-25T04:09:31Z"
  labels:
    local.storage.openshift.io/owner-name: local-disks
    local.storage.openshift.io/owner-namespace: local-storage
  name: local-sc
  resourceVersion: "5273371"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/local-sc
  uid: 0c625dad-3879-43b1-9b0a-f0606de91e5a
provisioner: kubernetes.io/no-provisioner
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
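Note the volumeBindingMode of WaitForFirstConsumer: a PVC against local-sc stays Pending until a pod that uses it is scheduled, which lets the scheduler pick a node that actually holds one of the local disks. As a minimal sketch, a test claim like this (the name is hypothetical) would not bind until some pod mounts it:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-local-claim        # hypothetical, for illustration only
  namespace: local-storage
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-sc
  resources:
    requests:
      storage: 10Gi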
Deploying the Elasticsearch Operator
Operators → OperatorHub → Elasticsearch Operator → click Install → select All namespaces for the Installation Mode → select openshift-operators-redhat as the Installed Namespace → check Enable operator recommended cluster monitoring on this namespace → pick an Update Channel and Approval Strategy → click Subscribe → verify under Operators → Installed Operators that the Elasticsearch Operator's status is Succeeded.
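The same check can be done from the CLI; the operator's CSV should report Succeeded:
# oc get csv -n openshift-operators-redhat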
Deploying the Cluster Logging Operator
Operators → OperatorHub → Cluster Logging Operator → click Install → select A specific namespace on the cluster for the Installation Mode → select openshift-logging as the Installed Namespace → check Enable operator recommended cluster monitoring on this namespace → pick an Update Channel and Approval Strategy → click Subscribe → verify the status under Installed Operators → check the pod status under Workloads → Pods.
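Or from the CLI:
# oc get csv -n openshift-logging
# oc get pods -n openshift-logging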
Installing EFK
Administration → Custom Resource Definitions → ClusterLogging → on the Custom Resource Definition Overview page, open Instances → click Create ClusterLogging, with the following content:
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
name: "instance"
namespace: "openshift-logging"
spec:
managementState: "Managed"
logStore:
type: "elasticsearch"
elasticsearch:
nodeCount: 3
storage:
storageClassName: local-sc
size: 48G
resources:
limits:
cpu: "4"
memory: "16Gi"
requests:
cpu: "4"
memory: "16Gi"
redundancyPolicy: "SingleRedundancy"
visualization:
type: "kibana"
kibana:
replicas: 1
curation:
type: "curator"
curator:
schedule: "30 3 * * *"
collection:
logs:
type: "fluentd"
fluentd: {}
For ES, the main things to set are the node count, StorageClass name, storage size, and resource quotas (give it as much memory as possible). With three nodes, SingleRedundancy (one replica per primary shard) is usually enough; higher redundancy levels multiply storage usage, so it depends on your situation. The curator above runs a cleanup at 3:30 every day and by default deletes data older than 30 days; retention can also be configured for specific indices or project indices: https://docs.openshift.com/container-platform/4.4/logging/config/cluster-logging-curator.html
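Once the pods are up, the ES cluster health can be queried from inside one of the elasticsearch pods (the pod name is a placeholder; pick one from oc get pods -n openshift-logging):
# oc exec -n openshift-logging -c elasticsearch <es-pod-name> -- es_util --query=_cluster/health?pretty=true
A green status means all primary shards and replicas are allocated; with SingleRedundancy, yellow usually indicates unassigned replicas.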
Additional notes
Pinning EFK to specific nodes
EFK can be pinned to particular nodes with taints/tolerations or a nodeSelector. In my case local-volume already binds the ES pods to the nodes that hold the PVs, so no extra pinning is needed. One caveat with taints/tolerations: after tainting a node, some infra pods get evicted, such as the dns and machine-config-daemon pods, because they carry no toleration for our taint, and their operators offer no way to configure one. You can patch their DaemonSets directly and the change will not be reverted, but that approach is non-standard and may cause problems.
- tolerations
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
name: "instance"
namespace: openshift-logging
spec:
managementState: "Managed"
logStore:
type: "elasticsearch"
elasticsearch:
nodeCount: 1
tolerations:
- key: "logging"
operator: "Exists"
effect: "NoExecute"
tolerationSeconds: 6000
resources:
limits:
memory: 8Gi
requests:
cpu: 100m
memory: 1Gi
storage: {}
redundancyPolicy: "ZeroRedundancy"
visualization:
type: "kibana"
kibana:
tolerations:
- key: "logging"
operator: "Exists"
effect: "NoExecute"
tolerationSeconds: 6000
resources:
limits:
memory: 2Gi
requests:
cpu: 100m
memory: 1Gi
replicas: 1
curation:
type: "curator"
curator:
tolerations:
- key: "logging"
operator: "Exists"
effect: "NoExecute"
tolerationSeconds: 6000
resources:
limits:
memory: 200Mi
requests:
cpu: 100m
memory: 100Mi
schedule: "*/5 * * * *"
collection:
logs:
type: "fluentd"
fluentd:
tolerations:
- key: "logging"
operator: "Exists"
effect: "NoExecute"
tolerationSeconds: 6000
resources:
limits:
memory: 2Gi
requests:
cpu: 100m
memory: 1Gi
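For these tolerations to do anything, the target nodes need a matching taint. Since the tolerations use operator Exists, any value works; for example (the value reserved is arbitrary):
# oc adm taint nodes worker02.ocp44.cluster1.com logging=reserved:NoExecute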
- nodeSelector
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
....
spec:
  collection:
    logs:
      fluentd:
        resources: null
      type: fluentd
  curation:
    curator:
      nodeSelector:
        node-role.kubernetes.io/infra: ''
      resources: null
      schedule: 30 3 * * *
    type: curator
  logStore:
    elasticsearch:
      nodeCount: 3
      nodeSelector:
        node-role.kubernetes.io/infra: ''
      redundancyPolicy: SingleRedundancy
      resources:
        limits:
          cpu: 500m
          memory: 16Gi
        requests:
          cpu: 500m
          memory: 16Gi
      storage: {}
    type: elasticsearch
  managementState: Managed
  visualization:
    kibana:
      nodeSelector:
        node-role.kubernetes.io/infra: ''
      proxy:
        resources: null
      replicas: 1
      resources: null
    type: kibana
....
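The infra label referenced by the nodeSelector above is applied per node like this:
# oc label node worker02.ocp44.cluster1.com node-role.kubernetes.io/infra=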
Further notes
Even after dedicating a few nodes to ES, ordinary application pods can still be scheduled onto them. To avoid this, label the real application nodes with an app label and inject a nodeSelector into the project template; newly created projects then land on the application nodes automatically, without configuring a nodeSelector on every deployment, as sketched below.
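A sketch of that injection (the label node-role.kubernetes.io/app is an assumption; see the project-creation link in the references): export the bootstrap project template, add the node-selector annotation to the Project object, upload it, and point the cluster Project config at it.
# oc adm create-bootstrap-project-template -o yaml > template.yaml
(edit template.yaml: under the Project object's metadata.annotations, add
  openshift.io/node-selector: node-role.kubernetes.io/app=)
# oc create -f template.yaml -n openshift-config
# oc patch project.config.openshift.io/cluster --type merge -p '{"spec":{"projectRequestTemplate":{"name":"project-request"}}}'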
If ES instead uses storage such as Ceph RBD, then a nodeSelector or taints become necessary, otherwise the ES pods can drift between nodes. The same applies to Prometheus.
References
https://docs.openshift.com/container-platform/4.4/logging/config/cluster-logging-tolerations.html
https://docs.openshift.com/container-platform/4.4/logging/cluster-logging-moving-nodes.html
https://docs.openshift.com/container-platform/4.4/applications/projects/configuring-project-creation.html
https://docs.openshift.com/container-platform/4.4/networking/configuring-networkpolicy.html#nw-networkpolicy-creating-default-networkpolicy-objects-for-a-new-project
https://access.redhat.com/solutions/4946861