Deploying EFK on OCP 4.4 with local-volume Persistence

Overview

To run EFK in a production environment, the right resources have to be prepared in advance: memory, persistent storage, dedicated Elasticsearch nodes, and so on. According to the official documentation, pinning to specific nodes is configured with taints. Most customers do not have storage such as Ceph RBD and typically only have NAS, but NAS cannot satisfy Elasticsearch in terms of either file system semantics or performance, and Elasticsearch also requires a StorageClass. From a performance point of view, local-volume is therefore a good fit.

Using NFS storage as a volume or a persistent volume (or via NAS such as Gluster) is not supported for Elasticsearch storage, as Lucene relies on file system behavior that NFS does not supply. Data corruption and other problems can occur.

Deploying the local-volume StorageClass

  • Create the local-storage project
oc new-project local-storage
  • Install the Local Storage Operator
Operators → OperatorHub → Local Storage Operator → click Install → select the local-storage namespace → click Subscribe.
  • Check the pod status
# oc -n local-storage get pods
NAME                                      READY   STATUS    RESTARTS   AGE
local-storage-operator-7cd4799b4b-6bzg4   1/1     Running   0          12h
  • Add a disk to each of the 3 ES nodes (sdb, 50G in my case; 200G is recommended), then create localvolume.yaml (a quick check that the disk is visible on the nodes is shown after the YAML):

The nodeSelector picks the ES nodes; the spec also sets the disk device, the file system, and the storageClass.

apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
  name: "local-disks"
  namespace: "local-storage" 
spec:
  nodeSelector: 
    nodeSelectorTerms:
    - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - worker02.ocp44.cluster1.com
          - worker03.ocp44.cluster1.com
          - worker04.ocp44.cluster1.com
  storageClassDevices:
    - storageClassName: "local-sc"
      volumeMode: Filesystem 
      fsType: xfs 
      devicePaths: 
        - /dev/sdb
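
Before applying the LocalVolume, it can help to confirm that the new disk is actually visible on each target node, for example with oc debug (the node name and /dev/sdb match the example above; adjust for your environment):

# oc debug node/worker02.ocp44.cluster1.com -- chroot /host lsblk /dev/sdb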
  • Create it
oc create -f localvolume.yaml
  • Check the pods
# oc get all -n local-storage
NAME                                          READY   STATUS    RESTARTS   AGE
pod/local-disks-local-diskmaker-7p448         1/1     Running   0          43m
pod/local-disks-local-diskmaker-grkjx         1/1     Running   0          43m
pod/local-disks-local-diskmaker-lmknj         1/1     Running   0          43m
pod/local-disks-local-provisioner-5s9nk       1/1     Running   0          43m
pod/local-disks-local-provisioner-hv42l       1/1     Running   0          43m
pod/local-disks-local-provisioner-tzlkt       1/1     Running   0          43m
pod/local-storage-operator-7cd4799b4b-6bzg4   1/1     Running   0          12h

NAME                             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)     AGE
service/local-storage-operator   ClusterIP   172.30.93.34   <none>        60000/TCP   12h

NAME                                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/local-disks-local-diskmaker     3         3         3       3            3           <none>          11h
daemonset.apps/local-disks-local-provisioner   3         3         3       3            3           <none>          11h

NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/local-storage-operator   1/1     1            1           12h

NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/local-storage-operator-7cd4799b4b   1         1         1       12h
  • View the PVs
# oc get pv
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
local-pv-2337578c   50Gi       RWO            Delete           Available           local-sc                4m42s
local-pv-77162aba   50Gi       RWO            Delete           Available           local-sc                4m38s
local-pv-cc7b7951   50Gi       RWO            Delete           Available           local-sc                4m46s
  • PV contents
oc get pv local-pv-2337578c -oyaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: local-volume-provisioner-worker02.ocp44.cluster1.com-e1f9a639-6872-43d7-b53c-d6255b3d7976
  creationTimestamp: "2020-05-25T15:29:46Z"
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    storage.openshift.com/local-volume-owner-name: local-disks
    storage.openshift.com/local-volume-owner-namespace: local-storage
  name: local-pv-2337578c
  resourceVersion: "5661501"
  selfLink: /api/v1/persistentvolumes/local-pv-2337578c
  uid: 7f72ebb4-7212-4f0f-9f1a-d0af103ed70e
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 50Gi
  local:
    fsType: xfs
    path: /mnt/local-storage/local-sc/sdb
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - worker02.ocp44.cluster1.com
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-sc
  volumeMode: Filesystem
status:
  phase: Available
  • View the StorageClass
# oc get sc
NAME                                PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-sc                            kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  11h
  • View the StorageClass contents
# oc get sc local-sc -oyaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2020-05-25T04:09:31Z"
  labels:
    local.storage.openshift.io/owner-name: local-disks
    local.storage.openshift.io/owner-namespace: local-storage
  name: local-sc
  resourceVersion: "5273371"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/local-sc
  uid: 0c625dad-3879-43b1-9b0a-f0606de91e5a
provisioner: kubernetes.io/no-provisioner
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

Deploying the Elasticsearch Operator

Operators → OperatorHub → Elasticsearch Operator → click Install → for Installation Mode select All namespaces on the cluster → for Installed Namespace select openshift-operators-redhat → check Enable operator recommended cluster monitoring on this namespace → choose an Update Channel and Approval Strategy → click Subscribe → then go to Operators → Installed Operators and confirm that the Elasticsearch Operator status is Succeeded.
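
If you prefer the CLI to the web console, the same installation can be expressed as a Namespace, an OperatorGroup, and a Subscription. This is only a sketch based on the OCP 4.4 documentation; verify the channel and catalog source names in your own cluster (e.g. with oc get packagemanifests -n openshift-marketplace):

apiVersion: v1
kind: Namespace
metadata:
  name: openshift-operators-redhat
  annotations:
    openshift.io/node-selector: ""
  labels:
    openshift.io/cluster-monitoring: "true"
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-operators-redhat
  namespace: openshift-operators-redhat
spec: {}
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: elasticsearch-operator
  namespace: openshift-operators-redhat
spec:
  channel: "4.4"                          # update channel, verify against your cluster
  installPlanApproval: Automatic
  name: elasticsearch-operator
  source: redhat-operators                # catalog source, may differ in disconnected clusters
  sourceNamespace: openshift-marketplace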

Deploying the Cluster Logging Operator

Operators → OperatorHub → Cluster Logging Operator → click Install → for Installation Mode select A specific namespace on the cluster → for Installed Namespace select openshift-logging → check Enable operator recommended cluster monitoring on this namespace → choose an Update Channel and Approval Strategy → click Subscribe → verify the status under Installed Operators → then check the pods under Workloads → Pods.
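
The Cluster Logging Operator can likewise be installed from the CLI; again a sketch following the OCP 4.4 documentation, with the channel and source names to be verified against your cluster:

apiVersion: v1
kind: Namespace
metadata:
  name: openshift-logging
  annotations:
    openshift.io/node-selector: ""
  labels:
    openshift.io/cluster-monitoring: "true"
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: cluster-logging
  namespace: openshift-logging
spec:
  targetNamespaces:
  - openshift-logging                     # the operator only watches this namespace
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cluster-logging
  namespace: openshift-logging
spec:
  channel: "4.4"
  installPlanApproval: Automatic
  name: cluster-logging
  source: redhat-operators
  sourceNamespace: openshift-marketplace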

Installing EFK

Administration → Custom Resource Definitions → ClusterLogging → Custom Resource Definition Overview page → Instances → click Create ClusterLogging, and use the following content:

apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance" 
  namespace: "openshift-logging"
spec:
  managementState: "Managed"  
  logStore:
    type: "elasticsearch"  
    elasticsearch:
      nodeCount: 3
      storage:
        storageClassName: local-sc 
        size: 48G
      resources: 
        limits:
          cpu: "4"
          memory: "16Gi"
        requests:
          cpu: "4"
          memory: "16Gi"
      redundancyPolicy: "SingleRedundancy"
  visualization:
    type: "kibana"  
    kibana:
      replicas: 1
  curation:
    type: "curator"  
    curator:
      schedule: "30 3 * * *"
  collection:
    logs:
      type: "fluentd"  
      fluentd: {}

For Elasticsearch, the main settings are the node count, the StorageClass name, the storage size, and the resource quota (give it as much memory as possible). With three nodes, SingleRedundancy (one replica per primary shard) is usually enough; more replicas take up a lot of extra storage, so adjust to your situation. Curator is scheduled here to run a cleanup every day at 03:30; by default it deletes data older than 30 days, and retention can also be configured per index or per project: https://docs.openshift.com/container-platform/4.4/logging/config/cluster-logging-curator.html
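
Per-project or per-index retention is set in the curator ConfigMap in openshift-logging (see the curator link above). A minimal sketch, where the project name myapp-devel is hypothetical:

# oc edit configmap/curator -n openshift-logging

apiVersion: v1
kind: ConfigMap
metadata:
  name: curator
  namespace: openshift-logging
data:
  config.yaml: |
    .defaults:              # applies to any project index not listed explicitly
      delete:
        days: 30
    .operations:            # infra / operations indices
      delete:
        weeks: 8
    myapp-devel:            # hypothetical project: keep only one day of logs
      delete:
        days: 1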

Additional notes

Pinning EFK to specific nodes

EFK can be pinned to particular nodes with taints/tolerations or a nodeSelector, but since local-volume already binds the pods to their nodes in my setup, no extra pinning is needed. One caveat with taints/tolerations: after a node is tainted, some infra pods get evicted, for example the dns and machine-config-daemon pods, because they have no toleration for the taint we added. Their operators do not expose a tolerations setting either; the pods' DaemonSets can be edited directly and the change is not reverted, but that approach is non-standard and may cause problems.
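
For reference, tainting the ES nodes would look roughly like this; the key logging matches the tolerations in the example below (the value is arbitrary because the tolerations use operator: Exists):

# oc adm taint nodes worker02.ocp44.cluster1.com logging=es:NoExecute
# oc adm taint nodes worker03.ocp44.cluster1.com logging=es:NoExecute
# oc adm taint nodes worker04.ocp44.cluster1.com logging=es:NoExecute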

  • tolerations
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: openshift-logging
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    elasticsearch:
      nodeCount: 1
      tolerations: 
      - key: "logging"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 6000
      resources:
        limits:
          memory: 8Gi
        requests:
          cpu: 100m
          memory: 1Gi
      storage: {}
      redundancyPolicy: "ZeroRedundancy"
  visualization:
    type: "kibana"
    kibana:
      tolerations: 
      - key: "logging"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 6000
      resources:
        limits:
          memory: 2Gi
        requests:
          cpu: 100m
          memory: 1Gi
      replicas: 1
  curation:
    type: "curator"
    curator:
      tolerations: 
      - key: "logging"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 6000
      resources:
        limits:
          memory: 200Mi
        requests:
          cpu: 100m
          memory: 100Mi
      schedule: "*/5 * * * *"
  collection:
    logs:
      type: "fluentd"
      fluentd:
        tolerations: 
        - key: "logging"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 6000
        resources:
          limits:
            memory: 2Gi
          requests:
            cpu: 100m
            memory: 1Gi
  • nodeSelector
apiVersion: logging.openshift.io/v1
kind: ClusterLogging

....

spec:
  collection:
    logs:
      fluentd:
        resources: null
      type: fluentd
  curation:
    curator:
      nodeSelector: 
          node-role.kubernetes.io/infra: ''
      resources: null
      schedule: 30 3 * * *
    type: curator
  logStore:
    elasticsearch:
      nodeCount: 3
      nodeSelector: 
          node-role.kubernetes.io/infra: ''
      redundancyPolicy: SingleRedundancy
      resources:
        limits:
          cpu: 500m
          memory: 16Gi
        requests:
          cpu: 500m
          memory: 16Gi
      storage: {}
    type: elasticsearch
  managementState: Managed
  visualization:
    kibana:
      nodeSelector: 
          node-role.kubernetes.io/infra: '' 
      proxy:
        resources: null
      replicas: 1
      resources: null
    type: kibana

....

Further notes

Even after dedicating a few nodes to Elasticsearch, ordinary application pods can still be scheduled onto them. You can therefore label the real application nodes with an app label and inject a nodeSelector into the project template, so that newly created projects land on the application nodes without having to set a nodeSelector in every Deployment. A rough sketch of this follows.
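
A minimal sketch of that approach, assuming the application nodes carry a hypothetical label node-role.kubernetes.io/app='' (see the project-creation link in the references):

# oc adm create-bootstrap-project-template -o yaml > template.yaml

Add the node-selector annotation to the Project object in template.yaml:

- apiVersion: project.openshift.io/v1
  kind: Project
  metadata:
    annotations:
      openshift.io/node-selector: "node-role.kubernetes.io/app="   # hypothetical app-node label
  # rest of the template unchanged

Create the template in openshift-config and reference it from the cluster project configuration (spec.projectRequestTemplate.name: project-request):

# oc create -f template.yaml -n openshift-config
# oc edit project.config.openshift.io/cluster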

If Elasticsearch uses storage such as Ceph RBD, then a nodeSelector or taints are required, otherwise the ES pods will drift between nodes. The same applies to Prometheus.

References

https://docs.openshift.com/container-platform/4.4/logging/config/cluster-logging-tolerations.html
https://docs.openshift.com/container-platform/4.4/logging/cluster-logging-moving-nodes.html
https://docs.openshift.com/container-platform/4.4/applications/projects/configuring-project-creation.html
https://docs.openshift.com/container-platform/4.4/networking/configuring-networkpolicy.html#nw-networkpolicy-creating-default-networkpolicy-objects-for-a-new-project
https://access.redhat.com/solutions/4946861
