Guide to Building a Kubemark-Based Performance Testing Cluster

Target version: Kubernetes 1.17.5

Cluster OS: CentOS 7

Note: performance testing differs between Kubernetes versions; this guide is not guaranteed to work on other versions.


Preface: kubemark is a node-simulation tool whose purpose is to test Kubernetes API latency and scheduling performance in large-scale clusters.

We need to set up two clusters, A and B. B is the cluster under test; A runs the kubemark workload. The kubemark containers are deployed in A as pod replicas (the reference manifest below uses a ReplicationController), and each replica registers itself as a node in B. Once deployment completes, B has exactly as many nodes as A has kubemark replicas.

Now assume two working clusters A and B are available.

  1. Build and package the kubemark image

Download the kubernetes source; the version must match cluster B. Then compile the kubemark binary:

./hack/build-go.sh cmd/kubemark/

cp $GOPATH/src/k8s.io/kubernetes/_output/bin/kubemark $GOPATH/src/k8s.io/kubernetes/cluster/images/kubemark/

Build the kubemark image:

cd $GOPATH/src/k8s.io/kubernetes/cluster/images/kubemark/

make build
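make build only produces a local image; cluster A's nodes must be able to pull it, so re-tag it and push it to a registry they can reach. A minimal sketch, assuming the Makefile's default staging-k8s.gcr.io source tag and the test.cargo.io/release registry used in the manifest below (both are assumptions; check `docker images` for the name your build actually produced):

```shell
# Re-tag the locally built kubemark image and push it to a registry
# reachable from cluster A. Both image names here are assumptions --
# substitute the tag "make build" produced and your own registry.
docker tag staging-k8s.gcr.io/kubemark:latest test.cargo.io/release/kubemark:latest
docker push test.cargo.io/release/kubemark:latest
```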

  2. Create the namespace, configmap, secret, and RBAC objects

kubectl create ns kubemark

kubectl create cm node-configmap --from-literal=content.type="" --from-file=kernel.monitor="kernel-monitor.json" -n kubemark (kernel-monitor.json is taken from ./test/kubemark/resources/kernel-monitor.json)

kubectl create secret generic kubeconfig --type=Opaque --from-file=kubelet.kubeconfig=kubemark.kubeconfig --from-file=kubeproxy.kubeconfig=kubemark.kubeconfig --from-file=npd.kubeconfig=kubemark.kubeconfig --from-file=heapster.kubeconfig=kubemark.kubeconfig --from-file=cluster_autoscaler.kubeconfig=kubemark.kubeconfig --from-file=dns.kubeconfig=kubemark.kubeconfig -n kubemark (kubemark.kubeconfig is cluster B's kubeconfig, i.e. /root/.kube/config from B's master; the secret must live in the kubemark namespace so the hollow-node pods can mount it)

kubectl apply -f addons/ -n kubemark (the manifests are under ./test/kubemark/resources/manifests/addons)

  3. Create the kubemark workload

kubectl apply -f hollow-node_template.yaml -n kubemark

Reference configuration:

apiVersion: v1
kind: ReplicationController
metadata:
  name: hollow-node
  labels:
    name: hollow-node
spec:
  replicas: 200
  selector:
    name: hollow-node
  template:
    metadata:
      labels:
        name: hollow-node
    spec:
      initContainers:
      - name: init-inotify-limit
        image: busybox
        command: ['sysctl', '-w', 'fs.inotify.max_user_instances=1000']
        securityContext:
          privileged: true
      volumes:
      - name: kubeconfig-volume
        secret:
          secretName: kubeconfig
      - name: kernelmonitorconfig-volume
        configMap:
          name: node-configmap
      - name: logs-volume
        hostPath:
          path: /var/log
      - name: no-serviceaccount-access-to-real-master
        emptyDir: {}
      containers:
      - name: hollow-kubelet
        image: test.cargo.io/release/kubemark:latest
        ports:
        - containerPort: 4194
        - containerPort: 10250
        - containerPort: 10255
        env:
        - name: CONTENT_TYPE
          valueFrom:
            configMapKeyRef:
              name: node-configmap
              key: content.type
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        command:
        - /bin/sh
        - -c
        - /kubemark --morph=kubelet --name=$(NODE_NAME) --kubeconfig=/kubeconfig/kubelet.kubeconfig $(CONTENT_TYPE) --alsologtostderr 1>>/var/log/kubelet-$(NODE_NAME).log 2>&1
        volumeMounts:
        - name: kubeconfig-volume
          mountPath: /kubeconfig
          readOnly: true
        - name: logs-volume
          mountPath: /var/log
        resources:
          requests:
            cpu: 20m
            memory: 50M
        securityContext:
          privileged: true
      - name: hollow-proxy
        image: test.cargo.io/release/kubemark:latest
        env:
        - name: CONTENT_TYPE
          valueFrom:
            configMapKeyRef:
              name: node-configmap
              key: content.type
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        command:
        - /bin/sh
        - -c
        - /kubemark --morph=proxy --name=$(NODE_NAME) --kubeconfig=/kubeconfig/kubeproxy.kubeconfig $(CONTENT_TYPE) --alsologtostderr 1>>/var/log/kubeproxy-$(NODE_NAME).log 2>&1
        volumeMounts:
        - name: kubeconfig-volume
          mountPath: /kubeconfig
          readOnly: true
        - name: logs-volume
          mountPath: /var/log
        resources:
          requests:
            cpu: 20m
            memory: 50M
      - name: hollow-node-problem-detector
        image: test.cargo.io/release/node-problem-detector:v0.8.0
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        command:
        - /bin/sh
        - -c
        - /node-problem-detector --system-log-monitors=/config/kernel.monitor --apiserver-override="https://192.168.0.16:6443?inClusterConfig=false&auth=/kubeconfig/npd.kubeconfig" --alsologtostderr 1>>/var/log/npd-$(NODE_NAME).log 2>&1
        volumeMounts:
        - name: kubeconfig-volume
          mountPath: /kubeconfig
          readOnly: true
        - name: kernelmonitorconfig-volume
          mountPath: /config
          readOnly: true
        - name: no-serviceaccount-access-to-real-master
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          readOnly: true
        - name: logs-volume
          mountPath: /var/log
        resources:
          requests:
            cpu: 20m
            memory: 50M
        securityContext:
          privileged: true
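Once the hollow-node pods are Running in cluster A, each replica should appear as a Ready node in cluster B. A quick sanity check (kubemark.kubeconfig is B's kubeconfig as above; the node names are derived from the pod names, so they share the hollow-node prefix):

```shell
# In cluster A: all hollow-node pods should be Running.
kubectl get pods -n kubemark -o wide

# Against cluster B: the registered node count should match the replica count.
kubectl --kubeconfig=kubemark.kubeconfig get nodes | grep -c hollow-node
```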

  4. Run the e2e performance test cases

There is a big pitfall here: most articles you find online compile the e2e binary and run it directly:

make WHAT="test/e2e/e2e.test"

./e2e.test --kube-master=192.168.0.16 --host=https://192.168.0.16:6443 --ginkgo.focus="\[Performance\]" --provider=local --kubeconfig=kubemark.kubeconfig --num-nodes=10 --v=3 --ginkgo.failFast --e2e-output-dir=. --report-dir=.

But the e2e performance cases have in fact been moved out of the main repository (https://github.com/kubernetes/kubernetes/pull/83322), so on versions released after 2019-10-01 the command above cannot run the performance tests.

  5. Run performance tests with the perf-tests repo (switch to cluster B's master node)

Download the perf-tests source for the matching kubernetes version: https://github.com/kubernetes/perf-tests

Run the test command (the cluster needs at least 100 nodes; with fewer, adjust the parameters in job.yaml):

./run-e2e.sh --testconfig=job.yaml --kubeconfig=config.yaml --provider=local --masterip=192.168.0.16,192.168.0.23,192.168.0.40 --mastername=kube-master-1,kube-master-2,kube-master-3 --master-internal-ip=192.168.0.16,192.168.0.23,192.168.0.40 --enable-prometheus-server --tear-down-prometheus-server=false

Troubleshooting:

  • If a scrape target's port needs to change, edit the Service directly and update the corresponding Endpoints.

  • Collecting etcd metrics fails with the error: etcdmetrics: failed to collect etcd database size. The script cannot pull data over plain HTTP on port 2379, so the source must be patched: in perf-tests/clusterloader2/pkg/measurement/common/etcd_metrics.go, change "curl http://localhost:2379/metrics" to "curl -L https://localhost:2379/metrics --key /etc/kubernetes/etcd/etcd.key --cert /etc/kubernetes/etcd/etcd.crt --insecure"
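This edit can be scripted. A minimal sketch, with two caveats: the exact quoting inside etcd_metrics.go may differ between perf-tests versions, and the certificate paths assume a kubeadm-style layout, so verify both before running:

```shell
# Patch etcd_metrics.go: swap the plain-HTTP curl for an HTTPS curl that
# presents the etcd client certificate. Cert paths are an assumption --
# point them at wherever your etcd client cert/key actually live.
FILE=perf-tests/clusterloader2/pkg/measurement/common/etcd_metrics.go
if [ -f "$FILE" ]; then
  sed -i 's|curl http://localhost:2379/metrics|curl -L https://localhost:2379/metrics --key /etc/kubernetes/etcd/etcd.key --cert /etc/kubernetes/etcd/etcd.crt --insecure|' "$FILE"
  grep -n 'localhost:2379/metrics' "$FILE"   # confirm the replacement took
fi
```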

  • A few endpoints in Prometheus's target list may fail to scrape, returning 401 authorization errors. To fix this:

    Edit the corresponding ServiceMonitor and add bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token to the endpoints field:

spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 5s
    port: apiserver
    scheme: https
    tlsConfig:
      insecureSkipVerify: true

    Add --authentication-token-webhook=true and --authorization-mode=Webhook to the kubelet's flags and restart it, then bind the system:kubelet-api-admin ClusterRole to the prometheus-k8s ServiceAccount:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-k8s-1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kubelet-api-admin
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: monitoring
