本篇使用StorageClass来持久化数据,搭建Statefulset的Prometheus联邦集群,对于数据持久化,方案众多,如Thanos、M3DB、InfluxDB、VictorMetric等,根据自己的需求​进行选择,后面会详细讲解针对数据持久化的具体细节。

部署一个对外可以访问的Prometheus,首先要创建Prometheus所在的Namespace,然后在创建Prometheus使用的RBAC规则,创建Prometheus的 ConfigMap 来保存配置文件。创建SVC绑定固定集群IP,创建Statefulset有状态的Prometheus容器的Pod,最后创建Ingress 实现外部域名访问Prometheus。

如果Kubernetes版本比较旧的话,为了便于测试,可以进行升级一下,使用 sealos 自动部署工具快速一键部署高可用集群,对于是否使用kuboard,针对自己需求去部署。

环境

我的本地环境使用的 sealos 一键部署,主要是为了便于测试。

Ubuntu 18.041.17.7sealos-k8s-m1192.168.1.151node-exporter prometheus-federate-0
Ubuntu 18.041.17.7sealos-k8s-m2192.168.1.152node-exporter grafana alertmanager-0
Ubuntu 18.041.17.7sealos-k8s-m3192.168.1.150node-exporter alertmanager-1
Ubuntu 18.041.17.7sealos-k8s-node1192.168.1.153node-exporter prometheus-0 kube-state-metrics
Ubuntu 18.041.17.7sealos-k8s-node2192.168.1.154node-exporter prometheus-1
Ubuntu 18.041.17.7sealos-k8s-node2192.168.1.155node-exporter prometheus-2
# 给master跟node加标签
# prometheus
kubectl label node sealos-k8s-node1 k8s-app=prometheus
kubectl label node sealos-k8s-node2 k8s-app=prometheus
kubectl label node sealos-k8s-node3 k8s-app=prometheus
# federate
kubectl label node sealos-k8s-m1 k8s-app=prometheus-federate
# alertmanager
kubectl label node sealos-k8s-m2 k8s-app=alertmanager
kubectl label node sealos-k8s-m3 k8s-app=alertmanager

#创建对应的部署目录
mkdir /data/manual-deploy/ && cd /data/manual-deploy/
mkdir alertmanager  grafana  ingress-nginx  kube-state-metrics  node-exporter  prometheus

部署 Prometheus

创建Prometheus的storageclass配置文件

cat prometheus-data-storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: prometheus-lpv
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer

创建Prometheus的sc的pv配置文件,同时指定了调度节点。

# 在需要调度的Prometheus的node上创建目录与赋权
mkdir /data/prometheus
chown -R 65534:65534 /data/prometheus

cat prometheus-federate-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-lpv-0
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: prometheus-lpv
  local:
    path: /data/prometheus
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - sealos-k8s-node1
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-lpv-1
spec:
  capacity:
    storage: 20Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: prometheus-lpv
  local:
    path: /data/prometheus
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - sealos-k8s-node2
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-lpv-2
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: prometheus-lpv
  local:
    path: /data/prometheus
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - sealos-k8s-node3

创建Prometheus的RBAC文件。

cat prometheus-rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1 # api的version
kind: ClusterRole # 类型
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources: # 资源
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus # 自定义名字
  namespace: kube-system # 命名空间
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef: # 选择需要绑定的Role
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects: # 对象
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system

创建Prometheus的configmap配置文件。

cat prometheus-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
data:
  prometheus.yml: |
    global:
      scrape_interval:     30s
      evaluation_interval: 30s
      external_labels:
        cluster: "01"
    scrape_configs:
    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    - job_name: 'kubernetes-nodes'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics
    - job_name: 'kubernetes-cadvisor'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
      metric_relabel_configs:
      - action: replace
        source_labels: [id]
        regex: '^/machine\.slice/machine-rkt\\x2d([^\\]+)\\.+/([^/]+)\.service$'
        target_label: rkt_container_name
        replacement: '${2}-${1}'
      - action: replace
        source_labels: [id]
        regex: '^/system\.slice/(.+)\.service$'
        target_label: systemd_service_name
        replacement: '${1}'
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
      - source_labels: [__address__]
        action: replace
        target_label: instance
        regex: (.+):(.+)
        replacement: $1

创建Prometheus的Statefulset配置文件。

cat prometheus-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    k8s-app: prometheus
    kubernetes.io/cluster-service: "true"
spec:
  serviceName: "prometheus"
  podManagementPolicy: "Parallel"
  replicas: 3
  selector:
    matchLabels:
      k8s-app: prometheus
  template:
    metadata:
      labels:
        k8s-app: prometheus
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: k8s-app
                operator: In
                values:
                - prometheus
            topologyKey: "kubernetes.io/hostname"
      priorityClassName: system-cluster-critical
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: prometheus-server-configmap-reload
        image: "jimmidyson/configmap-reload:v0.4.0"
        imagePullPolicy: "IfNotPresent"
        args:
          - --volume-dir=/etc/config
          - --webhook-url=http://localhost:9090/-/reload
        volumeMounts:
          - name: config-volume
            mountPath: /etc/config
            readOnly: true
        resources:
          limits:
            cpu: 10m
            memory: 10Mi
          requests:
            cpu: 10m
            memory: 10Mi
      - image: prom/prometheus:v2.20.0
        imagePullPolicy: IfNotPresent
        name: prometheus
        command:
          - "/bin/prometheus"
        args:
          - "--config.file=/etc/prometheus/prometheus.yml"
          - "--storage.tsdb.path=/prometheus"
          - "--storage.tsdb.retention=24h"
          - "--web.console.libraries=/etc/prometheus/console_libraries"
          - "--web.console.templates=/etc/prometheus/consoles"
          - "--web.enable-lifecycle"
        ports:
          - containerPort: 9090
            protocol: TCP
        volumeMounts:
          - mountPath: "/prometheus"
            name: prometheus-data
          - mountPath: "/etc/prometheus"
            name: config-volume
        readinessProbe:
          httpGet:
            path: /-/ready
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 1000m
            memory: 2500Mi
        securityContext:
            runAsUser: 65534
            privileged: true
      serviceAccountName: prometheus
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-config
  volumeClaimTemplates:
    - metadata:
        name: prometheus-data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "prometheus-lpv"
        resources:
          requests:
            storage: 5Gi

创建Prometheus的svc配置文件

cat prometheus-service-statefulset.yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: kube-system
spec:
  ports:
    - name: prometheus
      port: 9090
      targetPort: 9090
  selector:
    k8s-app: prometheus
  clusterIP: None

部署创建好的Prometheus的相关资源文件

cd /data/manual-deploy/prometheus
ls
prometheus-configmap.yaml # Configmap
prometheus-data-pv.yaml # PVC
prometheus-data-storageclass.yaml # SC
prometheus-rbac.yaml # RBAC
prometheus-service-statefulset.yaml # SVC
prometheus-statefulset.yaml # Statefulset
# 部署应用
kubectl apply -f .

验证已经部署的Prometheus的pv与pvc的绑定关系以及部署状态

kubectl get pv
NAME               CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS     REASON   AGE
prometheus-lpv-0   10Gi       RWO            Retain           Available           prometheus-lpv            6m28s
prometheus-lpv-1   10Gi       RWO            Retain           Available           prometheus-lpv            6m28s
prometheus-lpv-2   10Gi       RWO            Retain           Available           prometheus-lpv            6m28s
kubectl -n kube-system get pvc
NAME                           STATUS   VOLUME             CAPACITY   ACCESS MODES   STORAGECLASS     AGE
prometheus-data-prometheus-0   Bound    prometheus-lpv-0   10Gi       RWO            prometheus-lpv   2m16s
prometheus-data-prometheus-1   Bound    prometheus-lpv-2   10Gi       RWO            prometheus-lpv   2m16s
prometheus-data-prometheus-2   Bound    prometheus-lpv-1   10Gi       RWO            prometheus-lpv   2m16s

kubectl -n kube-system get pod prometheus-{0..2}
NAME           READY   STATUS    RESTARTS   AGE
prometheus-0   2/2     Running   0          3m16s
prometheus-1   2/2     Running   0          3m16s
prometheus-2   2/2     Running   0          3m16s

部署 Node Exporter

创建Demonset的node-exporter文件

cd /data/manual-deploy/node-exporter/
cat node-exporter.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
        k8s-app: node-exporter
  template:
    metadata:
      labels:
        k8s-app: node-exporter
    spec:
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
      containers:
      - image: quay.io/prometheus/node-exporter:v1.0.0
        imagePullPolicy: IfNotPresent
        name: prometheus-node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          protocol: TCP
          name: metrics
        volumeMounts:
        - mountPath: /host/proc
          name: proc
        - mountPath: /host/sys
          name: sys
        - mountPath: /host
          name: rootfs
        args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --path.rootfs=/host
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
        - name: rootfs
          hostPath:
            path: /
      hostNetwork: true
      hostPID: true
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"
  labels:
    k8s-app: node-exporter
  name: node-exporter
  namespace: kube-system
spec:
  ports:
  - name: http
    port: 9100
    protocol: TCP
  selector:
    k8s-app: node-exporter

部署

cd /data/manual-deploy/node-exporter/
kubectl apply -f node-exporter.yaml

验证状态

kubectl -n kube-system get pod |grep node-exporter
node-exporter-45s2q                    2/2     Running   0          6h43m
node-exporter-f4rrw                    2/2     Running   0          6h43m
node-exporter-hvtzj                    2/2     Running   0          6h43m
node-exporter-nlvfq                    2/2     Running   0          6h43m
node-exporter-qbd2q                    2/2     Running   0          6h43m
node-exporter-zjrh4                    2/2     Running   0          6h43m

部署 kube-state-metrics

kubelet已经集成了cAdvisor已知可以收集系统级别的CPU、Memory、Network、Disk、Container等指标信息,但是却不能采集到Kubernetes的资源对象的指标信息,如:Pod的数量以及状态等等。因此我们需要kube-state-metrics,来帮助我们完成这些采集操作。

kube-state-metrics是通过轮询的方式对Kubernetes API进行操作,然后返回有关资源对象指标的Metrics信息:CronJob、DaemonSet、Deployment、Job、LimitRange、Node、PersistentVolume 、PersistentVolumeClaim、 Pod、Pod Disruption Budget、ReplicaSet、ReplicationController、ResourceQuota、Service、StatefulSet、Namespace、Horizontal Pod Autoscaler、Endpoint、Secret、ConfigMap、Ingress、CertificateSigningRequest

cd /data/manual-deploy/kube-state-metrics/
cat kube-state-metrics-rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: kube-system
  name: kube-state-metrics-resizer
rules:
- apiGroups: [""]
  resources:
  - pods
  verbs: ["get"]
- apiGroups: ["apps"]
  resources:
  - deployments
  resourceNames: ["kube-state-metrics"]
  verbs: ["get", "update"]
- apiGroups: ["extensions"]
  resources:
  - deployments
  resourceNames: ["kube-state-metrics"]
  verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kube-state-metrics
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kube-state-metrics-resizer
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs: ["list", "watch"]
- apiGroups: ["extensions"]
  resources:
  - daemonsets
  - deployments
  - replicasets
  - ingresses
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources:
  - daemonsets
  - deployments
  - replicasets
  - statefulsets
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources:
  - cronjobs
  - jobs
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources:
  - horizontalpodautoscalers
  verbs: ["list", "watch"]
- apiGroups: ["policy"]
  resources:
  - poddisruptionbudgets
  verbs: ["list", "watch"]
- apiGroups: ["certificates.k8s.io"]
  resources:
  - certificatesigningrequests
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: kube-system

创建kube-state-metrics的deployment文件

cat kube-state-metrics-deloyment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: kube-state-metrics
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: quay.io/coreos/kube-state-metrics:v1.6.0
        ports:
        - name: http-metrics
          containerPort: 8080
        - name: telemetry
          containerPort: 8081
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
      - name: addon-resizer
        image: k8s.gcr.io/addon-resizer:1.8.4
        resources:
          limits:
            cpu: 150m
            memory: 50Mi
          requests:
            cpu: 150m
            memory: 50Mi
        env:
          - name: MY_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: MY_POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        command:
          - /pod_nanny
          - --container=kube-state-metrics
          - --cpu=100m
          - --extra-cpu=1m
          - --memory=100Mi
          - --extra-memory=2Mi
          - --threshold=5
          - --deployment=kube-state-metrics
---
apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    k8s-app: kube-state-metrics
  annotations:
    prometheus.io/scrape: 'true'
spec:
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
    protocol: TCP
  - name: telemetry
    port: 8081
    targetPort: telemetry
    protocol: TCP
  selector:
    k8s-app: kube-state-metrics

部署

kubectl apply -f kube-state-metrics-rbac.yaml
kubectl apply -f kube-state-metrics-deloyment.yaml

验证

kubectl -n kube-system get pod |grep kube-state-metrics
kube-state-metrics-657d8d6669-bqbs8        2/2     Running   0          4h

kube-state-metrics的service中指定了annotation: prometheus.io/scrape: "true", job: kubernetes-service-endpoints可以自动发现

kube-state-metrics在svc填写配置的时候指定annotation: prometheus.io/scrape: "true", job: kubernetes-service-endpoints可以实现自动发现。

部署 Alertmanager 集群

创建目录、赋权

k8s-m2
mkdir /data/alertmanager
chown -R 65534:65534 /data/alertmanager
k8s-m3
mkdir /data/alertmanager
chown -R 65534:65534 /data/alertmanager
cd /data/manual-deploy/alertmanager/
cat alertmanager-data-storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: alertmanager-lpv
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer

创建Alertmanager的pv配置文件

cat alertmanager-data-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: alertmanager-pv-0
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: alertmanager-lpv
  local:
    path: /data/alertmanager
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - sealos-k8s-m2
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: alertmanager-pv-1
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: alertmanager-lpv
  local:
    path: /data/alertmanager
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - sealos-k8s-m3

创建Alertmanager的configmap配置文件

cat alertmanager-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  alertmanager.yml: |
    global:
      resolve_timeout: 5m
      smtp_smarthost: 'smtp.qq.com:465'
      smtp_from: 'yo@qq.com'
      smtp_auth_username: '345@qq.com'
      smtp_auth_password: 'bhgb'
      smtp_hello: '警报邮件'
      smtp_require_tls: false
    route:
      group_by: ['alertname', 'cluster']
      group_wait: 30s
      group_interval: 30s
      repeat_interval: 12h
      receiver: default

      routes:
      - receiver: email
        group_wait: 10s
        match:
          team: ops
    receivers:
    - name: 'default'
      email_configs:
      - to: '9935226@qq.com'
        send_resolved: true
    - name: 'email'
      email_configs:
      - to: '9935226@qq.com'
        send_resolved: true

创建Alertmanager的statefulset文件,我这里部署的是集群模式,如果需要单体库模式,将replicas改为1,去掉集群参数即可。

cat alertmanager-statefulset-cluster.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: alertmanager
  namespace: kube-system
  labels:
    k8s-app: alertmanager
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v0.21.0
spec:
  serviceName: "alertmanager-operated"
  replicas: 2
  selector:
    matchLabels:
      k8s-app: alertmanager
      version: v0.21.0
  template:
    metadata:
      labels:
        k8s-app: alertmanager
        version: v0.21.0
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      tolerations:
        - key: "CriticalAddonsOnly"
          operator: "Exists"
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: k8s-app
                operator: In
                values:
                - alertmanager
            topologyKey: "kubernetes.io/hostname"
      containers:
        - name: prometheus-alertmanager
          image: "prom/alertmanager:v0.21.0"
          imagePullPolicy: "IfNotPresent"
          args:
            - "--config.file=/etc/config/alertmanager.yml"
            - "--storage.path=/data"
            - "--cluster.listen-address=${POD_IP}:9094"
            - "--web.listen-address=:9093"
            - "--cluster.peer=alertmanager-0.alertmanager-operated:9094"
            - "--cluster.peer=alertmanager-1.alertmanager-operated:9094"
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          ports:
            - containerPort: 9093
              name: web
              protocol: TCP
            - containerPort: 9094
              name: mesh-tcp
              protocol: TCP
            - containerPort: 9094
              name: mesh-udp
              protocol: UDP
          readinessProbe:
            httpGet:
              path: /#/status
              port: 9093
            initialDelaySeconds: 30
            timeoutSeconds: 60
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
            - name: storage-volume
              mountPath: "/data"
              subPath: ""
          resources:
            limits:
              cpu: 1000m
              memory: 500Mi
            requests:
              cpu: 10m
              memory: 50Mi
        - name: prometheus-alertmanager-configmap-reload
          image: "jimmidyson/configmap-reload:v0.4.0"
          imagePullPolicy: "IfNotPresent"
          args:
            - --volume-dir=/etc/config
            - --webhook-url=http://localhost:9093/-/reload
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
              readOnly: true
          resources:
            limits:
              cpu: 10m
              memory: 10Mi
            requests:
              cpu: 10m
              memory: 10Mi
          securityContext:
              runAsUser: 0
              privileged: true
      volumes:
        - name: config-volume
          configMap:
            name: alertmanager-config
  volumeClaimTemplates:
    - metadata:
        name: storage-volume
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "alertmanager-lpv"
        resources:
          requests:
            storage: 5Gi

创建Alertmanager的operated-service配置文件

cat alertmanager-operated-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-operated
  namespace: kube-system
  labels:
    app.kubernetes.io/name: alertmanager-operated
    app.kubernetes.io/component: alertmanager
spec:
  type: ClusterIP
  clusterIP: None
  sessionAffinity: None
  selector:
    k8s-app: alertmanager
  ports:
    - name: web
      port: 9093
      protocol: TCP
      targetPort: web
    - name: tcp-mesh
      port: 9094
      protocol: TCP
      targetPort: tcp-mesh
    - name: udp-mesh
      port: 9094
      protocol: UDP
      targetPort: udp-mesh

部署

cd /data/manual-deploy/alertmanager/
ls
alertmanager-configmap.yaml
alertmanager-data-pv.yaml
alertmanager-data-storageclass.yaml
alertmanager-operated-service.yaml
alertmanager-service-statefulset.yaml
alertmanager-statefulset-cluster.yaml
kubectl apply -f .

OK ,到此我们已经手动在k8s中的kube-system中以statefulset方式部署了Prometheus与Alertmanager,下一篇我们部署grafana与ingress-nginx的相关部署。

09-09 01:48