This article describes how to mount cloud disks (block storage volumes) in a multi-zone Kubernetes cluster; the core mechanism is zone-aware volume scheduling.

For the volumeBindingMode configuration used below, see the Kubernetes documentation on volume binding modes.

Multi-zone clusters

Deploying a cluster across multiple availability zones is a widely accepted way to ensure high availability: if one zone fails, the cluster can continue to serve traffic from the other zones. Alibaba Cloud ACK provides a multi-zone cluster deployment option; see its documentation for details.

Container storage volumes provide data persistence: external storage is mounted into the container to give the application storage space, and after a container is destroyed the data in the volume remains available to other applications. The common volume types are block storage, file storage, and object storage; their zone sensitivity differs as follows:

Only cloud disk (block storage) volumes have a hard zone requirement: file storage and object storage can generally be mounted from any zone in a region, but a cloud disk can only be attached within its own zone. The zone of the PV (cloud disk) and the zone the Pod is scheduled to must therefore match for the mount to succeed. Kubernetes implements volume-aware scheduling that places a Pod in a suitable zone based on the PVs it uses.

The following sections use Alibaba Cloud disks to show how to implement zone-aware scheduling for block storage volumes in a multi-zone environment.

Using cloud disks across multiple zones

Mounting cloud disks with a StatefulSet in a multi-zone cluster

Workloads whose Pods mount cloud disk volumes need stable, per-replica storage that survives Pod restarts. In practice this means orchestrating the application with a StatefulSet and declaring the disks through volumeClaimTemplates; the same applies in the multi-zone case.

volumeClaimTemplates creates one PVC per Pod, named according to the following rule:

PVC name = {volumeClaimTemplates name} + "-" + {StatefulSet name} + "-" + {ordinal}
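The naming rule can be sketched as a small helper (illustrative only; the names web and disk-ssd come from the example below):

```python
def pvc_name(volume_claim_template: str, statefulset: str, ordinal: int) -> str:
    """Build the PVC name a StatefulSet derives for one replica:
    {volumeClaimTemplates name}-{StatefulSet name}-{ordinal}."""
    return f"{volume_claim_template}-{statefulset}-{ordinal}"

# One PVC per replica of a 3-replica StatefulSet named "web",
# with a volumeClaimTemplates entry named "disk-ssd":
names = [pvc_name("disk-ssd", "web", i) for i in range(3)]
print(names)  # ['disk-ssd-web-0', 'disk-ssd-web-1', 'disk-ssd-web-2']
```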

For example, given the following StatefulSet:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  replicas: 3
***
  volumeClaimTemplates:
  - metadata:
      name: disk-ssd
    spec:
***

the PVCs created for the three Pods are named disk-ssd-web-0, disk-ssd-web-1, and disk-ssd-web-2.

Approaches to mounting cloud disks in a multi-zone cluster

Without a cloud disk volume, a Pod's startup flow is simple: the scheduler picks a node for the Pod, and the kubelet on that node starts its containers. With a cloud disk volume, the zone placement must also be decided.

Two scheduling approaches are possible when a cloud disk volume is mounted:

Approach 1: the PV (cloud disk) and its zone are determined first, and the disk's zone becomes an input to Pod scheduling; the Pod follows the disk.

Approach 2: the Pod's target node is determined first, and the PV (cloud disk) is then created dynamically in that node's zone and bound; the disk follows the Pod.

Pod follows the disk:

The key point of this approach is that the PV is created, with its zone information set, before the Pod is scheduled. It can be implemented with either static or dynamic volumes.

Before deployment, plan the zones where the application should run, and create the cloud disks and PV objects manually (static volumes) or automatically (dynamic volumes). If an application should run in multiple zones, create disks and PVs in each of those zones. Each PV object must carry the corresponding zone scheduling information.

The scheduling information can be added to the PV as labels (evaluated by the scheduler's volume zone predicate):

  labels:
    failure-domain.beta.kubernetes.io/zone: cn-hangzhou-b
    failure-domain.beta.kubernetes.io/region: cn-hangzhou

The scheduling information can also be expressed through the PV's nodeAffinity (evaluated by the scheduler's volume binding predicate):

  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.diskplugin.csi.alibabacloud.com/zone
          operator: In
          values:
          - cn-shenzhen-a

The examples below use a cluster whose nodes span three availability zones:

# kubectl describe node | grep failure-domain.beta.kubernetes.io/zone
                    failure-domain.beta.kubernetes.io/zone=cn-beijing-a
                    failure-domain.beta.kubernetes.io/zone=cn-beijing-a
                    failure-domain.beta.kubernetes.io/zone=cn-beijing-b
                    failure-domain.beta.kubernetes.io/zone=cn-beijing-b
                    failure-domain.beta.kubernetes.io/zone=cn-beijing-c
                    failure-domain.beta.kubernetes.io/zone=cn-beijing-c

Static volumes

With static volumes, the administrator creates the PV objects and the cloud disk instances manually, and configures the zone information on each volume. Create the disks and PVs before using them.

In this example a StatefulSet runs three Pods, each mounting one cloud disk volume, in zones cn-beijing-a, cn-beijing-b, and cn-beijing-c respectively. The StatefulSet is named web and its volumeClaimTemplates entry is named disk-ssd.

Create the PVCs and PVs following the template below (one PVC/PV pair per Pod, each pinned to its zone):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: disk-ssd-web-0
spec:
  storageClassName: ""
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 25Gi
  selector:
    matchLabels:
      alicloud-pvname: pv-disk-ssd-web-0
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-disk-ssd-web-0
  labels:
    alicloud-pvname: pv-disk-ssd-web-0
spec:
  capacity:
    storage: 25Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: diskplugin.csi.alibabacloud.com
    volumeHandle: d-2zeeujx1zexxkbc8ny4b
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.diskplugin.csi.alibabacloud.com/zone
          operator: In
          values:
          - cn-beijing-a

The kubectl output below shows the resulting PVC/PV bindings; each PV corresponds to one cloud disk in one zone:

# kubectl get pvc | grep disk-ssd-web
disk-ssd-web-0   Bound    pv-disk-ssd-web-0                           25Gi       RWO                                59s
disk-ssd-web-1   Bound    pv-disk-ssd-web-1                           25Gi       RWO                                56s
disk-ssd-web-2   Bound    pv-disk-ssd-web-2                           25Gi       RWO                                54s

# kubectl get pv | grep disk-ssd-web
pv-disk-ssd-web-0                           25Gi       RWO            Retain           Bound    default/disk-ssd-web-0                                2m43s
pv-disk-ssd-web-1                           25Gi       RWO            Retain           Bound    default/disk-ssd-web-1                                2m40s
pv-disk-ssd-web-2                           25Gi       RWO            Retain           Bound    default/disk-ssd-web-2                                2m38s

Deploy the StatefulSet template below:

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx
  serviceName: "nginx"
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: disk-ssd
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: disk-ssd
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: ""
      resources:
        requests:
          storage: 25Gi

Run the commands below to check the Pods; the three Pods started by this template run in zones a, b, and c respectively:

# kubectl get pod
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          14m
web-1   1/1     Running   0          13m
web-2   1/1     Running   0          13m

# kubectl describe pod | grep Node
Node:               cn-beijing.172.16.1.101/172.16.1.101
Node:               cn-beijing.172.16.2.87/172.16.2.87
Node:               cn-beijing.172.16.3.197/172.16.3.197

# kubectl describe node cn-beijing.172.16.1.101 | grep zone
                    failure-domain.beta.kubernetes.io/zone=cn-beijing-a
# kubectl describe node cn-beijing.172.16.2.87 | grep zone
                    failure-domain.beta.kubernetes.io/zone=cn-beijing-b
# kubectl describe node cn-beijing.172.16.3.197 | grep zone
                    failure-domain.beta.kubernetes.io/zone=cn-beijing-c

Delete the three Pods and verify that, after they restart, each Pod still lands in its original zone:

# kubectl delete pod --all
pod "web-0" deleted
pod "web-1" deleted
pod "web-2" deleted

# kubectl get pod
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          61s
web-1   1/1     Running   0          41s
web-2   1/1     Running   0          21s

# kubectl describe pod | grep Node
Node:               cn-beijing.172.16.1.101/172.16.1.101
Node:               cn-beijing.172.16.2.87/172.16.2.87
Node:               cn-beijing.172.16.3.197/172.16.3.197

# kubectl describe node cn-beijing.172.16.1.101 | grep zone
                    failure-domain.beta.kubernetes.io/zone=cn-beijing-a
# kubectl describe node cn-beijing.172.16.2.87 | grep zone
                    failure-domain.beta.kubernetes.io/zone=cn-beijing-b
# kubectl describe node cn-beijing.172.16.3.197 | grep zone
                    failure-domain.beta.kubernetes.io/zone=cn-beijing-c

Dynamic volumes

Create a StorageClass that supports multiple zones:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-disk-multizone
provisioner: diskplugin.csi.alibabacloud.com
parameters:
  type: cloud_ssd
  zoneId: cn-beijing-a,cn-beijing-b,cn-beijing-c
reclaimPolicy: Delete

zoneId: when several zones are configured, the provisioner creates successive disks in these zones in round-robin order.
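The round-robin behavior can be illustrated with a minimal sketch (a model of the documented behavior, not the provisioner's actual code):

```python
from itertools import cycle

def zone_picker(zone_id: str):
    """Cycle through the comma-separated zones of the StorageClass's
    zoneId parameter as successive disks are provisioned."""
    return cycle(z.strip() for z in zone_id.split(","))

picker = zone_picker("cn-beijing-a,cn-beijing-b,cn-beijing-c")
first_four = [next(picker) for _ in range(4)]
print(first_four)
# ['cn-beijing-a', 'cn-beijing-b', 'cn-beijing-c', 'cn-beijing-a']
```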

Create the application with the StatefulSet configuration below:

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx
  serviceName: "nginx"
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: disk-ssd
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: disk-ssd
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: csi-disk-multizone
      resources:
        requests:
          storage: 20Gi

Inspect the resulting Pod, PVC, and PV objects:

# kubectl get pod
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          2m2s
web-1   1/1     Running   0          84s
web-2   1/1     Running   0          52s

# kubectl get pvc
NAME             STATUS   VOLUME                                      CAPACITY   ACCESS MODES   STORAGECLASS         AGE
disk-ssd-web-0   Bound    disk-9e6a6f65-f3fc-11e9-a7a7-00163e165b60   20Gi       RWO            csi-disk-multizone   2m6s
disk-ssd-web-1   Bound    disk-b5071f37-f3fc-11e9-a7a7-00163e165b60   20Gi       RWO            csi-disk-multizone   88s
disk-ssd-web-2   Bound    disk-c81b6163-f3fc-11e9-a7a7-00163e165b60   20Gi       RWO            csi-disk-multizone   56s

# kubectl get pv
NAME                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                    STORAGECLASS         REASON   AGE
disk-9e6a6f65-f3fc-11e9-a7a7-00163e165b60   20Gi       RWO            Delete           Bound    default/disk-ssd-web-0   csi-disk-multizone            116s
disk-b5071f37-f3fc-11e9-a7a7-00163e165b60   20Gi       RWO            Delete           Bound    default/disk-ssd-web-1   csi-disk-multizone            85s
disk-c81b6163-f3fc-11e9-a7a7-00163e165b60   20Gi       RWO            Delete           Bound    default/disk-ssd-web-2   csi-disk-multizone            39s

Check which zones the Pods and PVs landed in; they are spread across the three zones:

# kubectl describe pod web-0 | grep Node
Node:               cn-beijing.172.16.1.101/172.16.1.101
# kubectl describe node cn-beijing.172.16.1.101 | grep zone
                    failure-domain.beta.kubernetes.io/zone=cn-beijing-a

# kubectl describe pod web-1 | grep Node
Node:               cn-beijing.172.16.2.87/172.16.2.87
# kubectl describe node cn-beijing.172.16.2.87 | grep zone
                    failure-domain.beta.kubernetes.io/zone=cn-beijing-b

# kubectl describe pod web-2 | grep Node
Node:               cn-beijing.172.16.3.197/172.16.3.197
# kubectl describe node cn-beijing.172.16.3.197 | grep zone
                    failure-domain.beta.kubernetes.io/zone=cn-beijing-c

# kubectl describe pv disk-9e6a6f65-f3fc-11e9-a7a7-00163e165b60 | grep zone
    Term 0:        topology.diskplugin.csi.alibabacloud.com/zone in [cn-beijing-a]

# kubectl describe pv disk-b5071f37-f3fc-11e9-a7a7-00163e165b60 | grep zone
    Term 0:        topology.diskplugin.csi.alibabacloud.com/zone in [cn-beijing-b]

# kubectl describe pv disk-c81b6163-f3fc-11e9-a7a7-00163e165b60 | grep zone
    Term 0:        topology.diskplugin.csi.alibabacloud.com/zone in [cn-beijing-c]

Disk follows the Pod:

In the "disk follows the Pod" approach, the cloud disk and PV are created dynamically only after the Pod has been scheduled and the zone of its node is known. This approach therefore only applies to dynamic volumes.

Create a StorageClass with volumeBindingMode set to WaitForFirstConsumer:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-disk-topology
provisioner: diskplugin.csi.alibabacloud.com
parameters:
  type: cloud_ssd
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

WaitForFirstConsumer: for PVCs using this StorageClass, the Pod is scheduled first when it starts, and only then are the PV and the underlying cloud disk provisioned, in the zone of the node the Pod was scheduled to.
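The difference in ordering between the two binding modes can be sketched as follows (a simplified model of the control flow, not the actual scheduler or provisioner code):

```python
def provision_flow(binding_mode: str, pod_zone: str, zones: list[str]) -> list[str]:
    """Return the ordered steps for one Pod + PVC under each
    volumeBindingMode value."""
    if binding_mode == "Immediate":
        # The PV/disk is provisioned as soon as the PVC is created; the
        # Pod must then be scheduled into the disk's zone ("Pod follows disk").
        disk_zone = zones[0]  # the provisioner picks a zone, e.g. round-robin
        return [f"provision disk in {disk_zone}", f"schedule pod to {disk_zone}"]
    if binding_mode == "WaitForFirstConsumer":
        # The Pod is scheduled first; the disk is provisioned in the zone
        # of the chosen node ("disk follows Pod").
        return [f"schedule pod to {pod_zone}", f"provision disk in {pod_zone}"]
    raise ValueError(binding_mode)

print(provision_flow("WaitForFirstConsumer", "cn-beijing-b", ["cn-beijing-a"]))
# ['schedule pod to cn-beijing-b', 'provision disk in cn-beijing-b']
```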

Create the StatefulSet application below:

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx
  serviceName: "nginx"
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: disk-ssd
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: disk-ssd
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: csi-disk-topology
      resources:
        requests:
          storage: 20Gi

Get the Pod and PV information:

# kubectl get pod
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          2m5s
web-1   1/1     Running   0          100s
web-2   1/1     Running   0          74s

# kubectl describe pod web-0 | grep Node
Node:               cn-beijing.172.16.3.197/172.16.3.197
# kubectl describe node cn-beijing.172.16.3.197 | grep zone
                    failure-domain.beta.kubernetes.io/zone=cn-beijing-c

# kubectl describe pod web-1 | grep Node
Node:               cn-beijing.172.16.1.101/172.16.1.101
# kubectl describe node cn-beijing.172.16.1.101 | grep zone
                    failure-domain.beta.kubernetes.io/zone=cn-beijing-a

# kubectl describe pod web-2 | grep Node
Node:               cn-beijing.172.16.2.87/172.16.2.87
# kubectl describe node cn-beijing.172.16.2.87 | grep zone
                    failure-domain.beta.kubernetes.io/zone=cn-beijing-b

# kubectl describe pv disk-d4b08afa-f3fe-11e9-a7a7-00163e165b60 | grep zone
    Term 0:        topology.diskplugin.csi.alibabacloud.com/zone in [cn-beijing-c]
# kubectl describe pv disk-e32d5fcf-f3fe-11e9-a7a7-00163e165b60 | grep zone
    Term 0:        topology.diskplugin.csi.alibabacloud.com/zone in [cn-beijing-a]
# kubectl describe pv disk-f2cec31a-f3fe-11e9-a7a7-00163e165b60 | grep zone
    Term 0:        topology.diskplugin.csi.alibabacloud.com/zone in [cn-beijing-b]

One clarification about the "disk follows the Pod" approach:

The Pod is scheduled before the PV and cloud disk are created only on the Pod's first start, when this policy is configured. When the Pod restarts later, scheduling is driven by the existing PV's zone information, i.e. it falls back to "Pod follows the disk".

Summary:

This article presented two approaches for using cloud disk volumes in a multi-zone cluster: "Pod follows the disk" (static or dynamic volumes whose zone is fixed before scheduling) and "disk follows the Pod" (WaitForFirstConsumer dynamic volumes).

For workloads that mount cloud disk volumes, the "disk follows the Pod" approach is recommended.

This article is original content from the Alibaba Cloud Yunqi community and may not be reproduced without permission.

10-23 19:47