问题背景

kubelet、docker和containerd 的工作目录默认都在 /var/lib 下。
但是我们学校实验室租的线上机器挂载在 / 的磁盘空间很小,挂载在 /mnt/data_mnt/ 的数据盘空间大。
应该是因为工作目录的原因,当 /占用超过 80% 时, kubelet 会认为磁盘空间不足,因为 DiskPressure 而进入 NotReady 状态。

(以下是迁移后)

root@iZhp3hqett0mw795req5b2Z:~# df -h | head
Filesystem      Size  Used Avail Use% Mounted on
udev             16G     0   16G   0% /dev
tmpfs            16G   19M   16G   1% /run
/dev/vda1        99G   48G   46G  51% /
tmpfs            16G     0   16G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/vdb1       493G  120G  348G  26% /mnt/data_mnt
overlay          99G   48G   46G  51% /var/lib/containers/storage/overlay/54a47bbff1442f521326770cab94eb3221d82b0ff9e997c1b2efe6cad811b21b/merged
overlay          99G   48G   46G  51% /var/lib/containers/storage/overlay/a74d553e701c85c5ad25fd14a8fd30383e0dc21f4b567bc81e6b7ac74bc73524/merged

迁移

Docker

停止 Docker 服务

删除所有容器后。

systemctl stop docker

修改配置

Docker配置文件在 /etc/docker/daemon.json,增加字段设置数据目录。

参考官网文档 https://docs.docker.com/config/daemon/#daemon-data-directory

修改后示例:

{
 "registry-mirrors": [
     "https://dockerhub.azk8s.cn",
     "https://hub-mirror.c.163.com",
     "https://reg-mirror.qiniu.com"
 ],

   "builder": {
       "gc": {
         "defaultKeepStorage": "20GB",
         "enabled": true
       }
   },
   "experimental": true,
   "features": {
     "buildkit": false
   },
   "dns": ["8.8.8.8", "8.8.4.4"],
   "data-root": "/mnt/data_mnt/var/lib/docker"
}

移动文件

/var/lib/docker 复制到 /mnt/data_mnt/var/lib/docker

重新启动 Docker 服务

systemctl start docker

# 跑一个 nginx 看看
docker run -p 80:80 nginx

# 查看服务状态
systemctl status docker


● docker.service - Docker Application Container Engine
   Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2023-10-17 22:28:47 CST; 12h ago
     Docs: https://docs.docker.com
 Main PID: 3917580 (dockerd)
    Tasks: 25
   Memory: 1.0G
      CPU: 1min 16.247s
   CGroup: /system.slice/docker.service
           ├─ 370428 /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 5050 -container-ip 172.17.0.2 -container-port 5000
           └─3917580 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

Oct 18 10:11:41 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:11:41.715286425+08:00" level=error msg="Handler for POST /v1.41/containers/f66c7e907176ccd2abe010253448ab6dcab286c60f893b4cde72184215747d90/start returned error: driver 
Oct 18 10:17:18 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:17:18.451142888+08:00" level=info msg="Attempting next endpoint for push after error: Get \"https://localhost:5000/v2/\": http: server gave HTTP response to HTTPS client
Oct 18 10:17:18 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:17:18.455921606+08:00" level=error msg="Upload failed: no basic auth credentials"
Oct 18 10:17:18 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:17:18.455953643+08:00" level=error msg="Upload failed: no basic auth credentials"
Oct 18 10:17:18 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:17:18.455930600+08:00" level=error msg="Upload failed: no basic auth credentials"
Oct 18 10:17:18 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:17:18.456010183+08:00" level=error msg="Upload failed: no basic auth credentials"
Oct 18 10:17:18 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:17:18.456058582+08:00" level=info msg="Attempting next endpoint for push after error: no basic auth credentials"
Oct 18 10:18:56 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:18:56.354196507+08:00" level=info msg="Attempting next endpoint for push after error: Get \"https://localhost:5050/v2/\": http: server gave HTTP response to HTTPS client
Oct 18 10:19:02 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:19:02.439702060+08:00" level=info msg="Attempting next endpoint for push after error: Get \"https://localhost:5050/v2/\": http: server gave HTTP response to HTTPS client
Oct 18 10:19:07 iZhp3hqett0mw795req5b2Z dockerd[3917580]: time="2023-10-18T10:19:07.267669420+08:00" level=info msg="Attempting next endpoint for push after error: Get \"https://localhost:5050/v2/\": http: server gave HTTP response to HTTPS client

containerd

停止服务

systemctl stop containerd

修改配置

配置文件在 /etc/containerd/config.toml

可以看到root = "/mnt/data_mnt/var/lib/containerd",可见工作目录默认在 /var/lib/containerd

万一不小心改乱了,可以重新生成默认配置:

containerd config default > /etc/containerd/config.toml

修改后例如:

version = 2
root = "/mnt/data_mnt/var/lib/containerd"
state = "/run/containerd"
oom_score = 0

[grpc]
  address = "/run/containerd/containerd.sock"
  uid = 0
  gid = 0
  max_recv_message_size = 16777216
  max_send_message_size = 16777216

[debug]
  address = "/run/containerd/containerd-debug.sock"
  uid = 0
  gid = 0
  level = "warn"

[timeouts]
  "io.containerd.timeout.shim.cleanup" = "5s"
  "io.containerd.timeout.shim.load" = "5s"
  "io.containerd.timeout.shim.shutdown" = "3s"
  "io.containerd.timeout.task.state" = "2s"

[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    sandbox_image = "sealos.hub:5000/pause:3.9"
    max_container_log_line_size = -1
    max_concurrent_downloads = 20
    disable_apparmor = false
    [plugins."io.containerd.grpc.v1.cri".containerd]
      snapshotter = "overlayfs"
      default_runtime_name = "runc"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          runtime_type = "io.containerd.runc.v2"
          runtime_engine = ""
          runtime_root = ""
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            SystemdCgroup = true
    [plugins."io.containerd.grpc.v1.cri".registry]
      config_path = "/etc/containerd/certs.d"
      [plugins."io.containerd.grpc.v1.cri".registry.configs]
          [plugins."io.containerd.grpc.v1.cri".registry.configs."sealos.hub:5000".auth]
            username = "admin"
            password = "passw0rd"

移动文件

/mnt/data_mnt/var/lib/containerd 复制到 /var/lib/containerd

重新启动服务

systemctl start containerd

systemctl status containerd

kubelet(遇到问题待解决)

停止服务

systemctl stop kubelet

修改配置

kubelet 服务的配置,我的配置在 /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

注意同一目录可能还有个文件 /etc/systemd/system/kubelet.service.d/override.conf 实际运行中会用 override.conf 覆盖 10-kubeadm.conf 的内容。

修改后内容示例:

# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/mnt/data_mnt/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/mnt/data_mnt/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
Environment="KUBELET_EXTRA_ARGS= \
               \
               \
              --runtime-request-timeout=15m --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --image-service-endpoint=unix:///var/run/image-cri-shim.sock"
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS

另外还要修改 /etc/kubernetes/kubelet.conf 中配置的密钥地址,修改后示例(部分)

# 以上省略
users:
- name: system:node:izhp3hqett0mw795req5b2z
  user:
    client-certificate: /mnt/data_mnt/var/lib/kubelet/pki/kubelet-client-current.pem
    client-key: /mnt/data_mnt/var/lib/kubelet/pki/kubelet-client-current.pem

另外还要建软连接,因为读取密钥时,是通过名为“当前”的软连接找实际特定版本的密钥,移动后就乱套了。

ln -s  kubelet-client-2023-10-07-11-14-02.pem kubelet-client-current.pem 

移动文件(遇到问题待解决)

有些文件删除不了……

root@iZhp3hqett0mw795req5b2Z:~# rm -rf /var/lib/kubelet
rm: cannot remove '/var/lib/kubelet/pods/30c0099f-dfcc-4e6f-893e-eacc6ed44021/volumes/kubernetes.io~projected/kube-api-access-6jt8n': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/30c0099f-dfcc-4e6f-893e-eacc6ed44021/volumes/kubernetes.io~empty-dir/tmp-volume': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/54e7cb22-fdab-4e33-afb3-c8ba88d153a2/volumes/kubernetes.io~projected/kube-api-access-j84xs': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/d1a3fba3-3ab8-4ef9-b61c-6479b26c79f7/volumes/kubernetes.io~projected/kube-api-access-lf5tx': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/5e38f3a0-7f59-4d2e-98f4-1ec915e6ba89/volumes/kubernetes.io~projected/kube-api-access-prz4v': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/0f02517c-01c3-4b58-9f85-be169a92a31d/volumes/kubernetes.io~projected/kube-api-access-r4kxp': Device or resource busy
rm: cannot remove '/var/lib/kubelet/pods/7098d438-0a9d-40df-aee1-ec4884ba262f/volumes/kubernetes.io~projected/kube-api-access-rqtwq': Device or resource busy

重新启动服务

systemctl start kubelet

systemctl status kubelet

使用的版本

日期:2023年10月18日

版本

root@iZhp3hqett0mw795req5b2Z:~# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-14T09:53:42Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-14T09:47:40Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}

root@iZhp3hqett0mw795req5b2Z:~# docker version
Client:
 Version:           20.10.21
 API version:       1.41
 Go version:        go1.18.1
 Git commit:        20.10.21-0ubuntu1~18.04.3
 Built:             Thu Apr 27 05:50:21 2023
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.21
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.18.1
  Git commit:       20.10.21-0ubuntu1~18.04.3
  Built:            Thu Apr 27 05:36:22 2023
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          1.6.12-0ubuntu1~18.04.1
  GitCommit:        
 runc:
  Version:          1.1.4-0ubuntu1~18.04.2
  GitCommit:        
 docker-init:
  Version:          0.19.0
  GitCommit:        

root@iZhp3hqett0mw795req5b2Z:~# containerd --version
containerd github.com/containerd/containerd 1.6.12-0ubuntu1~18.04.1 
10-19 06:58