Problem description
I tried to install Kubernetes with kubeadm on 3 virtual machines with Debian OS on my laptop, one as the master node and the other two as worker nodes. I did exactly as the tutorials on kubernetes.io suggest. I initialized the cluster with the command kubeadm init --pod-network-cidr=10.244.0.0/16 and joined the workers with the corresponding kubeadm join command. I installed Flannel as the network overlay with the command kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml.
The response of the command kubectl get nodes looks fine:
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8smaster Ready master 20h v1.18.3 192.168.1.100 <none> Debian GNU/Linux 10 (buster) 4.19.0-9-amd64 docker://19.3.9
k8snode1 Ready <none> 20h v1.18.3 192.168.1.101 <none> Debian GNU/Linux 10 (buster) 4.19.0-9-amd64 docker://19.3.9
k8snode2 Ready <none> 20h v1.18.3 192.168.1.102 <none> Debian GNU/Linux 10 (buster) 4.19.0-9-amd64 docker://19.3.9
The response of the command kubectl get pods --all-namespaces doesn't show any errors:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system coredns-66bff467f8-7hlnp 1/1 Running 9 20h 10.244.0.22 k8smaster <none> <none>
kube-system coredns-66bff467f8-wmvx4 1/1 Running 11 20h 10.244.0.23 k8smaster <none> <none>
kube-system etcd-k8smaster 1/1 Running 11 20h 192.168.1.100 k8smaster <none> <none>
kube-system kube-apiserver-k8smaster 1/1 Running 9 20h 192.168.1.100 k8smaster <none> <none>
kube-system kube-controller-manager-k8smaster 1/1 Running 11 20h 192.168.1.100 k8smaster <none> <none>
kube-system kube-flannel-ds-amd64-9c5rr 1/1 Running 17 20h 192.168.1.102 k8snode2 <none> <none>
kube-system kube-flannel-ds-amd64-klw2p 1/1 Running 21 20h 192.168.1.101 k8snode1 <none> <none>
kube-system kube-flannel-ds-amd64-x7vm7 1/1 Running 11 20h 192.168.1.100 k8smaster <none> <none>
kube-system kube-proxy-jdfzg 1/1 Running 11 19h 192.168.1.101 k8snode1 <none> <none>
kube-system kube-proxy-lcdvb 1/1 Running 6 19h 192.168.1.102 k8snode2 <none> <none>
kube-system kube-proxy-w6jmf 1/1 Running 11 20h 192.168.1.100 k8smaster <none> <none>
kube-system kube-scheduler-k8smaster 1/1 Running 10 20h 192.168.1.100 k8smaster <none> <none>
Then I tried to create a pod with the command kubectl apply -f podexample.yml, with the following content:
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
  - name: nginx
    image: nginx
The command kubectl get pods -o wide shows that the pod was created on worker node1 and is in the Running state.
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
example 1/1 Running 0 135m 10.244.1.14 k8snode1 <none> <none>
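To distinguish node-to-pod from pod-to-pod connectivity, the same page can also be fetched from a temporary client pod inside the cluster network. A sketch, assuming the pod IP 10.244.1.14 from the listing above; busybox is used because its built-in wget is enough for the check (this needs a running cluster, so it cannot be tried outside one):

```shell
# Start a throwaway busybox pod, fetch the nginx page from inside the
# pod network, and delete the pod again when the command exits.
kubectl run tmp --rm -i --image=busybox --restart=Never -- \
  wget -qO- -T 5 http://10.244.1.14
```

If this succeeds while curl from the master node fails, the problem is specific to node-to-pod traffic rather than the overlay as a whole.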
The thing is, when I try to connect to the pod with the command curl -I 10.244.1.14 on the master node, I get the following response:
curl: (7) Failed to connect to 10.244.1.14 port 80: Connection timed out
but the same command on worker node1 responds successfully with:
HTTP/1.1 200 OK
Server: nginx/1.17.10
Date: Sat, 23 May 2020 19:45:05 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 14 Apr 2020 14:19:26 GMT
Connection: keep-alive
ETag: "5e95c66e-264"
Accept-Ranges: bytes
I thought maybe that's because kube-proxy is somehow not running on the master node, but the command ps aux | grep kube-proxy shows that it is running:
root 16747 0.0 1.6 140412 33024 ? Ssl 13:18 0:04 /usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf --hostname-override=k8smaster
Then I checked the kernel routing table with the command ip route, and it shows that packets destined for 10.244.1.0/24 get routed to flannel:
default via 192.168.1.1 dev enp0s3 onlink
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink
169.254.0.0/16 dev enp0s3 scope link metric 1000
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.1.0/24 dev enp0s3 proto kernel scope link src 192.168.1.100
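For reference, the two overlay routes above can also be checked mechanically on each node. A small sketch that operates on a pasted copy of the ip route output (so it makes no changes to the system) and asserts each worker's pod subnet is reachable through the flannel.1 VXLAN device:

```shell
# Pasted from the `ip route` output above; on a live node this could
# instead be captured with: routes=$(ip route)
routes='10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink'

# Each worker's pod subnet must have a route through the VXLAN device,
# otherwise cross-node pod traffic never reaches flannel at all.
for subnet in 10.244.1.0/24 10.244.2.0/24; do
  if printf '%s\n' "$routes" | grep -q "^$subnet via .* dev flannel\.1"; then
    echo "$subnet: routed via flannel.1"
  else
    echo "$subnet: flannel route missing"
  fi
done
```

Here both subnets pass the check, which is consistent with the routing table itself looking fine.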
Everything looks fine to me, and I don't know what else I should check to see what the problem is. Am I missing something?
UPDATE1:
If I start an NGINX container on worker node1 and map its port 80 to port 80 of the worker node1 host, then I can connect to it from the master node with the command curl -I 192.168.1.101. Also, I didn't add any iptables rules, and there is no firewall daemon like UFW installed on the machines. So I don't think it's a firewall issue.
UPDATE2:
I recreated the cluster and used canal instead of flannel; still no luck.
UPDATE3:
I took a look at the canal and flannel logs with the following commands, and everything seems fine:
kubectl logs -n kube-system canal-c4wtk calico-node
kubectl logs -n kube-system canal-c4wtk kube-flannel
kubectl logs -n kube-system canal-b2fkh calico-node
kubectl logs -n kube-system canal-b2fkh kube-flannel
UPDATE4:
For completeness, here are the logs of the containers mentioned above.
UPDATE5:
I tried to install specific versions of the Kubernetes components and Docker, to check whether there is an issue related to a version mismatch, with the following commands:
sudo apt-get install docker-ce=18.06.1~ce~3-0~debian
sudo apt-get install -y kubelet=1.12.2-00 kubeadm=1.12.2-00 kubectl=1.12.2-00 kubernetes-cni=0.6.0-00
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/bc79dd1505b0c8681ece4de4c0d86c5cd2643275/Documentation/kube-flannel.yml
but nothing changed.

I even updated the file /etc/bash.bashrc on all nodes to clear any proxy settings, just to make sure it's not about a proxy:
export HTTP_PROXY=
export http_proxy=
export NO_PROXY=127.0.0.0/8,192.168.0.0/16,172.0.0.0/8,10.0.0.0/8
and also added the following environment settings to the Docker systemd unit file /lib/systemd/system/docker.service on all nodes:
Environment="HTTP_PROXY="
Environment="NO_PROXY="
Then I rebooted all nodes, and when I logged in I still got curl: (7) Failed to connect to 10.244.1.12 port 80: Connection timed out.
UPDATE6:
I even tried to set up the cluster on CentOS machines, thinking maybe there was something related to Debian. I also stopped and disabled firewalld to make sure the firewall wasn't causing the problem, but I got the exact same result again: Failed to connect to 10.244.1.2 port 80: Connection timed out.
The only thing I'm now suspicious about is that maybe it's all because of VirtualBox and the virtual machine network configuration? The virtual machines are attached to a Bridge Adapter connected to my wireless network interface.
UPDATE7:
I went inside the created pod and found out there is no internet connectivity inside the pod. So I created another pod from an NGINX image that has commands like curl, wget, ping and traceroute, and tried curl https://www.google.com -I, which gave the result curl: (6) Could not resolve host: www.google.com. I checked the /etc/resolv.conf file and found that the DNS server address inside the pod is 10.96.0.10. After changing the DNS to 8.8.8.8, curl https://www.google.com -I still results in curl: (6) Could not resolve host: www.google.com. I tried ping 8.8.8.8 and the result was 56 packets transmitted, 0 received, 100% packet loss, time 365ms. As a last step I tried traceroute 8.8.8.8 and got the following result:
 1  10.244.1.1 (10.244.1.1)  0.116 ms  0.056 ms  0.052 ms
 2  * * *
(hops 3 through 30 all time out with * * *)
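Since the trace above dies immediately after the cni0 gateway (10.244.1.1), one thing worth checking on every node is the forwarding sysctls that Kubernetes requires. A read-only sketch (the bridge-nf value only exists when the br_netfilter module is loaded):

```shell
# IP forwarding must be enabled (1) for the node to route pod traffic
# from the cni0 bridge out to the world and between nodes.
cat /proc/sys/net/ipv4/ip_forward

# Bridged pod traffic must be visible to iptables; kubeadm expects this
# to be 1. The file is absent when br_netfilter is not loaded.
cat /proc/sys/net/bridge/bridge-nf-call-iptables 2>/dev/null \
  || echo "br_netfilter module not loaded"
```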
I don't know whether the fact that there is no internet connectivity inside the pod has anything to do with the problem that I can't connect to the pod from cluster nodes other than the one it is deployed on.
Recommended answer
Debian systems use nftables as the iptables backend, which is not compatible with the Kubernetes network setup. So you have to set them to use iptables-legacy instead of nftables, with the following commands:
sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
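To verify which backend is actually selected on a node, before or after running the commands above, the alternatives entry can be queried read-only; a sketch:

```shell
# Print the path the iptables alternative currently resolves to.
# On Debian 10 the default is /usr/sbin/iptables-nft (the nftables
# backend); after the switch it should be /usr/sbin/iptables-legacy.
if command -v update-alternatives >/dev/null 2>&1; then
  update-alternatives --query iptables 2>/dev/null | grep '^Value:' \
    || echo "iptables alternative not configured"
else
  echo "update-alternatives not available on this system"
fi
```

After switching backends it is best to reboot the nodes (or at least restart docker and kubelet) so that kube-proxy and flannel re-create their iptables rules under the legacy backend.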