Why can't my ZooKeeper server rejoin the quorum?

Problem description

I have three servers in my quorum, all running ZooKeeper 3.4.5. Two of them appear to be running fine, based on the output of mntr. The third was restarted a couple of days ago during a deploy and has not been able to rejoin the quorum since. Some lines in its logs that stand out are:
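For reference, this is roughly how the mntr check can be scripted. It is a minimal sketch, assuming the four-letter commands are reachable on the client port (2181 here); the helper names `four_letter_word`, `parse_mntr`, and `in_quorum` are made up for illustration and are not part of any ZooKeeper client library.

```python
import socket


def four_letter_word(host, port=2181, cmd="mntr", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(text):
    """Turn tab-separated 'key<TAB>value' lines into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:  # skip lines without a tab, e.g. a plain error message
            stats[key.strip()] = value.strip()
    return stats


def in_quorum(stats):
    """Assumption: a quorum member reports itself as leader or follower."""
    return stats.get("zk_server_state") in ("leader", "follower")
```

Calling `in_quorum(parse_mntr(four_letter_word("zk1.example.com")))` (the hostname is a placeholder) against each server gives a quick yes/no per node. A server that is not serving requests typically answers with a plain message instead of key/value lines, which this sketch treats as "not in quorum".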

2014-03-03 18:44:40,995 [myid:1] - INFO  [main:QuorumPeer@429] - currentEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation

and:

2014-03-03 18:44:41,233 [myid:1] - INFO  [QuorumPeer[myid=1]/0.0.0.0:2181:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (2, 1)
2014-03-03 18:44:41,234 [myid:1] - INFO  [QuorumPeer[myid=1]/0.0.0.0:2181:QuorumCnxManager@190] - Have smaller server identifier, so dropping the connection: (3, 1)
2014-03-03 18:44:41,235 [myid:1] - INFO  [QuorumPeer[myid=1]/0.0.0.0:2181:FastLeaderElection@774] - Notification time out: 400

Googling the first message ('currentEpoch not found!') led me to JIRA ZOOKEEPER-1653 ("zookeeper fails to start because of inconsistent epoch"). It describes a bug fix, but not a way to resolve the issue without upgrading ZooKeeper.

Googling the second message ('Have smaller server identifier, so dropping the connection') led me to JIRA ZOOKEEPER-1506 ("Re-try DNS hostname -> IP resolution if node connection fails"). This makes sense because I am using AWS Elastic IPs for the servers. The fix for this issue seems to be a rolling restart, which would cause us to temporarily lose quorum.
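As far as I understand it, the 'Have smaller server identifier' message comes from a tie-break rule in the election connection manager: when a server dials a peer whose id is larger than its own, it drops that connection and waits for the larger-id peer to connect back. The toy sketch below only illustrates that reading of the log lines; it is not ZooKeeper's code, and the function name is made up.

```python
def keeps_outgoing_connection(my_id, remote_id):
    """Assumed rule: only the server with the larger id keeps the
    connection it initiated; the smaller one drops it and waits."""
    return my_id > remote_id


# Server 1 (the restarted node) dialing servers 2 and 3, as in the log lines
for remote in (2, 3):
    if not keeps_outgoing_connection(1, remote):
        print(f"Have smaller server identifier, so dropping the connection: ({remote}, 1)")
```

If that reading is right, server 1 depends entirely on servers 2 and 3 dialing back to it, so a stale cached IP for server 1 on those two nodes would explain why it never rejoins.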

The second issue is definitely in play: the logs of the other ZooKeeper servers (the ones still in the quorum) show timeouts when they try to connect to the first server. What I'm not sure about is whether the first issue will disappear when I do a rolling restart. I would like to avoid upgrading and/or doing a rolling restart, but if I have to do a rolling restart, I'd like to avoid doing it more than once. Is there a way to fix the first issue without upgrading? Or, even better, is there a way to resolve both issues without a rolling restart?

Thanks for reading, and for any help!

Recommended answer