How do I recover the hash slots of a specific node in a Redis cluster after a hard failure?


Problem description

I am testing Redis Cluster with a setup of 3 masters and 3 slaves. If a node suffers a hard failure (both the master and its slave go down), the cluster remains functional except for the hash slots served by the failed node. While testing this scenario, I see that reads and writes on keys in those hash slots fail with exceptions, which is expected (I'm using Jedis, by the way). However, if I am using Redis Cluster as a cache, I would like those hash slots to be served by some other node. This functionality doesn't seem to be present in the redis-trib utility.

I cannot reshard the cluster to move these hash slots, because ./redis-trib.rb reshard fails with [ERR] Not all #{ClusterHashSlots} slots are covered by nodes. I also cannot remove the node from the cluster, because ./redis-trib.rb del-node fails with [ERR] Node #{node} is not empty! Reshard data away and try again. What, then, is the best way to handle a scenario where I cannot bring the original node back up but want those hash slots to be served by some other node (assuming I am fine with losing the data on the old node)? Ideally, I would be able to remove that node (both master and slave) from the cluster and assign its hash slots to some other node.

Recommended answer

The cluster can be fixed by adding all slots that were served by the failed node to nodes that are still reachable. The approach is to use the CLUSTER ADDSLOTS command, but doing this manually is tedious, so I suggest this tool developed by our team.
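To make the manual CLUSTER ADDSLOTS route more concrete, here is a minimal sketch of its first step: working out which slots are no longer covered, from the text that CLUSTER NODES returns on a surviving node. The node IDs and the sample output are made up for illustration, and the helper names (parse_slot_tokens, uncovered_slots, to_ranges) are hypothetical, not part of any tool. It assumes a master is unusable only when its flags contain "fail".

```python
TOTAL_SLOTS = 16384  # fixed number of hash slots in Redis Cluster


def parse_slot_tokens(tokens):
    """Expand slot tokens such as '0-5460' or '42' into a set of ints."""
    slots = set()
    for token in tokens:
        if token.startswith("["):
            # Entries in brackets describe slots being migrated; skipped here.
            continue
        start, _, end = token.partition("-")
        slots.update(range(int(start), int(end or start) + 1))
    return slots


def uncovered_slots(cluster_nodes_output):
    """Return the sorted slots not owned by any master that is still usable."""
    covered = set()
    for line in cluster_nodes_output.strip().splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # not a node line
        flags = fields[2].split(",")
        # Assumption: only the definitive "fail" flag marks a dead master.
        if "master" in flags and "fail" not in flags:
            covered |= parse_slot_tokens(fields[8:])
    return sorted(set(range(TOTAL_SLOTS)) - covered)


def to_ranges(slots):
    """Collapse a sorted list of slots into inclusive (start, end) ranges."""
    ranges = []
    for slot in slots:
        if ranges and slot == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], slot)
        else:
            ranges.append((slot, slot))
    return ranges


# Made-up CLUSTER NODES output: master "bbb" and its slave "ddd" are both down.
SAMPLE = """\
aaa 172.0.0.1:7000 myself,master - 0 0 1 connected 0-5460
bbb 172.0.0.2:7000 master,fail - 1 2 2 disconnected 5461-10922
ccc 172.0.0.3:7000 master - 0 3 3 connected 10923-16383
ddd 172.0.0.4:7000 slave,fail bbb 1 2 2 disconnected
"""

if __name__ == "__main__":
    print(to_ranges(uncovered_slots(SAMPLE)))  # [(5461, 10922)]
```

The resulting slot numbers are what would be passed to CLUSTER ADDSLOTS on the node that takes them over. Note that CLUSTER ADDSLOTS only succeeds when the receiving node considers those slots unassigned, so a node that still records them as owned by the failed master has to have that assignment cleared first (for example with CLUSTER DELSLOTS or CLUSTER FORGET).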

Usage (in a shell):

# it requires Python2.7; install it via pip
pip install redis-trib

# suppose one of the accessible nodes is serving at 172.0.0.1:7000
# start a cluster-mode Redis that is not involved in any cluster
# suppose its address is 172.0.0.5:8000
redis-trib.py rescue --existing-addr 172.0.0.1:7000 --new-addr 172.0.0.5:8000

After that, the new node will serve all the failed slots, and the cluster state will become ok.
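One way to confirm that the cluster really is ok again is to inspect the output of CLUSTER INFO on any node. The sketch below only parses that text; the two sample outputs are made up for illustration, and the helper names (parse_cluster_info, cluster_is_healthy) are hypothetical. It assumes that a healthy cluster reports cluster_state:ok and has all 16384 slots assigned.

```python
def parse_cluster_info(text):
    """Turn 'key:value' lines from CLUSTER INFO into a dict of strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:
            info[key] = value
    return info


def cluster_is_healthy(text, total_slots=16384):
    """True when the state is 'ok' and every hash slot is assigned."""
    info = parse_cluster_info(text)
    return (
        info.get("cluster_state") == "ok"
        and int(info.get("cluster_slots_assigned", 0)) == total_slots
    )


# Made-up CLUSTER INFO excerpts, before and after the rescue.
BEFORE = "cluster_state:fail\r\ncluster_slots_assigned:10922\r\n"
AFTER = "cluster_state:ok\r\ncluster_slots_assigned:16384\r\n"

if __name__ == "__main__":
    print(cluster_is_healthy(BEFORE), cluster_is_healthy(AFTER))  # False True
```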
