This article looks at consistent hashing as a way of scaling writes. It should serve as a useful reference for anyone tackling the same problem; read on to learn more.

Problem Description

I am trying to figure out if I am on the right track. I am building a (real-time) statistics/analytics service, and I use Redis to store some sets and hashes.

Now let's assume I have some success and I need to scale out. The hash ring technique looks nice, but I have the impression that it is only suited to caching scenarios.

What if a node goes down? In theory, its keys are now owned by other nodes; in practice, those nodes won't have the data. It is lost, right? The same goes for adding or removing nodes.

Am I missing something fundamental? Can this be a poor man's cluster?
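To make the concern concrete, here is a bare-bones sketch of the hash ring in Python (node and key names are made up, and real rings add virtual nodes): once a node disappears, its keys map to the next node on the ring, which in a cache just means misses, but in a primary store means the data is unreachable.

```python
import hashlib
from bisect import bisect

def ring_hash(value: str) -> int:
    # Map a string to a point on the ring.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    """Bare-bones consistent-hash ring (no virtual nodes): a key
    belongs to the first node clockwise from the key's position."""

    def __init__(self, nodes):
        self._points = sorted((ring_hash(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        idx = bisect(self._points, (ring_hash(key), "")) % len(self._points)
        return self._points[idx][1]

    def remove(self, node: str) -> None:
        self._points = [(p, n) for p, n in self._points if n != node]

ring = HashRing(["redis-1", "redis-2", "redis-3"])
owner = ring.node_for("user:42:pageviews")
ring.remove(owner)                      # simulate the owner going down
# The key now maps to a surviving node -- one that never saw the data,
# so in a primary store (unlike a cache) the value is simply gone.
print(owner, "->", ring.node_for("user:42:pageviews"))
```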

Solution

There are two reasons to use multiple nodes in a cluster:

  • Sharding, to limit the amount of data stored on each node
  • Replication, to reduce read load and to allow a node to be removed without losing data

The two are fundamentally different, but you can implement both: use consistent hashing to point to a set of nodes with a standard master/slave setup, rather than to a single node.
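As a sketch of that routing (hypothetical host names; for brevity a simple modulo hash picks the group here, where a real deployment would place the groups on a consistent-hash ring like the one above): the hash selects a shard *group* rather than a single node, so losing a master means promoting its slave instead of losing the data.

```python
import hashlib

# Hypothetical shard groups, each a master plus its replica.
SHARD_GROUPS = [
    {"master": "redis-a1:6379", "slave": "redis-a2:6379"},
    {"master": "redis-b1:6379", "slave": "redis-b2:6379"},
]

def group_for(key: str) -> dict:
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return SHARD_GROUPS[h % len(SHARD_GROUPS)]

group = group_for("user:42:pageviews")
write_to = group["master"]   # writes always go to the master
read_from = group["slave"]   # reads can be offloaded to the replica
```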

If the cluster is your primary data store rather than a cache, you will need a different redistribution strategy, one that includes copying the data.

My implementation is based on having the client choose one of 64k buckets for a hash, and on a table that maps each bucket to a node. Initially, all buckets map to node #1.
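A minimal sketch of that bucket table (the answer does not say which hash function is used; CRC32 is assumed here as one reasonable, cheap choice):

```python
import zlib

NUM_BUCKETS = 64 * 1024                  # the 64k buckets from the answer

# Bucket -> node table; at the start every bucket points at node #1.
bucket_table = ["node-1"] * NUM_BUCKETS

def bucket_for(key: str) -> int:
    return zlib.crc32(key.encode()) % NUM_BUCKETS

def node_for(key: str) -> str:
    return bucket_table[bucket_for(key)]

print(node_for("user:42:pageviews"))     # -> node-1 initially
```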

When node #1 gets too large, its slave becomes master node #2, and the table is updated to map half of node #1's keys to node #2. At this point all reads and writes will work with the new mapping; you just need to clean up the keys that are now sitting on the wrong node. Depending on your performance requirements, you can check all keys at once, or check a random selection of keys the way the expiry system does.
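Continuing the bucket-table sketch above, the hand-off might look like the following (host names are hypothetical, the even/odd split is just one way to take "half", and `redis-py` is assumed for the client connections):

```python
import redis  # assumes the redis-py client

def split_to_node_2() -> None:
    """After node #1's slave has been promoted to master of node #2,
    repoint half of the buckets. Both nodes hold a full copy of the
    data at this moment, so reads and writes keep working under the
    new mapping."""
    for bucket in range(0, NUM_BUCKETS, 2):
        bucket_table[bucket] = "node-2"

# Hypothetical connections, one per node.
clients = {
    "node-1": redis.Redis(host="node-1"),
    "node-2": redis.Redis(host="node-2"),
}

def cleanup_key(key: str) -> None:
    """Delete a key from any node that no longer owns it. Run this
    eagerly over every key, or call it on randomly sampled keys the
    way the expiry system does, depending on performance needs."""
    owner = node_for(key)
    for name, client in clients.items():
        if name != owner:
            client.delete(key)
```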

That concludes this article on consistent hashing as a way of scaling writes. Hopefully the answer above is helpful, and thank you for your continued support!
