This article covers read/write strategies for Cassandra consistency levels and may be a useful reference for anyone working through the same problem.

Problem description



    Based on "Read Operation in Cassandra at Consistency level of Quorum?"

    there are three ways to read data consistently:
    a. WRITE ALL + READ ONE
    b. WRITE ONE + READ ALL
    c. WRITE QUORUM + READ QUORUM
    

    For a given piece of data, the write usually happens once, but reads happen often. With read consistency in mind, is it possible to merge a and b? That is, WRITE ONE -> READ ONE -> if not found -> READ ALL. Does this approach usually keep each read and write to a single-replica operation? A READ ALL would only happen the first time, on a node that does not yet have the data.

    So is my understanding correct?

    Wilian, thanks for the detailed explanation. I think I need to describe my use case, as below. I implemented a timeline that users can post to, and users can follow posts they find interesting, so notifications are sent to the followers. To save bandwidth, users write and read posts at CL ONE. Eventually, users can always read a post after a while thanks to read repair. Followers receive a notification for each comment added to a post they follow. Here is my question: I must make sure followers can read a comment once its notification has been delivered to them. So I intend to use CL ONE to check whether the comment has been synced to the queried node and, if there is no result, retry at CL ALL to sync the comment. Other followers querying that node then do not need to sync from other nodes, since the CL ALL read was already done, which saves bandwidth and lowers server overhead. As for your final scenario, I don't care whether the value is old or the latest, because the data is synced according to notifications. I need to ensure users can get the comment if the notification was delivered to followers.

    Solution

    From the answer to the linked question, Carlo Bertuccini wrote:

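    The cases A, B, and C in this question appear to refer to the three minimum ways of satisfying that disequation, WRITE + READ > RF (the number of replicas written plus the number of replicas read must exceed the replication factor), as given in the same answer.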

    Case A

    WRITE ALL will send the data to all replicas. If your replication factor (RF) is three (3), then WRITE ALL writes three copies before reporting a successful write to the client. But you can't possibly see that the write occurred until the next read of the same data key. Minimally, READ ONE will read from a single one of the aforementioned replicas, and satisfies the necessary condition: WRITE(3) + READ(1) > RF(3)

    Case B

    WRITE ONE will send the data to only a single replica. In this case, the only way to get a consistent read is to read from all of them. The coordinator node will get all of the answers, figure out which one is the most recent and then send a "hint" to the out-of-date replicas, informing them that there's a newer value. The hint occurs asynchronously but only after the READ ALL occurs does it satisfy the necessary condition: WRITE(1) + READ(3) > RF(3)

    Case C

    QUORUM operations must involve FLOOR(RF / 2) + 1 replicas. In our RF=3 example, that is FLOOR(3 / 2) + 1 == 1 + 1 == 2. Again, consistency depends on both the reads and the writes. In the simplest case, the read operation talks to exactly the same replicas that the write operation used, but that's never guaranteed. In the general case, the coordinator node doing the read will talk to at least one of the replicas used by the write, so it will see the newer value. In that case, much like the READ ALL case, the coordinator node will get all of the answers, figure out which one is the most recent and then send a "hint" to the out-of-date replicas. Of course, this also satisfies the necessary condition: WRITE(2) + READ(2) > RF(3)
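
    The arithmetic behind cases A, B, and C can be checked mechanically. The following is a minimal sketch, not Cassandra code: the helper names `quorum`, `replicas_for`, and `is_strongly_consistent` are made up for illustration, and only the three levels discussed here are modeled.

```python
def quorum(rf: int) -> int:
    """Replicas involved in a QUORUM operation: FLOOR(RF / 2) + 1."""
    return rf // 2 + 1


def replicas_for(level: str, rf: int) -> int:
    """Map a consistency level name to the number of replicas it involves."""
    counts = {"ONE": 1, "QUORUM": quorum(rf), "ALL": rf}
    return counts[level]


def is_strongly_consistent(write_level: str, read_level: str, rf: int) -> bool:
    """True when WRITE + READ > RF, i.e. the read set overlaps the write set."""
    return replicas_for(write_level, rf) + replicas_for(read_level, rf) > rf


RF = 3
# Cases A, B, C from the answer, plus WRITE ONE + READ ONE for comparison.
for w, r in [("ALL", "ONE"), ("ONE", "ALL"), ("QUORUM", "QUORUM"), ("ONE", "ONE")]:
    print(f"WRITE {w} + READ {r}: {is_strongly_consistent(w, r, RF)}")
```

    With RF = 3, the first three combinations each sum to 4 and pass, while WRITE ONE + READ ONE sums to 2 and fails.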

    So to the OP's question...

    Is it possible to "merge" cases A and B?

    To ensure consistency it is only possible to "merge" if you mean WRITE ALL + READ ALL because you can always increase the number of readers or writers in the above cases.

    However, WRITE ONE + READ ONE is not a good idea if you need to read consistent data, so my answer is: no. Again, using that disequation and our example RF=3: WRITE(1) + READ(1) > RF(3) does not hold. If you were to use this configuration, receiving an answer that there is no value cannot be trusted -- it simply means that the one replica contacted to do the read did not have a value. But values might exist on one or more of the other replicas.

    So from that logic, it might seem that doing a READ ALL on receiving a no value answer would solve the problem. And it would for that use case, but there's another to consider: what if you get some value back from the READ ALL... how do you know that the value returned is "the latest" one? That's what's meant when we want consistency. If you care about reading the most recent write, then you need to satisfy the disequation.
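
    One way to see the weakness of READ ONE with a fallback is a toy simulation. This is only a sketch under simplifying assumptions: replicas are plain dictionaries, each value carries a write timestamp and the newest timestamp wins, and the names `read` and `read_one_then_all` are hypothetical rather than part of any Cassandra client.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, str]  # (write timestamp, value)


def read(replicas: List[Dict[str, Cell]], key: str, contacted: List[int]) -> Optional[str]:
    """Return the newest value among the contacted replicas, or None."""
    cells = [replicas[i][key] for i in contacted if key in replicas[i]]
    return max(cells)[1] if cells else None


def read_one_then_all(replicas: List[Dict[str, Cell]], key: str, first: int) -> Optional[str]:
    """READ ONE on replica `first`; fall back to READ ALL only on a miss."""
    value = read(replicas, key, [first])
    if value is None:
        value = read(replicas, key, list(range(len(replicas))))
    return value


replicas: List[Dict[str, Cell]] = [{}, {}, {}]
replicas[0]["comment:1"] = (1, "first draft")  # a WRITE ONE lands on replica 0
replicas[1]["comment:1"] = (2, "edited text")  # a later WRITE ONE lands on replica 1

# Replica 2 has nothing, so the fallback to READ ALL runs and finds the newest value.
print(read_one_then_all(replicas, "comment:1", first=2))
# Replica 0 answers with an older value, so the fallback is never triggered.
print(read_one_then_all(replicas, "comment:1", first=0))
```

    The second call is the problematic one: a non-empty answer from a single replica gives no signal that a newer write exists elsewhere.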

    Regarding the use case of "timeline" notifications in the edited question

    If my understanding of your described scenario is correct, these are the main points to your use case:

    • Most (but not all?) timeline entries will be write-once (not modified later)
    • Any such entry can be followed (there is a list of followers)
    • Any such entry can be commented upon (there is a list of comments)
    • Any comment on a timeline entry should trigger a notification to the list of followers for that timeline entry
    • Trying to minimize cost (in this case, measured as bandwidth) for the "normal" case
    • Willing to rely on the anti-entropy features built into Cassandra (e.g. read repair)

    Since most of your entries are write-once, and you care more about the existence of an entry and not necessarily the latest content for the entry, you might be able to get away with WRITE ONE + READ ONE with a fallback to READ ALL if you get no record for something that had some other indication it should exist (e.g. from a notification). For the timeline entry content, it does not sound like your case depends on consistency of the user content of the timeline entries.
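
    The write-once fallback described above can be sketched as follows. `TinyCluster` and `read_notified` are hypothetical names for a toy in-memory model, not a Cassandra client, and the sketch assumes entries are never modified after they are written and that a READ ALL repairs the replicas that were missing the value.

```python
from typing import Dict, List, Optional


class TinyCluster:
    """Toy stand-in for a replica set; not a Cassandra client."""

    def __init__(self, rf: int) -> None:
        self.replicas: List[Dict[str, str]] = [{} for _ in range(rf)]
        self.all_reads = 0  # how many expensive READ ALL calls were made

    def write_one(self, key: str, value: str, node: int) -> None:
        """WRITE ONE: only the given replica has the value when this returns."""
        self.replicas[node][key] = value

    def read_one(self, key: str, node: int) -> Optional[str]:
        """READ ONE: ask a single replica."""
        return self.replicas[node].get(key)

    def read_all(self, key: str) -> Optional[str]:
        """READ ALL: ask every replica and copy a found value to those missing it."""
        self.all_reads += 1
        found = [r[key] for r in self.replicas if key in r]
        if not found:
            return None
        for r in self.replicas:
            r.setdefault(key, found[0])  # simplified read repair (assumption)
        return found[0]


def read_notified(cluster: TinyCluster, key: str, node: int) -> Optional[str]:
    """READ ONE first; a notification says the key exists, so escalate on a miss."""
    value = cluster.read_one(key, node)
    return value if value is not None else cluster.read_all(key)


cluster = TinyCluster(rf=3)
cluster.write_one("comment:42", "nice post", node=0)
for node in (1, 2, 1):  # three followers, each hitting some replica
    print(read_notified(cluster, "comment:42", node), cluster.all_reads)
```

    In this model only the first follower pays for a READ ALL; later followers find the comment at CL ONE. That only holds because the entry is write-once, so any copy found is as good as any other.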

    If you don't care about consistency, then this discussion is moot; read/write with whatever Consistency Level and let Cassandra's asynchronous replication and anti-entropy features do their work. That said, though your goal is minimizing network traffic/cost, if your workload is mostly reads then the added cost of doing writes at CL QUORUM or ALL may not actually be that much.

    You also said:

    This statement implies that you care not only about whether the set of followers exists but also about its contents (which users are following). You have not detailed how you are storing/tracking the followers, but unless you ensure the consistency of this data it is possible that one or more followers are not notified of a new comment because you retrieved an out-of-date version of the follower list. Or, someone who "unfollowed" a post could still receive notifications for the same reason.

    Cassandra is very flexible and allows each discrete read and write operation to use different consistency levels. Take advantage of this and ensure strong consistency where it is needed and relax it where you are sure that "reading the latest write" is not important to your application's logic and function.


09-05 05:41