How do atomic batches work in Cassandra?


Problem description

How can atomic batches guarantee that either all statements in a single batch will be executed or none?

Solution

To understand how batches work under the hood, it's helpful to look at the individual stages of batch execution.

The client

Batches are supported using CQL3 or modern Cassandra client APIs. In each case you'll be able to specify a list of statements you want to execute as part of the batch, a consistency level to be used for all statements and an optional timestamp. You'll be able to batch execute INSERT, DELETE and UPDATE statements. If you choose not to provide a timestamp, the current time is automatically used and associated with the batch.
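As a rough illustration of what such a batch looks like in CQL, the sketch below assembles the statement text in Python. The helper name `build_logged_batch` and the table and column names are made up for this example. Only the `BEGIN BATCH ... APPLY BATCH` wrapper and the optional `USING TIMESTAMP` clause are CQL syntax.

```python
def build_logged_batch(statements, timestamp_micros=None):
    """Wrap CQL statements in a logged (atomic) batch.

    Hypothetical helper for illustration only; it just builds a string.
    """
    header = "BEGIN BATCH"
    if timestamp_micros is not None:
        # One timestamp for the whole batch; if omitted, Cassandra
        # uses the current time instead.
        header += f" USING TIMESTAMP {timestamp_micros}"
    body = "\n".join(f"  {s.rstrip(';')};" for s in statements)
    return f"{header}\n{body}\nAPPLY BATCH;"


# Table and column names below are assumptions, not from the original answer.
batch_cql = build_logged_batch(
    [
        "INSERT INTO users (id, name) VALUES (1, 'alice')",
        "UPDATE user_stats SET logins = 1 WHERE id = 1",
        "DELETE FROM pending_users WHERE id = 1",
    ],
    timestamp_micros=1_700_000_000_000_000,
)
print(batch_cql)
```

The consistency level is not part of the CQL text; it is set on the request by whichever client API sends the batch.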

The client will have to handle two exceptions in case the batch could not be executed successfully.

UnavailableException - there are not enough nodes alive to fulfill any of the updates with specified batch CL
WriteTimeoutException - timeout while either writing the batchlog or applying any of the updates within the batch. This can be checked by reading the writeType value of the exception (either BATCH_LOG or BATCH).

Failed writes during the batchlog stage will be retried once automatically by the DefaultRetryPolicy in the Java driver. Batchlog creation is critical to ensure that a batch will always be completed in case the coordinator fails mid-operation. Read on to find out why.
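The decision a client has to make after a write timeout can be sketched as follows. This is a minimal stand-in, not driver code: the `WriteType` enum and the function name are defined locally for the example, and only the two values `BATCH_LOG` and `BATCH` come from the text above.

```python
from enum import Enum


class WriteType(Enum):
    """Local stand-in for the writeType value carried by a write timeout."""
    BATCH_LOG = "BATCH_LOG"
    BATCH = "BATCH"


def next_step_after_write_timeout(write_type):
    """Suggest what a client could do, based on where the timeout happened."""
    if write_type is WriteType.BATCH_LOG:
        # The batchlog write was not confirmed, so the batch is not
        # guaranteed to be completed. Retrying the whole batch is reasonable.
        return "retry_batch"
    if write_type is WriteType.BATCH:
        # The batchlog was written; the statements will be applied
        # eventually, even if the coordinator fails.
        return "batch_will_complete"
    raise ValueError(f"unexpected write type: {write_type}")


print(next_step_after_write_timeout(WriteType.BATCH_LOG))
print(next_step_after_write_timeout(WriteType.BATCH))
```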

The coordinator

All batches sent by the client will be executed by the coordinator just as with any write operation. What's different from normal write operations is that Cassandra will also make use of a dedicated log (called the batchlog) that contains all pending batches currently being executed. This log is stored in the local system keyspace and is managed by each node individually. Each batch execution starts by creating a log entry with the complete batch on, preferably, two nodes other than the coordinator. Once the coordinator has created the batchlog on the other nodes, it starts to execute the actual statements in the batch.

Each statement in the batch will be written to the replicas using the CL and timestamp of the whole batch. Apart from that, there's nothing special about the writes happening at this point. Writes may also be hinted or throw a WriteTimeoutException, which can be handled by the client (see above).

After the batch has been executed, all created batchlogs can be safely removed. Therefore, upon successful execution, the coordinator sends a batchlog delete message to the nodes that received the batchlog earlier. This happens in the background and will go unnoticed if it fails.

Let's wrap up what the coordinator does during batch execution:

sends the batchlog to two other nodes (preferably in different racks)
executes all statements in the batch
deletes the batchlog from those nodes after successful batch execution
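The three coordinator steps can be modelled with plain dictionaries. This is a toy sketch, not Cassandra's implementation: the function `run_coordinator`, the `crash_after` switch and the data layout are all assumptions made for illustration.

```python
def run_coordinator(batch_id, mutations, timestamp, batchlog_nodes, table,
                    crash_after=None):
    """Toy model of the coordinator's three steps for a logged batch.

    batchlog_nodes: list of dicts standing in for the batchlog on other nodes.
    table: dict standing in for the replicas, key -> (timestamp, value).
    crash_after: simulate a coordinator crash after that many statements.
    """
    # Step 1: store the complete batch on the batchlog nodes.
    for log in batchlog_nodes:
        log[batch_id] = (timestamp, list(mutations))

    # Step 2: execute every statement with the batch timestamp.
    for index, (key, value) in enumerate(mutations):
        if crash_after is not None and index >= crash_after:
            return False  # coordinator died; batchlog entries remain
        table[key] = (timestamp, value)

    # Step 3: remove the batchlog entries after successful execution.
    for log in batchlog_nodes:
        log.pop(batch_id, None)
    return True


batch = [("k1", "v1"), ("k2", "v2")]

log_a, log_b, table_ok = {}, {}, {}
completed = run_coordinator("b1", batch, 100, [log_a, log_b], table_ok)

log_c, log_d, table_partial = {}, {}, {}
crashed = run_coordinator("b2", batch, 200, [log_c, log_d], table_partial,
                          crash_after=1)
print(completed, log_a, table_ok)
print(crashed, log_c, table_partial)
```

In the second run the table holds only the first write, while both batchlog copies still contain the full batch. That leftover entry is what allows another node to finish the work.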

The batchlog replica nodes

As described above, the batchlog will be replicated across two other nodes (if the cluster size allows it) before batch execution. The idea is that either of these nodes will be able to pick up pending batches if the coordinator goes down before finishing all statements in the batch.

What makes things a bit complicated is the fact that those nodes won't notice that the coordinator is no longer alive. The only point at which the batchlog nodes are updated with the current status of the batch execution is when the coordinator issues a delete message indicating that the batch has been successfully executed. If such a message doesn't arrive, the batchlog nodes will assume the batch hasn't been executed for some reason and will replay the batch from the log.

Batchlog replay potentially takes place every minute; that is the interval at which a node checks whether its local batchlog contains any pending batches that haven't been deleted by the (possibly killed) coordinator. To give the coordinator some time between batchlog creation and the actual execution, a fixed grace period is used (write_request_timeout_in_ms * 2, 4 seconds by default). If the batchlog entry still exists after that grace period, it will be replayed.
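The grace period arithmetic can be written out as a small check. This is a sketch under stated assumptions: the function names are invented for this example, and the default of 2000 ms is inferred from the 4 second figure above.

```python
DEFAULT_WRITE_REQUEST_TIMEOUT_MS = 2000  # assumed default, giving 4 s grace


def replay_grace_ms(write_request_timeout_ms=DEFAULT_WRITE_REQUEST_TIMEOUT_MS):
    """Grace period before a batchlog entry may be replayed."""
    return write_request_timeout_ms * 2


def is_due_for_replay(written_at_ms, now_ms,
                      write_request_timeout_ms=DEFAULT_WRITE_REQUEST_TIMEOUT_MS):
    """True if a still-present batchlog entry has outlived the grace period."""
    age_ms = now_ms - written_at_ms
    return age_ms > replay_grace_ms(write_request_timeout_ms)


print(replay_grace_ms())               # 4000
print(is_due_for_replay(0, 1_000))     # False: coordinator may still be working
print(is_due_for_replay(0, 60_000))    # True: picked up by the periodic check
```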

Just as with any write operation in Cassandra, timeouts may occur. In this case the node falls back to writing hints for the timed-out operations. When the timed-out replicas are up again, writes can resume from the hints. This behavior doesn't seem to be affected by whether hinted_handoff_enabled is enabled or not. There's also a TTL value associated with each hint, which causes the hint to be discarded after a longer period of time (the smallest GCGraceSeconds of any involved CF).

Now you might be wondering whether it is potentially dangerous to replay a batch on two nodes at the same time, which may happen because we replicate the batchlog on two nodes. What's important to keep in mind here is that each batch execution is idempotent due to the limited kinds of supported operations (updates and deletes) and the fixed timestamp associated with the batch. There won't be any conflicts even if both nodes and the coordinator retry executing the batch at the same time.
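Why replays are harmless can be shown with a simplified last-write-wins model. This is only a sketch: Cassandra's real conflict resolution has more rules, and the names `apply_mutation` and `replay_batch` are hypothetical.

```python
def apply_mutation(table, key, value, timestamp):
    """Last-write-wins: keep the cell with the highest timestamp.

    A value of None stands in for a delete (tombstone).
    """
    current = table.get(key)
    if current is None or timestamp >= current[0]:
        table[key] = (timestamp, value)


def replay_batch(table, mutations, timestamp):
    """Apply every mutation of a batch with the batch's fixed timestamp."""
    for key, value in mutations:
        apply_mutation(table, key, value, timestamp)


batch = [("k1", "v1"), ("k2", None)]  # one update, one delete
batch_ts = 100

applied_once = {}
replay_batch(applied_once, batch, batch_ts)

applied_thrice = {}
for _ in range(3):  # coordinator plus two batchlog nodes
    replay_batch(applied_thrice, batch, batch_ts)

# A newer write that arrived before a late replay is not overwritten.
with_newer_write = {}
replay_batch(with_newer_write, batch, batch_ts)
apply_mutation(with_newer_write, "k1", "newer", 150)
replay_batch(with_newer_write, batch, batch_ts)

print(applied_once == applied_thrice)
print(with_newer_write["k1"])
```

Because every replay carries the same timestamp and the same values, applying the batch again never changes the outcome, and a late replay cannot clobber data written afterwards with a higher timestamp.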

Atomicity guarantees

Let's get back to the atomicity aspects of "atomic batches" and review what exactly is meant by atomic (source):

So in a sense we get "all or nothing" guarantees. In most cases the coordinator will just write all the statements in the batch to the cluster. However, in case of a write timeout, we must check at which point the timeout occurred by reading the writeType value. The batch must have been written to the batchlog in order to be sure that those guarantees still apply. Note that at this point other clients may read partially executed results from the batch.

Getting back to the question, how can Cassandra guarantee that either all or none of the statements in a batch will be executed? Atomic batches basically depend on successful replication and idempotent statements. It's not a 100% guaranteed solution, as in theory there might be scenarios that still cause inconsistencies. But for a lot of use cases in Cassandra it's a very useful tool if you're aware of how it works.
