GET consistency (and quorum) in ElasticSearch

Problem description





I am new to ElasticSearch and I am evaluating it for a project.

In ES, replication can be sync or async. In the async case, the client gets a success response as soon as the document is written to the primary shard, and the document is then pushed to the other replicas asynchronously.

When writing asynchronously, how do we ensure that a GET returns the data even if it has not yet propagated to all the replicas? When we do a GET in ES, the query is forwarded to one of the replicas of the appropriate shard. If we are writing asynchronously, the primary shard may have the document, but the replica selected for the GET may not have received or written it yet. In Cassandra, we can specify consistency levels (ONE, QUORUM, ALL) at write time as well as at read time. Is something like that possible for reads in ES?

Solution

Right, you can set replication to be async (default is sync) to not wait for the replicas, although in practice this doesn't buy you much.
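To make the difference concrete, here is a toy model of one primary with two replicas. It is only a sketch and not Elasticsearch code: the class ToyReplicatedShard, its methods, and the sample document are all made up for illustration, and the "later" copy step is triggered by hand instead of by a background process.

```python
from collections import deque


class ToyReplicatedShard:
    """Toy model of one primary plus N replicas (hypothetical, not ES code)."""

    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = deque()  # acknowledged writes not yet copied to replicas

    def index(self, doc_id, doc, replication="sync"):
        self.primary[doc_id] = doc
        if replication == "sync":
            # Sync: every replica is written before the call returns.
            for replica in self.replicas:
                replica[doc_id] = doc
        else:
            # Async: acknowledge now, copy to the replicas later.
            self.pending.append((doc_id, doc))
        return "ok"

    def drain(self):
        """Apply all queued async writes to every replica."""
        while self.pending:
            doc_id, doc = self.pending.popleft()
            for replica in self.replicas:
                replica[doc_id] = doc

    def get_from_replica(self, replica_index, doc_id):
        return self.replicas[replica_index].get(doc_id)


shard = ToyReplicatedShard(replica_count=2)
shard.index("1", {"user": "kimchy"}, replication="async")
missing_before = shard.get_from_replica(0, "1")  # replica has not caught up
shard.drain()
found_after = shard.get_from_replica(0, "1")     # replica now has the document
```

With replication="sync" the window in which a replica lacks an acknowledged document does not exist in this model, which is the reason for keeping the default.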

Whenever you read data you can specify the preference parameter to control which shard copy the document is read from. If you use preference:_primary you make sure that you always read the document from the primary shard. Otherwise, if the get is done before the document is available on all replicas, you might hit a shard that doesn't have it yet. Given that the get api works in real-time, it usually makes sense to keep replication sync, so that after the index operation has returned you can always get the document back by id from any shard that is supposed to contain it. Still, if you try to get a document while it is being indexed for the first time, you may not find it.
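As a rough illustration of where the preference parameter goes, the sketch below only builds the URL of a get request and sends nothing. The helper build_get_url, the host, and the index, type and id values are assumptions made up for this example. The /index/type/id path and the _primary value match the older Elasticsearch releases this answer was written for, so check the documentation of your version before relying on them.

```python
from urllib.parse import urlencode


def build_get_url(base_url, index, doc_type, doc_id, preference=None):
    """Build a get-by-id URL, optionally pinning the read with `preference`.

    Hypothetical helper for illustration; it does not contact any server.
    """
    path = f"{base_url.rstrip('/')}/{index}/{doc_type}/{doc_id}"
    params = {}
    if preference is not None:
        params["preference"] = preference
    return f"{path}?{urlencode(params)}" if params else path


# Read only from the primary shard.
primary_url = build_get_url(
    "http://localhost:9200", "my_index", "my_type", "1", preference="_primary"
)

# No preference: any copy of the shard may serve the request.
any_copy_url = build_get_url("http://localhost:9200", "my_index", "my_type", "1")
```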

There is a write consistency parameter in elasticsearch as well, but it works differently from other data stores, and it is not related to whether replication is sync or async. With the consistency parameter you can control how many copies of the data need to be available for a write operation to be permitted. If not enough copies of the data are available, the write operation will fail (after waiting for up to 1 minute, an interval that you can change through the timeout parameter). This is just a preliminary check to decide whether to accept the operation or not. It doesn't mean that if the operation fails on a replica it will be rolled back. In fact, if a write operation fails on a replica but succeeds on the primary, the assumption is that there is something wrong with the replica (or the hardware it's running on), so the shard will be marked as failed and recreated on another node. The default value for consistency is quorum, and it can also be set to one or all.
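The preliminary check described above can be sketched as a small function. This is a simplified model under stated assumptions, not Elasticsearch's actual implementation: the function names are made up, quorum is taken to mean a plain majority of all copies (primary plus replicas), and the real check may treat small replica counts differently.

```python
def required_copies(consistency, total_copies):
    """Number of shard copies that must be available before a write is accepted.

    Simplified model: `quorum` is a plain majority of all copies.
    """
    if consistency == "one":
        return 1
    if consistency == "quorum":
        return total_copies // 2 + 1
    if consistency == "all":
        return total_copies
    raise ValueError(f"unknown consistency level: {consistency!r}")


def write_permitted(consistency, total_copies, active_copies):
    """Pre-flight check only: it says nothing about what happens afterwards."""
    return active_copies >= required_copies(consistency, total_copies)


# One primary plus two replicas gives three copies in total.
quorum_needed = required_copies("quorum", 3)
only_primary_up = write_permitted("quorum", 3, active_copies=1)
two_copies_up = write_permitted("quorum", 3, active_copies=2)
```

Note that the function only decides whether the write is accepted. As the answer says, a later failure on a replica does not roll the write back.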

That said, when it comes to the get api, elasticsearch is not eventually consistent but simply consistent: once a document is indexed, you can retrieve it.

The fact that newly added documents are not available for search until the next refresh operation, which happens automatically every second by default, is not really about eventual consistency (the documents are there and can be retrieved by id). It has more to do with how search and Lucene work and how documents are made visible through Lucene.
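The distinction between get-by-id and search can be pictured with a toy shard that keeps freshly indexed documents apart from the searchable ones until a refresh. This is only an illustrative sketch: ToySearchShard and its attributes are invented names, and the two dictionaries stand in for the real machinery inside elasticsearch and Lucene.

```python
class ToySearchShard:
    """Toy model of real-time get versus refresh-based search (not ES code)."""

    def __init__(self):
        self.unrefreshed = {}  # indexed, but not yet visible to search
        self.searchable = {}   # visible to search after a refresh

    def index(self, doc_id, doc):
        self.unrefreshed[doc_id] = doc

    def get(self, doc_id):
        # Get by id is real-time: it also sees documents awaiting a refresh.
        if doc_id in self.unrefreshed:
            return self.unrefreshed[doc_id]
        return self.searchable.get(doc_id)

    def search(self, field, value):
        # Search only sees documents that a refresh has made visible.
        return [d for d in self.searchable.values() if d.get(field) == value]

    def refresh(self):
        self.searchable.update(self.unrefreshed)
        self.unrefreshed.clear()


toy = ToySearchShard()
toy.index("1", {"user": "kimchy"})
got = toy.get("1")                        # found immediately by id
hits_before = toy.search("user", "kimchy")  # empty until the next refresh
toy.refresh()
hits_after = toy.search("user", "kimchy")   # now visible to search
```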


09-22 15:52