This article explains how to retrieve more than 10,000 results/events from an Elasticsearch query.

Question

示例查询:

GET /myIndex/_search
{
    "size": 10000,
    "query": {
        "term": { "field": "myField" }
    }
}

I have been using the size option, knowing that:

index.max_result_window = 100000

But if my query has a size of 650,000 documents, for example, or even more, how can I retrieve all of the results in one GET?

I have been reading about the SCROLL, FROM-TO, and PAGINATION APIs, but none of them ever deliver more than 10K results.

This is the example from the Elasticsearch forum that I have been using:

GET /_search?scroll=1m

Can anybody provide an example where you can retrieve all the documents for a GET search query?

Answer

Scroll is the way to go if you want to retrieve a high number of documents, high in the sense that it's way over the 10,000 default limit, which can be raised.
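As a side note, that 10,000 cap is the `index.max_result_window` index setting mentioned in the question. It can be raised with a settings update (the index name here is a placeholder), though large windows increase heap usage on deep requests, which is why scroll is usually the better choice:

```
PUT /myIndex/_settings
{
    "index.max_result_window": 100000
}
```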

The first request needs to specify the query you want to make and the scroll parameter with the duration before the search context times out (1 minute in the example below):

POST /index/type/_search?scroll=1m
{
    "size": 1000,
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}

In the response to that first call, you get a _scroll_id that you need to use to make the second call:

POST /_search/scroll 
{
    "scroll" : "1m", 
    "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ==" 
}

In each subsequent response, you'll get a new _scroll_id that you need to use for the next call, until you've retrieved the amount of documents you need.
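Once you're done, it's good practice to free the server-side search context with the clear-scroll API instead of waiting for the timeout to expire:

```
DELETE /_search/scroll
{
    "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
}
```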

So in pseudo-code it looks somewhat like this:

# first request
response = request('POST /index/type/_search?scroll=1m')
docs = [ response.hits ]
scroll_id = response._scroll_id

# subsequent requests: keep scrolling until a page comes back empty
while (true) {
   response = request('POST /_search/scroll', scroll_id)
   if (response.hits is empty) break
   docs.push(response.hits)
   scroll_id = response._scroll_id
}
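To make the loop concrete, here is a runnable sketch of the same logic in Python. `FakeScrollClient` is a hypothetical stand-in for a real Elasticsearch client (it just pages through an in-memory list and hands out a fresh scroll id per page), so the focus is on the consumer loop: fetch, then keep scrolling until a page comes back empty.

```python
class FakeScrollClient:
    """Hypothetical stand-in for an Elasticsearch client, for illustration only."""

    def __init__(self, docs, page_size):
        self._docs = docs
        self._page = page_size
        self._cursors = {}   # scroll_id -> offset of the next page
        self._next_id = 0

    def search(self, scroll="1m", size=1000):
        # first request: opens the scroll and returns the first page
        return self._slice(0)

    def scroll(self, scroll_id, scroll="1m"):
        # subsequent requests: continue from where the given scroll id left off
        offset = self._cursors.pop(scroll_id)
        return self._slice(offset)

    def _slice(self, offset):
        hits = self._docs[offset:offset + self._page]
        self._next_id += 1
        sid = f"scroll-{self._next_id}"
        self._cursors[sid] = offset + len(hits)
        return {"_scroll_id": sid, "hits": {"hits": hits}}


def fetch_all(client):
    """Drain a scroll: loop until a page comes back empty."""
    response = client.search(scroll="1m", size=1000)
    docs = list(response["hits"]["hits"])
    scroll_id = response["_scroll_id"]
    while True:
        response = client.scroll(scroll_id, scroll="1m")
        hits = response["hits"]["hits"]
        if not hits:
            break   # an empty page means the scroll is exhausted
        docs.extend(hits)
        scroll_id = response["_scroll_id"]
    return docs


client = FakeScrollClient(docs=list(range(2500)), page_size=1000)
print(len(fetch_all(client)))   # → 2500, i.e. more than two full pages
```

The important detail is the exit condition: a real scroll does not tell you up front how many calls are needed, so you stop when a page arrives empty, and you always pass the most recent `_scroll_id` to the next call.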

Update:

Please refer to the following answer, which is more accurate regarding the best solution for deep pagination: Elastic Search - Scroll behavior
