How to retrieve more than 10,000 results/events in Elasticsearch

Problem description

Example query:

GET hostname:port/myIndex/_search
{
    "size": 10000,
    "query": {
        "term": { "field": "myField" }
    }
}

I have been using the size option, knowing that:

But if my query matches 650,000 documents, for example, or even more, how can I retrieve all of the results in one GET?

I have been reading about the SCROLL, FROM-TO, and PAGINATION APIs, but none of them ever delivers more than 10K.

This is the example from the Elasticsearch forum that I have been using:

GET /_search?scroll=1m

Can anybody provide an example that retrieves all the documents for a GET search query?

Recommended answer

Scroll is the way to go if you want to retrieve a large number of documents, large in the sense that it is well over the default limit of 10000 (a limit that can be raised).
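For reference, the default limit mentioned above is the index setting `index.max_result_window`, which caps `from + size` for ordinary searches. The following is only a minimal sketch of how the settings-update request could be assembled. The helper name `max_result_window_request` and the value 50000 are illustrative assumptions, and the function only builds the request, it does not send it.

```python
import json


def max_result_window_request(index, window):
    """Build (method, path, body) for raising index.max_result_window.

    Hypothetical helper: it only assembles the request; sending it to
    hostname:port is left to whatever HTTP client you already use.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    path = f"/{index}/_settings"
    body = {"index": {"max_result_window": window}}
    return "PUT", path, json.dumps(body)


method, path, body = max_result_window_request("myIndex", 50000)
print(method, path)
print(body)
```

Raising the window makes deep from/size pages more expensive in memory, so for hundreds of thousands of documents scroll usually remains the better fit.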

The first request needs to specify the query you want to run, plus the scroll parameter, whose value is how long the search context is kept alive before it times out (1 minute in the example below):

POST /index/type/_search?scroll=1m
{
    "size": 1000,
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}

In the response to that first call, you get a _scroll_id that you need to use to make the second call:

POST /_search/scroll 
{
    "scroll" : "1m", 
    "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ==" 
}

Each subsequent response contains a new _scroll_id that you need to use for the next call, and you repeat this until you have retrieved the number of documents you need.

So in pseudocode it looks somewhat like this:

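# first request: send the query body together with the scroll parameter
response = request('POST /index/type/_search?scroll=1m', query)
docs = response.hits.hits
scroll_id = response._scroll_id

# subsequent requests: stop once a page comes back with no hits
while (response.hits.hits.length > 0) {
   response = request('POST /_search/scroll', { scroll: '1m', scroll_id: scroll_id })
   docs = docs.concat(response.hits.hits)
   scroll_id = response._scroll_id
}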
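The loop above can also be sketched in Python. This is a minimal sketch under stated assumptions, not a definitive client: `scroll_all` and the `send(method, path, body)` callable are hypothetical names introduced here, with `send` standing in for whatever HTTP call you make against hostname:port and returning the decoded JSON response. The sketch stops when a page comes back empty and then clears the scroll with `DELETE /_search/scroll` rather than waiting for the timeout.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical transport: (method, path, json_body) -> decoded JSON response.
Send = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def scroll_all(send: Send, index: str, query: Dict[str, Any],
               page_size: int = 1000,
               keep_alive: str = "1m") -> Iterator[Dict[str, Any]]:
    """Yield every hit matching `query`, one scroll page at a time."""
    # First request: the query itself plus the scroll keep-alive.
    response = send("POST", f"/{index}/_search?scroll={keep_alive}",
                    {"size": page_size, "query": query})
    scroll_id = response.get("_scroll_id")
    try:
        # An empty page of hits means the scroll is exhausted.
        while response["hits"]["hits"]:
            yield from response["hits"]["hits"]
            response = send("POST", "/_search/scroll",
                            {"scroll": keep_alive, "scroll_id": scroll_id})
            # Always continue with the most recently returned id.
            scroll_id = response.get("_scroll_id", scroll_id)
    finally:
        # Release the search context instead of letting it time out.
        if scroll_id is not None:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```

Because it is a generator, the caller can process documents as they arrive, for example `for hit in scroll_all(send, "myIndex", {"term": {"field": "myField"}}): ...`, instead of holding all 650,000 in memory at once.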

