Question

I'm trying to optimize my Elasticsearch schema.

I have a field that holds a URL. I don't want to query or filter on it; I only want to retrieve it.

My understanding is that a field defined with "index":"no" is not indexed, but is still stored in the index (see slide 5 in http://www.slideshare.net/nitin_stephens/lucene-basics). This should correspond to Lucene's UnIndexed field, right?

This confuses me. Is there a way to store some fields without them taking more storage than their content alone, and without encumbering the index for the other fields?

What am I missing?

Recommended answer

I'm new to posting on Stack Exchange, but I believe I can help a bit!

Here are a few things to note:

Since you don't want Elasticsearch to do any extra work on this field, set "index": "no". The field will then not be run through any tokenizers or filters.

Furthermore, it will not be searchable when a query targets that specific field (no hits):

{
    "query": {
        "term": {
            "url": "http://www.domain.com/exact/url/that/was/sent/to/elasticsearch"
        }
    }
}

Here, "url" is the field name.

However, the field will still be searchable through the _all field (this might produce a hit):

{
    "query": {
        "term": {
            "_all": "http://www.domain.com/exact/url/that/was/sent/to/elasticsearch"
        }
    }
}

_all field

By default, every field is copied into the _all field. Set "include_in_all": "false" to prevent that. This might not be an issue for you, since you are unlikely to search the _all field with a URL by mistake.

I was working with a schema where countries were stored as two-letter codes, e.g. "NO" for Norway. Someone might well search the _all field for "NO", so I made sure to set "include_in_all": "false" on that field.

Note: Any query where you don't specify a field explicitly will be executed against the _all field.
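
To make the _all behavior above concrete, here is a toy in-memory model in Python. It is not Elasticsearch: the tokenizer (lowercase, split on non-alphanumerics) is a simplification of a real analyzer, and `tokenize`, `build_index` and `search` are hypothetical names. It only mirrors what this answer describes: an "index": "no" field gets no index entries of its own but is still copied into _all unless include_in_all is turned off.

```python
import re


def tokenize(text):
    # Simplified stand-in for an analyzer: lowercase, split on anything
    # that is not a letter or digit, drop empty pieces.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def build_index(docs, mapping):
    """Build per-field inverted indexes plus a catch-all "_all" index."""
    inverted = {"_all": {}}
    for doc_id, doc in docs.items():
        for field, value in doc.items():
            opts = mapping.get(field, {})
            tokens = tokenize(value)
            # "index": "no": the field gets no inverted-index entries of its own.
            if opts.get("index") != "no":
                field_index = inverted.setdefault(field, {})
                for tok in tokens:
                    field_index.setdefault(tok, set()).add(doc_id)
            # include_in_all defaults to true; False or "false" opts the field out.
            if opts.get("include_in_all", True) not in (False, "false"):
                for tok in tokens:
                    inverted["_all"].setdefault(tok, set()).add(doc_id)
    return inverted


def search(inverted, field, text):
    """Return sorted ids of docs whose `field` contains every token of `text`."""
    postings = inverted.get(field, {})
    hits = None
    for tok in tokenize(text):
        ids = postings.get(tok, set())
        hits = ids if hits is None else hits & ids
    return sorted(hits) if hits else []


docs = {1: {"url": "http://www.domain.com/exact/url", "country": "NO"}}
mapping = {
    "url": {"index": "no"},                  # retrieve-only, but still in _all
    "country": {"include_in_all": "false"},  # searchable by field name only
}
idx = build_index(docs, mapping)

print(search(idx, "url", "http://www.domain.com/exact/url"))   # []
print(search(idx, "_all", "http://www.domain.com/exact/url"))  # [1]
print(search(idx, "_all", "NO"))                               # []
print(search(idx, "country", "NO"))                            # [1]
```

In this model the URL is invisible to a query on "url", reachable through _all, and the country code "NO" never pollutes _all because of include_in_all.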

By default, Elasticsearch stores your entire document (unanalyzed, exactly as you sent it), and it is returned to you in each hit's _source field. If you turn this off (perhaps because your Elasticsearch index is getting huge), you need to explicitly set "store": "yes" on each field you want stored individually. (One thing to note: store takes yes or no, not true or false. That tripped me up!)

Note: if you do this, you will need to request the fields you want returned explicitly, e.g.:

curl -XGET 'http://path/index_name/type_name/id?fields=url,another_field'
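
As a sketch of the _source-disabled setup just described, the Python snippet below builds both pieces with the standard library only: a 1.x-style mapping body that turns off _source ("_source": {"enabled": false}) and marks fields with "store": "yes", and the GET URL carrying the fields parameter. The helper names, the type name and the http://localhost:9200 base URL are assumptions for illustration; nothing here talks to a cluster.

```python
import json
from urllib.parse import urlencode


def stored_fields_mapping(type_name, stored_fields):
    """Hypothetical helper: mapping body with _source disabled, so each
    field you want back must be stored individually."""
    return {
        type_name: {
            "_source": {"enabled": False},
            "properties": {
                # "yes" rather than true, following the note above
                name: {"type": "string", "store": "yes"}
                for name in stored_fields
            },
        }
    }


def get_url(base, index, doc_type, doc_id, fields):
    """Hypothetical helper: GET URL that asks for specific stored fields."""
    # safe="," keeps the comma-separated list readable instead of %2C.
    query = urlencode({"fields": ",".join(fields)}, safe=",")
    return f"{base}/{index}/{doc_type}/{doc_id}?{query}"


mapping = stored_fields_mapping("type_name", ["url", "another_field"])
print(json.dumps(mapping, indent=4))
print(get_url("http://localhost:9200", "index_name", "type_name", "id",
              ["url", "another_field"]))
```

Joining the field names with commas mirrors the curl example; leaving the comma unescaped is cosmetic, since an encoded %2C decodes to the same value.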

Finally...

I would let Elasticsearch store your whole document (the default) and use the following mapping:

"type_name": {
    "properties": {
        "url": {
            "type": "string",
            "index": "no",
            "include_in_all": "false"
        },
        // other fields' mappings
    }
}
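
If you have several retrieve-only fields, a small Python sketch can generate this same shape for each of them and merge in the mappings of your other fields. The function name `retrieve_only_mapping` is hypothetical, and the field values are copied exactly from the mapping above.

```python
import json


def retrieve_only_mapping(type_name, retrieve_only, other_properties=None):
    """Hypothetical helper: fields in `retrieve_only` are not indexed and are
    kept out of _all; _source is left at its default (enabled)."""
    properties = dict(other_properties or {})
    for name in retrieve_only:
        # Same settings as the "url" mapping in the answer above.
        properties[name] = {
            "type": "string",
            "index": "no",
            "include_in_all": "false",
        }
    return {type_name: {"properties": properties}}


print(json.dumps(retrieve_only_mapping("type_name", ["url"]), indent=4))
```

Passing, say, a hypothetical {"title": {"type": "string"}} as other_properties fills the role of the "other fields' mappings" placeholder while keeping the output valid JSON.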

Source: Elasticsearch documentation

11-02 21:43