This article explains how to get all documents whose source contains a given search text in an Elasticsearch server. It should be a useful reference for anyone facing the same problem; follow along below to learn how.

Problem Description

I am new to Elasticsearch. I mapped a field to 'string' in my Elasticsearch index. I need to retrieve the documents whose field value contains the given search text.

JSON1 : "{\"id\":\"1\",\"message\":\"Welcome to elastic search\"}"
JSON2 : "{\"id\":\"2\",\"message\":\"elasticsearch\"}"

If I search with 'elastic', I need to get both records, but I am getting only the first one.

Right now I am getting documents based on full-text search (FTS). Please guide me on how to achieve a LIKE/ILIKE search (as in PostgreSQL) in Elasticsearch.

Thanks in advance.

Solution

It's a matter of tokenization. With the standard tokenizer, "elasticsearch" in your second document is indexed as one single token, so a search for "elastic" only matches the first document. You can take a look at the NGram tokenizer: http://www.elasticsearch.org/guide/reference/index-modules/analysis/ngram-tokenizer/

You can test how text is analyzed using the /_analyze endpoint.

Here is how Elasticsearch tokenizes by default:

curl -XGET 'localhost:9200/_analyze?tokenizer=standard' -d 'this is a test elasticsearch'

{
    "tokens": [{
            "token": "this",
            "start_offset": 0,
            "end_offset": 4,
            "type": "<ALPHANUM>",
            "position": 1
        }, {
            "token": "is",
            "start_offset": 5,
            "end_offset": 7,
            "type": "<ALPHANUM>",
            "position": 2
        }, {
            "token": "a",
            "start_offset": 8,
            "end_offset": 9,
            "type": "<ALPHANUM>",
            "position": 3
        }, {
            "token": "test",
            "start_offset": 10,
            "end_offset": 14,
            "type": "<ALPHANUM>",
            "position": 4
        }, {
            "token": "elasticsearch",
            "start_offset": 15,
            "end_offset": 28,
            "type": "<ALPHANUM>",
            "position": 5
        }
    ]
}

Here is an example with the nGram tokenizer and its default values (as you can see below, it produces single-character and 2-character grams):

curl -XGET 'localhost:9200/_analyze?tokenizer=nGram' -d 'this is a test elasticsearch'

{
    "tokens": [{
            "token": "t",
            "start_offset": 0,
            "end_offset": 1,
            "type": "word",
            "position": 1
        }, {
            "token": "h",
            "start_offset": 1,
            "end_offset": 2,
            "type": "word",
            "position": 2
        }, {
            "token": "i",
            "start_offset": 2,
            "end_offset": 3,
            "type": "word",
            "position": 3
        }, {
            "token": "s",
            "start_offset": 3,
            "end_offset": 4,
            "type": "word",
            "position": 4
        }, {
            "token": " ",
            "start_offset": 4,
            "end_offset": 5,
            "type": "word",
            "position": 5
        }, {
            "token": "i",
            "start_offset": 5,
            "end_offset": 6,
            "type": "word",
            "position": 6
        }, {
            "token": "s",
            "start_offset": 6,
            "end_offset": 7,
            "type": "word",
            "position": 7
        }, {
            "token": " ",
            "start_offset": 7,
            "end_offset": 8,
            "type": "word",
            "position": 8
        }, {
            "token": "a",
            "start_offset": 8,
            "end_offset": 9,
            "type": "word",
            "position": 9
        }, {
            "token": " ",
            "start_offset": 9,
            "end_offset": 10,
            "type": "word",
            "position": 10
        }, {
            "token": "t",
            "start_offset": 10,
            "end_offset": 11,
            "type": "word",
            "position": 11
        }, {
            "token": "e",
            "start_offset": 11,
            "end_offset": 12,
            "type": "word",
            "position": 12
        }, {
            "token": "s",
            "start_offset": 12,
            "end_offset": 13,
            "type": "word",
            "position": 13
        }, {
            "token": "t",
            "start_offset": 13,
            "end_offset": 14,
            "type": "word",
            "position": 14
        }, {
            "token": " ",
            "start_offset": 14,
            "end_offset": 15,
            "type": "word",
            "position": 15
        }, {
            "token": "e",
            "start_offset": 15,
            "end_offset": 16,
            "type": "word",
            "position": 16
        }, {
            "token": "l",
            "start_offset": 16,
            "end_offset": 17,
            "type": "word",
            "position": 17
        }, {
            "token": "a",
            "start_offset": 17,
            "end_offset": 18,
            "type": "word",
            "position": 18
        }, {
            "token": "s",
            "start_offset": 18,
            "end_offset": 19,
            "type": "word",
            "position": 19
        }, {
            "token": "t",
            "start_offset": 19,
            "end_offset": 20,
            "type": "word",
            "position": 20
        }, {
            "token": "i",
            "start_offset": 20,
            "end_offset": 21,
            "type": "word",
            "position": 21
        }, {
            "token": "c",
            "start_offset": 21,
            "end_offset": 22,
            "type": "word",
            "position": 22
        }, {
            "token": "s",
            "start_offset": 22,
            "end_offset": 23,
            "type": "word",
            "position": 23
        }, {
            "token": "e",
            "start_offset": 23,
            "end_offset": 24,
            "type": "word",
            "position": 24
        }, {
            "token": "a",
            "start_offset": 24,
            "end_offset": 25,
            "type": "word",
            "position": 25
        }, {
            "token": "r",
            "start_offset": 25,
            "end_offset": 26,
            "type": "word",
            "position": 26
        }, {
            "token": "c",
            "start_offset": 26,
            "end_offset": 27,
            "type": "word",
            "position": 27
        }, {
            "token": "h",
            "start_offset": 27,
            "end_offset": 28,
            "type": "word",
            "position": 28
        }, {
            "token": "th",
            "start_offset": 0,
            "end_offset": 2,
            "type": "word",
            "position": 29
        }, {
            "token": "hi",
            "start_offset": 1,
            "end_offset": 3,
            "type": "word",
            "position": 30
        }, {
            "token": "is",
            "start_offset": 2,
            "end_offset": 4,
            "type": "word",
            "position": 31
        }, {
            "token": "s ",
            "start_offset": 3,
            "end_offset": 5,
            "type": "word",
            "position": 32
        }, {
            "token": " i",
            "start_offset": 4,
            "end_offset": 6,
            "type": "word",
            "position": 33
        }, {
            "token": "is",
            "start_offset": 5,
            "end_offset": 7,
            "type": "word",
            "position": 34
        }, {
            "token": "s ",
            "start_offset": 6,
            "end_offset": 8,
            "type": "word",
            "position": 35
        }, {
            "token": " a",
            "start_offset": 7,
            "end_offset": 9,
            "type": "word",
            "position": 36
        }, {
            "token": "a ",
            "start_offset": 8,
            "end_offset": 10,
            "type": "word",
            "position": 37
        }, {
            "token": " t",
            "start_offset": 9,
            "end_offset": 11,
            "type": "word",
            "position": 38
        }, {
            "token": "te",
            "start_offset": 10,
            "end_offset": 12,
            "type": "word",
            "position": 39
        }, {
            "token": "es",
            "start_offset": 11,
            "end_offset": 13,
            "type": "word",
            "position": 40
        }, {
            "token": "st",
            "start_offset": 12,
            "end_offset": 14,
            "type": "word",
            "position": 41
        }, {
            "token": "t ",
            "start_offset": 13,
            "end_offset": 15,
            "type": "word",
            "position": 42
        }, {
            "token": " e",
            "start_offset": 14,
            "end_offset": 16,
            "type": "word",
            "position": 43
        }, {
            "token": "el",
            "start_offset": 15,
            "end_offset": 17,
            "type": "word",
            "position": 44
        }, {
            "token": "la",
            "start_offset": 16,
            "end_offset": 18,
            "type": "word",
            "position": 45
        }, {
            "token": "as",
            "start_offset": 17,
            "end_offset": 19,
            "type": "word",
            "position": 46
        }, {
            "token": "st",
            "start_offset": 18,
            "end_offset": 20,
            "type": "word",
            "position": 47
        }, {
            "token": "ti",
            "start_offset": 19,
            "end_offset": 21,
            "type": "word",
            "position": 48
        }, {
            "token": "ic",
            "start_offset": 20,
            "end_offset": 22,
            "type": "word",
            "position": 49
        }, {
            "token": "cs",
            "start_offset": 21,
            "end_offset": 23,
            "type": "word",
            "position": 50
        }, {
            "token": "se",
            "start_offset": 22,
            "end_offset": 24,
            "type": "word",
            "position": 51
        }, {
            "token": "ea",
            "start_offset": 23,
            "end_offset": 25,
            "type": "word",
            "position": 52
        }, {
            "token": "ar",
            "start_offset": 24,
            "end_offset": 26,
            "type": "word",
            "position": 53
        }, {
            "token": "rc",
            "start_offset": 25,
            "end_offset": 27,
            "type": "word",
            "position": 54
        }, {
            "token": "ch",
            "start_offset": 26,
            "end_offset": 28,
            "type": "word",
            "position": 55
        }
    ]
}

Here is a link with an example of setting the proper analyzer/tokenizer in your index: How to setup a tokenizer in elasticsearch
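
For illustration, here is a minimal sketch of what such an index setup might look like. It is not taken from the linked answer; the index name, type name, field name and gram sizes are assumptions made for the example:

# illustrative only: index name "test", type "doc", field "message" and gram sizes are assumptions
curl -XPUT 'localhost:9200/test' -d '{
    "settings": {
        "analysis": {
            "tokenizer": {
                "my_ngram_tokenizer": {
                    "type": "nGram",
                    "min_gram": 2,
                    "max_gram": 10
                }
            },
            "analyzer": {
                "ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "my_ngram_tokenizer",
                    "filter": ["lowercase"]
                }
            }
        }
    },
    "mappings": {
        "doc": {
            "properties": {
                "message": {
                    "type": "string",
                    "analyzer": "ngram_analyzer"
                }
            }
        }
    }
}'

Depending on your needs, you may prefer to apply the nGram analysis only at index time and keep a simpler analyzer for the search side, so that queries themselves are not broken into grams.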

Then your query should return the expected documents.
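
For example, with a mapping like the sketch above in place, a plain match query for "elastic" should find both documents, since the indexed grams of "elasticsearch" cover that substring (again, the index and field names are the hypothetical ones from the sketch):

# hypothetical index/field names from the sketch above
curl -XGET 'localhost:9200/test/_search' -d '{
    "query": {
        "match": {
            "message": "elastic"
        }
    }
}'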

That concludes this article on how to get all documents whose source contains a given search text in an Elasticsearch server. We hope the answer above is helpful; thank you for your support!
