Problem description
I am new to Elasticsearch. I mapped a field as 'string' in an Elasticsearch index. I need to retrieve documents whenever the field value contains the given search text.
JSON1 : "{\"id\":\"1\",\"message\":\"Welcome to elastic search\"}"
JSON2 : "{\"id\":\"2\",\"message\":\"elasticsearch\"}"
If I search for 'elastic', I need to get both records, but I am getting only the first one.
Right now I am getting documents based on full-text search (FTS). Please guide me on how to achieve a search like LIKE/ILIKE in psql in Elasticsearch.
Thanks in advance.
It's a matter of the tokenizer. Take a look at the NGram tokenizer: http://www.elasticsearch.org/guide/reference/index-modules/analysis/ngram-tokenizer/
You can test a tokenizer using the /_analyze route.
Here is how Elasticsearch tokenizes by default:
curl -XGET 'localhost:9200/_analyze?tokenizer=standard' -d 'this is a test elasticsearch'
{
"tokens": [{
"token": "this",
"start_offset": 0,
"end_offset": 4,
"type": "<ALPHANUM>",
"position": 1
}, {
"token": "is",
"start_offset": 5,
"end_offset": 7,
"type": "<ALPHANUM>",
"position": 2
}, {
"token": "a",
"start_offset": 8,
"end_offset": 9,
"type": "<ALPHANUM>",
"position": 3
}, {
"token": "test",
"start_offset": 10,
"end_offset": 14,
"type": "<ALPHANUM>",
"position": 4
}, {
"token": "elasticsearch",
"start_offset": 15,
"end_offset": 28,
"type": "<ALPHANUM>",
"position": 5
}
]
}
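To see why only the first record comes back, here is a minimal Python sketch of whole-word tokenization. It does not call Elasticsearch; `standard_like_tokens` is a hypothetical helper that only approximates the standard tokenizer by lowercasing and splitting on non-alphanumeric characters.

```python
import re

def standard_like_tokens(text):
    """Rough stand-in for whole-word tokenization: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())

# The two messages from the question.
docs = {
    "1": "Welcome to elastic search",
    "2": "elasticsearch",
}

# A term lookup matches a whole indexed token, never a fragment of one.
index = {doc_id: set(standard_like_tokens(msg)) for doc_id, msg in docs.items()}
matches = sorted(doc_id for doc_id, terms in index.items() if "elastic" in terms)
print(matches)  # ['1']
```

Document 2 is indexed as the single token "elasticsearch", so the term "elastic" never matches it.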
Here is an example with the nGram tokenizer and its default values:
curl -XGET 'localhost:9200/_analyze?tokenizer=nGram' -d 'this is a test elasticsearch'
{
"tokens": [{
"token": "t",
"start_offset": 0,
"end_offset": 1,
"type": "word",
"position": 1
}, {
"token": "h",
"start_offset": 1,
"end_offset": 2,
"type": "word",
"position": 2
}, {
"token": "i",
"start_offset": 2,
"end_offset": 3,
"type": "word",
"position": 3
}, {
"token": "s",
"start_offset": 3,
"end_offset": 4,
"type": "word",
"position": 4
}, {
"token": " ",
"start_offset": 4,
"end_offset": 5,
"type": "word",
"position": 5
}, {
"token": "i",
"start_offset": 5,
"end_offset": 6,
"type": "word",
"position": 6
}, {
"token": "s",
"start_offset": 6,
"end_offset": 7,
"type": "word",
"position": 7
}, {
"token": " ",
"start_offset": 7,
"end_offset": 8,
"type": "word",
"position": 8
}, {
"token": "a",
"start_offset": 8,
"end_offset": 9,
"type": "word",
"position": 9
}, {
"token": " ",
"start_offset": 9,
"end_offset": 10,
"type": "word",
"position": 10
}, {
"token": "t",
"start_offset": 10,
"end_offset": 11,
"type": "word",
"position": 11
}, {
"token": "e",
"start_offset": 11,
"end_offset": 12,
"type": "word",
"position": 12
}, {
"token": "s",
"start_offset": 12,
"end_offset": 13,
"type": "word",
"position": 13
}, {
"token": "t",
"start_offset": 13,
"end_offset": 14,
"type": "word",
"position": 14
}, {
"token": " ",
"start_offset": 14,
"end_offset": 15,
"type": "word",
"position": 15
}, {
"token": "e",
"start_offset": 15,
"end_offset": 16,
"type": "word",
"position": 16
}, {
"token": "l",
"start_offset": 16,
"end_offset": 17,
"type": "word",
"position": 17
}, {
"token": "a",
"start_offset": 17,
"end_offset": 18,
"type": "word",
"position": 18
}, {
"token": "s",
"start_offset": 18,
"end_offset": 19,
"type": "word",
"position": 19
}, {
"token": "t",
"start_offset": 19,
"end_offset": 20,
"type": "word",
"position": 20
}, {
"token": "i",
"start_offset": 20,
"end_offset": 21,
"type": "word",
"position": 21
}, {
"token": "c",
"start_offset": 21,
"end_offset": 22,
"type": "word",
"position": 22
}, {
"token": "s",
"start_offset": 22,
"end_offset": 23,
"type": "word",
"position": 23
}, {
"token": "e",
"start_offset": 23,
"end_offset": 24,
"type": "word",
"position": 24
}, {
"token": "a",
"start_offset": 24,
"end_offset": 25,
"type": "word",
"position": 25
}, {
"token": "r",
"start_offset": 25,
"end_offset": 26,
"type": "word",
"position": 26
}, {
"token": "c",
"start_offset": 26,
"end_offset": 27,
"type": "word",
"position": 27
}, {
"token": "h",
"start_offset": 27,
"end_offset": 28,
"type": "word",
"position": 28
}, {
"token": "th",
"start_offset": 0,
"end_offset": 2,
"type": "word",
"position": 29
}, {
"token": "hi",
"start_offset": 1,
"end_offset": 3,
"type": "word",
"position": 30
}, {
"token": "is",
"start_offset": 2,
"end_offset": 4,
"type": "word",
"position": 31
}, {
"token": "s ",
"start_offset": 3,
"end_offset": 5,
"type": "word",
"position": 32
}, {
"token": " i",
"start_offset": 4,
"end_offset": 6,
"type": "word",
"position": 33
}, {
"token": "is",
"start_offset": 5,
"end_offset": 7,
"type": "word",
"position": 34
}, {
"token": "s ",
"start_offset": 6,
"end_offset": 8,
"type": "word",
"position": 35
}, {
"token": " a",
"start_offset": 7,
"end_offset": 9,
"type": "word",
"position": 36
}, {
"token": "a ",
"start_offset": 8,
"end_offset": 10,
"type": "word",
"position": 37
}, {
"token": " t",
"start_offset": 9,
"end_offset": 11,
"type": "word",
"position": 38
}, {
"token": "te",
"start_offset": 10,
"end_offset": 12,
"type": "word",
"position": 39
}, {
"token": "es",
"start_offset": 11,
"end_offset": 13,
"type": "word",
"position": 40
}, {
"token": "st",
"start_offset": 12,
"end_offset": 14,
"type": "word",
"position": 41
}, {
"token": "t ",
"start_offset": 13,
"end_offset": 15,
"type": "word",
"position": 42
}, {
"token": " e",
"start_offset": 14,
"end_offset": 16,
"type": "word",
"position": 43
}, {
"token": "el",
"start_offset": 15,
"end_offset": 17,
"type": "word",
"position": 44
}, {
"token": "la",
"start_offset": 16,
"end_offset": 18,
"type": "word",
"position": 45
}, {
"token": "as",
"start_offset": 17,
"end_offset": 19,
"type": "word",
"position": 46
}, {
"token": "st",
"start_offset": 18,
"end_offset": 20,
"type": "word",
"position": 47
}, {
"token": "ti",
"start_offset": 19,
"end_offset": 21,
"type": "word",
"position": 48
}, {
"token": "ic",
"start_offset": 20,
"end_offset": 22,
"type": "word",
"position": 49
}, {
"token": "cs",
"start_offset": 21,
"end_offset": 23,
"type": "word",
"position": 50
}, {
"token": "se",
"start_offset": 22,
"end_offset": 24,
"type": "word",
"position": 51
}, {
"token": "ea",
"start_offset": 23,
"end_offset": 25,
"type": "word",
"position": 52
}, {
"token": "ar",
"start_offset": 24,
"end_offset": 26,
"type": "word",
"position": 53
}, {
"token": "rc",
"start_offset": 25,
"end_offset": 27,
"type": "word",
"position": 54
}, {
"token": "ch",
"start_offset": 26,
"end_offset": 28,
"type": "word",
"position": 55
}
]
}
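The token list above can be reproduced with a short Python sketch. This is only an emulation under the assumption that the defaults are a minimum gram of 1 and a maximum of 2, as the output suggests; `char_ngrams` is a hypothetical helper, not part of any Elasticsearch client.

```python
def char_ngrams(text, min_gram=1, max_gram=2):
    """Emit every substring of length min_gram..max_gram, shortest grams first."""
    grams = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(text) - size + 1):
            grams.append(text[start:start + size])
    return grams

tokens = char_ngrams("this is a test elasticsearch")
print(len(tokens))    # 55
print(tokens[28:32])  # ['th', 'hi', 'is', 's ']

# With grams as long as the search text, both messages contain the term "elastic".
docs = {"1": "Welcome to elastic search", "2": "elasticsearch"}
query = "elastic"
hits = sorted(
    doc_id for doc_id, msg in docs.items()
    if query in char_ngrams(msg.lower(), min_gram=len(query), max_gram=len(query))
)
print(hits)  # ['1', '2']
```

Note that the default grams are too short to contain "elastic" as a single token, which is why the gram sizes usually need to be tuned in the index settings.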
Here is a link with an example of how to set the proper analyzer/tokenizer in your index: "How to setup a tokenizer in elasticsearch".
Then your query should return the expected documents.
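The linked example is not reproduced here. As a rough sketch only, index settings that define an nGram-based analyzer might look like the body built below. The names `ngram_analyzer` and `ngram_tokenizer` and the gram sizes 3 and 7 are assumptions chosen for illustration; the `nGram` type name follows the spelling used in this answer, and newer Elasticsearch releases may spell it differently or restrict how far apart `min_gram` and `max_gram` can be.

```python
import json

# Hypothetical names and sizes; adjust them to your own index and version.
settings_body = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "ngram_tokenizer": {
                    "type": "nGram",  # spelling as used in this answer
                    "min_gram": 3,    # assumed value
                    "max_gram": 7,    # assumed value, long enough for "elastic"
                }
            },
            "analyzer": {
                "ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "ngram_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        }
    }
}

# Print the JSON body that would be sent when creating the index.
print(json.dumps(settings_body, indent=2))
```

The field mapping would then need to reference the analyzer by name so that the message field is indexed with these grams.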