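When I search text and request highlighting for the query results, and the matched document field contains an exclamation mark, the returned highlighted text omits the part of the text that contains the exclamation mark.
Elasticsearch version 7.1.1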
Document: { "name" : "Yahoo! Inc [Please refer to Altaba Inc and Verizon Communications Inc]"}
Search: a wildcard query for "inc" with highlighting enabled
Expected:
The highlighted text should be:
"Yahoo! <em>Inc</em> [Please refer to Altaba <em>Inc</em> and Verizon Communications <em>Inc</em>]"
Actual:
"Yahoo!" is missing from the response. I got:
"<em>Inc</em> [Please refer to Altaba <em>Inc</em> and Verizon Communications <em>Inc</em>]"
I think this is related to the ! character. If I remove it, everything works fine.
Steps to reproduce:
Add the document to a new index
POST test/_doc/
{
  "name" : "Yahoo! Inc [Please refer to Altaba Inc and Verizon Communications Inc]"
}
No other settings or mappings
Run the query
GET test/_search
{
  "query": {
    "bool": {
      "should": [
        { "wildcard": { "name": { "wildcard": "inc*" } } }
      ]
    }
  },
  "highlight": {
    "fields": {
      "name" : {}
    }
  }
}
Got the following result:
"hits" : [
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "511tP3ABoqekxkoUshVf",
    "_score" : 1.0,
    "_source" : {
      "name" : "Yahoo! Inc [Please refer to Altaba Inc and Verizon Communications Inc]"
    },
    "highlight" : {
      "name" : [
        "<em>Inc</em> [Please refer to Altaba <em>Inc</em> and Verizon Communications <em>Inc</em>]"
      ]
    }
  }
]
Expected highlight:
"Yahoo! <em>Inc</em> [Please refer to Altaba <em>Inc</em> and Verizon Communications <em>Inc</em>]"
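Best answer
This is expected behavior: by default, Elasticsearch highlighting returns only portions of the searched text (fragments), not the whole field. See: https://www.elastic.co/guide/en/elasticsearch/reference/7.1/search-request-highlighting.html#unified-highlighter
The characters ! and . are treated as the end of a sentence, so "Yahoo!" becomes a fragment of its own. That fragment contains no match, so the highlighter does not return it.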
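To make the fragment behavior concrete, here is a rough Python sketch of sentence-based fragmenting. It is only an illustration: split_sentences and highlight_fragments are hypothetical helpers written for this example, and the regular expression is a naive stand-in for the sentence boundary detection that Elasticsearch delegates to Java's BreakIterator.

```python
import re


def split_sentences(text):
    # Naive assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    # The real highlighter relies on Java's BreakIterator, which is more elaborate.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def highlight_fragments(text, term):
    # Wrap whole-word, case-insensitive matches of `term` in <em> tags and
    # keep only the sentences (fragments) that contain at least one match.
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    fragments = []
    for sentence in split_sentences(text):
        if pattern.search(sentence):
            fragments.append(pattern.sub(lambda m: f"<em>{m.group(0)}</em>", sentence))
    return fragments


name = "Yahoo! Inc [Please refer to Altaba Inc and Verizon Communications Inc]"
fragments = highlight_fragments(name, "inc")
print(fragments)
```

Under these assumptions the text splits into "Yahoo!" and the rest of the name. Only the second piece contains a match, so it is the only fragment kept, which mirrors the response in the question.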
In my case the searched text is a name, so it is short. By adding "number_of_fragments" : 0 I force the highlighter to return the entire field content instead of fragments.
"highlight": {
"fields": {
"name" : {"number_of_fragments" : 0}
}
}
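For completeness, here is a small Python sketch that assembles the full search body from the question with this highlight setting applied. build_search_body is a hypothetical helper for this example; the script only builds and prints the JSON and does not contact a cluster, so sending it to test/_search with a client of your choice is left as an assumption.

```python
import json


def build_search_body(field, pattern):
    # Same bool/should wildcard query as in the question, plus
    # number_of_fragments: 0 so the whole field is returned highlighted.
    return {
        "query": {
            "bool": {
                "should": [
                    {"wildcard": {field: {"wildcard": pattern}}}
                ]
            }
        },
        "highlight": {
            "fields": {
                field: {"number_of_fragments": 0}
            }
        },
    }


body = build_search_body("name", "inc*")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after GET test/_search to rerun the query from the question.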
Related issue: https://github.com/elastic/elasticsearch/issues/52333
Source: the Stack Overflow question on text missing from Elasticsearch highlighting when the field contains an exclamation mark: https://stackoverflow.com/questions/60223820/