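When searching text and requesting highlighting of the query results, if the matched document field contains an exclamation mark, the returned highlighted text omits the part of the text that contains the exclamation mark.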

Elasticsearch version 7.1.1

Document: { "name" : "Yahoo! Inc [Please refer to Altaba Inc and Verizon Communications Inc]" }, searched with highlighting using an "inc" wildcard.

Expected:
The highlighted text should be:

"Yahoo! <em>Inc</em> [Please refer to Altaba <em>Inc</em> and Verizon Communications <em>Inc</em>]"

Actual:
"Yahoo!" is missing from the response. I got:
"<em>Inc</em> [Please refer to Altaba <em>Inc</em> and Verizon Communications <em>Inc</em>]"

I think this is related to the "!" character. If I remove it, everything works fine.

Steps to reproduce:

Add the document to a new index
POST test/_doc/
{
  "name" : "Yahoo! Inc [Please refer to Altaba Inc and Verizon Communications Inc]"
}

No other settings or mappings

Run the query
GET test/_search
{
  "query": {
    "bool": {
      "should": [
        { "wildcard": { "name": { "wildcard": "inc*" } } }
      ]
    }
  },
  "highlight": {
    "fields": {
      "name" : {}
    }
  }
}

Got the following result:
"hits" : [
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "511tP3ABoqekxkoUshVf",
    "_score" : 1.0,
    "_source" : {
      "name" : "Yahoo! Inc [Please refer to Altaba Inc and Verizon Communications Inc]"
    },
    "highlight" : {
      "name" : [
        "<em>Inc</em> [Please refer to Altaba <em>Inc</em> and Verizon Communications <em>Inc</em>]"
      ]
    }
  }
]

Expected highlight:
"Yahoo! <em>Inc</em> [Please refer to Altaba <em>Inc</em> and Verizon Communications <em>Inc</em>]"

Best answer

This is expected behavior: by default, Elasticsearch highlighting returns only portions of the searched text (fragments). See: https://www.elastic.co/guide/en/elasticsearch/reference/7.1/search-request-highlighting.html#unified-highlighter

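"!" and "." are treated as the end of a sentence. The highlighter therefore splits "Yahoo!" off as a fragment of its own, and because that fragment contains no match, it is not returned in the highlight.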
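To make the fragment behavior concrete, here is a rough Python sketch. It is not Elasticsearch's implementation: the regex-based sentence splitter and the helper names `sentence_fragments` and `highlight` are assumptions made purely for illustration. It only mimics the idea of breaking the field into sentences and returning the ones that contain a match.

```python
import re

def sentence_fragments(text):
    """Split after '.', '!' or '?' followed by whitespace.

    A crude stand-in for a real sentence break iterator (assumption).
    """
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

def highlight(text, prefix, pre="<em>", post="</em>"):
    """Return only the fragments containing a word that starts with `prefix`,
    with each matching word wrapped in the pre/post tags."""
    pattern = re.compile(r"\b" + re.escape(prefix) + r"\w*", re.IGNORECASE)
    highlighted = []
    for fragment in sentence_fragments(text):
        if pattern.search(fragment):
            highlighted.append(
                pattern.sub(lambda m: pre + m.group(0) + post, fragment)
            )
    return highlighted

name = "Yahoo! Inc [Please refer to Altaba Inc and Verizon Communications Inc]"
print(sentence_fragments(name))  # "Yahoo!" becomes a fragment of its own
print(highlight(name, "inc"))    # the "Yahoo!" fragment has no match and is dropped
```

In this sketch the "Yahoo!" fragment never reaches the output because nothing in it matches, which mirrors the response shown in the question.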

In my case, the searched text is a name and is fairly short, so by adding "number_of_fragments" : 0 I force the highlighter to return the entire content of the field.

"highlight": {
  "fields": {
     "name" : {"number_of_fragments" : 0}
  }
}
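For completeness, the query from the question and this highlight setting can be combined into one request body. The sketch below only assembles the JSON in Python. `build_search_body` is a hypothetical helper, not part of any Elasticsearch client, and the wildcard clause is copied as written in the question.

```python
import json

def build_search_body(field, pattern, whole_field=True):
    """Assemble a wildcard search body with highlighting on `field`.

    Hypothetical helper for illustration; it does not send any request.
    """
    highlight_options = {}
    if whole_field:
        # 0 fragments asks the highlighter to return the whole field content
        highlight_options["number_of_fragments"] = 0
    return {
        "query": {
            "bool": {
                "should": [
                    # wildcard clause kept in the same shape as in the question
                    {"wildcard": {field: {"wildcard": pattern}}}
                ]
            }
        },
        "highlight": {"fields": {field: highlight_options}},
    }

body = build_search_body("name", "inc*")
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the body of GET test/_search with any HTTP client or in the Kibana console.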

Related: https://github.com/elastic/elasticsearch/issues/52333

Source: the Stack Overflow question "Elasticsearch highlighted text is missing text when the field contains an exclamation mark": https://stackoverflow.com/questions/60223820/
