Elasticsearch: multiple pre_tags/post_tags with the Fast Vector Highlighter

Problem description


The documentation includes the following cryptic remark about the pre_tags/post_tags settings, which can contain more than one pair of pre-/post-tags:

Does anyone know the precise meaning of that statement?

Solution

It took a while, but by trying different queries using ES 1.7 and the _head plugin I was able to figure out how multiple pre and post tags affect highlighting.

With the Fast Vector Highlighter you can specify tags in order of "importance", which seems to mean that the order of the tags should match the order of your search terms. For more than one pre or post tag to have any effect, the query needs more than one search clause (the example below uses two match clauses on the same field).

Given the index

{
 "myindex": {
  "mappings": {
   "corpdocument": {
    "properties": {
     "createddate": {
      "type": "date",
      "format": "dateOptionalTime"
     },
     "docbody": {
      "type": "string",
      "analyzer": "text_analyzer",
      "fields": {
       "exact": {
        "type": "string",
        "analyzer": "text_analyzer_exact"
       }
      }
     },
     "modifieddate": {
      "type": "date",
      "format": "dateOptionalTime"
     },
     "title": {
      "type": "string"
     }
    }
   }
  }
 }
}

and the search

POST localhost:9200/myindex/corpdocument/_search
{
 "highlight": {
  "pre_tags": ["|primary-highlight|",
  "|secondary-highlight|"],
  "post_tags": ["|/primary-highlight|",
  "|/secondary-highlight|"],
  "fields": {
   "docbody.exact": {
    "fragment_size": 150,
    "number_of_fragments": 3
   }
  }
 },
 "_source": {
  "exclude": ["docbody"]
 },
 "query": {
  "bool": {
   "should": [{
    "match": {
     "docbody.exact": {
      "query": "foo"
     }
    }
   },
   {
    "match": {
     "docbody.exact": {
      "query": "bar"
     }
    }
   }]
  }
 }
}

You could get results like this

{
 "took": 14,
 "timed_out": false,
 "_shards": {
  "total": 5,
  "successful": 5,
  "failed": 0
 },
 "hits": {
  "total": 97,
  "max_score": 0.48895144,
  "hits": [{
   "_index": "myindex",
   "_type": "corpdocument",
   "_id": "XFxxZWR0ZXN0ZG9jc1xTYW5kYm94XFNhbmRib3hBbGxcRGV4dGVyX2xpdFw3NS5kb2M=",
   "_score": 0.48895144,
   "_source": {
    "createddate": "2010-11-02T00:00:00-05:00",
    "modifieddate": "2007-09-04T00:00:00-05:00",
    "_id": "XFxxZWR0ZXN0ZG9jc1xTYW5kYm94XFNhbmRib3hBbGxcRGV4dGVyX2xpdFw3NS5kb2M="
   },
   "highlight": {
    "docbody.exact": ["Lorem ipsum dolor sit amet, consectetur adipiscing elit |primary-highlight|foo|/primary-highlight|Lorem ipsum dolor sit amet, consectetur adipiscing elit",
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit |secondary-highlight|bar|/secondary-highlight|TOTHE|primary-highlight|foo|/primary-highlight|Lorem ipsum dolor sit amet, consectetur adipiscing elit",
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit |secondary-highlight|bar|/secondary-highlight| Lorem ipsum dolor sit amet, consectetur adipiscing elit |primary-highlight|Chief|/primary-highlight| Lorem ipsum dolor sit amet, consectetur adipiscing elit"]
   }
  },
  ...
  ]
 }
}
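
To check which tag wrapped which term in a response like the one above, the highlight fragments can be scanned with a regular expression. The following is only a minimal sketch in Python: the function name tagged_terms is hypothetical, and the pattern assumes the pipe-delimited tag names used in this example rather than anything Elasticsearch defines.

```python
import re

# Matches |primary-highlight|term|/primary-highlight| (or the secondary pair).
# These tag names come from this example; they are not Elasticsearch defaults.
TAG_PATTERN = re.compile(r"\|(primary|secondary)-highlight\|(.*?)\|/\1-highlight\|")


def tagged_terms(fragments):
    """Return (tag_name, wrapped_text) pairs in the order they appear."""
    pairs = []
    for fragment in fragments:
        pairs.extend(TAG_PATTERN.findall(fragment))
    return pairs


# Shortened versions of the fragments shown in the response.
fragments = [
    "elit |primary-highlight|foo|/primary-highlight|Lorem ipsum",
    "elit |secondary-highlight|bar|/secondary-highlight|TOTHE|primary-highlight|foo|/primary-highlight|Lorem",
]
print(tagged_terms(fragments))
```

Listing the pairs this way makes it easy to compare runs after reordering the query clauses.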

Which tag wraps which hit depends on the order of the tags and the order of the search terms. Switching the order of "foo" and "bar" while leaving everything else the same results in bar being wrapped in the primary tag and foo in the secondary tag.
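
The ordering rule can be modelled in a few lines of Python. This is a sketch of the behaviour observed in these experiments, not the highlighter's actual implementation; assign_tags is a hypothetical helper, and cycling back to the first tag pair when terms outnumber tags is an assumption drawn from the three-term experiment described in this answer.

```python
def assign_tags(terms, pre_tags, post_tags):
    """Wrap each term in the tag pair at the same position as the term.

    Assumption: when there are more terms than tag pairs, assignment cycles
    back to the first pair, matching what the experiments here observed.
    """
    wrapped = {}
    for position, term in enumerate(terms):
        index = position % len(pre_tags)
        wrapped[term] = pre_tags[index] + term + post_tags[index]
    return wrapped


pre = ["|primary-highlight|", "|secondary-highlight|"]
post = ["|/primary-highlight|", "|/secondary-highlight|"]

# Same tags, search terms in swapped order.
print(assign_tags(["foo", "bar"], pre, post))
print(assign_tags(["bar", "foo"], pre, post))
```

Swapping the term order is the only difference between the two calls, so any change in the wrapping comes from position alone.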

From some initial experiments using three search terms with two tags, it seems that the third term gets wrapped in the first tag rather than the second. Adding a third tag resolves that, though it means repeating the secondary tag for every search term after the first.

"highlight": {
 "pre_tags": ["|primary-highlight|",
 "|secondary-highlight|",
 "|secondary-highlight|"],
 "post_tags": ["|/primary-highlight|",
 "|/secondary-highlight|",
 "|/secondary-highlight|"],
 "fields": {
  "docbody.exact": {
   "fragment_size": 150,
   "number_of_fragments": 3
  }
 }
},
...
"query": {
 "bool": {
  "should": [{
   "match": {
    "docbody.exact": {
     "query": "foo"
    }
   }
  },
  {
   "match": {
    "docbody.exact": {
     "query": "bar"
    }
   }
  },
  {
   "match": {
    "docbody.exact": {
     "query": "baz"
    }
   }
  }]
 }
}
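
Repeating the secondary tag by hand gets tedious as the number of terms grows, so the request body can be generated instead. The sketch below is one possible way to do that in Python: build_search_body and its parameters are hypothetical names, and it only builds the JSON body, leaving the HTTP call to whatever client you use.

```python
import json


def build_search_body(field, terms, primary, secondary):
    """Build a search body where the first term gets the primary tag pair
    and every later term gets the secondary pair.

    primary and secondary are (pre_tag, post_tag) tuples.
    """
    extra = max(len(terms) - 1, 0)  # secondary pairs needed after the first term
    return {
        "highlight": {
            "pre_tags": [primary[0]] + [secondary[0]] * extra,
            "post_tags": [primary[1]] + [secondary[1]] * extra,
            "fields": {field: {"fragment_size": 150, "number_of_fragments": 3}},
        },
        "query": {
            "bool": {
                "should": [{"match": {field: {"query": term}}} for term in terms]
            }
        },
    }


body = build_search_body(
    "docbody.exact",
    ["foo", "bar", "baz"],
    ("|primary-highlight|", "|/primary-highlight|"),
    ("|secondary-highlight|", "|/secondary-highlight|"),
)
print(json.dumps(body, indent=1))
```

Keeping the tag lists and the should clauses derived from the same terms list means their lengths cannot drift apart.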

