Question

I am trying to make a full-text query with Hibernate Search 5.5.0.Final. (I have already tried the most recent version, but it doesn't work, maybe because of the old version of Hibernate I'm using, 5.0.12.)

The final result that I would like to obtain is the following:

Display at the top of the list the results that match on the description field, using the following logic (let's assume a user is searching for "Milk"):
    - Results having the word at the beginning (Milk UHT)
    - Results having the word in second or third position (Chocolate Milk)
    - Results having the word in a phrase (MilkShake)
Then display the results that match on the tags field (Lactose free, Gluten Free, etc.)

This is what I've done so far:

    FullTextEntityManager fullTextEntityManager
            = Search.getFullTextEntityManager(entityManager);
    fullTextEntityManager.createIndexer().startAndWait();


    FullTextEntityManager fullTextEntityManager2
            = Search.getFullTextEntityManager(entityManager);

    QueryBuilder queryBuilder = fullTextEntityManager2.getSearchFactory()
            .buildQueryBuilder()
            .forEntity(ProductEntity.class)
            .get();


    Query myQuery = queryBuilder
            .bool()
            .should(queryBuilder.keyword()
                    .onField("description").boostedTo(9l).matching(query)
                    .createQuery())
            .should(queryBuilder.phrase()
                    .onField("description").boostedTo(5l).sentence(query)
                    .createQuery())

            .should(queryBuilder.keyword()
                    .onField("tags").boostedTo(3l).matching(query)
                    .createQuery())
            .should(queryBuilder.phrase()
                    .onField("tags").boostedTo(1l).sentence(query)
                    .createQuery())

            .createQuery();


    org.hibernate.search.jpa.FullTextQuery jpaQuery
            = fullTextEntityManager.createFullTextQuery(myQuery, ProductEntity.class);

    return jpaQuery.getResultList();

I've been reading a lot on the internet, but I still cannot get the desired result. Is this even possible? Can you give me a hint?

Thanks in advance.

Recommended answer

First, know that the boost is not a constant weight assigned to each query; rather, it's a multiplier. So when you set the boost to 1 on query #4 (the phrase query on tags) and to 3 on query #3 (the keyword query on tags), it's theoretically possible that query #4 ends up with a higher "boosted score" if its base score is more than three times that of query #3. To avoid that kind of problem, you can mark the score of each query as constant (use .boostedTo(3l).withConstantScore().onField("tags") instead of .onField("tags").boostedTo(3l)).
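To make the multiplier effect concrete, here is a minimal arithmetic sketch in plain Java. The base scores are made-up numbers chosen for illustration, not values produced by Lucene, and the model is deliberately simplified (it ignores normalization and coordination factors). The class and method names are hypothetical.

```java
// Illustrative arithmetic only: the base scores used here are invented,
// and real Lucene scoring involves additional factors.
final class BoostDemo {

    /** Plain boost: the final score is the base score multiplied by the boost. */
    static double boosted(double baseScore, double boost) {
        return baseScore * boost;
    }

    /** Constant score (simplified): the base score is ignored, only the boost counts. */
    static double constantScore(double boost) {
        return boost;
    }

    public static void main(String[] args) {
        double query3Base = 1.0; // keyword query on tags, boost 3
        double query4Base = 4.0; // phrase query on tags, boost 1

        // Plain boosts: query #4 wins despite its lower boost,
        // because its base score is more than three times higher.
        System.out.println("query #3 boosted: " + boosted(query3Base, 3.0));
        System.out.println("query #4 boosted: " + boosted(query4Base, 1.0));

        // Constant scores: the ordering now follows the boosts alone.
        System.out.println("query #3 constant: " + constantScore(3.0));
        System.out.println("query #4 constant: " + constantScore(1.0));
    }
}
```

With constant scores, the relative order of the clauses depends only on the boosts you chose, which is what the tiered ranking in the question needs.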

Second, the phrase query is not what you think it is. The phrase query accepts a multi-term input string, and will look for documents that contain these terms in the same order. Since you passed a single term, it's pointless. So you need something else.

Query 1: Results having the word at the beginning (Milk UHT)

I believe the only way to do exactly what you want is to use span queries. However, they are not part of the Hibernate Search DSL, so you'll have to rely on low-level Lucene APIs. What's more, I've never used them, and I'm not sure how they are supposed to be used... What little I know was taken from Elasticsearch's documentation, but the Lucene documentation is severely lacking.

You can try something like this, but if it doesn't work you'll have to debug it yourself (I don't know more than you do):

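    // Imports needed by this snippet and by the helper methods below
    import java.io.IOException;
    import java.io.Reader;
    import java.io.StringReader;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BoostQuery;
    import org.apache.lucene.search.ConstantScoreQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.spans.SpanPositionRangeQuery;
    import org.apache.lucene.search.spans.SpanTermQuery;
    import org.hibernate.search.query.dsl.BooleanJunction;
    import org.hibernate.search.query.dsl.QueryBuilder;

    QueryBuilder queryBuilder = fullTextEntityManager2.getSearchFactory()
            .buildQueryBuilder()
            .forEntity(ProductEntity.class)
            .get();
    // Analyzer configured for the entity, used to turn the user input into index terms
    Analyzer analyzer = fullTextEntityManager2.getSearchFactory()
            .getAnalyzer(ProductEntity.class);

    Query myQuery = queryBuilder
            .bool()
            // Constant score, so that only the boost (9) decides the weight of this clause
            .should(new BoostQuery(new ConstantScoreQuery(createSpanQuery(queryBuilder, "description", query, analyzer)), 9f))
            // [... add other clauses here...]
            .createQuery();

// Other methods (to be added to the same class)

    private static Query createSpanQuery(QueryBuilder qb, String fieldName, String searchTerms, Analyzer analyzer) {
        BooleanJunction<?> bool = qb.bool();
        List<String> terms = analyze(fieldName, searchTerms, analyzer);
        for (int i = 0; i < terms.size(); ++i) {
            // The i-th search term must sit at the i-th position of the field.
            // The end bound is checked against the span's exclusive end position, hence i + 1.
            bool.must(new SpanPositionRangeQuery(new SpanTermQuery(new Term(fieldName, terms.get(i))), i, i + 1));
        }
        return bool.createQuery();
    }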

    private static List<String> analyze(String fieldName, String searchTerms, Analyzer analyzer) {
        List<String> terms = new ArrayList<String>();
        try {
            final Reader reader = new StringReader( searchTerms );
            final TokenStream stream = analyzer.tokenStream( fieldName, reader );
            try {
                CharTermAttribute attribute = stream.addAttribute( CharTermAttribute.class );
                stream.reset();
                while ( stream.incrementToken() ) {
                    if ( attribute.length() > 0 ) {
                        String term = new String( attribute.buffer(), 0, attribute.length() );
                        terms.add( term );
                    }
                }
                stream.end();
            }
            finally {
                stream.close();
            }
        }
        catch (IOException e) {
            throw new IllegalStateException( "Unexpected exception while analyzing search terms", e );
        }
        return terms;
    }

Query 2: Results having the word in second or third position

I believe you can use the same code as for query 1, but adding an offset. If the actual position doesn't matter, and you'll accept words in fourth or fifth position, you can simply do this:

    queryBuilder.keyword().boostedTo(5l).withConstantScore()
            .onField("description").matching(query)
            .createQuery()
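If the position does matter, the "offset" idea can be pictured as a window over token positions. The sketch below is a plain-Java model of that check, not Lucene code: it assumes 0-based positions, an exclusive end bound, and a naive whitespace tokenizer standing in for the real analyzer. The class and method names are hypothetical. Under those assumptions, "second or third position" corresponds to the window [1, 3), which would presumably map to a SpanPositionRangeQuery with start 1 and end 3.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Plain-Java model of a position window; it does not use Lucene at all.
final class PositionWindowDemo {

    /** Naive stand-in for an analyzer: lowercase, then split on whitespace. */
    static List<String> tokens(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    /** True if the term occurs at a 0-based position p with start <= p < end. */
    static boolean matchesInWindow(String text, String term, int start, int end) {
        List<String> toks = tokens(text);
        String needle = term.toLowerCase(Locale.ROOT);
        for (int p = Math.max(start, 0); p < Math.min(end, toks.size()); p++) {
            if (toks.get(p).equals(needle)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Query 1: the word in first position, window [0, 1)
        System.out.println(matchesInWindow("Milk UHT", "milk", 0, 1));
        // Query 2: the word in second or third position, window [1, 3)
        System.out.println(matchesInWindow("Chocolate Milk", "milk", 1, 3));
        // "Chocolate Milk" does not have the word in first position
        System.out.println(matchesInWindow("Chocolate Milk", "milk", 0, 1));
    }
}
```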

Query 3: Results having the word in a phrase(MilkShake)

From what I understand, you mean "results containing a word that contains the search term".

You could use wildcard queries for that, but unfortunately these queries do not apply analyzers, resulting in case-sensitive search (among other problems).

Your best bet is probably to define a separate field for this query, e.g. description_ngram, and assign a specially crafted analyzer to it, one which uses the ngram tokenizer when indexing. The ngram tokenizer simply takes an input string and transforms it into all its substrings: "milkshake" would become ["m", "mi", "mil", "milk", ..., "milkshake", "i", "il", "ilk", "ilks", "ilksh", ... "ilkshake", "l", ... "lkshake", ..., "ke", "e"]. Obviously it takes a lot of disk space, but it can work for small-ish datasets. You will find instructions for a similar use case here. The answer mentions a different analyzer, "edgengram", but in your case you'll really want to use the "ngram" analyzer.
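To get a feel for how many tokens this produces, here is a small plain-Java sketch that enumerates every substring of a word, as described above. It is only an illustration of the idea: it does not use Lucene, the class and method names are hypothetical, and a real ngram tokenizer is normally configured with minimum and maximum gram sizes rather than emitting every length.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java illustration of "all substrings"; not the Lucene tokenizer itself.
final class NgramDemo {

    /** All contiguous substrings of the lowercased input, by start index, then by length. */
    static List<String> allNgrams(String input) {
        String s = input.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < s.length(); start++) {
            for (int end = start + 1; end <= s.length(); end++) {
                grams.add(s.substring(start, end));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        List<String> grams = allNgrams("MilkShake");
        // A 9-letter word yields 9 * 10 / 2 = 45 substrings
        System.out.println(grams.size());
        // The search term "milk" is one of the indexed tokens
        System.out.println(grams.contains("milk"));
        System.out.println(grams.subList(0, 4));
    }
}
```

The quadratic growth in token count per word is the reason for the disk-space warning above.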

Alternatively, if you're sure the indexed text is correctly formatted to clearly separate components of a "composite" word (e.g. "milk-shake", "MilkShake", ...), you can simply create a field (e.g. description_worddelimiterfilter) that uses an analyzer with a word-delimiter filter (see org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter) which will split these composite words. Then you can simply query like this:

queryBuilder.keyword().boostedTo(3l).withConstantScore()
        .onField("description_worddelimiterfilter")
        .matching(query)
        .createQuery()
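For intuition about what such a filter does to composite words, here is a rough plain-Java approximation that splits on lower-to-upper case changes and on non-alphanumeric characters. It is a simplified sketch with hypothetical names, not the Lucene WordDelimiterFilter, which has many more options and rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Rough approximation of word-delimiter splitting; not the Lucene filter itself.
final class WordDelimiterDemo {

    /** Splits on case changes (lower to upper) and on non-alphanumeric characters. */
    static List<String> splitCompound(String word) {
        // Insert a space at every lowercase-to-uppercase boundary: "MilkShake" -> "Milk Shake"
        String spaced = word.replaceAll("(?<=\\p{Ll})(?=\\p{Lu})", " ");
        List<String> parts = new ArrayList<>();
        for (String part : spaced.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                parts.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitCompound("MilkShake"));
        System.out.println(splitCompound("milk-shake"));
        // No delimiter and no case change: the word is left whole
        System.out.println(splitCompound("milkshake"));
    }
}
```

The last call shows why this approach depends on the indexed text being formatted consistently: "milkshake" written in a single lowercase word gives the filter nothing to split on, so a search for "milk" would not match it.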
