How do I use Lucene's PriorityQueue when I don't know the max size at creation time?

Question

I built a custom collector for Lucene.Net, but I can't figure out how to order (or page) the results. Every time Collect gets called, I can add the result to an internal PriorityQueue, which I understand is the correct way to do this.

I extended PriorityQueue, but it requires a size parameter on creation. You have to call Initialize in the constructor and pass in the max size.

However, in a collector, the searcher just calls Collect whenever it gets a new result, so I don't know how many results there will be when I create the PriorityQueue. Because of this, I can't figure out how to make the PriorityQueue work.

I realize I'm probably missing something simple here...

Recommended answer

PriorityQueue is not a SortedList or a SortedDictionary. It is a kind of sorting implementation that returns the top M results (M being your PriorityQueue's size) out of N elements. You can add as many items as you want with InsertWithOverflow, but the queue will hold only the top M elements.
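
To make the "top M of N" behaviour concrete, here is a minimal sketch that does not use Lucene.Net at all. It assumes .NET 6 or later and relies on the BCL's System.Collections.Generic.PriorityQueue (a different class from Lucene.Net.Util.PriorityQueue) to mimic what InsertWithOverflow does. The names TopM and Largest are made up for this illustration, and plain integers stand in for ranked hits.

```csharp
using System;
using System.Collections.Generic;

public static class TopM
{
    // Returns the m largest values, best first, while never storing more than m items.
    public static List<int> Largest(IEnumerable<int> values, int m)
    {
        // Min-heap: the smallest value currently retained is always at the root.
        var heap = new PriorityQueue<int, int>();

        foreach (var v in values)
        {
            if (heap.Count < m)
            {
                heap.Enqueue(v, v);            // still room, keep everything
            }
            else if (m > 0 && v > heap.Peek())
            {
                heap.EnqueueDequeue(v, v);     // full: add v, evict the current smallest
            }
            // otherwise v is no better than what we already hold, so drop it
        }

        // Dequeue yields ascending order; reverse to get best first.
        var result = new List<int>(heap.Count);
        while (heap.Count > 0)
        {
            result.Add(heap.Dequeue());
        }
        result.Reverse();
        return result;
    }
}
```

Calling TopM.Largest(new[] { 5, 1, 9, 3, 7, 2 }, 3) yields 9, 7, 5. However many values are fed in, memory use stays bounded by m, which is why the queue needs its size up front but not the number of hits.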

Suppose your search resulted in 1,000,000 hits. Would you return all of the results to the user? A better way is to return the top 10 elements (using PriorityQueue(10)). If the user then requests the next 10 results, you run a new search with PriorityQueue(20) and return elements 11 to 20, and so on. This is the trick most search engines, such as Google, use.
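
The paging trick can be sketched the same way, again without Lucene.Net: size the queue to page * pageSize, then throw away everything that belongs to earlier pages. This is only an illustration under assumptions: it needs .NET 6 or later for the BCL PriorityQueue, the names Pager and GetPage are hypothetical, and integers stand in for scored hits.

```csharp
using System;
using System.Collections.Generic;

public static class Pager
{
    // Returns the hits for a 1-based page, best first.
    public static List<int> GetPage(IEnumerable<int> scores, int page, int pageSize)
    {
        if (page < 1) throw new ArgumentOutOfRangeException(nameof(page));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));

        // Page 1 needs a queue of 10, page 2 a queue of 20, and so on.
        int capacity = page * pageSize;
        var heap = new PriorityQueue<int, int>();   // min-heap of the best hits so far

        foreach (var s in scores)
        {
            if (heap.Count < capacity)
                heap.Enqueue(s, s);
            else if (s > heap.Peek())
                heap.EnqueueDequeue(s, s);          // evict the weakest retained hit
        }

        // Drain in ascending order, then flip so the best hit comes first.
        var best = new List<int>(heap.Count);
        while (heap.Count > 0) best.Add(heap.Dequeue());
        best.Reverse();

        // Skip the hits that were already shown on earlier pages.
        int skip = (page - 1) * pageSize;
        if (skip >= best.Count) return new List<int>();
        return best.GetRange(skip, Math.Min(pageSize, best.Count - skip));
    }
}
```

With scores 1 through 100, page 2 with a page size of 10 comes back as 90 down to 81. The cost of deep pages grows with the page number, which is acceptable because users rarely go far past the first few pages.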

"Every time Commit gets called, I can add the result to an internal PriorityQueue."

I cannot understand the relationship between Commit and search, so I will append a sample usage of PriorityQueue:

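using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;

public class CustomQueue : Lucene.Net.Util.PriorityQueue<Document>
{
    public CustomQueue(int maxSize) : base()
    {
        // The queue never holds more than maxSize documents.
        Initialize(maxSize);
    }

    // Returns true when a ranks below b. The lowest-ranked document is
    // the one dropped first once the queue is full.
    public override bool LessThan(Document a, Document b)
    {
        // Assumption for illustration: order by the stored string field "field1".
        // Replace this with whatever comparison defines your ranking.
        return string.CompareOrdinal(a.Get("field1"), b.Get("field1")) < 0;
    }
}

public class MyCollector : Lucene.Net.Search.Collector
{
    CustomQueue _queue = null;
    IndexReader _currentReader;

    public MyCollector(int maxSize)
    {
        _queue = new CustomQueue(maxSize);
    }

    // The queue does the ranking, so hits may arrive in any order.
    public override bool AcceptsDocsOutOfOrder()
    {
        return true;
    }

    // Called once per hit; doc is relative to the current reader.
    public override void Collect(int doc)
    {
        // Keeps the document only if it ranks among the top maxSize so far.
        _queue.InsertWithOverflow(_currentReader.Document(doc));
    }

    public override void SetNextReader(IndexReader reader, int docBase)
    {
        _currentReader = reader;
    }

    public override void SetScorer(Scorer scorer)
    {
        // Scores are not used in this example.
    }
}

// Usage: each page re-runs the search with a larger queue.
searcher.Search(query, new MyCollector(10)); // 1st page: top 10
searcher.Search(query, new MyCollector(20)); // 2nd page: top 20, show hits 11-20
searcher.Search(query, new MyCollector(30)); // 3rd page: top 30, show hits 21-30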

EDIT for @nokturnal

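using System;
using System.Collections.Generic;
using Lucene.Net.Documents;

// A reusable queue that ranks items by a key extracted with keySelector.
public class MyPriorityQueue<TObj, TComp> : Lucene.Net.Util.PriorityQueue<TObj>
                                where TComp : IComparable<TComp>
{
    Func<TObj, TComp> _KeySelector;

    public MyPriorityQueue(int size, Func<TObj, TComp> keySelector) : base()
    {
        _KeySelector = keySelector;
        Initialize(size);
    }

    public override bool LessThan(TObj a, TObj b)
    {
        return _KeySelector(a).CompareTo(_KeySelector(b)) < 0;
    }

    // Note: enumerating Items empties the queue. Each Pop() removes the
    // current least element, so items come out in ascending key order.
    public IEnumerable<TObj> Items
    {
        get
        {
            int size = Size();
            for (int i = 0; i < size; i++)
                yield return Pop();
        }
    }
}

// Usage: keep the top 3 documents ordered by the value of "SomeField".
var pq = new MyPriorityQueue<Document, string>(3, doc => doc.GetField("SomeField").StringValue);
// Add documents with pq.InsertWithOverflow(...) before enumerating.
foreach (var item in pq.Items)
{
    // Process each retained document here.
}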


05-19 11:44