Writing an inverted index in C# for an information retrieval application

Problem description

I am writing an in-house application that holds several pieces of text, along with a number of pieces of data about each piece of text. This data will be held in a database (SQL Server, although this could change) in order of entry.

I'd like to be able to search these pieces of information and have the most relevant results appear at the top. I originally looked into using SQL Server Full-Text Search, but it's not as flexible for my other needs as I had hoped, so it seems that I'll need to develop my own solution.

From what I understand, what I need is an inverted index whose contents can then be restored and modified based on the additional information held. That part can wait for a later date, though: for now I just want the inverted index to index the main text from the database table/strings provided.

I've had a crack at writing this code in Java, using a Hashtable with the words as keys and a list of each word's occurrences as values, but in all honesty I'm still rather new to C# and have only really used things like DataSets and DataTables when handling information. If requested, I'll upload the Java code once I've cleared this laptop of viruses.

If given a set of entries from a table or from a List of Strings, how could one create an inverted index in C# that will preferably save into a DataSet/DataTable?

EDIT: I forgot to mention that I have already tried Lucene and Nutch, but require my own solution as modifying Lucene to meet my needs would take far longer than writing an inverted index. I'll be handling a lot of meta-data that'll also need handling once the basic inverted index is completed, so all I require for now is a basic full-text search on one area using the inverted index. Finally, working on an inverted index isn't something I get to do every day so it'd be great to have a crack at it.

Solution

Here's a rough overview of an approach I've used successfully in C# in the past:

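 struct WordInfo
 {
     public int position;   // position reported by GetNextWord() for this occurrence
     public int fieldID;    // ID of the database field the word was found in
 }

 // Maps each word to the list of places where it occurs.
 Dictionary<string, List<WordInfo>> invertedIndex = new Dictionary<string, List<WordInfo>>();

 public void BuildIndex()
 {
     foreach (int fieldID in GetDatabaseFieldIDS())
     {
         string textField = GetDatabaseTextFieldForID(fieldID);
         string word;
         int position = 0;

         while (GetNextWord(textField, out word, ref position))
         {
             // Fetch the occurrence list for this word, creating it the first time the word is seen.
             List<WordInfo> occurrences;
             if (!invertedIndex.TryGetValue(word, out occurrences))
             {
                 occurrences = new List<WordInfo>();
                 invertedIndex.Add(word, occurrences);
             }

             WordInfo wi = new WordInfo();
             wi.position = position;
             wi.fieldID = fieldID;
             occurrences.Add(wi);
         }
     }
 }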

Notes:

GetNextWord() iterates through the field and returns the next word and its position. To implement it, look at string.IndexOf() and the char character-classification methods (char.IsLetter(), char.IsLetterOrDigit(), etc.).
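A minimal sketch of how GetNextWord() might look, under some assumptions the answer does not spell out: a word is any run of letters or digits, words are lower-cased before being returned, and position is left just past the word that was returned. The Tokenizer class name is hypothetical.

```csharp
using System;

public static class Tokenizer
{
    // Hypothetical implementation: scans `text` starting at `position`,
    // returns the next run of letters/digits in `word`, and leaves
    // `position` just past that word. Returns false when no words remain.
    public static bool GetNextWord(string text, out string word, ref int position)
    {
        // Skip separators (anything that is not a letter or digit).
        while (position < text.Length && !char.IsLetterOrDigit(text[position]))
        {
            position++;
        }

        int start = position;

        // Consume the word itself.
        while (position < text.Length && char.IsLetterOrDigit(text[position]))
        {
            position++;
        }

        if (position == start)
        {
            // Reached the end of the text without finding another word.
            word = string.Empty;
            return false;
        }

        // Lower-case so that "Index" and "index" share one dictionary key.
        word = text.Substring(start, position - start).ToLowerInvariant();
        return true;
    }
}
```

With this convention, the value stored in WordInfo.position is the offset just after the word; the start offset can be recovered as position - word.Length if that is more useful for highlighting or phrase matching.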

GetDatabaseTextFieldForID() and GetDatabaseFieldIDS() are self-explanatory; implement them as required.
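The question also asks about keeping the index in a DataSet/DataTable, which the approach above does not cover. One possible way, sketched here as an assumption rather than part of the original answer, is to flatten the dictionary into one row per occurrence. The IndexExport class and the table and column names are made up for illustration, and WordInfo is repeated only so the snippet compiles on its own.

```csharp
using System.Collections.Generic;
using System.Data;

public struct WordInfo
{
    public int position;
    public int fieldID;
}

public static class IndexExport
{
    // Flattens the in-memory index into one row per (word, fieldID, position).
    // Table and column names are illustrative assumptions.
    public static DataTable ToDataTable(Dictionary<string, List<WordInfo>> invertedIndex)
    {
        DataTable table = new DataTable("InvertedIndex");
        table.Columns.Add("Word", typeof(string));
        table.Columns.Add("FieldID", typeof(int));
        table.Columns.Add("Position", typeof(int));

        foreach (KeyValuePair<string, List<WordInfo>> entry in invertedIndex)
        {
            foreach (WordInfo wi in entry.Value)
            {
                table.Rows.Add(entry.Key, wi.fieldID, wi.position);
            }
        }

        return table;
    }
}
```

The resulting table can be added to a DataSet or written back to the database with the usual ADO.NET tools. Lookups are likely to be faster against the Dictionary, so the DataTable is probably best treated as a storage format rather than as the structure to search.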


11-03 11:45