How to use a .cfs index generated by Compass on top of Lucene?

Question

With (the latest) Lucene 8.7, is it possible to use the Lucene utility "Luke" to open a .cfs compound index file generated around 2009 by Lucene 2.2, in a legacy application that I cannot modify? Alternatively, would it be possible to generate the .idx file for Luke from the .cfs? The .cfs was generated by Compass on top of Lucene 2.2, not by Lucene directly. Is it possible to use a Compass-generated index containing:

_b.cfs
segments.gen
segments_d

Possibly by using Solr?

Are there any examples anywhere of how to open a file-based .cfs index with Compass?

The conversion tool won't work because the index version is too old:

From lucene\build\demo:

java -cp ../core/lucene-core-8.7.0-SNAPSHOT.jar;../backward-codecs/lucene-backward-codecs-8.7.0-SNAPSHOT.jar org.apache.lucene.index.IndexUpgrader -verbose path_of_old_index

and the SearchFiles demo:

java -classpath ../core/lucene-core-8.7.0-SNAPSHOT.jar;../queryparser/lucene-queryparser-8.7.0-SNAPSHOT.jar;./lucene-demo-8.7.0-SNAPSHOT.jar org.apache.lucene.demo.SearchFiles -index path_of_old_index

both fail with:

org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported. This version of Lucene only supports indexes created with release 6.0 and later.

Is it possible to use an old index with Lucene somehow? How can the old "codec" be used? Also from Lucene.Net, if possible?

The current Lucene 8.7 yields an index containing these files:

segments_1
write.lock
_0.cfe
_0.cfs
_0.si

==========================================================================
Update: amazingly, Lucene.Net v3.0.3 from NuGet seems to open that very old format index!

This seems to work for extracting all terms from the index:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using System.Globalization;

    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.QueryParsers;
    using Lucene.Net.Search;
    using Lucene.Net.Store;
    using Version = Lucene.Net.Util.Version;

    namespace ConsoleApplication1
    {
        class Program
        {
            static void Main()
            {

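                // Open the legacy index; the second argument requests a read-only reader.
                var reader = IndexReader.Open(FSDirectory.Open("C:\\Temp\\ftsemib_opzioni\\v210126135604\\index\\search_0"), true);
                Console.WriteLine("number of documents: "+reader.NumDocs() + "\n");
                Console.ReadLine();

                // Walk every term in the index and print its field name and text.
                TermEnum terms = reader.Terms();
                while (terms.Next())
                {
                    Term term = terms.Term;
                    String termField = term.Field;
                    String termText = term.Text;
                    // Number of documents containing this term (computed but not printed).
                    int frequency = reader.DocFreq(term);
                    Console.WriteLine(termField +" "+termText);
                }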
                var fieldNames = reader.GetFieldNames(IndexReader.FieldOption.ALL);
                int numFields = fieldNames.Count;
                Console.WriteLine("number of fields: " + numFields + "\n");
                for (IEnumerator<String> iter = fieldNames.GetEnumerator(); iter.MoveNext();)
                {
                    String fieldName = iter.Current;
                    Console.WriteLine("field: " + fieldName);
                }
                reader.Close();

                Console.ReadLine();
            }
        }

    }

Out of curiosity, is it possible to find out which index version it is? Are there any examples of (old) Compass with a file-system-based index?

Recommended answer

Unfortunately, you can't use an old Codec to access index files from Lucene 2.2, because codecs were only introduced in Lucene 4.0. Before that, the code for reading and writing index files was not grouped into a codec; it was simply part of the overall Lucene library.

So in versions of Lucene prior to 4.0 there is no codec, just file reading and writing code baked into the library. It would be very difficult to track down all that code and create a codec that could be plugged into a modern version of Lucene. It's not an impossible task, but it would require an expert Lucene developer and a large amount of effort (i.e. an extremely expensive endeavor).
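As a rough way to see which side of the 4.0 boundary an index falls on, one can look at the first four bytes of its segments_N file (segments_d in the question). The sketch below uses only the Java standard library and rests on two assumptions to verify against the Lucene file-format documentation for your version: Lucene 4.0 and later start the file with the codec magic number 0x3fd76c17, while earlier releases start it with a negative format number (from memory, -3 for Lucene 2.1 and 2.2). The class and method names are made up for illustration and are not part of Lucene.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Lucene: inspects the start of a segments_N file.
class SegmentsHeader {

    // Assumed value of the magic number that Lucene 4.0+ writes at the start of codec files.
    static final int CODEC_MAGIC = 0x3fd76c17;

    // Reads the first 32-bit integer of the file, high-order byte first.
    static int readFirstInt(Path segmentsFile) throws IOException {
        try (InputStream in = Files.newInputStream(segmentsFile);
             DataInputStream data = new DataInputStream(in)) {
            return data.readInt();
        }
    }

    // Turns the first integer into a coarse, human-readable classification.
    static String describe(int firstInt) {
        if (firstInt == CODEC_MAGIC) {
            return "codec header found: written by Lucene 4.0 or later";
        }
        if (firstInt < 0) {
            return "pre-4.0 segments file, format number " + firstInt;
        }
        return "unrecognized header value " + firstInt;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SegmentsHeader.java <path to segments_N>");
            return;
        }
        Path segmentsFile = Paths.get(args[0]);
        System.out.println(describe(readFirstInt(segmentsFile)));
    }
}
```

Running it against the segments_d file of the old index would print the raw format number, which can then be compared with the format table in the file-format documentation of the matching Lucene release. It only reads the file and does not modify the index.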

In light of all that, the answer to this SO question may be of some use: How to upgrade lucene files from 2.2 to 4.3.1

Your best bet would be to use an old 3.x copy of Java Lucene, or Lucene.Net 3.0.3, to open the index, then add and commit one document (which creates a second segment) and run Optimize, which merges the two segments into one new segment. The new segment will be a version 3 segment. You can then repeat the same procedure with Lucene.Net 4.8 Beta or Java Lucene 4.x (where Optimize has been renamed ForceMerge) to convert the index to a 4.x index.

Then you can use the current Java version of Lucene 8.x to do this once more and move the index all the way up to 8, since the current version of Java Lucene has codecs reaching all the way back to 5.0; see: https://github.com/apache/lucene-solr/tree/master/lucene/core/src/java/org/apache/lucene/codecs

However, if you again receive the IndexFormatTooOldException that you reported, then you will have to play this game for one more cycle with a 6.x version of Java Lucene to get from a 5.x index to a 6.x index. :-)

