This article covers how to ignore HTML tags in SQL Server 2008 full-text search. It may be a useful reference for anyone working through the same problem.

Question


I'm working on a knowledge base project using the SQL Server 2008 full-text search engine. The project consists of articles and files, where each article has multiple files. The entire content of each article is pure HTML.

So far, I have successfully created a full-text catalog and index on SQL Server 2008, and my database is version 10 compatible.

Here are my questions:

1) Is it possible to ignore HTML tags, or more precisely any text inside "<...>", when searching these articles? If I search for div, table, etc., no results should be returned.

2) Articles can be updated at any time, so the full-text index must be updated whenever a new record is inserted. Is it enough to set only "TRACK CHANGES AUTOMATIC" while creating the full-text catalog?

3) We may use the FILESTREAM feature later. Does SQL Server 2008 perform well on files that use a full-text index? Which specific document types is SQL Server 2008 good at indexing?

Regards

Solution

Please check the following:

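1) In SQL Server full-text search, you can define noise words (stopwords). In SQL Server 2008 these are managed through stoplists rather than the noise-word files used by earlier versions. After changing the stopwords, you have to rebuild the catalog (or repopulate the index) for the change to take effect. So you can add all the HTML tag names as stopwords. Please check: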

http://msdn.microsoft.com/en-us/library/ms142551.aspx

2) With change tracking, changes are automatically included in the current full-text index, but the ranking of newly added articles can differ from the previous ranking. So until your master index is synced, the ranking may move up and down.

3) As far as I know, you can implement custom filters, stemmers, and word breakers and plug them into SQL Server full-text search. I don't know the complete list of document types supported by default, but it does handle doc and pdf.
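
Points 1 and 2 above could be sketched in T-SQL roughly as follows. This is a minimal, untested sketch rather than a definitive setup: the table name `dbo.Articles` and the stoplist name `HtmlTagStoplist` are hypothetical, and it assumes the table already has a full-text index and that the database runs at compatibility level 100, which stoplists require.

```sql
-- Hypothetical names: dbo.Articles (a table that already has a full-text
-- index) and HtmlTagStoplist. Assumes database compatibility level 100.

-- 1) Create a stoplist that starts from the system stoplist.
CREATE FULLTEXT STOPLIST HtmlTagStoplist FROM SYSTEM STOPLIST;
GO

-- Add HTML tag names as stopwords (1033 is the LCID for English).
ALTER FULLTEXT STOPLIST HtmlTagStoplist ADD 'div'   LANGUAGE 1033;
ALTER FULLTEXT STOPLIST HtmlTagStoplist ADD 'table' LANGUAGE 1033;
ALTER FULLTEXT STOPLIST HtmlTagStoplist ADD 'span'  LANGUAGE 1033;
GO

-- Attach the stoplist to the existing full-text index. Unless
-- WITH NO POPULATION is specified, the index is repopulated.
ALTER FULLTEXT INDEX ON dbo.Articles SET STOPLIST HtmlTagStoplist;
GO

-- 2) Change tracking is a setting of the full-text index,
--    not of the full-text catalog.
ALTER FULLTEXT INDEX ON dbo.Articles SET CHANGE_TRACKING AUTO;
GO

-- Check the resulting settings.
SELECT OBJECT_NAME(object_id) AS table_name,
       change_tracking_state_desc,
       stoplist_id
FROM sys.fulltext_indexes
WHERE object_id = OBJECT_ID('dbo.Articles');
```

One trade-off to keep in mind: a stopword is ignored wherever it occurs, so adding `table` also makes the ordinary word "table" in the article text unsearchable, not just the `<table>` tag.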

For more information on SQL Server 2008 full-text search, please check:

http://technet.microsoft.com/en-us/library/cc721269.aspx
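
For point 3, the document types an instance can index can be listed instead of guessed. The sketch below queries the installed filters and then shows one possible way to index binary documents through a type column. The table `dbo.ArticleFiles`, its columns, and the catalog `ArticleCatalog` are hypothetical names used only for illustration, and the catalog is assumed to exist already.

```sql
-- List the file extensions for which a full-text filter is installed
-- on this instance.
SELECT document_type, path, manufacturer
FROM sys.fulltext_document_types
ORDER BY document_type;
GO

-- Hypothetical table: each file is stored as binary data, with its
-- extension in a separate column so the matching filter can be chosen.
CREATE TABLE dbo.ArticleFiles (
    FileId        INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_ArticleFiles PRIMARY KEY,
    FileExtension NVARCHAR(10)   NOT NULL,  -- e.g. N'.html', N'.doc'
    Content       VARBINARY(MAX) NOT NULL
);
GO

-- TYPE COLUMN names the column holding the extension, which determines
-- the filter used for each row.
CREATE FULLTEXT INDEX ON dbo.ArticleFiles
    (Content TYPE COLUMN FileExtension LANGUAGE 1033)
    KEY INDEX PK_ArticleFiles
    ON ArticleCatalog          -- hypothetical existing full-text catalog
    WITH CHANGE_TRACKING AUTO;
GO
```

If `.html` appears in the first query's output, HTML stored this way passes through the HTML filter, which as far as I know indexes the text content rather than the markup. That may be worth testing as an alternative to stopwords for question 1, although it is not what the answer above proposes. Support for formats such as PDF typically depends on a filter being installed on the server, so the query is the reliable way to check.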


09-17 05:43