CONTAINSTABLE full-text search does not return consistent results across languages

Problem description

I have a SQL Server 2016 database with full-text indexes defined on 4 columns, each configured for a different language: Dutch, English, German and French. I used the wizard to set up the full-text index.

I am querying with CONTAINSTABLE and FORMSOF. For each language, I would expect a query on either the word stem or any verb form to return both rows of the example table. This works in English and German, partly in French, and not at all in Dutch.

I am using a very basic example, forms of the verb 'to run' in each language, so I suspect something is not configured correctly.

Example table

+----+-------------+--------------+-----------------+----------------+
| ID | KeyWordsNL  |  KeyWordsEN  |   KeyWordsDE    |   KeyWordsFR   |
+----+-------------+--------------+-----------------+----------------+
|  1 | ik loop     | i run        | ich laufe       | je cours       |
|  2 | ik ga lopen | i am running | ich gehe laufen | je vais courir |
+----+-------------+--------------+-----------------+----------------+
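For anyone who wants to reproduce this, the setup can be sketched roughly as below. This is only a sketch: the question used the wizard, so the column types, the primary key name PK_SearchResult and the catalog name SearchCatalog are assumptions made for illustration. The LCIDs are the standard ones for Dutch (1043), English (1033), German (1031) and French (1036).

```sql
-- Hypothetical schema; only the table and column names come from the question.
CREATE TABLE dbo.SearchResult
(
    ID         INT           NOT NULL CONSTRAINT PK_SearchResult PRIMARY KEY,
    KeyWordsNL NVARCHAR(200) NULL,
    KeyWordsEN NVARCHAR(200) NULL,
    KeyWordsDE NVARCHAR(200) NULL,
    KeyWordsFR NVARCHAR(200) NULL
);
GO

INSERT INTO dbo.SearchResult (ID, KeyWordsNL, KeyWordsEN, KeyWordsDE, KeyWordsFR)
VALUES (1, N'ik loop',     N'i run',        N'ich laufe',       N'je cours'),
       (2, N'ik ga lopen', N'i am running', N'ich gehe laufen', N'je vais courir');
GO

-- Catalog name is an assumption.
CREATE FULLTEXT CATALOG SearchCatalog;
GO

-- One full-text index per table, with a separate LANGUAGE per column.
CREATE FULLTEXT INDEX ON dbo.SearchResult
(
    KeyWordsNL LANGUAGE 1043,  -- Dutch
    KeyWordsEN LANGUAGE 1033,  -- English
    KeyWordsDE LANGUAGE 1031,  -- German
    KeyWordsFR LANGUAGE 1036   -- French
)
KEY INDEX PK_SearchResult
ON SearchCatalog
WITH CHANGE_TRACKING AUTO;
GO
```

The index is populated asynchronously, so queries run immediately after creation may return nothing until population finishes.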

English queries

CONTAINSTABLE (SearchResult, KeyWordsEN, 'FORMSOF(INFLECTIONAL, "run")')
CONTAINSTABLE (SearchResult, KeyWordsEN, 'FORMSOF(INFLECTIONAL, "running")')

Each query returns records 1 and 2.
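The CONTAINSTABLE calls in this post are shown as fragments. A complete query would look roughly like the sketch below, assuming ID is the full-text key column of SearchResult (the question does not say which column is the key). The optional LANGUAGE argument is added here only to make the language explicit; without it, the language configured on the column is used.

```sql
-- Sketch: join the CONTAINSTABLE result back to the base table.
-- Assumes ID is the unique key column of the full-text index.
SELECT sr.ID,
       sr.KeyWordsEN,
       ft.[RANK]
FROM dbo.SearchResult AS sr
INNER JOIN CONTAINSTABLE(
               dbo.SearchResult,
               KeyWordsEN,
               'FORMSOF(INFLECTIONAL, "run")',
               LANGUAGE 1033          -- English; optional
           ) AS ft
    ON ft.[KEY] = sr.ID
ORDER BY ft.[RANK] DESC;
```

The queries for the other languages follow the same pattern with the column, search term and LCID swapped.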

German queries

CONTAINSTABLE (SearchResult, KeyWordsDE, 'FORMSOF(INFLECTIONAL, "laufe")')
CONTAINSTABLE (SearchResult, KeyWordsDE, 'FORMSOF(INFLECTIONAL, "laufen")')

Each query returns records 1 and 2.

French queries

CONTAINSTABLE (SearchResult, KeyWordsFR, 'FORMSOF(INFLECTIONAL, "cours")')
CONTAINSTABLE (SearchResult, KeyWordsFR, 'FORMSOF(INFLECTIONAL, "courir")')

The first query (cours) returns only record 1; the second query (courir) returns records 1 and 2.

Dutch queries

CONTAINSTABLE (SearchResult, KeyWordsNL, 'FORMSOF(INFLECTIONAL, "loop")')
CONTAINSTABLE (SearchResult, KeyWordsNL, 'FORMSOF(INFLECTIONAL, "lopen")')

The first query (loop) returns only record 1, and the second query (lopen) returns only record 2.

Edit: Further testing ...

You can test how full-text search parses an input query by using sys.dm_fts_parser. This makes it clear that no stemming happens at all for Dutch. I tested this on different machines.

Getting the language LCID:

select * from sys.fulltext_languages where name in ('Dutch','English','German','French')

select * from sys.dm_fts_parser('FORMSOF(INFLECTIONAL, "koe")', 1043, 0, 0)

select * from sys.dm_fts_parser('FORMSOF(INFLECTIONAL, "cow")', 1033, 0, 0)

The Dutch query returns only "koe", while the English query returns "cow's", "cowed", "cowing", "cows", "cows", "cow".

The same happens for every word I try: Dutch returns no extra forms of any word, while English typically returns 5-10 word forms.
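To compare all four languages side by side, the parser calls can be combined into one result set. This is a sketch using the verbs from the example table; the language_name labels are added for readability and are not part of the parser output. As far as I know, sys.dm_fts_parser requires membership in the sysadmin role.

```sql
-- Sketch: show which word forms each language's stemmer generates.
-- In the documented output, expansion_type = 2 marks an inflectional
-- expansion and 0 marks the term as it was typed.
SELECT 'Dutch' AS language_name, display_term, expansion_type, source_term
FROM sys.dm_fts_parser(N'FORMSOF(INFLECTIONAL, "lopen")', 1043, 0, 0)
UNION ALL
SELECT 'English', display_term, expansion_type, source_term
FROM sys.dm_fts_parser(N'FORMSOF(INFLECTIONAL, "running")', 1033, 0, 0)
UNION ALL
SELECT 'German', display_term, expansion_type, source_term
FROM sys.dm_fts_parser(N'FORMSOF(INFLECTIONAL, "laufen")', 1031, 0, 0)
UNION ALL
SELECT 'French', display_term, expansion_type, source_term
FROM sys.dm_fts_parser(N'FORMSOF(INFLECTIONAL, "courir")', 1036, 0, 0);
```

A language that contributes only a single row containing the search term itself is not being stemmed.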

Solution

I found that there is simply no dedicated stemming library for Dutch (or for some other languages). This is not clearly stated anywhere, but this article explains how to revert the word breaker and stemmer to their previous versions, and it appears that the word breaker and the stemmer actually use the same DLL.

The following query shows that for Dutch (LCID 1043) the default neutral language word breaker/stemmer is used, which explains the bad results.

EXEC sp_help_fulltext_system_components 'wordbreaker';
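The output can probably be narrowed to one language at a time. The sketch below assumes, based on my reading of the documentation, that the procedure accepts an optional second argument which for word breakers is the LCID; treat that as an assumption and fall back to the unfiltered call above if it does not behave this way on your version.

```sql
-- Sketch: compare the word breaker registered for Dutch with the one for English.
-- Assumption: the second argument filters word breakers by LCID.
EXEC sp_help_fulltext_system_components 'wordbreaker', '1043';  -- Dutch
EXEC sp_help_fulltext_system_components 'wordbreaker', '1033';  -- English
```

Comparing the fullpath column of the two results shows which DLL each language is mapped to.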

To get the LCID per language:

SELECT * FROM sys.fulltext_languages; 
