Problem description

MySQL Full Text searching appears to be great and the best way to search in SQL. However, I seem to be stuck on the fact that it won't search partial words. For instance if I have an article titled "MySQL Tutorial" and search for "MySQL", it won't find it.

Having done some searching, I found various references to support for this arriving in MySQL 4 (I'm using 5.1.40). I've tried using "MySQL" and "%MySQL%", but neither works (one link I found suggested the wildcard is a star, but that you can only put it at the end or the beginning, not both).

Here's my table structure and my query. If someone could tell me where I'm going wrong, that would be great. I'm assuming partial word matching is built in somehow.

CREATE TABLE IF NOT EXISTS `articles` (
  `article_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
  `article_name` varchar(64) NOT NULL,
  `article_desc` text NOT NULL,
  `article_link` varchar(128) NOT NULL,
  `article_hits` int(11) NOT NULL,
  `article_user_hits` int(7) unsigned NOT NULL DEFAULT '0',
  `article_guest_hits` int(10) unsigned NOT NULL DEFAULT '0',
  `article_rating` decimal(4,2) NOT NULL DEFAULT '0.00',
  `article_site_id` smallint(5) unsigned NOT NULL DEFAULT '0',
  `article_time_added` int(10) unsigned NOT NULL,
  `article_discussion_id` smallint(5) unsigned NOT NULL DEFAULT '0',
  `article_source_type` varchar(12) NOT NULL,
  `article_source_value` varchar(12) NOT NULL,
  PRIMARY KEY (`article_id`),
  FULLTEXT KEY `article_name` (`article_name`,`article_desc`,`article_link`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=7 ;
INSERT INTO `articles` VALUES
(1, 'MySQL Tutorial', 'Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry''s standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.', 'http://www.domain.com/', 6, 3, 1, '1.50', 1, 1269702050, 1, '0', '0'),
(2, 'How To Use MySQL Well', 'Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry''s standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.', 'http://www.domain.com/', 1, 2, 0, '3.00', 1, 1269702050, 1, '0', '0'),
(3, 'Optimizing MySQL', 'Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry''s standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.', 'http://www.domain.com/', 0, 1, 0, '3.00', 1, 1269702050, 1, '0', '0'),
(4, '1001 MySQL Tricks', 'Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry''s standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.', 'http://www.domain.com/', 0, 1, 0, '3.00', 1, 1269702050, 1, '0', '0'),
(5, 'MySQL vs. YourSQL', 'Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry''s standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.', 'http://www.domain.com/', 0, 2, 0, '3.00', 1, 1269702050, 1, '0', '0'),
(6, 'MySQL Security', 'Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry''s standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.', 'http://www.domain.com/', 0, 2, 0, '3.00', 1, 1269702050, 1, '0', '0');

SELECT count(a.article_id) FROM articles a
WHERE MATCH (a.article_name, a.article_desc, a.article_link) AGAINST ('mysql')
GROUP BY a.article_id
ORDER BY a.article_time_added ASC

The a. table prefix is used because the query comes from a function that sometimes adds additional joins.

As you can see, a search for MySQL should return a count of 6, but unfortunately it doesn't.

Update

No results were returned because every single row was matched.

http://dev.mysql.com/doc/refman/5.1/en/fulltext-natural-language.html

"The search result is empty because the word "MySQL" is present in at least 50% of the rows. As such, it is effectively treated as a stopword. For large data sets, this is the most desirable behavior: A natural language query should not return every second row from a 1GB table. For small data sets, it may be less desirable."

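One way to sidestep the threshold on a small table is boolean mode, which does not apply the 50% rule to MyISAM indexes. The query below is a minimal sketch against the articles table from the question. The alias matching_articles is made up for illustration, and the GROUP BY and ORDER BY are dropped on the assumption that a single total is wanted rather than one row per article.

```sql
-- Sketch: same MATCH columns as the FULLTEXT index in the question,
-- but in boolean mode, where the 50% threshold is not applied.
SELECT COUNT(*) AS matching_articles
FROM articles a
WHERE MATCH (a.article_name, a.article_desc, a.article_link)
      AGAINST ('mysql' IN BOOLEAN MODE);
```

With the six sample rows this should count every article, since each title contains the word. Boolean mode does not sort rows by relevance automatically, so add an explicit ORDER BY when listing rows instead of counting them.
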
Solution

My understanding is that MySQL FULLTEXT indexes support partial-word matching only as prefix searches, using the * truncation operator at the end of the word in boolean mode: MATCH (a.article_name) AGAINST ('MySQL*' IN BOOLEAN MODE).
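As a rough illustration against the sample data from the question, the first query below does a prefix search and the second shows the usual fallback when the fragment sits in the middle or at the end of a word. The search strings 'Optim*' and '%SQL%' are picked only for this example, and the MATCH column list is assumed to mirror the FULLTEXT key defined on the table.

```sql
-- Prefix search: the * truncation operator goes at the end of the word.
SELECT a.article_id, a.article_name
FROM articles a
WHERE MATCH (a.article_name, a.article_desc, a.article_link)
      AGAINST ('Optim*' IN BOOLEAN MODE);

-- Fallback for a fragment in the middle or at the end of a word.
-- This does not use the FULLTEXT index, so expect a full table scan.
SELECT a.article_id, a.article_name
FROM articles a
WHERE a.article_name LIKE '%SQL%';
```

The first query should find "Optimizing MySQL" through its prefix. The LIKE version trades speed for flexibility, which is usually acceptable only on small tables like this one.
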

