MySQL fuzzy search on substrings

Problem description

I have a very interesting problem:

I have a MySQL table 'Venue' with the fields 'name', 'addressLine1', 'addressLine2', 'addressLine3', 'city', 'country' and 'description'. All fields are VARCHAR; 'description' is a larger text field.

What I would like to do is a fuzzy search on the Venue table. So far I am using:

SELECT * FROM Venue WHERE MATCH(name, addressLine1,..., description) AGAINST("London" IN NATURAL LANGUAGE MODE).

I can also sort this query by the MATCH score.
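
For reference, here is a rough sketch of how the MATCH score can be selected and used for sorting. It assumes a FULLTEXT index covering exactly the columns listed in MATCH(). The column list is illustrative only (the query above elides part of its list), and `score` is just an alias chosen for this example.

```sql
-- Assumption: a FULLTEXT index exists on exactly (name, city, description).
SELECT name,
       city,
       MATCH(name, city, description)
           AGAINST('London' IN NATURAL LANGUAGE MODE) AS score
FROM Venue
WHERE MATCH(name, city, description)
          AGAINST('London' IN NATURAL LANGUAGE MODE)
ORDER BY score DESC;
```

The MATCH expression appears twice so the score can be both filtered on and returned. According to the MySQL manual, the optimizer recognizes identical MATCH() calls and runs the full-text search only once.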

This is great but has some obvious problems: 1) if the user types "lond", nothing is returned; 2) if the user types "lodnod", nothing is returned.

One alternative I thought of is a Levenshtein expansion (up to 2 edits), so for the search term "lodnod" the query would look like:

SELECT * FROM Venue WHERE Venue. name LIKE '%__lodnod%'
OR Venue.addressLine1 LIKE '%__lodnod%'
OR Venue.addressLine2 LIKE '%__lodnod%'
OR Venue.addressLine3 LIKE '%__lodnod%'
OR Venue.city LIKE '%__lodnod%'
OR Venue.county LIKE '%__lodnod%'
OR Venue.country LIKE '%__lodnod%'
OR Venue. name LIKE '%lodnod%'
OR Venue.addressLine1 LIKE '%lodnod%'
OR Venue.addressLine2 LIKE '%lodnod%'
OR Venue.addressLine3 LIKE '%lodnod%'
OR Venue.city LIKE '%lodnod%'
OR Venue.county LIKE '%lodnod%'
OR Venue.country LIKE '%lodnod%'
OR Venue. name LIKE '%_lodnod%'
OR Venue.addressLine1 LIKE '%_lodnod%'
OR Venue.addressLine2 LIKE '%_lodnod%'
OR Venue.addressLine3 LIKE '%_lodnod%'
OR Venue.city LIKE '%_lodnod%'
OR Venue.county LIKE '%_lodnod%'
OR Venue.country LIKE '%_lodnod%'
OR Venue. name LIKE '%_odnod%'
OR Venue.addressLine1 LIKE '%_odnod%'
OR Venue.addressLine2 LIKE '%_odnod%'
OR Venue.addressLine3 LIKE '%_odnod%'
OR Venue.city LIKE '%_odnod%'
OR Venue.county LIKE '%_odnod%'
OR Venue.country LIKE '%_odnod%'
OR Venue. name LIKE '%__odnod%'
OR Venue.addressLine1 LIKE '%__odnod%'
OR Venue.addressLine2 LIKE '%__odnod%'
OR Venue.addressLine3 LIKE '%__odnod%'
OR Venue.city LIKE '%__odnod%'
OR Venue.county LIKE '%__odnod%'
OR Venue.country LIKE '%__odnod%'
OR Venue. name LIKE '%_l_odnod%'
OR Venue.addressLine1 LIKE '%_l_odnod%'
OR Venue.addressLine2 LIKE '%_l_odnod%'
OR Venue.addressLine3 LIKE '%_l_odnod%'
OR Venue.city LIKE '%_l_odnod%'
OR Venue.county LIKE '%_l_odnod%'
OR Venue.country LIKE '%_l_odnod%'
OR Venue. name LIKE '%_ldnod%'
OR Venue.addressLine1 LIKE '%_ldnod%'
OR Venue.addressLine2 LIKE '%_ldnod%'
OR Venue.addressLine3 LIKE '%_ldnod%'
OR Venue.city LIKE '%_ldnod%'
OR Venue.county LIKE '%_ldnod%'
OR Venue.country LIKE '%_ldnod%'
OR Venue. name LIKE '%_l_dnod%'
OR Venue.addressLine1 LIKE '%_l_dnod%'
OR Venue.addressLine2 LIKE '%_l_dnod%'
OR Venue.addressLine3 LIKE '%_l_dnod%'
OR Venue.city LIKE '%_l_dnod%'
OR Venue.county LIKE '%_l_dnod%'
OR Venue.country LIKE '%_l_dnod%'
OR Venue. name LIKE '%_lo_dnod%'
OR Venue.addressLine1 LIKE '%_lo_dnod%'
OR Venue.addressLine2 LIKE '%_lo_dnod%'
OR Venue.addressLine3 LIKE '%_lo_dnod%'
OR Venue.city LIKE '%_lo_dnod%'
OR Venue.county LIKE '%_lo_dnod%'
OR Venue.country LIKE '%_lo_dnod%'
OR Venue. name LIKE '%_lonod%'
OR Venue.addressLine1 LIKE '%_lonod%'
OR Venue.addressLine2 LIKE '%_lonod%'
OR Venue.addressLine3 LIKE '%_lonod%'
OR Venue.city LIKE '%_lonod%'
OR Venue.county LIKE '%_lonod%'
OR Venue.country LIKE '%_lonod%'
OR Venue. name LIKE '%_lo_nod%'
OR Venue.addressLine1 LIKE '%_lo_nod%'
OR Venue.addressLine2 LIKE '%_lo_nod%'
OR Venue.addressLine3 LIKE '%_lo_nod%'
OR Venue.city LIKE '%_lo_nod%'
OR Venue.county LIKE '%_lo_nod%'
OR Venue.country LIKE '%_lo_nod%'
OR Venue. name LIKE '%_lod_nod%'
OR Venue.addressLine1 LIKE '%_lod_nod%'
OR Venue.addressLine2 LIKE '%_lod_nod%'
OR Venue.addressLine3 LIKE '%_lod_nod%'
OR Venue.city LIKE '%_lod_nod%'
OR Venue.county LIKE '%_lod_nod%'
OR Venue.country LIKE '%_lod_nod%'
OR Venue. name LIKE '%_lodod%'
OR Venue.addressLine1 LIKE '%_lodod%'
OR Venue.addressLine2 LIKE '%_lodod%'
OR Venue.addressLine3 LIKE '%_lodod%'
OR Venue.city LIKE '%_lodod%'
OR Venue.county LIKE '%_lodod%'
OR Venue.country LIKE '%_lodod%'
OR Venue. name LIKE '%_lod_od%'
OR Venue.addressLine1 LIKE '%_lod_od%'
OR Venue.addressLine2 LIKE '%_lod_od%'
OR Venue.addressLine3 LIKE '%_lod_od%'
OR Venue.city LIKE '%_lod_od%'
OR Venue.county LIKE '%_lod_od%'
OR Venue.country LIKE '%_lod_od%'
OR Venue. name LIKE '%_lodn_od%'
OR Venue.addressLine1 LIKE '%_lodn_od%'
OR Venue.addressLine2 LIKE '%_lodn_od%'
OR Venue.addressLine3 LIKE '%_lodn_od%'
OR Venue.city LIKE '%_lodn_od%'
OR Venue.county LIKE '%_lodn_od%'
OR Venue.country LIKE '%_lodn_od%'
OR Venue. name LIKE '%_lodnd%'
OR Venue.addressLine1 LIKE '%_lodnd%'
OR Venue.addressLine2 LIKE '%_lodnd%'
OR Venue.addressLine3 LIKE '%_lodnd%'
OR Venue.city LIKE '%_lodnd%'
OR Venue.county LIKE '%_lodnd%'
OR Venue.country LIKE '%_lodnd%'
OR Venue. name LIKE '%_lodn_d%'
OR Venue.addressLine1 LIKE '%_lodn_d%'
OR Venue.addressLine2 LIKE '%_lodn_d%'
OR Venue.addressLine3 LIKE '%_lodn_d%'
OR Venue.city LIKE '%_lodn_d%'
OR Venue.county LIKE '%_lodn_d%'
OR Venue.country LIKE '%_lodn_d%'
OR Venue. name LIKE '%_lodno_d%'
OR Venue.addressLine1 LIKE '%_lodno_d%'
OR Venue.addressLine2 LIKE '%_lodno_d%'
OR Venue.addressLine3 LIKE '%_lodno_d%'
OR Venue.city LIKE '%_lodno_d%'
OR Venue.county LIKE '%_lodno_d%'
OR Venue.country LIKE '%_lodno_d%'
OR Venue. name LIKE '%_lodno%'
OR Venue.addressLine1 LIKE '%_lodno%'
OR Venue.addressLine2 LIKE '%_lodno%'
OR Venue.addressLine3 LIKE '%_lodno%'
OR Venue.city LIKE '%_lodno%'
OR Venue.county LIKE '%_lodno%'
OR Venue.country LIKE '%_lodno%'
OR Venue. name LIKE '%_lodno_%'
OR Venue.addressLine1 LIKE '%_lodno_%'
OR Venue.addressLine2 LIKE '%_lodno_%'
OR Venue.addressLine3 LIKE '%_lodno_%'
OR Venue.city LIKE '%_lodno_%'
OR Venue.county LIKE '%_lodno_%'
OR Venue.country LIKE '%_lodno_%'
OR Venue. name LIKE '%_lodnod_%'
OR Venue.addressLine1 LIKE '%_lodnod_%'
OR Venue.addressLine2 LIKE '%_lodnod_%'
OR Venue.addressLine3 LIKE '%_lodnod_%'
OR Venue.city LIKE '%_lodnod_%'
OR Venue.county LIKE '%_lodnod_%'
OR Venue.country LIKE '%_lodnod_%'
OR Venue. name LIKE '%dnod%'
OR Venue.addressLine1 LIKE '%dnod%'
OR Venue.addressLine2 LIKE '%dnod%'
OR Venue.addressLine3 LIKE '%dnod%'
OR Venue.city LIKE '%dnod%'
OR Venue.county LIKE '%dnod%'
OR Venue.country LIKE '%dnod%'
OR Venue. name LIKE '%_dnod%'
OR Venue.addressLine1 LIKE '%_dnod%'
OR Venue.addressLine2 LIKE '%_dnod%'
OR Venue.addressLine3 LIKE '%_dnod%'
OR Venue.city LIKE '%_dnod%'
OR Venue.county LIKE '%_dnod%'
OR Venue.country LIKE '%_dnod%'
OR Venue. name LIKE '%o_dnod%'
OR Venue.addressLine1 LIKE '%o_dnod%'
OR Venue.addressLine2 LIKE '%o_dnod%'
OR Venue.addressLine3 LIKE '%o_dnod%'
OR Venue.city LIKE '%o_dnod%'
OR Venue.county LIKE '%o_dnod%'
OR Venue.country LIKE '%o_dnod%'
OR Venue. name LIKE '%onod%'
OR Venue.addressLine1 LIKE '%onod%'
OR Venue.addressLine2 LIKE '%onod%'
OR Venue.addressLine3 LIKE '%onod%'
OR Venue.city LIKE '%onod%'
OR Venue.county LIKE '%onod%'
OR Venue.country LIKE '%onod%'
OR Venue. name LIKE '%o_nod%'
OR Venue.addressLine1 LIKE '%o_nod%'
OR Venue.addressLine2 LIKE '%o_nod%'
OR Venue.addressLine3 LIKE '%o_nod%'
OR Venue.city LIKE '%o_nod%'
OR Venue.county LIKE '%o_nod%'
OR Venue.country LIKE '%o_nod%'
OR Venue. name LIKE '%od_nod%'
OR Venue.addressLine1 LIKE '%od_nod%'
OR Venue.addressLine2 LIKE '%od_nod%'
OR Venue.addressLine3 LIKE '%od_nod%'
OR Venue.city LIKE '%od_nod%'
OR Venue.county LIKE '%od_nod%'
OR Venue.country LIKE '%od_nod%'
OR Venue. name LIKE '%odod%'
OR Venue.addressLine1 LIKE '%odod%'
OR Venue.addressLine2 LIKE '%odod%'
OR Venue.addressLine3 LIKE '%odod%'
OR Venue.city LIKE '%odod%'
OR Venue.county LIKE '%odod%'
OR Venue.country LIKE '%odod%'
OR Venue. name LIKE '%od_od%'
OR Venue.addressLine1 LIKE '%od_od%'
OR Venue.addressLine2 LIKE '%od_od%'
OR Venue.addressLine3 LIKE '%od_od%'
OR Venue.city LIKE '%od_od%'
OR Venue.county LIKE '%od_od%'
OR Venue.country LIKE '%od_od%'
OR Venue. name LIKE '%odn_od%'
OR Venue.addressLine1 LIKE '%odn_od%'
OR Venue.addressLine2 LIKE '%odn_od%'
OR Venue.addressLine3 LIKE '%odn_od%'
OR Venue.city LIKE '%odn_od%'
OR Venue.county LIKE '%odn_od%'
OR Venue.country LIKE '%odn_od%'
OR Venue. name LIKE '%odnd%'
OR Venue.addressLine1 LIKE '%odnd%'
OR Venue.addressLine2 LIKE '%odnd%'
OR Venue.addressLine3 LIKE '%odnd%'
OR Venue.city LIKE '%odnd%'
OR Venue.county LIKE '%odnd%'
OR Venue.country LIKE '%odnd%'
OR Venue. name LIKE '%odn_d%'
OR Venue.addressLine1 LIKE '%odn_d%'
OR Venue.addressLine2 LIKE '%odn_d%'
...(cut short because of maximum 30000 character limit)
OR Venue.city LIKE '%lodno_%'
OR Venue.county LIKE '%lodno_%'
OR Venue.country LIKE '%lodno_%'
OR Venue. name LIKE '%lodno__%'
OR Venue.addressLine1 LIKE '%lodno__%'
OR Venue.addressLine2 LIKE '%lodno__%'
OR Venue.addressLine3 LIKE '%lodno__%'
OR Venue.city LIKE '%lodno__%'
OR Venue.county LIKE '%lodno__%'
OR Venue.country LIKE '%lodno__%'
OR Venue. name LIKE '%lodno_d_%'
OR Venue.addressLine1 LIKE '%lodno_d_%'
OR Venue.addressLine2 LIKE '%lodno_d_%'
OR Venue.addressLine3 LIKE '%lodno_d_%'
OR Venue.city LIKE '%lodno_d_%'
OR Venue.county LIKE '%lodno_d_%'
OR Venue.country LIKE '%lodno_d_%'
OR Venue. name LIKE '%lodno%'
OR Venue.addressLine1 LIKE '%lodno%'
OR Venue.addressLine2 LIKE '%lodno%'
OR Venue.addressLine3 LIKE '%lodno%'
OR Venue.city LIKE '%lodno%'
OR Venue.county LIKE '%lodno%'
OR Venue.country LIKE '%lodno%'
OR Venue. name LIKE '%lodnod__%'
OR Venue.addressLine1 LIKE '%lodnod__%'
OR Venue.addressLine2 LIKE '%lodnod__%'
OR Venue.addressLine3 LIKE '%lodnod__%'
OR Venue.city LIKE '%lodnod__%'
OR Venue.county LIKE '%lodnod__%'
OR Venue.country LIKE '%lodnod__%'
OR Venue. name LIKE '%lodnod_%'
OR Venue.addressLine1 LIKE '%lodnod_%'
OR Venue.addressLine2 LIKE '%lodnod_%'
OR Venue.addressLine3 LIKE '%lodnod_%'
OR Venue.city LIKE '%lodnod_%'
OR Venue.county LIKE '%lodnod_%'
OR Venue.country LIKE '%lodnod_%'

Obviously, this query 1) is going to put a huge load on the server, and response times will suffer; 2) gives me no way to score the results, and therefore no way to sort them.

Is there a better way?

Recommended answer

MySQL's documentation page for full-text search functions has some useful information. Despite what the comments above say, you don't need to restrict yourself to the MyISAM table type in MySQL (InnoDB has supported FULLTEXT indexes since MySQL 5.6). As others have said, MySQL may not be the best choice, but there are some things you can do to make it more workable.
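
As a minimal sketch of what this can look like on InnoDB, the statements below create a FULLTEXT index and run a boolean-mode prefix search. The index name `ft_venue` and the shortened column list are assumptions made for the example, not something taken from the question.

```sql
-- Assumption: MySQL 5.6 or later, where InnoDB supports FULLTEXT indexes.
-- The index name ft_venue and the column list are illustrative.
ALTER TABLE Venue
    ADD FULLTEXT INDEX ft_venue (name, city, description);

-- Boolean mode allows a trailing * wildcard, so the prefix "lond"
-- can match indexed words such as "London".
-- The MATCH() column list must be the same as the index's column list.
SELECT name, city
FROM Venue
WHERE MATCH(name, city, description)
      AGAINST('lond*' IN BOOLEAN MODE);
```

This covers only the prefix case ("lond"). A transposed term such as "lodnod" still returns nothing, because the wildcard is only supported at the end of a word.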

If you find that you're always doing searches like this against multiple columns, it might help to have a "searchable" column that contains just the tokens from all of the other columns. You'd need some mechanism to keep it up to date, but triggers on INSERT and UPDATE would be sufficient. This may simplify your indexes and make your queries easier to read (and hopefully speed up responses), but it doesn't solve your initial issue.
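
One way the "searchable" column could be wired up is sketched below. The column name `searchable`, the index name and the trigger names are all hypothetical, and plain concatenation stands in for whatever tokenization you actually want.

```sql
-- Hypothetical column holding the concatenated text of the other columns.
ALTER TABLE Venue ADD COLUMN searchable TEXT;
ALTER TABLE Venue ADD FULLTEXT INDEX ft_searchable (searchable);

-- CONCAT_WS skips NULL values, so missing address lines are harmless.
-- A single-statement trigger body needs no DELIMITER change.
CREATE TRIGGER venue_before_insert
BEFORE INSERT ON Venue
FOR EACH ROW
    SET NEW.searchable = CONCAT_WS(' ',
        NEW.name, NEW.addressLine1, NEW.addressLine2, NEW.addressLine3,
        NEW.city, NEW.country, NEW.description);

CREATE TRIGGER venue_before_update
BEFORE UPDATE ON Venue
FOR EACH ROW
    SET NEW.searchable = CONCAT_WS(' ',
        NEW.name, NEW.addressLine1, NEW.addressLine2, NEW.addressLine3,
        NEW.city, NEW.country, NEW.description);

-- One-off backfill for rows that existed before the triggers.
UPDATE Venue
SET searchable = CONCAT_WS(' ',
        name, addressLine1, addressLine2, addressLine3,
        city, country, description);

-- Queries then only need to name one column.
SELECT name, city
FROM Venue
WHERE MATCH(searchable) AGAINST('London' IN NATURAL LANGUAGE MODE);
```

Note that with triggers written this way, anything added to `searchable` by hand is overwritten on the next update of the row, so extra tokens would have to live in a separate column that the trigger expression also includes.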

One advantage of this "searchable" column idea is that you can quietly add tokens for common misspellings to make things easier for your users. Your search logs are helpful here.

One solution that's had some recent success is using Soundex instead of Levenshtein expansion. See, e.g., Rob Gravelle (2015), "MySQL Fuzzy Text Searching Using the SOUNDEX Function", Database Journal.
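
A small sketch of the Soundex idea, using MySQL's built-in SOUNDEX() function and SOUNDS LIKE operator. It assumes the comparison is made against a short, single-word column such as 'city'; the misspelling "Londen" is just an example value.

```sql
-- SOUNDEX() is a built-in MySQL string function.
-- Assumption: the comparison is made against the single-word city column.
SELECT name, city
FROM Venue
WHERE SOUNDEX(city) = SOUNDEX('Londen');

-- Equivalent shorthand using the SOUNDS LIKE operator.
SELECT name, city
FROM Venue
WHERE city SOUNDS LIKE 'Londen';

-- Inspect the codes to see which misspellings collapse together.
SELECT SOUNDEX('London') AS london,
       SOUNDEX('Londen') AS londen,
       SOUNDEX('lodnod') AS lodnod;
```

Soundex encodes consonants in the order they appear, so a phonetic misspelling like "Londen" should share a code with "London", whereas a transposition like "lodnod" generally will not. Wrapping the column in a function also prevents an ordinary index on it from being used, so storing the precomputed code in its own indexed column may be worth considering.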

Another option is adding a user-defined function (UDF) to your MySQL installation. See, e.g., this Stack Overflow question or this blog post.
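
Once such a function is installed, it can also supply a score to sort on. The sketch below assumes a function named `levenshtein(a, b)` that returns the edit distance as an integer. That name and signature are hypothetical: MySQL does not ship a Levenshtein function, so this only runs after you have added one yourself.

```sql
-- Assumption: levenshtein(a, b) has been installed as a UDF or stored
-- function; the name and signature are hypothetical.
SELECT v.name, v.city, v.distance
FROM (
    SELECT name,
           city,
           levenshtein(LOWER(city), LOWER('lodnod')) AS distance
    FROM Venue
) AS v
WHERE v.distance <= 3      -- threshold is a tunable assumption
ORDER BY v.distance;       -- closest matches first
```

The threshold is 3 rather than 2 because plain Levenshtein distance counts "lodnod" as three edits away from "london" (each swapped pair costs more than one edit). The function is evaluated for every row, which is why this approach gets slow on large tables.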

However you approach it, you'll find that truly fuzzy searching with Levenshtein is a slow beast. Best of luck to you!

