MySQL: composite index FULLTEXT + BTREE?

Problem description

I want a query that does a fulltext search on one field and then a sort on a different field (imagine searching some text document and order by publication date). The table has about 17M rows and they are more or less uniformly distributed in dates. This is to be used in a webapp request/response cycle, so the query has to finish in at most 200ms.

Schematically:

SELECT * FROM table WHERE MATCH(text) AGAINST('query') ORDER BY date=my_date DESC LIMIT 10;

One possibility is having a fulltext index on the text field and a btree on the publication date:

ALTER TABLE table ADD FULLTEXT index_name(text);
CREATE INDEX index_name ON table (date);

This doesn't work very well in my case. MySQL evaluates two execution paths. One uses the fulltext index to find the relevant rows and, once they are selected, uses a FILESORT to sort them. The other uses the BTREE index to read the entire table in date order and then looks for matches, which amounts to a FULL TABLE SCAN. They're both bad. In my case MySQL chooses the former. The problem is that the first step can select some 30k results which it then has to sort, which means the entire query might take on the order of 10 seconds.
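To check which of the two paths the optimizer actually picks, one can run EXPLAIN on the query. This is only a sketch: the names tbl, body and date are placeholders standing in for the real table and columns, which are not given here.

```sql
-- Hypothetical names: tbl, body and date stand in for the real schema.
EXPLAIN
SELECT *
    FROM tbl
    WHERE MATCH(body) AGAINST('query')
    ORDER BY date DESC
    LIMIT 10;
```

When the fulltext path is chosen, the plan typically reports the access type fulltext together with "Using filesort" in the Extra column, which corresponds to the slow sort described above.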

So I was thinking: do composite indexes of FULLTEXT+BTREE exist? If you know how a FULLTEXT index works, it first tokenizes the column you're indexing and then builds an index for the tokens. It seems reasonable to me to imagine a composite index such that the second index is a BTREE in dates for each token. Does this exist in MySQL and if so what's the syntax?

BONUS QUESTION: If it doesn't exist in MySQL, would PostgreSQL perform better in this situation?

Solution

Use IN BOOLEAN MODE.

The date index is not useful. There is no way to combine the two indexes.

Beware, if a user searches for something that shows up in 30K rows, the query will be slow. There is no straightforward way around it.

I suspect you have a TEXT column in the table? If so, there is hope. Instead of blindly doing SELECT *, let's first find the ids and get the LIMIT applied, then do the *.

SELECT a.* 
    FROM tbl AS a
    JOIN ( SELECT date, id
             FROM tbl
             WHERE MATCH(...) AGAINST (...)
             ORDER BY date DESC
             LIMIT 10 ) AS x
        USING(date, id)
    ORDER BY date DESC;

Together with

PRIMARY KEY(date, id),
INDEX(id),
FULLTEXT(...)

This formulation and indexing should work like this:

  1. Use FULLTEXT to find 30K rows, deliver the PK.
  2. With the PK, sort 30K rows by date.
  3. Pick the first 10 in that order (the 10 most recent dates), delivering date, id.
  4. Reach back into the table 10 times using the PK.
  5. Sort again. (Yeah, this is necessary.)
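For reference, here is one way the indexes could sit in a full table definition, plus an EXPLAIN to compare the plan against the steps above. It is a sketch under assumptions: the column types and the text column name body are invented for illustration, since the real schema is not shown.

```sql
-- Hypothetical schema: column types and the name `body` are assumptions.
CREATE TABLE tbl (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    date DATE NOT NULL,
    body TEXT NOT NULL,
    PRIMARY KEY (date, id),   -- rows are clustered by date, then id
    INDEX (id),               -- keeps AUTO_INCREMENT legal; id is no longer first in the PK
    FULLTEXT (body)
) ENGINE=InnoDB;

-- Inspect the plan of the reformulated query.
EXPLAIN
SELECT a.*
    FROM tbl AS a
    JOIN ( SELECT date, id
             FROM tbl
             WHERE MATCH(body) AGAINST ('query')
             ORDER BY date DESC
             LIMIT 10 ) AS x
        USING(date, id)
    ORDER BY date DESC;
```

The inner query only needs date and id, so the wide body column is read for just the 10 rows that survive the LIMIT.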

More (Responding to a plethora of Comments):

The goal behind my reformulation is to avoid fetching all columns of 30K rows. Instead, it fetches only the PRIMARY KEY, then whittles that down to 10, then fetches * for only those 10 rows. Much less stuff shoveled around.

Concerning COUNT on an InnoDB table:

  • INDEX(col) makes it so that an index scan works for SELECT COUNT(*) or SELECT COUNT(col) without a WHERE.
  • Without INDEX(col), SELECT COUNT(*) will use the "smallest" index; but SELECT COUNT(col) will need a table scan.
  • A table scan is usually slower than an index scan.
  • Be careful of timing -- It is significantly affected by whether the index and/or table is already cached in RAM.
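The COUNT points above can be tried out on a small throwaway table. This is a minimal sketch; the table t and its columns are hypothetical and exist only for the experiment.

```sql
-- Hypothetical table used only to illustrate the bullets above.
CREATE TABLE t (
    id  INT NOT NULL PRIMARY KEY,
    col INT NULL,
    pad VARCHAR(255),
    INDEX (col)
) ENGINE=InnoDB;

-- With INDEX(col) in place, both counts can be answered from the index.
EXPLAIN SELECT COUNT(*)   FROM t;
EXPLAIN SELECT COUNT(col) FROM t;   -- counts only rows where col IS NOT NULL

-- Without the index, COUNT(col) has to read the table itself.
ALTER TABLE t DROP INDEX col;       -- an unnamed INDEX(col) is named after its column
EXPLAIN SELECT COUNT(col) FROM t;
```

Comparing the key column of the three plans shows which index, if any, is used for each count.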

Another thing about FULLTEXT is the + in front of words -- to say that each word must exist, else there is no match. This may cut down on the 30K.
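As an illustration of the + operator, a boolean-mode version of the inner query might look like the sketch below. The column name body and the search words are made up for the example.

```sql
-- Hypothetical column `body` and search terms; + means the word must be present.
SELECT date, id
    FROM tbl
    WHERE MATCH(body) AGAINST ('+composite +index' IN BOOLEAN MODE)
    ORDER BY date DESC
    LIMIT 10;
```

Only rows containing both words match, so the set that has to be sorted can be much smaller than with a plain natural-language search.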

The FULLTEXT index will deliver the date, id in random order, not PK order. Anyway, it is 'wrong' to assume any ordering, hence it is 'right' to add ORDER BY, then let the Optimizer toss it if it knows that it is redundant. And sometimes the Optimizer can take advantage of the ORDER BY (not in your case).

Removing just the ORDER BY, in many cases, makes a query run much faster. This is because it avoids fetching, say, 30K rows and sorting them. Instead it simply delivers "any" 10 rows.

(I have no experience with Postgres, so I cannot address that question.)


09-17 05:37