本文介绍了如何在mongodb中构造复合索引的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在mongo中创建和排序索引时,我需要一些建议.

I need some advice in creating and ordering indexes in mongo.

我有一个具有5个属性的帖子集:

I have a post collection with 5 properties:

帖子

  • 状态
  • 开始日期
  • 结束日期
  • lowerCaseTitle
  • sortOrder

几乎所有帖子的状态均为1,只有少数帖子的状态被拒绝.我所有的查询都将根据状态,开始日期和结束日期进行过滤,并按sortOrder进行排序.我还将有一个查询对标题进行正则表达式搜索.

Almost all the posts will have the same status of 1 and only a handful will have a rejected status. All my queries will filter on status, start and end dates, and sort on sortOrder. I also will have one query that does a regex search on the title.

我应该在{status:1,start:1,end:1,sort:1}上设置复合键吗?我将字段放在复合索引中的顺序是否重要-我应该把状态放在复合索引中,因为它是最广泛的?在每个属性上执行复合索引而不是单个索引会更好吗? mongo是否在任何给定查询上仅使用单个索引?

Should I set up a compound key on {status:1, start:1, end:1, sort:1}? Does it matter which order I put the fields in the compound index - should I put status first in the compound index since it's the most broad? Is it better to do a compound index rather than a single index on each property? Does mongo only use a single index on any given query?

如果我正在执行正则表达式查询,是否有关于LowerCaseTitle索引的提示?

Are there any hints for indexes on lowerCaseTitle if I'm doing a regex query on that?

示例查询是:

db.posts.find({status: {$gte:0}, start: {$lt: today}, end: {$gt: today}}).sort({sortOrder:1})

db.posts.find( {lowerCaseTitle: /japan/, status:{$gte:0}, start: {$lt: today}, end: {$gt: today}}).sort({sortOrder:1})

推荐答案

在一个帖子中有很多问题;)让我以实际的顺序进行讨论:

That's a lot of questions in one post ;) Let me go through them in a practical order :

  • 每个查询最多可以使用一个索引(顶级$ or子句等除外).这包括任何排序.
  • 由于上述原因,您肯定会需要一个复合索引来解决您的问题,而不是单独使用每个字段的索引.
  • 低基数字段(因此,整个数据集中唯一值很少的字段)通常不应该包含在索引中,因为它们的选择性非常有限.
  • 复合索引中字段的顺序很重要,复合索引中每个字段的相对方向也很重要(例如"{name:1,age:-1}").在mongodb.org上有很多关于复合索引和索引字段方向的文档,因此在这里我不再赘述.
  • 仅当排序字段在索引中并且在索引中的字段紧接用于选择结果集的最后一个字段之后时,排序才会使用索引.在大多数情况下,这将是索引的最后一个字段.

因此,您完全不应该在索引中包含状态,因为一旦索引遍历消除了基于较高基数字段的绝大多数文档,那么在大多数情况下最多将剩下2-3个文档,而这几乎不能通过优化状态索引(特别是因为您提到了这2-3个文档,无论如何它们都具有相同的状态).

So, you should not include status in your index at all since once the index walk has eliminated the vast majority of documents based on higher cardinality fields it will at most have 2-3 documents left in most cases which is hardly optimized by a status index (especially since you mentioned those 2-3 documents are very likely to have the same status anyway).

现在,与您的情况相关的最后一个注释是(当您使用)范围查询时,无论如何,它都不会使用索引进行排序.测试查询后,可以通过查看explain()的"scanAndOrder"值进行检查.如果该值存在且为true,则表示它将对内存中的结果集进行排序(扫描和排序),而不是直接使用索引.在您的特定情况下,这是无法避免的.

Now, the last note that's relevant in your case is that when you use range queries (and you are) it'll not use the index for sorting anyway. You can check this by looking at the "scanAndOrder" value of your explain() once you test your query. If that value exists and is true it means it'll sort the resultset in memory (scan and order) rather than use the index directly. This cannot be avoided in your specific case.

因此,您的索引应该是:

So, your index should therefore be :

db.posts.ensureIndex({start:1, end:1})

和您的查询(修改顺序只是为了清楚起见,查询优化器将通过相同的执行路径运行原始查询,但我更喜欢将索引字段放在首位和顺序):

and your query (order modified for clarity only, query optimizer will run your original query through the same execution path but I prefer putting indexed fields first and in order) :

db.posts.find({start: {$lt: today}, end: {$gt: today}, status: {$gte:0}}).sort({sortOrder:1})

这篇关于如何在mongodb中构造复合索引的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

09-17 13:58