How to use doc_values in ElasticSearch

Problem description

Can someone explain to me how doc_values works? Why would it help me when doing aggregations?

Would it help me when filtering?

For filtering, the way I see it, ElasticSearch would access the inverted index to find "pointers" to all the documents that fit the aggregations, so doc_values, which is an "uninverted index" according to the documentation, is irrelevant? Or am I wrong?

Can someone explain the flow of an aggregation when doc_values is enabled, and when it isn't, and why enabling it saves memory?

Thanks.

Solution

General statements about doc_values:

  • doc_values will help with heap memory usage
  • they serve the same purpose as the in-memory data structure called fielddata, but are stored on disk
  • fielddata is used when sorting, doing aggregations, using scripts that access field values, and using parent-child relationships and geo-distance filters
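As a minimal sketch of where doc_values is switched on: it is a per-field setting in the index mapping. The field names (`status`, `session_id`) are hypothetical, and the `keyword` field type and typeless mapping layout assume a recent Elasticsearch version; older versions expressed the same idea with `not_analyzed` string fields.

```python
import json

# Hypothetical mapping body; the field names are made up for illustration.
mapping = {
    "mappings": {
        "properties": {
            # Sorted and aggregated on, so keep the on-disk column.
            "status": {"type": "keyword", "doc_values": True},
            # Only ever filtered on, so the column can be skipped to save disk.
            "session_id": {"type": "keyword", "doc_values": False},
        }
    }
}

# JSON body that would be sent when creating the index.
body = json.dumps(mapping, indent=2)
print(body)
```

A field with `doc_values` disabled can still be searched through the inverted index, but it cannot be sorted or aggregated on from the on-disk column.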

Until doc_values came into play, fielddata was loaded into the heap. doc_values do not use the heap but the memory outside it, the file system cache, because doc_values live on the file system. Lucene reads the files, the operating system caches them in the file system cache, and requests are then served from there.
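The off-heap idea can be illustrated with a toy analogy in Python. This is not Lucene's actual file format: the fixed-width layout, the file name `price.col`, and the helper `read_value` are assumptions made for the sketch. It writes one value per document to a file and reads it back through a memory mapping, so the bytes are served by the OS page cache instead of living in the process's own heap.

```python
import mmap
import os
import struct
import tempfile

# Toy "column file": one fixed-width 64-bit integer per document ID.
values = [30, 12, 30, 7, 12]
path = os.path.join(tempfile.mkdtemp(), "price.col")
with open(path, "wb") as f:
    for v in values:
        f.write(struct.pack("<q", v))

def read_value(column, doc_id):
    """Return the value for doc_id by reading at offset doc_id * 8 bytes."""
    return struct.unpack_from("<q", column, doc_id * 8)[0]

with open(path, "rb") as f:
    # The mapping is backed by the OS page cache, not by process heap;
    # pages are loaded lazily and can be evicted under memory pressure.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as column:
        third = read_value(column, 2)
        all_values = [read_value(column, i) for i in range(len(values))]

print(third)
```

Because the lookup is a positional read keyed by document ID, nothing has to be built or held on the heap before the first request.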

Why this is important: the heap has a limited size, and the recommendation is not to use more than about 30 GB for it. The heap also contains other sections: filter caches, query caches, indexing buffers, metadata from the segment files, etc. Fielddata usually takes a lot of room, not because it is inefficient, but because ES needs to load the field's values for all documents into memory so that it can sort and aggregate on them. For larger indices (implicitly, shards) this means a lot of data.
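A rough sketch of why fielddata grows with the index: it is built by "uninverting" the inverted index into a per-document column, which needs one slot per document no matter how few documents a given query matches. The function `uninvert` and the sample data are hypothetical, and the sketch assumes a single-valued field.

```python
def uninvert(inverted_index, num_docs):
    """Build a doc_id -> term column from a term -> doc_ids inverted index."""
    column = [None] * num_docs  # one slot per document in the segment
    for term, doc_ids in inverted_index.items():
        for doc_id in doc_ids:
            column[doc_id] = term
    return column

# Hypothetical single-valued field over a 6-document segment.
inverted_index = {"error": [0, 3], "info": [1, 2, 5], "warn": [4]}
fielddata = uninvert(inverted_index, num_docs=6)
print(fielddata)
```

With fielddata this column is built at query time and kept on the heap; with doc_values an equivalent column is written to disk at index time.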

That's why doc_values were introduced: to move all this burden from the heap (which is limited) to the OS file system cache (which is limited as well, but definitely under less pressure).

doc_values will not help you with aggregations per se. doc_values play the same role as fielddata, stored on disk instead of on the heap. Fielddata, in one form or the other, is mandatory for aggregations. What doc_values will help you with is heap memory usage.
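The flow asked about in the question can be sketched roughly as two steps, under the assumption that the filter and the aggregation target different fields. The data, field names, and the helper `terms_aggregation` are hypothetical; the point is only that the filter is answered from the inverted index, while the aggregation needs a doc ID to value lookup.

```python
from collections import Counter

# Inverted index for the filtered field: term -> matching doc IDs.
country_index = {"us": [0, 2, 3], "de": [1, 4]}
# Column for the aggregated field: position = doc ID, value = field value.
status_column = ["error", "info", "info", "error", "warn"]

def terms_aggregation(matching_doc_ids, column):
    """Count field values over the matching documents only."""
    return Counter(column[doc_id] for doc_id in matching_doc_ids)

# Step 1: the filter is answered from the inverted index.
matching = country_index["us"]
# Step 2: the aggregation looks up each matching doc's value in the column.
buckets = terms_aggregation(matching, status_column)
print(dict(buckets))
```

The second step looks the same whether the column comes from heap-resident fielddata or from on-disk doc_values, which is why enabling doc_values changes where the memory is spent and not how the aggregation is computed.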


09-23 08:02