This article addresses the question "What kind of data is spark.yarn.driver.memoryOverhead or spark.yarn.executor.memoryOverhead used to store?" and should be a useful reference for anyone running into the same issue.

Problem description

What I would like to know is:


  • What kind of data does Spark store in the memory governed by spark.yarn.driver.memoryOverhead or spark.yarn.executor.memoryOverhead?

  • And in which cases should I increase the value of spark.yarn.driver.memoryOverhead or spark.yarn.executor.memoryOverhead?


Solution

In YARN terminology, executors and the application master run inside containers. Spark on YARN provides specific properties so you can tune how your application runs:


  • spark.yarn.executor.memoryOverhead is the amount of off-heap memory (in megabytes) to be allocated per executor. This is memory that accounts for things like VM overheads, interned strings, other native overheads, etc. It tends to grow with the executor size (typically 6-10%).

  • spark.yarn.driver.memoryOverhead is the amount of off-heap memory (in megabytes) to be allocated per driver in cluster mode, with the same meaning as the executor memoryOverhead.

So it is not about storing data; it is simply the extra headroom YARN containers need to run your application properly.
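
The snippet below is a minimal sketch of how these properties can be set from code on a YARN deployment. The property names are the standard ones discussed above (renamed to spark.executor.memoryOverhead and spark.driver.memoryOverhead in newer Spark releases); the sizes are made-up values you would tune for your own cluster.

import org.apache.spark.{SparkConf, SparkContext}

// Illustrative values only; tune them to your own workload.
val conf = new SparkConf()
  .setAppName("memory-overhead-example")
  // Executor heap size ...
  .set("spark.executor.memory", "4g")
  // ... plus off-heap headroom (MB) for VM overheads, interned strings and
  // other native allocations, typically around 6-10% of the executor memory.
  .set("spark.yarn.executor.memoryOverhead", "409")
  // Same idea for the driver when running in yarn-cluster mode.
  .set("spark.driver.memory", "2g")
  .set("spark.yarn.driver.memoryOverhead", "204")

val sc = new SparkContext(conf)

Note that in yarn-cluster mode the driver JVM is already running by the time this code executes, so the driver settings are usually passed at submit time (for example with --conf on spark-submit) rather than inside the application itself.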

In some cases, e.g. if you enable dynamicAllocation, you may want to set these properties explicitly, together with a cap on the maximum number of executors (spark.dynamicAllocation.maxExecutors) that can be created during the process; otherwise the application can easily overwhelm YARN by asking for thousands of executors, and in doing so lose the executors that are already running.
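
As an illustration, a dynamic-allocation setup with an explicit cap might look like the sketch below. The property names are standard Spark settings; the numbers are assumptions to adapt to your cluster.

import org.apache.spark.SparkConf

val dynConf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  // The external shuffle service is required for dynamic allocation on YARN.
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "2")
  .set("spark.dynamicAllocation.initialExecutors", "10")
  // Cap growth so a backlog of tasks cannot request thousands of containers.
  .set("spark.dynamicAllocation.maxExecutors", "100")
  // Explicit per-executor overhead so each requested container fits YARN's limits.
  .set("spark.yarn.executor.memoryOverhead", "512")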

Increasing the target number of executors happens in response to backlogged tasks waiting to be scheduled. If the scheduler queue is not drained in N seconds, then new executors are added. If the queue persists for another M seconds, then more executors are added and so on. The number added in each round increases exponentially from the previous round until an upper bound has been reached. The upper bound is based both on a configured property and on the current number of running and pending tasks, as described above.

This can lead to an exponential increase in the number of executors in some cases, which can break the YARN resource manager. In my case:

16/03/31 07:15:44 INFO ExecutorAllocationManager: Requesting 8000 new executors because tasks are backlogged (new desired total will be 40000)
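
To see how the desired total can climb to numbers like the one in that log line, here is a rough sketch of the doubling pattern described above. This is not Spark's actual ExecutorAllocationManager code, just an illustration of exponential ramp-up with an assumed starting request of 1 and an upper bound of 40000.

// Each allocation round doubles the number of executors requested,
// until the upper bound is reached.
val upperBound = 40000
var desiredTotal = 0
var toAdd = 1
while (desiredTotal < upperBound) {
  val request = math.min(toAdd, upperBound - desiredTotal)
  desiredTotal += request
  println(s"Requesting $request new executors (new desired total will be $desiredTotal)")
  toAdd *= 2
}

After about a dozen rounds the per-round request is already in the thousands, which is how messages like the one above come about.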


This doesn't cover all the use cases for these properties, but it gives a general idea of how they are used.

That concludes this article on what kind of data spark.yarn.driver.memoryOverhead or spark.yarn.executor.memoryOverhead is used to store. Hopefully the answer above is helpful.
