


  • What kind of data does Spark store using spark.yarn.driver.memoryOverhead or spark.yarn.executor.memoryOverhead?

  • And in which cases should I raise the value of spark.yarn.driver.memoryOverhead or spark.yarn.executor.memoryOverhead?



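  • spark.yarn.executor.memoryOverhead is the amount of off-heap memory (in megabytes) to be allocated per executor. This is memory that accounts for things like VM overheads, interned strings, and other native overheads. It tends to grow with the executor size (typically 6-10%).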

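  • spark.yarn.driver.memoryOverhead is the amount of off-heap memory (in megabytes) to be allocated per driver in cluster mode, with the same memory properties as the executor's memoryOverhead.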
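
To make the relationship concrete, here is a rough sketch of how the overhead adds to the heap when sizing a YARN container. The 10% factor and the 384 MB floor are assumptions on my part (they mirror commonly documented defaults, but check the docs for your Spark version), and `yarn_container_mb` is a made-up helper name, not a Spark API.

```python
def yarn_container_mb(heap_mb, overhead_mb=None, factor=0.10, floor_mb=384):
    """Estimate the memory YARN must grant for one executor or driver container.

    heap_mb     -- JVM heap, e.g. the value given to spark.executor.memory
    overhead_mb -- explicit memoryOverhead; if None, derive it from the heap
    factor      -- ASSUMED fraction of the heap used as the derived overhead
    floor_mb    -- ASSUMED minimum overhead in megabytes
    """
    if overhead_mb is None:
        overhead_mb = max(floor_mb, int(heap_mb * factor))
    # YARN enforces the limit on the sum of heap and off-heap overhead.
    return heap_mb + overhead_mb


print(yarn_container_mb(8192))        # 8 GiB heap, derived overhead
print(yarn_container_mb(8192, 2048))  # 8 GiB heap, explicit 2 GiB overhead
print(yarn_container_mb(1024))        # small heap, the floor applies
```

Since YARN polices the total, a job whose off-heap usage (native libraries, interned strings and so on) outgrows the overhead can have its containers killed even though the heap itself is fine. That is usually the situation in which raising memoryOverhead helps.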



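For example, if you enable dynamicAllocation, you may want to set these properties explicitly, along with the maximum number of executors (spark.dynamicAllocation.maxExecutors) that can be created during the job. Otherwise the application can easily overwhelm YARN by asking for thousands of executors, and lose the executors that are already running.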
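
As an illustration, the properties can be passed as `--conf` flags to spark-submit. The small helper below just assembles that argument list. The helper name and the numbers (200 executors, 1024 MB and 512 MB of overhead) are placeholders I picked, not recommendations, and I am assuming the external shuffle service is enabled, which dynamic allocation on YARN normally requires.

```python
def spark_submit_conf(max_executors, executor_overhead_mb, driver_overhead_mb):
    """Build the --conf arguments for a dynamically allocated job (hypothetical helper)."""
    conf = {
        "spark.dynamicAllocation.enabled": "true",
        # Assumption: the external shuffle service is set up on the cluster.
        "spark.shuffle.service.enabled": "true",
        # Hard cap so a large backlog cannot request thousands of executors.
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        "spark.yarn.executor.memoryOverhead": str(executor_overhead_mb),
        "spark.yarn.driver.memoryOverhead": str(driver_overhead_mb),
    }
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Placeholder values, tune them for your own cluster.
print(" ".join(["spark-submit"] + spark_submit_conf(200, 1024, 512)))
```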

Increasing the target number of executors happens in response to backlogged tasks waiting to be scheduled. If the scheduler queue is not drained in N seconds, then new executors are added. If the queue persists for another M seconds, then more executors are added and so on. The number added in each round increases exponentially from the previous round until an upper bound has been reached. The upper bound is based both on a configured property and on the current number of running and pending tasks, as described above.

In some cases this can lead to an exponential increase in the number of executors, which can break the YARN resource manager. In my case:

16/03/31 07:15:44 INFO ExecutorAllocationManager: Requesting 8000 new executors because tasks are backlogged (new desired total will be 40000)
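
A toy model of that ramp-up shows why a cap matters. This is only a simplified sketch of the behaviour described in the quoted documentation (add 1 executor, then 2, 4, 8 and so on while tasks stay backlogged), not Spark's actual ExecutorAllocationManager code. `ramp_up` and `tasks_per_executor` are names I made up for the example.

```python
def ramp_up(backlogged_tasks, max_executors, tasks_per_executor=1):
    """Return the target executor count after each round of a sustained backlog."""
    # Executors needed to run every pending task at once (ceiling division).
    needed = -(-backlogged_tasks // tasks_per_executor)
    # The upper bound depends on both the configured cap and the pending tasks.
    upper = min(max_executors, needed)
    target, to_add, history = 0, 1, []
    while target < upper:
        target = min(upper, target + to_add)
        history.append(target)
        to_add *= 2  # the number added doubles each round
    return history


print(ramp_up(40000, max_executors=100))         # capped: stops at 100
print(len(ramp_up(40000, max_executors=40000)))  # rounds needed to reach 40000
```

With the cap at 100 the target levels off after a handful of rounds. Without a meaningful cap, the same doubling reaches tens of thousands of executors in well under twenty rounds, which matches the kind of request shown in the log line above.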

This doesn't cover every use case for these properties, but it gives a general idea of how they are used.


09-18 08:24