How does Spark evict cached partitions?

Question

I'm running Spark 2.0 in standalone mode, and I'm the only one submitting jobs in my cluster.

Suppose I have an RDD with 100 partitions and only 10 partitions in total would fit in memory at a time.

Let's also assume that allotted execution memory is enough and will not interfere with storage memory.
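
For context, in Spark 2.0's unified memory manager this assumption roughly corresponds to two settings. A minimal sketch, with illustrative values rather than recommendations:

import org.apache.spark.SparkConf

// Sketch only: under unified memory management (Spark 1.6+), execution can
// borrow memory from storage, but spark.memory.storageFraction reserves a
// slice of the unified region from which execution will not evict cached blocks.
val conf = new SparkConf()
  .set("spark.memory.fraction", "0.6")        // unified (execution + storage) share of the heap
  .set("spark.memory.storageFraction", "0.5") // portion protected from eviction by execution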

Suppose I iterate over the data in that RDD.

rdd.persist()  // MEMORY_ONLY

for (_ <- 0 until 10) {
  rdd.map(...).reduce(...)
}

rdd.unpersist()

For each iteration, will the first 10 partitions that are persisted always be in memory until rdd.unpersist()?
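
A self-contained version of this experiment might look like the following; the dataset, partition count, and the use of sc.getRDDStorageInfo to watch the cache are illustrative additions, not part of the original question:

import org.apache.spark.{SparkConf, SparkContext}

object CacheEvictionDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("cache-eviction-demo").setMaster("local[4]"))

    // 100 partitions over a dataset assumed too large to cache fully.
    val rdd = sc.parallelize(1 to 100000000, 100)
    rdd.persist() // MEMORY_ONLY is the default storage level for RDDs

    for (i <- 0 until 10) {
      rdd.map(_ * 2L).reduce(_ + _)
      // Report how many partitions actually stayed cached after each pass.
      sc.getRDDStorageInfo.foreach { info =>
        println(s"iteration $i: ${info.numCachedPartitions}/${info.numPartitions} cached")
      }
    }

    rdd.unpersist()
    sc.stop()
  }
}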

Answer

I think I found the answer, so I'm going to answer my own question.

The eviction policy seems to be in the MemoryStore class (org.apache.spark.storage.memory.MemoryStore in the Spark source tree).

It seems that entries are not evicted to make room for entries belonging to the same RDD.
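
The relevant check is the block-selection predicate in MemoryStore.evictBlocksToFreeSpace. A simplified paraphrase of the Spark 2.x source (names shortened, locking and bookkeeping omitted):

// rddToAdd is the RDD id of the block we are trying to store, if any;
// getRddId returns Some(rddId) for RDD blocks and None otherwise.
def blockIsEvictable(blockId: BlockId, entry: MemoryEntry[_]): Boolean = {
  entry.memoryMode == memoryMode &&
    (rddToAdd.isEmpty || rddToAdd != getRddId(blockId))
}

So when, say, partition 11 of the persisted RDD tries to enter the cache, blocks of the same RDD are never eviction candidates. With MEMORY_ONLY the new partition simply isn't cached and is recomputed on each iteration, while the first 10 partitions that made it into the cache stay there until rdd.unpersist().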
