Stream#filter runs out of memory for 1,000,000 items

Question

Let's say I have a Stream of length 1,000,000 containing only 1's. (Note that the call below actually passes 100000000, i.e. 100,000,000, to Stream.fill.)

scala> val million = Stream.fill(100000000)(1)
million: scala.collection.immutable.Stream[Int] = Stream(1, ?)

scala> million filter (x => x % 2 == 0)
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

I get an Out of Memory exception.

Then I tried the same filter call with a List.

scala> val y = List.fill(1000000)(1)
y: List[Int] = List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ...

scala> y.filter(x => x % 2 == 0)
res2: List[Int] = List()

But that succeeded.

Why does Stream#filter run out of memory here, while List#filter completes just fine?

Lastly, with a large stream, will filter force non-lazy evaluation of the entire stream?

Recommended answer

Overhead of List: a single object (an instance of ::) with 2 fields (2 pointers) per element.

Overhead of Stream: per element, an instance of Cons (with 3 pointers) plus an instance of Function (tl: => Stream[A]) for the lazy evaluation of Stream#tail.

So you'll spend roughly twice as much memory on a Stream as on a List with the same number of elements.

You have defined your Stream as a val, so the reference to its head keeps every evaluated element reachable. Alternatively, you could define million as a def; in that case, after filter finishes, the GC can reclaim all the created elements and you'll get your memory back.
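
The val-versus-def point can be sketched roughly as follows. This is an illustration, not a measurement: the names StreamRetention, ones, evensFromDef, evenCountFromIterator and the size n are hypothetical. The Iterator variant is a swapped-in alternative that is not part of the answer; it is included only because an Iterator does not memoize its elements. Stream is deprecated since Scala 2.13 in favor of LazyList, but the sketch keeps Stream to match the question.

```scala
object StreamRetention {
  // Hypothetical size, kept small so the sketch runs on a default heap.
  val n = 100000

  // A def builds a fresh Stream on every call, so no long-lived reference
  // pins the head once the caller is done with it.
  def ones: Stream[Int] = Stream.fill(n)(1)

  // No element is even, so filter walks the whole Stream and returns empty.
  def evensFromDef: Stream[Int] = ones.filter(_ % 2 == 0)

  // Alternative (not from the answer): an Iterator does not memoize,
  // so only the current element is live during traversal.
  def evenCountFromIterator: Int = Iterator.fill(n)(1).count(_ % 2 == 0)
}
```

Whether the def version actually frees memory during the traversal depends on the JVM no longer treating the head as live, so it is best read as making reclamation possible rather than guaranteed.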

Note that only the tail of a Stream is lazy; the head is strict. So filter evaluates strictly until it finds the first element that satisfies the given predicate. Since there are no such elements in your Stream, filter iterates over the entire million stream and puts all of its elements in memory.
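
A minimal sketch of this strictness, assuming the Stream API from the question (deprecated since Scala 2.13 but still available): it counts how often the predicate runs before filter returns. The object and method names are hypothetical and chosen for illustration.

```scala
object FilterStrictness {
  // Runs filter on an infinite Stream whose first match is the element 10.
  // Returns the first match and the number of predicate calls made so far.
  def callsUntilFirstMatch(): (Int, Int) = {
    var calls = 0
    val matches = Stream.from(1).filter { x => calls += 1; x % 10 == 0 }
    // filter has already examined the prefix 1..10 to find its head;
    // reading the head does not call the predicate again.
    (matches.head, calls)
  }

  // Same experiment on a Stream with no matching element: filter must
  // visit every element before it can return the empty Stream.
  def callsWithNoMatch(size: Int): Int = {
    var calls = 0
    val matches = Stream.fill(size)(1).filter { x => calls += 1; x % 2 == 0 }
    assert(matches.isEmpty)
    calls
  }
}
```

With a matching element, filter stops at the first hit and leaves the rest lazy; with none, as in the question, the predicate runs once per element and the whole Stream is forced.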


10-30 05:17