


While stress-testing some Clojure code at work, I noticed that it runs out of heap space when iterating over large data sets. I eventually traced the issue back to the combination of Clojure's doseq function and its implementation of lazy sequences.


This is the minimal code snippet that crashes Clojure by exhausting available heap space:

(doseq [e (take 1000000000 (iterate inc 1))] (identity e))


The documentation for doseq clearly states that it doesn't retain the head of the lazy sequence, so I would expect the memory complexity of the above code to be close to O(1). Is there something I'm missing? What's the Clojure-idiomatic way of iterating over extremely large lazy sequences, if doseq isn't up to the job?



When I run this sample, I see memory usage reach 2.0 GB, so perhaps you are actually just running out of RAM.
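One way to check whether the JVM is simply short on heap is to ask the runtime directly. This is a minimal sketch, assuming it is evaluated in the same REPL that runs the doseq; `heap-mb` is a hypothetical helper name, while `maxMemory`, `totalMemory` and `freeMemory` are standard `java.lang.Runtime` methods.

```clojure
;; Assumption: evaluated in the same JVM that runs the doseq.
(def ^Runtime runtime (Runtime/getRuntime))

(defn heap-mb
  "Returns the JVM's maximum, currently allocated, and used heap in whole megabytes.
  heap-mb is a hypothetical helper, not part of Clojure."
  []
  (let [mb       (* 1024 1024)
        max-heap (.maxMemory runtime)    ; upper limit the heap may grow to
        total    (.totalMemory runtime)  ; heap currently reserved by the JVM
        free     (.freeMemory runtime)]  ; unused part of the reserved heap
    {:max-mb   (quot max-heap mb)
     :total-mb (quot total mb)
     :used-mb  (quot (- total free) mb)}))

(heap-mb)
```

If `:max-mb` is small relative to what the process needs, raising the limit with the JVM's `-Xmx` option is worth trying before blaming doseq.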


It sure does take a while to run:

user=> (time (doseq [e (take 1000000000 (iterate inc 1))] (identity e)))
"Elapsed time: 266396.221132 msecs"


23999 arthur    20   0 4001m 1.2g 5932 S  213 15.3  17:11.35 java                                          
24017 arthur    20   0 3721m 740m 5548 S   88  9.3  13:49.95 java  
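As a comparison point, the same kind of iteration can be written with loop/recur, which builds no sequence at all and so takes lazy-seq retention out of the picture. The sketch below is only an illustration at a small scale; `sum-with-doseq` and `sum-with-loop` are hypothetical names, and the summing is there just to give both versions an observable result.

```clojure
(defn sum-with-doseq
  "Sums 1..n by walking a lazy sequence with doseq (hypothetical helper)."
  [n]
  (let [total (atom 0)]
    (doseq [e (take n (iterate inc 1))]
      (swap! total + e))
    @total))

(defn sum-with-loop
  "Sums 1..n with loop/recur; no sequence is ever created (hypothetical helper)."
  [n]
  (loop [i 1
         total 0]
    (if (> i n)
      total
      (recur (inc i) (+ total i)))))

;; Both should agree; try larger n while watching the process in top.
(= (sum-with-doseq 1000) (sum-with-loop 1000))
```

If the loop/recur version stays flat in memory while the doseq version grows, that would point at the sequence rather than at the JVM settings.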


10-27 14:17