This article describes how to deal with exceeding spark.akka.frameSize when saving a Word2VecModel, and should be a useful reference for anyone hitting the same problem.

Problem description

I am using Spark's Word2Vec to train some word vectors. The training essentially works, but when it comes to saving the model I get an org.apache.spark.SparkException saying:

Job aborted due to stage failure: Serialized task 1278:0 was 1073394582 bytes, which exceeds max allowed: spark.akka.frameSize (134217728 bytes) - reserved (204800 bytes). Consider increasing spark.akka.frameSize or using broadcast variables for large values.

The stack trace points at line 190, but there is a chance that I changed some of the code; I think it is actually line 196 that causes the problem:

190: val sizeGb = (model.getVectors.size * arguments.getVectorSize * 4.0)/(1024*1024*1024.0);
191: 
192: println("Final vocabulary word count: " + model.getVectors.size)
193: println("Output file size:      ~ " + f"$sizeGb%1.4f" + " GB")
194: println("Saving model to " + outputFilePath)
195:
196: model.save(sc, outputFilePath)

From my own output I got an estimated model size of

// (vocab-size * vector-size * 4)/(1024^3) = ~ 0.9767 GB
val sizeGb = (model.getVectors.size * arguments.getVectorSize * 4.0)/(1024*1024*1024.0);
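For intuition, the exception complains that a single serialized task carries roughly this whole ~1 GB model, which is about eight times the allowed frame size. A quick comparison, using only the numbers from the error message:

// Numbers taken directly from the exception message below.
val serializedTaskBytes = 1073394582L   // the serialized save task (~1.0 GB)
val frameSizeBytes      = 134217728L    // spark.akka.frameSize (128 MB)
val reservedBytes       = 204800L       // reserved overhead

val allowedBytes = frameSizeBytes - reservedBytes
println(f"Task is ${serializedTaskBytes.toDouble / allowedBytes}%.1fx the allowed frame size")
// => Task is 8.0x the allowed frame size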

which comes close to 1073394582 bytes. The stack trace:

org.apache.spark.SparkException: Job aborted due to stage failure: Serialized task 1278:0 was 1073394582 bytes, which exceeds max allowed: spark.akka.frameSize (134217728 bytes) - reserved (204800 bytes). Consider increasing spark.akka.frameSize or using broadcast variables for large values.
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
    ...
    at org.apache.spark.mllib.feature.Word2VecModel$SaveLoadV1_0$.save(Word2Vec.scala:617)
    at org.apache.spark.mllib.feature.Word2VecModel.save(Word2Vec.scala:489)
    at masterthesis.code.wordvectors.Word2VecOnCluster$.main(Word2VecOnCluster.scala:190)
    at masterthesis.code.wordvectors.Word2VecOnCluster.main(Word2VecOnCluster.scala)

The error message is clear, but I am not sure what I can do about it. On the other hand, I have already saved models larger than 125MB (our default frame size) and Spark didn't complain.

I am not sure what I can do about this.

Recommended answer

Just like your error log suggests, there are two ways of doing this:

  • Either by increasing spark.akka.frameSize; the default size is 128MB (see the configuration sketch after this list).

You can refer to the Network Configuration documentation, or if you are using a standalone shell you can set it by passing the argument --driver-java-options "-Dspark.akka.frameSize=128".

  • Or by using broadcast variables for large values (a generic sketch follows below).
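A minimal sketch of the first option, setting the frame size programmatically before the SparkContext is created. The value 1024 (in MB) is an assumption, chosen to comfortably fit the ~1 GB serialized task; note that spark.akka.frameSize only exists in Spark 1.x, since the Akka-based RPC layer was removed in Spark 2.x:

import org.apache.spark.{SparkConf, SparkContext}

// spark.akka.frameSize is given in MB and must be set before the SparkContext
// is created. 1024 MB is an assumed value, large enough for the ~1 GB task.
val conf = new SparkConf()
  .setAppName("Word2VecOnCluster")
  .set("spark.akka.frameSize", "1024")

val sc = new SparkContext(conf)

The same setting can also be passed on the command line with spark-submit --conf spark.akka.frameSize=1024, or via the --driver-java-options form quoted above.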
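For the second option, note that in this particular case the oversized task is built inside Word2VecModel.save itself, so you cannot inject a broadcast there; broadcasting helps when your own code ships a large value to tasks. A generic sketch of the pattern, with made-up data purely for illustration:

import org.apache.spark.{SparkConf, SparkContext}

// Ship a large lookup table to executors once as a broadcast variable instead
// of letting it be serialized into every task closure.
val sc = new SparkContext(new SparkConf().setAppName("BroadcastSketch"))

val bigLookup: Map[String, Int] = (1 to 1000000).map(i => s"word$i" -> i).toMap
val bcLookup = sc.broadcast(bigLookup)

val words = sc.parallelize(Seq("word1", "word42", "missing"))
val ids = words.map(w => (w, bcLookup.value.getOrElse(w, -1))).collect()
ids.foreach(println)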

This concludes the article on exceeding spark.akka.frameSize when saving a Word2VecModel. Hopefully the recommended answer is helpful.
