This article describes how to deal with continuous "INFO JobScheduler:59 - Added jobs for time *** ms" messages on a Spark Standalone cluster. It should be a useful reference for anyone hitting the same problem; read on to learn more.

Problem description

We are running a Spark Standalone cluster of 3 nodes, all with the same configuration: 8 cores and 32 GB of RAM each.

Sometimes a streaming batch completes in less than 1 second; sometimes it takes more than 10 seconds, and when that happens the following log lines appear in the console.

2016-03-29 11:35:25,044  INFO TaskSchedulerImpl:59 - Removed TaskSet 18.0, whose tasks have all completed, from pool 
2016-03-29 11:35:25,044  INFO DAGScheduler:59 - Job 18 finished: foreachRDD at EventProcessor.java:87, took 1.128755 s
2016-03-29 11:35:31,471  INFO JobScheduler:59 - Added jobs for time 1459231530000 ms
2016-03-29 11:35:35,004  INFO JobScheduler:59 - Added jobs for time 1459231535000 ms
2016-03-29 11:35:40,004  INFO JobScheduler:59 - Added jobs for time 1459231540000 ms
2016-03-29 11:35:45,136  INFO JobScheduler:59 - Added jobs for time 1459231545000 ms
2016-03-29 11:35:50,011  INFO JobScheduler:59 - Added jobs for time 1459231550000 ms
2016-03-29 11:35:55,004  INFO JobScheduler:59 - Added jobs for time 1459231555000 ms
2016-03-29 11:36:00,014  INFO JobScheduler:59 - Added jobs for time 1459231560000 ms
2016-03-29 11:36:05,003  INFO JobScheduler:59 - Added jobs for time 1459231565000 ms
2016-03-29 11:36:10,087  INFO JobScheduler:59 - Added jobs for time 1459231570000 ms
2016-03-29 11:36:15,004  INFO JobScheduler:59 - Added jobs for time 1459231575000 ms
2016-03-29 11:36:20,004  INFO JobScheduler:59 - Added jobs for time 1459231580000 ms
2016-03-29 11:36:25,139  INFO JobScheduler:59 - Added jobs for time 1459231585000 ms

Please help me resolve this issue.

Recommended answer

Change the spark-submit master from local to local[2]:

spark-submit --master local[2] --class YOURPROGRAM YOUR.jar

Or set it in the code:

new SparkConf().setAppName("SparkStreamingExample").setMaster("local[2]")
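For a fuller picture, here is a minimal Java sketch of a streaming application with the master set to local[2]. It is not the asker's actual code: the socket source on localhost:9999, the 5-second batch interval, and the count step are assumptions; the real job runs foreachRDD in EventProcessor.java, which is not shown in the question.

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class SparkStreamingExample {
    public static void main(String[] args) throws InterruptedException {
        // "local[2]" gives one thread to the receiver and leaves at least one for processing.
        SparkConf conf = new SparkConf()
                .setAppName("SparkStreamingExample")
                .setMaster("local[2]");

        // Assumed 5-second batch interval, matching the timestamps in the log above.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        // A receiver-based input DStream; the receiver permanently occupies one thread/core.
        JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", 9999);

        // Hypothetical processing step standing in for the foreachRDD call in EventProcessor.java.
        lines.foreachRDD(rdd -> System.out.println("Records in batch: " + rdd.count()));

        jssc.start();
        jssc.awaitTermination();
    }
}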

If you are still facing the same problem after changing the number to 2, try a bigger number.

Reference: http://spark.apache.org/docs/latest/streaming-programming-guide.html

When running a Spark Streaming program locally, do not use "local" or "local[1]" as the master URL. Either of these means that only one thread will be used for running tasks locally. If you are using an input DStream based on a receiver (e.g. sockets, Kafka, Flume, etc.), then the single thread will be used to run the receiver, leaving no thread for processing the received data. Hence, when running locally, always use "local[n]" as the master URL, where n > the number of receivers to run (see Spark Properties for information on how to set the master).

Extending the logic to running on a cluster, the number of cores allocated to the Spark Streaming application must be more than the number of receivers. Otherwise, the system will receive data, but not be able to process them.
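On a standalone cluster, the equivalent knob is the number of cores allocated to the application at submit time, which must exceed the number of receivers. A hedged sketch (the master URL spark://master:7077, the core count of 4, and the class/jar names are placeholders, not values from the question):

spark-submit --master spark://master:7077 --total-executor-cores 4 --class YOURPROGRAM YOUR.jar

With a single receiver, allocating 4 cores leaves 3 cores free to process the received batches.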

Credit to bit1129: http://bit1129.iteye.com/blog/2174751

That concludes this article on the continuous "INFO JobScheduler:59 - Added jobs for time *** ms" messages in a Spark Standalone cluster. We hope the recommended answer is helpful; thank you for your support!
