This article looks at whether Spark needs HDFS and how that question was answered; it may be a useful reference if you run into the same issue.

Problem description

Hi, can anyone explain whether Apache Spark Standalone needs HDFS?

If it is required, how does Spark use the HDFS block size during application execution? In other words, I am trying to understand what role HDFS plays while a Spark application runs.

The Spark documentation says that processing parallelism is controlled through RDD partitions and the executors/cores.
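As a minimal sketch of that point (the application name, master URL, and partition count below are illustrative, not taken from the question): the number of tasks is driven by the RDD's partition count and the cores available, with no HDFS block size involved.

```scala
import org.apache.spark.sql.SparkSession

object ParallelismDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parallelism-demo")
      .master("local[4]")          // 4 cores on the local machine, no HDFS involved
      .getOrCreate()
    val sc = spark.sparkContext

    // Explicitly request 8 partitions; Spark schedules 8 tasks across the 4 cores.
    val rdd = sc.parallelize(1 to 1000000, numSlices = 8)
    println(s"partitions = ${rdd.getNumPartitions}")     // prints 8
    println(s"sum = ${rdd.map(_.toLong).reduce(_ + _)}") // action triggers the tasks

    spark.stop()
  }
}
```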

Can anyone please help me understand this?

Recommended answer

Spark can work without any issues without using HDFS, and it is most certainly not required for core execution.
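For example, a Standalone job can read and write plain local files with `file://` URIs. The sketch below is only illustrative; the master URL and paths are made-up placeholders.

```scala
import org.apache.spark.sql.SparkSession

object NoHdfsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("no-hdfs-demo")
      // Hypothetical Standalone master URL; local[*] would behave the same way.
      .master("spark://master-host:7077")
      .getOrCreate()

    // file:// URIs bypass HDFS entirely and go to the local (or shared) file system.
    val lines = spark.read.textFile("file:///tmp/input.txt")
    lines.write.text("file:///tmp/output")

    spark.stop()
  }
}
```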

Some distributed storage (not necessarily HDFS) is required for checkpointing, and it is useful for saving results.
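A small sketch of checkpointing under that assumption: the checkpoint directory only needs to be reachable by every executor, so HDFS, S3, or a shared NFS mount all work; the local `file://` path below is purely illustrative.

```scala
import org.apache.spark.sql.SparkSession

object CheckpointDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("checkpoint-demo")
      .master("local[2]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Could just as well be an hdfs:// or s3a:// URI shared by all executors.
    sc.setCheckpointDir("file:///tmp/spark-checkpoints")

    val rdd = sc.parallelize(1 to 100).map(_ * 2)
    rdd.checkpoint()        // marks the RDD; data is written on the next action
    println(rdd.count())    // action triggers the checkpoint and truncates the lineage

    spark.stop()
  }
}
```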

