How to set up Zeppelin to work with a remote EMR Yarn cluster

Question

I have an Amazon EMR cluster running Hadoop 2.6 and Spark 1.4.1 with the Yarn resource manager. I want to deploy Zeppelin on a separate machine so that the EMR cluster can be shut down when no jobs are running.

I tried following the instructions at https://zeppelin.incubator.apache.org/docs/install/yarn_install.html without much success.

Can somebody demystify the steps Zeppelin needs to connect to an existing Yarn cluster from a different machine?

Recommended answer

[1] Install Zeppelin with the proper parameters:

git clone https://github.com/apache/incubator-zeppelin.git ~/zeppelin;
cd ~/zeppelin;
mvn clean package -Pspark-1.4 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -DskipTests

[2] Update the EMR_MASTER EC2 security groups to accept incoming requests on all ports so that the cluster can communicate with Zeppelin. (This should be narrowed to specific ports; I do not yet know which ones.)
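
One way to open the security group is with the AWS CLI. The sketch below only prints the command so it can be reviewed first. The group ID `sg-0123456789abcdef0` and the address `203.0.113.10` are placeholders (assumptions, not values from this answer), and restricting the source to the Zeppelin host's address is my own suggestion rather than part of the original step.

```shell
#!/usr/bin/env bash
# Sketch: build the AWS CLI call that opens all TCP ports on the EMR master
# security group to a single source address (the Zeppelin host).
# The group ID and IP address used below are hypothetical placeholders.

open_emr_master_cmd() {
  local group_id="$1" zeppelin_ip="$2"
  # --port 0-65535 covers every TCP port; /32 limits the rule to one host.
  printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port 0-65535 --cidr %s/32\n' \
    "$group_id" "$zeppelin_ip"
}

# Print the command for review before running it by hand.
open_emr_master_cmd "sg-0123456789abcdef0" "203.0.113.10"
```

Once the required ports are known, the `--port` range can be narrowed accordingly.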

[3] Copy the directory EMR_MASTER:/etc/hadoop/conf to MY_STANDALONE_SERVER:/home/zeppelin/hadoop-conf.
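
The copy can be done with `scp`, run from the standalone server. This is a minimal sketch: the host name `emr-master.example.com`, the key file path, and the `hadoop` login user are assumptions and will differ per setup. It prints the command rather than running it, and assumes the paths contain no spaces.

```shell
#!/usr/bin/env bash
# Sketch: build the scp command that pulls the Hadoop client configuration
# from the EMR master onto the Zeppelin host.
# Host name, key file, and the "hadoop" user are hypothetical placeholders.

copy_hadoop_conf_cmd() {
  local master_host="$1" key_file="$2" dest_dir="$3"
  # -r copies the directory recursively; -i selects the SSH private key.
  # If dest_dir already exists, scp creates a nested conf/ directory inside it,
  # so run this before creating the destination.
  printf 'scp -r -i %s hadoop@%s:/etc/hadoop/conf %s\n' \
    "$key_file" "$master_host" "$dest_dir"
}

# Print the command for review before running it by hand.
copy_hadoop_conf_cmd "emr-master.example.com" "$HOME/.ssh/emr-key.pem" "/home/zeppelin/hadoop-conf"
```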

[4] zeppelin/conf/zeppelin-env.sh should contain:

export MASTER=yarn-client
export HADOOP_CONF_DIR=/home/zeppelin/hadoop-conf
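
Before starting Zeppelin, it may help to confirm that the directory HADOOP_CONF_DIR points to actually holds the client configuration. The check below is a sketch that assumes the copied directory contains `core-site.xml` and `yarn-site.xml`, the standard Hadoop file names; the function name `check_hadoop_conf` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: verify that the copied Hadoop configuration directory contains the
# files a Yarn client normally needs. check_hadoop_conf is a hypothetical helper.

check_hadoop_conf() {
  local conf_dir="$1" missing=0 f
  for f in core-site.xml yarn-site.xml; do
    if [ ! -f "$conf_dir/$f" ]; then
      echo "missing: $conf_dir/$f" >&2
      missing=1
    fi
  done
  # Returns 0 when both files are present, 1 otherwise.
  return "$missing"
}

# Falls back to the path used in zeppelin-env.sh when HADOOP_CONF_DIR is unset.
if check_hadoop_conf "${HADOOP_CONF_DIR:-/home/zeppelin/hadoop-conf}"; then
  echo "Hadoop client config looks complete"
else
  echo "Hadoop client config is incomplete"
fi
```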

Note: Spark parameters such as spark.executor.instances are taken from the Interpreter settings and should be specified there.
