This post covers how to access Hive tables from other Azure HDInsight clusters.

Problem description

We have an HDInsight cluster in our setup, and we store our data in Hive tables. The data lives in external tables on ADLS, the metadata lives in an external metastore, and both are accessed through the Hive service on our Azure cluster. What is the best way to share this data with other Azure clusters, not necessarily within the same subscription?

Azure has the concept of service principals, so we would need to set up ACLs that allow the other cluster's service principal to access the ADLS folders corresponding to the Hive tables we share. Additionally, how can other Azure instances use our cluster's HiveServer2 URL as a JDBC connection so that they can query the data? What cluster login should we provision for them so that they can use our HiveServer2 to query the data in our Hive tables?
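To make the ACL requirement concrete, here is a minimal sketch of which POSIX-style ACL entries a read-only share of one table folder would need. It assumes the usual ADLS access model, where a principal needs execute permission on every parent folder and read plus execute on the folder it lists. The function name `acl_plan`, the path `/warehouse/sales/orders`, and the object ID are hypothetical; the sketch only computes the entries and does not call any Azure API.

```python
from pathlib import PurePosixPath


def acl_plan(table_dir: str, principal_object_id: str) -> list[tuple[str, str]]:
    """Return (path, acl_entry) pairs for read-only access to table_dir.

    principal_object_id is a placeholder for the other cluster's
    service principal object ID (an assumption, not a real value).
    """
    target = PurePosixPath(table_dir)
    plan = []
    # Every ancestor folder needs execute (--x) so the principal can traverse it.
    for ancestor in reversed(target.parents):
        plan.append((str(ancestor), f"user:{principal_object_id}:--x"))
    # The table folder itself needs read + execute to list its contents.
    plan.append((str(target), f"user:{principal_object_id}:r-x"))
    # A default ACL lets items created later inherit the same access.
    plan.append((str(target), f"default:user:{principal_object_id}:r-x"))
    return plan


if __name__ == "__main__":
    for path, entry in acl_plan("/warehouse/sales/orders", "<object-id>"):
        print(path, entry)
```

Default ACLs only affect items created after they are set, so files that already exist under the table folder would still need read access applied to them separately.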

I understand the right way to do this would be to use the Azure ESP (Enterprise Security Package) service, but that is apparently a costly choice.

Giving them access only to the ADLS folders also seems incorrect, because the metadata would then not be used to access the data...

Recommended answer

Regarding HiveServer2 connectivity, refer to "Use the Apache Beeline client with Apache Hive".
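As a rough illustration of what that connection looks like, the sketch below builds a HiveServer2 JDBC URL and a matching Beeline argument list. It assumes the cluster is reached over its public HTTPS endpoint on port 443 with HTTP transport, which is the form the HDInsight Beeline documentation uses. The cluster name `mycluster`, the login `admin`, and the helper names are placeholders, and the password is deliberately left out so it can be supplied separately (for example with `-p`).

```python
def hiveserver2_jdbc_url(cluster_name: str, database: str = "default") -> str:
    """Build a JDBC URL for HiveServer2 on an HDInsight public endpoint."""
    return (
        f"jdbc:hive2://{cluster_name}.azurehdinsight.net:443/{database}"
        ";ssl=true;transportMode=http;httpPath=/hive2"
    )


def beeline_args(cluster_name: str, login: str, query: str) -> list[str]:
    """Return an argv list for Beeline: -u URL, -n user name, -e query."""
    return [
        "beeline",
        "-u", hiveserver2_jdbc_url(cluster_name),
        "-n", login,
        "-e", query,
    ]


if __name__ == "__main__":
    # Hypothetical cluster name and login, for illustration only.
    print(hiveserver2_jdbc_url("mycluster"))
    print(beeline_args("mycluster", "admin", "SHOW TABLES;"))
```

Any JDBC client on the other cluster could use the same URL. Which account to hand out is still a policy decision, since on a cluster without ESP this would typically be the shared cluster login rather than a per-user identity.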

Let us know if this helps.


10-28 05:54