This article covers how to access Impala tables from Python.

Problem description

I need to access Impala tables from the command line with Python, running on the same Cloudera server that hosts Impala.

I have tried the code below to establish the connection:

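from impala.dbapi import connect  # assumed: connect() comes from the impyla package


def query_impala(sql):
    """Run a query and return (rows, field_names)."""
    cursor = query_impala_cursor(sql)
    result = cursor.fetchall()
    field_names = [f[0] for f in cursor.description]
    return result, field_names


def query_impala_cursor(sql, params=None):
    """Open a connection, execute the query, and return the open cursor."""
    conn = connect(host='xx.xx.xx.xx', port=21050, database='am_playbook', user='xxxxxxxx', password='xxxxxxxx')
    cursor = conn.cursor()
    # Pass the SQL as str; encoding it to bytes is not needed on Python 3.
    cursor.execute(sql, params)
    return cursor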
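
The fetchall / cursor.description pattern above is plain DB-API 2.0, so it can be tried without a cluster. The following is a minimal sketch under that assumption: sqlite3 from the standard library stands in for Impala, and the helper name fetch_with_field_names and the table t are hypothetical. Note that sqlite3 uses "?" placeholders, which may differ from the parameter style of the Impala driver.

```python
import sqlite3


def fetch_with_field_names(conn, sql, params=()):
    """Run a query on any DB-API 2.0 connection; return (rows, field_names)."""
    cursor = conn.cursor()
    try:
        cursor.execute(sql, params)
        rows = cursor.fetchall()
        # The first item of each cursor.description entry is the column name.
        field_names = [col[0] for col in cursor.description]
    finally:
        # Close the cursor even if the query fails.
        cursor.close()
    return rows, field_names


# sqlite3 is only a stand-in here so the sketch runs without Impala.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'a'), (2, 'b')")
rows, field_names = fetch_with_field_names(conn, "SELECT id, name FROM t ORDER BY id")
print(field_names, rows)
conn.close()
```

Closing the cursor inside the helper avoids leaving it open for the caller to clean up, which the two-function version above leaves to chance.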

However, since I am running on the same Cloudera server, I should not need to provide the host name. Could you please provide the correct code to access Impala/Hive tables on the same server through Python?

Recommended answer

You can use pyhive to connect to Hive and access your Hive tables:

from pyhive import hive
import pandas as pd
import datetime

# Open a single connection; the execution engine is set through `configuration`.
conn = hive.Connection(host="hostname", port=10000, username="XXXX",
                       configuration={'hive.execution.engine': 'tez'})
query = "select col1,col2,col3,col4 from db.yourhiveTable"

start_time = datetime.datetime.now()

# Load the query result into a pandas DataFrame.
data = pd.read_sql(query, conn)
print(data)

end_time = datetime.datetime.now()
# Subtract start from end so the elapsed time is positive.
elapsed_minutes = (end_time - start_time).total_seconds() / 60.0
print('Finished reading from Hive table in', elapsed_minutes, 'minutes')
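
Since the question is about a client running on the server itself, one option is to point the driver at the loopback address. The sketch below is only an illustration under several assumptions: that the Impala and HiveServer2 daemons actually listen on localhost of that machine, that the ports are the ones used in the snippets above (21050 and 10000), and that no extra authentication settings are required. The helper names are hypothetical, and the driver imports are deferred so the settings helper can be used without either package installed.

```python
def local_connection_kwargs(service):
    """Return host/port keyword arguments for a service on this machine."""
    # Ports taken from the snippets above; adjust if the cluster differs.
    ports = {"impala": 21050, "hive": 10000}
    if service not in ports:
        raise ValueError("unknown service: %r" % service)
    return {"host": "localhost", "port": ports[service]}


def connect_local_hive(username):
    """Connect to HiveServer2 on this machine with pyhive."""
    from pyhive import hive  # imported lazily; requires the PyHive package
    return hive.Connection(username=username, **local_connection_kwargs("hive"))


def connect_local_impala(database):
    """Connect to Impala on this machine with impyla."""
    from impala.dbapi import connect  # imported lazily; requires impyla
    return connect(database=database, **local_connection_kwargs("impala"))
```

On a secured cluster the connection will most likely need additional authentication arguments, which are deliberately left out here.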

