Inserting a Python DataFrame into Hive from an External Server


Problem description

I'm currently using PyHive (Python 3.6) to read data onto a server that sits outside the Hive cluster, and then using Python to perform analysis.

After performing the analysis I would like to write the data back to the Hive server. In searching for a solution, I found that most posts deal with PySpark. In the long term we will set up our system to use PySpark. In the short term, however, is there a way to easily write data directly to a Hive table using Python from a server outside the cluster?

Thanks for your help!

Recommended answer

You can write the data back. Convert the rows of df into the format of a multi-row insert, as if you were inserting several rows into the table at once, e.g. insert into table values (first row of the DataFrame, comma separated), (second row), (third row), and so on; then you can run the insert.

# Assumes each value is already a valid SQL literal once converted to str (string values would need quotes).
bundle = ','.join('(' + df.astype(str).apply(','.join, axis=1) + ')')

con.cursor().execute('insert into table table_name values ' + bundle)

And you're done.
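The one-liner above only works when every value is already a valid SQL literal. A slightly more defensive sketch of the same multi-row insert idea is shown below. The helper names `hive_literal`, `build_insert` and `insert_in_batches` are hypothetical and not part of PyHive or pandas. The sketch assumes that backslash escaping is acceptable for string literals in your Hive version, that the cursor follows the Python DB-API (`execute(sql)`), and that rows arrive as plain tuples. The batch size of 1000 is an arbitrary illustration, not a recommended value.

```python
import math
from numbers import Number


def hive_literal(value):
    """Render one Python value as a HiveQL literal (illustrative helper)."""
    # Missing values (None or float NaN) become SQL NULL.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "NULL"
    # bool must be checked before Number, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, Number):
        return str(value)
    # Everything else is treated as a string: escape backslashes and quotes.
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + text + "'"


def build_insert(table_name, rows):
    """Build one multi-row INSERT statement from an iterable of row tuples."""
    tuples = ["(" + ", ".join(hive_literal(v) for v in row) + ")" for row in rows]
    return "INSERT INTO TABLE " + table_name + " VALUES " + ", ".join(tuples)


def insert_in_batches(cursor, table_name, rows, batch_size=1000):
    """Send rows through a DB-API style cursor, batch_size rows per statement."""
    rows = list(rows)
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        cursor.execute(build_insert(table_name, batch))
```

Assuming `con` is the PyHive connection from the answer and `df` is the DataFrame, a call might look like `insert_in_batches(con.cursor(), 'table_name', df.itertuples(index=False, name=None))`. Splitting the rows into batches keeps each statement to a bounded size. This is a stopgap for small to moderate result sets rather than a replacement for a bulk load path.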
