Serialization of a pandas DataFrame

Question

Is there a fast way to serialize a DataFrame?

I have a grid system that runs pandas analyses in parallel. At the end, I want to collect the result of each grid job (a DataFrame) and aggregate them all into one giant DataFrame.

How can I save a DataFrame in a binary format that can be loaded quickly?

Recommended answer

The easiest way is to use to_pickle, which writes the DataFrame to disk as a pickle; see the pickling section of the docs API page:

df.to_pickle(file_name)
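For the grid scenario in the question, the round trip could look roughly like the sketch below. It assumes each job writes its result to its own file in a shared directory; the helper names (save_job_result, load_and_aggregate), the file naming scheme, and the toy data are made up for illustration and are not part of pandas.

```python
import os
import tempfile

import pandas as pd


def save_job_result(df, path):
    """Write one job's DataFrame to disk as a pickle."""
    df.to_pickle(path)


def load_and_aggregate(paths):
    """Read every pickled DataFrame and stack them into a single frame."""
    frames = [pd.read_pickle(p) for p in paths]
    return pd.concat(frames, ignore_index=True)


# Simulate three grid jobs, each producing a small result frame (toy data).
tmp_dir = tempfile.mkdtemp()
paths = []
for job_id in range(3):
    result = pd.DataFrame(
        {"job": [job_id] * 2, "value": [job_id * 10, job_id * 10 + 1]}
    )
    path = os.path.join(tmp_dir, f"job_{job_id}.pkl")
    save_job_result(result, path)
    paths.append(path)

combined = load_and_aggregate(paths)
print(combined.shape)  # (6, 2)
```

Collecting the frames in a list and calling pd.concat once is usually cheaper than growing a DataFrame job by job. Note that unpickling can execute arbitrary code, so pd.read_pickle should only be used on files you trust.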

Another option is to use HDF5, which takes slightly more work to get started but is much richer for querying.
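A minimal sketch of the HDF5 route, assuming the optional PyTables package (tables) is installed, since pandas relies on it for HDF5 support. The helper names, the "results" key, and the "job" column are hypothetical and chosen only to match the grid scenario in the question.

```python
import pandas as pd


def save_to_hdf(df, path, key="results"):
    """Store the frame in HDF5 using the queryable "table" layout."""
    # format="table" is what enables where-queries on read;
    # data_columns lists the columns that may appear in such a query.
    df.to_hdf(path, key=key, format="table", data_columns=["job"])


def load_job(path, job_id, key="results"):
    """Read back only the rows that belong to one job."""
    return pd.read_hdf(path, key=key, where=f"job == {job_id}")
```

Calling save_to_hdf(df, "results.h5") and then load_job("results.h5", 1) would return only the rows where job equals 1, without loading the whole file. That selective read is the kind of querying a pickle cannot offer, at the cost of an extra dependency and a slightly more involved write step.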

10-30 04:40