This article covers how to store a Spark GraphX graph in HDFS.

Problem description

I have constructed a graph in Spark's GraphX. This graph is going to have potentially 1 billion nodes and upwards of 10 billion edges, so I don't want to have to build this graph over and over again.

I want to have the ability to build it once, save it (I think the best is in HDFS), run some processes on it, and then access it in a couple of days or weeks, add some new nodes and edges, and run some more processes on it.

How can I do that in Apache Spark's GraphX?

Edit: I think I have found a potential solution, but I would like someone to confirm whether this is the best way.

Given a graph named graph, I have to save its VertexRDD and its EdgeRDD separately as text files so that I can access them again later, like so:

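// each RDD needs its own output directory; saving both to the same path fails
graph.vertices.saveAsTextFile(somePath + "/vertices")
graph.edges.saveAsTextFile(somePath + "/edges")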

One question I have now is: should I use saveAsTextFile() or saveAsObjectFile()? And how should I access those files at a later time?

Recommended answer

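GraphX does not yet have a built-in mechanism for saving graphs, so the next best option is to save the vertices and the edges separately and rebuild the graph from them. If your vertex attributes are complex objects, save them as object files: saveAsObjectFile writes a SequenceFile of Java-serialized objects (so the attribute types must be serializable), and SparkContext.objectFile reads them back without any parsing.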

graph.vertices.saveAsObjectFile("/location/of/vertices")
graph.edges.saveAsObjectFile("/location/of/edges")

And later on, you can read from disk and construct the graph.

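import org.apache.spark.graphx.{Edge, Graph, VertexId}
// VD and ED stand for your vertex and edge attribute types
val vertices = sc.objectFile[(VertexId, VD)]("/location/of/vertices")
val edges = sc.objectFile[Edge[ED]]("/location/of/edges")
val graph = Graph(vertices, edges)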
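The question also asks how text files would be read back. With saveAsTextFile, each vertex is written as the toString of its (VertexId, attribute) tuple, such as (1,Alice), and each edge as the toString of the Edge case class, such as Edge(1,2,follows), so loading them later means parsing those strings yourself. The sketch below is a minimal, hypothetical parser: TextGraphFormat, parseVertex and parseEdge are illustrative names, not part of GraphX, and it assumes String attributes that contain no commas or parentheses. It is plain Scala and does not need Spark to run.

```scala
// Hypothetical helpers (not a GraphX API) that parse lines written by
// saveAsTextFile. Assumes String attributes without commas or parentheses.
object TextGraphFormat {
  // A vertex line is the toString of a (Long, String) tuple, e.g. "(1,Alice)".
  private val VertexLine = """\((-?\d+),([^,()]*)\)""".r
  // An edge line is the toString of Edge(srcId, dstId, attr), e.g. "Edge(1,2,follows)".
  private val EdgeLine = """Edge\((-?\d+),(-?\d+),([^,()]*)\)""".r

  // Returns None for any line that does not match the expected vertex format.
  def parseVertex(line: String): Option[(Long, String)] = line.trim match {
    case VertexLine(id, attr) => Some((id.toLong, attr))
    case _                    => None
  }

  // Returns None for any line that does not match the expected edge format.
  def parseEdge(line: String): Option[(Long, Long, String)] = line.trim match {
    case EdgeLine(src, dst, attr) => Some((src.toLong, dst.toLong, attr))
    case _                        => None
  }
}
```

In a Spark job this could be applied with something like sc.textFile(path).flatMap(line => TextGraphFormat.parseVertex(line).toList). The parser rejects any attribute containing a comma, which illustrates why the object-file approach is the safer choice for complex attributes.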

10-22 02:12