本文介绍了有效进行大规模图形遍历的数据库的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个很大的二部有向图数据集(约2000万个元素)。在当前使用的情况下,我运行遍历算法,每次遍历约500,000个节点。该算法可以正常工作,但是从历史上看,会耗尽从文本文件加载到内存中的数据。

I have a large bipartite directed graph dataset (~20 million elements). In its current use, I run traversal algorithms that hit ~500,000 nodes per run. The algorithms work, but historically run off of data loaded to memory from text files.

文本文件似乎是一种不好的方法,因此我将数据作为邻接关系转移到mongoDB中列表,即。

Text files seem like a bad approach, so I transferred the data into mongoDB as an adjacency list, ie.

{ _id: 1, children: [2, 3] }
{ _id: 2, children: [4] }
{ _id: 3, children: [5, 6, 7] }

这可行,但是我觉得该模型对于我正在做的事情效率低下。在伪代码中,从_ id:1 开始的广度优先搜索的查询结构如下:

This works, but I feel like the model is inefficient for what I'm doing. In pseudocode, the query structure for a breadth-first search starting from _id: 1 would look like:

children = getChildren(_id = 1)
for child in children
    grandchildren = getChildren(_id = child)
    // etc., either recursively or as a nested loop

数据库的问题是没有逻辑连接节点。每个查询都必须遍历索引树,如果我没记错的话,它是O(log N)。文本文件加载后为O(1),因为我可以制定一些简单的查找规则以直接指向节点子节点。

The issue I have with the database is that there's no logic connecting nodes. Every query has to traverse the index tree, which, if I'm not mistaken, is O(log N). The text file approach is O(1) after loading, as I can make some simple lookup rules to point directly to node children.

TL; DR 是否可以使用数据库在O(1)时间内遍历大型网络?

TL;DR Is there a way to traverse a large network in O(1) time using a database?

推荐答案

您可以尝试使用,一个NoSQL图形数据库。我没有用过,但它保证了高性能。

You could try using Neo4J, a NoSQL graph database. I haven't used it, but it promises high performance.

这篇关于有效进行大规模图形遍历的数据库的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

09-02 15:28