NoSQL solutions for persisting graphs at scale


Problem description

I'm hooked on using Python and NetworkX for analyzing graphs, and as I learn more I want to use more and more data (guess I'm becoming a data junkie :-). Eventually I think my NetworkX graph (which is stored as a dict of dicts) will exceed the memory on my system. I know I can probably just add more memory, but I was wondering whether there is a way to integrate NetworkX with HBase or a similar solution instead?

I looked around and couldn't really find anything, but I also couldn't find anything about using even a simple MySQL backend.

Is this possible? Does anything exist that allows connectivity to some kind of persistent storage?

Thanks!

Update: I remember seeing this subject in 'Social Network Analysis for Startups'; the author talks about other storage methods (including hbase, s3, etc.) but does not show how to do this or whether it's possible.

Recommended answer

There are two general types of containers for storing graphs:

true graph databases: e.g., Neo4J, agamemnon, GraphDB, and AllegroGraph; these not only store a graph but also understand what a graph is, so you can query the database directly, e.g., how many nodes lie on the shortest path between node X and node Y?

static graph containers: Twitter's MySQL-adapted FlockDB is the most well-known exemplar here. These DBs can store and retrieve graphs just fine, but to query the graph itself you first have to retrieve the graph from the DB and then use a library (e.g., Python's excellent NetworkX) to run the query.

The redis-based graph container I discuss below is in the second category, though apparently redis is also well-suited for containers in the first category, as evidenced by redis-graph, a remarkably small Python package for implementing a graph database in redis.

redis will work beautifully here.

Redis is a heavy-duty, durable data store suitable for production use, yet it's also simple enough to use for command-line analysis.

Redis differs from other databases in that it has multiple data structure types; the one I would recommend here is the hash data type. Using this redis data structure allows you to very closely mimic a "list of dictionaries", a conventional schema for storing graphs, in which each item in the list is a dictionary of edges keyed to the node from which those edges originate.
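To make the schema concrete, here is a minimal sketch of how a dict-of-dicts graph could be flattened into one row per node, where each row is what would go into the redis hash keyed by that node. The names `adjacency` and `to_hash_rows` are made up for this illustration and are not part of redis or NetworkX.

```python
# Hypothetical illustration: one redis hash per node, one hash field per node
# in the graph, with 1 marking an edge and 0 marking no edge.

def to_hash_rows(adjacency):
    """Turn a dict-of-dicts graph into {node: {other_node: 0 or 1}} rows.

    Every node appears as a field in every row, so all rows have equal width
    and can later be stacked into an adjacency matrix.
    """
    nodes = sorted(adjacency)
    return {
        src: {dst: int(dst in adjacency[src]) for dst in nodes}
        for src in nodes
    }

# A NetworkX-style adjacency: node -> {neighbor: edge attributes}
adjacency = {
    "n1": {"n2": {}, "n3": {}},
    "n2": {"n1": {}},
    "n3": {"n1": {}},
}
rows = to_hash_rows(adjacency)
print(rows["n1"])  # {'n1': 0, 'n2': 1, 'n3': 1}
```

Storing a 0 for every non-edge is wasteful for sparse graphs; storing only the fields for existing edges would also work, at the cost of a little more bookkeeping when rebuilding the matrix.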

You need to first install redis and the Python client. The DeGizmo Blog has an excellent "up-and-running" tutorial which includes a step-by-step guide on installing both.

Once redis and its Python client are installed, start a redis server, which you do like so:

cd to the directory in which you installed redis (/usr/local/bin on *nix if you installed via make install); next

type redis-server at the shell prompt and press Enter

you should now see the server log file tailing in your shell window

>>> import numpy as NP
>>> import networkx as NX

>>> # start a redis client & connect to the server:
>>> from redis import StrictRedis as redis
>>> r1 = redis(db=1, host="localhost", port=6379)

In the snippet below, I store a four-node graph; each line calls hmset on the redis client and stores one node and the edges connected to that node ("0" => no edge, "1" => edge). (In practice, of course, you would abstract these repetitive calls in a function; here I'm showing each call because it's likely easier to understand that way.)

>>> r1.hmset("n1", {"n1": 0, "n2": 1, "n3": 1, "n4": 1})
      True

>>> r1.hmset("n2", {"n1": 1, "n2": 0, "n3": 0, "n4": 1})
      True

>>> r1.hmset("n3", {"n1": 1, "n2": 0, "n3": 0, "n4": 1})
      True

>>> r1.hmset("n4", {"n1": 0, "n2": 1, "n3": 1, "n4": 1})
      True

>>> # retrieve the edges for a given node:
>>> r1.hgetall("n2")
      {'n1': '1', 'n2': '0', 'n3': '0', 'n4': '1'}
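The repeated calls above could be wrapped in a small helper along these lines. This is only a sketch: `store_graph` is a hypothetical name, `client` is assumed to be a redis-py client such as `r1`, and it uses `hset(name, mapping=...)`, which redis-py 3.5 and later offer as the replacement for the deprecated `hmset`.

```python
def store_graph(client, rows):
    """Write one redis hash per node and return the number of nodes stored.

    `rows` maps each node name to a dict of {other_node: 0 or 1}, the same
    shape as the literal dicts passed to hmset above.
    """
    for node, edges in rows.items():
        # On redis-py older than 3.5, client.hmset(node, edges) does the same job.
        client.hset(node, mapping=edges)
    return len(rows)

# Usage, assuming a running server and the r1 client created earlier:
# store_graph(r1, {"n1": {"n1": 0, "n2": 1}, "n2": {"n1": 1, "n2": 0}})
```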

Now that the graph is persisted, retrieve it from the redis DB as a NetworkX graph.

There are many ways to do this; below I do it in two steps:

extract the data from the redis database into an adjacency matrix, implemented as a 2D NumPy array; then

convert that directly to a NetworkX graph using a NetworkX built-in function.

Reduced to code, these two steps are:

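>>> nodes = sorted(r1.keys("*"))   # redis does not guarantee key order, so fix one
>>> AM = NP.array([[int(r1.hget(n, m)) for m in nodes] for n in nodes])
>>> # now convert this adjacency matrix back to a networkx graph:
>>> G = NX.from_numpy_array(AM)    # named from_numpy_matrix before NetworkX 3.0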

>>> # verify that G in fact holds the original graph:
>>> type(G)
      <class 'networkx.classes.graph.Graph'>
>>> G.nodes()
      [0, 1, 2, 3]
>>> G.edges()
      [(0, 1), (0, 2), (0, 3), (1, 3), (2, 3), (3, 3)]

When you end a redis session, you can shut down the server from the client like so:

>>> r1.shutdown()

redis saves to disk just before it shuts down, so this is a good way to ensure all writes are persisted.

So where is the redis DB? It is stored in the default location with the default file name, dump.rdb; with the stock configuration (dir ./) that location is the directory from which the server was started.
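If it is unclear where a running server is writing its dump file, the two relevant settings can be read back through the client. The sketch below assumes a redis-py client and its `config_get` method; `rdb_path` is a hypothetical helper name.

```python
import os

def rdb_path(client):
    """Join the server's `dir` and `dbfilename` settings into one path."""
    parts = []
    for name in ("dir", "dbfilename"):
        # config_get returns a dict with a single entry for an exact name
        value = next(iter(client.config_get(name).values()))
        parts.append(value.decode() if isinstance(value, bytes) else value)
    return os.path.join(*parts)

# Usage, assuming the r1 client from above:
# print(rdb_path(r1))
```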

To change this, edit the redis.conf file (included with the redis source distribution); go to the line starting with:

# The filename where to dump the DB
dbfilename dump.rdb

Change dump.rdb to anything you wish, but leave the .rdb extension in place.

Next, to change the file path, find this line in redis.conf:

# Note that you must specify a directory here, not a file name

The line below that is the directory location for the redis database. Edit it so that it points to the location you want. Save your revisions and rename this file, but keep the .conf extension. You can store this config file anywhere you wish; just provide the full path and name of this custom config file on the same line when you start a redis server.

So the next time you start a redis server, you must do it like so (from the shell prompt):

$> cd /usr/local/bin    # or the directory in which you installed redis

$> redis-server /path/to/redis.conf

Finally, the Python Package Index lists a package specifically for implementing a graph database in redis. The package is called redis-graph; I have not used it.
