本文介绍了Redis或Mongo用于确定数字是否在范围内?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

限时删除!!

我需要一种方法来快速检查IP地址是否属于许多禁止的IP范围之一.

I need a way to quickly check if an IP address falls into one of many forbidden IP ranges.

我目前使用iptables检查IP是否落在指定范围内.在几千个范围内都可以正常工作,但是这个数字将急剧增加到几十万,并且还将继续增长.

I currently use iptables to check if the IP falls into a specified range. This works fine for several thousand ranges, but that number is about to increase drastically to several hundred thousand, and will continue to grow.

我目前向iptables简单添加新规则的方法的另一个问题是重复项的数量不断增加.

The other issue with my current method of simply adding new rules to iptables is the increasing number of duplicates.

在将IP或范围添加到规则集之前,我需要一种有效的方法来检查IP或范围是否属于现有(较大)范围.

I need an efficient method for checking if an IP or range falls into an existing (larger) range before it's added to the ruleset.

Ruby是我最熟悉的语言,但是对于越来越多的范围,哪种数据结构将是最好的选择?

Ruby is the language I am most familiar with, but what data structures would be the best options for the ever growing number of ranges?

我想出的一个解决方案是使用Redis集或MongoDB将单个IP存储为整数,然后简单地检查IP是否存在于集合中……但是我的直觉告诉我,必须有一种更聪明的方法.

One solution I came up with is to use Redis sets or perhaps MongoDB to store individual IPs as integers, then simply check if the IP exists in the set... but my gut tells me there has to be a smarter way.

如果我要将IP转换为整数并存储范围,那么遍历范围以查看新IP或范围是否可能已被现有的较大范围涵盖的最佳方式是什么?

If I were to convert IPs into integers and store the ranges, what would be the optimal way to run through the ranges to see if a new IP or range might already be encompassed by an existing bigger range?

最后一点:速度比内存成本更为重要.

Final note: speed is more important than memory costs.

推荐答案

与前一个海报相反,我认为您不能通过使用朴素的索引来获得O(log n)复杂性.让我们以mongodb为例.您可以定义两个索引(用于范围的开始和结束属性),但是mongodb仅使用一个索引来解决给定的查询.因此它将不起作用.现在,如果您使用涉及范围的开始和结束属性的单个复合索引,则复杂度将是对数的,以找到要检查的第一个范围,但是,它将变得线性,以找到与查询匹配的最后一个范围.最糟糕的情况是复杂度O(n),当所有存储的范围与您的输入重叠时,您就会拥有它.

Contrary to the previous poster, I don't think you can get O(log n) complexity by using naive indexing. Let's consider mongodb for instance. You could define two indexes (for start and end properties of the ranges), but mongodb will only use one to solve a given query. So it will not work. Now if you use a single compound index involving both start and end properties of the ranges, the complexity will be logarithmic to find the first range to check, but then it will get linear to find the last range matching the query. The worst case complexity is O(n), and you have it when all the stored ranges overlap your input.

另一方面,如果知道要执行的操作,则可以使用Redis排序集模拟排序索引(复杂度为O(log n)). Redis不仅仅是一个简单的键值存储.使用跳过列表实现Redis排序集,得分和值都用于比较项目.

On a side note, using Redis sorted set you can emulate a sorted index (with O(log n) complexity) if you know what you do. Redis is a bit more than a simple key-value store.Redis sorted sets are implemented using a skip list, and both the score and value are used to compare items.

为解决此类问题,需要专用的索引结构.您可能需要看看:

To solve this kind of problem, a dedicated indexing structure is needed. You may want to have a look at:

http://en.wikipedia.org/wiki/Segment_tree 或者 http://en.wikipedia.org/wiki/Interval_tree

如果关注的是空间上的速度,那么将索引展平可能会很有趣.例如,让我们考虑以下范围(仅使用整数来简化讨论):

If the concern is speed over space, then it may be interesting to flatten the index.For instance, let's consider the following ranges (using only integers to simplify the discussion):

A 2-8
B 4-6
C 2-9
D 7-10

可以构建索引非重叠段的稀疏结构.

A sparse structure indexing non overlapping segments can be built.

0  []
2  [A C]
4  [A C B]
7  [A C D]
9  [C D]
10 [D]
11 []

每个条目均包含一个非重叠段的下限作为键,并包含匹配范围的列表或集合作为一个值.条目应使用排序后的容器(树,跳过列表,btree等)建立索引

Each entry contains the lower bound of a non overlapping segment as the key, and the list or set of matching ranges as a value. Entries should be indexed using a sorted container (tree, skip list, btree, etc ...)

要查找与5匹配的范围,我们寻找小于或等于5(在此示例中为4)的第一个条目,并提供了范围列表([A C B])

To find the ranges matching 5, we look for the first entry which is lower or equal to 5 (it will be 4 in this example) and the list of ranges is provided ( [A C B] )

使用这种数据结构,查询的复杂度实际上为O(log n).但是,构建和维护它并非微不足道(且昂贵).可以使用mongodb和Redis来实现.

With this data structure, the complexity of the queries is really O(log n). However it is not trivial (and expensive) to build and maintain it. It can be implemented with both mongodb and Redis.

以下是Redis的示例:

Here is an example with Redis:

> rpush range:2 2-8 2-9
(integer) 2
> rpush range:4 2-8 2-9 4-6
(integer) 3
> rpush range:7 2-8 2-9 7-10
(integer) 3
> rpush range:9 2-9 7-10
(integer) 2
> rpush range:10 7-10
(integer) 1
> zadd range_index 0 range:0 2 range:2 4 range:4 7 range:7 9 range:9 10 range:10
(integer) 6
> zrevrangebyscore range_index 5 0 LIMIT 0 1
1) "range:4"
> lrange range:4 0 -1
1) "2-8"
2) "2-9"
3) "4-6"

这篇关于Redis或Mongo用于确定数字是否在范围内?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

1403页,肝出来的..

09-07 02:17