本文介绍了冲突解决:二次探测与分离链的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

好了,我一直在做与哈希表和不同的冲突解决问题的一些实验。我试图找出哪些是做的发现,即使用单独的链接或二次探测冲突解决一个哈希表更有效。我的研究结果表明,单独的链接是比小负荷因素,如0.4或0.2二次探查得更快。这样的话或者我的结果错了吗?

Ok, so I've been doing some experiments with hash tables and different collision resolution problems. I'm trying to figure out which is more efficient for doing finds, a hash table that uses separate chaining or quadratic probing for collision resolution. My results suggest that separate chaining is faster than quadratic probing even for small load factors such as 0.4 or 0.2. Is this the case or are my results wrong?

推荐答案

在之间这两种方法都是对的处理成本
的区别   (带链接)
   - 是间接的,即指针引用

   (含二次探测)
   - 一个[简单但复合]数学公式
评价   - 索引到新位置
   - 可能的重复其(由于存储在这些位置的测头值和非目标值之间的碰撞;东西链不必担心

The difference in processing cost between the two approaches are that of
  (with chaining)
  - an indirection, i.e. pointer dereferencing
vs.
  (with quadratic probing)
  - evaluation of a [simple but composite] arithmetic formula
  - indexing to the new location
  - possible repeats thereof (due to collision between the probe value and non-target values stored at these locations; something chaining doesn't have to worry about.

因此,它应该是小惊喜的链接速度更快;指针引用是大多数CPU,具有可比性(在大多数情况下相同)的索引到阵列中的本机指令,而使算术运算和可能的碰撞的开销探测失宠。最简单的探测序列的公式将需要一些CPU指令(初始化stepNr的stepNr的一些典型的转移,增加了当前的位置/探头),这本身就是很容易比指针废弃慢好几倍。 (POSS警告:请参阅编辑此后不久,因为它讨论了如何链接可能会招致更多的CPU一级缓存未命中,因此使得它比的线性的探测效率低)

It should therefore be little surprise that chaining is faster; pointer dereferencing is a "native" instruction of most CPUs, comparable (identical in most cases) to that of indexing into the array, leaving the arithmetic operations and possible collisions as overhead in disfavor of probing. The simplest of probing sequence's formula will require a few CPU instructions (initialize stepNr, typically some shifting of the stepNr, adding to current location/probe) which in of itself is readily several times slower than pointer dereferencing. (Poss. caveat: please see 'Edit' shortly thereafter, as it discusses how chaining may incur more CPU-level cache misses hence making it less efficient than linear probing)

二次(或其他形式)链的优势

The advantages of quadratic (or other forms of) chaining are

  • 存储管理(无动力分配)简单的逻辑
  • 快插(理性的简单存储)
  • 一般
  • 降低存储需求
  • simpler logic for storage management (no dynamic allocation)
  • faster inserts (for reason of simpler storage)
  • reduced storage requirement in general

思考这个空间与速度(或也将及时与搜索时间)妥协的非常笼统的,链接(大多为指针本身的存储开销,不考虑可能的堆管理开销)是用于存储pre-计算值[什么是与探测]探针位置。由于这些计算是很容易做到,链接方法是快了搜索时间。
修改(感谢,蚂蚁Aasma)
一个警告这样的说法[约pre计算的位置]是,在现代的CPU和他们的缓存,运行一个小的计算的成本可能比访问[数据]内存时,高速缓存未命中要少得多。这表明,探测的顺序(或更一般地与探测其产生的位置物理地接近碰撞点功能)能胜过高速缓存未命中的较低比例的,因为一个链接的策略。鉴于此,在纯顺序探测的方法是最好的探测功能,因为它很简单的计算,但更重要的,因为它最大限度地提高缓存命中的几率。考虑到这一点,当散列函数具有良好的分配和负载因子是小的(因此与过去的初始碰撞短/本地搜索路径)的 1应该与线性实验(或非常本地)探测方法;但应避免探测功能它提供了一个搜索路径,这不是身体局部。


Thinking about this Space vs. Speed (or also Insert-time vs. Search-time) compromise in very broad terms, the storage overhead of chaining (mostly for the pointers themselves, not considering possible heap-management overhead) is used for storing pre-calculated values of [what would be with probing] "probe locations". Since these calculations are readily done, the chaining approach is faster at search-time.
Edit (thanks, Ants Aasma)
A caveat to this argument [about pre-calculated locations] is that on modern CPUs and their caches, the cost of running a small calculation can be much less than that of accessing [data] memory when the cache misses. This suggests that probing sequentially (or more generally with probing functions which produce locations physically close to the collision spot) could outperform a chaining strategy because of the lower ratio of cache misses. In this light, the purely sequential probing approach is the best of the probing functions, because of its very simple calculation but more importantly because it maximizes the odds of cache hits. With this in mind, when the hash function has a good distribution and the load factor is small (hence with a short/local search path past an initial collision) one should experiment with a linear (or very local) probing approach; one should however avoid probing functions which provide a search path that's not physically local.


这是很难的意见专门对这个问题中提到的实验,例如不知道散列的大小(如果该尺寸相匹配的单词/ CPU中的寄存器,运算能更快),或不知道的碰撞率(我们假设一个良好的,以及分布式哈希函数)。
当你跟上这种尝试也将是有趣的收集一组独立的计时/统计数据访问与他们的哈希插槽与那些产生冲突的项目。

It is hard to comment specifically on the experiment mentioned in the question, for example not knowing the size of the hash (if this size matches that of words/registers in the CPU, the arithmetic can be faster), or not knowing the collision ratio (let's assume a good, well distributed hash function).
As you keep experimenting with this it would be interesting to gather a separate set of timings/statistics for accessing items with their hash-slot vs. the ones that produced a collision.

甚至的中的 ...即使是小负荷的因素... 的表示你的期望链的相对优势,要进一步加强与负载,因此作为冲突变得越来越多。我也希望这会出现这种情况。
此外,增加负荷可以说明尚未探测的另一个缺点是:探测循环时,案件和/或更普遍的时候,那里已经没有空间,以适应特定的项目

The "even" in "...even for small load factors..." indicates your expectation that the relative advantage of chaining should further increase with the load, hence as the collisions become more numerous. I too expect this to would be the case.
Furthermore, increasing the load may illustrate yet another drawback of probing: cases when probing cycles and/or more generally when there's no room to fit a particular item.

这篇关于冲突解决:二次探测与分离链的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

09-27 05:14