问题描述
谁能给我解释一下静态 HashMap#hash(int) 方法?
Can someone please explain to me the static HashMap#hash(int) method?
生成均匀分布的哈希背后的理由是什么?
What's the justification behind it to generate uniformly distributed hashes?
/**
* Applies a supplemental hash function to a given hashCode, which
* defends against poor quality hash functions. This is critical
* because HashMap uses power-of-two length hash tables, that
* otherwise encounter collisions for hashCodes that do not differ
* in lower bits. Note: Null keys always map to hash 0, thus index 0.
*/
static int hash(int h) {
// This function ensures that hashCodes that differ only by
// constant multiples at each bit position have a bounded
// number of collisions (approximately 8 at default load factor).
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
}
举个例子会更容易理解.
An example would make it easier to digest.
澄清我知道运算符、真值表和按位运算.我真的无法真正解码实现或评论.甚至背后的原因.
ClarificationI'm aware of the operators, truth tables and bitwise operations. I just can't really decode the implementation nor the comment really. Or even the reasoning behind it.
推荐答案
>>>
是逻辑右移(无符号扩展)(JLS 15.19 Shift Operators),^
是按位异或 (JLS 15.22.1 整数按位运算符).
>>>
is the logical right shift (no sign-extension) (JLS 15.19 Shift Operators), and ^
is the bitwise exclusive-or (JLS 15.22.1 Integer Bitwise Operators).
至于这样做的原因,文档提供了一个提示:HashMap
使用 2 的幂长度表,并通过屏蔽掉较高位并仅取其较低位来散列键哈希码.
As to why this is done, the documentation offers a hint: HashMap
uses power-of-two length tables, and hashes keys by masking away the higher bits and taking only the lower bits of their hash code.
// HashMap.java -- edited for conciseness
static int indexFor(int h, int length) {
return h & (length-1);
}
public V put(K key, V value) {
int hash = hash(key.hashCode());
int index = indexFor(hash, table.length);
// ...
}
所以 hash()
试图将相关性带到更高的位,否则会被屏蔽掉(indexFor
基本上丢弃了 h
code> 并且只采用较低的 k
位,其中 length == (1 << k)
).
So hash()
attempts to bring relevancy to the higher bits, which otherwise would get masked away (indexFor
basically discards the higher bits of h
and takes only the lower k
bits where length == (1 << k)
).
将此与 Hashtable
(它不应该是二次幂长度表)使用键的哈希码的方式进行对比.
Contrast this with the way Hashtable
(which should have NOT a power-of-two length table) uses a key's hash code.
// Hashtable.java -- edited for conciseness
public synchronized V get(Object key) {
int hash = key.hashCode();
int index = (hash & 0x7FFFFFFF) % table.length;
// ...
}
通过执行更昂贵的 %
操作(而不是简单的位掩码),Hashtable
的性能对低位分布较差的哈希码不太敏感(特别是如果 table.length
是素数).
By doing the more expensive %
operation (instead of simple bit masking), the performance of Hashtable
is less sensitive to hash codes with poor distribution in the lower bits (especially if table.length
is a prime number).
这篇关于HashMap#hash(int) 方法说明的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!