问题描述
我正在寻找高效的C ++(或其他快速)来反转一个巨大的哈希表。
I am looking for efficient C++ (or other fast) to invert a huge hash table.
哈希键的数量约为200,000,000;并且每个散列键中的可能元素的数量是大约100,000的。
The number of hash keys is on the order of 200,000,000; and the number of possible elements in each hash key is of the order of 100,000.
我想知道什么是一个好的方式这样的表,现在的元素是键和键是元素。
I'd like to know what'd be a good way to (efficiently) invert such table such that now the elements are the keys and the keys are the elements.
现在我有我的硬盘驱动器中的数据存储在一个名为hash_file.txt 。该文件如下所示:
Right now I have the data in my hard drive stored in a file called hash_file.txt. The file looks like:
>1
T1
T3
T4
T100
>2
T4
T77
T9980
etc.
其中,> 1,...,> 200,000,000是原始哈希表的所有可能键;
和T1,...,T100000是每个键的所有可能元素。
注意:哈希表非常稀疏,每个键只有几百个元素。
Where, >1,...,>200,000,000 are all the possible keys of the original hash table;and T1,...,T100000 are all the possible elements for each key.Note: The hash table is quite sparse with no more of a few hundred elements per key.
输出,反转哈希表在这个示例:
The output, inverted hash table would look like this in this example:
>T1
1
>T3
1
>T100
1
>T4
1
2
>T77
2
>T9980
2
我试过一些天真的代码,并永远,耗尽了mem,所以我正在寻找好的建议
I tried some naive code and took forever, and run out of mem, so I am looking for good suggestions to start with.
推荐答案
这是一个很简单的方法;值得一试(记住,启用优化,但最好不禁用断言; - ))。
This is a pretty simply approach; worth a try (remember to build with optimisation enabled, but preferably not disabling assert ;-)).
#include <iostream>
#include <vector>
#include <cassert>
int main()
{
char c;
int n;
int key = -1;
const int max_t = 100000;
std::vector<std::vector<int>> v(max_t + 1);
while (std::cin >> c >> n)
if (c == '>')
key = n;
else
{
assert(c == 'T');
assert(key != -1);
assert(0 <= n && n < v.size());
v[n].push_back(key);
}
assert(std::cin.eof());
for (int i = 0; i < v.size(); ++i)
{
if (v[i].empty()) continue;
std::cout << ">T" << i << '\n';
for (int j = 0; j < v[i].size(); ++j)
std::cout << v[i][j] << '\n';
}
}
(输出顺序不是字典式的问题...如果你关心你可以寻找/写一个算法在i中迭代,以这种方式镜像词典顺序)
(the output order is numeric not lexicographical like in your question... if you cared you could look for / write an algorithm to iterate in "i" in such a way mirroring lexicographical ordering)
这篇关于用Cpp或其他快速语言反转大型哈希表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!