本文介绍了用Cpp或其他快速语言反转大型哈希表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在寻找高效的C ++(或其他快速)来反转一个巨大的哈希表。

I am looking for efficient C++ (or other fast) to invert a huge hash table.

哈希键的数量约为200,000,000;并且每个散列键中的可能元素的数量是大约100,000的。

The number of hash keys is on the order of 200,000,000; and the number of possible elements in each hash key is of the order of 100,000.

我想知道什么是一个好的方式这样的表,现在的元素是键和键是元素。

I'd like to know what'd be a good way to (efficiently) invert such table such that now the elements are the keys and the keys are the elements.

现在我有我的硬盘驱动器中的数据存储在一个名为hash_file.txt 。该文件如下所示:

Right now I have the data in my hard drive stored in a file called hash_file.txt. The file looks like:

>1
T1
T3
T4
T100
>2
T4
T77
T9980
etc.

其中,> 1,...,> 200,000,000是原始哈希表的所有可能键;
和T1,...,T100000是每个键的所有可能元素。
注意:哈希表非常稀疏,每个键只有几百个元素。

Where, >1,...,>200,000,000 are all the possible keys of the original hash table;and T1,...,T100000 are all the possible elements for each key.Note: The hash table is quite sparse with no more of a few hundred elements per key.

输出,反转哈希表在这个示例:

The output, inverted hash table would look like this in this example:

>T1
1
>T3
1
>T100
1
>T4
1
2
>T77
2
>T9980
2

我试过一些天真的代码,并永远,耗尽了mem,所以我正在寻找好的建议

I tried some naive code and took forever, and run out of mem, so I am looking for good suggestions to start with.

推荐答案

这是一个很简单的方法;值得一试(记住,启用优化,但最好不禁用断言; - ))。

This is a pretty simply approach; worth a try (remember to build with optimisation enabled, but preferably not disabling assert ;-)).

#include <iostream>
#include <vector>
#include <cassert>

int main()
{
    char c;
    int n;
    int key = -1;
    const int max_t = 100000;
    std::vector<std::vector<int>> v(max_t + 1);
    while (std::cin >> c >> n)
        if (c == '>')
            key = n;
        else
        {
            assert(c == 'T');
            assert(key != -1);
            assert(0 <= n && n < v.size());
            v[n].push_back(key);
        }
    assert(std::cin.eof());
    for (int i = 0; i < v.size(); ++i)
    {
        if (v[i].empty()) continue;
        std::cout << ">T" << i << '\n';
        for (int j = 0; j < v[i].size(); ++j)
             std::cout << v[i][j] << '\n';
    }
}

(输出顺序不是字典式的问题...如果你关心你可以寻找/写一个算法在i中迭代,以这种方式镜像词典顺序)

(the output order is numeric not lexicographical like in your question... if you cared you could look for / write an algorithm to iterate in "i" in such a way mirroring lexicographical ordering)

这篇关于用Cpp或其他快速语言反转大型哈希表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

10-27 03:07