Problem description
I need to store a large number of key/value pairs where the key is not unique. Both keys and values are strings, and there are about 5 million items.
My goal is to hold only unique pairs.
I've tried using List<KeyValuePair<string, string>>, but Contains() is extremely slow. LINQ's Any() looks a little faster, but is still too slow.
Are there any alternatives that make searching a generic list faster? Or should I use a different kind of storage?
I would use a Dictionary<string, HashSet<string>> that maps each key to all of its values.
Here is a full solution. First, write a couple of extension methods: one to add a (key, value) pair to your Dictionary, and another to get all (key, value) pairs. Note that I use arbitrary types for the keys and values; you can substitute string without any problem. You can even write these methods somewhere else instead of as extensions, or skip the methods entirely and use this code directly in your program.
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class Program
    {
        // Adds (key, value) to the map; a pair that is already present is ignored.
        public static void Add<TKey, TValue>(
            this Dictionary<TKey, HashSet<TValue>> data, TKey key, TValue value)
        {
            HashSet<TValue> values = null;
            if (!data.TryGetValue(key, out values)) {
                // first time using this key? create a new HashSet
                values = new HashSet<TValue>();
                // this call binds to the built-in Dictionary.Add, not to this extension
                data.Add(key, values);
            }
            values.Add(value);
        }

        // Flattens the map back into individual (key, value) pairs.
        public static IEnumerable<KeyValuePair<TKey, TValue>> KeyValuePairs<TKey, TValue>(
            this Dictionary<TKey, HashSet<TValue>> data)
        {
            return data.SelectMany(k => k.Value,
                (k, v) => new KeyValuePair<TKey, TValue>(k.Key, v));
        }
    }
Now you can use it as follows:
    public static void Main(string[] args)
    {
        Dictionary<string, HashSet<string>> data = new Dictionary<string, HashSet<string>>();
        data.Add("k1", "v1.1");
        data.Add("k1", "v1.2");
        data.Add("k1", "v1.1"); // already in, so nothing happens here
        data.Add("k2", "v2.1");
        foreach (var kv in data.KeyValuePairs())
            Console.WriteLine(kv.Key + " : " + kv.Value);
    }
Which will print this:
k1 : v1.1
k1 : v1.2
k2 : v2.1
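Since the question was about fast membership tests, a lookup helper may be useful too. The sketch below is not part of the original answer: the names `PairLookup` and `ContainsPair` are hypothetical, chosen only for illustration. It assumes the same Dictionary<TKey, HashSet<TValue>> layout, so checking a pair costs one hash lookup for the key and one for the value, rather than a scan over the whole list.

```csharp
using System;
using System.Collections.Generic;

public static class PairLookup
{
    // Hypothetical helper (not from the answer above): returns true when
    // the (key, value) pair is already stored in the map.
    public static bool ContainsPair<TKey, TValue>(
        this Dictionary<TKey, HashSet<TValue>> data, TKey key, TValue value)
    {
        // An unknown key short-circuits to false; otherwise ask the key's set.
        return data.TryGetValue(key, out HashSet<TValue> values)
            && values.Contains(value);
    }
}
```

With the data built in Main, data.ContainsPair("k1", "v1.2") would return true, while data.ContainsPair("k2", "v1.1") would return false.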
If your keys mapped to a List<string> instead, you would need to take care of duplicates yourself. HashSet<string> already does that for you.
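To make that difference concrete, here is a minimal sketch comparing the two containers for the values of a single key. The class and method names (`DuplicateHandling`, `AddUnique`) are made up for this illustration. With a List<string>, every insert needs an explicit Contains() check, which walks the list; HashSet<string>.Add() does the check itself and returns false when the value is already present.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateHandling
{
    // List version: the caller has to reject duplicates by hand,
    // and Contains() is a linear scan over the list.
    public static bool AddUnique(List<string> values, string value)
    {
        if (values.Contains(value))
            return false;
        values.Add(value);
        return true;
    }

    // HashSet version: Add() already reports whether the value was new.
    public static bool AddUnique(HashSet<string> values, string value)
    {
        return values.Add(value);
    }
}
```

Both overloads end up with the same contents, but the list version pays for a scan on every insert, which adds up quickly when one key has many values.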