Vec::dedup does not work: how do I deduplicate a vector of strings?

Question

I've parsed a file, split the string by lines, and want to keep only the unique elements in each vector. I expected vec.dedup() to work like this:

let mut vec = vec!["a", "b", "a"];
vec.dedup();
assert_eq!(vec, ["a", "b"]);

But it fails:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `["a", "b", "a"]`,
 right: `["a", "b"]`', src/main.rs:4:4

How do I remove the duplicates?

Recommended answer

As documented, Vec::dedup only removes consecutive repeated elements from a vector (this is much cheaper than a full deduplication). It would work as expected if the vector were vec!["a", "a", "b"], for example.
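The difference between adjacent and non-adjacent duplicates can be checked directly. The following is a minimal sketch; `dedup_adjacent` is a hypothetical helper name introduced only for illustration and is not part of the standard library.

```rust
/// Returns a copy of `items` after calling `Vec::dedup` on it.
/// NOTE: `dedup_adjacent` is a hypothetical helper, not a std API.
fn dedup_adjacent<'a>(items: &[&'a str]) -> Vec<&'a str> {
    let mut v = items.to_vec();
    // `dedup` collapses each run of equal neighbours into one element.
    v.dedup();
    v
}

fn main() {
    // Adjacent duplicates are removed.
    assert_eq!(dedup_adjacent(&["a", "a", "b"]), ["a", "b"]);
    // Duplicates separated by another element are left untouched.
    assert_eq!(dedup_adjacent(&["a", "b", "a"]), ["a", "b", "a"]);
    // Each run is collapsed independently.
    assert_eq!(dedup_adjacent(&["a", "a", "b", "b", "a"]), ["a", "b", "a"]);
}
```

Because only neighbours are compared, a single pass over the vector is enough, which is why the operation is cheap.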

Of course, there are multiple potential solutions.

To remove all duplicates while retaining the original order of the elements, you can use the unique adaptor provided by the itertools crate.

use itertools::Itertools;

let v = vec!["b", "a", "b"];
let v: Vec<_> = v.into_iter().unique().collect();
assert_eq!(v, ["b", "a"]);
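If adding a dependency is not an option, a similar order-preserving result can probably be obtained with the standard library alone. This is not what the answer above uses; it is a different technique (tracking seen elements in a HashSet while calling `Vec::retain`), sketched here under the assumption that the elements are hashable and cheap to clone. The name `dedup_keep_first` is hypothetical.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Removes every element that has already appeared earlier in `items`,
/// keeping the first occurrence and the original order.
/// NOTE: `dedup_keep_first` is a hypothetical helper, not a std API.
fn dedup_keep_first<T: Eq + Hash + Clone>(items: &mut Vec<T>) {
    let mut seen = HashSet::new();
    // `HashSet::insert` returns false when the value was already present,
    // so `retain` drops every repeated element.
    items.retain(|item| seen.insert(item.clone()));
}

fn main() {
    let mut v = vec!["b", "a", "b", "c", "a"];
    dedup_keep_first(&mut v);
    assert_eq!(v, ["b", "a", "c"]);
}
```

The trade-off is extra memory for the set of seen elements, in exchange for not having to sort.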

If element order is not important, you may sort the elements first and then call dedup.

let mut v = vec!["a", "b", "a"];
v.sort_unstable();
v.dedup();
assert_eq!(v, ["a", "b"]);
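Applied to the situation in the question (a string split into lines), the sort-then-dedup approach might look like the sketch below. The function name `unique_sorted_lines` and the sample input are assumptions made for illustration.

```rust
/// Splits `text` into lines and returns the distinct lines in sorted order.
/// NOTE: `unique_sorted_lines` is a hypothetical helper, not a std API.
fn unique_sorted_lines(text: &str) -> Vec<&str> {
    let mut lines: Vec<&str> = text.lines().collect();
    // Sorting places equal lines next to each other...
    lines.sort_unstable();
    // ...so `dedup` can then remove all of the repeats.
    lines.dedup();
    lines
}

fn main() {
    let text = "b\na\nb\na";
    assert_eq!(unique_sorted_lines(text), ["a", "b"]);
}
```

Note that the result is in sorted order, not in the order in which the lines appeared in the input.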

If fast element lookup is important, you may also consider using a set type instead, such as HashSet.

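use std::collections::HashSet;

// Collecting into a HashSet drops duplicates; element order is not preserved.
let v: HashSet<_> = ["a", "b", "a"].iter().cloned().collect();
let v2: HashSet<_> = ["b", "a"].iter().cloned().collect();
assert_eq!(v, v2);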
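To illustrate the lookup side of this suggestion, here is a small sketch that builds a set and queries it with `HashSet::contains`. The helper name `to_set` is hypothetical.

```rust
use std::collections::HashSet;

/// Collects the given string slices into a set, discarding duplicates.
/// NOTE: `to_set` is a hypothetical helper, not a std API.
fn to_set<'a>(items: &[&'a str]) -> HashSet<&'a str> {
    items.iter().copied().collect()
}

fn main() {
    let set = to_set(&["a", "b", "a"]);
    // The duplicate "a" is stored only once.
    assert_eq!(set.len(), 2);
    // Membership tests do not scan the whole collection as a Vec search would.
    assert!(set.contains("a"));
    assert!(!set.contains("c"));
}
```

A HashSet has no defined iteration order, so this fits cases where only membership matters.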


10-19 22:57