How can I store a Chars iterator in the same struct as the String it is iterating on?

Problem description

I am just beginning to learn Rust and I’m struggling to handle the lifetimes.

I’d like to have a struct with a String in it which will be used to buffer lines from stdin. Then I’d like to have a method on the struct which returns the next character from the buffer or, if all of the characters from the line have been consumed, reads the next line from stdin.

The documentation says that Rust strings aren’t indexable by character because that is inefficient with UTF-8. As I’m accessing the characters sequentially, it should be fine to use an iterator. However, as far as I understand, iterators in Rust are tied to the lifetime of the thing they’re iterating over, and I can’t work out how I could store this iterator in the struct alongside the String.

Here is the pseudo-Rust that I’d like to achieve. Obviously it doesn’t compile.

struct CharGetter {
    /* Buffer containing one line of input at a time */
    input_buf: String,
    /* The position within input_buf of the next character to
     * return. This needs a lifetime parameter. */
    input_pos: std::str::Chars
}

impl CharGetter {
    fn next(&mut self) -> Result<char, io::Error> {
        loop {
            match self.input_pos.next() {
                /* If there is still a character left in the input
                 * buffer then we can just return it immediately. */
                Some(n) => return Ok(n),
                /* Otherwise get the next line */
                None => {
                    io::stdin().read_line(&mut self.input_buf)?;
                    /* Reset the iterator to the beginning of the
                     * line. Obviously this doesn’t work because it’s
                     * not obeying the lifetime of input_buf */
                    self.input_pos = self.input_buf.chars();
                }
            }
        }
    }
}

I am trying to do the Synacor challenge. This involves implementing a virtual machine where one of the opcodes reads a character from stdin and stores it in a register. I have this part working fine. The documentation states that whenever the program inside the VM reads a character, it will keep reading until it has read a whole line. I wanted to take advantage of this to add a "save" command to my implementation. That means that whenever the program asks for a character, I will read a line from the input. If the line is "save", I will save the state of the VM and then go on to get another line to feed to the VM. Each time the VM executes the input opcode, I need to be able to give it one character at a time from the buffered line until the buffer is depleted.

My current implementation is here. My plan was to add input_buf and input_pos to the Machine struct which represents the state of the VM.

Recommended answer

As thoroughly described in "Why can't I store a value and a reference to that value in the same struct?", in general you can't do this because it truly is unsafe. When you move memory, you invalidate references. This is why a lot of people use Rust: to not have invalid references which lead to program crashes!

Let's look at your code:

io::stdin().read_line(&mut self.input_buf)?;
self.input_pos = self.input_buf.chars();

Between these two lines, you've left self.input_pos in a bad state. If a panic occurs, then the destructor of the object has the opportunity to access invalid memory! Rust is protecting you from an issue that most people never think about.

As also described in that answer:

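There is a special case where the lifetime tracking is overzealous: when you have something placed on the heap. This occurs when you use a Box<T>, for example. In this case, the structure that is moved contains a pointer into the heap. The pointed-at value will remain stable, but the address of the pointer itself will move. In practice, this doesn't matter, as you always follow the pointer.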

Some crates provide ways of representing this case, but they require that the base address never move. This rules out mutating vectors, which may cause a reallocation and a move of the heap-allocated values.

Remember that a String is just a vector of bytes with extra preconditions added.
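The two points above (a heap buffer stays put when its owner is moved, but growing the buffer may reallocate it) can be observed with a small sketch. The function names here are made up for illustration, and whether a particular growth actually moves the buffer is up to the allocator, so the second function only reports capacities rather than claiming the address changed.

```rust
/// Hypothetical helper: moves a `String` and returns the address of its
/// heap buffer before and after the move.
fn buffer_addr_across_move() -> (usize, usize) {
    let s = String::from("hello");
    let before = s.as_ptr() as usize;
    // Moving the String only copies its (pointer, length, capacity) triple;
    // the bytes on the heap are not touched.
    let moved = s;
    let after = moved.as_ptr() as usize;
    (before, after)
}

/// Hypothetical helper: grows a `String` well past its initial capacity and
/// returns the capacity before and after. Growing like this may reallocate
/// the buffer, which is what would invalidate a stored `Chars`.
fn capacity_across_growth() -> (usize, usize) {
    let mut s = String::with_capacity(4);
    let before = s.capacity();
    s.push_str(&"x".repeat(1000));
    (before, s.capacity())
}
```

This is why the approaches that follow keep the String untouched for as long as the iterator exists.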

Instead of using one of those crates, we can also roll our own solution, which means we (read: you) get to accept all the responsibility for ensuring that we aren't doing anything wrong.

The trick here is to ensure that the data inside the String never moves and no accidental references are taken.

use std::{mem, str::Chars};

/// I believe this struct to be safe because the String is
/// heap-allocated (stable address) and will never be modified
/// (stable address). `chars` will not outlive the struct, so
/// lying about the lifetime should be fine.
///
/// TODO: What about during destruction?
///       `Chars` shouldn't have a destructor...
struct OwningChars {
    _s: String,
    chars: Chars<'static>,
}

impl OwningChars {
    fn new(s: String) -> Self {
        let chars = unsafe { mem::transmute(s.chars()) };
        OwningChars { _s: s, chars }
    }
}

impl Iterator for OwningChars {
    type Item = char;
    fn next(&mut self) -> Option<Self::Item> {
        self.chars.next()
    }
}

You might even think about putting just this code into a module so that you can't accidentally muck about with the innards.
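A minimal sketch of that module idea, assuming a module name of owning_chars (a name chosen here purely for illustration). The fields stay private, so code outside the module can only construct the value and iterate it, and cannot reach the String to mutate it. Declaring chars before the String is an extra precaution of this sketch, not something the code above does: fields are dropped in declaration order, so the iterator goes away first.

```rust
mod owning_chars {
    use std::{mem, str::Chars};

    /// Same idea as the struct above, with private fields. Nothing outside
    /// this module can touch `_s`, so the buffer cannot be grown or replaced
    /// while `chars` still points into it.
    pub struct OwningChars {
        // Declared first so it is dropped before the String it points into.
        chars: Chars<'static>,
        _s: String,
    }

    impl OwningChars {
        pub fn new(s: String) -> Self {
            // SAFETY (as argued above, and still an assumption of this
            // sketch): the String's heap buffer does not move when `s` is
            // moved into the struct, and `_s` is never modified afterwards.
            let chars = unsafe { mem::transmute::<Chars<'_>, Chars<'static>>(s.chars()) };
            OwningChars { chars, _s: s }
        }
    }

    impl Iterator for OwningChars {
        type Item = char;

        fn next(&mut self) -> Option<char> {
            self.chars.next()
        }
    }
}
```

From outside the module, the only operations available are owning_chars::OwningChars::new(some_string) and the Iterator methods.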

Here's the same code using the ouroboros crate to create a self-referential struct containing the String and a Chars iterator:

use ouroboros::self_referencing; // 0.4.1
use std::str::Chars;

#[self_referencing]
pub struct IntoChars {
    string: String,
    #[borrows(string)]
    chars: Chars<'this>,
}

// All these implementations are based on what `Chars` implements itself

impl Iterator for IntoChars {
    type Item = char;

    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        self.with_mut(|me| me.chars.next())
    }

    #[inline]
    fn count(mut self) -> usize {
        self.with_mut(|me| me.chars.count())
    }

    #[inline]
    fn size_hint(&self) -> (usize, Option<usize>) {
        self.with(|me| me.chars.size_hint())
    }

    #[inline]
    fn last(mut self) -> Option<Self::Item> {
        self.with_mut(|me| me.chars.last())
    }
}

impl DoubleEndedIterator for IntoChars {
    #[inline]
    fn next_back(&mut self) -> Option<Self::Item> {
        self.with_mut(|me| me.chars.next_back())
    }
}

impl std::iter::FusedIterator for IntoChars {}

// And an extension trait for convenience

trait IntoCharsExt {
    fn into_chars(self) -> IntoChars;
}

impl IntoCharsExt for String {
    fn into_chars(self) -> IntoChars {
        IntoCharsBuilder {
            string: self,
            chars_builder: |s| s.chars(),
        }
        .build()
    }
}


Here's the same code using the rental crate to create a self-referential struct containing the String and a Chars iterator:

#[macro_use]
extern crate rental; // 0.5.5

rental! {
    mod into_chars {
        pub use std::str::Chars;

        #[rental]
        pub struct IntoChars {
            string: String,
            chars: Chars<'string>,
        }
    }
}

use into_chars::IntoChars;

// All these implementations are based on what `Chars` implements itself

impl Iterator for IntoChars {
    type Item = char;

    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        self.rent_mut(|chars| chars.next())
    }

    #[inline]
    fn count(mut self) -> usize {
        self.rent_mut(|chars| chars.count())
    }

    #[inline]
    fn size_hint(&self) -> (usize, Option<usize>) {
        self.rent(|chars| chars.size_hint())
    }

    #[inline]
    fn last(mut self) -> Option<Self::Item> {
        self.rent_mut(|chars| chars.last())
    }
}

impl DoubleEndedIterator for IntoChars {
    #[inline]
    fn next_back(&mut self) -> Option<Self::Item> {
        self.rent_mut(|chars| chars.next_back())
    }
}

impl std::iter::FusedIterator for IntoChars {}

// And an extension trait for convenience

trait IntoCharsExt {
    fn into_chars(self) -> IntoChars;
}

impl IntoCharsExt for String {
    fn into_chars(self) -> IntoChars {
        IntoChars::new(self, |s| s.chars())
    }
}
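For the line-buffering use case in the question, a different technique avoids the self-referential struct altogether: store a byte offset into the buffer instead of a Chars iterator. This is not what the answer above does; it is a sketch under a few assumptions. The struct is made generic over BufRead so it can be fed from something other than stdin, the method is called next_char, and it returns Ok(None) at end of input, none of which comes from the question. Unlike the pseudo-Rust in the question, it also clears the buffer before each read_line, because read_line appends.

```rust
use std::io::{self, BufRead};

/// Hypothetical variant of the question's `CharGetter`: the position is a
/// byte offset into `input_buf`, so no borrow of the buffer is stored.
struct CharGetter<R: BufRead> {
    reader: R,
    /// Buffer containing one line of input at a time.
    input_buf: String,
    /// Byte offset of the next character; always on a char boundary.
    input_pos: usize,
}

impl<R: BufRead> CharGetter<R> {
    fn new(reader: R) -> Self {
        CharGetter { reader, input_buf: String::new(), input_pos: 0 }
    }

    /// Returns the next character, reading another line when the current
    /// one is exhausted. `Ok(None)` means end of input.
    fn next_char(&mut self) -> io::Result<Option<char>> {
        loop {
            // Decode a single char starting at the stored offset. This is
            // sequential access, not indexing by character.
            if let Some(c) = self.input_buf[self.input_pos..].chars().next() {
                self.input_pos += c.len_utf8();
                return Ok(Some(c));
            }
            // Line exhausted: fetch the next one.
            self.input_buf.clear();
            self.input_pos = 0;
            if self.reader.read_line(&mut self.input_buf)? == 0 {
                return Ok(None);
            }
        }
    }
}
```

With stdin, this could be constructed as CharGetter::new(io::stdin().lock()). A check for a "save" line would go right after the read_line call, before any character is handed to the VM.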


11-03 11:13