Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: RewindIterator for bidirectional iterators #2896

Closed
wants to merge 3 commits into from

Conversation

jonhoo
Copy link
Contributor

@jonhoo jonhoo commented Apr 2, 2020

Rendered


Today, we have std::iter::Peekable, which allows a single-element look-ahead in iterators. Peekable is useful, but it must buffer one element of the iterator, since it has now knowledge of the underlying iterator implementation. It also only allows peeking at a single element, and only provides a reference to it at that. This means that if you peek ahead in an iterator that provides mutable access, you cannot mutate the peeked-at element.

Some iterators can provide much more robust navigation options for the iterator, and can do so efficiently. The most immediately useful of these is the ability to rewind the iterator: effectively undoing the last call to Iterator::next. An iterator that supports rewinding allows the user to move both forwards and backwards through the underlying element stream. This in turn allows peeking all the way to the end of the iterator if need be, and allows iterating over the same set of elements more than once.

Beyond peeking, the most obvious use-cases for rewinding is for iterators where consecutive elements are related in some fashion. For example, for a BTreeMap<(Game, Team, Player), usize>, the user may wish to compute aggregate statistics for each team for a game, but then also compute something per player that depends on the team statistics. Currently, developers are forced to either do multiple lookups into the map, or iterate over the whole map multiple times. With a rewinding iterator, neither is necessary; the developer can simply rewind the iterator after computing the team statistics, and then iterate again to enumerate the players.

@Centril Centril added A-iterators Iterator related proposals & ideas T-libs-api Relevant to the library API team, which will review and decide on the RFC. A-collections Proposals about collection APIs labels Apr 2, 2020
@RustyYato
Copy link

RustyYato commented Apr 2, 2020

You could do this with Clone, for example with slices

fn are_all_unique(slice: &[i32]) -> bool {
    let mut iter = slice.iter();
    
    while let Some(item) = iter.next() {
        let rest = iter.clone();
        if rest.find(|x| x == item) {
            return false
        }
    }

    true
}

This works with all std iterators over collections that return references, and even with combinators provided that the closures/iterators passed into the combinators are also Clone. For all other iterators, for example iterators over values, if it is possible to implement RewindIterator it should also be possible to implement Clone. So I see no additional value in this.

Using clone has the added benefit that it usually is just copying some pointers, so it's really cheap. However, rewinding to a previous state can get really expensive.

I was able to convert your print_mvp function to use Clone instead of previous, and found a unnecessary usage of previous in the process! I think this should make it clear enough that RewindIterator isn't just unnecessary, but it also actively worse than using Clone. By using Clone, Rust can warn you for unused variables and unused assignments, which can allow you to aggressively eliminate them. With previous, Rust can't know which are necessary because previous may contain side-effects.

edit:

Many iterators implement Clone, which essentially provides something equivalent to RewindIterator. However, not all iterators can implement Clone. Primary among these are all iterators that mutably borrow the underlying data structure, since they cannot be cloned.

Note: if an iterator yields exclusive references, then it is unsound for it to implement RewindIterator and uphold RewindIterator's contract, because that would require it to yield aliasing exclusive references. This is why they don't implement Clone. The vast majority of other iterators don't hold an exclusive reference, so it should be fine to clone them.

Now, for owning iterator, it may be expensive to clone them, but that's a cost that must be paid regardless. I think your idea of snapshot iterators is interesting, and could mitigate this cost.

One could imagine something like a SnapshotIterator, which allows you to get a "temporary" iterator that rewinds the source iterator automatically when it is dropped, but this would probably require generic associated types to let us tie the returned snapshot iterator to the mutable borrow of the original iterator.

You could build the idea of a snapshot iterator on top of the current implementation, for example a std::slice::IterMut could yield a snapshot as a std::slice::Iter that would iterate over the rest of the slice. A signature like this should suffice

impl<'a, T> std::slice::IterMut<'a, T> {
    fn snapshot(&self) -> std::slice::Iter<'_, T> { ... }
}

edit 2:

I guess the snapshot method above could be approximated as iter.as_slice().iter() when rust-lang/rust#58957 stabilizes. But the same idea could be applied to the other collections.

@jonhoo
Copy link
Contributor Author

jonhoo commented Apr 2, 2020

Note: if an iterator yields exclusive references, then it is unsound for it to implement RewindIterator and uphold RewindIterator's contract, because that would require it to yield aliasing exclusive references.

I do not believe that that is true. Since the returned mutable references are tied to the lifetime of the mutable reference to the iterator, the borrow checker would catch any attempt to continue using the values yielded by previous calls to the iterator after calling previous. This is exactly what today happens if you try to use the value returned from a past call to next after calling next a second time.

I agree with you that RewindIterator is only needed for iterators that yield mutable references. For other iterators, Clone is likely to be sufficient. Though I don't think that's an argument for why RewindIterator should not exist. Being able to move back mutable iterators can yield big benefits when you need it. You do make a good point though that the example isn't great since it's also solvable with Clone. I'll try to come up with one that cannot be solved with Clone.

Now, for owning iterator, it may be expensive to clone them, but that's a cost that must be paid regardless.

I don't think owning iterators will be able to implement RewindIterator, since they already yielded the owner element in next, and therefore cannot yield it again.

You could build the idea of a snapshot iterator on top of the current implementation, for example a std::slice::IterMut could yield a snapshot as a std::slice::Iter that would iterate over the rest of the slice. A signature like this should suffice.

Yes, you can implement it for a specific type (i.e., without a trait) just fine today. But you cannot (I think) make a trait for it, since that would require generic associated types as mentioned in the RFC.

@frankmcsherry
Copy link

@KrishnaSannasi

The use of clone seems pretty limited. In this case, there is structure that allows you to know when to clone (at the start of each game), but if you wanted to allow more general navigation, I don't think your approach would be able to efficiently replace a RewindIterator implementation.

A simple litmus test would be to implement a Turing machine, where RewindIterator is used to access a read-only tape. You would need to consult the state, and based on what value the iterator returns, either call next() or previous(). If you can emulate that with clone that would be a more compelling argument for me (I don't see how you could, which is already a good argument that RewindIterator is at the very least more ergonomic; but I think it is also more expressive than clone on iterators).

@RustyYato
Copy link

RustyYato commented Apr 2, 2020

I do not believe that that is true. Since the returned mutable references are tied to the lifetime of the mutable reference to the iterator, the borrow checker would catch any attempt to continue using the values yielded by previous calls to the iterator after calling previous. This is exactly what today happens if you try to use the value returned from a past call to next after calling next a second time.

What I'm talking about is something like

let first = iter.next().unwrap();
let second = iter.previous().unwrap();
let third = iter.next().unwrap();

Here either first and second alias or first and third alias.

I agree with you that RewindIterator is only needed for iterators that yield mutable references. For other iterators

No, it would be unsound for mutable iterators to implement RewindIterator, see example above.

Yes, you can implement it for a specific type (i.e., without a trait) just fine today. But you cannot (I think) make a trait for it, since that would require generic associated types as mentioned in the RFC.

I think you could do it with a trait like this:

trait SnapshotIterator<'a>: Iterator {
    type Snapshot: 'a;
    fn snapshot(&'a self) -> Self::Snapshot;
}

But this has worse ergonomics than GATs

The use of clone seems pretty limited. In this case, there is structure that allows you to know when to clone (at the start of each game), but if you wanted to allow more general navigation, I don't think your approach would be able to efficiently replace a RewindIterator implementation.

You do make a good point though that the example isn't great since it's also solvable with Clone. I'll try to come up with one that cannot be solved with Clone.

You can always clone right before you call next to get the same effect as previous. You can then reset the iterator with the clone as needed. They are isomorphic.

I didn't fully understand what @frankmcsherry was suggesting before I previously commented. Yes, a turing machine would be difficult to implement using just Clone on an arbitrary iterator, but I don't think that's a good litmus test. In most cases you know about the structure of iteration, which means that Clone can be efficiently used.

@RustyYato
Copy link

@frankmcsherry please also note that rewind is likely more expensive than clone. Clone will almost always just boil down to cloning some pointers, (esp. for std collection iterators, and those iterators built atop those). So unless rewind can be cheaper than moving some pointers around (which I don't think is possible for more intricate data structures like *Map), it will be cheaper to clone even if that clone is usually not used.

@CAD97
Copy link

CAD97 commented Apr 2, 2020

Since the returned mutable references are tied to the lifetime of the mutable reference to the iterator, the borrow checker would catch any attempt to continue using the values yielded by previous calls to the iterator after calling previous. This is exactly what today happens if you try to use the value returned from a past call to next after calling next a second time.

This isn't true. See this playground for an example; Rust iterators are not "streaming iterators," so yielded references must live for the lifetime of the iterator, not just until the next .next() call.

So without streaming iterators (read: GAT), you can't have a rewindable iterators of unique references. You could maybe do it by the signature rewind(self) -> Self rather than rewind(&mut self).

On top of that, this would require the size of slice::Iter to increase by 2x for all users, whether they need on the fly rewinding or not. That's because slice::Iter only remembers the to-be-iterated part of the slice; it's a (begin, end) pointer pair. The only way to have full rewinding support built in would be to use a (slice: (ptr, len), begin, end) tuple.

So if you really want this, it'd have to be

  • a) an opt in extra cost wrapper, like Peekable
  • b) reverse-by-move, to invalidate any existing borrows

What I think you want is something more like RandomAccessIterator or an actual Cursor API, rather than something on top of iterators.

(EDIT: sorry about the multiple posts, the GH app was misbehaving.)

@jonhoo
Copy link
Contributor Author

jonhoo commented Apr 2, 2020

What I'm talking about is something like

let first = iter.next().unwrap();
let second = iter.previous().unwrap();
let third = iter.next().unwrap();

Here either first and second alias or first and third alias.

Ah, damn, you're right. In my head, I had it that the return type of next() was tied to the mutable borrow of &mut self, but that is indeed not true as you (and @CAD97) rightfully observe! Not sure why my brain thought that they were 😅 Hmm, that is unfortunate indeed. I guess I'll close this — sorry for the noise (and disagreement!). If someone does come up with a bright idea for how we might be able to get something like this soundly, I think that'd enable some cool high-performance use-cases, but I suppose it will remain a pipe-dream for now :)

What I think you want is something more like RandomAccessIterator or an actual Cursor API, rather than something on top of iterators.

Yeah, I think you're right. You've convinced me it's sadly probably not possible to model this on top of Iterators.

Thanks for the quick feedback!

PS: @CAD97 it looks like your comment got duplicated with the GitHub stuttering going on atm :)

@jonhoo jonhoo closed this Apr 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-collections Proposals about collection APIs A-iterators Iterator related proposals & ideas T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants