Bitv improvements #7703

Closed
wants to merge 5 commits into from

Conversation

sfackler
Member

Switched Bitv and BitvSet to external iterators. They still use some internal iterators internally (ha).

Derived clone for all Bitv types.

Removed indirection in BitvVariant. It previously held a unique pointer to the appropriate Bitv struct, even though those structs are the size of a pointer themselves. BitvVariant is the same size (16 bytes) as it was previously.
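(For readers coming to this later, here is a minimal sketch of the external-iterator shape described above, written in present-day Rust syntax rather than the 2013 std API of this PR, with a hypothetical word-slice backing store standing in for the real Bitv variants. The point is only the shape: the caller drives the loop with next() instead of handing a closure to each.)

struct Bits<'a> {
    storage: &'a [u32], // hypothetical backing words; the real Bitv has small/big variants
    nbits: usize,
    next_idx: usize,
}

impl<'a> Iterator for Bits<'a> {
    type Item = bool;

    // Yield every bit in order, set or clear alike.
    fn next(&mut self) -> Option<bool> {
        if self.next_idx >= self.nbits {
            return None;
        }
        let word = self.storage[self.next_idx / 32];
        let bit = ((word >> (self.next_idx % 32)) & 1) == 1;
        self.next_idx += 1;
        Some(bit)
    }

    // No bits are skipped, so the remaining length is exact.
    fn size_hint(&self) -> (usize, Option<usize>) {
        let rem = self.nbits - self.next_idx;
        (rem, Some(rem))
    }
}

// Usage sketch:
//   let words = [0b1010u32];
//   let bits = Bits { storage: &words, nbits: 4, next_idx: 0 };
//   let v: Vec<bool> = bits.collect(); // [false, true, false, true]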

@emberian
Member

Can you add tests for the iterator(s)?

@alexcrichton
Member

This should also probably remove the ones iterator from Bitv, and out of curiosity could you also see what happens to the benchmarks for this change? There currently aren't any benchmarks for iteration, but they shouldn't be too bad to just iterate over small/large bit vectors.
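(For reference, a rough sketch of the kind of iteration benchmark being asked for, hand-timed in modern Rust with Vec<bool> standing in for Bitv, since the 2013 benchmark harness isn't reproduced here; the names and sizes are made up.)

use std::hint::black_box;
use std::time::Instant;

// Time one pass over a bit vector of the given size.
fn bench_iter(name: &str, nbits: usize) {
    let bits: Vec<bool> = (0..nbits).map(|i| i % 3 == 0).collect();
    let start = Instant::now();
    // Count the set bits so the loop can't be optimized away.
    let ones = black_box(bits.iter().filter(|&&b| b).count());
    println!("{name}: {nbits} bits, {ones} set, {:?}", start.elapsed());
}

fn main() {
    bench_iter("small", 32);
    bench_iter("big", 100_000);
}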

@sfackler
Member Author

I added some iterator benchmarks. It looks like these iterators are significantly slower than the old ones: https://gist.github.com/sfackler/058274387cf0bc90883d. I don't know if it matters, but if it does I can probably speed them up by opening some boxes. The crate as a whole needs a lot of work to switch over from internal to external iterators.

@alexcrichton
Member

If these were a bit slower, I'd be fine with that, but being 3-5x worse seems like a bad thing to me. Normally bit vectors are used specifically to be fast, so if iteration is super slow it seems to defeat the purpose... Others may have different opinions though. If there's an obvious speedup in the future once external iterators are "more solid" then that's fine, but I'm not sure how this would get much better.

I'm kind of the opinion that we shouldn't entirely switch from internal to external iterators. In some cases it's really nice and convenient, but it's not 100% necessary all the time.
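(A toy sketch of the two styles being weighed here, in modern Rust with a made-up Toy type: internal iteration, where the container calls a closure per element, versus external iteration, where the caller drives the loop.)

struct Toy {
    bits: Vec<bool>,
}

impl Toy {
    // Internal iteration: the container drives the loop and calls f once per bit;
    // iteration stops early if f returns false (the old each contract).
    fn each(&self, mut f: impl FnMut(bool) -> bool) -> bool {
        for &b in &self.bits {
            if !f(b) {
                return false;
            }
        }
        true
    }

    // External iteration: hand the caller an iterator and let it drive.
    fn iter(&self) -> impl Iterator<Item = bool> + '_ {
        self.bits.iter().copied()
    }
}

fn main() {
    let t = Toy { bits: vec![true, false, true] };

    let mut ones_each = 0;
    t.each(|b| { ones_each += b as usize; true });

    let ones_iter = t.iter().filter(|&b| b).count();
    assert_eq!(ones_each, ones_iter);
}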

@thestinger
Contributor

@sfackler: which optimization level are you using?

@alexcrichton: They're not slower because they're external iterators. C++ uses external iterators exclusively, and LLVM is fully capable of making them zero-cost in all cases. I'm sure there's something else going on here, because there are no such performance losses when comparing the external iterators/adaptors on vectors to a counter with unchecked indexing.
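(The claim is easy to check on plain vectors; in a sketch like the one below, written in modern Rust and unrelated to this PR's code, the adaptor-based sum and the hand-rolled counter with unchecked indexing typically compile to the same loop at -O.)

fn sum_external(v: &[u64]) -> u64 {
    // External iterator + adaptor style.
    v.iter().copied().sum()
}

fn sum_counter(v: &[u64]) -> u64 {
    // Manual counter with unchecked indexing, the baseline being compared against.
    let mut total = 0;
    let mut i = 0;
    while i < v.len() {
        total += unsafe { *v.get_unchecked(i) };
        i += 1;
    }
    total
}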

@thestinger thestinger closed this Jul 14, 2013
@thestinger thestinger reopened this Jul 14, 2013
@sfackler
Member Author

The benchmarks were compiled with -O. I'm traveling now, but I can take a deeper look at what's going on with the iterators tomorrow.

@sfackler
Member Author

So I've done some digging and it looks like the performance problems may actually be an issue with the benchmark framework rather than any of the bitv code.

I wrote up a quick test: https://gist.github.com/sfackler/0efb2f23738898b9b938. I commented out the iter benchmarks and compiled and ran it under master without any of the bitv changes with these results:

test bench_bitv_big_each ... bench: 828 ns/iter (+/- 665)
test bench_bitv_small_each ... bench: 33 ns/iter (+/- 14)

I then cherry-picked 128bdad onto master and un-deleted each from Bitv. I compiled the test with the iter benchmarks and ran it with these results:

test bench_bitv_big_each ... bench: 4008 ns/iter (+/- 2122)
test bench_bitv_small_each ... bench: 248 ns/iter (+/- 104)
test bench_btv_big_iter ... bench: 4015 ns/iter (+/- 2117)
test bench_btv_small_iter ... bench: 248 ns/iter (+/- 89)

Keep in mind that all 128bdad did was add an iter function to Bitv and BitvSet and corresponding iterator types. I checked the disassembly of the two benchmarks and it turns out that bench_bitv_small_{each,iter} compiled down to the exact same instruction sequence: https://gist.github.com/sfackler/8cce21f68c222c4dcb37.

Here are the runtimes with the other changes (deriving Clone and removing indirection):

test bench_bitv_big_each ... bench: 3830 ns/iter (+/- 4244)
test bench_bitv_small_each ... bench: 235 ns/iter (+/- 256)
test bench_btv_big_iter ... bench: 3757 ns/iter (+/- 1722)
test bench_btv_small_iter ... bench: 248 ns/iter (+/- 150)

The disassembly is more or less the same with one less call to free: https://gist.github.com/sfackler/6cf1c8d5fd7fc0cf53cb

I have a change compiling right now that removes ones; I'll push it up in a bit.

@alexcrichton
Member

Oh wow, that's nice to hear! Although I'm a bit dubious because all of those +/- values (of the very last set of results) are at least half of the actual value. Those look like they're really noisy benchmarks, but if it's the same before/after then I'm not complaining.

@alexcrichton
Member

Although, am I reading this correctly that there is either a regression somewhere in master, or that when you added iter it "slowed down" each too?

@sfackler
Member Author

I think it has to be an issue with the benchmark backend. The version of bench_bitv_small_each compiled on master is instruction-for-instruction identical to the version compiled on 128bdad.

@alexcrichton
Member

Ah, sounds good to me then.

@graydon
Contributor

graydon commented Jul 17, 2013

Stack crossing? Which platform? Can you pin to a CPU with taskset? The bench backend is definitely still a work in progress; any further details are appreciated. There are queued fixes for it incoming soon.

@sfackler
Member Author

Stack crossing?

I'm running 64-bit Linux through VirtualBox on a Windows 7 host. IIRC, I was getting the same benchmark results on OS X as well. I get the same results pinning the tests to a core with taskset.

@pnkfelix
Member

@sfackler In the past, Graydon has used the term "stack crossing" to refer to the overhead you hit if you get unlucky and a change causes us to need to allocate more stack chunks than usual. I think it's synonymous with "stack thrashing".

Note that in the IRC log linked above, the suggestion was to play with the setting for RUST_MIN_STACK to see if it makes such performance problems go away; then you'd have some evidence that stack thrashing is to blame.

@sfackler
Member Author

Doesn't look like that was it:

~ > env RUST_MIN_STACK=100000000 ./test --bench

running 4 tests
test bench_bitv_big_each ... bench: 3731 ns/iter (+/- 1458)
test bench_bitv_small_each ... bench: 229 ns/iter (+/- 126)
test bench_btv_big_iter ... bench: 3754 ns/iter (+/- 1850)
test bench_btv_small_iter ... bench: 248 ns/iter (+/- 135)


fn size_hint(&self) -> (uint, Option<uint>) {
    let rem = self.bitv.nbits - self.next_idx;
    (rem, Some(rem))
}
Member

Shouldn't the lower bound be 0 because all of the bits could be off?

Member

@alexcrichton it looks like this iterates over all the bits yielding false for 0 and true for 1, so it always traverses rem items.

Member

Ah, yes, I thought this was the ones iterator.
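(For contrast, a hypothetical external ones iterator, the case @alexcrichton had in mind, could not give an exact hint, since any of the remaining bits might be clear. A sketch of what its size_hint would have to look like, not code from this PR:)

// Hypothetical: size_hint for an iterator that yields only the set bits.
fn ones_size_hint(remaining_bits: usize) -> (usize, Option<usize>) {
    // Every remaining bit could be clear, so the lower bound must be 0;
    // at most all remaining bits are set, giving the upper bound.
    (0, Some(remaining_bits))
}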

@sfackler
Member Author

I reverted the ones removal.

@huonw
Member

huonw commented Jul 21, 2013

@sfackler needs a rebase.

@sfackler
Member Author

@huonw Done

@sfackler
Member Author

@alexcrichton r?

@alexcrichton
Member

I think in the future we may want to transition ones to an external iterator as well, but that doesn't have to happen right now. This is accumulating enough as is.
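(If that transition does happen later, here is a minimal sketch of the shape such an external ones iterator could take, in modern Rust with a bool slice standing in for Bitv storage; this is not what the PR merges.)

struct Ones<'a> {
    bits: &'a [bool], // stand-in for Bitv storage
    next_idx: usize,
}

impl<'a> Iterator for Ones<'a> {
    type Item = usize;

    // Yield the index of each set bit, skipping clear bits.
    fn next(&mut self) -> Option<usize> {
        while self.next_idx < self.bits.len() {
            let idx = self.next_idx;
            self.next_idx += 1;
            if self.bits[idx] {
                return Some(idx);
            }
        }
        None
    }
}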

bors added a commit that referenced this pull request Jul 23, 2013
@bors bors closed this Jul 23, 2013