Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace mempool with thread_local #200

Closed
wants to merge 1 commit into from

Conversation

Amanieu
Copy link
Member

@Amanieu Amanieu commented Apr 10, 2016

This should result in a slight performance improvement for single-threaded code because it uses native thread IDs instead of TLS. It should also significantly improve performance with multiple threads since it no longer uses a mutex (35ns vs 5ns).

@BurntSushi
Copy link
Member

@Amanieu So, one of my concerns (that I didn't realize before when looking at thread_local) is that it introduces quite a few new dependencies. One of them in particular (void) doesn't build on Rust 1.3. I'm not sure if that's a problem. @alexcrichton Can you weigh in here? What are your thoughts on:

  1. Introducing a new dependency on thread-id libc, kernel32-sys, winapi-build, winapi, unreachable and void.
  2. Requiring Rust 1.6.

My thoughts on (1) is that the big dependencies are as close to fundamental as we can get in the ecosystem, so they are probably already present in a lot of projects. Does that mean we should be OK with them getting pulled in by regex? The others (thread-id, unreachable and void) all seem pretty small.

I'm not sure about (2). I don't think we've really ironed out a policy here regarding the minimum level of support for the rust-lang/rust-lang-nursery crates.

@BurntSushi
Copy link
Member

@Amanieu Thanks for doing this by the way! You beat me to it. :-)

@alexcrichton
Copy link
Member

I'm personally a fan of minimizing dependencies (e.g. a crate like void seems much easier to write inline), but regex already has a lot of dependencies so there's probably not much benefit from trying to prune at this point.

@alexcrichton
Copy link
Member

Oh right, and for minimum Rust version, I still have yet to hear of anyone requiring an older stable release, so requiring 1.6 should be fine. We don't have a policy yet for rust-lang-nursery crates for either version compatibility or dependency lists, so doing the "most reasonable thing" for now should be fine :)

@BurntSushi
Copy link
Member

All right, I went ahead and merged this in 2cda56e (after rebasing on to master). I'm still bummed about the added dependencies, but @alexcrichton has a point---we're kind of already hosed there anyway.

As far as benchmarks go, it looks like you're right about it improving the single threaded case, if only by a tiny bit. Here's the comparison:

$ cargo-benchcmp rust rust-thread-local --threshold 3 
name                                    rust ns/iter          rust-thread-local ns/iter    diff ns/iter   diff %
misc::anchored_literal_long_match       25 (15,600 MB/s)      24 (16,250 MB/s)                       -1   -4.00%
misc::anchored_literal_long_non_match   16 (24,375 MB/s)      13 (30,000 MB/s)                       -3  -18.75%
misc::anchored_literal_short_match      25 (1,040 MB/s)       24 (1,083 MB/s)                        -1   -4.00%
misc::anchored_literal_short_non_match  16 (1,625 MB/s)       13 (2,000 MB/s)                        -3  -18.75%
misc::easy1_1MB                         81 (12,945,629 MB/s)  78 (13,443,538 MB/s)                   -3   -3.70%
misc::easy1_32                          78 (666 MB/s)         73 (712 MB/s)                          -5   -6.41%
misc::hard_1MB                          123 (8,525,227 MB/s)  118 (8,886,466 MB/s)                   -5   -4.07%
misc::hard_32                           100 (590 MB/s)        96 (614 MB/s)                          -4   -4.00%
misc::literal                           17 (3,000 MB/s)       16 (3,187 MB/s)                        -1   -5.88%
misc::match_class_in_range              30 (2,700 MB/s)       29 (2,793 MB/s)                        -1   -3.33%
sherlock::name_alt1                     38,948 (15,275 MB/s)  37,001 (16,078 MB/s)               -1,947   -5.00%
sherlock::name_holmes                   43,572 (13,654 MB/s)  45,114 (13,187 MB/s)                1,542    3.54%
sherlock::repeated_class_negation       96,858,914 (6 MB/s)   88,562,781 (6 MB/s)            -8,296,133   -8.57%
sherlock::the_upper                     45,890 (12,964 MB/s)  47,786 (12,449 MB/s)                1,896    4.13%

I suspect some of this is noise, but I also suspect there was some actual improvement.

Also, in my (currently hand run) grep benchmark, this decreased the number of instructions per search by about 5%. Very nice.

Of course, we aren't observing the (theoretically) biggest wins here since I don't yet have any multithreaded benchmarks that would measure the impact of contention.

In any case, thanks very much!

@BurntSushi BurntSushi closed this Apr 13, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants