Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify HashMap Bucket interface #40561

Merged
merged 1 commit into from
Apr 6, 2017
Merged

Conversation

arthurprs
Copy link
Contributor

@arthurprs arthurprs commented Mar 15, 2017

Simplify HashMap Bucket interface

  • Store capacity_mask instead of capacity
  • Move bucket index into RawBucket
  • Valid bucket index is now always within [0..table_capacity)
  • Simplify iterators by moving logic into RawBuckets
  • Clone RawTable using RawBucket
  • Make retain aware of the number of elements

The idea was to put idx in RawBucket instead of the other Bucket types and simplify next() and prev() as much as possible. The rest was a side-effect of that change, except maybe the last 2.

This change makes iteration and other next/prev() heavy operations noticeably faster. Clone is way faster.

➜  hashmap2 git:(adapt) ✗ cargo benchcmp pre:: adp:: bench.txt 
 name                        pre:: ns/iter  adp:: ns/iter  diff ns/iter   diff % 
 clone_10_000                74,364         39,736              -34,628  -46.57% 
 grow_100_000                8,343,553      8,233,785          -109,768   -1.32% 
 grow_10_000                 817,825        723,958             -93,867  -11.48% 
 grow_big_value_100_000      18,418,979     17,906,186         -512,793   -2.78% 
 grow_big_value_10_000       1,219,242      1,103,334          -115,908   -9.51% 
 insert_1000                 74,546         58,343              -16,203  -21.74% 
 insert_100_000              6,743,770      6,238,017          -505,753   -7.50% 
 insert_10_000               798,079        719,123             -78,956   -9.89% 
 insert_1_000_000            275,215,605    266,975,875      -8,239,730   -2.99% 
 insert_int_bigvalue_10_000  1,517,387      1,419,838           -97,549   -6.43% 
 insert_str_10_000           316,179        278,896             -37,283  -11.79% 
 insert_string_10_000        770,927        747,449             -23,478   -3.05% 
 iter_keys_100_000           386,099        333,104             -52,995  -13.73% 
 iterate_100_000             387,320        355,707             -31,613   -8.16% 
 lookup_100_000              206,757        193,063             -13,694   -6.62% 
 lookup_100_000_unif         219,366        193,180             -26,186  -11.94% 
 lookup_1_000_000            206,456        205,716                -740   -0.36% 
 lookup_1_000_000_unif       659,934        629,659             -30,275   -4.59% 
 lru_sim                     20,194,334     18,442,149       -1,752,185   -8.68% 
 merge_shuffle               1,168,044      1,063,055          -104,989   -8.99%

Note 2: I may have messed up porting the diff, let's see what CI says.

@rust-highfive
Copy link
Collaborator

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

@@ -126,9 +126,10 @@ unsafe impl<K: Send, V: Send> Send for RawTable<K, V> {}
unsafe impl<K: Sync, V: Sync> Sync for RawTable<K, V> {}

pub struct RawBucket<K, V> {
hash: *mut HashUint,
_hash: *mut HashUint,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this renamed? If it was to disambiguate with the method, I think hash_ would be more appropriate than _hash, since the standard practice is to use prefix _ to denote "unused."

Copy link
Contributor Author

@arthurprs arthurprs Mar 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's for disambiguation. I think we can come up with a better name instead, moving the _ to the end hurts the eye. Like hash_start, hash_base, first_hash?

fn first_bucket_raw(&self) -> RawBucket<K, V> {
let hashes_size = self.capacity * size_of::<HashUint>();
let pairs_size = self.capacity * size_of::<(K, V)>();
fn raw_bucket(&self) -> RawBucket<K, V> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a worse name, since it isn't clear which raw bucket is returned. Maybe first_raw_bucket or raw_bucket_first or stick with the original first_bucket_raw (presumably paired with bucket_at_raw).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll switch callers to raw_bucket_at(0).

pair: self.pair.offset(count),
_marker: marker::PhantomData,
}
unsafe fn hash(&self) -> *mut HashUint {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment noting that these functions are unsafe because you can safely create out-of-bounds buckets?

// We use *const to ensure covariance with respect to K and V
pair: *const (K, V),
_pair: *const (K, V),
idx: usize,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is always within [0..table_capacity), could you add a comment mentioning that.

RevMoveBuckets {
raw: raw_bucket.offset(self.capacity as isize),
hashes_end: raw_bucket.hash,
raw: self.raw_bucket_at(self.capacity()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You said in your description that bucket indices are in [0..table_capacity), but this is one past that. Did you mean [0..table_capacity]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intentionally set to one past the end for the reverse style iterator. I will add a comment line on that.

marker: marker::PhantomData,
}
}
}


impl<'a, K, V> Iterator for RawBuckets<'a, K, V> {
type Item = RawBucket<K, V>;
type Item = (*mut HashUint, *mut (K, V));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems odd to have an iterator named RawBuckets that doesn't return RawBuckets.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, hopefully that won't affect performance negatively.

if self.elems_left == 0 {
return None;
}
loop {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: could there be a blank line before this loop?

(self.elems_left, Some(self.elems_left))
}
}
impl<'a, K, V> ExactSizeIterator for RawBuckets<'a, K, V> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: could there be a blank line before this impl.

@alexcrichton
Copy link
Member

@bors: delegate=gereeter

@bors
Copy link
Contributor

bors commented Mar 16, 2017

✌️ @gereeter can now approve this pull request

@arthurprs
Copy link
Contributor Author

Thanks for the review @gereeter. I'll updated the PR later.

@bors
Copy link
Contributor

bors commented Mar 19, 2017

☔ The latest upstream changes (presumably #40538) made this pull request unmergeable. Please resolve the merge conflicts.

@arthurprs
Copy link
Contributor Author

Rebased.

@alexcrichton
Copy link
Member

cc @bluss, @pczarn

@bors
Copy link
Contributor

bors commented Mar 25, 2017

☔ The latest upstream changes (presumably #40806) made this pull request unmergeable. Please resolve the merge conflicts.

@arthurprs arthurprs force-pushed the hm-adapt2 branch 2 times, most recently from e190879 to 39802b2 Compare March 25, 2017 22:13
@arthurprs
Copy link
Contributor Author

arthurprs commented Apr 4, 2017

(gentle ping)

@@ -503,7 +498,7 @@ fn robin_hood<'a, K: 'a, V: 'a>(bucket: FullBucketMut<'a, K, V>,
loop {
displacement += 1;
let probe = bucket.next();
debug_assert!(probe.index() != idx_end);
debug_assert!(probe.index() != start_index);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this invariant not as strict as the old one? Either way it's ok. You could add a comment that this assertion is simplified.

Copy link
Contributor Author

@arthurprs arthurprs Apr 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not. I'll updat the code to do the same check as before.

Copy link
Contributor

@pczarn pczarn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent. Please make a rebase

@arthurprs arthurprs force-pushed the hm-adapt2 branch 2 times, most recently from 3451af6 to fdb88fe Compare April 4, 2017 17:37
@arthurprs
Copy link
Contributor Author

Updated with @pczarn review.

* Store capacity_mask instead of capacity
* Move bucket index into RawBucket
* Bucket index is now always within [0..table_capacity)
* Clone RawTable using RawBucket
* Simplify iterators by moving logic into RawBuckets
* Make retain aware of the number of elements
@alexcrichton
Copy link
Member

@bors: r=pczarn

@bors
Copy link
Contributor

bors commented Apr 5, 2017

📌 Commit f07ebd6 has been approved by pczarn

@alexcrichton
Copy link
Member

@bors: delegate=pczarn

@bors
Copy link
Contributor

bors commented Apr 5, 2017

✌️ @pczarn can now approve this pull request

@bors
Copy link
Contributor

bors commented Apr 5, 2017

⌛ Testing commit f07ebd6 with merge b181396...

@bors
Copy link
Contributor

bors commented Apr 5, 2017

💔 Test failed - status-appveyor

@alexcrichton
Copy link
Member

alexcrichton commented Apr 5, 2017 via email

frewsxcv added a commit to frewsxcv/rust that referenced this pull request Apr 5, 2017
Simplify HashMap Bucket interface

> Simplify HashMap Bucket interface
>
> * Store capacity_mask instead of capacity
> * Move bucket index into RawBucket
> * Valid bucket index is now always within [0..table_capacity)
> * Simplify iterators by moving logic into RawBuckets
> * Clone RawTable using RawBucket
> * Make retain aware of the number of elements

The idea was to put idx in RawBucket instead of the other Bucket types and simplify next() and prev() as much as possible. The rest was a side-effect of that change, except maybe the last 2.

This change makes iteration and other next/prev() heavy operations noticeably faster. Clone is way faster.

```
➜  hashmap2 git:(adapt) ✗ cargo benchcmp pre:: adp:: bench.txt
 name                        pre:: ns/iter  adp:: ns/iter  diff ns/iter   diff %
 clone_10_000                74,364         39,736              -34,628  -46.57%
 grow_100_000                8,343,553      8,233,785          -109,768   -1.32%
 grow_10_000                 817,825        723,958             -93,867  -11.48%
 grow_big_value_100_000      18,418,979     17,906,186         -512,793   -2.78%
 grow_big_value_10_000       1,219,242      1,103,334          -115,908   -9.51%
 insert_1000                 74,546         58,343              -16,203  -21.74%
 insert_100_000              6,743,770      6,238,017          -505,753   -7.50%
 insert_10_000               798,079        719,123             -78,956   -9.89%
 insert_1_000_000            275,215,605    266,975,875      -8,239,730   -2.99%
 insert_int_bigvalue_10_000  1,517,387      1,419,838           -97,549   -6.43%
 insert_str_10_000           316,179        278,896             -37,283  -11.79%
 insert_string_10_000        770,927        747,449             -23,478   -3.05%
 iter_keys_100_000           386,099        333,104             -52,995  -13.73%
 iterate_100_000             387,320        355,707             -31,613   -8.16%
 lookup_100_000              206,757        193,063             -13,694   -6.62%
 lookup_100_000_unif         219,366        193,180             -26,186  -11.94%
 lookup_1_000_000            206,456        205,716                -740   -0.36%
 lookup_1_000_000_unif       659,934        629,659             -30,275   -4.59%
 lru_sim                     20,194,334     18,442,149       -1,752,185   -8.68%
 merge_shuffle               1,168,044      1,063,055          -104,989   -8.99%
```

Note 2: I may have messed up porting the diff, let's see what CI says.
arielb1 pushed a commit to arielb1/rust that referenced this pull request Apr 5, 2017
Simplify HashMap Bucket interface

> Simplify HashMap Bucket interface
>
> * Store capacity_mask instead of capacity
> * Move bucket index into RawBucket
> * Valid bucket index is now always within [0..table_capacity)
> * Simplify iterators by moving logic into RawBuckets
> * Clone RawTable using RawBucket
> * Make retain aware of the number of elements

The idea was to put idx in RawBucket instead of the other Bucket types and simplify next() and prev() as much as possible. The rest was a side-effect of that change, except maybe the last 2.

This change makes iteration and other next/prev() heavy operations noticeably faster. Clone is way faster.

```
➜  hashmap2 git:(adapt) ✗ cargo benchcmp pre:: adp:: bench.txt
 name                        pre:: ns/iter  adp:: ns/iter  diff ns/iter   diff %
 clone_10_000                74,364         39,736              -34,628  -46.57%
 grow_100_000                8,343,553      8,233,785          -109,768   -1.32%
 grow_10_000                 817,825        723,958             -93,867  -11.48%
 grow_big_value_100_000      18,418,979     17,906,186         -512,793   -2.78%
 grow_big_value_10_000       1,219,242      1,103,334          -115,908   -9.51%
 insert_1000                 74,546         58,343              -16,203  -21.74%
 insert_100_000              6,743,770      6,238,017          -505,753   -7.50%
 insert_10_000               798,079        719,123             -78,956   -9.89%
 insert_1_000_000            275,215,605    266,975,875      -8,239,730   -2.99%
 insert_int_bigvalue_10_000  1,517,387      1,419,838           -97,549   -6.43%
 insert_str_10_000           316,179        278,896             -37,283  -11.79%
 insert_string_10_000        770,927        747,449             -23,478   -3.05%
 iter_keys_100_000           386,099        333,104             -52,995  -13.73%
 iterate_100_000             387,320        355,707             -31,613   -8.16%
 lookup_100_000              206,757        193,063             -13,694   -6.62%
 lookup_100_000_unif         219,366        193,180             -26,186  -11.94%
 lookup_1_000_000            206,456        205,716                -740   -0.36%
 lookup_1_000_000_unif       659,934        629,659             -30,275   -4.59%
 lru_sim                     20,194,334     18,442,149       -1,752,185   -8.68%
 merge_shuffle               1,168,044      1,063,055          -104,989   -8.99%
```

Note 2: I may have messed up porting the diff, let's see what CI says.
@bors
Copy link
Contributor

bors commented Apr 5, 2017

⌛ Testing commit f07ebd6 with merge 1e10204...

@arielb1
Copy link
Contributor

arielb1 commented Apr 5, 2017

@bors retry - prioritizing rollup

bors added a commit that referenced this pull request Apr 6, 2017
Rollup of 12 pull requests

- Successful merges: #40479, #40561, #40709, #40815, #40909, #40927, #40943, #41015, #41028, #41052, #41054, #41065
- Failed merges:
@bors bors merged commit f07ebd6 into rust-lang:master Apr 6, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants