Don't reuse RandomState seeds #37470

arthurprs · 2016-10-29T21:14:44Z

rust-highfive · 2016-10-29T21:15:00Z

(rust_highfive has picked a reviewer for you, use r? to override)

alexcrichton · 2016-10-30T10:01:59Z

Looks good to me, thanks @arthurprs! Want to run it by @rust-lang/libs just to be super sure though we're on board with this solution to #36481

rfcbot · 2016-10-30T10:36:16Z

Team member @alexcrichton has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

sfackler · 2016-10-30T17:26:09Z

I would not describe this as a solution to #36481. Performance is still unacceptable with non-default hashers. It does work around the issue in the common case, and we should land it though.

arthurprs · 2016-10-31T09:34:32Z

Do you think I need to reword the comment mentioning #36481?

alexcrichton · 2016-10-31T17:25:59Z

src/libstd/collections/hash/map.rs

            let r = rand::OsRng::new();
            let mut r = r.expect("failed to create an OS RNG");
-            (r.gen(), r.gen())
+            UnsafeCell::new((r.gen(), r.gen()))


I believe this UnsafeCell can be Cell, right?

True, I'll change it.
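For readers following the `UnsafeCell` vs `Cell` exchange, here is a minimal sketch of the idea under discussion: keep one randomly seeded key pair per thread in a `Cell` and hand out a slightly different pair on every call. This is not the patch itself. `os_random_u64` and `next_keys` are hypothetical names, the seed source is a stand-in for the `OsRng` call in the diff, and which key word gets incremented is an assumption of this sketch.

```rust
use std::cell::Cell;
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

// Hypothetical stand-in for the OS RNG used in the diff: borrow
// randomness from the standard library's own randomly keyed hasher.
fn os_random_u64() -> u64 {
    RandomState::new().build_hasher().finish()
}

thread_local! {
    // One key pair per thread, seeded lazily on first use.
    // `Cell` is enough because the pair is `Copy` and is only read and replaced.
    static KEYS: Cell<(u64, u64)> = Cell::new((os_random_u64(), os_random_u64()));
}

// Returns the current key pair and bumps the stored one, so that two
// consecutive callers on the same thread never receive identical keys.
fn next_keys() -> (u64, u64) {
    KEYS.with(|keys| {
        let (k0, k1) = keys.get();
        // Assumption: the second word is the one incremented.
        keys.set((k0, k1.wrapping_add(1)));
        (k0, k1)
    })
}
```

Calling `next_keys()` twice on one thread yields pairs that share the first word and differ by one in the second, which is the property the rest of the thread debates.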

alexcrichton · 2016-10-31T17:26:25Z

The in-code comment is fine to me, but I changed Solves #... in the description to cc #...

brson · 2016-10-31T20:32:44Z

I agree this is a good and efficient interim solution. To be clear, how much confidence do we have that addition here is actually a mathematically valid way to ensure that merging two hashmaps won't collide? My understanding is that it is clearly better than the status quo, that we suspect that in practice it will prevent collisions, but there's not any mathematical basis for knowing it will be effective (i.e. some clever person may find a way to generate collisions by incrementing the initialization value).

brson · 2016-10-31T20:38:12Z

Here are the benchmarks comparing this approach of incrementing this value to completely regenerating it. This approach is 2ns vs 6-15ns.

arthurprs · 2016-10-31T20:45:57Z

The fix relies on the backing hash function being high-quality, as in "Every input bit should affect every output bit about ~50% of the time" (seed is also an input).
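The "about 50% of output bits" property can be checked empirically. The sketch below is only an illustration, not part of the patch: it hashes the same messages under keys `(k0, k1)` and `(k0, k1 + 1)` and reports the average number of differing output bits. It uses the deprecated `std::hash::SipHasher` (SipHash 2-4) because it still exposes `new_with_keys`; the helper names `sip` and `mean_flipped_bits` are made up for this example.

```rust
#[allow(deprecated)]
use std::hash::SipHasher;
use std::hash::Hasher;

// Hash one u64 message under an explicit 128-bit key.
#[allow(deprecated)]
fn sip(k0: u64, k1: u64, msg: u64) -> u64 {
    let mut h = SipHasher::new_with_keys(k0, k1);
    h.write_u64(msg);
    h.finish()
}

// Average number of output bits that change when the second key word
// is incremented by one, taken over messages 0..samples.
fn mean_flipped_bits(k0: u64, k1: u64, samples: u64) -> f64 {
    let total: u32 = (0..samples)
        .map(|m| (sip(k0, k1, m) ^ sip(k0, k1.wrapping_add(1), m)).count_ones())
        .sum();
    total as f64 / samples as f64
}
```

For a well-mixing 64-bit hash the result should land close to 32. A measurement like this only shows the absence of an obvious bias; it is not evidence of security against a motivated attacker.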

rfcbot · 2016-11-03T23:55:28Z

🔔 This is now entering its final comment period, as per the review above. 🔔

psst @alexcrichton, I wasn't able to add the final-comment-period label, please do so.

alexcrichton · 2016-11-03T23:58:02Z

@bors: r+

bors · 2016-11-03T23:58:03Z

📌 Commit eba93c3 has been approved by alexcrichton

Don't reuse RandomState seeds cc rust-lang#36481

bors · 2016-11-05T05:23:03Z

⌛ Testing commit eba93c3 with merge b243964...

bors · 2016-11-05T05:25:48Z

💔 Test failed - auto-win-msvc-64-opt-rustbuild

alexcrichton · 2016-11-05T05:39:12Z

@bors: retry


Rollup of 24 pull requests - Successful merges: #37255, #37317, #37408, #37410, #37422, #37427, #37470, #37501, #37537, #37556, #37557, #37564, #37565, #37566, #37569, #37574, #37577, #37579, #37583, #37585, #37586, #37587, #37589, #37596 - Failed merges: #37521, #37547

bors · 2016-11-05T11:32:04Z

⌛ Testing commit eba93c3 with merge cae6ab1...

Don't reuse RandomState seeds cc #36481

bors · 2016-11-05T14:55:44Z

Don't reuse RandomState seeds cc rust-lang#36481


pczarn · 2016-11-13T19:49:54Z

@arthurprs The seed is not an input. The seed describes a choice of a function from a family of functions.

Regardless, the seed and the input are mixed in a similar way. Perhaps the solution is effective. I don't know cryptography. I can only read what cryptographers have explained.

Manishearth · 2016-11-24T02:16:22Z

Shouldn't Clone also reinitialize the random state? We should never be copying the random state afaict.

funny-falcon · 2016-11-24T05:16:04Z

But what if simply changing the order of iteration fixes the original issue?

We can use the simplest LCG for the walking order:

  // Pseudo code for iterator
  if (hash.size == 0) return
  i := 0
  do {
    if (slot_occupied(hash, i))
       yield hash.key[i]
    i = (i * 5 + 1) & hash.mask
  } until (i == 0) /* returned to first position, so we walked all hash */
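As a runnable check of the walk above, the following sketch collects the slot indices visited by the recurrence `i = (5 * i + 1) & mask`, stopping once the index returns to 0. It assumes the table size is a power of two, so `mask` is `size - 1`; `lcg_walk` is a hypothetical name. With multiplier 5 and increment 1 the recurrence has full period modulo a power of two, so every slot is visited exactly once.

```rust
// Returns the order in which slots 0..=mask are visited.
// `mask` must be a power of two minus one (e.g. 15 for a 16-slot table).
fn lcg_walk(mask: usize) -> Vec<usize> {
    let mut order = Vec::new();
    let mut i = 0usize;
    loop {
        order.push(i);
        // Same step as the pseudocode; wrapping ops avoid overflow panics.
        i = i.wrapping_mul(5).wrapping_add(1) & mask;
        if i == 0 {
            break; // back at the first position, so every slot was visited
        }
    }
    order
}
```

For a 16-slot table, `lcg_walk(15)` starts `0, 1, 6, 15, ...` and contains each index from 0 to 15 once.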

funny-falcon · 2016-11-24T05:36:39Z

@sfackler points out that given output of iteration one may construct new input values in desired order to DoS the application.

OK, an LCG is very tunable, so we can show a different order on every iteration:

  // Pseudo code for iterator
  if (hash.size == 0) return
  ptis = ptis * 5 + 1 // per_thread_iteration_seed
  delta := ptis * 2 + 1
  mult := (ptis ^ (ptis >> 16)) * 4 + 1
  i := 0
  do {
    if (slot_occupied(hash, i))
       yield hash.key[i]
    i = (i * mult + delta) & hash.mask
  } until (i == 0)
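The seeded variant can be sketched the same way. Here the per-thread seed is passed in as a parameter rather than advanced inside the function, which is a simplification of the pseudocode; `seeded_walk` is a hypothetical name. The construction forces `delta` to be odd and `mult` to be congruent to 1 modulo 4, which is what gives the recurrence full period on a power-of-two table regardless of the seed.

```rust
// Returns the visiting order for a table with `mask + 1` slots
// (`mask` must be a power of two minus one), derived from `seed`.
fn seeded_walk(seed: u64, mask: u64) -> Vec<u64> {
    let delta = seed.wrapping_mul(2).wrapping_add(1); // always odd
    let mult = (seed ^ (seed >> 16)).wrapping_mul(4).wrapping_add(1); // always 1 mod 4
    let mut order = Vec::new();
    let mut i = 0u64;
    loop {
        order.push(i);
        i = i.wrapping_mul(mult).wrapping_add(delta) & mask;
        if i == 0 {
            break; // returned to the start after a full cycle
        }
    }
    order
}
```

Different seeds give different permutations of the same slots, for example `seeded_walk(0, 63)` is the plain sequential order while `seeded_walk(1, 63)` begins `0, 3, 18, ...`. The non-sequential access pattern is also what makes the approach costly on large tables.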

arthurprs · 2016-11-24T07:46:12Z

@funny-falcon the problem with that approach is that the performance hit for iteration is possibly large. It should be easily measured though.

funny-falcon · 2016-11-24T07:52:08Z

@arthurprs you're right: cache misses on big hashtables will destroy performance. So, my suggestion is not acceptable.

sacundim · 2016-11-26T02:51:45Z

Quoting the SipHash paper, pp. 5-6 (my boldface):

Note that the standard PRF and MAC security goals allow the attacker access to the output of SipHash on messages chosen adaptively by the attacker. However, they do not allow access to any “leaked” information such as bits of the key or the internal state. They also do not allow “related keys”, “known keys”, “chosen keys”, etc.

In the patch attached to this pull request, the RandomState::new() function increments the second word of the key on each call. This means that RandomStates created in the same thread have related keys, which violates the preconditions of SipHash's security claims. Does anybody have an argument that SipHash achieves its security goal under these conditions?

One candidate solution that does not fall afoul of the random keys precondition would be:

1. Use the same seed for many maps.
2. Associate each map with unique ID that differs between all maps that share the same seed.
3. Instead of hashing inputs verbatim, prefix them with the map's unique ID.

This way the incrementing counter is part of the message, not the seed, which is fine because the PRF security claim allows the even more adverse case of "messages chosen adaptively by the attacker."

With thread-local seeds, the unique ID can be an additional u64 counter. Since SipHash supports incremental computation, one simple optimization is to precompute, cache and reuse the hasher's state after absorbing the prefixed counter. A further optimization would be to defer this step until the first time the map needs to hash an input so that creating empty maps does not pay the cost of that.
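A rough sketch of this prefix-ID scheme on top of the standard library's hasher might look as follows. It is an illustration under assumptions, not a proposed patch: `PrefixedState`, `NEXT_ID`, and `new_map` are hypothetical names, the counter is a global atomic rather than thread-local, and the deferred (lazy) optimization mentioned above is not implemented. It relies on `DefaultHasher` being `Clone`, so the state after absorbing the ID can be cached and reused.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::collections::HashMap;
use std::hash::{BuildHasher, Hasher};
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical source of per-map IDs (global here for simplicity).
static NEXT_ID: AtomicU64 = AtomicU64::new(0);

// A BuildHasher whose hashers have already absorbed a unique map ID.
struct PrefixedState {
    prefixed: DefaultHasher, // cached state: shared key + this map's ID
}

impl PrefixedState {
    fn new(shared: &RandomState) -> Self {
        let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
        let mut h = shared.build_hasher(); // same key for every map
        h.write_u64(id); // the counter is part of the message, not the key
        PrefixedState { prefixed: h }
    }
}

impl BuildHasher for PrefixedState {
    type Hasher = DefaultHasher;
    fn build_hasher(&self) -> DefaultHasher {
        self.prefixed.clone() // resume hashing after the ID prefix
    }
}

// Convenience constructor for a map using the prefixed hasher.
fn new_map<K, V>(shared: &RandomState) -> HashMap<K, V, PrefixedState> {
    HashMap::with_hasher(PrefixedState::new(shared))
}
```

Two maps built from the same `RandomState` then hash the same key to different values, even though the underlying key is never modified.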

funny-falcon · 2016-11-26T07:48:55Z

I think it is time to ask Jean-Philippe Aumasson (@veorq): does SipHash allow an incrementing key, or will it fail? Note that the attacker never sees the whole output of the hash (they can only guess the low bits, and even those bits are not exact because of collision resolution).

veorq · 2016-11-26T09:47:49Z

As noted above we didn't make claims in the case of "related keys", but I haven't seen an attack in the related-key model.

TL;DR: I think it's ok.

Details: IIUC the scenario here is the following:

at time t1 you compute SipHash( key, _)
at time t2 you'll compute SipHash( key + 1, _)
at time t3 you'll compute SipHash( key + 2, _)
etc.

The attacker can choose what messages are hashed. We can assume that the attacker sees the hashes of the messages with key + i for different i's and wants to predict SipHash(key + j, m) for some j>i. The attacker will only succeed if there's some relation between hashes of different messages (standard PRF security) or if there's some relation between hashes of different or identical messages with different keys (related-key PRF security).

Given the previously published cryptanalysis results on SipHash, I'm confident that SipHash will remain secure under related keys and as used in Rust.

@pczarn

Adaptive hashmap implementation

All credits to @pczarn who wrote rust-lang/rfcs#1796 and contain-rs/hashmap2#5

**Background**

Rust std lib hashmap puts a strong emphasis on security, we did some improvements in #37470 but in some very specific cases and for non-default hashers it's still vulnerable (see #36481).

This is a simplified version of rust-lang/rfcs#1796 proposal sans switching hashers on the fly and other things that require an RFC process and further decisions. I think this part has great potential by itself.

**Proposal**

This PR adds code checking for extra long probe and shifts lengths (see code comments and rust-lang/rfcs#1796 for details), when those are encountered the hashmap will grow (even if the capacity limit is not reached yet) _greatly_ attenuating the degenerate performance case.

We need a lower bound on the minimum occupancy that may trigger the early resize, otherwise in extreme cases it's possible to turn the CPU attack into a memory attack. The PR code puts that lower bound at half of the max occupancy (defined by ResizePolicy). This reduces the protection (it could potentially be exploited between 0-50% occupancy) but makes it completely safe.

**Drawbacks**

* May interact badly with poor hashers. Maps using those may not use the desired capacity.
* It adds 2-3 branches to the common insert path, luckily those are highly predictable and there's room to shave some in future patches.
* May complicate exposure of ResizePolicy in the future as the constants are a function of the fill factor.

**Example**

Example code that exploit the exposure of iteration order and weak hasher.

```
const MERGE: usize = 10_000usize;

#[bench]
fn merge_dos(b: &mut Bencher) {
    let first_map: $hashmap<usize, usize, FnvBuilder> = (0..MERGE).map(|i| (i, i)).collect();
    let second_map: $hashmap<usize, usize, FnvBuilder> = (MERGE..MERGE * 2).map(|i| (i, i)).collect();
    b.iter(|| {
        let mut merged = first_map.clone();
        for (&k, &v) in &second_map {
            merged.insert(k, v);
        }
        ::test::black_box(merged);
    });
}
```

_91 is stdlib and _ad is patched (the end capacity in both cases is the same)

```
running 2 tests
test _91::merge_dos ... bench: 47,311,843 ns/iter (+/- 2,040,302)
test _ad::merge_dos ... bench: 599,099 ns/iter (+/- 83,270)
```

rust-highfive assigned alexcrichton Oct 29, 2016

alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Oct 30, 2016

alexcrichton reviewed Oct 31, 2016

Don't reuse RandomState seeds

eba93c3

arthurprs force-pushed the sip-smaller branch from d70340c to eba93c3 Compare October 31, 2016 20:12

brson added the relnotes Marks issues that should be documented in the release notes of the next release. label Oct 31, 2016

sfackler mentioned this pull request Nov 3, 2016

Exposure of HashMap iteration order allows for O(n²) blowup. #36481

Open

sophiajt pushed a commit to sophiajt/rust that referenced this pull request Nov 4, 2016

Rollup merge of rust-lang#37470 - arthurprs:sip-smaller, r=alexcrichton

7e6799d

Don't reuse RandomState seeds cc rust-lang#36481

sophiajt mentioned this pull request Nov 4, 2016

Rollup of 17 pull requests #37581

Closed

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Nov 4, 2016

Rollup merge of rust-lang#37470 - arthurprs:sip-smaller, r=alexcrichton

b0129c2

Don't reuse RandomState seeds cc rust-lang#36481

alexcrichton mentioned this pull request Nov 4, 2016

Rollup of 24 pull requests #37597

Merged

bors added a commit that referenced this pull request Nov 5, 2016

Auto merge of #37470 - arthurprs:sip-smaller, r=alexcrichton

cae6ab1

Don't reuse RandomState seeds cc #36481

bors merged commit eba93c3 into rust-lang:master Nov 5, 2016

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Nov 5, 2016

Rollup merge of rust-lang#37470 - arthurprs:sip-smaller, r=alexcrichton

1d41d5b

Don't reuse RandomState seeds cc rust-lang#36481

bluss mentioned this pull request Nov 26, 2016

Serialization performance regression #38021

Closed

arthurprs mentioned this pull request Dec 14, 2016

Adaptive hashmap implementation #38368

Merged

briansmith mentioned this pull request Jan 18, 2017

implement std::collections japaric/steed#5

Closed
