Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std: Cache HashMap keys in TLS #33318

Merged
merged 1 commit into from
May 20, 2016
Merged

Conversation

alexcrichton
Copy link
Member

This is a rebase and extension of #31356 where we not only cache the keys in
thread local storage but we also bump each key every time a new HashMap is
created. This should give us a nice speed bost in creating hash maps along with
retaining the property that all maps have a nondeterministic iteration order.

Closes #27243

@rust-highfive
Copy link
Collaborator

r? @aturon

(rust_highfive has picked a reviewer for you, use r? to override)

@sfackler
Copy link
Member

ping?

@briansmith
Copy link
Contributor

Why is it safe to key different HashMaps with keys that are known to each differ by only 1?

@eternaleye
Copy link
Contributor

eternaleye commented May 10, 2016

@briansmith: Because if it was not safe that would constitute a related-key attack, and none is known for SipHash?

@briansmith
Copy link
Contributor

briansmith: Because if it was not safe that would constitute a related-key attack, and none is known for SipHash?

  1. The hash function isn't necessarily SipHash. It's pluggable per-hashmap and also the default may be changed.
  2. I don't find "we don't currently know of a related-key attack on SipHash" very encouraging. There are lots of ways to derive N secret keys from an initial secret key that are fast and don't depend on the related-key-security of the hashing function.

@eternaleye
Copy link
Contributor

@briansmith: Fair enough on both points - Not being assured of SipHash means that it may well be an issue, and I agree that the margin of security even so is unnecessarily thin.

@sfackler
Copy link
Member

This logic is only used by the SipHash hasher.

@aturon
Copy link
Member

aturon commented May 10, 2016

r=me on the code and perf front. But I don't feel qualified to judge the security/DDoS protection side of this.

@briansmith
Copy link
Contributor

But I don't feel qualified to judge the security/DDoS protection side of this.

Others have quoted this, but again, this is what the SipHash paper says:

On startup a program reads a secret SipHash key from the operating system’s cryptographic random-number generator; the program then uses SipHash for all of its hash tables.

Thus, the easiest thing to do is what the SipHash authors recommend: Use std::sync::Once to generate one key using the OsRng, and then use that same key for every hash table.

Think about the threat model: We assume that the attacker can add or remove arbitrary (key, value) entries from any hash table used in the program. From this, it follows that we assume the attacker can change any hash table A into any other hash table B by removing all the items from A and then copying all the entries from B into A. Thus, it seems to not help if A or B have different keys, at least under this threat model. If you have a different threat model, it would be a good idea to document it.

More generally, crypto people never generate a secret key by adding a constant value to another secret key. See https://en.wikipedia.org/wiki/Related-key_attack for an introduction to why. tl;dr: Knowing the difference of two secrets can help an attacker find the value of one (usually both) secrets, even if they wouldn't be able to find the values any other way. Because no crypto people would do this, it is unlikely that somebody will seriously study the problems that may or may not occur when somebody does what is proposed in this PR because we generally assume it is a-priori wrong to do.

HTH.

@aturon
Copy link
Member

aturon commented May 17, 2016

@briansmith Thanks for the comment! That does indeed help highlight some of the tradeoffs here.

AIUI, the motivation for using these distinct (but related) keys is just to avoid clients of the default hashmap from accidentally assuming that all instances share a common key -- a behavior we could conceivably want to change in the future. But it could easily be that this cure is worse than the disease, and we'd be better off just very clearly documenting that you cannot rely on the apparent determinism. We just risk de facto lock-in to that behavior, but that seems (to me) better than taking a step that could easily end up revealing hashmap keys.

@rust-lang/libs Thoughts here?

@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label May 18, 2016
@alexcrichton
Copy link
Member Author

Yeah I'm not too worried about switching to a per-process key with the risk of relying on a per-process deterministic iteration order. It's just a "nice to have" to make everything nondeterministic really I think.

@huonw
Copy link
Member

huonw commented May 18, 2016

It seems to me that we could have this per-thread without the adjustment, but maybe having inter-thread differences isn't worth the slightly higher complexity vs. just being uniform through a process.

@aturon
Copy link
Member

aturon commented May 19, 2016

Per-thread sounds like a good compromise across the board. @alexcrichton, want to update accordingly?

This is a rebase and extension of rust-lang#31356 where we cache the keys in thread local
storage. This should give us a nice speed bost in creating hash maps along with
mostly retaining the property that all maps have a nondeterministic iteration
order.

Closes rust-lang#27243
@alexcrichton
Copy link
Member Author

Sounds like a plan to me, I've updated the PR, the comment, and I also tweaked to use OsRng directly. The thread_rng isn't necesssary when we cache per-thread b/c we're gonna hit OsRng for the first time on each thread anyway.

@aturon
Copy link
Member

aturon commented May 20, 2016

Thanks!

@bors: r+

@bors
Copy link
Contributor

bors commented May 20, 2016

📌 Commit eaeef3d has been approved by aturon

@bors
Copy link
Contributor

bors commented May 20, 2016

⌛ Testing commit eaeef3d with merge 179539f...

bors added a commit that referenced this pull request May 20, 2016
std: Cache HashMap keys in TLS

This is a rebase and extension of #31356 where we not only cache the keys in
thread local storage but we also bump each key every time a new `HashMap` is
created. This should give us a nice speed bost in creating hash maps along with
retaining the property that all maps have a nondeterministic iteration order.

Closes #27243
@bors bors merged commit eaeef3d into rust-lang:master May 20, 2016
@bluss bluss added the relnotes Marks issues that should be documented in the release notes of the next release. label May 21, 2016
@alexcrichton alexcrichton deleted the hashmap-seed branch May 25, 2016 00:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
relnotes Marks issues that should be documented in the release notes of the next release. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants