-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add typeseed=e
parameter to @auto_hash_equals
.
#44
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #44 +/- ##
=========================================
Coverage ? 92.50%
=========================================
Files ? 3
Lines ? 280
Branches ? 0
=========================================
Hits ? 259
Misses ? 21
Partials ? 0 ☔ View full report in Codecov by Sentry. |
is a value in one case and a function in another, depending on the value of `typearg`.
@mcmcgrath13 I think I have addressed all of your concerns. |
Why not change the default "type seed" to be something more sensible if hashing types is generally fraught with problems? I'd rather have more correct behavior out of the box than have to specify a "type seed" separately for every type I use this with. |
@ORBAT If we change the default then it would be a breaking change (i.e. change the computed hash value) for all clients. The current default seed is the hash of the symbol which is the type's name. Assuming the hash of a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
outside of the few comments left, looks good to me!
@gafter ah true, the type is only hashed when But still, would it be worthwhile in those cases to seed the hash with something along the lines of |
@ORBAT That's a great idea. Let me think about whether it should be done in this PR or separately (before registering a new version). |
@ORBAT @mcmcgrath13 I've added a new commit to this PR that implements @ORBAT's suggestion. Please have a look! |
""" | ||
type_seed(x) | ||
|
||
Computes a value to use as a seed for computing the hash value of a type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NHDaly could you have someone take a look at the implementation in this file? It makes sense to me, but I'm not sure if there are edges/assumptions/things I'm not aware of here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@comnik Could you please look at this and give your thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Sacha0 is the most expert expert on this topic that I know of, though he isn't working on this type of thing right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take a look tomorrow, sorry I wasn't able to follow the discussion here so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main thing we're looking for is your thoughts on this approach to computing a stable type seed. The type seed is now stable by default rather than using the unstable Base.hash(type)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I have to say I'm not really an expert in this, but I read the PR and the approach seems sensible to me. Being sensitive to the fully qualified name by default seems fine as well, as long as we have the ability to overwrite the type seed after refactorings.
For use in the RAI code base I am worried about the performance implications, see my two other comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! I'd like a second set of eyes on the typeseed implementation before merging ideally
That is exactly what
We do change the seed in that case, based on your advice. With this PR, the seed doesn't ever use the specified hash function, but uses the new |
hash_init = | ||
if isnothing(typeseed) | ||
if typearg | ||
:($type_seed($full_type_name, h)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we mix in h
via addition and pull out the type_seed
computation to macro runtime?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because we don't know the runtime type until runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, in some performance sensitive places we eval
to get the runtime argument types, but here we're literally in the process of defining the type 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also don't know the (runtime) values of the type parameters.
return h | ||
end | ||
|
||
function type_seed(t::DataType, h::UInt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation seems conceptually good, but do you know how much costlier it is compared to Base.hash(type)
? Although I only care because it seems like we run this every time we hash an instance of an @auto_hash_equals
type, see my other comment above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I don't know. But since Base.hash
isn't stable, that isn't really an option, is it?
This isn't used by default. It is only used if you ask the type to be included but don't provide your own type seed function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I followed up on Slack with a clarification question regarding our internal use
Specifying the "type seed"
When we compute the hash function, we start with a "seed" particular to the type being hashed. You can select the seed to be used by specifying
typeseed=e
.The seed provided (
e
) is used in one of two ways, depending on the setting fortypearg
.If
typearg=false
(the default), thene
will be used as the type seed.If
typearg=true
, thene(t)
is used as the type seed, wheret
is the type of the object being hashed.This PR also adds a default type seed that is stable from 1.6 through 1.10.