Add typeseed=e parameter to @auto_hash_equals. #44

Merged: 12 commits, Aug 29, 2023

Conversation

gafter (Member) commented Aug 18, 2023

Specifying the "type seed"

When we compute the hash function, we start with a "seed" particular to the type being hashed. You can select the seed to be used by specifying typeseed=e.

The seed provided (e) is used in one of two ways, depending on the setting for typearg.
If typearg=false (the default), then e will be used as the type seed.
If typearg=true, then e(t) is used as the type seed, where t is the type of the object being hashed.

This PR also adds a default type seed that is stable from 1.6 through 1.10.
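A minimal hand-written sketch of the two modes described above. It is not the code that `@auto_hash_equals` generates: the struct `Box`, the constant `VALUE_SEED`, the function `seed_fn`, and the way the seed is mixed into the hash are all assumptions made for illustration.

```julia
# Hypothetical illustration only; names and mixing order are assumptions.
struct Box{T}
    x::T
end

# typearg=false: the seed `e` is used directly as a value.
const VALUE_SEED = UInt(0x1234)                  # stands in for `e`
hash_value_seed(b::Box, h::UInt) = hash(b.x, hash(VALUE_SEED, h))

# typearg=true: the seed is `e(t)`, where `t` is the runtime type of the object.
seed_fn(t::Type) = hash(string(t))               # stands in for `e`
hash_fn_seed(b::Box, h::UInt) = hash(b.x, hash(seed_fn(typeof(b)), h))
```

With the value seed, `Box{Int}(1)` and `Box{Any}(1)` hash alike; with the function seed, the type parameter takes part in the hash.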

codecov bot commented Aug 18, 2023

Codecov Report

❗ No coverage uploaded for pull request base (master@81ebfb3).
Patch has no changes to coverable lines.

Additional details and impacted files
@@            Coverage Diff            @@
##             master      #44   +/-   ##
=========================================
  Coverage          ?   92.50%           
=========================================
  Files             ?        3           
  Lines             ?      280           
  Branches          ?        0           
=========================================
  Hits              ?      259           
  Misses            ?       21           
  Partials          ?        0           

gafter marked this pull request as ready for review August 19, 2023 00:18
src/impl.jl (outdated, resolved)
is a value in one case and a function in another,
depending on the value of `typearg`.
test/runtests.jl (outdated, resolved)

gafter (Member, Author) commented Aug 19, 2023

@mcmcgrath13 I think I have addressed all of your concerns.

ORBAT commented Aug 20, 2023

Why not change the default "type seed" to be something more sensible if hashing types is generally fraught with problems? I'd rather have more correct behavior out of the box than have to specify a "type seed" separately for every type I use this with.

gafter (Member, Author) commented Aug 21, 2023

@ORBAT If we change the default then it would be a breaking change (i.e. change the computed hash value) for all clients. The current default seed is the hash of the symbol which is the type's name. Assuming the hash of a Symbol is stable, which it appears to be, that is stable and only collides for types with the same simple name. So I think the default is a reasonable default. Having said that, some clients might want a different default, and that is why this option is being provided.
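A small sketch of the collision mentioned here, assuming the default seed is computed as `Base.hash` of the type's simple name. The modules `A` and `B`, the struct `Config`, and the helper `simple_name_seed` are hypothetical.

```julia
# Two unrelated types that happen to share the simple name `Config`.
module A
struct Config
    n::Int
end
end

module B
struct Config
    n::Int
end
end

# Seed derived from the unqualified type name only (assumed form of the default).
simple_name_seed(t::DataType) = hash(nameof(t))

# Distinct types, same simple name, hence the same seed.
same_seed = simple_name_seed(A.Config) == simple_name_seed(B.Config)
```

The two types stay distinct (`A.Config !== B.Config`), so the shared seed only raises the chance of hash collisions; it does not make their instances compare equal.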

README.md (outdated, resolved)
README.md (outdated, resolved)

mcmcgrath13 (Collaborator) left a comment

outside of the few comments left, looks good to me!

ORBAT commented Aug 21, 2023

@gafter ah true, the type is only hashed when typearg=true.

But still, would it be worthwhile in those cases to seed the hash with something along the lines of hashfn(:MyType, hashfn(:Vector, hashfn(:Int))) for the type MyType{Vector{Int}}, to be consistent with how the hash is seeded in other cases? Especially if the assumption is that people rely on the hashes being stable between package versions? At least Base.hash(MyType{Vector{Int}}) will change every time MyType is recompiled (or Vector's hash changes), and the same would apply to any custom hash function that does something like what Base.hash does (or outright uses it). hashfn(:MyType, ...) would be stable as long as hashfn(::Symbol) is stable. Seems like at least the cases where hashfn=Base.hash and typearg=true could change the default seed, as that'd arguably be a bug fix and not a breaking change, as any guarantees about stability are out with those options anyhow?
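The nested-name idea can be sketched roughly as follows. This is not the package's `type_seed`: the function `stable_seed` and the struct `MyType` are hypothetical, the code reads the internal `parameters` field of `DataType`, and it is only as stable as `hash(::Symbol, ::UInt)` and the hashes of any non-type parameters.

```julia
# Hypothetical sketch; not the implementation merged in this PR.
struct MyType{T}
    x::T
end

# Fallback for parameters that are not a DataType, e.g. the 1 in Array{Int64, 1}.
# Unions, UnionAlls and TypeVars also land here and are NOT handled stably.
stable_seed(x, h::UInt) = hash(x, h)

function stable_seed(t::DataType, h::UInt)
    # Fold the parameters in first, innermost last-to-first,
    # so that the outer type name wraps the seeds of its arguments.
    for p in reverse(collect(t.parameters))
        h = stable_seed(p, h)
    end
    # Mix in the simple name; nameof(Vector{Int}) is :Array.
    return hash(nameof(t), h)
end

seed_vec_int = stable_seed(MyType{Vector{Int}}, UInt(0))
seed_vec_f64 = stable_seed(MyType{Vector{Float64}}, UInt(0))
```

Because only names and parameter values are hashed, the seed does not depend on `objectid` of the type, which is what makes `Base.hash(type)` vary between runs.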

gafter (Member, Author) commented Aug 21, 2023

@ORBAT That's a great idea. Let me think about whether it should be done in this PR or separately (before registering a new version).

gafter (Member, Author) commented Aug 24, 2023

@ORBAT @mcmcgrath13 I've added a new commit to this PR that implements @ORBAT's suggestion. Please have a look!

src/type_key.jl (outdated, resolved)
README.md (outdated, resolved)
src/type_seed.jl (outdated, resolved)
"""
type_seed(x)

Computes a value to use as a seed for computing the hash value of a type.
Collaborator

@NHDaly could you have someone take a look at the implementation in this file? It makes sense to me, but I'm not sure if there are edges/assumptions/things I'm not aware of here

Member, Author

@comnik Could you please look at this and give your thoughts?

Member

@Sacha0 is the most expert expert on this topic that I know of, though he isn't working on this type of thing right now.

Collaborator

I'll take a look tomorrow, sorry I wasn't able to follow the discussion here so far.

gafter (Member, Author), Aug 25, 2023

The main thing we're looking for is your thoughts on this approach to computing a stable type seed. The type seed is now stable by default rather than using the unstable Base.hash(type).

Collaborator

So I have to say I'm not really an expert in this, but I read the PR and the approach seems sensible to me. Being sensitive to the fully qualified name by default seems fine as well, as long as we have the ability to override the type seed after refactorings.

For use in the RAI code base I am worried about the performance implications, see my two other comments.

src/type_seed.jl (outdated, resolved)
test/runtests.jl (outdated, resolved)

mcmcgrath13 (Collaborator) left a comment

Looks good to me! I'd like a second set of eyes on the typeseed implementation before merging ideally

gafter (Member, Author) commented Aug 24, 2023

> But still, would it be worthwhile in those cases to seed the hash with something along the lines of hashfn(:MyType, hashfn(:Vector, hashfn(:Int))) for the type MyType{Vector{Int}}

That is exactly what typearg turns on or off. When it is off, we definitely do not want to include the type arguments in the hash, otherwise objects that test equal would have different hashes, which violates their invariants.
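A minimal sketch of the invariant at stake, written by hand rather than with the macro. The struct `Wrapper` and the use of `:Wrapper` as the seed are assumptions for illustration; the equality shown is the parameter-agnostic one described above for typearg=false.

```julia
struct Wrapper{T}
    x::T
end

# Equality that ignores the type parameter (assumed typearg=false behaviour).
Base.:(==)(a::Wrapper, b::Wrapper) = a.x == b.x
Base.isequal(a::Wrapper, b::Wrapper) = isequal(a.x, b.x)

# The seed uses only the simple name, so objects that compare equal hash alike.
Base.hash(w::Wrapper, h::UInt) = hash(w.x, hash(:Wrapper, h))

a = Wrapper{Int}(1)
b = Wrapper{Any}(1)

# `a` and `b` have different concrete types but are equal, so a Dict keyed by
# `a` must also find `b`. That only works because their hashes agree.
d = Dict(a => "value")
found = haskey(d, b)
```

If the hash mixed in `typeof(w)`, `a` and `b` would still be `isequal` but would generally hash differently, and the lookup above could miss.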

> Seems like at least the cases where hashfn=Base.hash and typearg=true could change the default seed, as that'd arguably be a bug fix and not a breaking change, as any guarantees about stability are out with those options anyhow?

We do change the seed in that case, based on your advice. With this PR, the seed doesn't ever use the specified hash function, but uses the new type_seed function.

hash_init =
    if isnothing(typeseed)
        if typearg
            :($type_seed($full_type_name, h))
Collaborator

Could we mix in h via addition and pull out the type_seed computation to macro runtime?

Member, Author

No, because we don't know the runtime type until runtime.

Collaborator

Oh right, in some performance sensitive places we eval to get the runtime argument types, but here we're literally in the process of defining the type 😅

Member, Author

We also don't know the (runtime) values of the type parameters.

    return h
end

function type_seed(t::DataType, h::UInt)
Collaborator

This implementation seems conceptually good, but do you know how much costlier it is compared to Base.hash(type)? Although I only care because it seems like we run this every time we hash an instance of an @auto_hash_equals type, see my other comment above.

Member, Author

No, I don't know. But since Base.hash isn't stable, that isn't really an option, is it?

This isn't used by default. It is only used if you ask the type to be included but don't provide your own type seed function.

Collaborator

Ok I followed up on Slack with a clarification question regarding our internal use

gafter merged commit 8967719 into master Aug 29, 2023
7 checks passed
NHDaly deleted the typeseed branch October 25, 2023 15:46
Development

Successfully merging this pull request may close these issues:

Add a way to specify the type seed
hash(Type) is not stable across runs
5 participants