Add `typeseed=e` parameter to `@auto_hash_equals`. #44

gafter · 2023-08-18T22:50:58Z

Specifying the "type seed"

When we compute the hash function, we start with a "seed" particular to the type being hashed. You can select the seed to be used by specifying `typeseed=e`.

The seed provided (`e`) is used in one of two ways, depending on the setting for `typearg`:
- If `typearg=false` (the default), then `e` is used as the type seed.
- If `typearg=true`, then `e(t)` is used as the type seed, where `t` is the type of the object being hashed.
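
The two modes can be illustrated with hand-written methods. This is only a sketch of the idea, not the code that `@auto_hash_equals` generates: the types `Loose` and `Strict`, the constant `LOOSE_SEED`, and the function `strict_seed` are hypothetical names invented for the example, and the way the seed is mixed into the hash is an assumption.

```julia
# Hand-written sketch; NOT the code generated by @auto_hash_equals.

# Analogue of typearg=false: type parameters are ignored by == and hash,
# so a single constant seed serves every instantiation of the type.
struct Loose{T}
    value::T
end
const LOOSE_SEED = UInt(0x5eed)  # hypothetical constant, plays the role of `e`
Base.:(==)(a::Loose, b::Loose) = a.value == b.value
Base.hash(a::Loose, h::UInt) = hash(a.value, hash(LOOSE_SEED, h))

# Analogue of typearg=true: the full type takes part in == and hash,
# so the seed is computed from the runtime type of the object.
struct Strict{T}
    value::T
end
strict_seed(t::Type) = hash(string(t))  # hypothetical function, plays the role of `e`
Base.:(==)(a::Strict, b::Strict) = typeof(a) == typeof(b) && a.value == b.value
Base.hash(a::Strict, h::UInt) = hash(a.value, hash(strict_seed(typeof(a)), h))
```

With these definitions `Loose{Int}(1)` and `Loose{Any}(1)` compare equal and therefore must share a hash, which is why a constant seed is the only safe choice in that mode.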

This PR also adds a default type seed that is stable from Julia 1.6 through 1.10.

codecov · 2023-08-18T22:52:36Z

Codecov Report

❗ No coverage uploaded for pull request base (master@81ebfb3).
Patch has no changes to coverable lines.

Additional details and impacted files

@@            Coverage Diff            @@
##             master      #44   +/-   ##
=========================================
  Coverage          ?   92.50%           
=========================================
  Files             ?        3           
  Lines             ?      280           
  Branches          ?        0           
=========================================
  Hits              ?      259           
  Misses            ?       21           
  Partials          ?        0


gafter · 2023-08-19T20:27:04Z

@mcmcgrath13 I think I have addressed all of your concerns.

ORBAT · 2023-08-20T23:17:48Z

Why not change the default "type seed" to be something more sensible if hashing types is generally fraught with problems? I'd rather have more correct behavior out of the box than have to specify a "type seed" separately for every type I use this with.

gafter · 2023-08-21T15:06:28Z

@ORBAT If we change the default then it would be a breaking change (i.e. change the computed hash value) for all clients. The current default seed is the hash of the symbol which is the type's name. Assuming the hash of a Symbol is stable, which it appears to be, that is stable and only collides for types with the same simple name. So I think the default is a reasonable default. Having said that, some clients might want a different default, and that is why this option is being provided.
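
The collision described here can be sketched as follows. The helper `name_seed` is a hypothetical stand-in for "the hash of the symbol which is the type's name"; it is not the package's internal function, and the modules `A` and `B` exist only for the illustration.

```julia
# Two distinct types that share the simple name `Config`.
module A
    struct Config end
end
module B
    struct Config end
end

# Hypothetical helper: seed derived only from the type's simple name.
name_seed(T::DataType) = hash(nameof(T))

# A.Config and B.Config are different types, but nameof returns :Config
# for both, so a name-based seed cannot tell them apart.
same_seed = name_seed(A.Config) == name_seed(B.Config)
```

Here `same_seed` is `true` even though `A.Config != B.Config`, which is the "only collides for types with the same simple name" case.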

mcmcgrath13

outside of the few comments left, looks good to me!

ORBAT · 2023-08-21T21:04:24Z

@gafter ah true, the type is only hashed when typearg=true.

But still, would it be worthwhile in those cases to seed the hash with something along the lines of `hashfn(:MyType, hashfn(:Vector, hashfn(:Int)))` for the type `MyType{Vector{Int}}`, to be consistent with how the hash is seeded in other cases? Especially if the assumption is that people rely on the hashes being stable between package versions? At least `Base.hash(MyType{Vector{Int}})` will change every time `MyType` is recompiled (or `Vector`'s hash changes), and the same would apply to any custom hash function that does something like what `Base.hash` does (or outright uses it). `hashfn(:MyType, ...)` would be stable as long as `hashfn(::Symbol)` is stable. It seems like at least the case where `hashfn=Base.hash` and `typearg=true` could change the default seed, as that'd arguably be a bug fix and not a breaking change, since any guarantees about stability are out with those options anyhow?
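
A minimal sketch of the nested, name-based seeding suggested above, assuming `Base.hash` as the hash function. The name `stable_type_seed` is hypothetical and this is not the `type_seed` implementation added in the PR: it mixes in only the simple type name, not the module, and its stability rests on the assumption (discussed in this thread) that hashing a `Symbol` is stable.

```julia
# Hypothetical helper: fold the type's name and then each of its
# type parameters into the running hash value h.
function stable_type_seed(t::DataType, h::UInt = UInt(0))
    h = hash(nameof(t), h)              # e.g. :Array for Vector{Int}
    for p in t.parameters
        if p isa DataType
            h = stable_type_seed(p, h)  # recurse into nested types
        else
            # Value parameters such as the 1 in Array{Int,1}.
            # Other kinds of parameters (TypeVar, UnionAll) also land here
            # and would need their own treatment to be stable.
            h = hash(p, h)
        end
    end
    return h
end

seed_int   = stable_type_seed(Vector{Int})
seed_float = stable_type_seed(Vector{Float64})
```

Because the seed depends only on names and parameter values, `Vector{Int}` and `Array{Int,1}` produce the same seed, while `Vector{Float64}` produces a different one.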

gafter · 2023-08-21T23:16:35Z

@ORBAT That's a great idea. Let me think about whether it should be done in this PR or separately (before registering a new version).

gafter · 2023-08-24T00:25:15Z

@ORBAT @mcmcgrath13 I've added a new commit to this PR that implements @ORBAT's suggestion. Please have a look!

mcmcgrath13 · 2023-08-24T18:54:41Z

src/type_seed.jl

+"""
+    type_seed(x)
+
+Computes a value to use as a seed for computing the hash value of a type.


@NHDaly could you have someone take a look at the implementation in this file? It makes sense to me, but I'm not sure if there are edges/assumptions/things I'm not aware of here

@comnik Could you please look at this and give your thoughts?

@Sacha0 is the most expert expert on this topic that I know of, though he isn't working on this type of thing right now.

I'll take a look tomorrow, sorry I wasn't able to follow the discussion here so far.

The main thing we're looking for is your thoughts on this approach to computing a stable type seed. The type seed is now stable by default rather than using the unstable Base.hash(type).

So I have to say I'm not really an expert in this, but I read the PR and the approach seems sensible to me. Being sensitive to the fully qualified name by default seems fine as well, as long as we have the ability to overwrite the type seed after refactorings.

For use in the RAI code base I am worried about the performance implications, see my two other comments.

mcmcgrath13

Looks good to me! I'd like a second set of eyes on the typeseed implementation before merging ideally

gafter · 2023-08-24T22:26:39Z

> But still, would it be worthwhile in those cases to seed the hash with something along the lines of hashfn(:MyType, hashfn(:Vector, hashfn(:Int))) for the type MyType{Vector{Int}}

That is exactly what `typearg` turns on or off. When it is off, we definitely do not want to include the type arguments in the hash; otherwise objects that test equal would have different hashes, which violates the invariant that equal objects hash equally.

> Seems like at least the cases where hashfn=Base.hash and typearg=true could change the default seed, as that'd arguably be a bug fix and not a breaking change, as any guarantees about stability are out with those options anyhow?

We do change the seed in that case, based on your advice. With this PR, the seed doesn't ever use the specified hash function, but uses the new `type_seed` function.

comnik · 2023-08-26T20:55:23Z

src/impl.jl

+        hash_init =
+            if isnothing(typeseed)
+                if typearg
+                    :($type_seed($full_type_name, h))


Could we mix in h via addition and pull out the type_seed computation to macro runtime?

No, because we don't know the runtime type until runtime.

Oh right, in some performance sensitive places we eval to get the runtime argument types, but here we're literally in the process of defining the type 😅

We also don't know the (runtime) values of the type parameters.

comnik · 2023-08-26T20:56:12Z

src/type_seed.jl

+    return h
+end
+
+function type_seed(t::DataType, h::UInt)


This implementation seems conceptually good, but do you know how much costlier it is compared to Base.hash(type)? Although I only care because it seems like we run this every time we hash an instance of an @auto_hash_equals type, see my other comment above.

No, I don't know. But since Base.hash isn't stable, that isn't really an option, is it?

This isn't used by default. It is only used if you ask the type to be included but don't provide your own type seed function.

Ok I followed up on Slack with a clarification question regarding our internal use

Add typeseed=e parameter to @auto_hash_equals.

e2cbde5

gafter requested review from comnik and mcmcgrath13 August 18, 2023 22:53

mcmcgrath13 reviewed Aug 19, 2023

src/impl.jl (resolved)

gafter marked this pull request as ready for review August 19, 2023 00:18

mcmcgrath13 reviewed Aug 19, 2023

src/impl.jl (outdated, resolved)

Add a sentence to the doc to clarify that typeseed is a value in one case and a function in another, depending on the value of `typearg`.

d1f4e84

mcmcgrath13 reviewed Aug 19, 2023

test/runtests.jl (outdated, resolved)

Undo unnecessary changes to runtests.jl

505e86f

gafter mentioned this pull request Aug 19, 2023

hash(Type) is not stable across runs #39

Closed

This was linked to issues Aug 19, 2023

hash(Type) is not stable across runs #39

Closed

Add a way to specify the type seed #42

Closed

mcmcgrath13 reviewed Aug 21, 2023

README.md (outdated, resolved)

mcmcgrath13 reviewed Aug 21, 2023

README.md (outdated, resolved)

mcmcgrath13 reviewed Aug 21, 2023

test/runtests.jl (resolved)

mcmcgrath13 approved these changes Aug 21, 2023

Minor changes per PR review.

82ad467

Implement a stable hash function for types for the type seed.

fa4cf4b

gafter requested a review from mcmcgrath13 August 24, 2023 00:25

Rename type_key to type_seed (the default type seed function)

cfc3d2f

ORBAT reviewed Aug 24, 2023

src/type_key.jl (outdated, resolved)

mcmcgrath13 reviewed Aug 24, 2023

README.md (outdated, resolved)

mcmcgrath13 reviewed Aug 24, 2023

src/type_seed.jl (outdated, resolved)

mcmcgrath13 reviewed Aug 24, 2023

src/type_seed.jl (outdated, resolved)

mcmcgrath13 reviewed Aug 24, 2023

test/runtests.jl (outdated, resolved)

mcmcgrath13 approved these changes Aug 24, 2023

gafter commented Aug 24, 2023

src/type_seed.jl (resolved)

gafter added 2 commits August 24, 2023 12:09

Minor changes per review.

80c043c

Improve the docs.

2fd38e8

gafter added 3 commits August 24, 2023 17:33

Make type_seed stable going back to Julia 1.6.

2f1da19

Fix some type seed stability issues.

5d7a78c

A couple more tests.

3e2ef40

comnik reviewed Aug 26, 2023

comnik approved these changes Aug 26, 2023

Merge branch 'master' of https://github.com/JuliaServices/AutoHashEqu…

4a7b692

…als.jl into typeseed

gafter merged commit 8967719 into master Aug 29, 2023
7 checks passed

NHDaly deleted the typeseed branch October 25, 2023 15:46
