Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add pyclass hash option #4206

Merged
merged 7 commits into from
Jun 1, 2024
Merged

add pyclass hash option #4206

merged 7 commits into from
Jun 1, 2024

Conversation

Icxolu
Copy link
Contributor

@Icxolu Icxolu commented May 25, 2024

For more dataclass like usage and similar to #4202, this adds a hash pyclass option to implement the __hash__ slot using the type's Hash implementation.

@Icxolu Icxolu force-pushed the hash branch 3 times, most recently from 981ec46 to df991b6 Compare May 25, 2024 20:52
davidhewitt
davidhewitt previously approved these changes May 25, 2024
Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks perfect to me! I love to see this, I've wanted this for a while and never got around to it 👍

@davidhewitt
Copy link
Member

Actually, one thought - should we make this only available with #[pyclass(frozen)]? Otherwise I can foresee accidents where users mutate data and corrupt their hash maps.

(If they really want a __hash__ on a mutable pyclass, they then just add it manually the old way.)

@davidhewitt davidhewitt dismissed their stale review May 26, 2024 08:01

Interaction with frozen is worth discussing now

@davidhewitt davidhewitt mentioned this pull request May 26, 2024
5 tasks
@Icxolu
Copy link
Contributor Author

Icxolu commented May 26, 2024

Actually, one thought - should we make this only available with #[pyclass(frozen)]? Otherwise I can foresee accidents where users mutate data and corrupt their hash maps.

Given that it can be implemented trivially if needed, I think that makes sense to take the safe route here 👍.

On that note, should we also require #[pyclass(eq)] (once we have that) or is that not needed?

@davidhewitt
Copy link
Member

davidhewitt commented May 26, 2024

On that note, should we also require #[pyclass(eq)] (once we have that) or is that not needed?

Great question. So, quoting the Python docs:

If a class does not define an eq() method it should not define a hash() operation either

I guess that wording is strong enough that we should indeed be encouraging users to define eq if they want hash? So perhaps yes, we should require eq, especially given that this is designed to be a convenience for correct implementations and the #[pymethods] escape hatch for weird cases will remain.

I'm happy to either block this PR on landing eq first or proceed with this PR and add a note in #4207 that we should require this later.

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Feel free to decide what route to go down, also just one suggestion on wording...)

pyo3-macros-backend/src/pyclass.rs Outdated Show resolved Hide resolved
@Icxolu
Copy link
Contributor Author

Icxolu commented May 26, 2024

I think I might just implement the eq option, so we can merge this in the proper order.

@Icxolu
Copy link
Contributor Author

Icxolu commented May 31, 2024

Rebased. hash now requires eq in addition to frozen. I've also changed the generated slot the same way as in eq, so that manual __hash__ with the hash option errors.

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks brilliant thank you! I have one small thought about a possible combined check for both eq and frozen at the same time, otherwise this looks ready to me 🎉

) -> Result<(Option<syn::ImplItemFn>, Option<MethodAndSlotDef>)> {
if options.hash.is_some() {
ensure_spanned!(options.frozen.is_some(), options.hash.span() => "The `hash` option requires the `frozen` option.");
ensure_spanned!(options.eq.is_some(), options.hash.span() => "The `hash` option requires the `eq` option.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For these error messages I think it's quite likely users might hit the frozen error, fix that, and then hit the eq error. If you agree, it might be nice if we also special-case if both are missing and emit an error message requesting both options so users can fix in a single compile.

Or we could keep it simple and just always check for both in one ensure_spanned and just emit the combined message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I added a variant to the ensure_spanned macro that evaluates multiple branches and returns all errors.

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very elegant 💯

@davidhewitt davidhewitt added this pull request to the merge queue Jun 1, 2024
Merged via the queue into PyO3:main with commit a7a5c10 Jun 1, 2024
41 checks passed
@Icxolu Icxolu deleted the hash branch June 1, 2024 15:02
@Stargateur
Copy link

This option is only available for frozen classes to prevent accidental hash changes from mutating the object. If you need
an __hash__ implementation for a mutable class, use the manual method from above.

I don't get it, how accidental ??? If I mutate an object I expect the hash change. This seriously limit the use of this automatic derive.

@davidhewitt
Copy link
Member

Mutating an object hash in Python is usually a mistake - e.g. because it breaks dictionaries. From the Python docs:

If a class does not define an eq() method it should not define a hash() operation either; if it defines eq() but not hash(), its instances will not be usable as items in hashable collections. If a class defines mutable objects and implements an eq() method, it should not implement hash(), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).

@birkenfeld
Copy link
Member

To add to that, in Rust you can't easily mutate an object while it's in a HashMap (only via interior mutability or unsafe) since it can control that you never get a &mut reference to it.

Python can't offer such a guarantee, and therefore only lets you use immutable objects as keys or set items by default. Of course you can implement your own hash and wreak havoc.

@Stargateur
Copy link

Yes of course like in Rust key inside a hashmap should not be modified in a way that change the hash but that a solf error from the user, it's NOT enforced at compile time even in Rust. Well I don't know exactly how python works, but this mean I can't use this derive if I have a rust type that CAN be mutable. My use case is pretty simple I want my Rust struct to be hashable, and I have serialization and deserialisation feature that somehow need to construct dummy value and then set state so they need to be mutable... I blame python pickle for asking user non sense. Anyway, the only requirement is the user should not mutate the key of a hashmap (like in rust) and that for me obvious, but require this at compile time is very annoying making this derive not very useful.

@birkenfeld
Copy link
Member

If pickle is the reason why your class needs to be mutable, you should be able to use the __getnewargs__ protocol which will not set state after the object is constructed.

@Stargateur
Copy link

it's impossible my object can't be construct without specific parameter that I don't have access anymore after and this doesn't change my main argument, any non trivial struct that could be mutable will not be able to use this derive. Please ignore my specific usecase and just accept this struct also have mutable feature that user will not use when the struct is in a dict.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants