Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

relax multiple-import check to only prevent subinterpreters #3446

Merged
merged 3 commits into from
Sep 29, 2023

Conversation

davidhewitt
Copy link
Member

Some issues observed in the wild from the current implementation of the multiple-import check:

The motivation for the check is from #2346 (comment) - it's to prevent use of PyO3 modules in sub-interpreters, which we are definitely not ready for yet.

Given the issues above don't actually use sub-interpreters, it seemed worth trying to fix them. This PR does so by storing the current interpreter ID when initializing a #[pymodule], and checking that does not change on subsequent imports.

I've also updated the error message to point at #576 so that discussion on the community can at least be centralized here.

src/impl_/pymodule.rs Outdated Show resolved Hide resolved
return Err(PyImportError::new_err(
"PyO3 modules may only be initialized once per interpreter process",
"PyO3 modules do not yet support subinterpreters, see https://github.com/PyO3/pyo3/issues/576",
));
}
(self.initializer.0)(py, module.as_ref(py))?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want to call the initializer multiple times or should the second import become a no-op?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the spirit of PEP 489 (which we don't support yet, but hopefully are heading towards) is that modules can be torn down and re-imported, so I think it's correct to call the initializer multiple times.

Copy link
Member

@adamreichold adamreichold Sep 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which we don't support yet, but hopefully are heading towards

I think this is the main issue here, isn't it? Does this change mean we opt into allowing tear-down and reinitialization of PyO3-based modules? What invariants would this break? Do we test this somewhere?

(I am not trying to say that it doesn't work. Just that I myself do not understand the consequences of this change and have little confidence about its effects on existing code.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In pydantic/pydantic#6584 (comment), you mention opt-in. Might this be advisable even for this limited relaxation? What if the module is torn down but static data still contains pointers into the old module (e.g. capsules) which are now invalid? For example, is the code in https://github.com/PyO3/rust-numpy/blob/c16fbb1e630b0538fdf17cbc53043bf0467459f9/src/npyffi/mod.rs#L30-L32 valid after this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I found the Cython implementation here:

https://github.com/cython/cython/blob/ce1aa59ab6300e18e50c752b5c9e28a04b665060/Cython/Utility/ModuleSetupCode.c#L1648C18-L1648C18

That code:

  • Has a similar check that the interpreter hasn't changed ID
  • Caches a module instance in a C static and returns the single module instance always.

So I guess let's implement this strategy too and defer the question of PEP 489 compatibility to a future PR. My goal for this PR is primarily to fix the use cases in the OP, which I believe the Cython strategy would cover.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's clearly awkward to get this right. I think on balance it's desirable if we match Cython so that more of the ecosystem fails in the same way and it seems better that we only ever run the module initializer once for now. We can make progress towards per-module state to eventually remove this worry.

I agree with that. Do you know the upstream position on this, i.e. would they consider fixing this or does it work as intended?

(I made with_embedded_python_interpreter unsafe for this reason.)

I am somewhat confused as to what the consequences of this discovery are. Especially with the Cython module immediately showing the failure mode I was concerned about. Can we still relax this? Is the unsafe on with_embedded_python_interpreter sufficient? (I guess another other way to re-initialize Python in the current process would call unsafe Rust code or code in inherently unsafe languages like C, i.e. is trusted?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have just posted python/cpython#109785 - let's see what the outcome of that discussion is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No movement yet on the upstream issue. @adamreichold are you happy if I proceed to merge this? As above at least we'll then be consistent with Cython and I think the generally accepted position is that things are expected to be wonky after finalizing and re-initializing the interpreter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think the language on with_embedded_python_interpreter is strong enough. This is the only API we provide to finalize and then re-initialize the interpreter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes (apart from the raw FFI bindings). auto-initialize never bothers finalizing the interpreter.

src/impl_/pymodule.rs Outdated Show resolved Hide resolved
pyo3-build-config/src/lib.rs Outdated Show resolved Hide resolved
@davidhewitt
Copy link
Member Author

Ok, I've pushed a new commit to this PR which:

  • Removes the conditional code with OnceLock to just use AtomicI64
  • Keeps the existing behaviour on CPython 3.7 and 3.8, where we cannot detect subinterpreter changes
  • Makes 3.9 and newer cache the module object in a GILOnceCell and repeatedly return it as long as the interpreter is the same one which created the module object.

The use of GILOnceCell may at first seem at odds with the discussion in #576 to remove it, but hey, I figure this is explicitly aimed at preventing subinterpreters so it seems fine to use it for now 😄

@davidhewitt
Copy link
Member Author

(I'm not a huge fan of the divergence of behaviour between Python 3.7/3.8 and 3.9+, but I think in this case it is justified as the 3.9 behaviour is now a relaxation of the existing constraint to improve ergonomics where we can safely do so.)

@davidhewitt davidhewitt mentioned this pull request Sep 16, 2023
6 tasks
src/impl_/pymodule.rs Outdated Show resolved Hide resolved
@davidhewitt davidhewitt added this pull request to the merge queue Sep 28, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 28, 2023
@davidhewitt davidhewitt added this pull request to the merge queue Sep 29, 2023
Merged via the queue into PyO3:main with commit f335f42 Sep 29, 2023
61 checks passed
@davidhewitt davidhewitt deleted the relax-import-check branch September 29, 2023 17:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants