Python Interface (.pyi) generation and runtime inspection #2454

Open
1 of 7 tasks
CLOVIS-AI opened this issue Jun 15, 2022 · 51 comments

Comments

@CLOVIS-AI
Contributor

CLOVIS-AI commented Jun 15, 2022

Hi!

This issue is going to be a summary of my prototypes to generate Python Interface files ('stub files', '.pyi files') automatically. The prototypes are available as #2379 and #2447.

  • [Prototype] Python stubs generation #2379 aimed to generate Python stub files entirely at compile-time. This is not possible because proc-macros run before type inference and trait analysis, so proc-macros cannot know if a type implements a trait or not.
  • Prototype: Runtime inspection and Pyi generation #2447 aims to generate information at compile-time that represents the structure of #[pyclass] structs (which methods exist, what arguments they accept) to be read at run-time by the stub generator.

I'm presenting the results here to get feedback on the current approach. I'm thinking of extracting parts of the prototypes as standalone features and PRs.

Progress

Accessing type information at runtime:

Accessing structural information at runtime:

  • Declare the inspection API
  • Generate the inspection API structures as part of #[pyclass] and #[pymethods]
  • Collect the inspection data per module

Python interface generation:

  • Generate the Python interface files
  • Document how to generate PYI files in end users' projects

Summary

The final goal is to provide a way for developers who use PyO3 to automatically generate Python Interface files (.pyi) with type information and documentation, to enable Rust extensions to 'feel' like regular Python code for end users via proper integration in various tools (MyPy, IDEs, Python documentation generators).

I have identified the following steps to achieve this goal. Ideally, each step will become its own PR as a standalone feature.

  1. provide a way to extract the full Python type information from any object passed to/retrieved from Python (e.g. List[Union[str]], not just PyList).
  2. provide an API to describe Python objects at run-time (list of classes, list of methods for these classes, list of arguments of each method, etc).
  3. improve the macros so they generate at compile-time the various inspection data structures (the API from 2.)
  4. write a run-time pyi generator based on the inspection API

Steps 1 and 2 are independent of each other, as are steps 3 and 4.

Full type information

The goal of this task is to provide a simple way to access the string representation of the Python type of any object exposed to Python. This string representation should follow the exact format of normal Python type hints.

First, a structure representing the various types is created (simplified version below, prototype here):

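use std::fmt::{self, Display};

enum TypeInfo {
    Any,
    None,
    Optional(Box<TypeInfo>),
    // ... (other variants omitted)
    Builtin(&'static str),
    Class {
        module: Option<&'static str>,
        name: &'static str,
    },
}

impl Display for TypeInfo {
    // Convert to a string in Python type-hint syntax
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TypeInfo::Any => write!(f, "Any"),
            TypeInfo::None => write!(f, "None"),
            TypeInfo::Optional(inner) => write!(f, "Optional[{inner}]"),
            TypeInfo::Builtin(name) => write!(f, "{name}"),
            TypeInfo::Class { module: Some(module), name } => write!(f, "{module}.{name}"),
            TypeInfo::Class { module: None, name } => write!(f, "{name}"),
        }
    }
}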

PyO3 already has traits that represent conversion to/from Python: IntoPy and FromPyObject. These traits can be enhanced to return the type information. The Python convention is that all untyped values should be considered as Any, so the methods can be added with Any as a default to avoid breaking changes (simplified version below, prototype here):

pub trait IntoPy<T> {
    // current API omitted

    fn type_output() -> TypeInfo {
        TypeInfo::Any
    }
}

pub trait FromPyObject {
    // current API omitted

    fn type_input() -> TypeInfo {
        TypeInfo::Any
    }
}

The rationale for adding two different methods is:

  • Some structs implement one trait but not the other (e.g. enums which use derive(FromPyObject)), so adding the method to only one of the traits would not work in all cases,
  • Creating a new trait with a single method would be inconvenient for PyO3 users in general, as it would mean implementing one more trait for each Python-exposed object,
  • Both methods have a sensible default and are both trivial to implement, so I don't believe there are any downsides,
  • Some Python classes should have a different type when appearing as a function input and output, for example Mapping[K, V] as input and Dict[K, V] as output. Using two different methods supports this use case out of the box.

After this is implemented for built-in types (prototype here), using them becomes as easy as format!("The type of this value is {}", usize::type_input()) which gives "The type of this value is int".
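A self-contained sketch of the two defaulted methods, under stated assumptions: `IntoPy` and `FromPyObject` below are simplified local stand-ins (no generic parameter, existing conversion methods omitted), and the `Generic` variant of `TypeInfo` is an assumption made for this example. It shows `usize` reporting `int`, a `HashMap` advertising `Mapping[...]` as input but `Dict[...]` as output, and a type that keeps the defaults being reported as `Any`.

```rust
use std::collections::HashMap;
use std::fmt;

// Hypothetical, reduced TypeInfo: only what this example needs.
pub enum TypeInfo {
    Any,
    Builtin(&'static str),
    // A subscripted type such as Dict[K, V] or Mapping[K, V].
    Generic(&'static str, Vec<TypeInfo>),
}

impl fmt::Display for TypeInfo {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TypeInfo::Any => write!(f, "Any"),
            TypeInfo::Builtin(name) => write!(f, "{name}"),
            TypeInfo::Generic(name, args) => {
                write!(f, "{name}[")?;
                for (i, arg) in args.iter().enumerate() {
                    if i > 0 {
                        write!(f, ", ")?;
                    }
                    write!(f, "{arg}")?;
                }
                write!(f, "]")
            }
        }
    }
}

// Simplified stand-ins for PyO3's traits: conversion methods omitted,
// only the proposed type-information methods with their `Any` defaults.
pub trait IntoPy {
    fn type_output() -> TypeInfo {
        TypeInfo::Any
    }
}

pub trait FromPyObject {
    fn type_input() -> TypeInfo {
        TypeInfo::Any
    }
}

impl IntoPy for usize {
    fn type_output() -> TypeInfo { TypeInfo::Builtin("int") }
}
impl FromPyObject for usize {
    fn type_input() -> TypeInfo { TypeInfo::Builtin("int") }
}
impl IntoPy for String {
    fn type_output() -> TypeInfo { TypeInfo::Builtin("str") }
}
impl FromPyObject for String {
    fn type_input() -> TypeInfo { TypeInfo::Builtin("str") }
}

// Accept any mapping as an argument, but promise a concrete dict when returning.
impl<K: FromPyObject, V: FromPyObject> FromPyObject for HashMap<K, V> {
    fn type_input() -> TypeInfo {
        TypeInfo::Generic("Mapping", vec![K::type_input(), V::type_input()])
    }
}
impl<K: IntoPy, V: IntoPy> IntoPy for HashMap<K, V> {
    fn type_output() -> TypeInfo {
        TypeInfo::Generic("Dict", vec![K::type_output(), V::type_output()])
    }
}

// A type that does not override the default is reported as `Any`.
pub struct Opaque;
impl FromPyObject for Opaque {}
```

With these definitions, `<HashMap<String, usize> as FromPyObject>::type_input()` renders as `Mapping[str, int]`, while the `IntoPy` side of the same type renders as `Dict[str, int]`.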

Inspection API

This section consists of creating an API to represent Python objects.

The main entry point for users would be the InspectClass trait (simplified, prototype here):

pub trait InspectClass {
    fn inspect() -> ClassInfo;
}

A similar trait would be created for modules, so it becomes possible to access the list of classes in a module.
This requires creating a structure for each Python language element (ModuleInfo, ClassInfo, FieldInfo, ArgumentInfo…, prototype here).

At this point, using this API would require instantiating all structures by hand.
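To illustrate what instantiating the structures by hand means, here is a sketch with hypothetical, heavily simplified `ClassInfo`, `MethodInfo`, `FieldInfo`, `ArgumentInfo` and `MethodKind` definitions (the prototype's actual fields may differ) and an `InspectClass` implementation written manually for an example `Point` class.

```rust
// Hypothetical, heavily simplified inspection structures.
#[derive(Debug, Clone, PartialEq)]
pub struct ArgumentInfo {
    pub name: &'static str,
    pub annotation: &'static str, // Python type hint, e.g. "float"
}

#[derive(Debug, Clone, PartialEq)]
pub enum MethodKind {
    Instance,
    Class,
    Static,
}

#[derive(Debug, Clone, PartialEq)]
pub struct MethodInfo {
    pub name: &'static str,
    pub kind: MethodKind,
    pub arguments: Vec<ArgumentInfo>,
    pub returns: &'static str,
}

#[derive(Debug, Clone, PartialEq)]
pub struct FieldInfo {
    pub name: &'static str,
    pub annotation: &'static str,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ClassInfo {
    pub name: &'static str,
    pub fields: Vec<FieldInfo>,
    pub methods: Vec<MethodInfo>,
}

pub trait InspectClass {
    fn inspect() -> ClassInfo;
}

// An example class; without macro support, its description is written out by hand.
pub struct Point {
    pub x: f64,
    pub y: f64,
}

impl InspectClass for Point {
    fn inspect() -> ClassInfo {
        ClassInfo {
            name: "Point",
            fields: vec![
                FieldInfo { name: "x", annotation: "float" },
                FieldInfo { name: "y", annotation: "float" },
            ],
            methods: vec![MethodInfo {
                name: "distance",
                kind: MethodKind::Instance,
                arguments: vec![ArgumentInfo { name: "other", annotation: "Point" }],
                returns: "float",
            }],
        }
    }
}
```

Every field or method added to `Point` would have to be mirrored manually in this impl, which is the duplication the macro step is meant to remove.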

Compile-time generation

Proc-macros can statically generate all information needed to automatically implement the inspection API: structural information (fields, etc.) is already known, and type information can simply be delegated to the IntoPy and FromPyObject traits, since all parameters and return values must implement at least one of them.

Various prototypes:

  • 38f0a59: extract classes
  • 56b85cf: extract the list of functions
  • 8125521: extract a function's kind (function, class method, static method…)
  • 4070ad4: extract the function's return type
  • 53f2e94: extract attributes annotated with #[pyo3(get, set)]
  • 003d275: extract argument names and types

This is done via two new traits, InspectStruct and InspectImpl, which respectively contain the information captured from #[pyclass] and #[pymethods]. Because of this, the prototype is not compatible with multiple-pymethods. I do not know whether it can be made compatible in the future.
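A sketch of how the two traits could fit together, and why the split presumably clashes with multiple-pymethods. The trait shapes, the blanket impl and the `Counter` expansion below are assumptions for illustration, not the prototype's generated code: `#[pyclass]` contributes `InspectStruct`, `#[pymethods]` contributes `InspectImpl`, and a blanket impl combines them into `InspectClass`.

```rust
// Hypothetical simplified shape of the data.
pub struct ClassInfo {
    pub name: &'static str,
    pub fields: Vec<&'static str>,
    pub methods: Vec<&'static str>,
}

// Contributed by #[pyclass]: what is known from the struct definition.
pub trait InspectStruct {
    fn name() -> &'static str;
    fn fields() -> Vec<&'static str>;
}

// Contributed by #[pymethods]: what is known from the impl block.
pub trait InspectImpl {
    fn methods() -> Vec<&'static str>;
}

// User-facing trait from the inspection API.
pub trait InspectClass {
    fn inspect() -> ClassInfo;
}

// Any type providing both halves gets InspectClass automatically.
impl<T: InspectStruct + InspectImpl> InspectClass for T {
    fn inspect() -> ClassInfo {
        ClassInfo {
            name: T::name(),
            fields: T::fields(),
            methods: T::methods(),
        }
    }
}

// Roughly what the macros might expand to for
//   #[pyclass] struct Counter { #[pyo3(get, set)] value: u64 }
//   #[pymethods] impl Counter { fn increment(&mut self) { ... } }
pub struct Counter {
    pub value: u64,
}

impl InspectStruct for Counter {
    fn name() -> &'static str {
        "Counter"
    }
    fn fields() -> Vec<&'static str> {
        vec!["value"]
    }
}

impl InspectImpl for Counter {
    fn methods() -> Vec<&'static str> {
        vec!["increment"]
    }
}

// A second #[pymethods] block would need another `impl InspectImpl for Counter`,
// which Rust rejects as a conflicting implementation (E0119).
```

Since a type can implement a given trait only once, each additional `#[pymethods]` block would have nowhere to put its method list under this scheme.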

Python Interface generator

Finally, a small runtime routine can be provided to generate the .pyi file from the compile-time extracted information (prototype here).
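A minimal sketch of such a routine, assuming hypothetical `ClassInfo`, `MethodInfo` and `ArgumentInfo` structures that already hold rendered type hints as strings (imports, docstrings and module-level functions are left out). It walks the collected data and writes one `class` block per class.

```rust
use std::fmt::Write;

// Hypothetical minimal inspection data (the real structures would carry more detail).
pub struct ArgumentInfo {
    pub name: &'static str,
    pub annotation: &'static str,
}

pub struct MethodInfo {
    pub name: &'static str,
    pub is_static: bool,
    pub arguments: Vec<ArgumentInfo>,
    pub returns: &'static str,
}

pub struct ClassInfo {
    pub name: &'static str,
    pub fields: Vec<(&'static str, &'static str)>, // (name, annotation)
    pub methods: Vec<MethodInfo>,
}

// Render the body of a .pyi file from already-collected inspection data.
pub fn generate_pyi(classes: &[ClassInfo]) -> String {
    let mut out = String::new();
    for class in classes {
        writeln!(out, "class {}:", class.name).unwrap();
        if class.fields.is_empty() && class.methods.is_empty() {
            // An empty class body still needs a statement.
            writeln!(out, "    ...").unwrap();
        }
        for (name, annotation) in &class.fields {
            writeln!(out, "    {name}: {annotation}").unwrap();
        }
        for method in &class.methods {
            let mut params = Vec::new();
            if method.is_static {
                writeln!(out, "    @staticmethod").unwrap();
            } else {
                params.push("self".to_string());
            }
            for arg in &method.arguments {
                params.push(format!("{}: {}", arg.name, arg.annotation));
            }
            writeln!(
                out,
                "    def {}({}) -> {}: ...",
                method.name,
                params.join(", "),
                method.returns
            )
            .unwrap();
        }
    }
    out
}
```

For a `Point` class with a `distance(self, other: Point) -> float` method, this emits `def distance(self, other: Point) -> float: ...` inside `class Point:`.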

Thanks to the previous steps, it is possible to retrieve all information necessary to create a complete typed interface file with no further annotations from a user of the PyO3 library. I think that's pretty much the perfect scenario for this feature, and although it seemed daunting at first, I don't think it's so far-fetched now 😄

The current state of the prototype is described here: #2447 (comment).

@davidhewitt
Member

@CLOVIS-AI just a ping to say I haven't forgotten about this; have been ill / busy and was just about cleared through the backlog enough to read this when #2481 came up. I think I need to push through some security releases first, after which my plan was to finish #2302 and then loop back here with us ready to support a syntax for annotations. Sorry for the delay.

@CLOVIS-AI
Contributor Author

@davidhewitt don't worry about the full review for now, it's just a prototype. If you have time, please just read this issue and give me your general feedback on the idea. If it looks good to you, I'll be able to start writing a real PR for at least a part of it and we can do a full review then 👍

@davidhewitt
Member

Hey, so I finally found a moment to sit down and think about this. Thank you for working on this and for having patience with me.

This looks great, I think this is definitely the right way to go. In particular splitting into two traits for the input/output I think is correct.

Some thoughts and questions:

  • I've wanted something like FromPyObject::type_input for a long time, I think it can be used to implement improved error messages. In particular in PyDowncastError we currently store the type name, e.g. PyString, but with type_input we could do something better (as long as the input is not Any, I guess).
  • I think it would be nice to feature-gate the inspection API and macro codegen for it - presumably it would be used with a debug build to emit type stubs, and then it wouldn't be needed in the final release build.
  • To be valid .pyi files they often need to import type dependencies. Do you have a vision how we might be able to make this work?
  • Imagine users create a custom MyDict[K, V] class which would be a generic type (in Python's eyes; the Rust code would potentially just use PyAny). Can we support it with this proposal?
  • For cases like the above, if we can't support them, we might need mechanisms to load external type stub fragments to combine into the final artifact.

Overall, yes, I'm happy to proceed with this - as a first step I'd suggest we get type_input and type_output merged. We could already use them to improve error messages and also maybe add function signatures in their docstrings. That would buy us some time to figure out the introspection API, which I think will have some complexity.
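As an illustration of the error-message idea, a hedged sketch: a simplified `FromPyObject` stand-in with the proposed `Any` default, and a hypothetical `conversion_error` helper (not PyO3's actual `PyDowncastError`) that uses the Python hint when it is more specific than `Any`, and otherwise falls back to `std::any::type_name`, whose exact output is not guaranteed to be stable.

```rust
use std::any::type_name;

// Simplified stand-in for FromPyObject with the proposed `Any` default
// (annotations are plain strings here).
pub trait FromPyObject {
    fn type_input() -> String {
        "Any".to_string()
    }
}

impl FromPyObject for u32 {
    fn type_input() -> String {
        "int".to_string()
    }
}

// Keeps the default, so it has no useful Python hint.
pub struct Opaque;
impl FromPyObject for Opaque {}

// Hypothetical helper: describe what a failed conversion expected.
pub fn conversion_error<T: FromPyObject>(actual_python_type: &str) -> String {
    let hint = T::type_input();
    let expected = if hint == "Any" {
        // Fall back to the Rust type name; std::any::type_name's exact output is not guaranteed.
        type_name::<T>().to_string()
    } else {
        hint
    };
    format!("expected {expected}, got {actual_python_type}")
}
```

`conversion_error::<u32>("str")` yields `expected int, got str`; for `Opaque` the message can only name the Rust type, which is the "as long as the input is not Any" caveat in practice.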

@CLOVIS-AI
Contributor Author

I was thinking of feature-gating the macro generation but not the inspection API itself (so you would be able to construct inspection APIs yourself in all cases, but would need to enable the automatic implementation). I assume that the API itself will not have any significant effect on compile-time, since it's just normal structs. What do you think?

The TypeInfo struct has a module method that returns the name of the module a class is from. When generating a .pyi file, a recursive visit of all types appearing in the file would yield the list of modules to import. This may be too simplistic (and will probably fail if two classes in different modules have the same name), but it should be good enough for most users (for example, in my case, we have a single module anyway).
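The recursive visit can be sketched as follows, assuming a hypothetical subset of `TypeInfo` (`List` is an extra variant added for the example) and plain `import <module>` lines as the header format; a real generator would also need `typing` imports for names such as `Optional`.

```rust
use std::collections::BTreeSet;

// Hypothetical subset of the TypeInfo enum described in the issue.
pub enum TypeInfo {
    Any,
    Builtin(&'static str),
    Optional(Box<TypeInfo>),
    List(Box<TypeInfo>),
    Class { module: Option<&'static str>, name: &'static str },
}

impl TypeInfo {
    // Recursively record the module of every class reachable from this type.
    pub fn collect_modules(&self, modules: &mut BTreeSet<&'static str>) {
        match self {
            TypeInfo::Any | TypeInfo::Builtin(_) => {}
            TypeInfo::Optional(inner) | TypeInfo::List(inner) => inner.collect_modules(modules),
            TypeInfo::Class { module: Some(module), .. } => {
                modules.insert(*module);
            }
            TypeInfo::Class { module: None, .. } => {}
        }
    }
}

// Build the import header for a stub file from all types that appear in it.
// BTreeSet deduplicates modules and keeps the output sorted.
pub fn import_lines(types: &[TypeInfo]) -> String {
    let mut modules = BTreeSet::new();
    for ty in types {
        ty.collect_modules(&mut modules);
    }
    modules.iter().map(|m| format!("import {m}\n")).collect()
}
```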

About custom generics: the approach described in this issue couldn't support them, but the one in #2490 can (however, it must be through user input; I don't see a way in which the macros could guess that).

Combining external information with the generated information will be trivial: because the macros will generate an implementation of the inspection API, and the .pyi generation will take an implementation as a parameter, users can simply edit the generated implementation before passing it to the .pyi generation.
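Assuming the inspection data is plain, mutable Rust values, patching it before generation might look like this. The `ClassInfo` and `MethodInfo` shapes and both functions are hypothetical; the example adds a `Generic[K, V]` base to a `MyDict` class whose Rust implementation only sees `PyAny`, the case raised above.

```rust
// Hypothetical minimal structures standing in for whatever the macros would generate.
#[derive(Debug, Clone, PartialEq)]
pub struct MethodInfo {
    pub signature: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ClassInfo {
    pub name: String,
    pub bases: Vec<String>,
    pub methods: Vec<MethodInfo>,
}

// Stand-in for the macro-generated data: the Rust code only sees PyAny,
// so the generated signature falls back to Any.
pub fn generated_my_dict_info() -> ClassInfo {
    ClassInfo {
        name: "MyDict".to_string(),
        bases: vec![],
        methods: vec![MethodInfo {
            signature: "def get(self, key: Any) -> Any: ...".to_string(),
        }],
    }
}

// User-side patch applied before the data is handed to the .pyi generator.
pub fn patched_my_dict_info() -> ClassInfo {
    let mut info = generated_my_dict_info();
    info.bases.push("Generic[K, V]".to_string());
    info.methods[0].signature = "def get(self, key: K) -> V: ...".to_string();
    info
}
```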

@CLOVIS-AI
Contributor Author

It seems like #2490 will be merged soon. I won't have a lot of time on my hands in the near future, so if someone else wants to help in the meantime, the next big question is the way to represent the program (Python classes, Python methods, Python modules) as Rust structures.

My prototypes are close to solving the problem, except that I'm not a fan of how they deal with modules. The structures themselves seem fine, but the way to convert a #[pymodule] function into them is unclear.

@Tpt
Contributor

Tpt commented Oct 10, 2022

Not sure if it is relevant here:

I have written a small Python script to generate type stubs from PyO3 libraries with docstrings that include type annotations (using the :type: and :rtype: format).
It works on the already built libraries using the Python introspection feature.
To use it run python generate_stubs.py MY_PACKAGE MY_FILE.pyi
Here is an example of generated stubs.

@CLOVIS-AI
Contributor Author

@Tpt That's great! However if I understand correctly you still have to declare the type twice (first as a Rust type, then as a Python type in the documentation), which is error-prone, and what this issue tries to avoid. I agree that it's already a great step up from the current situation of writing the .pyi entirely manually.

@Tpt
Contributor

Tpt commented Oct 12, 2022

@CLOVIS-AI Yes! Exactly. Indeed, avoiding duplicated types would be much better. I wanted to get something working quickly for now instead of going down the rabbit hole of auto-generating stubs from Rust.

@kylecarow

Would love to have this!

@fzyzcjy
Copy link

fzyzcjy commented Jan 4, 2023

Looking forward to the features!

@CLOVIS-AI
Contributor Author

Hi, I have changed workplace and do not have time to contribute to this project anymore. If someone wants to continue this PR, please feel free to. My prototype is still online, and the outline described here should be good.

@PierreMardon

I'm not experienced enough to help on this, just confirming that it would be of great help for my use case. In the meantime, I'm going to write the .pyi files by hand.

bors bot added a commit that referenced this issue Jan 17, 2023
2882: inspect: gate behind `experimental-inspect` feature r=davidhewitt a=davidhewitt

This is the last thing I want to do before preparing 0.18 release.

The `pyo3::inspect` functionality looks useful as a first step towards #2454. However, we don't actually make use of this anywhere within PyO3 yet (we could probably use it for better error messages). I think we also have open questions about the traits which I'd like to resolve before committing to these additional APIs. (For example, this PR adds `IntoPy::type_output`, which seems potentially misplaced to me, the `type_output` function probably wants to be on a non-generic trait e.g. `ToPyObject` or maybe #2316.) 

As such, I propose putting these APIs behind an `experimental-inspect` feature gate for now, and invite users who find them useful to contribute a finished-off design.

Co-authored-by: David Hewitt <1939362+davidhewitt@users.noreply.github.com>
@davidhewitt davidhewitt removed this from the 0.18 milestone Jan 20, 2023
@op8867555

I've played around with #2447 in the last few days and I tried to fix some failed tests.

I got stuck on the missing IntoPy impl for PyResult; currently the macro generates:

<crate::PyResult<&crate::PyAny,> as _pyo3::conversion::IntoPy<_>>::type_output()

Simply providing impl IntoPy for PyResult<T> will not work because there are already IntoPyCallbackOutput impls.

I was thinking about moving type_input()/type_output() into a separate trait,

pub trait WithTypeInfo {
    fn type_output() -> TypeInfo;
    fn type_input() -> TypeInfo;
}

with impl<T: WithTypeInfo> WithTypeInfo for PyResult<T>. Is it a good idea?
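A self-contained sketch of the separate-trait idea, under assumptions: `PyErr` and `PyResult` are local stand-ins for the PyO3 types, and annotations are plain strings instead of `TypeInfo`. Because an `Err` surfaces as a raised Python exception, the `PyResult<T>` impl can simply forward to `T`.

```rust
// Stand-ins for PyO3 types so the sketch compiles on its own.
pub struct PyErr;
pub type PyResult<T> = Result<T, PyErr>;

// Separate trait, as proposed above; annotations are plain strings here instead of TypeInfo.
pub trait WithTypeInfo {
    fn type_output() -> String;
    fn type_input() -> String;
}

impl WithTypeInfo for i64 {
    fn type_output() -> String {
        "int".to_string()
    }
    fn type_input() -> String {
        "int".to_string()
    }
}

impl<T: WithTypeInfo> WithTypeInfo for Vec<T> {
    fn type_output() -> String {
        format!("List[{}]", T::type_output())
    }
    fn type_input() -> String {
        format!("List[{}]", T::type_input())
    }
}

// An Err becomes a raised Python exception, so the annotation of a PyResult<T>
// return value is simply T's. (type_input is only here because the trait requires it.)
impl<T: WithTypeInfo> WithTypeInfo for PyResult<T> {
    fn type_output() -> String {
        T::type_output()
    }
    fn type_input() -> String {
        T::type_input()
    }
}
```

With this shape, the macro could emit `<PyResult<...> as WithTypeInfo>::type_output()` for every return type without touching the existing `IntoPyCallbackOutput` impls.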

@davidhewitt
Member

@op8867555 good question, and I'm not sure I can give you the answer easily. The downside of moving into a separate trait is that you might find that, without specialization, this creates a lot of work. Having the methods on the IntoPy trait allows for default implementations for &PyAny.

I think the best answer is - if you're willing to give it a go, please do, and let's see how that works out :)

@op8867555

I've tried the separate-trait approach, and it solved the PyResult issue I mentioned before. However, I haven't figured out how to support specialization (e.g. Vec<u8> generates List[int] instead of bytes for now). I've tried to apply these autoref tricks, but I didn't get them to work with user-created WithTypeInfo impls[1].

Also, I've tried embedding field info into PyClassItems[2]; this makes multiple pymethods easy to support.

Footnotes

  1. https://github.com/op8867555/pyo3/commit/34f3fb91ef9080c0762f6050c78cd22148fde82b

  2. https://github.com/op8867555/pyo3/commit/f334bac071bd4834dd46173c5db337cf9fade432

@davidhewitt
Member

davidhewitt commented Feb 10, 2023

(e.g. Vec<u8> generates List[int] instead of bytes for now)

Note that in PyO3, Vec<u8> really does create List[int], so your annotation is correct 😄

I've tried to apply these autoref tricks, but I didn't get them to work with user-created WithTypeInfo impls.

This setup may potentially work:

use std::marker::PhantomData;

struct TypeAnnotation<T>(PhantomData<T>);

impl<T> WithTypeInfo for &'_ TypeAnnotation<T> {
    fn type_input() -> TypeInfo { TypeInfo::Any }
    fn type_output() -> TypeInfo { TypeInfo::Any }
}

and specific implementations can then use impl WithTypeInfo for TypeAnnotation<T>. Or maybe there's some context I am missing?

Also, I've tried embedding field info into PyClassItems; this makes multiple pymethods easy to support.

Yep that should work fine 👍
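The autoref idea from the reply above can be sketched as follows. Autoref-based specialization relies on method-call resolution, so this sketch gives the methods a `&self` receiver (an adjustment to the associated functions in the snippet above) and uses strings instead of `TypeInfo`; `type_input_of!` is a hypothetical stand-in for macro-generated code.

```rust
use std::marker::PhantomData;

// Simplified stand-in for the proposed trait; `&self` receivers are needed because
// autoref specialization depends on method-call resolution.
pub trait WithTypeInfo {
    fn type_input(&self) -> &'static str;
}

pub struct TypeAnnotation<T>(PhantomData<T>);

// Fallback on `&TypeAnnotation<T>`: only reached after the compiler adds an extra `&`.
impl<T> WithTypeInfo for &TypeAnnotation<T> {
    fn type_input(&self) -> &'static str {
        "Any"
    }
}

// Specific impls on `TypeAnnotation<X>` match first, before any extra autoref.
impl WithTypeInfo for TypeAnnotation<u32> {
    fn type_input(&self) -> &'static str {
        "int"
    }
}

impl WithTypeInfo for TypeAnnotation<Vec<u8>> {
    fn type_input(&self) -> &'static str {
        "bytes"
    }
}

// What macro-generated code could expand to for a concrete type. This only works
// where `$ty` is concrete; inside a generic function the fallback is always chosen.
macro_rules! type_input_of {
    ($ty:ty) => {
        (&TypeAnnotation::<$ty>(PhantomData)).type_input()
    };
}
```

Here `type_input_of!(Vec<u8>)` gives `bytes` and `type_input_of!(String)` falls back to `Any`. Two limitations remain, consistent with the rest of the thread: a downstream crate cannot write `impl WithTypeInfo for TypeAnnotation<TheirType>` because of Rust's orphan rule (both the trait and `TypeAnnotation` are foreign to it), and a blanket `TypeAnnotation<Vec<T>>` impl cannot coexist with the `TypeAnnotation<Vec<u8>>` impl, since the two overlap.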

@op8867555

Note that this is very much the case in PyO3 that Vec<u8> creates List[int], your annotation is correct

Oh, I didn't notice that 😅. Are there any other specialization cases in PyO3?

This setup may potentially work:

struct TypeAnnotation<T>(PhantomData<T>);

impl<T> WithTypeInfo for &'_ TypeAnnotation<T> {
    fn type_input() -> TypeInfo { TypeInfo::Any }
    fn type_output() -> TypeInfo { TypeInfo::Any }
}

and specific implementations can then use impl WithTypeInfo for TypeAnnotation<T>. Or maybe there's some context I am missing?

I tried this (with some modifications[1]) and didn't manage to make it work with user-defined data types (e.g. providing a type annotation for a Rust enum like this). There will be an error when trying to provide an impl for a non-pyclass data type, since both WithTypeInfo and TypeAnnotation are defined outside of the crate.
Also, there will be a conflict when both impl<T> _ for TypeAnnotation<Vec<T>> and impl _ for TypeAnnotation<Vec<u8>> are provided. It seems another layer of specialization can't be achieved this way.

Footnotes

  1. https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=c3b372cdea67f7cd0e2a875e0bc94341

@Tpt
Contributor

Tpt commented Feb 2, 2024

I started to investigate this a bit more. It seems to me that one of the last major blockers not yet solved by the linked draft pull request is module introspection: we would like to be able to list the elements of a module (classes, functions...) with access to the introspection data of each element.

The current API to build modules relies directly on building a Python module, erasing easy access to the Rust structures that declared the classes/functions/... added to the module. I see multiple ways to work around it:

  1. Make a very bad addition to the #[pymodule] proc macro and look for all .add_* function calls to build this list. We would then build a static function returning the list of elements and their introspection. This seems quite fragile and error prone but does not require the user to change their code.
  2. In the function building the module, use a PyModuleBuilder instead of a Bound<'py, PyModule>. This builder would keep a readable registry of the added elements (idea from this comment). The function building the type stubs would then just need to run this function again to get the list of elements.
  3. Move forward on add #[pymodule] mod some_module { ... }, v2 #3294 to get a clear list of module content and generate introspection data at build time like option 1.
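Option 2 could look roughly like this: a hypothetical `ModuleBuilder` (not a PyO3 API) that records every added item, so the stub generator can re-run the user's initializer and read the registry back. A real builder would also forward each call to the underlying Python module; that part is omitted here.

```rust
// Hypothetical item kinds recorded by the builder.
#[derive(Debug, Clone, PartialEq)]
pub enum ModuleItem {
    Class(&'static str),
    Function(&'static str),
}

// Hypothetical builder that keeps a readable registry of what was added.
#[derive(Default)]
pub struct ModuleBuilder {
    items: Vec<ModuleItem>,
}

impl ModuleBuilder {
    pub fn add_class(&mut self, name: &'static str) -> &mut Self {
        // A real builder would also register the class on the Python module here.
        self.items.push(ModuleItem::Class(name));
        self
    }

    pub fn add_function(&mut self, name: &'static str) -> &mut Self {
        // Likewise for functions.
        self.items.push(ModuleItem::Function(name));
        self
    }

    pub fn items(&self) -> &[ModuleItem] {
        &self.items
    }
}

// The user's module initializer, written against the builder instead of the module.
pub fn init_my_module(m: &mut ModuleBuilder) {
    m.add_class("Point").add_function("distance");
}

// The stub generator runs the same initializer on a fresh builder to list the content.
pub fn list_module_items() -> Vec<ModuleItem> {
    let mut builder = ModuleBuilder::default();
    init_my_module(&mut builder);
    builder.items().to_vec()
}
```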

I tend to prefer option 3; it seems the cleanest approach. Introspection and stub generation is a new feature, so it is not a big deal to gate it on adopting another new thing.

@davidhewitt What do you think?

@davidhewitt
Member

I am definitely keen on option 3!

I think that we might need to consider option 2 as well, because it's possible that even with a declarative #[pymodule], some users may want to build modules using imperative code.

Personally I would be very excited to start with option 3 and once we get something working for that we can consider option 2 later on.

@Tpt
Contributor

Tpt commented Feb 3, 2024

@davidhewitt Great! It sounds perfect! Thank you! I am going to rebase #2367 and apply to it the code review comments you made on #3294.

@davidhewitt
Member

Worth noting that CPython is currently talking about improving __text_signature__ / __signature__, but this will be a future feature at best: https://discuss.python.org/t/signatures-a-call-to-action/23580/37

I gave them a brief heads up of what we're up to in https://discuss.python.org/t/request-for-review-of-gh-81677-basic-support-for-annotations-in-text-signature-s/43914/12

@davidhewitt
Member

The CPython effort has fallen short for now, so I think this reinforces that proceeding with experimental-inspect is the right solution.

@deoqc

deoqc commented Jun 11, 2024

I would like to offer a possible alternative path:

Use the "structured representation" of types in JSON format (nightly feature), similar to pavex.

Since the types are consumed outside of the crate, no Rust code needs to be generated, so working from the final representation of types like this would avoid many of the problems and rough edges of a macro-based solution.

Also, the crate itself would not need to use nightly; only a nightly toolchain would be needed to run the script that builds the type hints.

@Tpt
Contributor

Tpt commented Jun 11, 2024

Use the "structured representation" of types in JSON format (nightly feature), similar to pavex.

Since the types are consumed outside of the crate, no Rust code needs to be generated, so working from the final representation of types like this would avoid many of the problems and rough edges of a macro-based solution.

Also, the crate itself would not need to use nightly; only a nightly toolchain would be needed to run the script that builds the type hints.

If I understand correctly, the "structured representation" of types is the JSON output of Rustdoc.

If so, I find it a very interesting idea, thank you! I see two advantages: 1. it does not require working with the cdylib objects to emit introspection data; 2. it offers a full view of the source code, including elements only built on a given target.

However, I see a major downside: the approach of making the macro emit the introspection data avoids a lot of code duplication. The piece of code responsible for emitting the CPython-compatible data structures (class descriptors...) is also responsible for emitting the introspection data, so there is a single place of definition. An external introspection system based on Rustdoc JSON would have to reimplement a big chunk of this logic, making discrepancies easier to introduce and probably leading to a significant amount of slightly duplicated code.

@davidhewitt What do you think about it?

@deoqc

deoqc commented Jun 12, 2024

If I understand correctly, the "structured representation" of types is the JSON output of Rustdoc.

Yep, that's right.

However, I see a major downside: the approach of making the macro emit the introspection data avoids a lot of code duplication

Having the same logic split in 2 places indeed looks bad.

@Jgfrausing
Contributor

I made an alternative option for just creating the pyi file using a proc macro. It has a lot of rough edges but it works for our use case.

#4267

@termoshtt

We've released our stub file generator crate:
https://github.com/Jij-Inc/pyo3-stub-gen

This crate tries to extract type information on the Rust side using proc-macros, and gathers this information with the inventory crate, as the multiple-pymethods feature does. The stub file is generated where maturin can pick it up, for both pure Rust and mixed Rust/Python projects.

Please see README for usage, and docs.rs for its mechanism.

@Tpt
Contributor

Tpt commented Aug 20, 2024

@termoshtt Amazing! Thank you! If I try to summarize the differences between your crate and my PR #3977:

  • pyo3-stub-gen requires extra proc-macros to be added (unavoidable because it's not part of PyO3)
  • pyo3-stub-gen is easier to implement (no need to encode the introspection data in a Rust const like my PR does)
  • pyo3-stub-gen requires all introspected objects to be part of a single crate (my PR supports multiple crates)
  • pyo3-stub-gen requires running the built code and so is harder to use in cross-compilation settings (one would need an emulator)
  • my PR might not be able to properly support generic types like dict or list (building const values recursively is hard in Rust)

@CLOVIS-AI
Contributor Author

@Tpt have you looked at the linked PRs in the initial issue? They can manage generic types with no issues.

@Tpt
Contributor

Tpt commented Aug 20, 2024

@CLOVIS-AI Yes! Thank you so much for them. They have been a great inspiration. If I understood them correctly, they follow the same approach as pyo3-stub-gen, i.e. macros generate code that must be executed at runtime instead of generating const elements that could be extracted from the built cdylib. I went with the const approach to support compilations where a compatible runtime environment is not present (cross compilation...).

@daemontus
Contributor

I apologize for the (low-key) spam, but...thank you very much @termoshtt! This is very much what we've been looking for in our projects and should save us quite a bit of time once we get this up and rolling!

@termoshtt

@Tpt Thanks for the summary! I apologize if I'm mistaken, as I haven't fully read through #3977, but my brief response is as follows:

pyo3-stub-gen requires extra proc-macros to be added (unavoidable because it's not part of PyO3)

Yes. In addition, our macro #[gen_stub_pyclass] and the others re-implement a parser for #[pyo3(...)] annotations to get the module, name and so on. So this is not the best solution; this should be integrated into the pyo3::pyclass macro itself. But we started the pyo3-stub-gen project to prove that this approach really works.

pyo3-stub-gen is easier to implement (no need to encode the introspection data in a Rust const like my PR does)

pyo3-stub-gen passes the introspection data from proc-macro-generated code to stub file generator via function pointers fn() -> TypeInfo.
https://docs.rs/pyo3-stub-gen/latest/pyo3_stub_gen/type_info/struct.ArgInfo.html
Since the inventory crate requires submit!ted data to be const, we store the function pointer of PyStubInfo::<T>::type_output (this pointer is different for different T) in proc-macro-generated code, and call it in the stub file generator. Thus

pyo3-stub-gen requires all introspected objects to be part of a single crate (my PR supports multiple crates)

would be true. Honestly, I never considered the multi-crate case.
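The function-pointer mechanism described in this reply can be sketched without the `inventory` crate. `PyStubType`, `ArgInfo`, `F_ARGS` and `render_args` are simplified, hypothetical stand-ins (not pyo3-stub-gen's actual definitions), and `String` replaces `TypeInfo`. The type strings are built by ordinary non-`const` code such as `format!`, so they cannot be computed inside a `const` initializer, but a pointer to the function that builds them can be stored there and called later by the generator.

```rust
// Simplified per-type trait (stand-in for pyo3-stub-gen's machinery).
pub trait PyStubType {
    fn type_output() -> String;
}

impl PyStubType for i64 {
    fn type_output() -> String {
        "int".to_string()
    }
}

impl PyStubType for String {
    fn type_output() -> String {
        "str".to_string()
    }
}

impl<T: PyStubType> PyStubType for Vec<T> {
    fn type_output() -> String {
        format!("List[{}]", T::type_output())
    }
}

// Registered data must be const, so it holds a pointer to the function,
// not the String the function would build.
pub struct ArgInfo {
    pub name: &'static str,
    pub type_fn: fn() -> String,
}

// What a proc-macro could emit for `fn f(xs: Vec<i64>, label: String)`.
pub const F_ARGS: &[ArgInfo] = &[
    ArgInfo { name: "xs", type_fn: <Vec<i64> as PyStubType>::type_output },
    ArgInfo { name: "label", type_fn: <String as PyStubType>::type_output },
];

// Run later by the stub generator: calling each pointer yields the annotation.
pub fn render_args(args: &[ArgInfo]) -> String {
    args.iter()
        .map(|arg| format!("{}: {}", arg.name, (arg.type_fn)()))
        .collect::<Vec<_>>()
        .join(", ")
}
```

Because the pointers only mean something inside the compiled program, the generator has to run that program, which is the cross-compilation trade-off discussed in this exchange.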

pyo3-stub-gen requires running the built code and so is harder to use in cross-compilation settings (one would need an emulator)

Since I think generated stub files are usually platform-independent, I intended pyo3-stub-gen to be used to generate the stub file on the developer's machine and git add it (and check on CI that it is up to date by regenerating it). Our (private) product uses it with a feature gate like

#[cfg_attr(feature = "stub_gen", gen_stub_pyclass)]
#[pyclass]
struct A;

Then maturin does not distinguish whether the stub file was written manually or generated automatically.

@CLOVIS-AI
Contributor Author

i.e. macros generate code that must be executed at runtime instead of generating const elements that could be extracted from the built cdylib

Yes. My long-term goal was to have it be executed by Maturin as part of the build step, so the result could be embedded in the package directly.

@Tpt
Contributor

Tpt commented Aug 20, 2024

@termoshtt Thank you! I agree with all you said. I think the difference in design between your crate and my MR is mostly because I wanted to support the multi-crate use case and stubs that differ by platform (I have some code that triggers these two edge cases). If we choose not to support these two use cases, your design is indeed much better.

@abrisco
Contributor

abrisco commented Sep 14, 2024

@termoshtt Thank you! I agree with all you said. I think the difference in design between your crate and my MR is mostly because I wanted to support the multi-crate use case and stubs that differ by platform (I have some code that triggers these two edge cases). If we choose not to support these two use cases, your design is indeed much better.

It would be awesome to upstream it into PyO3 some day!
