Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trusted publishing: pre-fill registration fields, when possible #13661

Closed
woodruffw opened this issue May 12, 2023 · 8 comments · Fixed by #16399
Closed

Trusted publishing: pre-fill registration fields, when possible #13661

woodruffw opened this issue May 12, 2023 · 8 comments · Fixed by #16399

Comments

@woodruffw
Copy link
Member

Originally raised by @webknjaz:

So... It just occurred to me that having to copy-and-paste some the same values for many projects was a bit of a "monkey work".
I realized that many projects have their GitHub URL in the structured metadata. And warehouse even has some classification code for showing different icons next to the links... So why not take advantage of that?
I think, warehouse could check the project's URLs for whether they point at GitHub and extract owner+repo segments from that.
Moreover, many projects have CI badges in the long description. It wouldn't be hard to extract the workflow filenames from most.
Of course the user should be urged to validate the populated values. But in general, this could be extremely useful for the end-users...

Using the structured metadata that comes with each project (specifically, the most recently uploaded release file) seems like a good approach to me, especially now that Warehouse is extracting that metadata per PEP 658 (#13649).

Pulling it from "badges" in the README might be a little riskier, particularly since people might accidentally copy those from templates/not update them from a fork. I think there'd also be diminishing returns with that approach (and we should be encouraging people to use the structured metadata, anyways).

@webknjaz
Copy link
Member

Pulling it from "badges" in the README might be a little riskier, particularly since people might accidentally copy those from templates/not update them from a fork. I think there'd also be diminishing returns with that approach (and we should be encouraging people to use the structured metadata, anyways).

Yeah, though, it's still worth doing. It just needs to be done only if the GH URL is declared in the structured metadata, and we'd have to verify the badges against the repo extracted from the meta.
The reason is that people don't normally put links to specific workflow pages in their structured metadata but rather reference the generic CI page. As in https://github.com/pypi/warehouse/actions instead of https://github.com/pypi/warehouse/actions/workflows/ci.yml, for example. Also, there's a case of such a workflow being separate from the test one.

Having a workflow name extracted from a link that matches the repository URL is a safe bet, right?

@woodruffw
Copy link
Member Author

Oh, I see what you mean -- I thought you were originally proposing using the URLs in badges as a separate source of information as a fallback versus structured metadata, not as an "enrichment" source for that metadata.

Using it as an enrichment source makes more sense to me, although I'm still not convinced that the quality of the data is worth it:

  1. Do projects commonly put badges for their release workflows in their READMEs? I personally put my "main" CI workflow badge in the README, and nothing else.
  2. How do we handle data model mismatches here? The trusted publishing feature is intentionally many-many between PyPI projects and trusted publishers; it's unclear how we should handle a README that contains multiple badges that "look like" release workflows.

I'll defer to the maintainers/admins here, but IMO we should start with the "baseline" here (leveraging the structured metadata), and move into heuristics later on (and primarily based on evidence that users are using badges in the way we'd need them to).

(As a side note: we could encourage people to put their specific release workflows in the structured metadata, which would solve both of these problems! That could probably be done with some additional documentation on the PyPA standards site, although we'd need to come up with reusable/generic-ish URL formats for different publisher types.)

@woodruffw
Copy link
Member Author

Triaging: @jleightcap will be working on this.

@di
Copy link
Member

di commented May 22, 2023

I think this is probably going to be somewhat confusingly magic. We have lots of instances of projects where the data we'd pull from here is not right (e.g. the user has forked https://github.com/pypa/sampleproject without updating metadata) and the pre-fill is going to be wrong. We can't really guarantee this is always going to be right, so I think I'd prefer to not attempt to do this.

@webknjaz
Copy link
Member

The intent is only to pre-fill the form, the users would still have to review it and submit..

One alternative, UX-wise, could be having suggestions instead of filling <input value="{{ thing }}">. For example, I was filling out a tax return recently and it had some of the personal data fields pre-filled — but that form had a prompt under each input to confirm that the pre-filled value was correct.
I think, this might be a viable solution.
Alternatively, there could be one banner/block with a single button, showing the user a URL (or several) and asking if that's right / if it should be used for pre-filling.

@di would this address your concern?

@woodruffw
Copy link
Member Author

Triaging: one idea that @di and I discussed here is to use a "magic link" technique instead, e.g. having the publish workflow emit a ::notice or similar with a clickable link that, when opened, takes the user to a pre-filled publisher creation form.

This is a bit more misuse-resistant, since the "direction" of the metadata is a canonical one (i.e. the action itself will be supplying the pre-filled values, rather than having PyPI guess at them).

In this approach seems reasonable, there are two action items:

  • PyPI itself needs a route that'll allow for pre-filling of the TP form, probably from URL parameters;
  • gh-action-pypi-publish will need to emit a generated URL on non-TP cases (i.e., when the user is still using a manually configured API token).

@di
Copy link
Member

di commented Jan 10, 2024

I think a pre-filled form with ample opportunity for confirmation should be sufficient (something that requires every field to be individually confirmed would be ideal).

Using URL parameters should be fine here since these will be logged-in views.

@woodruffw
Copy link
Member Author

As constituent tasks:

  1. Populate these fields from URL parameters; for example, ?provider=GitHub&repository_name=foo&... should pre-fill the GitHub form with the given values.
  2. Detect when fields are pre-filled and add a confirmation modal to the form submission in that case.
  3. Upstream: generate these URLs in gh-action-pypi-publish.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants
@di @webknjaz @woodruffw and others