Minimal metadata registry #34

Closed
certik opened this issue Jan 31, 2020 · 16 comments

Comments

@certik
Member

certik commented Jan 31, 2020

After #33 is implemented, the next step is to implement a minimal metadata registry. Here is one way to do that:

  1. Have a repository https://github.com/fortran-lang/package-registry containing a simple JSON file of this form, which is the package registry:
[{
    "name": "stdlib",
    "versions": [
        {"version": "0.3.4", "url": "https://github.com/fortran-lang/stdlib/archive/0.3.4.tar.gz"},
        {"version": "0.3.5", "url": "https://github.com/fortran-lang/stdlib/archive/0.3.5.tar.gz"}
    ]
}, {
    "name": "bspline",
    "versions": [
        {"version": "6.0.0", "url": "https://github.com/jacobwilliams/bspline-fortran/archive/6.0.0.tar.gz"},
        {"version": "5.4.2", "url": "https://github.com/jacobwilliams/bspline-fortran/archive/5.4.2.tar.gz"}
    ]
}]
  2. We will then have scripts that take this JSON file and download the actual metadata for each package version. For example, to obtain the metadata for the package bspline version 5.4.2, a script would download the tarball https://github.com/jacobwilliams/bspline-fortran/archive/5.4.2.tar.gz, unpack it, and read its fpm.toml, which would contain all the metadata, such as short and long descriptions, the list of dependencies, and other things. Then we can automatically create a website that lists all this metadata. This generated website would contain a generated file metadata.json, which the fpm tool can then download to obtain a searchable database of packages (fpm search).

  3. To add a new package to the registry, one simple entry must be added to the above JSON file by hand, say by opening a PR against the repository.

  4. Later we can automate things more, similarly to how conda-forge works (https://conda-forge.org/docs/maintainer/adding_pkgs.html): to add a new package there, a PR is sent against https://github.com/conda-forge/staged-recipes/, where the CI checks initial quality and that the package builds, and then, if the PR gets merged, the CI actually creates a new repository for the package, etc. In our case, we could have a staging repository, and if a PR is merged, the CI would correctly update the above JSON file.
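A minimal Python sketch of the step that downloads each tarball and reads its fpm.toml. The function names (`find_manifest`, `download`, `collect_manifests`) and the injectable `fetch` argument are hypothetical; only the registry layout and the fpm.toml file name come from the proposal. The manifest is returned as unparsed text; turning it into metadata fields is left to a TOML parser.

```python
import io
import tarfile
import urllib.request


def find_manifest(tar_bytes, manifest_name="fpm.toml"):
    """Return the text of the top-level manifest inside a .tar.gz, or None."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            parts = member.name.split("/")
            # GitHub archives wrap everything in one top-level directory,
            # so the manifest is expected at depth 1 or 2.
            if member.isfile() and parts[-1] == manifest_name and len(parts) <= 2:
                return tar.extractfile(member).read().decode("utf-8")
    return None


def download(url):
    """Default fetcher: download a URL and return its bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def collect_manifests(registry, fetch=download):
    """Map (name, version) to manifest text for every registry entry."""
    manifests = {}
    for package in registry:
        for entry in package["versions"]:
            text = find_manifest(fetch(entry["url"]))
            if text is not None:  # skip versions that ship no fpm.toml
                manifests[(package["name"], entry["version"])] = text
    return manifests
```

Passing `fetch` as an argument keeps the network access replaceable, so the extraction logic can be exercised against tarballs built in memory.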

We can discuss whether the JSON file should also contain all the metadata from fpm.toml directly. The advantage of the above approach is that it is not redundant: the JSON only contains the minimal amount of information that can be edited and maintained by hand, and if you want more, you download the tarball and read its fpm.toml, which will be done automatically in step 2.

Overall, this minimal package registry only contains a minimal JSON file. The actual tarballs and metadata are hosted elsewhere. After this is well implemented and works, we can evolve it into a full package registry (#35).

@certik
Member Author

certik commented Jan 31, 2020

The thing to discuss here, which we didn't have to worry about in #33, is what to do if multiple people want to have a bspline package. Should we require prefixing it with the GitHub organization/user name, as in jacobwilliams/bspline? Or what do we do if somebody submits (registers) the bspline name, but later there is a better, more widely used and popular bspline package? If we simply switch the url for bspline from the old package to the new package, then all kinds of packages that already depend on the old bspline package would break. One approach could be that, since most Fortran packages will be in this ecosystem, we would know which packages depend on it, so we could correctly update them all (and rename bspline to bspline_legacy). For example, Debian had to rename the git package, as it wasn't the usual git, but some older package that just happened to be called git before the version control system came along. In our case I can see this happening for every popular package name such as "mesh", "utils", "spline", ... So we should have some policy for handling such things. One such policy could be that we evaluate the usage and the number of GitHub stars, and allow the most used package to have the more popular name.
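Whatever naming policy is chosen, one check seems useful in any case: a name should be registered only once. A minimal Python sketch, assuming the registry layout from the first comment; the function name `duplicate_names` is hypothetical, and a CI job could run something like it on each PR.

```python
from collections import Counter


def duplicate_names(registry):
    """Return the sorted package names that appear more than once."""
    counts = Counter(package["name"] for package in registry)
    return sorted(name for name, count in counts.items() if count > 1)
```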

@milancurcic
Member

Great, thanks! Step 2 in the proposed flow assumes that the package will include its fpm.toml. However, this is only possible if the package has an active maintainer who is willing to maintain the package-specific fpm.toml. Would this not preclude fpm from downloading tarballed packages from the wild, like SOFA for example?

If the community maintains all metadata needed to download and build the package in the registry, it would broaden the ecosystem of packages fpm could work with.

@certik
Member Author

certik commented Jan 31, 2020

@milancurcic your last comment seems to be a somewhat orthogonal issue, so I created #36 to discuss just this aspect of how fpm is designed.

@milancurcic
Member

Now that there's a package that can be built with fpm, let's revisit this issue which is a requirement for installing a package from a remote location such as a GitHub repo.

I think @certik's idea of a minimal registry is a good start. We'll also need a description field, so that fpm list shows not just the names of packages but also their (one-line) descriptions, just like other package managers.

Another doubt I have is whether this should be a separate repository, rather than part of this repository. In my opinion, keeping it in this repo is simpler because:

  • One doesn't need to maintain a separate repo for the registry
  • Less confusing for newcomers--there's only one repo (this one), whether you want to contribute to the code, or submit a package to the registry
  • Issues+PR system allows clean separation between fpm issues and package submissions to the registry. So we don't need to separate them in another way.

fpm would need to update its registry cache in either approach.

@certik are there benefits of having a registry in a separate repo, or is it more an esthetic thing?


Separate vs. same repo question aside, what would this look like from the UI perspective? For simplicity, let's forget about search for now. Let's say we just want to be able to list available packages. This could be something like:

fpm list
  datetime-1.7.0 -- Date and time manipulation
  openblas-0.3.9 -- Optimized BLAS library based on GotoBLAS2
  stdlib-0.1.0 -- Fortran standard library

Under the hood, fpm:

  1. Fetches the registry
  2. Parses it
  3. Lists individual packages and latest version so you get the above
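
The parse-and-list part of these steps can be sketched in Python as follows. Fetching is left out, so `list_packages` starts from an already parsed registry. The per-package `description` key is the field proposed in this comment; its placement, the function names, and the assumption that versions are purely numeric and dot-separated are all illustrative.

```python
def version_key(version):
    """Sort key for dotted numeric versions such as "0.3.9"."""
    return tuple(int(part) for part in version.split("."))


def list_packages(registry):
    """Return one "name-version -- description" line per package."""
    lines = []
    for package in sorted(registry, key=lambda p: p["name"]):
        # Compare versions numerically so that 1.10.0 ranks above 1.7.0.
        latest = max(package["versions"], key=lambda v: version_key(v["version"]))
        description = package.get("description", "")
        lines.append(f"{package['name']}-{latest['version']} -- {description}")
    return lines
```

Comparing versions as integer tuples rather than strings matters once a component reaches two digits; a version such as "latest" or "1.0-rc1" would need separate handling.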

We can discuss how to list individual available versions at a later time. Let's try to solve the minimal problem first.

@certik
Member Author

certik commented May 10, 2020 via email

@everythingfunctional
Member

I think we should do what Cargo does with crates.io and have a separate repository for packages. Packages are stored there as tarballs and you can interact with it via a simple REST API. This gives us several advantages.

  • We can put certain checks in place for packages published to the official repo
  • Anyone else can stand up their own repo, and just conform to the same API (i.e. private repos). We can even open-source the code for it
  • Anyone else can write whatever tools they like to interact with it

Until we can get such a service stood up, we should endeavor to keep a list in the fpm README of known packages

@certik
Member Author

certik commented May 11, 2020

@everythingfunctional what you are proposing is #35. I think we all agree on that one. We also agree that is a lot of work, and so right now we are discussing what to do until we get there.

So far the proposals are:

I think we should definitely try the manual metadata registry, not just a README, as it would allow us to almost get the full experience of #35.


Progressing the discussion further, I proposed above what such a JSON file (if we use JSON) could look like. Milan suggested it also needs a description field. I don't think that's a good idea, for the following reasons:

  • The description is another thing to handle manually
  • It can change between versions, so it would have to be attached to each version
  • It duplicates the upstream package's toml file, which is another thing to keep in sync.

I think the reason Milan proposed it is to make it possible for fpm to print packages with more information about them. I agree fpm should be able to do that, but not in the way proposed above.

The same goes for putting this metadata in this repository.

Rather, we should plan out how we get to fixing #35. And then in this issue we should do work that is aligned with it.

So here I am proposing a draft of such a plan:

  • The issue Full package registry #35 is mainly about hosting tarballs. But everything else about the registry can be done as part of this issue
  • Have a separate repository called fpm_registry
  • The fpm_registry will have a JSON file (with the format above)
  • To submit a package (version) to the registry, people send a PR to fpm_registry that updates the JSON file. Just as you do not fill out a separate description field when submitting to pypi or crates, you should not do so here either; it gets filled in automatically from fpm.toml (see below)
  • Then we have a separate repository plus CI pipeline that automatically takes this JSON file and:
    • downloads each package (it can cache old info, so only needs to download new packages), extracts full metadata (description, dependencies, etc.) and uses that information to:
    • create a nice website with a page for each package that looks like crates.io (has a description, links to dependencies, and any other useful metadata extracted from fpm.toml)
    • create a "registry JSON", which has full metadata for each package, including description
  • fpm gets updated to be able to download this "registry JSON" from this auto generated website, and use this "registry JSON" to print info about packages, what packages depend on, etc.
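
The caching idea in this plan (only download versions that were not processed before) can be sketched in Python as follows. The nested name -> version -> metadata layout of the "registry JSON", the function name `update_full_registry`, and the injected `extract` callable are assumptions made for illustration; the plan itself does not fix a format.

```python
def update_full_registry(minimal, cached, extract):
    """Build the full registry, calling `extract` only for uncached versions.

    minimal: list of {"name": ..., "versions": [{"version": ..., "url": ...}]}
    cached:  dict name -> version -> metadata from a previous run
    extract: callable(url) -> metadata dict (downloads and reads fpm.toml)
    """
    full = {}
    for package in minimal:
        name = package["name"]
        versions = full.setdefault(name, {})
        for entry in package["versions"]:
            version = entry["version"]
            known = cached.get(name, {}).get(version)
            # Reuse earlier results; only new versions trigger a download.
            versions[version] = known if known is not None else extract(entry["url"])
    return full
```

Entries that were removed from the minimal JSON file are dropped from the result, since the output is rebuilt from the minimal file on every run.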

Then later on, to take this to implement the full #35, the only thing missing really is just hosting of tarballs. Everything else I think can be reused.

The above plan also allows other people (companies) to host their own registry


The above plan can be started by simply:

  • create an fpm_registry with the minimal JSON
  • add a CI that takes this JSON and creates "registry JSON" and hosts it online
  • update fpm to download and work with this "registry JSON"

These are three simple steps that I can even help implement; I've done something similar for LFortran. This can then be naturally expanded to also create a nice website.

@everythingfunctional
Member

I like that plan. It's usable to the point that even if we don't end up moving to a tarball hosting registry, I don't think anybody would even mind.

@milancurcic
Member

I like this plan as well. So actually this minimal registry is not what's read by fpm, but is read by another program that outputs the "production" registry with complete information. This is a good idea because then we don't have to assume ahead of time what metadata we'll need.

I realize now that this issue is step 2 of the 3-step plan in #33. I will write there for now.

@certik
Member Author

certik commented Jul 20, 2020

Now that #33 is (mostly) done, let's tackle this issue.

@milancurcic, @everythingfunctional, let's keep the (centralized) registry in a separate repository. How should it be named? Some ideas:

https://github.com/fortran-lang/package-registry
https://github.com/fortran-lang/fpm-package-registry
https://github.com/fortran-lang/fpm-registry

I don't really have a preference. This repository will have a JSON or rather a TOML file where people will submit their packages using a GitHub PR. This file will only contain the name of the package, the version and url (everything else is redundant, so should not be there). There can also be the "latest"/"development" version that would simply download the latest git (and thus things like description can change in this latest version, so that should not be part of this TOML file, but rather only in the upstream repository inside fpm.toml, and we process it automatically).

We'll then build CI jobs to process this JSON/TOML file to:

  • create a rich metadata JSON file that collects things like descriptions, license, website, logo url (later on), etc. by downloading the package (ensuring it actually downloads...) and reading the fpm.toml inside it.
  • add a section to our fortran-lang.org website (ccing @LKedward) that would probably use the json metadata from the previous point
  • implement fpm search that would use the json metadata from the first bullet point to implement search (so that you can search through the description of the package, not just the name).
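
A minimal Python sketch of such a search over the rich metadata, matching on descriptions as well as names. It assumes the metadata is a mapping name -> version -> fields with a `description` field; that layout and the function name `search` are hypothetical, not an agreed format.

```python
def search(full_registry, query):
    """Return sorted names of packages whose name or any description matches."""
    needle = query.lower()
    hits = set()
    for name, versions in full_registry.items():
        if needle in name.lower():
            hits.add(name)
            continue
        # A description can differ between versions, so check each one.
        for info in versions.values():
            if needle in info.get("description", "").lower():
                hits.add(name)
                break
    return sorted(hits)
```

This is a plain case-insensitive substring match; ranking or fuzzy matching would be a later refinement.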

@milancurcic
Member

Looks good. I like fpm-registry.

I'm unclear about versions. If this file includes the version number, which version is it? The latest? Perhaps all versions that are fpm-enabled? Or should the version metadata be the responsibility of the package itself?

@certik
Member Author

certik commented Jul 20, 2020

Re version: all versions that are fpm enabled. This is hard to figure out automatically, as old versions are typically just git tags, and maybe not all of them are valid or working. So I figured each version has to be explicitly specified, and one of the versions can be "latest git commit".

@everythingfunctional
Member

I like fpm-registry.

I worry a bit about having to specify every version. But as this is more of a stop-gap measure, I guess it's ok.

@certik
Member Author

certik commented Jul 21, 2020

Ok, I created https://github.com/fortran-lang/fpm-registry and gave access to everybody with push access. We can start submitting PRs against that repository to get it up.

@certik
Member Author

certik commented Jul 21, 2020

We can continue the discussion at fortran-lang/fpm-registry#1 and other issues there.

@LKedward
Member

Closing as this has been implemented at https://github.com/fortran-lang/fpm-registry/, further discussion can continue there.


4 participants