Add SkohubProvider #29

Open
nichtich opened this issue Nov 26, 2020 · 24 comments
Labels
feature Additional functionality providers

@nichtich
Member

Skohub is a static site generator for SKOS vocabularies. Its result can be browsed in machine-readable form (see https://blog.lobid.org/2019/09/27/presenting-skohub-vocabs.html) and could be wrapped as a Provider.

@stefandesu
Member

It should be pretty easy to implement.

I haven't figured out how to get a machine-readable form of the list of schemes (on the web at https://skohub.io/hbz/vocabs-edu/heads/master/, maybe @acka47 can help), but all the other data is available as JSON-LD. Examples:

The format is almost JSKOS already, with some differences:

  • Many JSKOS fields are arrays of objects where SkoHub uses a single object.
  • JSKOS uses uri instead of id.
  • SkoHub differentiates between title (for concept schemes) and prefLabel (for concepts); JSKOS only uses prefLabel.
  • SkoHub has a description for concept schemes which doesn't exist in JSKOS. I guess the equivalent is definition?
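
The differences listed above could be bridged with a small conversion step. The following is a minimal sketch, not the cocoda-sdk implementation: the function names are hypothetical, the set of relation fields (`inScheme`, `topConceptOf`, `broader`, `narrower`) is an assumption about what a SkoHub document contains, and mapping `description` to `definition` follows the guess made in the last bullet rather than any confirmed rule.

```python
from typing import Any


def as_list(value: Any) -> list:
    """Wrap a single object in a list; pass lists through; map None to []."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def rename_id(node: dict) -> dict:
    """Return a shallow copy of a node with `id` renamed to `uri`."""
    out = {k: v for k, v in node.items() if k != "id"}
    if "id" in node:
        out["uri"] = node["id"]
    return out


def skohub_to_jskos(doc: dict) -> dict:
    """Convert one SkoHub JSON-LD scheme or concept into a JSKOS-like dict."""
    out: dict = {"uri": doc["id"]}
    # Schemes carry `title`, concepts carry `prefLabel`; JSKOS uses prefLabel for both.
    label = doc.get("prefLabel") or doc.get("title")
    if label:
        out["prefLabel"] = dict(label)
    # Assumption: `description` maps to `definition`, a language map of lists.
    if "description" in doc:
        out["definition"] = {
            lang: as_list(text) for lang, text in doc["description"].items()
        }
    # Assumed relation fields: single objects become one-element arrays.
    for field in ("inScheme", "topConceptOf", "broader", "narrower"):
        if field in doc:
            out[field] = [rename_id(n) for n in as_list(doc[field])]
    return out
```

Fields not named here (for example the list of top concepts on a scheme) are dropped by this sketch and would need their own mapping.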

If I understand correctly, this is not an API, but rather flat files. So we can't use things like broaderTransitive in Skosmos to get all ancestors of a concept, but have to make multiple requests instead.
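
To illustrate what "multiple requests" means for ancestors, here is a rough sketch that walks the `broader` links one document at a time. `fetch` is a hypothetical callable standing in for whatever HTTP client is used, and the assumption that each concept document carries a `broader` object with an `id` is not confirmed by this thread.

```python
from typing import Callable, Optional


def get_ancestors(
    uri: str,
    fetch: Callable[[str], Optional[dict]],
    max_depth: int = 50,
) -> list:
    """Collect ancestor URIs of `uri`, nearest first, by following `broader`.

    `fetch` (hypothetical) returns the parsed JSON-LD document for a concept
    URI, or None if it cannot be loaded. Each step costs one request.
    """
    ancestors = []
    seen = {uri}
    current = fetch(uri)
    while current and len(ancestors) < max_depth:
        broader = current.get("broader")
        # Assumed to be a single object; tolerate a list by taking its first entry.
        if isinstance(broader, list):
            broader = broader[0] if broader else None
        if not broader or broader["id"] in seen:
            break  # reached a top concept, or detected a cycle
        seen.add(broader["id"])
        ancestors.append(broader["id"])
        current = fetch(broader["id"])
    return ancestors
```

A concept three levels deep therefore needs three requests, which is the cost a transitive query on a real API would avoid.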

@acka47

acka47 commented Nov 30, 2020

I am not sure what a Cocoda provider actually is but I assume that this issue is (at least in part) about loading a complete SKOS scheme from SkoHub Vocabs.

SkoHub Vocabs supports both hash and slash URIs and a vocabulary using hash URIs can be directly loaded from one URL, see as an example https://skohub.io/acka47/nwbib-spatial/heads/master/nwbib.de/spatial.json (this is only for testing, the canonical version of the concept scheme resides at https://nwbib.de/spatial).

@stefandesu
Member

@acka47 My question was rather whether it is possible to get the list of all available schemes (for example https://skohub.io/hbz/vocabs-edu/heads/master/) in a machine-readable format / JSON-LD.

@acka47

acka47 commented Nov 30, 2020

@stefandesu This is currently not possible. I am not sure about your goal yet: Do you want to get all vocabularies for a git repo that is connected to SkoHub Vocabs (like https://skohub.io/hbz/vocabs-edu/heads/master/) or would you even like to have a list for the whole SkoHub Vocabs instance (here: https://skohub.io)?

@stefandesu
Member

@acka47 Thanks for your reply! It seems like I hadn't fully grasped SkoHub Vocabs yet. So there are different SkoHub Vocabs instances (like https://skohub.io), each instance has a number of connected repos (like https://skohub.io/hbz/vocabs-edu/heads/master/ or https://skohub.io/acka47/nwbib-spatial/heads/master/), and each of those repos can contain multiple vocabularies. In this case, I think getting all possible vocabularies for all repos in a SkoHub Vocabs instance would be overkill.

I think for the beginning it should be sufficient if a list of vocabularies has to be provided as well when configuring access to SkoHub Vocabs. 👍 We already have a configuration property for this and this is also necessary for Skosmos access, for example.

@acka47

acka47 commented Dec 1, 2020

So there are different SkoHub Vocabs instances (like https://skohub.io), each instance has a number of connected repos (like https://skohub.io/hbz/vocabs-edu/heads/master/ or https://skohub.io/acka47/nwbib-spatial/heads/master/), and each of those repos can contain multiple vocabularies.

Right now, afaik, there are two instances running on a server: skohub.io and https://vocabs.openeduhub.de/ but there also is the option to easily run SkoHub Vocabs with a single connected repo using Docker and GitHub pages (see 3 & 4 in our SWIB20 workshop) so that anybody with a GitHub account can easily set up an instance.

I think getting all possible vocabularies for all repos in a SkoHub Vocabs instance would be overkill.

Right

I think for the beginning it should be sufficient if a list of vocabularies has to be provided as well when configuring access to SkoHub Vocabs. We already have a configuration property for this and this is also necessary for Skosmos access, for example.

Does this mean that we don't need to add anything to SkoHub Vocabs for now? Anyway, I've opened an issue for adding structured data to the vocabulary list for a repo: skohub-io/skohub-vocabs#110

What kind of vocabulary list do you require for configuring access in Cocoda? Can you point me to a spec?

@stefandesu
Member

Does this mean that we don't need to add anything to SkoHub Vocabs for now? Anyway, I've opened an issue for adding structured data to the vocabulary list for a repo: skohub-io/skohub-vocabs#110

Yes, no need to add anything for now, and thanks!

What kind of vocabulary list do you require for configuring access in Cocoda? Can you point me to a spec?

So this repo offers uniform access (= comparable methods and data in JSKOS format) to different sources of data. For example, we have a wrapper for Skosmos so that we can access Skosmos instances. This particular issue is about adding a wrapper for SkoHub Vocabs. As soon as that wrapper is available, configuration could be as easy as a JSON object like this:

{
  "provider": "SkoHubVocabs",
  "api": "https://skohub.io/hbz/vocabs-edu/heads/master/",
  "schemes": [
    { "uri": "https://w3id.org/class/esc/scheme" }
  ]
}

I guess if you're interested, you could look at the docs for ConceptApi (https://gbv.github.io/cocoda-sdk/ConceptApiProvider.html). For this wrapper, I would implement getSchemes, getTop, getConcepts, getNarrower, and getAncestors, for now. For search, since there is no API, we would need to add FlexSearch and use your index for that.

I would also suggest putting this provider/wrapper into a separate module instead of including it with cocoda-sdk.

@nichtich
Member Author

nichtich commented Dec 7, 2020

If the vocabulary is reasonably small, we could also load the full JSON-LD file. Skohub-vocabs seems (I've not found the specific part of its code) to generate a scheme.json file with the full vocabulary, e.g.

https://skohub.io/dini-ag-kim/hcrt/heads/master/w3id.org/kim/hcrt/scheme.json for the vocabulary http://bartoc.org/en/node/20057 (I've added this URL as API in BARTOC).

The JSON-LD is structured with a context document defined here. I'd prefer not to convert to RDF and back to JSON-LD but to reuse the existing JSON. This requires skohub-vocabs not to change the context document in a way that's not backwards compatible.

@acka47

acka47 commented Dec 7, 2020

Skohub-vocabs seems (I've not found the specific part of its code) to generate a scheme.json file with the full vocabulary, e.g.

This is only the description of the scheme with a list of the topConcepts and their labels but there is a lot missing:

To get this information, you'll have to dereference each concept.

Note that this only holds for SKOS vocabularies with Slash URIs. As one would suspect, you will find all the information for all concepts in one file if the vocab uses hash URIs, see e.g. https://skohub.io/acka47/nwbib-spatial/heads/master/nwbib.de/spatial.json (which is a fork for testing purposes of the NWBib spatial classification that uses hash URIs).

@acka47

acka47 commented Dec 7, 2020

this requires skohub-vocabs to not change the context document in a way that's not backwards compatible.

Probably, we will use an external context at some point but its content will be backwards compatible.

@nichtich
Member Author

nichtich commented Dec 7, 2020

Ok, then we have vocabulary information and top concepts (getTop) from the scheme.json file, search via the scheme.index file, individual concepts and their narrower/broader via scheme.json for hash URIs and via individual concept pages in JSON. How do we map concept URIs to concept JSON files?

@acka47

acka47 commented Dec 7, 2020

As you are referring to scheme.json twice and also to scheme.index, I have to make clear: This has nothing to do with SkoHub Vocabs and you will not find a scheme.json for every vocabulary, as it depends on which URI you mint for the ConceptScheme.

Actually, I think it should be considered best practice (although we haven't done it in the past for some vocabs) not to use namespace:scheme but only namespace for the scheme itself as e.g. DCMI LRMI vocabs do. (See also the already mentioned NWBib spatial, scheme: https://skohub.io/acka47/nwbib-spatial/heads/master/nwbib.de/spatial.json, index: https://skohub.io/acka47/nwbib-spatial/heads/master/nwbib.de/spatial.index.)

@nichtich
Member Author

nichtich commented Dec 7, 2020

Well then I'm confused

  • given a concept scheme URI
    • how to get its top concepts in JSON?
    • how to get the index page for search?
  • given a concept URI
    • how to get the concept details in JSON?

@acka47

acka47 commented Dec 7, 2020

Sorry for confusing you. I just wanted to make clear that not every conceptScheme URI has to contain the string "scheme". Generally, you can get the top concepts in JSON, the index and the JSON for each concept easily.

  • given a concept scheme URI
    • how to get its top concepts in JSON?

Dereference the scheme URI with Accept: application/json or application/ld+json, or append .json to it, and look at skos:hasTopConcept.

    • how to get the index page for search?

Append .index to the scheme URI. (This is currently the only option. As you said in skohub-io/skohub-vocabs#114 (comment), we should also add a <link rel="search"> for the index.)

  • given a concept URI
    • how to get the concept details in JSON?

Dereference the concept URI with Accept: application/json or application/ld+json, or append .json to it.
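
The three answers above boil down to simple URL rules, sketched below. The function names are hypothetical, and two assumptions are baked in: for hash URIs the fragment is dropped, because the whole vocabulary is said to live in one file, and URIs ending in a slash are not handled at all.

```python
from urllib.parse import urldefrag
from urllib.request import Request


def json_url(uri: str) -> str:
    """URL of the JSON document for a scheme or concept URI (`.json` variant)."""
    base, _fragment = urldefrag(uri)  # hash URIs: all concepts share one document
    return base + ".json"


def index_url(scheme_uri: str) -> str:
    """URL of the search index for a scheme URI (`.index` variant)."""
    base, _fragment = urldefrag(scheme_uri)
    return base + ".index"


def json_request(uri: str) -> Request:
    """Alternative: request the URI itself and negotiate JSON-LD via Accept."""
    return Request(uri, headers={"Accept": "application/ld+json"})
```

For example, `json_url("https://nwbib.de/spatial#N01")` yields `https://nwbib.de/spatial.json`, and the top concepts would then be read from the `hasTopConcept` entry of the scheme document.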

nichtich added a commit that referenced this issue Dec 7, 2020
@nichtich
Member Author

nichtich commented Dec 7, 2020

Ok, a first draft of Skohub provider without search is at the skohub branch.

@stefandesu
Member

stefandesu commented Jun 21, 2022

Skohub integration is now ready to be tested in Cocoda Dev! I've included the following vocabularies so far:

Note that vocabularies that use hash URIs are not yet supported.

@stefandesu
Member

stefandesu commented Jun 21, 2022

To-Dos:

  • Bug for ISCED 2013 fields of education and training: When German is selected as the preferred vocabulary language in Cocoda, the search will not return any results. Reason for this is that it offers an empty index file for German results. Seems like we not only have to check if a certain index file exists, but also whether it's empty.
  • Currently, it will try multiple index file names depending on the selected language (e.g. first trying .../scheme.index, then .../scheme.de.index, then .../scheme.en.index). However, it will do this for every search. Instead it should remember whether the file exists and not try again.
  • Error handling should be improved.
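
The first two to-dos could be addressed together by probing the candidate index files once and caching the outcome. This is only a sketch under assumptions: `IndexResolver` and `fetch` are hypothetical names, the candidate order follows the example in the second bullet, and what counts as an "empty" index (here: blank or `{}`) is a guess.

```python
from typing import Callable, Dict, List, Optional


class IndexResolver:
    """Find and remember the usable index file per (scheme, language)."""

    def __init__(self, fetch: Callable[[str], Optional[str]]):
        # `fetch` (hypothetical): URL -> response body, or None if missing.
        self.fetch = fetch
        self.cache: Dict[tuple, Optional[str]] = {}

    @staticmethod
    def candidates(scheme_uri: str, language: str) -> List[str]:
        """Index file names to try, in the order given in the to-do list."""
        urls = [
            f"{scheme_uri}.index",
            f"{scheme_uri}.{language}.index",
            f"{scheme_uri}.en.index",
        ]
        return list(dict.fromkeys(urls))  # drop duplicates, keep order

    def resolve(self, scheme_uri: str, language: str) -> Optional[str]:
        key = (scheme_uri, language)
        if key not in self.cache:  # probe only once per scheme and language
            self.cache[key] = None
            for url in self.candidates(scheme_uri, language):
                body = self.fetch(url)
                # Assumption: a blank or "{}" body means an empty index.
                if body is not None and body.strip() not in ("", "{}"):
                    self.cache[key] = url
                    break
        return self.cache[key]
```

With this shape, a German index that exists but is empty falls through to the English one, and repeated searches reuse the cached answer instead of re-requesting each candidate.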

@acka47

acka47 commented Jun 21, 2022

Thanks for moving this forward. I just want to let you know that @awagner-mainz has started working on a module that adds a reconciliation endpoint to a vocab published with SkoHub: https://github.com/mpilhlt/skohub-reconcile

Furthermore, at hbz, we are in the process of hiring a SkoHub developer who will, amongst other things, help move the reconciliation module forward in 2023/24. This basically means that Cocoda will be able to connect to SkoHub Vocabs via an API that is already supported by Cocoda (Reconciliation API), at least when people have configured their vocab to include the endpoint. That will make things a lot easier, I guess.

@stefandesu
Member

As far as I can tell, I fixed all known issues and everything should work as expected now. (For the three listed vocabularies above.) I've also added everything that's needed to include them via BARTOC (tested only locally so far, but I don't see a reason why it shouldn't work with our main instance).

The only thing that is missing is support for vocabularies that use hash URIs. As far as I can see, that's going to be a very different implementation from what we have now. (It's probably a simple implementation, but we might as well create a whole separate provider for it instead of having two totally different code paths in our one provider. 🤔)

@nichtich: Can we postpone hash URI support?

@acka47: Are there many vocabularies that use hash URIs? Do you have other ones apart from https://nwbib.de/spatial?

@nichtich
Member Author

Works great, thanks! I've added Skohub URIs of kdsf-ffk, esc, and isced-2013 to BARTOC so support of the vocabularies will be configured there. As far as I understand, the Skohub "API" base URI is equal to the vocabulary URI, right?

Can we postpone hash URI support?

yes.

Given the issue with hash URIs and the upcoming development of SkoHub, I'd mark support of Skohub in cocoda-sdk as "experimental". The search functionality works but it is more of a hack as it plugs into an internal implementation of Skohub. A stable reconciliation API would be better in the long term.

@stefandesu
Member

As far as I understand, the Skohub "API" base URI is equal to the vocabulary URI, right?

Yes, I will document that somewhere. This is required because we're de-referencing the URIs, so we need to know which of the URIs is the Skohub vocabulary URI.

@acka47

acka47 commented Jun 23, 2022

Are there many vocabularies that use hash URIs? Do you have other ones apart from https://nwbib.de/spatial?

Currently not, only some legacy vocabs. Regarding nwbib-spatial, there is also an always up-to-date Turtle file with the whole vocab at https://nwbib.de/spatial.ttl, in case indexing a SKOS file is a better way to integrate a vocab.

@nichtich nichtich added the feature Additional functionality label Oct 4, 2022
@nichtich nichtich changed the title [Feature] Add SkohubProvider Add SkohubProvider Oct 4, 2022
@acka47

acka47 commented Mar 17, 2023

Picking up on this as I recently got separate requests from two people (@bokahama & @timtomch) who would like to use Cocoda on SKOS Vocabs published with SkoHub. Could they use the "experimental" version for this?

With regard to a stable reconciliation API, @sroertgen is currently working on it, see skohub-io/skohub-reconcile#11, but I think it will be a few weeks before this can be used in production. We will discuss whether we could use one of these as first use cases for the endpoint.

@stefandesu
Member

Picking up on this as I recently got separate requests from two people (@bokahama & @timtomch) who would like to use Cocoda on SKOS Vocabs published with SkoHub. Could they use the "experimental" version for this?

Sure! I mean it is already available in the release version of Cocoda, so there's nothing different about this compared to "stable" providers, only that there might still be bugs or missing features (in particular missing support for hash URIs). Some more notes here: https://gbv.github.io/cocoda-sdk/SkohubProvider.html

@bokahama and @timtomch, please let me know here if you are running into issues. 👍
