Scripts for statistical analysis of JSKOS data
See https://observablehq.com/@nichtich/jskos-metrics for a demo.
You need to clone the repository or copy its content to a local directory.
The scripts require jq and standard Unix command line tools (bash, sort, uniq, perl...).
To call the main script jskos-metrics from anywhere, add a symlink to it from a directory in your $PATH, e.g.:
cd ~/.local/bin/
ln -s $DIRECTORY_OF_JSKOS_METRICS/jskos-metrics jskos-metrics
Run the script ./jskos-metrics with an item type (concepts or mappings) and a .ndjson file. On success, the statistics are emitted in JSON format.
Metrics of concepts contain the following keys:
- keys: histogram of JSKOS field names
- conceptNumber: total number of concepts
- broaderDistribution: histogram of number of broader terms
- narrowerDistribution: histogram of number of narrower terms
- narrowerDistributionImplicit: histogram of number of narrower terms, as implied by the broader references (see broaderDistribution)
- topConceptOf: number of top concepts
- typeDistribution: histogram of concept type URIs
- levelDistribution: histogram of concepts per hierarchy level
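As an illustration, a few of these concept metrics can be computed with a single jq program. This is a minimal sketch, not the repository's own scripts: the function name concept_metrics is hypothetical, it assumes one JSKOS concept per NDJSON line with broader given as an array of objects carrying a uri, and it assumes that narrowerDistributionImplicit counts, for each concept, how many other concepts name it as broader. Histogram keys become strings ("0", "1", ...) because JSON object keys must be strings.

```bash
# Hypothetical sketch: compute some concept metrics from an NDJSON file
# (one JSKOS concept per line). Requires jq.
concept_metrics() {
  jq -s '
    # histogram(f): count how often each value emitted by f occurs;
    # values are stringified so they can be used as object keys
    def histogram(f): reduce (.[] | f | tostring) as $k ({}; .[$k] += 1);
    # how many concepts reference each URI as broader
    # (assumed meaning of narrowerDistributionImplicit)
    (reduce (.[] | (.broader[]?.uri // empty)) as $u ({}; .[$u] += 1)) as $refs
    | {
        "keys": histogram(keys[]),
        conceptNumber: length,
        broaderDistribution: histogram((.broader // []) | length),
        narrowerDistributionImplicit: histogram($refs[.uri // ""] // 0)
      }' "$1"
}
```

Reading the whole file with jq -s keeps the sketch short, but it holds all concepts in memory at once.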
Metrics of mappings contain the following keys:
- keys: histogram of JSKOS field names
- fromSchemeDistribution: histogram of source scheme URIs
- toSchemeDistribution: histogram of target scheme URIs
- typeDistribution: histogram of mapping type URIs
- creatorNames: histogram of creator names
- creatorNumber: histogram of number of creators per mapping
- createdPerDay: number of mappings created per day
- modifiedPerDay: number of mappings modified per day
- fromNumber: histogram of number of source concepts
- toNumber: histogram of number of target concepts
- fromConceptsCount: number of distinct source concepts
- toConceptsCount: number of distinct target concepts
- mappingURICount: number of mapping URIs
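Mapping metrics can be sketched the same way. The hypothetical function mapping_metrics below covers only three of the keys listed above and is not the repository's implementation; it assumes JSKOS mappings with fromScheme.uri, source concepts in from.memberSet, and an ISO 8601 created timestamp, from which the day is taken as the first ten characters.

```bash
# Hypothetical sketch: compute some mapping metrics from an NDJSON file
# (one JSKOS mapping per line). Requires jq.
mapping_metrics() {
  jq -s '
    # histogram(f): count how often each value emitted by f occurs
    def histogram(f): reduce (.[] | f | tostring) as $k ({}; .[$k] += 1);
    {
      fromSchemeDistribution: histogram(.fromScheme.uri // empty),
      # "2020-01-01T10:00:00Z" -> "2020-01-01"
      createdPerDay: histogram(.created // empty | .[0:10]),
      # distinct URIs of all source concepts
      fromConceptsCount: ([.[] | (.from.memberSet[]?.uri // empty)] | unique | length)
    }' "$1"
}
```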
Each metric is calculated with a script of its own. Each item type has a main script that executes its scripts and emits a JSON file:
- concepts/concept-metrics.sh: concept scheme metrics, main script
- mappings/mapping-metrics.sh: mapping metrics, main script
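The one-script-per-metric design can be pictured with the following sketch of a main script. It assumes a hypothetical calling convention in which each metric script reads NDJSON on standard input and prints exactly one JSON value, and a directory that holds only per-metric scripts; the function run_metrics and this layout are illustrative, not the repository's actual interface.

```bash
# Hypothetical sketch of a main script: run every metric script in a directory
# on the same NDJSON file and merge the results into one JSON object,
# keyed by script name. Requires jq.
run_metrics() {
  local dir=$1 file=$2 result='{}' script name value
  for script in "$dir"/*.sh; do
    [ -e "$script" ] || continue          # no scripts: glob stayed literal
    name=$(basename "$script" .sh)        # e.g. conceptNumber.sh -> conceptNumber
    value=$(bash "$script" < "$file") || return 1
    # add this script's output under its name
    result=$(jq -n --argjson acc "$result" --arg name "$name" \
      --argjson value "$value" '$acc + {($name): $value}') || return 1
  done
  printf '%s\n' "$result"
}
```

Keeping each metric in its own script means a single metric can be added, fixed, or tested without touching the others; the main script only collects their outputs.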
Run make to execute jskos-metrics with the examples as unit tests.
The directory examples contains sample files, which are also used for testing:
./jskos-metrics concepts examples/concepts.ndjson
./jskos-metrics mappings examples/mappings.ndjson
More example results are included in the jskos-data collection of vocabulary data and in their data visualization at https://observablehq.com/@nichtich/jskos-metrics.
KOS metrics are best summarized by Stock (2015).
- Wolfgang G. Stock (2015): Informetric Analyses of Knowledge Organization Systems (KOSs). https://arxiv.org/abs/1505.03671 (published in: C. R. Sugimoto (Ed.): Theories of Informetrics and Scholarly Communication. De Gruyter, 2015)
- Gangemi, Catenacci, Ciaramita, & Lehmann (2005): A theoretical framework for ontology evaluation and validation. (PDF available)
- Owens (2004): Thesaurus evaluation
Consistency checks of KOS have been implemented by
Some more background information can be found in the internal GBV wiki. We also plan to make these drafts public.
These scripts can be used without any restrictions (CC Zero).