From 7158470970f2289ac822f6c33a5f2c342fe451f7 Mon Sep 17 00:00:00 2001
From: Daniel Obraczka
Date: Thu, 18 Jul 2024 15:29:06 +0200
Subject: [PATCH 1/2] Add README to run scripts

---
 run_scripts/README.md | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)
 create mode 100644 run_scripts/README.md

diff --git a/run_scripts/README.md b/run_scripts/README.md
new file mode 100644
index 0000000..446ae0d
--- /dev/null
+++ b/run_scripts/README.md
@@ -0,0 +1,38 @@
+# Installation
+
+To reproduce our results, clone the repository and check out the `paper` tag to get the state at which the experiments were done:
+
+```
+git clone https://github.com/dobraczka/klinker.git
+cd klinker
+git checkout paper
+```
+
+Create a virtual environment with micromamba and install the dependencies:
+
+```
+micromamba env create -n klinker-conda --file=klinker-conda.yaml
+micromamba activate klinker-conda
+pip install -e ".[all]"
+```
+
+# Running the experiments
+We originally ran our experiments with SLURM job arrays. We adapted our code so it can be run without SLURM, but kept the array entries: each script takes the entry number as its first argument.
+For each embedding-based method, entries 0-15 use sentence transformer embeddings and entries 16-31 rely on SIF-aggregated fasttext embeddings.
+For entries 24-31, the dimensionality-reduced fasttext embeddings are expected at `~/.data/klinker/word_embeddings/100wiki.en.bin`.
+For methods without embeddings (`non_relational/run_token.sh` and `relational/run_relational_token.sh`), only entries 0-15 exist.
+
+You can reduce the fasttext embeddings to 100 dimensions like this (create `~/.data/klinker/word_embeddings/` first if it does not exist):
+```
+import os
+import fasttext.util
+
+ft = fasttext.load_model('wiki.en.bin')
+fasttext.util.reduce_model(ft, 100)
+ft.save_model(os.path.expanduser("~/.data/klinker/word_embeddings/100wiki.en.bin"))
+```
+
+The experiments can then be run individually by passing the desired entry as the first argument, e.g.:
+```
+bash run_scripts/relational/run_token_attribute.sh 16
+```

From fd0d7f5a5712a624c6e93e8190dbfa536270c36b Mon Sep 17 00:00:00 2001
From: Daniel Obraczka
Date: Thu, 18 Jul 2024 22:53:50 +0200
Subject: [PATCH 2/2] Added zenodo

---
 README.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/README.md b/README.md
index e5d7675..9a6edec 100644
--- a/README.md
+++ b/README.md
@@ -88,5 +88,4 @@ micromamba run -n klinker-conda python experiment.py movie-graph-benchmark-datas
 ```
 This would be similar to the steps described in the above usage section.
 
-In order to precisely reproduce the results from the paper we provide (adapted) run scripts from our SLURM batch scripts in the `run_scripts` folder.
-We recommend to `git checkout paper` to checkout out the tagged commit on which the experiments were run since future development does not aim to be backwards compatible with this state.
+To precisely reproduce the results from the paper, we provide run scripts adapted from our SLURM batch scripts in the `run_scripts` folder. Please consult `run_scripts/README.md` for further information. For archival purposes, the experiment artifacts and the source code are stored on [Zenodo](https://zenodo.org/records/12774407).
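The entry numbering described in `run_scripts/README.md` can be summarised as a small lookup. The Python sketch below only illustrates that description and is not code from the klinker repository; the names `EntryInfo` and `describe_entry` and the label strings are hypothetical. It encodes only what the README states: embedding-based scripts take entries 0-31 (0-15 sentence transformer, 16-31 SIF-aggregated fasttext, with 24-31 expecting the reduced `100wiki.en.bin` model), and scripts without embeddings take only entries 0-15.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntryInfo:
    """Hypothetical summary of one job-array entry, following run_scripts/README.md."""

    embedding: str  # "sentence-transformer", "sif-fasttext", or "none" (illustrative labels)
    needs_reduced_fasttext: bool  # True if ~/.data/klinker/word_embeddings/100wiki.en.bin is expected


def describe_entry(entry: int, uses_embeddings: bool = True) -> EntryInfo:
    """Map an array entry to the embeddings it relies on (hypothetical helper).

    Embedding-based scripts accept entries 0-31; scripts without embeddings
    (non_relational/run_token.sh, relational/run_relational_token.sh) accept 0-15.
    """
    if not uses_embeddings:
        if not 0 <= entry <= 15:
            raise ValueError(f"methods without embeddings only have entries 0-15, got {entry}")
        return EntryInfo(embedding="none", needs_reduced_fasttext=False)
    if not 0 <= entry <= 31:
        raise ValueError(f"embedding-based methods have entries 0-31, got {entry}")
    if entry <= 15:
        return EntryInfo(embedding="sentence-transformer", needs_reduced_fasttext=False)
    # Entries 16-31 use SIF-aggregated fasttext; only 24-31 expect the 100-dim reduced model.
    return EntryInfo(embedding="sif-fasttext", needs_reduced_fasttext=entry >= 24)


if __name__ == "__main__":
    # Entry 16 is the one used in the README's example command.
    print(describe_entry(16))  # EntryInfo(embedding='sif-fasttext', needs_reduced_fasttext=False)
```

A lookup like this could, for instance, be used to check that the reduced model file exists before launching entries 24-31.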