Add readme to run scripts #11

Merged
merged 2 commits on Jul 18, 2024
3 changes: 1 addition & 2 deletions README.md
@@ -88,5 +88,4 @@ micromamba run -n klinker-conda python experiment.py movie-graph-benchmark-datas
```
This would be similar to the steps described in the above usage section.

In order to precisely reproduce the results from the paper we provide (adapted) run scripts from our SLURM batch scripts in the `run_scripts` folder.
We recommend `git checkout paper` to check out the tagged commit on which the experiments were run, since future development does not aim to be backwards compatible with this state.
To precisely reproduce the results from the paper, we provide run scripts (adapted from our SLURM batch scripts) in the `run_scripts` folder. Please consult `run_scripts/README.md` for further information. For archival purposes, the experiment artifacts and the source code are stored in [Zenodo](https://zenodo.org/records/12774407).
38 changes: 38 additions & 0 deletions run_scripts/README.md
@@ -0,0 +1,38 @@
# Installation

To reproduce our results, clone the repository and check out the specific tag to get the state at which the experiments were done:

```
git clone https://github.com/dobraczka/klinker.git
cd klinker
git checkout paper
```

Create a virtual environment with micromamba and install the dependencies:

```
micromamba env create -n klinker-conda --file=klinker-conda.yaml
micromamba activate klinker-conda
pip install -e ".[all]"
```

# Running the experiments
We originally ran our experiments with SLURM, using SLURM job arrays. We adapted our code so it can be run without SLURM, but kept the array-entry numbering.
For each embedding-based method, entries 0-15 use sentence-transformer embeddings and entries 16-31 rely on SIF-aggregated fastText embeddings.
For entries 24-31 it is expected that you have the dimensionality-reduced fastText embeddings in `~/.data/klinker/word_embeddings/100wiki.en.bin`.
For methods without embeddings (`non_relational/run_token.sh` and `relational/run_relational_token.sh`) only entries 0-15 exist.
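The entry-to-embedding mapping described above can be sketched as a small shell helper. This is only an illustration of the numbering scheme (the function name and the "unknown" fallback are hypothetical; the actual mappings live in the individual run scripts):

```
# Map a job-array entry index to its embedding type,
# mirroring the 0-15 / 16-31 split described above.
embedding_for_entry() {
    if [ "$1" -ge 0 ] && [ "$1" -le 15 ]; then
        echo "sentence-transformer"
    elif [ "$1" -ge 16 ] && [ "$1" -le 31 ]; then
        echo "sif-fasttext"
    else
        echo "unknown"
    fi
}
```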

You can reduce the dimensionality of the fastText embeddings like this:
```
import os

import fasttext
import fasttext.util

# Load the full-dimensional English fastText model
ft = fasttext.load_model('wiki.en.bin')
# Reduce the embedding dimensionality to 100 in place
fasttext.util.reduce_model(ft, 100)
# save_model does not expand "~", so expand it explicitly
ft.save_model(os.path.expanduser("~/.data/klinker/word_embeddings/100wiki.en.bin"))
```

The experiments can then be run individually by supplying the desired entry as the first argument, e.g.:
```
bash run_scripts/relational/run_token_attribute.sh 16
```