Add readme to run scripts (#11)
* Add README to run scripts

* Added zenodo
dobraczka authored Jul 18, 2024
1 parent 5e35773 commit 2b1cf8c
Showing 2 changed files with 39 additions and 2 deletions.
3 changes: 1 addition & 2 deletions README.md
@@ -88,5 +88,4 @@ micromamba run -n klinker-conda python experiment.py movie-graph-benchmark-datas
```
This would be similar to the steps described in the above usage section.

In order to precisely reproduce the results from the paper we provide (adapted) run scripts from our SLURM batch scripts in the `run_scripts` folder.
We recommend running `git checkout paper` to check out the tagged commit on which the experiments were run, since future development does not aim to be backwards compatible with this state.
In order to precisely reproduce the results from the paper, we provide (adapted) run scripts from our SLURM batch scripts in the `run_scripts` folder. Please consult `run_scripts/README.md` for further information. For archival purposes, the experiment artifacts and the source code are stored on [Zenodo](https://zenodo.org/records/12774407).
38 changes: 38 additions & 0 deletions run_scripts/README.md
@@ -0,0 +1,38 @@
# Installation

In order to reproduce our results, clone the repository and check out the specific tag to get the state at which the experiments were done:

```
git clone https://github.com/dobraczka/klinker.git
cd klinker
git checkout paper
```
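
Optionally, you can confirm that the working tree is at the tagged state (a quick check with a standard git command):

```
# Should print "paper" when HEAD is at the tagged commit
git describe --tags
```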

Create a virtual environment with micromamba and install the dependencies:

```
micromamba env create -n klinker-conda --file=klinker-conda.yaml
micromamba activate klinker-conda
pip install -e ".[all]"
```
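
As a quick sanity check (a hedged sketch; it assumes the package is importable as `klinker`), you can verify the installation inside the environment:

```
# Assumed import name; adjust if the package exposes a different module
micromamba run -n klinker-conda python -c "import klinker"
```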

# Running the experiments
We originally ran our experiments with SLURM job arrays. We adapted our code so it can be run without SLURM, but kept the array structure.
For each embedding-based method, entries 0-15 use sentence-transformer embeddings and entries 16-31 rely on SIF-aggregated fastText embeddings.
Entries 24-31 expect the dimensionality-reduced fastText embeddings at `~/.data/klinker/word_embeddings/100wiki.en.bin`.
For methods without embeddings (`non_relational/run_token.sh` and `relational/run_relational_token.sh`), only entries 0-15 exist.
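
To illustrate this entry numbering, a hypothetical adapted array script (a minimal sketch, not one of the repository's actual scripts) might map the first CLI argument to a configuration like this:

```
#!/usr/bin/env bash
# Hypothetical adaptation: read the entry index from the first argument
# instead of SLURM's $SLURM_ARRAY_TASK_ID.
ENTRY="${1:?Usage: $0 <entry 0-31>}"

if [ "$ENTRY" -lt 16 ]; then
  EMBEDDING="sentence-transformer"
else
  EMBEDDING="SIF-aggregated fastText"
fi
echo "Entry $ENTRY uses $EMBEDDING embeddings"
```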

You can reduce the dimensionality of the fastText embeddings like this:
```
import os
import fasttext
import fasttext.util

# wiki.en.bin must be downloaded beforehand
ft = fasttext.load_model("wiki.en.bin")
fasttext.util.reduce_model(ft, 100)  # reduce dimensionality from 300 to 100
# save_model does not expand "~", so expand the home directory explicitly
ft.save_model(os.path.expanduser("~/.data/klinker/word_embeddings/100wiki.en.bin"))
```

The experiments can then be run individually by supplying the desired entry as the first argument, e.g.:
```
bash run_scripts/relational/run_token_attribute.sh 16
```
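
To run a full sweep, one could loop over all entries (a minimal sketch, assuming the 0-31 range described above applies to the chosen script):

```
# Hypothetical sweep over all 32 entries of an embedding-based method
for entry in $(seq 0 31); do
  bash run_scripts/relational/run_token_attribute.sh "$entry"
done
```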
