
Jiwar: A calculator for orthographic, phonological and phonographic neighborhood measures. Supports 40+ languages.


AlaaAlzahrani/Jiwar


Jiwar: A neighborhood calculator for 40+ languages

License: GPL v3



Jiwar is an open-source Python tool for calculating orthographic, phonological, and phonographic neighborhood measures across 40+ languages.

Features

  • Supports 40+ languages
  • Calculates orthographic, phonological, and phonographic neighborhood measures
  • Interactive command-line interface that prompts you for the language and measures
  • Works with built-in corpora or a user-supplied custom corpus

Jiwar: install and use

Installing and running Jiwar on Windows via VS Code

Get Started


  • Click HERE to open an interactive Google Colab notebook.
  • The notebook runs online, so you can start using Jiwar without installing anything on your device.

Quick Start

  1. Clone the repository:

    git clone https://github.com/AlaaAlzahrani/Jiwar.git
    cd Jiwar
    
  2. Create and activate a virtual environment:

  • For Windows (PowerShell):

    python -m venv venv
    .\venv\Scripts\Activate.ps1
    
  • For macOS and Linux:

    python3 -m venv venv
    source venv/bin/activate
    
  3. Install dependencies:

    pip install --upgrade pip
    pip install -r requirements.txt
    
  4. Run Jiwar:

    python jiwar.py
    

Usage

  1. Prepare your input file (CSV, XLSX, TXT, or TSV) with a 'word' column.
  2. Run python jiwar.py and follow the prompts.
  3. Select your desired language and measures.
  4. Jiwar will process your input and save the results.
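A minimal sketch of step 1, assuming a plain UTF-8 CSV whose only header is word is enough for Jiwar to read. The file name my_words.csv, the helper write_input_csv, and the sample words are hypothetical; only Python's standard csv module is used.

```python
import csv

# Hypothetical example stimuli; replace with your own target words.
target_words = ["cat", "cart", "bat"]

def write_input_csv(path, words):
    """Write a one-column CSV whose header row is 'word' (hypothetical helper)."""
    # newline="" is what the csv module expects; UTF-8 keeps non-Latin scripts intact.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["word"])
        writer.writerows([w] for w in words)

write_input_csv("my_words.csv", target_words)
```

You would then run python jiwar.py and point it at my_words.csv when prompted. For the other supported formats, the same idea applies: one target word per row under a 'word' column.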

Supported Measures

| Measure | Description |
|---|---|
| N (Neighborhood Size) | Number and list of words that differ from the target word by exactly one letter/phoneme via substitution only |
| Density | Number and list of words that differ from the target word by exactly one letter/phoneme via substitution, addition, or deletion |
| OLD20/PLD20/PGLD20 | Mean Levenshtein distance between the target word and its 20 closest words in the corpus (orthographic, phonological, and phonographic versions) |
| C (Clustering Coefficient) | Extent to which a word's neighbors are also neighbors of each other |
| Neighborhood Frequency | Descriptive statistics (mean, SD) of the frequencies of the neighboring words |
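The difference between N and Density, and how neighborhood frequency builds on them, can be illustrated with a small standalone sketch. This is not Jiwar's own code: the toy lexicon and its frequency counts are made up, the function names are hypothetical, and it assumes neighborhood frequency is reported as the mean and sample SD over whichever neighbor set you pass in.

```python
from statistics import mean, stdev

# Toy lexicon: word -> frequency (made-up counts for illustration only).
LEXICON = {"cat": 120, "bat": 40, "hat": 60, "cart": 15,
           "at": 30, "cast": 10, "dog": 200}

def is_substitution_neighbor(a, b):
    """True if a and b have equal length and differ in exactly one position (N)."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def is_one_edit_neighbor(a, b):
    """True if one substitution, addition, or deletion turns a into b (Density)."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return is_substitution_neighbor(a, b)
    shorter, longer = sorted((a, b), key=len)
    # Deleting one character from the longer word must yield the shorter one.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))

def neighbors(target, lexicon, test):
    """Sorted list of lexicon entries (other than target) that pass the neighbor test."""
    return sorted(w for w in lexicon if w != target and test(target, w))

def neighborhood_frequency(neigh, lexicon):
    """Mean and sample SD of neighbor frequencies; SD is None with fewer than 2 neighbors."""
    freqs = [lexicon[w] for w in neigh]
    if not freqs:
        return None, None
    sd = stdev(freqs) if len(freqs) > 1 else None
    return mean(freqs), sd

n_set = neighbors("cat", LEXICON, is_substitution_neighbor)
density_set = neighbors("cat", LEXICON, is_one_edit_neighbor)
print(len(n_set), n_set)              # 2 ['bat', 'hat']
print(len(density_set), density_set)  # 5 ['at', 'bat', 'cart', 'cast', 'hat']
print(neighborhood_frequency(density_set, LEXICON))
```

The same functions also accept phoneme sequences written as tuples (for example ("k", "æ", "t")), which is one way to picture the phonological version of these measures.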
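OLD20 and its phonological and phonographic counterparts rest on the Levenshtein (edit) distance. A rough sketch of the idea, independent of Jiwar's implementation: compute the distance from the target to every other word in a lexicon, keep the k smallest (k = 20 by default), and average them. The function names levenshtein and ld20 and the tiny lexicon are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance with unit costs for substitution, insertion, and deletion."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def ld20(target, lexicon, k=20):
    """Mean Levenshtein distance from target to its k closest other words."""
    distances = sorted(levenshtein(target, w) for w in lexicon if w != target)
    if not distances:
        raise ValueError("lexicon has no words other than the target")
    closest = distances[:k]  # fewer than k words: average over all of them
    return sum(closest) / len(closest)

lexicon = ["cat", "bat", "hat", "cart", "at", "cast", "dog"]
print(levenshtein("kitten", "sitting"))  # 3
print(ld20("cat", lexicon, k=3))         # 1.0
print(ld20("cat", lexicon))              # only 6 other words, so the mean of all 6
```

Because words tied at the k-th position share the same distance, the mean does not depend on which tied word is kept. A realistic corpus contains far more than 20 words, so the fallback for small lexicons only matters for toy examples like this one.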
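A sketch of the clustering coefficient C, assuming it is computed as the proportion of pairs of the target's neighbors that are themselves neighbors, i.e. 2E / (k(k - 1)) for k neighbors and E linked pairs. The neighbor test here is substitution-only and the function names are hypothetical; Jiwar's exact neighbor definition and its handling of words with fewer than two neighbors may differ.

```python
from itertools import combinations

def is_substitution_neighbor(a, b):
    """True if a and b have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def clustering_coefficient(target, lexicon, is_neighbor=is_substitution_neighbor):
    """Share of neighbor pairs that are also neighbors of each other."""
    neigh = [w for w in lexicon if w != target and is_neighbor(target, w)]
    k = len(neigh)
    if k < 2:
        return None  # assumption: undefined when there are fewer than two neighbors
    linked = sum(1 for a, b in combinations(neigh, 2) if is_neighbor(a, b))
    return linked / (k * (k - 1) / 2)

lexicon = ["cat", "bat", "hat", "cot", "dog"]
print(clustering_coefficient("cat", lexicon))  # 0.3333333333333333
print(clustering_coefficient("dog", lexicon))  # None
```

Here "cat" has three neighbors (bat, hat, cot), which form three pairs; only bat/hat are neighbors of each other, so C = 1/3.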

Supported Languages

  • Jiwar supports 40 languages with a built-in corpus and around 90 language varieties when you supply a custom corpus.
  • For languages without a built-in corpus, you'll need to provide a custom corpus to use Jiwar.

Documentation

For more detailed instructions and examples, see the full documentation:

👉 Jiwar Documentation

License

Jiwar is licensed under the GNU General Public License v3.0.

Copyright 2024 Alaa Alzahrani

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Citation

If you use Jiwar in your research, please cite:

@misc{Alzahrani:2024:jiwar,
    title = "{Jiwar: A database and calculator for word neighborhood measures in 40 Languages}",
    author = {Alaa Alzahrani},
    year = "2024",
    note = "Preprint"
}
