GitHub - ontologyportal/semconcor: Semantic Concordancer that uses dependency parsing and the Suggested Upper Merged Ontology (SUMO) to support linguistic analysis at a level of semantic abstraction above the original textual elements

ontologyportal / semconcor Public

Notifications You must be signed in to change notification settings
Fork 0
Star 2

Semantic Concordancer that uses dependency parsing and the Suggested Upper Merged Ontology (SUMO) to support linguistic analysis at a level of semantic abstraction above the original textual elements

www.ontologyportal.org

2 stars 0 forks Branches Tags Activity

Star

Notifications

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
lib		lib
src/com/articulate/semconcor		src/com/articulate/semconcor
web		web
README.txt		README.txt
build.xml		build.xml
script.sql		script.sql
semconcor.iml		semconcor.iml
web.xml		web.xml

Repository files navigation

Introduction
============

Semantic Concordancer

Concordancers are an accepted and valuable part of the tool set of linguists and
lexicographers. They allow the user to see the context of use of a word or
phrase in a corpus. One challenge is that there may be too many results for
short phrases or common words when only a specific context is desired. However,
finding meaningful groupings of usage may be challenging or impractical if it
means enumerating long lists of possible values, such as city names. If a tool
existed that could create some semantic abstractions, it would free the
lexicographer from the need to resort to customized development of analysis
software. To address this need, we have developed a Semantic Concordancer that
uses dependency parsing and the Suggested Upper Merged Ontology (SUMO) to
support linguistic analysis at a level of semantic abstraction above the
original textual elements.
  Users of this work who publish academic papers are asked to cite

Pease, A., and Cheung, A., (2018).  Toward A Semantic Concordancer, to appear.


Linux Installation
==================

Please install https://github.com/ontologyportal/sigmakee and
https://github.com/ontologyportal/sigmanlp first.

cd ~/workspace
git clone https://github.com/ontologyportal/semconcor
cd ~/Programs
wget 'http://www.h2database.com/h2-2017-06-10.zip'
unzip h2-2017-06-10.zip

you'll need to set Indexer.JDBCstring to point to your corpus database and
modify the main() method of Indexer to read your corpus.  Currently,
the system can read from

- the FCE corpus, https://www.ilexir.co.uk/datasets/index.html
- wikipedia - http://www.evanjones.ca/software/wikipedia2text-extracted.txt.bz2 plain text of 10M words
- or our Hong Kong court judgment corpus, which is not generally available.

Note that the system is currently pretty slow, and even indexing the FCE corpus will take days

Assuming you've downloaded the FCE corpus to ~/corpora/fce-released-dataset
First create the empty database

java -cp ~/Programs/h2/bin/h2*.jar org.h2.tools.RunScript -url jdbc:h2:~/corpora/FCE -script ~/workspace/semconcor/script.sql

start the DB engine

java -jar ~/Programs/h2/bin/h2*.jar &

Then run the indexer

java -Xmx2G -cp /home/apease/workspace/semconcor/build/classes:
  /home/apease/workspace/semconcor/build/lib/*:
  /home/apease/workspace/sigmanlp/build/lib/* com.articulate.semconcor.Indexer

Finally, you can run the concordancer on the command line

java -Xmx2G -cp /home/apease/workspace/semconcor/build/classes:
  /home/apease/workspace/semconcor/build/lib/*:
  /home/apease/workspace/sigmanlp/build/lib/* com.articulate.semconcor.Searcher -i

or start it up in tomcat

$CATALINA_HOME/bin/startup.sh

and point your browser at

localhost:8080/semconcor/semconcor.jsp

About

Semantic Concordancer that uses dependency parsing and the Suggested Upper Merged Ontology (SUMO) to support linguistic analysis at a level of semantic abstraction above the original textual elements

www.ontologyportal.org

nlp ontology concordancer

Readme

Activity

Custom properties

2 stars

10 watching

0 forks

Report repository