Reads a string as input and expands all abbreviations into English words.
read_abbre_main.py
- reads read_abbre.pkl and uses it to clean up abbreviations
- abbre_then_replace('i h8 yuo fuckin c-u-n-t, yuo shold dickh3adddddd fuckkkkkkk di3 @$$h0l3 - i will k1ll u @TEOTD')
- OUTPUT >>> 'i hate you fucking cunt you should dickhead fuck die asshole i will kill u it the end of the day'
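
The lookup step can be illustrated with a minimal sketch. This is not the code in read_abbre_main.py: the function name `expand_abbreviations` and the small in-memory `ABBREVIATIONS` dictionary are hypothetical stand-ins for whatever mapping read_abbre.pkl actually holds, and the sketch skips the handling of repeated letters and character substitutions.

```python
# Hypothetical stand-in for the mapping stored in read_abbre.pkl.
ABBREVIATIONS = {
    "h8": "hate",
    "yuo": "you",
    "shold": "should",
}


def expand_abbreviations(text, mapping=ABBREVIATIONS):
    """Lowercase the text, strip punctuation around each token, expand known tokens."""
    words = []
    for token in text.lower().split():
        token = token.strip(".,!?@-")  # drop punctuation at both ends of the token
        if not token:
            continue  # the token was punctuation only, e.g. a lone "-"
        words.append(mapping.get(token, token))  # unknown tokens pass through unchanged
    return " ".join(words)


print(expand_abbreviations("i h8 yuo, yuo shold stop"))
# prints: i hate you you should stop
```

A plain dictionary lookup keeps the expansion step easy to inspect. Tokens that are not in the mapping are left as they are.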
big.txt
- English word corpus containing 205k unique words (about 10% of the bad words appear in it)
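
A figure like the share of bad words found in the corpus could be computed along these lines. This is only a sketch: the helper names `load_vocabulary` and `coverage` are hypothetical, the file is assumed to be UTF-8 text, and a "word" is assumed to be a run of the letters a to z.

```python
import re


def load_vocabulary(path):
    """Return the set of unique lowercase alphabetic words in a text file."""
    with open(path, encoding="utf-8") as handle:
        return set(re.findall(r"[a-z]+", handle.read().lower()))


def coverage(words, vocabulary):
    """Return the fraction of `words` found in `vocabulary` (0.0 for an empty list)."""
    words = list(words)
    if not words:
        return 0.0
    return sum(word in vocabulary for word in words) / len(words)
```

For example, `coverage(bad_words, load_vocabulary("big.txt"))` would give the fraction of a bad-word list that the corpus contains, where `bad_words` is any list of strings.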
main.py
- returns the toxicity levels of words
The following files must be in the same directory as main.py so that it can load their functions and data:
- read_abbre_main.py
- sonar_func.py
- correct_repeatedBadWords.pkl
- New_allcalled.pkl # this file is too large to store on GitHub, so it is stored in a Kaggle dataset
(https://www.kaggle.com/chadapamettapun/nlp-hatespeech)
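
Before running main.py, it may help to confirm that the files listed above are present. The helper below is a hypothetical sketch and not part of the repository. It assumes the files sit directly in the directory you pass in.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = [
    "read_abbre_main.py",
    "sonar_func.py",
    "correct_repeatedBadWords.pkl",
    "New_allcalled.pkl",
]


def missing_files(directory, required=REQUIRED_FILES):
    """Return the required file names that are not present in `directory`."""
    base = Path(directory)
    return [name for name in required if not (base / name).is_file()]


# Check the current working directory and report anything that is missing.
missing = missing_files(".")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All required files found.")
```

If New_allcalled.pkl is reported missing, download it from the Kaggle dataset linked above and place it next to main.py.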