Text Mining Project

Text analysis is becoming a fundamental tool in data science, because parsing text makes it possible to extract machine-readable facts from it.

The goal of this text mining project is to accomplish three main tasks:

First Task - Data Cleaning and Pre-processing of Facebook comments:

Removing punctuation and stop words;
Tokenization of the text;
Bi-gram extraction;
Splitting the corpus into sentences;
Bag of words;
TF-IDF and document-term matrix;
Implementation of the previous steps with pipelines.
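
The pre-processing steps listed above can be sketched in plain Python as follows. This is only an illustrative sketch and not the notebook's actual code: the function names, the tiny stop-word list, and the two sample comments are all assumptions, and the TF-IDF shown is the unsmoothed textbook form (raw count times ln(N / df)), which differs from the smoothed variants some libraries use by default.

```python
import math
import re
import string
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real projects use a fuller list).
STOP_WORDS = {"a", "an", "the", "is", "are", "this", "and", "of", "to", "i", "it"}


def split_sentences(text):
    """Split text into sentences on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase, strip punctuation, split on whitespace, drop stop words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def bigrams(tokens):
    """Return the list of adjacent token pairs."""
    return list(zip(tokens, tokens[1:]))


def bag_of_words(docs_tokens):
    """Build a sorted vocabulary and a document-term count matrix."""
    vocab = sorted({tok for toks in docs_tokens for tok in toks})
    counts = [Counter(toks) for toks in docs_tokens]
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def tf_idf(matrix):
    """Weight raw counts by idf = ln(N / df), with no smoothing."""
    n_docs = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    return [[row[j] * math.log(n_docs / df[j]) for j in range(n_terms)]
            for row in matrix]


def preprocess(comments):
    """Chain the steps: tokenize, count, then weight with TF-IDF."""
    docs_tokens = [tokenize(c) for c in comments]
    vocab, counts = bag_of_words(docs_tokens)
    return vocab, counts, tf_idf(counts)


# Hypothetical sample comments, not taken from the project's dataset.
comments = ["I love this page! Great photos.",
            "This page is great, great content."]
vocab, counts, weights = preprocess(comments)
```

Terms that occur in every document (here "great" and "page") get a weight of zero, which is the intended effect of the inverse document frequency.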

Second Task - Classification, Clustering and Topic Modeling of SMS messages (Spam Detection):

Classification with Logistic Regression;
K-means Clustering;
Topic Model using LDA (Latent Dirichlet Allocation).
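
As an illustration of the classification step only, here is a minimal logistic regression spam classifier over bag-of-words counts, trained with batch gradient descent in plain Python. It is a sketch under stated assumptions, not the notebook's implementation: the four training messages are hypothetical toy data, the helper names are made up for this example, and the learning rate and epoch count are arbitrary. K-means and LDA are not covered here.

```python
import math

# Hypothetical toy messages (1 = spam, 0 = ham); not from the project's SMS dataset.
TRAIN = [
    ("win a free prize now", 1),
    ("free cash offer click now", 1),
    ("are we still meeting for lunch", 0),
    ("see you at home tonight", 0),
]


def build_vocab(texts):
    """Map each distinct whitespace-separated token to a column index."""
    terms = sorted({tok for text in texts for tok in text.split()})
    return {term: idx for idx, term in enumerate(terms)}


def vectorize(text, vocab):
    """Turn a message into a bag-of-words count vector; unknown words are ignored."""
    vec = [0.0] * len(vocab)
    for tok in text.split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(text, vocab, w, b):
    """Return the estimated probability that a message is spam."""
    x = vectorize(text, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


texts = [text for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
vocab = build_vocab(texts)
w, b = train_logreg([vectorize(t, vocab) for t in texts], labels)
```

Calling `predict_proba("free prize offer", vocab, w, b)` gives a value above 0.5 on this toy data, while `predict_proba("lunch at home", vocab, w, b)` gives a value below 0.5. In practice a library implementation with regularization and a held-out test set would be the more usual choice.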

Third Task - Summarization of a text:

Application of the TextRank algorithm to summarize a text from a WW2 textbook.
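
A minimal sketch of extractive summarization with TextRank might look like the following. It is not the notebook's code: the function names and the four-sentence sample text are assumptions, sentence similarity is the word-overlap measure normalized by log sentence lengths (computed here on sets of distinct words, a simplification), and ranking uses a fixed number of PageRank power iterations with the usual damping factor of 0.85.

```python
import math
import re


def split_sentences(text):
    """Split text into sentences on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(sentence):
    """Return the set of lowercase word tokens in a sentence."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def similarity(a, b):
    """Word overlap normalized by the log lengths of both sentences."""
    overlap = len(a & b)
    denom = math.log(len(a)) + math.log(len(b)) if a and b else 0.0
    return overlap / denom if overlap and denom > 0 else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank on the similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    bags = [words(s) for s in sentences]
    weights = [[similarity(bags[i], bags[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(text, k=2):
    """Return the k top-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


# Toy text for illustration only; not taken from the project's textbook.
TEXT = ("World War II began in 1939 when Germany invaded Poland. "
        "Britain and France declared war on Germany after the invasion of Poland. "
        "The war in Europe ended in 1945 with the surrender of Germany. "
        "Cats sleep often.")
summary = summarize(TEXT, k=2)
```

The last sentence shares no words with the others, so it receives only the baseline score `(1 - damping) / n` and is never selected while better-connected sentences are available.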

emanuelemorales/TextMining
