In this project we classify wines into quality classes based on their physicochemical properties.
***
The dataset is highly imbalanced, so the main goal is to compare the performance of a Random Forest classifier with and without data augmentation (SMOTE).
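As a rough illustration of what SMOTE does, the sketch below creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. It is a simplified, standard-library-only sketch of the idea, not the imbalanced-learn implementation used in the notebooks; the function name `smote_sketch`, its parameters, and the sample values are all hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Simplified illustration of the SMOTE idea; not the imbalanced-learn code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up two-feature minority class (hypothetical values, not from the wine dataset).
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote_sketch(minority, n_new=6)
print(len(new_points))
```

Because every synthetic point lies on a segment between two existing minority samples, the new samples stay inside the region already covered by the minority class instead of duplicating rows exactly.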
To get a local copy up and running, follow these steps.
You need a running installation of Anaconda. If you haven't installed Anaconda yet, you can follow this tutorial: Anaconda Installation
- Clone the repo
git clone https://github.com/loremendez/wine_quality.git
- Install the environment
You can do this either by loading the YML file:
conda env create -f conda_environment.yml
or step by step:
- Create and activate the environment
conda create -n wine_env python=3.9
conda activate wine_env
- Install the needed packages
pip install --upgrade pip
pip list # show packages installed within the virtual environment
pip install numpy pandas matplotlib seaborn scikit-learn imbalanced-learn
pip install jupyterlab
- Run the notebooks
Open JupyterLab and open the notebook Random_Forest.ipynb to see the classifier's performance on the original dataset, or the notebook Random_Forest_SMOTE.ipynb to see its performance on the augmented dataset.
jupyter-lab
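If a notebook fails with an import error, a quick check along these lines can show which of the packages from the installation step are missing from the active environment. This is a minimal sketch using only the standard library; `REQUIRED` and `missing_packages` are illustrative names and are not part of the repository.

```python
from importlib.util import find_spec

# Import names for the packages installed above. Note that scikit-learn is
# imported as "sklearn" and imbalanced-learn as "imblearn".
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn", "imblearn", "jupyterlab"]


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

Run it with the `wine_env` environment activated so that it inspects the same interpreter the notebooks use.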
[1] Dataset by Cortez, Paulo (@LSIND), “Wine Quality Data Set”. Last updated: 2020-04-27. UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/wine+quality Kaggle: https://www.kaggle.com/datasets/yasserh/wine-quality-dataset
Lorena Mendez - LinkedIn - lorena.mendez@tum.de
Take a look at my other projects!