This repository contains the work of our data mining project

MSWinds/dm-group1-project

dm-group1-project

Project Structure

The project consists of two parts:

Part A: Classification Problem

In this part, we develop and optimize predictive models against several criteria: accuracy, interpretability, lift for the top 30% of cases, and misclassification cost. We also use clustering analysis to look for natural groups among the students.

Tasks:

  • Task A1: Develop an accurate predictive model.
  • Task A2: Generate an explanatory predictive model, preferably a Decision Tree with 4-6 rules.
  • Task A3: Develop a predictive model focused on obtaining the highest lift for the top 30% of cases.
  • Task A4: Construct a model taking into account the different costs associated with misclassification.
  • Task A5: Perform clustering analysis to identify three natural groups of students based on the interval and ordinal input variables.
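The two less common criteria in the list above, lift for the top 30% of cases (Task A3) and misclassification cost (Task A4), can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used in the notebooks: the function names are hypothetical, labels are assumed to be 0/1 with 1 as the class of interest, and the default costs (1 for a false positive, 5 for a false negative) are placeholder values, since this README does not state the actual cost matrix.

```python
def lift_at_top(y_true, scores, fraction=0.30):
    """Lift = positive rate in the top-scored fraction / overall positive rate.

    y_true: 0/1 labels (assumption: 1 is the class of interest).
    scores: model scores, higher meaning more likely to be class 1.
    """
    if len(y_true) != len(scores) or len(y_true) == 0:
        raise ValueError("y_true and scores must be non-empty and equally long")
    n_top = max(1, int(round(fraction * len(y_true))))
    # Rank cases from highest to lowest score and keep the top slice.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:n_top]) / n_top
    base_rate = sum(y_true) / len(y_true)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positive cases")
    return top_rate / base_rate


def misclassification_cost(y_true, y_pred, cost_fp=1.0, cost_fn=5.0):
    """Total cost of the errors; cost_fp and cost_fn are placeholder values."""
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return false_pos * cost_fp + false_neg * cost_fn
```

A lift above 1 means the model concentrates positive cases in its top-ranked 30% better than random selection would. Comparing candidate models on `misclassification_cost` rather than on accuracy favors the model whose errors are cheapest under the chosen costs.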

Part B: Regression Problem

In this part, we work with the DMABASE.CSV dataset, using LOGSALAR as the target variable. Our tasks involve generating predictive models, performing clustering segmentation, and constructing decision trees.

Tasks:

  • Task B1: Generate the 'best' predictive model using Average Square Error as the model assessment measure.
  • Task B2: Perform clustering segmentation and provide an explanation for the clusters.
  • Task B3: Partition the continuous variable LOGSALAR into three intervals and generate the best decision tree.
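As a rough sketch of two steps named above: Average Square Error (Task B1) is taken here to mean the mean of the squared residuals, and the three-interval partition of LOGSALAR (Task B3) is shown as an equal-frequency split. Both function names are hypothetical, and the equal-frequency rule is an assumption: this README does not say how the interval boundaries were chosen, so the notebooks may use a different rule.

```python
def average_squared_error(y_true, y_pred):
    """Mean of squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and equally long")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def partition_into_three(values):
    """Label each value 'low', 'mid', or 'high' using equal-frequency cut points.

    Assumption: cut points sit at the one-third and two-thirds positions of
    the sorted values. Returns (labels, (low_cut, high_cut)).
    """
    if len(values) < 3:
        raise ValueError("need at least three values to form three intervals")
    ordered = sorted(values)
    n = len(ordered)
    low_cut = ordered[n // 3]
    high_cut = ordered[(2 * n) // 3]
    labels = []
    for v in values:
        if v < low_cut:
            labels.append("low")
        elif v < high_cut:
            labels.append("mid")
        else:
            labels.append("high")
    return labels, (low_cut, high_cut)
```

The resulting labels can then serve as a three-class target for a decision tree in place of the continuous LOGSALAR values.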

Usage

Please follow the Jupyter notebooks in order for each task. Each notebook is self-contained and includes the plan, execution, and results for the task.

Note: If this README and a notebook disagree, the notebook takes precedence.

Contributions

All team members actively participated in developing and executing the project, contributing to different aspects such as data understanding, data preparation, model building, evaluation, and interpretation.

We appreciate any feedback or suggestions to improve our work.
