The project consists of two parts:
Our tasks include developing and optimizing predictive models based on various criteria, such as accuracy, interpretability, lift for the top 30% of cases, and cost of misclassification. We also explore clustering analysis to identify potential patterns among the students.
- Task A1: Develop an accurate predictive model.
- Task A2: Generate an explanatory predictive model, preferably a Decision Tree with 4-6 rules.
- Task A3: Develop a predictive model focused on obtaining the highest lift for the top 30% of cases.
- Task A4: Construct a model taking into account the different costs associated with misclassification.
- Task A5: Perform clustering analysis to identify three natural groups of students based on the interval and ordinal input variables.
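
As a rough illustration of the Task A3 criterion, the sketch below computes lift for the top 30% of cases: the response rate among the highest-scored 30% divided by the overall response rate. The function name `lift_at_top`, the 0/1 target encoding, and the toy data are assumptions made for this example; the notebooks may compute lift differently.

```python
def lift_at_top(y_true, scores, fraction=0.30):
    """Lift = response rate in the top-scored `fraction` of cases / overall response rate.

    y_true: list of 0/1 outcomes (assumed encoding, 1 = event of interest).
    scores: list of predicted scores, higher meaning more likely to be 1.
    """
    if len(y_true) != len(scores) or not y_true:
        raise ValueError("y_true and scores must be non-empty and the same length")

    # Number of cases in the top slice (at least one case).
    n_top = max(1, round(len(y_true) * fraction))

    # Rank cases by predicted score, highest first.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    top_rate = sum(y for _, y in ranked[:n_top]) / n_top
    base_rate = sum(y_true) / len(y_true)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positive cases")
    return top_rate / base_rate


# Hypothetical toy data: 10 cases, 3 positives, all ranked at the top.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(lift_at_top(y_true, scores))
```

A lift of 1.0 means the model ranks no better than random selection; larger values mean the top 30% is richer in positive cases than the population as a whole.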
In this part, we work with the DMABASE.CSV dataset, using LOGSALAR as the target variable. Our tasks involve generating predictive models, performing clustering segmentation, and constructing decision trees.
- Task B1: Generate the 'best' predictive model using Average Square Error as the model assessment measure.
- Task B2: Perform clustering segmentation and provide an explanation for the clusters.
- Task B3: Partition the continuous variable LOGSALAR into three intervals and generate the best decision tree.
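
Two of the tasks above rest on simple computations that a short sketch may make concrete: Average Square Error (Task B1) and splitting a continuous target into three intervals (Task B3). The version below assumes equal-frequency (tertile) cut points, which is only one possible binning choice; the function names, labels, and sample values are hypothetical and not taken from the notebooks.

```python
import statistics


def average_squared_error(actual, predicted):
    """Mean of squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def tertile_labels(values):
    """Label each value 'low', 'mid', or 'high' using equal-frequency cut points.

    Assumption: three bins with roughly equal counts. Equal-width bins
    would be an equally valid reading of "three intervals".
    """
    low_cut, high_cut = statistics.quantiles(values, n=3)
    labels = []
    for v in values:
        if v <= low_cut:
            labels.append("low")
        elif v <= high_cut:
            labels.append("mid")
        else:
            labels.append("high")
    return labels


# Hypothetical values standing in for a continuous target such as LOGSALAR.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(tertile_labels(sample))
print(average_squared_error([1, 2, 3], [1, 2, 5]))
```

The resulting labels can serve as a three-class target for a decision tree, while Average Square Error applies to the original continuous target.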
Please follow the Jupyter notebooks in order for each task. Each notebook is self-contained and includes the plan, execution, and results for the task.
Note: If this document and a Jupyter notebook disagree, the notebook takes precedence.
All team members actively participated in developing and executing the project, contributing to different aspects such as data understanding, data preparation, model building, evaluation, and interpretation.
We appreciate any feedback or suggestions to improve our work.