This project is based on the Kaggle dataset Heart Disease UCI. The final model is a Random Forest Classifier, which achieved an accuracy of 88.52% on a test set formed by randomly selecting 20% of the main dataset.

Heart Disease detection algorithm based upon the Kaggle Dataset on Heart Disease UCI.


The dataset has 14 columns, described as follows:

  • age
  • sex
  • chest pain type (4 values)
  • resting blood pressure
  • serum cholesterol in mg/dl
  • fasting blood sugar > 120 mg/dl
  • resting electrocardiographic results (values 0,1,2)
  • maximum heart rate achieved
  • exercise induced angina
  • oldpeak = ST depression induced by exercise relative to rest
  • the slope of the peak exercise ST segment
  • number of major vessels (0-3) colored by fluoroscopy
  • thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
  • target: whether the patient has heart disease (0 = has heart disease, 1 = no heart disease)

Debmalya's Work

I tried four different machine learning models:

  • DecisionTreeClassifier
  • RandomForestClassifier
  • LogisticRegression
  • XGBClassifier

Among them, RandomForestClassifier and LogisticRegression showed good cross-validation accuracy, with lower standard deviations than the other two. After further research I chose RandomForestClassifier as my final model. The hyper-parameters I used in the final model are:

final_model = RandomForestClassifier(n_estimators=60, random_state=9, 
                                     criterion='gini', max_features='sqrt',
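
The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the data is a synthetic stand-in produced by `make_classification`, the 5-fold setting and the `max_iter` value are assumptions, and XGBClassifier is left out only to avoid an extra dependency. The random forest uses the four hyper-parameters visible in the snippet above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data (assumption): 303 rows, 13 features.
X, y = make_classification(n_samples=303, n_features=13, random_state=0)

candidates = {
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=9),
    "RandomForestClassifier": RandomForestClassifier(
        n_estimators=60, random_state=9, criterion="gini", max_features="sqrt"
    ),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

# Mean and standard deviation of the cross-validated accuracy per model.
cv_summary = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    cv_summary[name] = (scores.mean(), scores.std())
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

A high mean together with a low standard deviation across folds is the criterion the paragraph above applies when preferring RandomForestClassifier and LogisticRegression.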

Architecture of Final Tree

The tree generated by the final model looks like this:

Final Tree Generated By Random Forest Classifier


  • The Classification Report is as follows:
                    Classification Report :
                                  precision    recall  f1-score   support

                               0       0.89      0.86      0.88        29
                               1       0.88      0.91      0.89        32

                        accuracy                           0.89        61
                       macro avg       0.89      0.88      0.88        61
                    weighted avg       0.89      0.89      0.89        61

The training accuracy, testing accuracy, sensitivity, specificity and AUC score are reported in the .ipynb file.
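
As a cross-check, the figures in the classification report can be reproduced by hand. The confusion matrix below is not taken from the notebook; it is inferred as one matrix consistent with the rounded precision, recall and support values in the report. Treating label 0 as the positive (disease) class follows the column description, so the sensitivity and specificity shown here are an assumption and may be defined differently in the notebook.

```python
# Confusion matrix inferred from the report (assumption).
# Rows = true label, columns = predicted label, label order [0, 1].
confusion = [[25, 4],
             [3, 29]]

(c00, c01), (c10, c11) = confusion
total = c00 + c01 + c10 + c11

accuracy = (c00 + c11) / total
recall_0 = c00 / (c00 + c01)
recall_1 = c11 / (c10 + c11)
precision_0 = c00 / (c00 + c10)
precision_1 = c11 / (c01 + c11)

# With label 0 as the positive (disease) class:
sensitivity = recall_0
specificity = recall_1

print(f"accuracy    = {accuracy:.4f}")     # 0.8852
print(f"precision 0 = {precision_0:.2f}")  # 0.89
print(f"recall 0    = {recall_0:.2f}")     # 0.86
print(f"precision 1 = {precision_1:.2f}")  # 0.88
print(f"recall 1    = {recall_1:.2f}")     # 0.91
```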


The model achieved an accuracy of 88.52% on the test set, which is a random 20% sample of the main dataset.
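
A minimal sketch of a random 80/20 split like the one described above, using scikit-learn's `train_test_split`. The row count of 303 is an assumption about the Kaggle file and `random_state=0` is a hypothetical seed; neither is stated in this document. With 303 rows, a 20% test share gives 61 test rows, which matches the total support in the classification report.

```python
from sklearn.model_selection import train_test_split

# Assumption: the dataset has 303 rows; row indices stand in for the real data.
row_ids = list(range(303))

# Hypothetical seed; the seed used in the notebook is not given here.
train_ids, test_ids = train_test_split(row_ids, test_size=0.2, random_state=0)

print(len(train_ids), len(test_ids))  # 242 61
```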

theroyakash's work

My model is based on a neural network. Its architecture is the following:

Model: "sequential_UCI"
Layer (type)                 Output Shape              Param #   
dense_77 (Dense)             (None, 16)                224       
dropout_41 (Dropout)         (None, 16)                0         
dense_78 (Dense)             (None, 8)                 136       
dropout_42 (Dropout)         (None, 8)                 0         
dense_79 (Dense)             (None, 1)                 9         
Total params: 369
Trainable params: 369
Non-trainable params: 0
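
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The sketch below assumes 13 input features (the 14 columns minus the target); the helper name `dense_params` is made up for this illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

n_features = 13           # assumption: 14 columns minus the target column
layer_units = [16, 8, 1]  # the three Dense layers in the summary

counts = []
n_inputs = n_features
for units in layer_units:
    counts.append(dense_params(n_inputs, units))
    n_inputs = units  # the next layer takes this layer's outputs as inputs

print(counts)       # [224, 136, 9]
print(sum(counts))  # 369
```

Dropout layers add no parameters, which is why they show 0 in the summary.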

Hyper-parameters for the model

  • [16, 8] network: 16 hidden units in the first layer and 8 in the second.
  • Dropout of 20%
  • L2 Kernel Regularizer applied in each hidden layer.
  • Optimizer: Adam() with learning rate 0.001
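
A model with the hyper-parameters listed above could be put together in Keras roughly as follows. This is a sketch under stated assumptions, not the original code: the 13-feature input shape, the ReLU and sigmoid activations, and the L2 strength `L2_STRENGTH` are not given in this document and are chosen only for illustration. The binary cross-entropy loss matches the loss reported in the results.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

L2_STRENGTH = 0.01  # hypothetical value; the actual strength is not stated

model = keras.Sequential(
    [
        keras.Input(shape=(13,)),  # assumption: 13 features (target excluded)
        layers.Dense(16, activation="relu",  # activation is an assumption
                     kernel_regularizer=regularizers.l2(L2_STRENGTH)),
        layers.Dropout(0.2),
        layers.Dense(8, activation="relu",
                     kernel_regularizer=regularizers.l2(L2_STRENGTH)),
        layers.Dropout(0.2),
        layers.Dense(1, activation="sigmoid"),  # single probability output
    ],
    name="sequential_UCI",
)

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

print(model.count_params())  # 369
```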


  • Sensitivity and specificity have not been determined yet.
  • Maximum validation accuracy 82.25%
  • Lowest Binary Cross-Entropy Loss 0.4073
  • Training Accuracy 88.02%

Accuracy with several different NN


