The project is based on the Kaggle Heart Disease UCI dataset. The final model is a Random Forest Classifier, which achieved an accuracy of 88.52% on a test set created by randomly holding out 20% of the main dataset.


Heart-Disease-UCI

A heart disease detection model based on the Kaggle Heart Disease UCI dataset.

Parameters

There are 14 columns in total, described as follows:

  • age
  • sex
  • chest pain type (4 values)
  • resting blood pressure
  • serum cholesterol in mg/dl
  • fasting blood sugar > 120 mg/dl
  • resting electrocardiographic results (values 0,1,2)
  • maximum heart rate achieved
  • exercise induced angina
  • oldpeak = ST depression induced by exercise relative to rest
  • the slope of the peak exercise ST segment
  • number of major vessels (0-3) colored by fluoroscopy
  • thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
  • target, i.e. whether the patient has heart disease or not [0 for a patient with heart disease & 1 for no heart disease]
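For reference, a minimal sketch of the feature layout. The short column names below are the ones commonly used in the Kaggle CSV (an assumption; verify them against your copy of the file):

```python
# Column names as commonly used in the Kaggle Heart Disease UCI CSV
# (an assumption -- names may differ slightly between dataset versions).
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target",
]

FEATURES = COLUMNS[:-1]   # 13 predictor columns
TARGET = COLUMNS[-1]      # binary label

print(len(FEATURES), TARGET)  # 13 target
```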

Debmalya's Work

I have tried four different machine learning models:

  • DecisionTreeClassifier
  • RandomForestClassifier
  • LogisticRegression
  • XGBClassifier

Among them, RandomForestClassifier and LogisticRegression showed good cross-validation accuracy with lower standard deviations than the other two. After further experimentation, I chose RandomForestClassifier as my final model. The hyper-parameters used in the final model are:

from sklearn.ensemble import RandomForestClassifier

final_model = RandomForestClassifier(n_estimators=60, random_state=9,
                                     criterion='gini', max_features='sqrt',
                                     max_samples=9)
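The model above can be trained and scored end-to-end roughly like this. This is a sketch on synthetic stand-in data (the Kaggle CSV is not bundled here), using the same 80/20 split described in this README:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 13-feature heart-disease data (an assumption;
# the real workflow loads the Kaggle CSV instead).
X, y = make_classification(n_samples=303, n_features=13, random_state=9)

# 80/20 train/test split, mirroring the split described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=9)

final_model = RandomForestClassifier(n_estimators=60, random_state=9,
                                     criterion='gini', max_features='sqrt',
                                     max_samples=9)
final_model.fit(X_train, y_train)
score = final_model.score(X_test, y_test)
print(f"test accuracy: {score:.4f}")
```

Note that `max_samples=9` draws only 9 bootstrap samples per tree, which keeps each tree very small; the accuracy on this synthetic data will not match the 88.52% reported for the real dataset.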

Architecture Of Final Tree

The tree generated by the final model looks like:

Final Tree Generated By Random Forest Classifier

Output

  • The Classification Report is as follows:
                    Classification Report :
                                  precision    recall  f1-score   support

                               0       0.89      0.86      0.88        29
                               1       0.88      0.91      0.89        32

                        accuracy                           0.89        61
                       macro avg       0.89      0.88      0.88        61
                    weighted avg       0.89      0.89      0.89        61
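The headline 88.52% can be recovered from the report itself: with recalls of 0.86 and 0.91 on supports of 29 and 32, roughly 25 + 29 = 54 of the 61 test samples are classified correctly. A back-of-the-envelope check (the report rounds recalls to two decimals):

```python
# Rounded recalls and supports taken from the classification report above.
correct_0 = round(0.86 * 29)   # ~25 correct predictions for class 0
correct_1 = round(0.91 * 32)   # ~29 correct predictions for class 1
accuracy = (correct_0 + correct_1) / (29 + 32)
print(f"{accuracy:.2%}")  # 88.52%
```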

I have mentioned the Training Accuracy, Testing Accuracy, Sensitivity, Specificity and the AUC Score in the .ipynb file!

Result

The model achieved an accuracy of 88.52% on the test set, which was created by randomly holding out 20% of the main dataset.

theroyakash's Work

My model is based on a neural network. Its architecture is the following:

Model: "sequential_UCI"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_77 (Dense)             (None, 16)                224       
_________________________________________________________________
dropout_41 (Dropout)         (None, 16)                0         
_________________________________________________________________
dense_78 (Dense)             (None, 8)                 136       
_________________________________________________________________
dropout_42 (Dropout)         (None, 8)                 0         
_________________________________________________________________
dense_79 (Dense)             (None, 1)                 9         
=================================================================
Total params: 369
Trainable params: 369
Non-trainable params: 0
_________________________________________________________________

Hyper-parameters for the model

  • [16, 8] network with 16 hidden units in the first layer and 8 units in the next layer.
  • Dropout of 20%
  • L2 Kernel Regularizer applied in each hidden layer.
  • Optimizer: Adam() with learning rate 0.001
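As a sanity check, the parameter counts in the model summary above can be reproduced from the layer sizes, assuming 13 input features (the 14 dataset columns minus the target):

```python
def dense_params(n_in, n_out):
    # A fully connected layer has n_in * n_out weights plus n_out biases.
    return n_in * n_out + n_out

n_features = 13  # assumption: 14 dataset columns minus the target
layers = [(n_features, 16), (16, 8), (8, 1)]  # Dropout layers add no parameters
counts = [dense_params(n_in, n_out) for n_in, n_out in layers]
print(counts, sum(counts))  # [224, 136, 9] 369
```

These match the 224, 136, and 9 parameters listed for the three Dense layers, totalling 369.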

Accuracy

  • Sensitivity and specificity haven't been determined yet.
  • Maximum validation accuracy 82.25%
  • Lowest Binary Cross-Entropy Loss 0.4073
  • Training Accuracy 88.02%

Accuracy with several different NNs

