BRCA Breast Cancer Prediction Model Using DNN and K-means

1. Introduction

1.1 Project goals

Use BRCA_prognosis data to create a program that can detect gene anomalies and predict breast cancer.

① Split the data into separate training and test sets.

② Using gene data from patients, distinguish between good and risky genes.

③ As a result, the program predicts whether a gene is good or risky.

1.2 Data description

Data structure

2. Model training with sklearn

Results of the sklearn classifiers (supervised learning) without preprocessing

i. KNN (k=[3, 5, 7, 9, 11, 13, 15])

ii. Naive Bayesian Classification

iii. Decision tree using information gain (max_depth=[3, 5, 7, 9, 11, 13, 15])

iv. SVM (kernel = [linear, poly, rbf, sigmoid])

v. DNN (solver=[adam, sgd, lbfgs], activation= [identity, logistic, tanh, relu])

DNN had the highest accuracy, about 0.85, and scored highest on every metric except sensitivity. As a result, DNN (solver = lbfgs, activation = logistic) is the best classifier.

When the SVM kernel is sigmoid, or when the DNN solver is adam or sgd, the data is not classified properly.
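The comparison above can be sketched roughly as follows. This is a minimal sketch, not the original notebook code: the data is a synthetic stand-in from `make_classification`, `GaussianNB` is an assumed choice of Naive Bayes variant, `criterion="entropy"` is assumed to correspond to "information gain", and `max_iter=300` is an arbitrary setting. Only accuracy is computed here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the BRCA_prognosis data (assumption).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

odd_values = [3, 5, 7, 9, 11, 13, 15]
candidates = {}

# i. KNN over the listed k values.
for k in odd_values:
    candidates[f"KNN(k={k})"] = KNeighborsClassifier(n_neighbors=k)

# ii. Naive Bayes (Gaussian variant assumed).
candidates["NaiveBayes"] = GaussianNB()

# iii. Decision tree split on information gain (entropy criterion).
for depth in odd_values:
    candidates[f"Tree(max_depth={depth})"] = DecisionTreeClassifier(
        criterion="entropy", max_depth=depth, random_state=0
    )

# iv. SVM over the listed kernels.
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    candidates[f"SVM(kernel={kernel})"] = SVC(kernel=kernel)

# v. DNN (sklearn MLPClassifier) over solver and activation.
for solver in ["adam", "sgd", "lbfgs"]:
    for activation in ["identity", "logistic", "tanh", "relu"]:
        name = f"DNN(solver={solver}, activation={activation})"
        candidates[name] = MLPClassifier(
            solver=solver, activation=activation, max_iter=300, random_state=0
        )

# Fit each candidate on the training set and score it on the test set.
accuracy = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in candidates.items()
}
best_name = max(accuracy, key=accuracy.get)
print(best_name, round(accuracy[best_name], 3))
```

On real data, sensitivity could be added with `sklearn.metrics.recall_score`; the numbers this sketch prints on synthetic data say nothing about the results reported above.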

3. Method

Preprocessing

① Edit labels array

For use in the DNN model, the Labels array is changed to a two-dimensional array, and the column vectors are replaced by row vectors.

② One-Hot-Encoding

One-hot encoding is used to transform the label values. One-hot encoding, also called one-of-K encoding, converts an integer scalar in the range 0 to K-1 into a K-dimensional vector in which exactly one element is 1 and all others are 0.
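As an illustration of the two label steps, here is a minimal NumPy sketch. The label values are hypothetical, and reshaping to shape `(n, 1)` is only one possible reading of "two-dimensional array"; the original notebook may do this differently.

```python
import numpy as np

# Hypothetical integer labels: 0 = good, 1 = risky (assumption).
labels = np.array([0, 1, 1, 0, 1])

# Step 1: make the label array two-dimensional, one label per row.
labels_2d = labels.reshape(-1, 1)          # shape (5, 1)

# Step 2: one-hot encode. Row i of the identity matrix is the
# K-dimensional vector with a 1 at position i.
num_classes = 2
one_hot = np.eye(num_classes, dtype=int)[labels_2d.ravel()]

print(one_hot)
```

Each row of `one_hot` has K = 2 entries, which matches an output layer with one unit per class.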

③ Normalization

Normalize the data values. Normalization is a transformation that brings all features onto the same scale.
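The text does not say which normalization was used. As one common option, the sketch below applies per-feature min-max scaling to the range [0, 1]; the function name `min_max_normalize` and the sample values are hypothetical.

```python
import numpy as np

def min_max_normalize(x):
    """Scale each column (feature) of x to the range [0, 1]."""
    x = np.asarray(x, dtype=float)
    lo = x.min(axis=0)
    hi = x.max(axis=0)
    # Avoid division by zero for constant columns.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (x - lo) / span

# Two features on very different scales (made-up values).
data = np.array([[1.0, 200.0],
                 [2.0, 400.0],
                 [3.0, 800.0]])
scaled = min_max_normalize(data)
print(scaled)
```

When a separate test set is used, the minimum and maximum would normally be computed on the training set only and reused for the test set.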

Data grouping

Divide the data into two groups with similar characteristics to get better results.

① Principal component analysis (PCA)

The dimension of the data is reduced to two dimensions.

② K-means

Use K-means to divide the data into two groups.

③ Grouping
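The three grouping steps could look roughly like this with sklearn. This is a sketch under assumptions: the data is a synthetic stand-in, K-means is assumed to run on the two PCA components, and each group is assumed to keep its original (unreduced) features for later training.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-in: 100 samples, 20 features, two shifted blobs.
X = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 20)),
    rng.normal(4.0, 1.0, size=(50, 20)),
])
y = rng.integers(0, 2, size=100)  # hypothetical labels

# ① Reduce the data to two dimensions with PCA.
X_2d = PCA(n_components=2).fit_transform(X)

# ② Split the samples into two clusters with K-means.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster = kmeans.fit_predict(X_2d)

# ③ Build one (features, labels) pair per group.
groups = {k: (X[cluster == k], y[cluster == k]) for k in (0, 1)}
for k, (X_k, y_k) in groups.items():
    print(f"group {k}: {X_k.shape[0]} samples")
```

A separate model can then be trained on each entry of `groups`.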

Training

① DNN (Deep Neural Network)

The network has four hidden layers (4096, 1024, 256, and 32 nodes), and the learning rate is set to 0.0001. I used Adam as the optimizer and ReLU as the activation function. The number of training steps was set to 500.
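To make the layer layout concrete, here is a minimal NumPy sketch of a forward pass through hidden layers of the stated sizes with ReLU. It is not the original training code: the input width `n_features`, the He-style initialization, and the softmax output layer are assumptions, and no Adam update step is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features = 64                       # hypothetical input width
hidden_sizes = [4096, 1024, 256, 32]  # hidden layers from the text
n_classes = 2                         # good / risky

def init_layers(sizes, rng):
    """Create (weights, bias) for each consecutive pair of layer sizes."""
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        w = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        b = np.zeros(fan_out)
        layers.append((w, b))
    return layers

def forward(x, layers):
    """ReLU on every hidden layer, softmax on the output layer."""
    h = x
    for w, b in layers[:-1]:
        h = np.maximum(0.0, h @ w + b)
    w, b = layers[-1]
    logits = h @ w + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

layers = init_layers([n_features] + hidden_sizes + [n_classes], rng)
x = rng.normal(size=(8, n_features))  # a batch of 8 made-up samples
probs = forward(x, layers)
print(probs.shape)
```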

② Dropout

Dropout deactivates a random subset of neurons at each training step. This keeps features from becoming tied to specific neurons and balances the weights, which helps prevent overfitting.

Dropout was set to 0.8.
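A minimal NumPy sketch of dropout follows. It assumes that 0.8 is the keep probability (the fraction of neurons kept at each step); the text does not say whether 0.8 is the keep rate or the drop rate. The function name `dropout` and the use of inverted scaling are also illustrative choices.

```python
import numpy as np

def dropout(activations, keep_prob, rng, training=True):
    """Inverted dropout: zero out units at random, rescale the rest."""
    if not training:
        return activations  # all neurons are used at evaluation time
    mask = rng.random(activations.shape) < keep_prob
    # Dividing by keep_prob keeps the expected activation unchanged.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
h = np.ones((4, 1000))  # made-up hidden-layer activations

h_train = dropout(h, keep_prob=0.8, rng=rng)
h_eval = dropout(h, keep_prob=0.8, rng=rng, training=False)

print("fraction kept:", (h_train > 0).mean())
```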

③ Regularization

Regularization penalizes large weight values, which helps prevent overfitting.

Regularization was set to 0.001.
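As a sketch of how such a penalty enters the loss, the example below adds an L2 term scaled by 0.001 to a cross-entropy loss. The choice of L2 (rather than L1) is an assumption, and the weights, predictions, and function names are made up for illustration.

```python
import numpy as np

def l2_penalty(weights, lam):
    """lam times the sum of squared weights over all layers."""
    return lam * sum(np.sum(w ** 2) for w in weights)

def cross_entropy(probs, one_hot):
    """Mean cross-entropy between predicted probabilities and one-hot labels."""
    eps = 1e-12  # guards against log(0)
    return -np.mean(np.sum(one_hot * np.log(probs + eps), axis=1))

# Made-up weights for two small layers.
weights = [
    np.array([[1.0, -2.0], [0.5, 0.0]]),
    np.array([[3.0], [-1.0]]),
]
probs = np.array([[0.9, 0.1], [0.2, 0.8]])   # made-up predictions
labels = np.array([[1, 0], [0, 1]])          # one-hot targets

data_loss = cross_entropy(probs, labels)
total_loss = data_loss + l2_penalty(weights, lam=0.001)
print(data_loss, total_loss)
```

Because the penalty grows with the squared size of each weight, minimizing `total_loss` pushes the weights toward smaller values.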

4. Result

Result

① First Group & Second Group

② Combined result of the first and second groups

Accuracy was 0.88.

Compare with other methods

① No grouping, no regularization, nodes (1024, 256, 32)

② No grouping, no regularization, nodes (4096, 1024, 256, 32)

③ No grouping, with regularization, nodes (4096, 1024, 256, 32)
