
KNN_Classifier

KNN is a supervised algorithm that predicts a data point's label based on the k nearest training examples in the dataset. The algorithm decides which class a point most likely belongs to from the training data most similar to the test data. In this project, we implemented the KNN algorithm from scratch; the main focus was to show how KNN classifies data.

Table of Contents

  • Dataset Description & Functions
  • Main Functions
  • Hyperparameter Tuning (K)
  • Train and Evaluation

Dataset Description & Functions

  • dataset
    The dataset consists of 400 points, each with an x coordinate, a y coordinate, and a label that separates the classes. Every class in the dataset is shown in a different color.
    Our data is shown in the diagram below:

(Figure: scatter plot of the dataset, one color per class.)
  • Load & split train and test data
    With pandas, you can read a CSV file and load the dataset as a DataFrame. Training and evaluating a KNN model requires both train and test data: we predict labels for the test set and then score our code against them. To split the dataset, we used scikit-learn's split function.
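The loading and splitting step can be sketched as below. The column names (`x`, `y`, `label`), the CSV filename, and the split ratio are assumptions for illustration, not taken from the repository; a small inline DataFrame stands in for the real 400-point CSV so the sketch runs on its own.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for the project's real data; in the repo this would be
# something like: df = pd.read_csv("dataset.csv")
df = pd.DataFrame({
    "x":     [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    "y":     [0.1, 1.1, 1.9, 3.2, 3.9, 5.1, 6.2, 6.8],
    "label": [0,   0,   0,   0,   1,   1,   1,   1],
})

X = df[["x", "y"]].values   # point coordinates
y = df["label"].values      # class labels

# Hold out 25% of the points as the test set (ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
```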

Main Functions

  • distance_calculator()
    Each data point is a vector, and the KNN algorithm searches for the training vectors most similar to the test vector. The most straightforward way to find the neighbors is the Euclidean distance function.
  • K_nearest_neighbour_classifier()
    Given the distances between points, the k nearest neighbors are the k training points closest to the test point. This function finds the most frequent label among those neighbors and assigns it to the test data point.
  • accuracy_calculator()
    Because KNN is a supervised algorithm, we know the true test labels. Accuracy is the number of correctly classified points divided by the total number of test points.
  • data_plot()
    The plotter shows the train and test data with their class labels in a 2D coordinate diagram. You can pass it either the supervised (true) labels or the predicted ones.

Hyperparameter Tuning (K)

Setting KNN's hyperparameter k is commonly done with an elbow chart. For each k from 3 to 20, you can check how the loss changes and then pick the best value for your scenario.
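The elbow search can be sketched as the loop below. The synthetic two-cluster data and the inline `predict` helper are illustrative stand-ins for the project's dataset and classifier; only the k range (3 to 20) comes from the text above.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
# Two synthetic clusters standing in for the project's 400-point dataset.
X_train = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(5, 1, (40, 2))])
y_train = np.array([0] * 40 + [1] * 40)
X_test = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(5, 1, (10, 2))])
y_test = np.array([0] * 10 + [1] * 10)

def predict(point, k):
    """Majority-vote KNN prediction for a single point (illustrative helper)."""
    d = np.linalg.norm(X_train - point, axis=1)
    nearest = y_train[np.argsort(d)[:k]]
    return Counter(nearest).most_common(1)[0][0]

errors = {}
for k in range(3, 21):
    preds = np.array([predict(p, k) for p in X_test])
    errors[k] = np.mean(preds != y_test)  # misclassification rate (the "loss")

best_k = min(errors, key=errors.get)      # the k with the least loss
```

Plotting `errors` against k produces the elbow chart; you then read off the k with the lowest loss.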

(Figure: elbow chart of the loss for each k from 3 to 20.)

For our dataset, K=13 has the least loss.

Train and Evaluation

For training, we have a function that combines the previously mentioned functions. It takes K as input to set the number of neighbors and trains our model. By the end of training, this function has predicted a label for each test data point. To measure how many predictions are correct, we call the accuracy calculator. Our KNN predicts over 88 percent of the labels correctly.
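An end-to-end run of that combining step might look like the sketch below. The function name `train_and_evaluate` and the tiny two-cluster example are assumptions for illustration; the repository's actual function may be organized differently.

```python
import numpy as np
from collections import Counter

def train_and_evaluate(X_train, y_train, X_test, y_test, k):
    """Predict every test point with KNN and return the accuracy."""
    preds = []
    for point in X_test:
        d = np.linalg.norm(X_train - point, axis=1)  # distances to all training points
        nearest = y_train[np.argsort(d)[:k]]         # labels of the k nearest
        preds.append(Counter(nearest).most_common(1)[0][0])
    return np.mean(np.array(preds) == y_test)

# Tiny synthetic example with two well-separated clusters:
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.5, 0.5], [5.5, 5.5]])
y_test = np.array([0, 1])

acc = train_and_evaluate(X_train, y_train, X_test, y_test, k=3)
```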