ML | Implementing the KNN classifier using Sklearn

K-Nearest Neighbors (KNN) is one of the most basic yet important classification algorithms in machine learning. It is a supervised learning method and is widely applied in pattern recognition, data mining, and intrusion detection. It is well suited to real-world scenarios because it is nonparametric: it makes no underlying assumptions about the distribution of the data (unlike algorithms such as GMM, which assumes the data come from a mixture of Gaussian distributions).
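To make the idea concrete before turning to the library, here is a minimal sketch of the prediction rule: find the k closest training points and take a majority vote. The function name `knn_predict` and the toy data are made up for illustration, and Euclidean distance is assumed; this is not how Sklearn implements it internally.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Predict the label of x_query by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Most common label among those neighbors
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data (made up for illustration): two small, well-separated clusters
X_toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_toy = np.array(["B", "B", "B", "M", "M", "M"])

print(knn_predict(X_toy, y_toy, np.array([1.1, 1.0]), k=3))  # B
print(knn_predict(X_toy, y_toy, np.array([5.1, 5.0]), k=3))  # M
```

Note that no model is fitted here: the training data itself is the model, which is why no distributional assumption is needed.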

This article demonstrates how to implement the K-Nearest Neighbors classifier using the Sklearn (scikit-learn) library in Python.

Step 1: Import required libraries

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
import matplotlib.pyplot as plt
import seaborn as sns

Step 2: Read the dataset

# Change the working directory to the folder that contains data.csv
# (in IPython/Jupyter, for example: cd C:\Users\Dev\Desktop\Kaggle\Breast_Cancer)

df = pd.read_csv('data.csv')

# Separate the dependent variable (y) from the independent variables (X)
y = df['diagnosis']
X = df.drop(columns=['diagnosis', 'Unnamed: 32', 'id'])

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

Step 3: Train the model

K = []
training = []
test = []
scores = {}

# Fit one classifier per value of k and record its accuracy
for k in range(2, 21):
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(X_train, y_train)

    training_score = clf.score(X_train, y_train)
    test_score = clf.score(X_test, y_test)

    K.append(k)
    training.append(training_score)
    test.append(test_score)
    scores[k] = [training_score, test_score]
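If data.csv is not at hand, the steps above can be tried on the breast cancer dataset bundled with Sklearn. This is a sketch under an assumption: the column names in data.csv suggest it is the Wisconsin Diagnostic data, which is what `load_breast_cancer` provides, but the article does not say so explicitly. The bundled version encodes the diagnosis as 0/1 rather than as text labels and has no id or empty trailing column to drop.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the bundled dataset stands in for data.csv
X, y = load_breast_cancer(return_X_y=True)

# Same split settings as in the article
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Same loop as in Step 3, storing [training_score, test_score] per k
scores = {}
for k in range(2, 21):
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(X_train, y_train)
    scores[k] = [clf.score(X_train, y_train), clf.score(X_test, y_test)]

print(len(scores), "values of k evaluated")
```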

Step 4: Model Evaluation

# Print [training_score, test_score] for each value of k
for key, values in scores.items():
    print(key, ':', values)

Now let’s find the optimal value of k, that is, the number of nearest neighbors.

Step 5: Plot the training and test scores

# Training score for each value of k
ax = sns.stripplot(x=K, y=training)
ax.set(xlabel='Values of k', ylabel='Training Score')
plt.show()

# Test score for each value of k
ax = sns.stripplot(x=K, y=test)
ax.set(xlabel='Values of k', ylabel='Test Score')
plt.show()

# Overlay the training (black) and test (green) scores on one scatter plot
plt.scatter(K, training, color='k')
plt.scatter(K, test, color='g')
plt.show()


From the above scatter plot we can conclude that the optimal k value would be around 5.
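The visual reading can be cross-checked by picking k directly from the scores dictionary built in Step 3. The helper `best_k` below is a hypothetical name, and the numbers in `example_scores` are placeholders for illustration only, not results from the dataset.

```python
def best_k(scores):
    """Return the k with the highest test score; ties go to the smallest k."""
    # scores maps k -> [training_score, test_score], as built in Step 3
    return max(sorted(scores), key=lambda k: scores[k][1])

# Placeholder numbers for illustration only, not results from the dataset
example_scores = {2: [0.97, 0.91], 3: [0.95, 0.93],
                  4: [0.94, 0.93], 5: [0.94, 0.92]}

print(best_k(example_scores))  # 3
```

Preferring the smallest k on ties is a design choice made for this sketch; a larger k gives a smoother decision boundary, so the opposite tie-break is equally defensible.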
