
K Nearest Neighbors with Python | ML


The K-Nearest Neighbors (KNN) algorithm is a simple, easy-to-implement supervised machine learning algorithm that can be used to solve both classification and regression problems.

The KNN algorithm assumes that similar things exist in close proximity; in other words, similar things are near each other. KNN captures this idea of similarity (sometimes called distance, proximity, or closeness) with math many of us learned as children: calculating the distance between points on a graph. There are other ways to calculate distance, and one of them may be preferable depending on the problem we are solving, but straight-line (Euclidean) distance is a popular and familiar choice.
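A minimal NumPy sketch of that calculation (the two sample points are invented purely for illustration):

import numpy as np

# Two example points in 2D (values chosen only for illustration)
a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

# Euclidean (straight-line) distance: square root of the sum of squared differences
print(np.sqrt(np.sum((a - b) ** 2)))  # 5.0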

KNN is widely used in real-world scenarios because it is nonparametric, meaning it makes no underlying assumptions about the distribution of the data (unlike other algorithms, such as GMM, which assume a Gaussian distribution of the data).

This article illustrates K-Nearest Neighbors on a sample of random data using the sklearn library.

Libraries used: NumPy, Pandas, sklearn

We are given a random dataset with numeric features and a target class column. We will use KNN to build a model that predicts the class of a new data point from its features.

Importing libraries:

Let’s first take a look at our data and its features.

Get data:

Set index_col=0 to use the first column as the index.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

df = pd.read_csv("Data", index_col=0)
df.head()

Output:
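The "Data" file used above is not included with this article. If you want to follow along without it, a comparable random classification dataset can be generated with scikit-learn; the sample count, feature count, and column names below are assumptions, not the original data:

from sklearn.datasets import make_classification
import pandas as pd

# Hypothetical stand-in for the original "Data" file:
# 1000 rows, 10 numeric features, binary target
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
df = pd.DataFrame(X, columns=[f"FEAT_{i}" for i in range(10)])
df["TARGET CLASS"] = y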

Standardize the variables:
Because the KNN classifier predicts the class of a given test observation by identifying the observations nearest to it, the scale of the variables matters. Any variable on a large scale will have a much larger effect on the distance between observations, and therefore on the KNN classifier, than a variable on a small scale.
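As a quick illustration of why scale matters (the numbers are invented, not taken from the dataset), a feature measured in the thousands completely swamps a feature measured in single digits:

import numpy as np

# Two observations: (salary, years_of_experience) -- invented values
p = np.array([50000.0, 2.0])
q = np.array([51000.0, 9.0])

# The raw Euclidean distance is driven almost entirely by the salary column;
# the 7-year experience gap barely registers until the features are standardized
print(np.sqrt(np.sum((p - q) ** 2)))  # ~1000.02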

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(df.drop('TARGET CLASS', axis=1))

scaled_features = scaler.transform(df.drop('TARGET CLASS', axis=1))
df_feat = pd.DataFrame(scaled_features, columns=df.columns[:-1])
df_feat.head()

Output:

Split the data into training and test sets, and use the KNN model from the sklearn library:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    scaled_features, df['TARGET CLASS'], test_size=0.30)

# Remember, we are trying to build a model that predicts
# whether someone will be in the TARGET CLASS or not.
# Let's start with k = 1.
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
pred = knn.predict(X_test)

# Predictions and evaluations
# Let's evaluate our KNN model!
from sklearn.metrics import classification_report, confusion_matrix

print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))

Output:

[[133  16]
 [ 15 136]]

              precision    recall  f1-score   support

           0       0.90      0.89      0.90       149
           1       0.89      0.90      0.90       151

    accuracy                           0.90       300
   macro avg       0.90      0.90      0.90       300
weighted avg       0.90      0.90      0.90       300
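For reference, the reported accuracy follows directly from the confusion matrix: correct predictions sit on the diagonal, so accuracy is the diagonal sum divided by the number of test samples.

import numpy as np

cm = np.array([[133, 16],
               [15, 136]])
print(np.trace(cm) / cm.sum())  # (133 + 136) / 300 ≈ 0.90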

Selecting a K value:

Let’s go ahead and use the elbow method to pick a good K value.

error_rate = []

# This may take a little while
for i in range(1, 40):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train, y_train)
    pred_i = knn.predict(X_test)
    error_rate.append(np.mean(pred_i != y_test))

plt.figure(figsize=(10, 6))
plt.plot(range(1, 40), error_rate, color='blue',
         linestyle='dashed', marker='o',
         markerfacecolor='red', markersize=10)

plt.title('Error Rate vs. K Value')
plt.xlabel('K')
plt.ylabel('Error Rate')

Output:

Here we can see that after about K = 15 the error rate just hovers between 0.07 and 0.08. Let’s retrain the model with this value and check the classification report.

# FIRST, A QUICK COMPARISON WITH OUR ORIGINAL K = 1
knn = KNeighborsClassifier(n_neighbors=1)

knn.fit(X_train, y_train)
pred = knn.predict(X_test)

print('WITH K = 1')
print('')
print(confusion_matrix(y_test, pred))
print('')
print(classification_report(y_test, pred))


# NOW WITH K = 15
knn = KNeighborsClassifier(n_neighbors=15)

knn.fit(X_train, y_train)
pred = knn.predict(X_test)

print('WITH K = 15')
print('')
print(confusion_matrix(y_test, pred))
print('')
print(classification_report(y_test, pred))

Output:

WITH K = 1

[[133  16]
 [ 15 136]]

              precision    recall  f1-score   support

           0       0.90      0.89      0.90       149
           1       0.89      0.90      0.90       151

    accuracy                           0.90       300
   macro avg       0.90      0.90      0.90       300
weighted avg       0.90      0.90      0.90       300

WITH K = 15

[[133  16]
 [  6 145]]

              precision    recall  f1-score   support

           0       0.96      0.89      0.92       149
           1       0.90      0.96      0.93       151

    accuracy                           0.93       300
   macro avg       0.93      0.93      0.93       300
weighted avg       0.93      0.93      0.93       300

Great! We were able to improve the performance of our model by tuning it to a better K value.
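As an alternative to the manual elbow loop above, scikit-learn's GridSearchCV can search over n_neighbors with cross-validation rather than a single train/test split; a minimal sketch, where the search range and fold count are assumptions:

from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

param_grid = {'n_neighbors': list(range(1, 40))}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_)  # the K chosen by cross-validation
print(grid.best_score_)   # mean cross-validated accuracy for that K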
