K Nearest Neighbors with Python | ML

The K-Nearest Neighbors (KNN) algorithm is a simple and easy to implement supervised machine learning algorithm that can be used to solve both classification and regression problems.

The KNN algorithm assumes that similar things exist in close proximity. In other words, similar things are near each other. KNN captures the idea of similarity (sometimes called distance, proximity, or closeness) with some math we might have learned as children: calculating the distance between points on a graph. There are other ways to calculate distance, and one of them may be preferable depending on the problem we are solving. However, straight-line distance (also called Euclidean distance) is a popular and familiar choice.

KNN is widely applicable in real-world scenarios because it is non-parametric, meaning it does not make any underlying assumptions about the distribution of the data (unlike other algorithms, such as GMM, which assume a Gaussian distribution of the data).
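To make the distance idea concrete, here is a minimal sketch that computes the Euclidean distance between two points and, for comparison, the Manhattan distance, which is one of the other ways to measure distance. The points `a` and `b` are made-up values chosen only for illustration.

```python
import numpy as np

# Two illustrative points on a 2-D graph (values invented for this example).
a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

# Euclidean distance: square root of the sum of squared coordinate differences.
euclidean = np.sqrt(np.sum((a - b) ** 2))

# Manhattan distance: sum of absolute coordinate differences.
manhattan = np.sum(np.abs(a - b))

print(euclidean)  # 5.0
print(manhattan)  # 7.0
```

The two measures can rank neighbors differently, which is why the choice of distance can matter for a given problem.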

This article illustrates K-nearest neighbors on a sample of random data using the sklearn library.

We are given a random dataset in which one column holds the target class. We will try to use KNN to create a model that directly predicts the class of a new data point based on its features.
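Before turning to sklearn, the prediction step itself can be sketched in a few lines. This is a minimal from-scratch illustration, not the implementation sklearn uses: the function name `knn_predict` and the tiny dataset `X_demo` / `y_demo` are hypothetical and exist only for this example.

```python
from collections import Counter

import numpy as np


def knn_predict(X, y, x_new, k=3):
    """Predict the class of x_new by majority vote among its k nearest points in X."""
    # Euclidean distance from x_new to every row of X.
    distances = np.sqrt(np.sum((X - x_new) ** 2, axis=1))
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Most common label among those k neighbors.
    votes = Counter(y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Tiny made-up dataset: two well-separated clusters labeled 0 and 1.
X_demo = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                   [6.0, 6.0], [6.5, 7.0], [7.0, 6.0]])
y_demo = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_demo, y_demo, np.array([1.2, 1.4]), k=3))  # 0
print(knn_predict(X_demo, y_demo, np.array([6.4, 6.2]), k=3))  # 1
```

There is no training step beyond storing the data: all the work happens at prediction time, when distances to the stored points are computed.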

Importing libraries:

Let's first take a look at our data and its features.

Get data:

Set index_col=0 to use the first column as the index.

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    import numpy as np

    # Load the dataset, using the first column as the index
    df = pd.read_csv("Data", index_col=0)
    df.head()

Standardize the variables:
Since the KNN classifier predicts the class of a given test observation by identifying the observations nearest to it, the scale of the variables matters. Any variables that are on a large scale will have a much larger effect on the distance between observations, and therefore on the KNN classifier, than variables that are on a small scale.
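A small sketch of this effect, using invented numbers rather than the article's dataset: the second column of the hypothetical array `X_demo` is on a much larger scale than the first. Standardization is done by hand here (subtract the column mean, divide by the column standard deviation), which is the same transformation StandardScaler applies.

```python
import numpy as np

# Hypothetical data: column 0 is small-scale, column 1 is large-scale.
X_demo = np.array([[25.0, 50_000.0],   # row A
                   [26.0, 53_000.0],   # row B: close to A in column 0
                   [65.0, 50_500.0],   # row C: close to A in column 1
                   [45.0, 56_000.0]])  # row D


def dist(p, q):
    """Euclidean distance between two rows."""
    return np.sqrt(np.sum((p - q) ** 2))


# Raw distances from row A: the large-scale column dominates.
raw_ab = dist(X_demo[0], X_demo[1])
raw_ac = dist(X_demo[0], X_demo[2])

# Standardize each column to zero mean and unit variance.
X_scaled = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

scaled_ab = dist(X_scaled[0], X_scaled[1])
scaled_ac = dist(X_scaled[0], X_scaled[2])

print(raw_ab > raw_ac)        # True: before scaling, C is A's nearer neighbor
print(scaled_ab < scaled_ac)  # True: after scaling, B is A's nearer neighbor
```

The nearest neighbor of row A changes once both columns contribute on a comparable scale. The article's own code, which follows, applies the same standardization with StandardScaler.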

    from sklearn.preprocessing import StandardScaler

    scaler = StandardScaler()

    # Fit on the feature columns only (everything except the target)
    scaler.fit(df.drop('TARGET CLASS', axis=1))
    scaled_features = scaler.transform(df.drop('TARGET CLASS', axis=1))

    # Rebuild a DataFrame so the scaled features keep their column names
    df_feat = pd.DataFrame(scaled_features, columns=df.columns[:-1])
    df_feat.head()

Split the data into training and test sets and use the KNN model from the sklearn library:

    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        scaled_features, df['TARGET CLASS'], test_size=0.30)

    # Remember that we are trying to come up with a model
    # that predicts whether a data point belongs to
    # the TARGET CLASS or not.
    # Let's start with k = 1.
    from sklearn.neighbors import KNeighborsClassifier

    knn = KNeighborsClassifier(n_neighbors=1)
    knn.fit(X_train, y_train)
    pred = knn.predict(X_test)

    # Predictions and evaluation
    # Let's evaluate our KNN model!
    from sklearn.metrics import classification_report, confusion_matrix

    print(confusion_matrix(y_test, pred))
    print(classification_report(y_test, pred))

Output:

    [[133  16]
     [ 15 136]]
                  precision    recall  f1-score   support

               0       0.90      0.89      0.90       149
               1       0.89      0.90      0.90       151

        accuracy                           0.90       300
       macro avg       0.90      0.90      0.90       300
    weighted avg       0.90      0.90      0.90       300
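The numbers in the report follow directly from the confusion matrix. As a worked check, the sketch below recomputes the class 1 metrics and the overall accuracy from the four counts printed above. It relies on sklearn's convention that rows of the confusion matrix are true classes and columns are predicted classes; the variable names are chosen for this example.

```python
# Confusion matrix from the output above: rows = true class, columns = predicted class.
cm = [[133, 16],
      [15, 136]]

tn, fp = cm[0]  # true class 0: predicted 0, predicted 1
fn, tp = cm[1]  # true class 1: predicted 0, predicted 1

precision_1 = tp / (tp + fp)  # of everything predicted as 1, the share that is truly 1
recall_1 = tp / (tp + fn)     # of everything truly 1, the share that was found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(round(precision_1, 2))  # 0.89
print(round(recall_1, 2))     # 0.9
print(round(f1_1, 2))         # 0.9
print(round(accuracy, 2))     # 0.9
```

These match the class 1 row and the accuracy row of the report (which prints two decimal places, so 0.9 appears as 0.90).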

Choosing a K value:

Let's go ahead and use the elbow method to pick a good value for K.

    error_rate = []

    # Will take some time
    for i in range(1, 40):
        knn = KNeighborsClassifier(n_neighbors=i)
        knn.fit(X_train, y_train)
        pred_i = knn.predict(X_test)
        # Fraction of test points that were misclassified for this K
        error_rate.append(np.mean(pred_i != y_test))

    plt.figure(figsize=(10, 6))
    plt.plot(range(1, 40), error_rate, color='blue',
             linestyle='dashed', marker='o',
             markerfacecolor='red', markersize=10)

    plt.title('Error Rate vs. K Value')
    plt.xlabel('K')
    plt.ylabel('Error Rate')

Output: [Figure: "Error Rate vs. K Value", error rate plotted against K for K = 1 to 39]

Here we can see that after about K > 15 the error rate just hovers between 0.07 and 0.08. Let's retrain the model with K = 15 and check the classification report.

    # FIRST, A QUICK COMPARISON WITH OUR ORIGINAL K = 1
    knn = KNeighborsClassifier(n_neighbors=1)
    knn.fit(X_train, y_train)
    pred = knn.predict(X_test)

    print('WITH K = 1')
    print('')
    print(confusion_matrix(y_test, pred))
    print('')
    print(classification_report(y_test, pred))

    # NOW WITH K = 15
    knn = KNeighborsClassifier(n_neighbors=15)
    knn.fit(X_train, y_train)
    pred = knn.predict(X_test)

    print('WITH K = 15')
    print('')
    print(confusion_matrix(y_test, pred))
    print('')
    print(classification_report(y_test, pred))

Output:

    WITH K = 1

    [[133  16]
     [ 15 136]]

                  precision    recall  f1-score   support

               0       0.90      0.89      0.90       149
               1       0.89      0.90      0.90       151

        accuracy                           0.90       300
       macro avg       0.90      0.90      0.90       300
    weighted avg       0.90      0.90      0.90       300

    WITH K = 15

    [[133  16]
     [  6 145]]

                  precision    recall  f1-score   support

               0       0.96      0.89      0.92       149
               1       0.90      0.96      0.93       151

        accuracy                           0.93       300
       macro avg       0.93      0.93      0.93       300
    weighted avg       0.93      0.93      0.93       300

Great! We were able to improve the performance of our model by tuning K to a better value.
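The search above scores every candidate K on the test set, so the final test error is no longer a fully independent estimate. One common alternative, which is not what this article does, is to choose K by cross-validation on the training set only. The sketch below shows that approach under stated assumptions: the article's "Data" file is replaced by a synthetic dataset from `make_classification`, and the sample count, feature count, and `random_state` values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the article's dataset (an assumption of this sketch).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0)

k_values = list(range(1, 40))
cv_error = []
for k in k_values:
    # Scaling sits inside the pipeline so it is re-fit on each training fold.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X_train, y_train, cv=5)  # accuracy per fold
    cv_error.append(1 - scores.mean())

# K with the lowest mean cross-validated error.
best_k = k_values[int(np.argmin(cv_error))]
print(best_k, round(min(cv_error), 3))
```

With this setup the test set is touched only once, after K has been fixed, to report the final performance.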
