 # K Nearest Neighbors with Python | ML

The K-Nearest Neighbors (KNN) algorithm is a simple and easy to implement supervised machine learning algorithm that can be used to solve both classification and regression problems.

The KNN algorithm assumes that similar things exist in close proximity. In other words, similar things are near each other. KNN captures the idea of similarity (sometimes called distance, proximity, or closeness) with some math we might have learned as children: calculating the distance between points on a graph. There are other ways to calculate distance, and one of them may be preferable depending on the problem we are solving. However, straight-line distance (also called Euclidean distance) is a popular and familiar choice.
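As a small illustration of the distance calculation, the sketch below computes the Euclidean distance between two points, alongside the Manhattan distance as one example of an alternative metric. The function names and the sample points are made up for this example.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan_distance(p, q):
    """Sum of the absolute coordinate differences between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical sample points, chosen so the arithmetic is easy to check.
p, q = (0.0, 0.0), (3.0, 4.0)

print(euclidean_distance(p, q))  # 5.0
print(manhattan_distance(p, q))  # 7.0
```

Which metric works better depends on the data, so it is usually worth treating the metric as something to try out rather than a fixed choice.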

KNN is widely applicable in real-world scenarios because it is nonparametric, meaning it makes no underlying assumptions about the distribution of the data (unlike some other algorithms, such as GMM, which assumes the data follow a mixture of Gaussian distributions).
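To make the idea concrete, here is a minimal from-scratch sketch of KNN classification: store the training points, find the `k` closest ones to a query, and take a majority vote. It is an illustration only and not how sklearn implements the algorithm; the function name `knn_predict` and the toy data are assumptions made for this example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbors' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two small clusters labeled 0 and 1.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 6.0)]
labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(points, labels, (1.2, 1.4), k=3))  # 0
print(knn_predict(points, labels, (6.4, 6.8), k=3))  # 1
```

Note that there is no training step that fits parameters: all the work happens at prediction time, by looking up the stored points.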

This article illustrates K-nearest neighbors on a sample of random data using the sklearn library.

We are given a random dataset in which one column holds the target classes. We will use KNN to build a model that predicts the class of a new data point directly from its features.

Importing libraries:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
```

Get data:

Let's first take a look at our data and its features. Set `index_col=0` to use the first column as the index.

```python
df = pd.read_csv("Data", index_col=0)
df.head()
```

Standardize the variables:

Since the KNN classifier predicts the class of a given test observation by identifying the observations nearest to it, the scale of the variables matters. Any variable on a large scale will have a much larger effect on the distance between observations, and therefore on the KNN classifier, than a variable on a small scale.
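The effect of scale can be seen in a small sketch. The features here (income in dollars, age in years) and their values are hypothetical, and `standardize` is a hand-written helper that rescales to zero mean and unit variance using the population standard deviation, which is also what `StandardScaler` uses.

```python
import math
import statistics


def standardize(column):
    """Rescale values to zero mean and unit variance (population std)."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(value - mean) / std for value in column]


# Hypothetical data: income in dollars and age in years.
income = [30_000.0, 32_000.0, 40_000.0, 90_000.0, 100_000.0]
age = [25.0, 60.0, 26.0, 45.0, 50.0]

raw = list(zip(income, age))
scaled = list(zip(standardize(income), standardize(age)))

# Raw data: income dominates, so point 0 looks closer to point 1
# (similar income, very different age) than to point 2.
print(math.dist(raw[0], raw[1]) < math.dist(raw[0], raw[2]))  # True

# Scaled data: both features count, so point 0 is now closer to point 2
# (similar age, moderately different income).
print(math.dist(scaled[0], scaled[2]) < math.dist(scaled[0], scaled[1]))  # True
```

The nearest neighbor of the first point changes once both features are put on the same scale, which is why the variables are standardized before fitting KNN.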

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(df.drop("TARGET CLASS", axis=1))
scaled_features = scaler.transform(df.drop("TARGET CLASS", axis=1))

# Rebuild a DataFrame of the scaled features (all columns except the last)
df_feat = pd.DataFrame(scaled_features, columns=df.columns[:-1])
df_feat.head()
```

Split the data into training and test sets and use the KNN model from the sklearn library:

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    scaled_features, df["TARGET CLASS"], test_size=0.30
)

# Remember that we are trying to build a model that predicts
# whether a data point belongs to TARGET CLASS or not.
# Let's start with k = 1.
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
pred = knn.predict(X_test)

# Predictions and evaluation
# Let's evaluate our KNN model!
from sklearn.metrics import classification_report, confusion_matrix

print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))
```

Output:

```
[[133  16]
 [ 15 136]]
              precision    recall  f1-score   support

           0       0.90      0.89      0.90       149
           1       0.89      0.90      0.90       151

    accuracy                           0.90       300
   macro avg       0.90      0.90      0.90       300
weighted avg       0.90      0.90      0.90       300
```
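To see how the report follows from the confusion matrix, the sketch below recomputes the class 1 metrics and the overall accuracy by hand from the matrix printed above. It assumes sklearn's convention that rows are true classes and columns are predicted classes; the variable names are chosen for this example.

```python
# Confusion matrix from the K = 1 output: rows = true class, columns = predicted class.
matrix = [[133, 16],
          [15, 136]]

tn, fp = matrix[0]  # true class 0: predicted 0, predicted 1
fn, tp = matrix[1]  # true class 1: predicted 0, predicted 1

precision_1 = tp / (tp + fp)  # of everything predicted as 1, how much was really 1
recall_1 = tp / (tp + fn)     # of everything that was really 1, how much was found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)
accuracy = (tn + tp) / (tn + fp + fn + tp)

print(round(precision_1, 2))  # 0.89
print(round(recall_1, 2))     # 0.9
print(round(f1_1, 2))         # 0.9
print(round(accuracy, 2))     # 0.9
```

These values match the row for class 1 and the accuracy line in the report.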

Choosing a K value:

Let's go ahead and use the elbow method to pick a good value of K.

```python
error_rate = []

# Will take some time
for i in range(1, 40):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train, y_train)
    pred_i = knn.predict(X_test)
    # Fraction of test points that were misclassified for this K
    error_rate.append(np.mean(pred_i != y_test))

plt.figure(figsize=(10, 6))
plt.plot(range(1, 40), error_rate, color="blue", linestyle="dashed",
         marker="o", markerfacecolor="red", markersize=10)
plt.title("Error Rate vs. K Value")
plt.xlabel("K")
plt.ylabel("Error Rate")
```

Output: a plot of the error rate against K.

Here we can see that for roughly K > 15 the error rate just hovers between 0.07 and 0.08. Let's retrain the model with K = 15 and check the classification report.

```python
# FIRST, A QUICK COMPARISON WITH OUR ORIGINAL K = 1
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
pred = knn.predict(X_test)

print("WITH K = 1")
print("")
print(confusion_matrix(y_test, pred))
print("")
print(classification_report(y_test, pred))

# NOW WITH K = 15
knn = KNeighborsClassifier(n_neighbors=15)
knn.fit(X_train, y_train)
pred = knn.predict(X_test)

print("WITH K = 15")
print("")
print(confusion_matrix(y_test, pred))
print("")
print(classification_report(y_test, pred))
```

Output:

```
WITH K = 1

[[133  16]
 [ 15 136]]

              precision    recall  f1-score   support

           0       0.90      0.89      0.90       149
           1       0.89      0.90      0.90       151

    accuracy                           0.90       300
   macro avg       0.90      0.90      0.90       300
weighted avg       0.90      0.90      0.90       300

WITH K = 15

[[133  16]
 [  6 145]]

              precision    recall  f1-score   support

           0       0.96      0.89      0.92       149
           1       0.90      0.96      0.93       151

    accuracy                           0.93       300
   macro avg       0.93      0.93      0.93       300
weighted avg       0.93      0.93      0.93       300
```

Great! We were able to improve the performance of our model by tuning K to a better value.
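The value of K above was read off the plot by eye. As a rough sketch, a similar choice can be approximated in code by taking the smallest K whose error rate is within a small tolerance of the best one. This rule, the helper `pick_k`, the `tolerance` parameter, and the error rates below are all assumptions made for illustration; they are not the article's results.

```python
def pick_k(k_values, error_rates, tolerance=0.005):
    """Return the smallest K whose error rate is within `tolerance` of the minimum."""
    best = min(error_rates)
    for k, err in zip(k_values, error_rates):
        if err <= best + tolerance:
            return k


# Hypothetical error rates for a handful of K values (not taken from the run above).
k_values = [1, 5, 10, 15, 20, 25]
error_rates = [0.10, 0.09, 0.08, 0.074, 0.07, 0.072]

print(pick_k(k_values, error_rates))  # 15
```

Preferring the smallest K that is nearly as good as the best keeps the model simple while avoiding the noisy region at very low K.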