The K-Nearest Neighbors (KNN) algorithm is a simple, easy-to-implement supervised machine learning algorithm that can be used to solve both classification and regression problems.
The KNN algorithm assumes that similar things exist in close proximity. In other words, similar things are near each other. KNN captures the idea of similarity (sometimes called distance, proximity, or closeness) with some math we might have learned as a child: calculating the distance between points on a graph. There are other ways to calculate distance, and one of them may be preferable depending on the problem we are solving. However, straight-line distance (also called Euclidean distance) is a popular and familiar choice.
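As a quick illustration, Euclidean distance can be computed directly. This is a minimal sketch; the helper name `euclidean_distance` is ours, not from any library:

```python
import numpy as np

# Euclidean (straight-line) distance: the square root of the
# sum of squared coordinate differences between two points.
def euclidean_distance(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sqrt(np.sum((p - q) ** 2)))

# Distance between (0, 0) and (3, 4) is 5 (a 3-4-5 right triangle).
print(euclidean_distance([0, 0], [3, 4]))
```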
It is widely applicable in real-world scenarios because it is non-parametric, meaning it makes no underlying assumptions about the distribution of the data (unlike other algorithms, such as Gaussian mixture models, which assume a Gaussian distribution of the data).
This article demonstrates K-nearest neighbors on a sample of random data using the sklearn library.
We were given a random dataset with several features and a target class. We will try to use KNN to create a model that predicts the class of a new data point directly from its features.
Importing libraries:
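The original code cell is not shown; a typical set of imports for this walkthrough might be:

```python
# Core libraries for data handling, numerics, and plotting.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```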
Let's first take a look at our data, which has a few features.
Get data:
Set index_col = 0 to use the first column as index.
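The article's CSV file is not included here, so this sketch generates a comparable random dataset with sklearn's `make_classification`; with a real file you would instead call `pd.read_csv(path, index_col=0)` so the first column becomes the index. The feature names `FEAT_0` through `FEAT_9` and the column `TARGET CLASS` are our own placeholders:

```python
import pandas as pd
from sklearn.datasets import make_classification

# Stand-in for the article's CSV: 1000 rows, 10 features, binary target.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
df = pd.DataFrame(X, columns=[f"FEAT_{i}" for i in range(10)])
df["TARGET CLASS"] = y

# Peek at the first few rows, as the article does.
print(df.head())
```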


Output:
Standardize the variables:
Since the KNN classifier predicts the class of a given test case by finding the observations closest to it, the scale of the variables matters. Any variables on a large scale will have a much larger impact on the distance between observations, and therefore on the KNN classifier, than variables on a small scale.
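A sketch of the standardization step, using sklearn's `StandardScaler` on the same stand-in random dataset (the feature and target column names are our placeholders):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler

# Rebuild the stand-in random dataset.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
df = pd.DataFrame(X, columns=[f"FEAT_{i}" for i in range(10)])
df["TARGET CLASS"] = y

# Fit the scaler on the feature columns only, then transform them
# so every variable has mean 0 and standard deviation 1.
scaler = StandardScaler()
scaler.fit(df.drop("TARGET CLASS", axis=1))
scaled_features = scaler.transform(df.drop("TARGET CLASS", axis=1))
df_feat = pd.DataFrame(scaled_features, columns=df.columns[:-1])
print(df_feat.head())
```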

Output:
Train-test split the data and use the KNN model from the sklearn library:
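A sketch of the split and fit, assuming the stand-in dataset from above, a 30% test set, and K = 1 as the article's starting point:

```python
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Stand-in random dataset (the article's CSV is not included).
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
scaled = StandardScaler().fit_transform(X)

# Hold out 30% of the data for testing.
X_train, X_test, y_train, y_test = train_test_split(
    scaled, y, test_size=0.30, random_state=101)

# Start with K = 1.
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
pred = knn.predict(X_test)

print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))
```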

Output:
[[133  16]
 [ 15 136]]

              precision    recall  f1-score   support

           0       0.90      0.89      0.90       149
           1       0.89      0.90      0.90       151

    accuracy                           0.90       300
   macro avg       0.90      0.90      0.90       300
weighted avg       0.90      0.90      0.90       300
Select a K value:
Let's go ahead and use the elbow method to pick a good K value.
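A sketch of the elbow method on the same stand-in dataset: for each candidate K we record the mean error rate on the test set, then plot error rate against K and look for the point where the curve levels off:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, in case no display is available
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Same stand-in random dataset and split as before.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
scaled = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    scaled, y, test_size=0.30, random_state=101)

# For each K from 1 to 39, record the test-set error rate.
error_rate = []
for k in range(1, 40):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    pred_k = knn.predict(X_test)
    error_rate.append(np.mean(pred_k != y_test))

# Plot error rate vs. K; the "elbow" suggests a good K value.
plt.plot(range(1, 40), error_rate, linestyle="dashed", marker="o")
plt.xlabel("K")
plt.ylabel("Error rate")
plt.title("Error Rate vs. K Value")
plt.savefig("elbow.png")
```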

Output:
Here we can see that after about K = 15 the error rate just hovers between 0.07 and 0.08. Let's retrain the model with K = 15 and check the classification report.
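A sketch of the comparison, retraining on the same stand-in dataset with the original K = 1 and the elbow choice K = 15:

```python
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Same stand-in random dataset and split as before.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
scaled = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    scaled, y, test_size=0.30, random_state=101)

# Compare the original K = 1 model against the K = 15 elbow choice.
for k in (1, 15):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    pred = knn.predict(X_test)
    print(f"WITH K = {k}\n")
    print(confusion_matrix(y_test, pred))
    print(classification_report(y_test, pred))
```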

Output:
WITH K = 1

[[133  16]
 [ 15 136]]

              precision    recall  f1-score   support

           0       0.90      0.89      0.90       149
           1       0.89      0.90      0.90       151

    accuracy                           0.90       300
   macro avg       0.90      0.90      0.90       300
weighted avg       0.90      0.90      0.90       300

WITH K = 15

[[133  16]
 [  6 145]]

              precision    recall  f1-score   support

           0       0.96      0.89      0.92       149
           1       0.90      0.96      0.93       151

    accuracy                           0.93       300
   macro avg       0.93      0.93      0.93       300
weighted avg       0.93      0.93      0.93       300
Great! We were able to improve the performance of our model by choosing a better K value.