K-Nearest Neighbors (KNN) is one of the most basic yet important classification algorithms in machine learning. It belongs to the supervised learning family and finds wide application in pattern recognition, data mining, and intrusion detection. It is widely applicable in real-world scenarios because it is nonparametric, meaning it makes no underlying assumptions about the distribution of the data (unlike algorithms such as GMM, which assume a Gaussian distribution).
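To make the decision rule concrete before turning to sklearn, here is a minimal sketch of KNN in plain NumPy. This is not the sklearn implementation used below; the function name knn_predict and the toy data are illustrative assumptions:

import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    # Euclidean distance from the query point to every training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote among the k nearest labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy example: two clusters in 2-D
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([4.5, 5.0])))  # -> 1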
This article will demonstrate how to implement the K-Nearest Neighbors classifier algorithm using the sklearn library in Python.
Step 1: Import required libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
import matplotlib.pyplot as plt
import seaborn as sns
Step 2: Read the dataset
# Change the working directory to the location of the data file,
# e.g. cd C:\Users\Dev\Desktop\Kaggle\Breast_Cancer
df = pd.read_csv('data.csv')

# Separate the dependent and independent variables
y = df['diagnosis']
X = df.drop('diagnosis', axis=1)
X = X.drop('Unnamed: 32', axis=1)
X = X.drop('id', axis=1)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
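If you do not have the Kaggle CSV at hand, scikit-learn ships an equivalent breast cancer dataset. This alternative produces the same kind of X/y split (the feature names differ slightly from the CSV columns):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target  # 569 samples, 30 numeric features
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)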
Step 3: Train the model
K = []
training = []
test = []
scores = {}

for k in range(2, 21):
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(X_train, y_train)
    training_score = clf.score(X_train, y_train)
    test_score = clf.score(X_test, y_test)
    K.append(k)
    training.append(training_score)
    test.append(test_score)
    scores[k] = [training_score, test_score]
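Because KNN relies on raw distances, features on large scales can dominate the vote. A common refinement, not part of the original walkthrough, is to standardize the features first; a sketch using a scikit-learn Pipeline:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

for k in range(2, 21):
    # Scale the features, then fit KNN on the scaled data
    scaled_clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scaled_clf.fit(X_train, y_train)
    print(k, scaled_clf.score(X_test, y_test))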
Step 4: Model Evaluation
for keys, values in scores.items():
    print(keys, ':', values)
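Accuracy alone can hide class imbalance; for a fuller picture you could also print a confusion matrix and per-class metrics for one chosen k (shown here for k=5, the value selected below):

from sklearn.metrics import classification_report, confusion_matrix

clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))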
Now let's try to find the optimal value of "k", that is, the number of nearest neighbors. A more systematic alternative to reading it off the plots is sketched below.
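Instead of eyeballing train/test curves, k can be chosen by cross-validation; a sketch with GridSearchCV (the parameter grid and cv=5 are illustrative choices):

from sklearn.model_selection import GridSearchCV

# Search over the same range of k used above, with 5-fold cross-validation
grid = GridSearchCV(KNeighborsClassifier(),
                    param_grid={'n_neighbors': range(2, 21)},
                    cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)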
Step 5: Graph learning and test results
ax = sns.stripplot(x=K, y=training)
ax.set(xlabel='values of k', ylabel='Training Score')
plt.show()  # display the training-score plot
ax = sns.stripplot(x=K, y=test)
ax.set(xlabel='values of k', ylabel='Test Score')
plt.show()
plt.scatter(K, training, color='k')
plt.scatter(K, test, color='g')
plt.show()  # overlay the training and test scores in one scatter plot
From the scatter plot above we can conclude that the optimal value of k is around 5.
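With k = 5 chosen, the last step would be to refit a single model and use it for predictions; a minimal sketch:

final_clf = KNeighborsClassifier(n_neighbors=5)
final_clf.fit(X_train, y_train)
print('Test accuracy:', final_clf.score(X_test, y_test))
# Predict the diagnosis for the first few unseen samples
print(final_clf.predict(X_test[:5]))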