
Weighted K-NN

Intuition:
Consider the following training set

Red marks represent class 0 and green marks class 1.
Treat the white point as the query point (the point whose class label is to be predicted).

If we pass the above dataset to a kNN-based classifier, it will declare the query point to be of class 0. But it is clear from the graph that the query point is closer to the class 1 points than to the class 0 points. Weighted kNN was introduced to overcome this disadvantage. In weighted kNN, the k nearest points are weighted using a function called the kernel function. The intuition behind weighted kNN is to give more weight to nearby points and less weight to points that are farther away. Any function whose value decreases with increasing distance can be used as the kernel function for the weighted kNN classifier. The simplest such function is the inverse of the distance.
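As a minimal sketch of the inverse-distance kernel (the neighbour coordinates here are hypothetical, not taken from the figure), the snippet below shows how closer points receive larger weights:

```python
import math

# Hypothetical neighbours: (x, y, class label)
neighbors = [(1.0, 1.0, 1), (1.5, 1.2, 1), (4.0, 5.0, 0)]
query = (1.2, 1.1)

for x, y, label in neighbors:
    d = math.sqrt((x - query[0]) ** 2 + (y - query[1]) ** 2)
    w = 1 / d  # inverse-distance kernel: closer points get larger weights
    print(f"class {label}: distance = {d:.3f}, weight = {w:.3f}")
```

The two nearby class 1 points end up with weights an order of magnitude larger than the distant class 0 point, which is exactly the behaviour the kernel is meant to produce.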

Algorithm:

  • Let D = {(xi, yi), i = 1, ..., n} be the training set of observations xi with given class labels yi, and let x be the new observation (query point) whose class label y is to be predicted.
  • Compute d(xi, x) for i = 1, ..., n, the distance between the query point and each point in the training set.
  • Select D′ ⊆ D, the set of the k training data points closest to the query point.
  • Predict the class of the query point using distance-weighted voting, where V is the set of class labels:
    y = argmax over v in V of the sum over (xi, yi) in D′ of wi · 1(v = yi), with weights wi = 1 / d(xi, x).
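The voting step above can be illustrated with a small worked example (the distances and labels here are hypothetical): suppose k = 3 and the nearest neighbours lie at distances 1, 2, and 4 with labels 1, 1, and 0.

```python
# Hypothetical k = 3 nearest neighbours: (distance to query, class label)
nearest = [(1.0, 1), (2.0, 1), (4.0, 0)]

votes = {0: 0.0, 1: 0.0}
for d, label in nearest:
    votes[label] += 1 / d  # weight w_i = 1 / d(x_i, x)

# votes[1] = 1/1 + 1/2 = 1.5; votes[0] = 1/4 = 0.25
predicted = max(votes, key=votes.get)
print(predicted)  # 1
```

Class 1 wins because its neighbours are closer, even though an unweighted majority over a different k could have gone the other way.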

Implementation:
Consider 0 as a label for class 0 and 1 as a label for class 1. Below is the implementation of the weighted-kNN algorithm.

C++

// C++ program to implement the
// weighted K nearest neighbour algorithm.
#include <bits/stdc++.h>
using namespace std;

struct Point
{
    int val;         // Class of the point
    double x, y;     // Coordinates of the point
    double distance; // Distance from the query point
};

// Used to sort the array of points by increasing
// order of distance
bool comparison(Point a, Point b)
{
    return (a.distance < b.distance);
}

// This function finds the classification of point p using
// the weighted k nearest neighbour algorithm. It assumes only
// two classes and returns 0 if p belongs to class 0, otherwise
// 1 (belongs to class 1).
int weightedkNN(Point arr[], int n, int k, Point p)
{
    // Fill in the distances of all points from p
    for (int i = 0; i < n; i++)
        arr[i].distance =
            sqrt((arr[i].x - p.x) * (arr[i].x - p.x) +
                 (arr[i].y - p.y) * (arr[i].y - p.y));

    // Sort the points by distance from p
    sort(arr, arr + n, comparison);

    // Now consider the first k elements and only
    // two classes
    double freq1 = 0; // weighted sum of class 0
    double freq2 = 0; // weighted sum of class 1
    for (int i = 0; i < k; i++)
    {
        if (arr[i].val == 0)
            freq1 += double(1 / arr[i].distance);
        else if (arr[i].val == 1)
            freq2 += double(1 / arr[i].distance);
    }
    return (freq1 > freq2 ? 0 : 1);
}

// Driver code
int main()
{
    int n = 13; // Number of data points
    Point arr[n];

    arr[0].x = 0;
    arr[0].y = 4;
    arr[0].val = 0;

    arr[1].x = 1;
    arr[1].y = 4.9;
    arr[1].val = 0;

    arr[2].x = 1.6;
    arr[2].y = 5.4;
    arr[2].val = 0;

    arr[3].x = 2.2;
    arr[3].y = 6;
    arr[3].val = 0;

    arr[4].x = 2.8;
    arr[4].y = 7;
    arr[4].val = 0;

    arr[5].x = 3.2;
    arr[5].y = 8;
    arr[5].val = 0;

    arr[6].x = 3.4;
    arr[6].y = 9;
    arr[6].val = 0;

    arr[7].x = 1.8;
    arr[7].y = 1;
    arr[7].val = 1;

    arr[8].x = 2.2;
    arr[8].y = 3;
    arr[8].val = 1;

    arr[9].x = 3;
    arr[9].y = 4;
    arr[9].val = 1;

    arr[10].x = 4;
    arr[10].y = 4.5;
    arr[10].val = 1;

    arr[11].x = 5;
    arr[11].y = 5;
    arr[11].val = 1;

    arr[12].x = 6;
    arr[12].y = 5.5;
    arr[12].val = 1;

    /* Query point */
    Point p;
    p.x = 2;
    p.y = 4;

    // Parameter to decide the class of the query point
    int k = 5;
    printf("The value classified to query point"
           " is: %d.\n", weightedkNN(arr, n, k, p));
    return 0;
}

Python3

# Python3 program to implement the
# weighted K nearest neighbour algorithm.

import math

def weightedkNN(points, p, k=3):
    '''
    This function finds the classification of p using
    the weighted k nearest neighbour algorithm. It assumes
    only two classes and returns 0 if p belongs to class 0,
    otherwise 1 (belongs to class 1).

    Parameters -
        points: dictionary of training points with two keys - 0 and 1.
                Each key has a list of the training data points
                belonging to that class.
        p: tuple, the query point of the form (x, y)
        k: number of nearest neighbours to consider, default 3
    '''

    distance = []
    for group in points:
        for feature in points[group]:

            # Calculate the Euclidean distance of p from the training points
            euclidean_distance = math.sqrt((feature[0] - p[0]) ** 2
                                           + (feature[1] - p[1]) ** 2)

            # Add a tuple of the form (distance, group) to the distance list
            distance.append((euclidean_distance, group))

    # Sort the distance list in ascending order
    # and select the first k distances
    distance = sorted(distance)[:k]

    freq1 = 0  # weighted sum of class 0
    freq2 = 0  # weighted sum of class 1

    for d in distance:
        if d[1] == 0:
            freq1 += (1 / d[0])
        elif d[1] == 1:
            freq2 += (1 / d[0])

    return 0 if freq1 > freq2 else 1

# Driver function
def main():

    # Dictionary of training points with two keys - 0 and 1
    # key 0 has points belonging to class 0
    # key 1 has points belonging to class 1
    points = {0: [(0, 4), (1, 4.9), (1.6, 5.4), (2.2, 6),
                  (2.8, 7), (3.2, 8), (3.4, 9)],
              1: [(1.8, 1), (2.2, 3), (3, 4), (4, 4.5),
                  (5, 5), (6, 5.5)]}

    # Query point p(x, y)
    p = (2, 4)

    # Number of neighbours
    k = 5

    print("The value classified to query point is: {}".format(
        weightedkNN(points, p, k)))

if __name__ == '__main__':
    main()

Output:

 The value classified to query point is: 1 
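For comparison, the same distance-weighted voting scheme is available off the shelf in scikit-learn via the `weights='distance'` option (this is a sketch assuming scikit-learn is installed; it is not part of the original implementation):

```python
from sklearn.neighbors import KNeighborsClassifier

# Same training data as above, flattened into feature and label arrays
X = [(0, 4), (1, 4.9), (1.6, 5.4), (2.2, 6), (2.8, 7), (3.2, 8), (3.4, 9),
     (1.8, 1), (2.2, 3), (3, 4), (4, 4.5), (5, 5), (6, 5.5)]
y = [0] * 7 + [1] * 6

# weights='distance' weights each neighbour's vote by 1/distance
clf = KNeighborsClassifier(n_neighbors=5, weights='distance')
clf.fit(X, y)
print(clf.predict([(2, 4)]))  # expected to agree with the code above: class 1
```

With k = 5 and Euclidean distance this should reproduce the hand-rolled result, since the five nearest neighbours and the 1/d weighting are identical.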

Time complexity: O(N log N), where N is the number of points in the training set, since both implementations sort all N distances before selecting the k nearest.
