ML | Mean Shift Clustering

Mean shift falls under the category of clustering algorithms, a form of unsupervised learning, and iteratively assigns data points to clusters by shifting points towards the mode (in the context of mean shift, the mode is the region of highest density of data points). Because of this, it is also known as the mode-seeking algorithm. The mean shift algorithm finds applications in image processing and computer vision.

Given a set of data points, the algorithm iteratively shifts each data point towards the closest cluster centroid, and the direction to the closest centroid is determined by where most of the nearby points lie. With each iteration, every data point moves closer to where the most points are, which is (or will lead to) the cluster center. When the algorithm stops, each point has been assigned to a cluster.

Unlike the popular K-means clustering algorithm, mean shift does not require specifying the number of clusters in advance. The number of clusters is determined by the algorithm from the data itself.
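For example, a minimal scikit-learn sketch (the data and parameter values below are illustrative, not from this article) shows that the number of clusters is read off the fitted model rather than passed in, unlike KMeans(n_clusters=...):

import numpy as np
from sklearn.cluster import MeanShift
from sklearn.datasets import make_blobs

# Three well-separated blobs; MeanShift is never told how many there are.
X, _ = make_blobs(n_samples=300, centers=[[1, 1], [6, 6], [10, 2]], cluster_std=0.7)

ms = MeanShift()   # no n_clusters argument
ms.fit(X)

print("Clusters found:", len(np.unique(ms.labels_)))   # typically 3 for this data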

Note: One disadvantage of mean shift is that it is computationally expensive, O(n²).

Kernel density estimation —

The first step when applying the mean shift clustering algorithm is to represent your data in a mathematical manner; this means representing your data as points, such as the set below.

Mean shift builds upon the concept of kernel density estimation (KDE). Imagine that the above data were sampled from a probability distribution. KDE is a method for estimating the underlying distribution, also called the probability density function, of a dataset.

It works by placing a kernel on each point in the dataset. A kernel is a fancy mathematical word for a weighting function commonly used in convolution. There are many different types of kernels, but the most popular one is the Gaussian kernel. Adding up all of the individual kernels generates a probability surface, an example of a density function. Depending on the kernel bandwidth used, the resulting density function will vary.

Below is the KDE surface for our points, using a Gaussian kernel with a kernel bandwidth of 2.

Surface plot:

Contour plot:
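A minimal sketch along the following lines (the sample points and grid are illustrative assumptions) builds such a KDE by summing one Gaussian kernel per data point over a grid, and draws both a surface and a contour view:

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3d projection on older Matplotlib

# Illustrative 2-D data points and a kernel bandwidth of 2.
points = np.array([[1.0, 1.0], [2.0, 3.0], [3.0, 2.0], [6.0, 6.0], [7.0, 7.0], [6.5, 5.5]])
h = 2.0

# Evaluate the sum of Gaussian kernels on a regular grid.
xs, ys = np.meshgrid(np.linspace(-2, 10, 200), np.linspace(-2, 10, 200))
grid = np.stack([xs.ravel(), ys.ravel()], axis=1)
sq_dists = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
density = np.exp(-sq_dists / (2 * h ** 2)).sum(axis=1).reshape(xs.shape)

# Surface plot of the KDE
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(xs, ys, density, cmap='viridis')

# Contour plot of the same KDE
plt.figure()
plt.contour(xs, ys, density)
plt.show()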

Below is the Python implementation of mean shift clustering using scikit-learn:

import numpy as np
import pandas as pd
from sklearn.cluster import MeanShift
from sklearn.datasets import make_blobs
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# We'll use the make_blobs method
# to generate our own data.
clusters = [[2, 2, 2], [7, 7, 7], [5, 13, 13]]

X, _ = make_blobs(n_samples=150, centers=clusters, cluster_std=0.60)

# After fitting the model, we store the
# coordinates of the cluster centers.
ms = MeanShift()
ms.fit(X)
cluster_centers = ms.cluster_centers_

# Finally, we plot the data points
# and centroids in 3D.
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

ax.scatter(X[:, 0], X[:, 1], X[:, 2], marker='o')

ax.scatter(cluster_centers[:, 0], cluster_centers[:, 1],
           cluster_centers[:, 2], marker='x', color='red',
           s=300, linewidth=5, zorder=10)

plt.show()


Output:

To illustrate this, suppose we are given a dataset {u_i} of points in d-dimensional space, sampled from some larger population, and that we have chosen a kernel K with bandwidth parameter h. Together, the data and the kernel function give the following kernel density estimator for the full population's density function.
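In its standard form (using the usual KDE notation; the constant factor is sometimes written differently), the estimator is:

\hat{f}_K(u) = \frac{1}{n h^d} \sum_{i=1}^{n} K\!\left( \frac{u - u_i}{h} \right)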

The kernel function here must satisfy the following two conditions (written out after this list):

→ The first requirement is needed to ensure that our estimate is normalized.
→ The second is associated with the symmetry of our space.
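In the usual formulation, these two conditions are written as:

\int_{\mathbb{R}^d} K(x)\, dx = 1 \qquad \text{(normalization)}

K(x) = K(-x) \ \text{for all } x \qquad \text{(symmetry)}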

Two popular kernel functions that satisfy these conditions are defined below.
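Two commonly cited choices are the flat (uniform) kernel and the Gaussian kernel which, written here up to normalizing constants, are:

K(x) = \begin{cases} 1 & \text{if } \lVert x \rVert \le \lambda \\ 0 & \text{otherwise} \end{cases} \qquad \text{(flat kernel)}

K(x) = e^{-\lVert x \rVert^{2} / (2\sigma^{2})} \qquad \text{(Gaussian kernel)}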

Below is an example in one dimension using a Gaussian kernel to estimate the density of a population along the x-axis. Each sample point adds a small Gaussian to our estimate, centered around it. The equations above may seem a little intimidating, but the figure should make it clear that the concept is quite simple.
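A minimal one-dimensional sketch of this idea (sample values and bandwidth are illustrative) places one small Gaussian on each sample and sums them:

import numpy as np
import matplotlib.pyplot as plt

# Illustrative 1-D samples and kernel bandwidth.
samples = np.array([-2.1, -1.3, -0.4, 1.9, 5.1, 6.2])
h = 1.0

xs = np.linspace(-6, 10, 400)
# One small Gaussian per sample point ...
per_point = np.exp(-0.5 * ((xs[:, None] - samples[None, :]) / h) ** 2)
# ... and their sum is the (unnormalized) density estimate.
estimate = per_point.sum(axis=1)

plt.plot(xs, per_point, linestyle='--', linewidth=0.8)
plt.plot(xs, estimate, linewidth=2)
plt.show()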

Iterative mode search —

 1. Initialize a random seed and a window W.
 2. Calculate the center of gravity (mean) of W.
 3. Shift the search window to the mean.
 4. Repeat Step 2 until convergence.

General scheme of the algorithm —

for p in copied_points:
    while not at_kde_peak:
        p = shift(p, original_points)

The shift function looks like this —

def shift(p, original_points):
    shift_x = float(0)
    shift_y = float(0)
    scale_factor = float(0)

    for p_temp in original_points:
        # numerator
        dist = euclidean_dist(p, p_temp)
        weight = kernel(dist, kernel_bandwidth)
        shift_x += p_temp[0] * weight
        shift_y += p_temp[1] * weight
        # denominator
        scale_factor += weight

    shift_x = shift_x / scale_factor
    shift_y = shift_y / scale_factor
    return [shift_x, shift_y]
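The helpers euclidean_dist and kernel, as well as kernel_bandwidth, are left undefined in the outline above; one plausible sketch of them (the Gaussian form and the bandwidth value are assumptions, not part of the original) is:

import math

kernel_bandwidth = 2.0   # assumed global bandwidth used by shift()

def euclidean_dist(a, b):
    # Straight-line distance between two 2-D points.
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)

def kernel(distance, bandwidth):
    # Gaussian weight: nearby points contribute more than distant ones.
    return math.exp(-0.5 * (distance / bandwidth) ** 2)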

Pros:

  • Finds a variable number of modes
  • Robust to outliers
  • General-purpose, application-independent tool
  • Model-free: does not assume any prior shape (spherical, elliptical, etc.) of data clusters
  • Only one parameter (window size h), and h has a physical meaning (unlike k-means)

Cons:

  • Output depends on window size
  • Selecting the window size (bandwidth) is not trivial (see the sketch after this list)
  • Computationally (relatively) expensive (about 2 s/image)
  • Does not scale well with the dimension of the feature space
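For the bandwidth-selection point above, scikit-learn provides an estimate_bandwidth helper; a minimal sketch (the quantile value is an illustrative choice) looks like this:

from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.7)

# Heuristic bandwidth derived from the data; quantile controls how local the estimate is.
bandwidth = estimate_bandwidth(X, quantile=0.2)

ms = MeanShift(bandwidth=bandwidth)
ms.fit(X)
print(ms.cluster_centers_)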