Change language

# ML | Hierarchical clustering (agglomeration and divisional clustering)

There are essentially two types of hierarchical cluster analysis strategies:

1. Agglomeration clustering: also known as bottom-up or hierarchical agglomeration clustering (HAC). A structure that is more informative than the unstructured set of clusters returned by flat clustering. This clustering algorithm does not require us to specify the number of clusters in advance. Bottom-up algorithms treat all data as a singleton cluster from the start and then agglomerate pairs of clusters sequentially until all clusters are combined into a single cluster that contains all the data.

Algorithm :

` given a dataset (d  1 , d  2 , d  3 , .... d  N  ) of size N # compute the distance matrix for i = 1 to N: # as the distance matrix is ​​symmetric about # the primary diagonal so we compute only lower # part of the primary diagonal for j = 1 to i: dis_mat [i] [j] = distance [d  i , d  j ] each data point is a singleton cluster  repeat  merge the two cluster having minimum distance update the distance matrix  untill  only a single cluster remains `

Implementation of the above algorithm in Python using the scikit-learn library:

 ` from ` ` sklearn .cluster ` ` import ` ` AgglomerativeClustering ` ` import ` ` numpy as np `   ` # random dataset ` ` X ` ` = ` ` np .array ([[` ` 1 ` `, ` ` 2 ` `], [` ` 1 ` `, ` ` 4 ` `], [` ` 1 ` `, ` ` 0 ` `], ` ` [` ` 4 ` `, 2 ], [ 4 , 4 ], [ 4 , 0 ]]) ``   # the number of clusters must be mentioned here # otherwise the result will be one cluster # containing all data clustering = AgglomerativeClustering (n_clusters = 2 ). fit (X)   # print tags classes print (clustering.labels_)  `

Output:

` [1, 1, 1, 0, 0, 0] `
2. Split Clustering: also known as top-down approach. This algorithm also does not require the number of clusters to be specified in advance. Top-down clustering requires a cluster splitting method that contains all the data and continues to recursively split clusters until the individual data is split into a singleton cluster.

Algorithm :

` given a dataset (d  1 , d  2 , d  3 , .... d  N ) of size N at the top we have all data in one cluster the cluster is split using a flat clustering method eg. K-Means etc  repeat  choose the best cluster among all the clusters to split split that cluster by the flat clustering algorithm  untill  each data is in its own singleton cluster `

Hierarchical agglomeration versus divisional clustering —

• Split clustering is more complex than agglomeration clustering, because in the case of split clustering we need a flat clustering method as a "subroutine" for splitting each cluster until we have all the data that has its own singleton cluster.
• Split clustering is more efficient unless we create complete hierarchy down to individual data sheets. The time complexity of naive agglomeration clustering is O (n 3 ), because we carefully scan the N x N dist_mat for the smallest distance in each of the N-1 iterations. Using the priority queue data structure, we can reduce this complexity to O (n 2 logn) . With a few more optimizations, it can be reduced to O (n 2 ) . Whereas, for partitioning clustering with a fixed number of top levels, using an efficient flat algorithm such as K-Means, the partitioning algorithms are linear in the number of patterns and clusters.
• The partitioning algorithm is also more accurate . Agglomeration clustering makes decisions based on local patterns or neighboring points without first considering the global distribution of data. These early decisions cannot be reversed. while partitioning clustering takes global data distribution into account when making top-level partitioning decisions.

## Shop

Learn programming in R: courses

\$FREE

Best Python online courses for 2022

\$FREE

Best laptop for Fortnite

\$399+

Best laptop for Excel

\$

Best laptop for Solidworks

\$399+

Best laptop for Roblox

\$399+

Best computer for crypto mining

\$499+

Best laptop for Sims 4

\$

Latest questions

PythonStackOverflow

Common xlabel/ylabel for matplotlib subplots

PythonStackOverflow

Check if one list is a subset of another in Python

PythonStackOverflow

How to specify multiple return types using type-hints

PythonStackOverflow

Printing words vertically in Python

PythonStackOverflow

Python Extract words from a given string

PythonStackOverflow

Why do I get "Pickle - EOFError: Ran out of input" reading an empty file?

PythonStackOverflow

Python os.path.join () method

PythonStackOverflow

Flake8: Ignore specific warning for entire file

## Wiki

Python | How to copy data from one Excel sheet to another

Common xlabel/ylabel for matplotlib subplots

Check if one list is a subset of another in Python

How to specify multiple return types using type-hints

Printing words vertically in Python

Python Extract words from a given string

Cyclic redundancy check in Python

Finding mean, median, mode in Python without libraries

Why do I get "Pickle - EOFError: Ran out of input" reading an empty file?

Python - Move item to the end of the list

Python - Print list vertically