Implementation of the DBSCAN algorithm using Sklearn

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a clustering algorithm that was proposed in 1996. In 2014, the algorithm received the "Test of Time" award at KDD, a leading data mining conference.

Dataset: Credit Card.

Step 1: Import Required Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import normalize
from sklearn.decomposition import PCA

Step 2: Load data

# Path kept as given; point it at your own copy of CC_GENERAL.csv
X = pd.read_csv('..input_path/CC_GENERAL.csv')

# Remove the CUST_ID column from the data
X = X.drop('CUST_ID', axis=1)

# Handle missing values by forward-filling from the previous row
# (fillna(method='ffill') is deprecated in recent pandas versions)
X = X.ffill()

print(X.head())
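To make the forward-fill step concrete, here is a minimal sketch on a tiny made-up frame. The column names `feature_a` and `feature_b` and their values are hypothetical, not taken from the credit card dataset. It also shows one thing to watch for: a missing value in the first row has nothing above it to copy from, so it stays missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the real credit card dataset
toy = pd.DataFrame({
    "feature_a": [np.nan, 40.0, np.nan, 95.0],
    "feature_b": [10.0, np.nan, np.nan, 30.0],
})

# Forward fill: each missing value takes the last valid value above it
filled = toy.ffill()

# Count the gaps that remain (a leading NaN has no earlier value to copy)
remaining = filled.isna().sum()

print(filled)
print(remaining)
```

If gaps remain after a forward fill, one option is to follow it with a backward fill or to drop the affected rows; which is appropriate depends on the data.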

Step 3: Data preprocessing

# Scale the data so that all attributes are on a comparable level
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Normalize each sample (row) to unit L2 norm
X_normalized = normalize(X_scaled)

# Convert the NumPy array to a pandas DataFrame
X_normalized = pd.DataFrame(X_normalized)
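The two preprocessing calls do different things, which is easy to check on synthetic data. The sketch below uses randomly generated values (an assumption for illustration only): `StandardScaler` works column by column and yields zero mean and unit variance per feature, while `normalize` works row by row and rescales each sample to unit L2 norm.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, normalize

rng = np.random.default_rng(0)

# Hypothetical data: 100 samples, 3 features on very different scales
data = rng.normal(loc=[0.0, 50.0, 1000.0], scale=[1.0, 10.0, 300.0], size=(100, 3))

# Column-wise standardization: zero mean, unit variance per feature
scaled = StandardScaler().fit_transform(data)

# Row-wise normalization: each sample is divided by its own L2 norm
unit_rows = normalize(scaled)

col_means = scaled.mean(axis=0)
col_stds = scaled.std(axis=0)
row_norms = np.linalg.norm(unit_rows, axis=1)

print(col_means, col_stds)
print(row_norms[:5])
```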

Step 4: Reducing the dimensionality of the data so it can be plotted

# Project the data onto its first two principal components
pca = PCA(n_components=2)
X_principal = pca.fit_transform(X_normalized)
X_principal = pd.DataFrame(X_principal)
X_principal.columns = ['P1', 'P2']

print(X_principal.head())
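Projecting to two components discards some information, and `explained_variance_ratio_` reports how much variance the retained components keep. The sketch below runs on synthetic data (an assumption for illustration; the numbers say nothing about the credit card dataset) in which most of the variance sits in the first two features.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical data: 5 features, with most of the variance in the first two
data = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 0.5, 0.3, 0.1])

pca = PCA(n_components=2)
projected = pca.fit_transform(data)

# Fraction of the total variance captured by each retained component
ratios = pca.explained_variance_ratio_

print(projected.shape)
print(ratios, ratios.sum())
```

A low total here would suggest that the 2-D plot hides much of the structure the clustering could otherwise use.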

Step 5: Building the clustering model

db_default = DBSCAN(eps=0.0375, min_samples=3).fit(X_principal)

# NumPy array of the cluster label assigned to each data point (-1 marks noise)
labels = db_default.labels_
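The `labels_` array holds one integer per point, with -1 reserved for noise. As a rough sketch of how to read it, the snippet below clusters a hypothetical toy set (two tight blobs and one distant outlier, with `eps` chosen for this toy data only) and counts the clusters and noise points.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Hypothetical data: two tight blobs plus one far-away outlier
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.05, size=(50, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.05, size=(50, 2))
outlier = np.array([[20.0, -20.0]])
points = np.vstack([blob_a, blob_b, outlier])

toy_labels = DBSCAN(eps=0.5, min_samples=3).fit(points).labels_

# Noise is labelled -1 and is not counted as a cluster
n_clusters = len(set(toy_labels)) - (1 if -1 in toy_labels else 0)
n_noise = int(np.sum(toy_labels == -1))

print(n_clusters, n_noise)
```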

Step 6: Visualize clustering

# Map each cluster label to a color (-1 is noise)
colors = {}
colors[0] = 'r'
colors[1] = 'g'
colors[2] = 'b'
colors[-1] = 'k'

# Build a color vector for each data point
# (assumes the only labels found are 0, 1, 2 and -1)
cvec = [colors[label] for label in labels]

# Create the figure first so that every artist below is drawn on it
plt.figure(figsize=(9, 9))

# Empty scatter plots that serve only as legend handles
r = plt.scatter([], [], color='r')
g = plt.scatter([], [], color='g')
b = plt.scatter([], [], color='b')
k = plt.scatter([], [], color='k')

# Plot P1 on the X axis and P2 on the Y axis, colored by cluster label
plt.scatter(X_principal['P1'], X_principal['P2'], c=cvec)

# Build the legend
plt.legend((r, g, b, k), ('Label 0', 'Label 1', 'Label 2', 'Label -1'))

plt.show()
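The hard-coded color dictionary fails with a `KeyError` as soon as DBSCAN finds a label that is not listed. One possible alternative, sketched below, draws one scatter per label and picks colors from a colormap. The helper name `plot_clusters` is hypothetical, and the `Agg` backend line is only there so that the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np


def plot_clusters(points, labels):
    """Scatter 2-D points, one color per cluster label (hypothetical helper)."""
    points = np.asarray(points)
    labels = np.asarray(labels)
    fig, ax = plt.subplots(figsize=(9, 9))
    cmap = plt.get_cmap("tab10")
    for lab in sorted(set(labels.tolist())):
        mask = labels == lab
        # Noise (-1) is drawn in black; clusters cycle through 10 colors
        color = "k" if lab == -1 else cmap(lab % 10)
        ax.scatter(points[mask, 0], points[mask, 1], color=color,
                   label=f"Label {lab}")
    ax.legend()
    return fig, ax


# Tiny made-up example: two points in cluster 0 and one noise point
demo_points = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
demo_labels = np.array([0, 0, -1])
fig, ax = plot_clusters(demo_points, demo_labels)
```

With the tutorial's data, the call would presumably be `plot_clusters(X_principal.to_numpy(), labels)`, followed by `plt.show()` in an interactive session.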

Step 7: Tuning the model parameters

# Same eps as in Step 5, but a much larger min_samples
db = DBSCAN(eps=0.0375, min_samples=50).fit(X_principal)
labels1 = db.labels_
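Raising `min_samples` means a point needs more neighbors within `eps` to count as a core point, so sparse groups stop forming clusters and are labelled noise instead. The sketch below illustrates this on hypothetical toy data (a dense blob of 200 points and a sparse blob of 20, with `eps` chosen for this toy data only).

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Hypothetical data: a dense blob of 200 points and a sparse blob of 20
dense = rng.normal(loc=(0.0, 0.0), scale=0.05, size=(200, 2))
sparse = rng.normal(loc=(3.0, 3.0), scale=0.05, size=(20, 2))
points = np.vstack([dense, sparse])

# Map each min_samples value to (number of clusters, number of noise points)
summary = {}
for min_samples in (3, 50):
    found = DBSCAN(eps=0.3, min_samples=min_samples).fit(points).labels_
    n_clusters = len(set(found)) - (1 if -1 in found else 0)
    n_noise = int(np.sum(found == -1))
    summary[min_samples] = (n_clusters, n_noise)

print(summary)
```

The sparse blob has only 20 points, so with `min_samples=50` none of them can be a core point and the whole blob is treated as noise.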

Step 8: Visualize the changes

# Map each cluster label to a color (-1 is noise)
colors1 = {}
colors1[0] = 'r'
colors1[1] = 'g'
colors1[2] = 'b'
colors1[3] = 'c'
colors1[4] = 'y'
colors1[5] = 'm'
colors1[-1] = 'k'

# Use the labels from the model fitted in Step 7 (labels1)
cvec = [colors1[label] for label in labels1]

colors = ['r', 'g', 'b', 'c', 'y', 'm', 'k']

# Create the figure first so that every artist below is drawn on it
plt.figure(figsize=(9, 9))

# Empty scatter plots that serve only as legend handles
r = plt.scatter([], [], marker='o', color=colors[0])
g = plt.scatter([], [], marker='o', color=colors[1])
b = plt.scatter([], [], marker='o', color=colors[2])
c = plt.scatter([], [], marker='o', color=colors[3])
y = plt.scatter([], [], marker='o', color=colors[4])
m = plt.scatter([], [], marker='o', color=colors[5])
k = plt.scatter([], [], marker='o', color=colors[6])

plt.scatter(X_principal['P1'], X_principal['P2'], c=cvec)

plt.legend((r, g, b, c, y, m, k),
           ('Label 0', 'Label 1', 'Label 2', 'Label 3', 'Label 4',
            'Label 5', 'Label -1'),
           scatterpoints=1,
           loc='upper left',
           ncol=3,
           fontsize=8)

plt.show()
