Analyzing Test Data Using K-Means Clustering in Python



Let's first render test data with multiple features using matplotlib.

# import the required tools
import numpy as np
from matplotlib import pyplot as plt

# create two sets of test data, each with 25 samples and 2 features
X = np.random.randint(10, 35, (25, 2))
Y = np.random.randint(55, 70, (25, 2))

# stack them into a single (50, 2) array
Z = np.vstack((X, Y))
Z = Z.reshape((50, 2))

# convert to np.float32
Z = np.float32(Z)

# plot a histogram of each feature column
plt.xlabel('Test Data')
plt.ylabel('Z samples')
plt.hist(Z, bins=256, range=(0, 256))
plt.show()

Here 'Z' is an array of 100 values (50 samples with 2 features each), all of which fall within the histogram range of 0 to 255. 'Z' is shaped so that each feature occupies one column, which is more useful when more than one feature is present. The data is then converted to type np.float32.
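The shape and type described above can be checked directly. The following is a minimal sketch that rebuilds the same kind of array; the seeded generator (`np.random.default_rng(0)`) is an assumption added here for repeatability and is not part of the original listing.

```python
import numpy as np

# Assumption: a seeded generator, used only so the run is repeatable.
rng = np.random.default_rng(0)

# Two groups of 25 samples with 2 features each, same ranges as the listing.
X = rng.integers(10, 35, size=(25, 2))
Y = rng.integers(55, 70, size=(25, 2))

# Stack the groups and convert to float32.
Z = np.float32(np.vstack((X, Y)))

print(Z.shape)  # (50, 2): 50 samples, 2 feature columns
print(Z.size)   # 100 values in total
print(Z.dtype)  # float32
```

Because `np.vstack` already returns a (50, 2) array, the explicit reshape in the listing leaves the shape unchanged.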

Output:

Now apply the k-means clustering algorithm to the same kind of test data and observe its behavior.
Steps included:
1) First, generate the test data.
2) Define the termination criteria and apply kmeans().
3) Split the data by cluster label.
4) Finally, plot the data.
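To give a rough idea of what a k-means routine does during the clustering step, here is a minimal sketch of Lloyd's algorithm in plain NumPy. It is not OpenCV's implementation. The function name `kmeans_sketch`, its parameters, and the fixed seeds are hypothetical choices made for illustration. The defaults `max_iter=10` and `eps=1.0` simply mirror the criteria values used in this article's OpenCV example, and the data ranges follow the first listing.

```python
import numpy as np

def kmeans_sketch(data, k, max_iter=10, eps=1.0, seed=0):
    """Illustrative k-means (Lloyd's algorithm) in plain NumPy."""
    rng = np.random.default_rng(seed)
    # Start from k distinct samples chosen at random.
    centers = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(max_iter):
        # Assignment step: label each sample with its nearest center.
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its samples.
        new_centers = centers.copy()
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        shift = np.linalg.norm(new_centers - centers, axis=1).max()
        centers = new_centers
        if shift <= eps:  # centers barely moved, so stop early
            break
    # Final assignment against the final centers.
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1), centers

# Step 1: generate the test data (seeded here only for repeatability).
rng = np.random.default_rng(0)
X = rng.integers(10, 35, size=(25, 2))
Y = rng.integers(55, 70, size=(25, 2))
Z = np.float32(np.vstack((X, Y)))

# Step 2: run the clustering sketch with two clusters.
labels, centers = kmeans_sketch(Z, 2)

# Step 3: split the data by cluster label.
A = Z[labels == 0]
B = Z[labels == 1]
print(len(A), len(B))
```

The plotting step is left out so that the sketch depends only on NumPy. Which group receives label 0 depends on the random starting centers.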

import numpy as np
import cv2
from matplotlib import pyplot as plt

# create two sets of test data, each with 25 samples and 2 features
X = np.random.randint(10, 45, (25, 2))
Y = np.random.randint(55, 70, (25, 2))
Z = np.vstack((X, Y))

# convert to np.float32
Z = np.float32(Z)

# define criteria and apply kmeans()
# criteria: stop after 10 iterations or once the required accuracy of 1.0 is reached
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
ret, label, center = cv2.kmeans(Z, 2, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)

# now split the data by cluster label
A = Z[label.ravel() == 0]
B = Z[label.ravel() == 1]

# plot the data
plt.scatter(A[:, 0], A[:, 1])
plt.scatter(B[:, 0], B[:, 1], c='r')
plt.scatter(center[:, 0], center[:, 1], s=80, c='y', marker='s')
plt.xlabel('Test Data')
plt.ylabel('Z samples')
plt.show()

Output:

This example illustrates a case where k-means produces intuitively sensible clusters: the two groups of samples are well separated, so each ends up in its own cluster.
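One way to put a number on "sensible" is compactness, the sum of squared distances from each sample to its assigned center. OpenCV documents the first value returned by cv2.kmeans as this quantity. The sketch below computes it in plain NumPy for two hand-made labelings of similar data. The helper names `compactness` and `centers_for`, the seed, and the alternating "bad" labeling are assumptions introduced for illustration.

```python
import numpy as np

def compactness(data, labels, centers):
    """Sum of squared distances from each sample to its assigned center."""
    diffs = data - centers[labels.ravel()]
    return float((diffs ** 2).sum())

def centers_for(data, labels, k=2):
    """Mean of the samples carrying each label (hypothetical helper)."""
    return np.array([data[labels == j].mean(axis=0) for j in range(k)])

# Assumption: seeded data with the same ranges as the listing.
rng = np.random.default_rng(0)
X = rng.integers(10, 45, size=(25, 2))
Y = rng.integers(55, 70, size=(25, 2))
Z = np.float32(np.vstack((X, Y)))

# Natural split: the first 25 samples in one cluster, the last 25 in the other.
good_labels = np.repeat([0, 1], 25)
good = compactness(Z, good_labels, centers_for(Z, good_labels))

# Arbitrary split: alternate labels, mixing both groups in each cluster.
bad_labels = np.arange(50) % 2
bad = compactness(Z, bad_labels, centers_for(Z, bad_labels))

print(good, bad)
```

The natural split should give a much smaller value than the alternating one, because each of its centers sits inside a single group instead of between the two.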

Applications:
1) Identification of cancer data. 
2) Predicting student progress. 
3) Prediction of drug activity.