Uses of PCA:
 It is used to find relationships between variables in the data.
 It is used to interpret and visualize data.
 It reduces the number of variables, which simplifies further analysis.
 It is often used to visualize genetic distance and relatedness between populations.
PCA is performed on a square symmetric matrix. This can be a pure sums of squares and cross products (SSCP) matrix, a covariance matrix, or a correlation matrix. The correlation matrix is used when the variances of the individual variables differ greatly.
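As a small sketch of this point, the snippet below (using synthetic data, since no dataset has been introduced yet) builds both a covariance and a correlation matrix for variables on very different scales; both are square and symmetric, but the correlation matrix standardizes away the scale differences:

```python
import numpy as np

# Illustrative sketch: PCA operates on a square symmetric matrix.
# When variables have very different variances, the correlation
# matrix is preferred over the covariance matrix.
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 3)) * np.array([1.0, 10.0, 100.0])  # very different scales

cov = np.cov(data, rowvar=False)        # covariance matrix, 3 x 3
corr = np.corrcoef(data, rowvar=False)  # correlation matrix, 3 x 3

# Both are square and symmetric; the correlation matrix has 1s on
# the diagonal, so every variable contributes one unit of variance.
print(np.allclose(corr, corr.T), np.diag(corr))
```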
Objectives of PCA:
 It is basically a non-dependent procedure that reduces the attribute space from a large number of variables to a smaller number of factors.
 PCA is essentially a dimensionality-reduction process, but there is no guarantee that the resulting dimensions are interpretable.
 The main challenge in PCA is selecting a subset of variables from the larger set, based on which original variables are most strongly correlated with the principal components.
Principal Axis Method: PCA looks for a linear combination of variables that extracts the maximum variance from the variables. Once this is done, it removes that variance and looks for another linear combination that explains the maximum share of the remaining variance, which results in orthogonal factors. In this method, we analyze the total variance.
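The extraction behavior described above can be sketched with scikit-learn on synthetic data (an assumption for illustration, not the article's dataset): the explained variances come out in decreasing order, and the component axes are mutually orthogonal:

```python
import numpy as np
from sklearn.decomposition import PCA

# Sketch of the principal-axis idea: each component is a linear
# combination of variables capturing the maximum remaining variance,
# and successive components come out orthogonal.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))  # correlated features

pca = PCA().fit(X)

# Explained variance is extracted in decreasing order...
print(pca.explained_variance_)
# ...and the component axes are mutually orthogonal (Gram matrix = I).
gram = pca.components_ @ pca.components_.T
print(np.allclose(gram, np.eye(4), atol=1e-8))
```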
Eigenvector: a nonzero vector that remains parallel to itself after matrix multiplication. Suppose x is an r-dimensional eigenvector of an r × r matrix M; then Mx and x are parallel. We solve Mx = λx, where x and λ are unknown, to obtain the eigenvectors and eigenvalues.
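The definition Mx = λx can be verified numerically with NumPy for a small symmetric matrix (the matrix here is an arbitrary example):

```python
import numpy as np

# Sketch of the eigenvector definition: x is an eigenvector of M
# if M @ x = lam * x, i.e. M @ x stays parallel to x.
M = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # small square symmetric matrix
eigenvalues, eigenvectors = np.linalg.eig(M)

# Check M x = lambda x for each eigenpair (columns of `eigenvectors`).
for lam, x in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(M @ x, lam * x))
```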
Regarding eigenvectors, we can say that principal components reflect both the common and the unique variance of a variable. It is a variance-focused approach that aims to reproduce the total variance and the correlations using all components. The principal components are linear combinations of the input variables, weighted by their contribution to explaining the variance along a particular orthogonal dimension.
Eigenvalues: also known as characteristic roots. An eigenvalue measures the variance across all variables that is accounted for by a factor; its size indicates the factor's explanatory importance with respect to the variables. If an eigenvalue is low, the factor contributes little to explaining the variables. In simple terms, it measures the amount of variance in the total dataset accounted for by a factor. The eigenvalue of a factor can be computed as the sum of its squared factor loadings over all variables.
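One consequence of this variance-accounting view can be checked directly (a sketch on synthetic data): since each standardized variable contributes one unit of variance, the eigenvalues of a correlation matrix sum to the number of variables:

```python
import numpy as np

# Sketch: eigenvalues of the correlation matrix measure how much
# variance each factor accounts for. With standardized variables
# (variance 1 each), the eigenvalues sum to the number of variables.
rng = np.random.default_rng(2)
data = rng.normal(size=(500, 5))
corr = np.corrcoef(data, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)  # eigenvalues of a symmetric matrix

print(eigenvalues.sum())  # total variance = 5 variables x 1 unit each
```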
Now, let's look at principal component analysis with Python.
The implementation below uses the Wine dataset (wines.csv).
Step 1: Importing Libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Step 2: Import the dataset
Import the dataset and split it into the X (feature) and y (target) components for data analysis.
dataset = pd.read_csv('wines.csv')
X = dataset.iloc[:, 0:13].values
y = dataset.iloc[:, 13].values

Step 3: Splitting the dataset into training and test sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

Step 4: Feature scaling
Perform preprocessing on the training and test sets, e.g. fitting them to the standard scale.
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

Step 5: Applying the PCA function
Apply the PCA function to the training and test sets for analysis.
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)
explained_variance = pca.explained_variance_ratio_
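As a side note, explained_variance_ratio_ can also guide the choice of n_components. The sketch below uses synthetic 13-feature data (an assumption, in case wines.csv is not at hand) to show the cumulative share of variance retained per component:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Sketch (synthetic data standing in for wines.csv):
# explained_variance_ratio_ gives the share of variance each
# component keeps; its cumulative sum helps justify n_components.
rng = np.random.default_rng(3)
X = rng.normal(size=(150, 13)) @ rng.normal(size=(13, 13))

X_scaled = StandardScaler().fit_transform(X)
pca = PCA().fit(X_scaled)
cumulative = np.cumsum(pca.explained_variance_ratio_)

print(cumulative)  # rises toward 1.0 as components are added
```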

Step 6: Fitting logistic regression to the training set
from sklearn.linear_model import LogisticRegression

classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, y_train)

Step 7: Predicting the test set results
y_pred = classifier.predict(X_test)

Step 8: Create a confusion matrix
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)

Step 9: Visualizing the training set results
from matplotlib.colors import ListedColormap

X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1,
                               stop=X_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=X_set[:, 1].min() - 1,
                               stop=X_set[:, 1].max() + 1, step=0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(),
             X2.ravel()]).T).reshape(X1.shape), alpha=0.75,
             cmap=ListedColormap(('yellow', 'white', 'aquamarine')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c=ListedColormap(('red', 'green', 'blue'))(i), label=j)
plt.title('Logistic Regression (Training set)')
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.legend()
plt.show()

Step 10: Visualizing the test set results
from matplotlib.colors import ListedColormap

X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1,
                               stop=X_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=X_set[:, 1].min() - 1,
                               stop=X_set[:, 1].max() + 1, step=0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(),
             X2.ravel()]).T).reshape(X1.shape), alpha=0.75,
             cmap=ListedColormap(('yellow', 'white', 'aquamarine')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c=ListedColormap(('red', 'green', 'blue'))(i), label=j)
plt.title('Logistic Regression (Test set)')
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.legend()
plt.show()
