  # Principal component analysis with Python

Uses of PCA:

• It is used to find relationships between variables in the data.
• It is used to interpret and visualize data.
• Reducing the number of variables makes further analysis simpler.
• It is often used to visualize genetic distance and relatedness between populations.

PCA is performed on a square symmetric matrix. This can be a pure sums-of-squares-and-cross-products (SSCP) matrix, a covariance matrix, or a correlation matrix. A correlation matrix is used when the variances of the individual variables differ greatly.
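As a quick sketch of that last point, using a small toy array (the data here is illustrative, not from the wine dataset): the covariance matrix depends on the scale of each variable, while the correlation matrix standardizes that scale away.

```python
import numpy as np

# Toy data: two variables carrying the same signal on very different scales
rng = np.random.default_rng(0)
x = rng.normal(size=100)              # unit-scale variable
y = 1000 * x + rng.normal(size=100)   # same signal, much larger scale

data = np.stack([x, y])  # rows are variables

cov = np.cov(data)        # scale-dependent: dominated by y's huge variance
corr = np.corrcoef(data)  # scale-free: the diagonal is all ones

print(cov[1, 1] > cov[0, 0])            # y's variance dwarfs x's
print(np.allclose(np.diag(corr), 1.0))  # correlations are standardized
```

If PCA were run on `cov` here, the first component would align almost entirely with `y` simply because of its scale, which is why the correlation matrix (or standardization, as in Step 4 below) is preferred when variances differ.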

Objectives of PCA:

• It is essentially an unsupervised procedure that reduces the attribute space from a larger number of variables to a smaller number of factors.
• PCA is fundamentally a dimensionality-reduction process, but there is no guarantee that the resulting dimensions are interpretable.
• The main task in PCA is to select a subset of variables from a larger set, based on which original variables are most strongly correlated with the principal components.

Principal Axis Method: PCA looks for the linear combination of variables that extracts the maximum variance from the variables. Once that is done, it removes this variance and looks for a second linear combination that explains the maximum share of the remaining variance, and so on, which results in orthogonal (uncorrelated) factors. In this method, we analyze total variance.
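A minimal sketch of these two properties, using scikit-learn's `PCA` on synthetic toy data (the data and component count are placeholders): the explained variances come out in non-increasing order, and the extracted components are mutually orthogonal.

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data: 5 correlated features generated from 2 latent factors plus noise
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))

pca = PCA(n_components=3)
pca.fit(X)

# Each component explains no more variance than the one before it
variances = pca.explained_variance_
print(np.all(np.diff(variances) <= 0))

# Components are orthonormal: their Gram matrix is the identity
gram = pca.components_ @ pca.components_.T
print(np.allclose(gram, np.eye(3), atol=1e-8))
```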

Eigenvector: a nonzero vector that remains parallel to itself after matrix multiplication. Suppose x is an r-dimensional eigenvector of an r × r matrix M; then Mx and x are parallel. To obtain the eigenvectors and eigenvalues, we solve Mx = λx, where both x and λ are unknown.
In terms of eigenvectors, the principal components show both the common and the unique variance of a variable. It is a variance-focused approach that seeks to reproduce the total variance and correlation with all components. The principal components are linear combinations of the input variables, weighted by their contribution to explaining the variance along a particular orthogonal dimension.
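The defining property Mx = λx can be checked directly with NumPy. A small sketch with a hypothetical 2 × 2 symmetric matrix:

```python
import numpy as np

# A small symmetric matrix M (hypothetical example values)
M = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# np.linalg.eigh computes eigenvalues/eigenvectors of a symmetric matrix;
# eigenvectors are returned as the columns of the second result
eigenvalues, eigenvectors = np.linalg.eigh(M)

# For each eigenpair, M @ x equals lambda * x, i.e. M @ x stays parallel to x
for lam, x in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(M @ x, lam * x))
```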

Eigenvalues: also known as characteristic roots. An eigenvalue measures the variance across all variables that is accounted for by a factor. The ratio of eigenvalues expresses the explanatory importance of the factors with respect to the variables; a factor with a low eigenvalue contributes little to explaining the variables. In simple terms, an eigenvalue measures the amount of variance in the whole dataset accounted for by a factor. The eigenvalue of a factor can be computed as the sum of its squared factor loadings across all variables.
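To make this concrete: the variances that scikit-learn's PCA reports are exactly the eigenvalues of the data's covariance matrix. A sketch on synthetic toy data (illustrative only):

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data: 300 samples of 4 correlated features
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))

# Eigenvalues of the covariance matrix, sorted in descending order
cov = np.cov(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]

# PCA's explained_variance_ is the same set of numbers
pca = PCA().fit(X)
print(np.allclose(eigvals, pca.explained_variance_))
```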

Now, let's look at principal component analysis with Python.

Step 1: Importing the libraries

```python
# import required libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
```

Step 2: Importing the dataset

Import the dataset and split it into the X and y components for data analysis.

```python
# import or load the dataset
dataset = pd.read_csv('wines.csv')

# split the dataset into the X and y components
X = dataset.iloc[:, 0:13].values
y = dataset.iloc[:, 13].values
```

Step 3: Splitting the dataset into the training set and the test set

```python
# split X and y into a training set
# and a test set
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)
```

Step 4: Feature scaling

Perform preprocessing on the training and test sets, here by fitting a standard scaler.

```python
# perform preprocessing
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
```

Step 5: Applying PCA

Apply the PCA function to the training and test sets for analysis.

```python
# apply PCA to the X components of the
# training and test sets
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)

explained_variance = pca.explained_variance_ratio_
```

Step 6: Fitting logistic regression to the training set

```python
# fit logistic regression to the training set
from sklearn.linear_model import LogisticRegression

classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, y_train)
```

Step 7: Predicting the test set results

```python
# predict the test set results using the
# predict method of LogisticRegression
y_pred = classifier.predict(X_test)
```

Step 8: Creating the confusion matrix

```python
# create the confusion matrix between the
# test set y values and the predicted values
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)
```
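As an aside on reading this matrix (using hypothetical toy labels rather than the wine data): the diagonal counts correct predictions, so accuracy is the trace divided by the total number of samples.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels for three classes
y_true = [1, 1, 2, 2, 3, 3, 3, 1]
y_pred = [1, 2, 2, 2, 3, 3, 1, 1]

cm = confusion_matrix(y_true, y_pred)

# Correct predictions sit on the diagonal
accuracy = np.trace(cm) / cm.sum()
print(accuracy)  # 6 of the 8 labels match, so 0.75
```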

Step 9: Visualizing the training set results

```python
# visualize the training set results
# with a scatter plot
from matplotlib.colors import ListedColormap

X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1,
                               stop=X_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=X_set[:, 1].min() - 1,
                               stop=X_set[:, 1].max() + 1, step=0.01))

plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(),
             X2.ravel()]).T).reshape(X1.shape), alpha=0.75,
             cmap=ListedColormap(('yellow', 'white', 'aquamarine')))

plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())

for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c=ListedColormap(('red', 'green', 'blue'))(i), label=j)

plt.title('Logistic Regression (Training set)')
plt.xlabel('PC1')  # x-axis label
plt.ylabel('PC2')  # y-axis label
plt.legend()       # show the legend

# show the scatter plot
plt.show()
```

Step 10: Visualizing the test set results

```python
# visualize the test set results
# with a scatter plot
from matplotlib.colors import ListedColormap

X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1,
                               stop=X_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=X_set[:, 1].min() - 1,
                               stop=X_set[:, 1].max() + 1, step=0.01))

plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(),
             X2.ravel()]).T).reshape(X1.shape), alpha=0.75,
             cmap=ListedColormap(('yellow', 'white', 'aquamarine')))

plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())

for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c=ListedColormap(('red', 'green', 'blue'))(i), label=j)

# title for the plot
plt.title('Logistic Regression (Test set)')
plt.xlabel('PC1')  # x-axis label
plt.ylabel('PC2')  # y-axis label
plt.legend()

# show the scatter plot
plt.show()
```