  # ML | T-distributed stochastic neighbor embedding (t-SNE) algorithm


### What is dimensionality reduction?

Dimensionality reduction is a technique for representing n-dimensional data (high-dimensional data with many features) in 2 or 3 dimensions.

As an example, consider a classification problem: predicting whether a student will play football or not. The outcome depends on both temperature and humidity, but since these two features are highly correlated, they can be summarized by a single underlying feature. In such problems we can therefore reduce the number of features. A three-dimensional classification problem is hard to visualize, whereas a two-dimensional one can be mapped to a simple 2-D plane, and a one-dimensional one to a simple line.
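The idea of collapsing two correlated features into one can be sketched with synthetic data. The temperature/humidity numbers below are made up for illustration; the projection onto the first principal axis is one simple way to do the reduction, not part of t-SNE itself.

```python
import numpy as np

# Hypothetical data: humidity is strongly correlated with temperature.
rng = np.random.default_rng(0)
temperature = rng.normal(25, 5, size=100)
humidity = 2.0 * temperature + rng.normal(0, 1, size=100)

X = np.column_stack([temperature, humidity])
X_centered = X - X.mean(axis=0)

# Project onto the first principal direction (via SVD);
# this keeps almost all the variance of the two correlated features.
_, _, vt = np.linalg.svd(X_centered, full_matrices=False)
one_dim = X_centered @ vt[0]   # 2-D data reduced to 1-D

print(one_dim.shape)           # (100,)
```

The single reduced feature still tracks the original temperature almost perfectly, which is exactly why the second feature was redundant.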

### How does t-SNE work?

t-SNE is a nonlinear dimensionality reduction algorithm that finds patterns in the data based on the similarity of data points in feature space. The similarity of two points is calculated as the conditional probability that point A would choose point B as its neighbor.
It then tries to minimize the difference between these conditional probabilities (similarities) in the high-dimensional and low-dimensional spaces, so that the data points are represented as faithfully as possible in the low-dimensional space.
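The high-dimensional similarities described above can be sketched directly in NumPy. This is a simplified illustration: the real algorithm tunes a per-point Gaussian bandwidth sigma_i to match a target perplexity, whereas here a single fixed sigma is assumed for all points.

```python
import numpy as np

def conditional_probabilities(X, sigma=1.0):
    """p_{j|i}: probability that point i picks point j as its neighbor,
    using a Gaussian kernel centred on x_i (fixed sigma for simplicity)."""
    # Squared Euclidean distances between all pairs of points.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    P = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(P, 0.0)            # a point is never its own neighbor
    P /= P.sum(axis=1, keepdims=True)   # normalise each row so it sums to 1
    return P

X = np.random.default_rng(0).normal(size=(5, 3))
P = conditional_probabilities(X)
print(P.shape)   # (5, 5); each row sums to 1
```

t-SNE computes a matching set of similarities for the low-dimensional points (using a Student t-distribution instead of a Gaussian) and moves the low-dimensional points to minimize the divergence between the two distributions.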

### Space and time complexity

Because t-SNE computes pairwise similarities between all data points, both its time and space complexity are quadratic, O(n²), in the number of points. This is why the example below runs t-SNE on only a subset of the dataset.

### Applying t-SNE to the MNIST dataset

```python
# Import required modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sn
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler
```


```python
# Read the data using pandas
df = pd.read_csv('mnist_train.csv')

# Print the first four rows
print(df.head(4))

# Save the labels into a variable l
l = df['label']

# Drop the label column and keep the pixel data in d
d = df.drop('label', axis=1)
```

Output:

Code #2: Data preprocessing

```python
# Data preprocessing: standardize the data
standardized_data = StandardScaler().fit_transform(d)
print(standardized_data.shape)
```

Output:

Code #3: Applying t-SNE

```python
# Pick the first 1000 points, since t-SNE
# takes a long time on all 15K points
data_1000 = standardized_data[0:1000, :]
labels_1000 = l[0:1000]

model = TSNE(n_components=2, random_state=0)
# Parameters:
# n_components = 2
# perplexity = 30 (default)
# learning rate = 200 (default)
# maximum number of iterations for optimization = 1000 (default)

tsne_data = model.fit_transform(data_1000)

# Create a new data frame to help us plot the result
tsne_data = np.vstack((tsne_data.T, labels_1000)).T
tsne_df = pd.DataFrame(data=tsne_data, columns=('Dim_1', 'Dim_2', 'label'))

# Plot the t-SNE result
sn.FacetGrid(tsne_df, hue='label', height=6).map(
    plt.scatter, 'Dim_1', 'Dim_2').add_legend()

plt.show()
```

Output: a 2-D scatter plot in which points with the same digit label form visually separated clusters.
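The walkthrough above assumes a local `mnist_train.csv` file. As a self-contained variant, the same steps can be run on scikit-learn's bundled `digits` dataset (8x8 digit images), which needs no external file; this is a sketch of the same pipeline, not the original article's exact data.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Load the bundled 8x8 digits dataset and standardize it,
# mirroring the MNIST preprocessing steps above.
digits = load_digits()
X = StandardScaler().fit_transform(digits.data)[:500]
y = digits.target[:500]

# Embed the first 500 points into 2 dimensions with t-SNE.
model = TSNE(n_components=2, perplexity=30, random_state=0)
emb = model.fit_transform(X)

print(emb.shape)   # (500, 2)
```

The resulting `emb` array can be plotted exactly as in the FacetGrid example above, colouring each point by its label `y`.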