
ML | t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm

What is dimensionality reduction?
Dimensionality reduction is a technique for representing n-dimensional data (multidimensional data with many features) with fewer dimensions, typically 2 or 3 when the goal is visualization.

As an example, consider a classification problem such as predicting whether a student will play football, which depends on both temperature and humidity. Because these two features are highly correlated, they can be summarized by a single underlying feature, so the number of features in such problems can be reduced. A three-dimensional classification problem is hard to visualize, whereas a two-dimensional one maps onto a simple 2-D plane and a one-dimensional one onto a simple line.
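To make the football example concrete, here is a minimal sketch that reduces two correlated features to one. It uses principal component analysis (PCA), a linear method swapped in here for simplicity; it is not t-SNE. The temperature and humidity values are synthetic and hypothetical: humidity is generated to track temperature closely, so one direction should capture almost all of the variation.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Hypothetical, synthetic data: humidity is made to follow temperature closely.
temperature = rng.uniform(10, 35, size=n)
humidity = 2.0 * temperature + rng.normal(0, 1.0, size=n)
X = np.column_stack([temperature, humidity])  # shape (200, 2)

# Center each feature, then find the direction of maximum variance
# (the first principal component) with an SVD.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S ** 2 / np.sum(S ** 2)  # fraction of variance per component

# Project both features onto that single direction: 2-D -> 1-D.
X_1d = Xc @ Vt[0]

print(X.shape, "->", X_1d.shape)
print("variance kept by one component:", round(float(explained[0]), 4))
```

Because the two features are nearly collinear, the single projected feature keeps almost all of the information, which is exactly the situation the example describes.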

How does t-SNE work?
t-SNE is a nonlinear dimensionality reduction algorithm that finds patterns in the data by measuring how similar data points with many features are to one another. The similarity between two points is computed as the conditional probability that point A would choose point B as its neighbor.
It then tries to minimize the difference between these conditional probabilities (similarities) in the high-dimensional space and in the low-dimensional space, measured with the Kullback-Leibler divergence, so that the low-dimensional representation reflects the structure of the original data as faithfully as possible.
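The two steps above can be sketched in NumPy. This is a minimal, unoptimized illustration, assuming the formulation from the original t-SNE paper (van der Maaten and Hinton, 2008): p_{j|i} uses a Gaussian whose width sigma_i is found by binary search so that each row matches a chosen perplexity, the conditional probabilities are symmetrized into joint probabilities p_ij, the low-dimensional similarities q_ij use a Student t-distribution with one degree of freedom, and the mismatch is the KL divergence. The function names are hypothetical, and the gradient-descent step that actually moves the low-dimensional points is omitted.

```python
import numpy as np


def conditional_probabilities(X, perplexity=30.0, tol=1e-5, max_iter=100):
    """Return P_cond with P_cond[i, j] = p_{j|i} (zero on the diagonal)."""
    n = X.shape[0]
    # Squared Euclidean distances between every pair of points.
    sq_norms = np.sum(X ** 2, axis=1)
    D = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    target_entropy = np.log(perplexity)  # perplexity = exp(entropy in nats)
    P_cond = np.zeros((n, n))
    for i in range(n):
        d = np.delete(D[i], i)  # a point is never its own neighbor
        beta, beta_lo, beta_hi = 1.0, 0.0, np.inf  # beta = 1 / (2 * sigma_i**2)
        for _ in range(max_iter):
            w = np.exp(-(d - d.min()) * beta)  # shifting by d.min() avoids underflow
            p = w / w.sum()
            entropy = -np.sum(p[p > 0] * np.log(p[p > 0]))
            if abs(entropy - target_entropy) < tol:
                break
            if entropy > target_entropy:
                # Too spread out: narrow the Gaussian (larger beta).
                beta_lo = beta
                beta = beta * 2.0 if np.isinf(beta_hi) else (beta + beta_hi) / 2.0
            else:
                # Too concentrated: widen the Gaussian (smaller beta).
                beta_hi = beta
                beta = (beta + beta_lo) / 2.0
        P_cond[i, np.arange(n) != i] = p
    return P_cond


def tsne_kl_divergence(P_cond, Y):
    """KL(P || Q) between the high-dim joint P and the low-dim Student-t joint Q."""
    n = P_cond.shape[0]
    P = (P_cond + P_cond.T) / (2.0 * n)  # symmetrized p_ij, sums to 1
    sq_norms = np.sum(Y ** 2, axis=1)
    dY = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * Y @ Y.T, 0.0)
    num = 1.0 / (1.0 + dY)  # Student t-distribution with one degree of freedom
    np.fill_diagonal(num, 0.0)
    Q = num / num.sum()
    mask = P > 0
    return np.sum(P[mask] * np.log(P[mask] / Q[mask])), P, Q


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))  # random high-dimensional points (illustrative only)
Y = rng.normal(size=(50, 2))   # a random 2-D layout that t-SNE would then optimize
P_cond = conditional_probabilities(X, perplexity=10.0)
kl, P, Q = tsne_kl_divergence(P_cond, Y)
print("rows of p_{j|i} sum to 1:", np.allclose(P_cond.sum(axis=1), 1.0))
print("KL(P || Q) for the random layout:", round(float(kl), 4))
```

A real implementation would repeatedly update Y along the gradient of this KL divergence; a random layout like the one above simply gives a starting value to be reduced.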

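Space and time complexity
Because t-SNE computes conditional probabilities for every pair of points, the exact algorithm has quadratic time and space complexity in the number of data points, which makes it slow and memory-hungry on large datasets.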
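The exact algorithm keeps dense n × n matrices (pairwise distances and similarities), so memory grows with the square of the number of points. The back-of-the-envelope sketch below assumes one float64 matrix (8 bytes per entry); the function name is hypothetical. For comparison, scikit-learn's TSNE defaults to method="barnes_hut", an O(N log N) approximation, while method="exact" runs the O(N²) algorithm.

```python
def pairwise_matrix_bytes(n_points, bytes_per_value=8):
    """Memory for one dense n x n matrix of float64 values (8 bytes each)."""
    return n_points * n_points * bytes_per_value


for n in (1_000, 15_000, 60_000):
    gb = pairwise_matrix_bytes(n) / 1e9
    print(f"{n:>6} points -> {gb:8.3f} GB per n x n matrix")
```

Doubling the number of points quadruples the memory for each such matrix, which is why the MNIST example below works on a 1000-point subset.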

Applying t-SNE to the MNIST dataset

# Import required modules.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sn
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

Code # 1: Reading data


# Read the data using pandas
df = pd.read_csv('mnist_train.csv')

# print the first five rows of df
print(df.head(5))

# save the labels into the variable "labels"
labels = df['label']

# drop the label column and save the pixel data into "data"
data = df.drop("label", axis=1)

Output:

Code # 2: data preprocessing

# Data preprocessing: standardize the data
from sklearn.preprocessing import StandardScaler

standardized_data = StandardScaler().fit_transform(data)
print(standardized_data.shape)

Output:
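Standardization rescales each column to zero mean and unit variance, z = (x - mean) / std. The sketch below checks, on a tiny hypothetical matrix rather than MNIST, that StandardScaler matches this formula computed by hand. It uses the population standard deviation and leaves zero-variance columns (for example, a pixel that is 0 in every image) unscaled.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the pixel matrix; column 0 is constant.
X = np.array([[0.0, 10.0, 255.0],
              [0.0, 20.0, 0.0],
              [0.0, 30.0, 128.0]])

scaled = StandardScaler().fit_transform(X)

# Same computation by hand: subtract the column mean,
# divide by the population standard deviation (ddof=0).
mean = X.mean(axis=0)
std = X.std(axis=0)
std[std == 0] = 1.0  # zero-variance columns are only centered, not scaled
manual = (X - mean) / std

print(np.allclose(scaled, manual))  # True
```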

Code # 3: Applying t-SNE

# t-SNE
# Use only the first 1000 points, since t-SNE
# takes a long time on 15K points
data_1000 = standardized_data[0:1000, :]
labels_1000 = labels[0:1000]

model = TSNE(n_components=2, perplexity=30, random_state=0)

# parameter settings:
# number of components = 2
# perplexity = 30 (the default)
# learning rate = 200 by default in older scikit-learn releases ("auto" in newer ones)
# maximum number of optimization iterations = 1000 (the default)

tsne_data = model.fit_transform(data_1000)

# create a new data frame that
# helps us plot the result
tsne_data = np.vstack((tsne_data.T, labels_1000)).T
tsne_df = pd.DataFrame(data=tsne_data,
                       columns=("Dim_1", "Dim_2", "label"))

# Plot the t-SNE result
sn.FacetGrid(tsne_df, hue="label", height=6).map(
    plt.scatter, 'Dim_1', 'Dim_2').add_legend()

plt.show()

Output:
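If mnist_train.csv is not at hand, the same pipeline can be tried on the small 8×8 digits dataset bundled with scikit-learn (load_digits), used here as a stand-in, not the MNIST data above, so results will differ. The sketch also reports scikit-learn's trustworthiness score (how well local neighborhoods survive the projection, from 0 to 1) and the final KL divergence stored in the fitted model's kl_divergence_ attribute.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE, trustworthiness
from sklearn.preprocessing import StandardScaler

# 8x8 digit images bundled with scikit-learn (a stand-in for MNIST).
digits = load_digits()
X = StandardScaler().fit_transform(digits.data[:500])

model = TSNE(n_components=2, perplexity=30, random_state=0)
embedding = model.fit_transform(X)

# Fraction of each point's low-dimensional neighbors that were also
# near neighbors in the original space (1.0 = perfectly preserved).
score = trustworthiness(X, embedding, n_neighbors=5)

print(embedding.shape)  # (500, 2)
print("trustworthiness:", round(float(score), 3))
print("final KL divergence:", round(float(model.kl_divergence_), 3))
```

To color the points by digit, digits.target[:500] can serve as the label column in the same FacetGrid plotting code.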
