How do autoencoders work?


Autoencoders are models that learn a low-dimensional representation of a dataset by exploiting the strong nonlinearity of neural networks. An autoencoder consists of two parts:

Encoder - This transforms the high-dimensional input into a short, compact code.
Decoder - This transforms the short code back into a high-dimensional reconstruction of the input.

Assume that X is a collection of samples drawn from a data-generating process p_data(x), with x_i ∈ ℝ^n. No restrictions are placed on the structure of the support; for example, for RGB images x_i ∈ ℝ^(n × m × 3).

Here's a simple illustration of a generic autoencoder:

For a p-dimensional code vector, the encoder is defined by a parameterized function e(•):

z_i = e(x_i; θ_e), with z_i ∈ ℝ^p

Similarly, the decoder is another parameterized function d(•):

x̃_i = d(z_i; θ_d), with x̃_i ∈ ℝ^n

So, given an input sample x_i, the full autoencoder is the composition of the two functions and produces a reconstruction as output:

x̃_i = d(e(x_i; θ_e); θ_d)

Because the autoencoder is usually implemented as a neural network, it is trained with the backpropagation algorithm, typically with a mean squared error cost function:

L(θ_e, θ_d) = (1/M) Σ_i ||x_i − d(e(x_i; θ_e); θ_d)||²
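To make this concrete, here is a minimal NumPy sketch of the composition d(e(x)) and the mean squared error reconstruction loss. It uses a single untrained linear layer for each part; the dimensions n and p and the weight matrices W_e and W_d are illustrative assumptions, not part of the convolutional model built later in this article:

import numpy as np

# Illustrative dimensions: n-dimensional inputs, p-dimensional codes (p < n)
n, p = 64, 8
rng = np.random.RandomState(1000)

# Randomly initialized (untrained) encoder / decoder weights -- assumptions for this sketch
W_e = rng.normal(scale=0.1, size=(n, p))
W_d = rng.normal(scale=0.1, size=(p, n))

def encode(x):
    # e(x): map the high-dimensional input to a short code
    return np.tanh(x @ W_e)

def decode(z):
    # d(z): map the code back to the input space
    return z @ W_d

X = rng.normal(size=(10, n))       # a small batch of samples
X_rec = decode(encode(X))          # full autoencoder: x_tilde = d(e(x))
mse = np.mean(np.sum((X - X_rec) ** 2, axis=1))
print(mse)

Training would adjust W_e and W_d (here left random) to drive this loss down by backpropagation.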

"

Alternatively, looking at the data-generating process, we can consider a parameterized conditional distribution q(x̃_i | x_i; θ) and restate the objective as reproducing p_data:

This turns the cost function into the Kullback-Leibler divergence between p_data(•) and q(•):

D_KL(p_data || q) = E_{x ~ p_data}[log p_data(x) − log q(x; θ)]

During the optimization, the term depending only on p_data can be dropped because its entropy is constant, so the problem reduces to minimizing the cross-entropy between p_data and q. If p_data and q are assumed to be Gaussian, the Kullback-Leibler cost function and the mean squared error are equivalent, so the two approaches are interchangeable.
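The reasoning behind this step can be made explicit by splitting the divergence into two terms (same notation as above):

D_KL(p_data || q) = −H(p_data) + H(p_data, q)

Since the entropy H(p_data) does not depend on θ, minimizing the divergence over θ is the same as minimizing the cross-entropy H(p_data, q).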

In some cases, a Bernoulli distribution can be adopted for either p_data or q, but this is only possible when the data range is normalized to (0, 1). Strictly speaking this is not formally correct, because the Bernoulli distribution is binary and x_i ∈ {0, 1}^d; however, using sigmoid output units also leads to an effective optimization for continuous samples x_i ∈ (0, 1)^d. The cost function then becomes the binary cross-entropy:

L(θ) = −Σ_i [x_i · log x̃_i + (1 − x_i) · log(1 − x̃_i)]

Implementing a deep convolutional autoencoder —

Now let's look at an example of a deep convolutional autoencoder based on TensorFlow. We will use the Olivetti Faces dataset, since it is small, fit for purpose, and contains many facial expressions.

Step #1: Load the 400 64×64 grayscale samples to prepare the training set:

Step #2: To speed up computation and avoid memory problems, we will resize the images to 32×32, at the cost of a small loss of visual fidelity. You can skip this step if you have enough computational resources (the resizing itself is performed inside the TensorFlow graph with tf.image.resize_images, as shown in the encoder code below).

Step #3: Let's define the basic constants:

-> number of epochs (nb_epochs)
-> batch_size
-> code_length
-> graph

from sklearn.datasets import fetch_olivetti_faces

# Fetch the Olivetti faces dataset (400 grayscale 64x64 images)
faces = fetch_olivetti_faces(shuffle=True, random_state=1000)
X_train = faces['images']

import tensorflow as tf

# Basic constants
nb_epochs = 600
batch_size = 50
code_length = 256
width = 32
height = 32

graph = tf.Graph()

Step #4: Using 50 samples per batch, we will train the model for 600 epochs. With an image size of 64×64 = 4,096 pixels and a code length of 256, the compression ratio is 4,096/256 = 16. You can always try different configurations to maximize the convergence speed and the final accuracy.

Step #5: Model the encoder with these layers:

-> 2D convolution with 16 (3×3) filters, (2×2) strides, ReLU activation, and same padding.
-> 2D convolution with 32 (3×3) filters, (1×1) strides, ReLU activation, and same padding.
-> 2D convolution with 64 (3×3) filters, (1×1) strides, ReLU activation, and same padding.
-> 2D convolution with 128 (3×3) filters, (1×1) strides, ReLU activation, and same padding.

Step #6: The decoder performs deconvolutions (a sequence of transposed convolutions):

-> 2D transposed convolution with 128 (3×3) filters, (2×2) strides, ReLU activation, and same padding.
-> 2D transposed convolution with 64 (3×3) filters, (1×1) strides, ReLU activation, and same padding.
-> 2D transposed convolution with 32 (3×3) filters, (1×1) strides, ReLU activation, and same padding.
-> 2D transposed convolution with 1 (3×3) filter, (1×1) strides, sigmoid activation, and same padding.

The loss function is based on the L2 difference between the reconstructions and the original images. The optimizer is Adam with a learning rate of α = 0.001. Let's now look at the encoder part of the TensorFlow DAG:

import tensorflow as tf

with graph.as_default():
    # Placeholder for the original 64x64 grayscale images
    input_images_xl = tf.placeholder(tf.float32,
                                     shape=(None, X_train.shape[1],
                                            X_train.shape[2], 1))

    # Resize the images to 32x32 inside the graph
    input_images = tf.image.resize_images(input_images_xl,
                                          (width, height),
                                          method=tf.image.ResizeMethod.BICUBIC)

    # Encoder
    conv_0 = tf.layers.conv2d(inputs=input_images,
                              filters=16,
                              kernel_size=(3, 3),
                              strides=(2, 2),
                              activation=tf.nn.relu,
                              padding='same')

    conv_1 = tf.layers.conv2d(inputs=conv_0,
                              filters=32,
                              kernel_size=(3, 3),
                              activation=tf.nn.relu,
                              padding='same')

    conv_2 = tf.layers.conv2d(inputs=conv_1,
                              filters=64,
                              kernel_size=(3, 3),
                              activation=tf.nn.relu,
                              padding='same')

    conv_3 = tf.layers.conv2d(inputs=conv_2,
                              filters=128,
                              kernel_size=(3, 3),
                              activation=tf.nn.relu,
                              padding='same')

Below is the code-layer part of the DAG:

import tensorflow as tf

with graph.as_default():
    # Code layer
    code_input = tf.layers.flatten(inputs=conv_3)

    code_layer = tf.layers.dense(inputs=code_input,
                                 units=code_length,
                                 activation=tf.nn.sigmoid)

    # Mean code value (used to monitor how dense the codes are)
    code_mean = tf.reduce_mean(code_layer, axis=1)

Now let's take a look at the decoder part of the DAG:

import tensorflow as tf

with graph.as_default():
    # Decoder
    decoder_input = tf.reshape(code_layer,
                               (-1, int(width / 2),
                                int(height / 2), 1))

    convt_0 = tf.layers.conv2d_transpose(inputs=decoder_input,
                                         filters=128,
                                         kernel_size=(3, 3),
                                         strides=(2, 2),
                                         activation=tf.nn.relu,
                                         padding='same')

    convt_1 = tf.layers.conv2d_transpose(inputs=convt_0,
                                         filters=64,
                                         kernel_size=(3, 3),
                                         activation=tf.nn.relu,
                                         padding='same')

    convt_2 = tf.layers.conv2d_transpose(inputs=convt_1,
                                         filters=32,
                                         kernel_size=(3, 3),
                                         activation=tf.nn.relu,
                                         padding='same')

    convt_3 = tf.layers.conv2d_transpose(inputs=convt_2,
                                         filters=1,
                                         kernel_size=(3, 3),
                                         activation=tf.sigmoid,
                                         padding='same')

    # Upscale the reconstructions back to the original 64x64 size
    output_images = tf.image.resize_images(convt_3, (X_train.shape[1],
                                                     X_train.shape[2]),
                                           method=tf.image.ResizeMethod.BICUBIC)

Step #7: This is how to define the loss function and the Adam optimizer:

import tensorflow as tf

with graph.as_default():
    # Loss (L2 difference between reconstructions and resized inputs)
    loss = tf.nn.l2_loss(convt_3 - input_images)

    # Training step
    training_step = tf.train.AdamOptimizer(0.001).minimize(loss)

Step #8: Now that the full DAG is defined, we can start a session and initialize all the variables:

import tensorflow as tf

# Create the session and initialize all variables
session = tf.InteractiveSession(graph=graph)
tf.global_variables_initializer().run()

Step #9: After initialization, we can start the training process:

import numpy as np

for e in range(nb_epochs):
    np.random.shuffle(X_train)

    total_loss = 0.0
    code_means = []

    for i in range(0, X_train.shape[0] - batch_size, batch_size):
        # Add a channel axis so the batch matches the placeholder shape
        X = np.expand_dims(X_train[i:i + batch_size, :, :],
                           axis=3).astype(np.float32)

        _, n_loss, c_mean = session.run([training_step, loss, code_mean],
                                        feed_dict={input_images_xl: X})

        total_loss += n_loss
        code_means.append(c_mean)

    print('Epoch {}) Average loss per sample: {} (Code mean: {})'.
          format(e + 1, total_loss / float(X_train.shape[0]),
                 np.mean(code_means)))

Output:

Epoch 1) Average loss per sample: 11.933397521972656 (Code mean: 0.5420681238174438)
Epoch 2) Average loss per sample: 10.294102325439454 (Code mean: 0.4132006764411926)
Epoch 3) Average loss per sample: 9.917563934326171 (Code mean: 0.38105469942092896)
...
Epoch 600) Average loss per sample: 0.4635812330245972 (Code mean: 0.42368677258491516)

When training ends, the average loss per sample is about 0.46 (on 32×32 images) and the average code value is about 0.42. Since the sigmoid code values lie in (0, 1), a mean close to 0.5 shows that the encoding is relatively dense. This is worth keeping in mind when comparing the sparsity of the results.

Some sample images led to the following autoencoder output:
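The snippet used to generate such reconstructions is not shown above; a minimal sketch, reusing the session, output_images, and input_images_xl tensors defined earlier, could look like this (the choice of the first ten samples and the matplotlib display are assumptions):

import numpy as np
import matplotlib.pyplot as plt

# Reconstruct a few training samples with the trained autoencoder
X_sample = np.expand_dims(X_train[0:10, :, :], axis=3).astype(np.float32)
Y_sample = session.run(output_images, feed_dict={input_images_xl: X_sample})

# Y_sample has shape (10, 64, 64, 1); squeeze the channel axis for display
reconstructions = np.squeeze(Y_sample, axis=3)
plt.imshow(reconstructions[0], cmap='gray')
plt.show()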

When the image size is increased to 64×64, the quality of the reconstruction is partially degraded. However, this can be mitigated by lowering the compression ratio, for example by increasing the code length.


