  # How do autoencoders work?


Autoencoders are models that find low-dimensional representations of a dataset by exploiting the strong nonlinearity of neural networks. An autoencoder consists of two parts:

Encoder - Transforms the high-dimensional input into a short, compact code.
Decoder - Transforms the code back into a high-dimensional reconstruction of the input.

Assume that X is a collection of samples drawn from a data-generating process, $p_{data}(x)$. Suppose $x_i \in \mathbb{R}^n$, with no restrictions on the structure of the support. For RGB images, for example, $x_i \in \mathbb{R}^{n \times m \times 3}$.

Here's a simple illustration of a generic autoencoder:

For a p-dimensional code vector, the encoder is defined as a parameterized function e(·):

$$z_i = e(x_i; \theta_e), \quad z_i \in \mathbb{R}^p$$

Similarly, the decoder is another parameterized function d(·):

$$\tilde{x}_i = d(z_i; \theta_d)$$

So, given an input sample $x_i$, the full autoencoder is the composite function that outputs a reconstruction:

$$\tilde{x}_i = d(e(x_i; \theta_e); \theta_d)$$
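As a minimal sketch of this composition, the NumPy snippet below builds a toy encoder and decoder. The dimensions, the single weight matrix per function, and the `tanh` nonlinearity are illustrative assumptions, not part of the model described in this article.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 3  # input dimension n and code dimension p (illustrative values)

# Hypothetical parameters theta_e and theta_d: one weight matrix each
theta_e = rng.normal(size=(n, p))
theta_d = rng.normal(size=(p, n))

def e(x, theta):
    """Encoder: map an n-dimensional input to a p-dimensional code."""
    return np.tanh(x @ theta)

def d(z, theta):
    """Decoder: map a p-dimensional code back to n dimensions."""
    return z @ theta

x_i = rng.normal(size=n)
z_i = e(x_i, theta_e)      # code, shape (p,)
x_tilde = d(z_i, theta_d)  # reconstruction, shape (n,)
print(z_i.shape, x_tilde.shape)
```

With untrained random weights the reconstruction is meaningless; the point is only that the output has the same shape as the input while the code is much shorter.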

Because autoencoders are usually implemented as neural networks, they are trained with backpropagation, typically using a mean squared error cost function.
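A squared-error cost of this kind can be sketched in NumPy as follows. The function name `mse_cost` and the choice to average the per-sample squared L2 distance over the batch are assumptions made for illustration; other normalizations are equally common.

```python
import numpy as np

def mse_cost(X, X_tilde):
    """Mean over samples of the squared L2 distance between input and reconstruction."""
    diff = X.reshape(len(X), -1) - X_tilde.reshape(len(X_tilde), -1)
    return np.mean(np.sum(diff ** 2, axis=1))

X = np.array([[0.0, 1.0], [1.0, 0.0]])        # two toy samples
X_tilde = np.array([[0.0, 0.5], [1.0, 0.0]])  # their reconstructions
print(mse_cost(X, X_tilde))  # 0.125
```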

Alternatively, from the viewpoint of the data-generating process, the goal can be re-expressed in terms of a parameterized conditional distribution $q(\cdot)$ that should reproduce the input.

The cost function then becomes the Kullback-Leibler divergence between $p_{data}(\cdot)$ and $q(\cdot)$, which splits into a cross-entropy term and an entropy term:

$$D_{KL}(p_{data} \,\|\, q) = H(p_{data}, q) - H(p_{data})$$

Because the entropy of $p_{data}$ is constant, the optimization process can ignore it, so minimizing the divergence is equivalent to minimizing the cross-entropy between $p_{data}$ and $q$. If $p_{data}$ and $q$ are assumed to be Gaussian, the Kullback-Leibler cost function is equivalent to the mean squared error, so the two approaches are interchangeable.
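The relationship between the divergence, the cross-entropy, and the entropy can be checked numerically. The two discrete distributions below are made-up values chosen only to illustrate the identity.

```python
import numpy as np

p_data = np.array([0.5, 0.3, 0.2])  # hypothetical data distribution
q = np.array([0.4, 0.4, 0.2])       # hypothetical model distribution

entropy = -np.sum(p_data * np.log(p_data))   # H(p_data), constant w.r.t. q
cross_entropy = -np.sum(p_data * np.log(q))  # H(p_data, q)
kl = np.sum(p_data * np.log(p_data / q))     # D_KL(p_data || q)

print(np.isclose(kl, cross_entropy - entropy))  # True
```

Since `entropy` does not depend on `q`, any change to `q` that lowers the cross-entropy lowers the divergence by the same amount.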

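In some cases, when the data is normalized to the range (0, 1), you can adopt a Bernoulli distribution for $p_{data}$ and $q$. Formally, this is not entirely correct, because a Bernoulli distribution is binary and requires $x_i \in \{0, 1\}^d$. However, using sigmoid output units also leads to a successful optimization for continuous samples, $x_i \in (0, 1)^d$. In this case, the cost function becomes the cross-entropy below, where $x_{ij}$ is the j-th component of $x_i$ and $\tilde{x}_{ij}$ is the j-th component of its reconstruction:

$$C(X; \theta) = -\sum_i \sum_j \left[ x_{ij} \log \tilde{x}_{ij} + (1 - x_{ij}) \log(1 - \tilde{x}_{ij}) \right]$$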
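A Bernoulli cost of this kind is the familiar binary cross-entropy. The sketch below is a NumPy illustration; the function name `bernoulli_cross_entropy` and the clipping constant `eps` are assumptions added for numerical safety, not part of the original text.

```python
import numpy as np

def bernoulli_cross_entropy(X, X_tilde, eps=1e-7):
    """Binary cross-entropy summed over samples and components."""
    X_tilde = np.clip(X_tilde, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(X * np.log(X_tilde) + (1.0 - X) * np.log(1.0 - X_tilde))

X = np.array([[0.9, 0.1], [0.2, 0.8]])     # continuous samples in (0, 1)
good = np.array([[0.9, 0.1], [0.2, 0.8]])  # perfect reconstruction
bad = np.array([[0.1, 0.9], [0.8, 0.2]])   # inverted reconstruction

print(bernoulli_cross_entropy(X, good) < bernoulli_cross_entropy(X, bad))  # True
```

Even though the samples are not binary, the cost is still smallest when the reconstruction matches the input, which is why the approach works in practice with sigmoid outputs.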

### Implementing a deep convolutional autoencoder

Now let's look at an example of a deep convolutional autoencoder based on TensorFlow. We will use the Olivetti faces dataset, as it is small, fit for purpose, and contains many expressions.

Step #1: Load the 400 grayscale 64 × 64 samples to prepare the training set:

```python
from sklearn.datasets import fetch_olivetti_faces

# Load the Olivetti faces; the images array has shape (400, 64, 64)
faces = fetch_olivetti_faces(shuffle=True, random_state=1000)
X_train = faces['images']
```

Step #2: To speed up computation and avoid memory problems, we will resize the images to 32 × 32. This costs a little visual fidelity. You can skip this step if you have plenty of computational resources.

Step #3: Let's define the basic constants:

- number of epochs (nb_epochs)
- batch_size
- code_length
- graph

```python
import tensorflow as tf  # TensorFlow 1.x API

nb_epochs = 600
batch_size = 50
code_length = 256

# Target size of the resized images
width = 32
height = 32

graph = tf.Graph()
```

Step #4: Using 50 samples per batch, we will train the model for 600 epochs. With an original image size of 64 × 64 = 4,096 pixels and a code length of 256, the compression ratio is 4,096/256 = 16. (Relative to the 32 × 32 resized input, the ratio is 1,024/256 = 4.) You can always try different configurations to improve the convergence rate and the final accuracy.
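The compression ratio is plain arithmetic and can be checked directly. The variable names `original_pixels` and `resized_pixels` are introduced here for illustration; the second ratio is measured against the 32 × 32 images that the network actually sees after resizing.

```python
code_length = 256

original_pixels = 64 * 64  # 4,096 values per original image
resized_pixels = 32 * 32   # 1,024 values per resized image

print(original_pixels / code_length)  # 16.0
print(resized_pixels / code_length)   # 4.0
```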

Step #5: Model the encoder with these layers:

- 2D convolution with 16 (3 × 3) filters, (2 × 2) strides, ReLU activation, and same padding.
- 2D convolution with 32 (3 × 3) filters, (1 × 1) strides, ReLU activation, and same padding.
- 2D convolution with 64 (3 × 3) filters, (1 × 1) strides, ReLU activation, and same padding.
- 2D convolution with 128 (3 × 3) filters, (1 × 1) strides, ReLU activation, and same padding.
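To see what these layers do to the tensor shape, the sketch below traces the spatial size through the encoder. It assumes a 32 × 32 single-channel input and uses the rule that a convolution with same padding produces `ceil(size / stride)` outputs per spatial dimension; the helper name `conv_same_output` is hypothetical.

```python
import math

def conv_same_output(size, stride):
    """Spatial size after a convolution with 'same' padding: ceil(size / stride)."""
    return math.ceil(size / stride)

# (filters, stride) for the four encoder layers listed above
encoder_layers = [(16, 2), (32, 1), (64, 1), (128, 1)]

size, channels = 32, 1  # resized grayscale input
for filters, stride in encoder_layers:
    size, channels = conv_same_output(size, stride), filters
    print((size, size, channels))

flattened = size * size * channels  # length of the vector fed to the code layer
print(flattened)  # 32768
```

Only the first layer halves the resolution, so the encoder ends with a 16 × 16 × 128 feature map.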

Step #6: The decoder is built from a sequence of transpose convolutions (deconvolutions):

- 2D transpose convolution with 128 (3 × 3) filters, (2 × 2) strides, ReLU activation, and same padding.
- 2D transpose convolution with 64 (3 × 3) filters, (1 × 1) strides, ReLU activation, and same padding.
- 2D transpose convolution with 32 (3 × 3) filters, (1 × 1) strides, ReLU activation, and same padding.
- 2D transpose convolution with 1 (3 × 3) filter, (1 × 1) strides, Sigmoid activation, and same padding.
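The same kind of shape trace works for the decoder. This sketch assumes that the 256-value code is reshaped to 16 × 16 × 1 before the first layer, as the decoder listing in this article does, and that a transpose convolution with same padding multiplies the spatial size by the stride. The helper name `conv_transpose_same_output` is hypothetical.

```python
import math

def conv_transpose_same_output(size, stride):
    """Spatial size after a transpose convolution with 'same' padding: size * stride."""
    return size * stride

code_length = 256
size = math.isqrt(code_length)  # the code is reshaped to 16 x 16 x 1
channels = 1

# (filters, stride) for the four decoder layers listed above
decoder_layers = [(128, 2), (64, 1), (32, 1), (1, 1)]
for filters, stride in decoder_layers:
    size, channels = conv_transpose_same_output(size, stride), filters
    print((size, size, channels))
```

The final shape, 32 × 32 × 1, matches the resized input, which is what allows a pixel-wise loss between the two.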

The loss function is based on the L2 norm of the difference between the reconstructions and the original images. The optimizer is Adam with a learning rate of α = 0.001. Now let's take a look at the encoder part of the TensorFlow DAG:

```python
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.layers)

with graph.as_default():
    # Full-size input batch: (batch, 64, 64, 1)
    input_images_xl = tf.placeholder(
        tf.float32,
        shape=(None, X_train.shape[1], X_train.shape[2], 1))

    # Downscale to (width, height) = (32, 32)
    input_images = tf.image.resize_images(
        input_images_xl, (width, height),
        method=tf.image.ResizeMethod.BICUBIC)

    # Encoder
    conv_0 = tf.layers.conv2d(
        inputs=input_images, filters=16, kernel_size=(3, 3),
        strides=(2, 2), activation=tf.nn.relu, padding='same')

    conv_1 = tf.layers.conv2d(
        inputs=conv_0, filters=32, kernel_size=(3, 3),
        activation=tf.nn.relu, padding='same')

    conv_2 = tf.layers.conv2d(
        inputs=conv_1, filters=64, kernel_size=(3, 3),
        activation=tf.nn.relu, padding='same')

    conv_3 = tf.layers.conv2d(
        inputs=conv_2, filters=128, kernel_size=(3, 3),
        activation=tf.nn.relu, padding='same')
```

Below is the code-layer part of the DAG:

```python
import tensorflow as tf

with graph.as_default():
    # Code layer: flatten the last feature map and project it to code_length values
    code_input = tf.layers.flatten(inputs=conv_3)

    code_layer = tf.layers.dense(
        inputs=code_input, units=code_length,
        activation=tf.nn.sigmoid)

    # Per-sample mean of the code, used to monitor how dense the encoding is
    code_mean = tf.reduce_mean(code_layer, axis=1)
```
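What the code layer computes can be mimicked in plain NumPy. The sizes, the random weights `W`, and the bias `b` below are small stand-ins chosen for illustration; they are not the values used by the TensorFlow graph.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, features, code_len = 4, 32, 8  # small illustrative sizes

flat = rng.normal(size=(batch, features))             # stand-in for the flattened feature map
W = rng.normal(scale=0.1, size=(features, code_len))  # hypothetical dense weights
b = np.zeros(code_len)                                # hypothetical bias

code = 1.0 / (1.0 + np.exp(-(flat @ W + b)))  # sigmoid keeps every code value in (0, 1)
code_mean = code.mean(axis=1)                 # one mean per sample, as with axis=1 above

print(code.shape, code_mean.shape)
```

Because the sigmoid bounds each value between 0 and 1, the per-sample mean is directly comparable across samples and epochs.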

Now let's take a look at the decoder part of the DAG:

```python
import tensorflow as tf

with graph.as_default():
    # Decoder: reshape the code into a (16, 16, 1) feature map
    decoder_input = tf.reshape(
        code_layer, (-1, width // 2, height // 2, 1))

    convt_0 = tf.layers.conv2d_transpose(
        inputs=decoder_input, filters=128, kernel_size=(3, 3),
        strides=(2, 2), activation=tf.nn.relu, padding='same')

    convt_1 = tf.layers.conv2d_transpose(
        inputs=convt_0, filters=64, kernel_size=(3, 3),
        activation=tf.nn.relu, padding='same')

    convt_2 = tf.layers.conv2d_transpose(
        inputs=convt_1, filters=32, kernel_size=(3, 3),
        activation=tf.nn.relu, padding='same')

    convt_3 = tf.layers.conv2d_transpose(
        inputs=convt_2, filters=1, kernel_size=(3, 3),
        activation=tf.sigmoid, padding='same')

    # Upscale the reconstruction back to the original 64 x 64 size
    output_images = tf.image.resize_images(
        convt_3, (X_train.shape[1], X_train.shape[2]),
        method=tf.image.ResizeMethod.BICUBIC)
```

Step #7: This is how you define the loss function and the Adam optimizer:

```python
import tensorflow as tf

with graph.as_default():
    # Loss: half the sum of squared differences between reconstruction and input
    loss = tf.nn.l2_loss(convt_3 - input_images)

    # Training step
    training_step = tf.train.AdamOptimizer(0.001).minimize(loss)
```
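For reference, `tf.nn.l2_loss` returns half the sum of the squared elements of its argument, without taking a square root or averaging. A NumPy equivalent, with made-up toy arrays, looks like this:

```python
import numpy as np

def l2_loss(t):
    """NumPy equivalent of tf.nn.l2_loss: sum(t ** 2) / 2."""
    return np.sum(t ** 2) / 2.0

reconstruction = np.array([[0.5, 0.5], [1.0, 0.0]])  # toy values
target = np.array([[0.0, 0.5], [1.0, 1.0]])          # toy values
print(l2_loss(reconstruction - target))  # 0.625
```

Because the value is a sum over the whole batch, it grows with the batch size and image size, which is worth keeping in mind when comparing losses across configurations.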

Step #8: Now that we have defined the full DAG, we can start a session and initialize all the variables:

```python
import tensorflow as tf

session = tf.InteractiveSession(graph=graph)
tf.global_variables_initializer().run()
```

Step #9: Once the variables are initialized, we can start the training process:

```python
import numpy as np

for e in range(nb_epochs):
    np.random.shuffle(X_train)

    total_loss = 0.0
    code_means = []

    # Note: this upper bound stops before the final batch of each epoch
    for i in range(0, X_train.shape[0] - batch_size, batch_size):
        # Add the channel axis: (batch, 64, 64) -> (batch, 64, 64, 1)
        X = np.expand_dims(X_train[i:i + batch_size, :, :],
                           axis=3).astype(np.float32)

        _, n_loss, c_mean = session.run(
            [training_step, loss, code_mean],
            feed_dict={input_images_xl: X})

        total_loss += n_loss
        code_means.append(c_mean)

    print('Epoch {}) Average loss per sample: {} (Code mean: {})'.format(
        e + 1, total_loss / float(X_train.shape[0]), np.mean(code_means)))
```

Output:

```
Epoch 1) Average loss per sample: 11.933397521972656 (Code mean: 0.5420681238174438)
Epoch 2) Average loss per sample: 10.294102325439454 (Code mean: 0.4132006764411926)
Epoch 3) Average loss per sample: 9.917563934326171 (Code mean: 0.38105469942092896)
...
Epoch 600) Average loss per sample: 0.4635812330245972 (Code mean: 0.42368677258491516)
```
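One detail of the training loop is worth checking: the batch start indices come from `range(0, X_train.shape[0] - batch_size, batch_size)`. The short sketch below, which assumes 400 samples and a batch size of 50, compares that bound with a range over the full dataset. Whether skipping the final batch is intentional is not stated in the source.

```python
n_samples, batch_size = 400, 50

# Bound used in the training loop
starts = list(range(0, n_samples - batch_size, batch_size))
print(len(starts), starts[-1])  # 7 300

# Bound that covers every sample
all_starts = list(range(0, n_samples, batch_size))
print(len(all_starts), all_starts[-1])  # 8 350
```

With the loop's bound, 350 of the 400 samples are visited in each epoch, while the reported average still divides by 400. Since the data is reshuffled every epoch, all samples are still seen over the course of training.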

When training ends, the average loss per sample is about 0.46 (for 32 × 32 images) and the mean of the codes is 0.42. This indicates that the encoding is fairly dense, since values spread evenly over (0, 1) would average 0.5. We will use this value as a point of comparison when looking at sparsity.
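To make the link between the code mean and density concrete, the sketch below compares two synthetic code matrices. Both are made up for illustration: one with values drawn uniformly from (0, 1), and one where roughly 90% of the entries are zeroed out to imitate a sparse code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dense codes: every unit is active, values uniform in (0, 1)
dense_codes = rng.uniform(0.0, 1.0, size=(400, 256))

# Sparse codes: keep about 10% of the entries, zero the rest
mask = rng.uniform(size=dense_codes.shape) < 0.1
sparse_codes = dense_codes * mask

print(round(float(dense_codes.mean()), 2))       # close to 0.5
print(sparse_codes.mean() < dense_codes.mean())  # True
```

A mean near 0.5 is what a dense code looks like, while a sparse code pulls the mean toward zero. The 0.42 reported above is much closer to the dense case.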

Some sample images led to the following autoencoder output:

When the image size is increased to 64x64, the reconstruction quality is partially degraded. However, we can decrea