Autoencoders are models that find low-dimensional representations of a dataset by exploiting the strong nonlinearity of neural networks. An autoencoder consists of two parts:
Encoder – This transforms the high-dimensional input into a short, compact code.
Decoder – This transforms the short code back into the high-dimensional input space.
Assume that $X$ is a collection of samples drawn from the data-generating process $p_{data}(x)$, with $x_i \in \mathbb{R}^n$. We do not place any restrictions on the structure of the support. For RGB images, for example, $x_i \in \mathbb{R}^{n \times m \times 3}$.
Here's a simple illustration of a generic autoencoder:
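For a $p$-dimensional code vector, the encoder is defined as a parameterized function $e(\cdot)$, written here with parameters $\theta_e$:

$$z_i = e(x_i; \theta_e), \quad z_i \in \mathbb{R}^p$$

Similarly, the decoder is another parameterized function $d(\cdot)$, with parameters $\theta_d$:

$$\tilde{x}_i = d(z_i; \theta_d)$$

So, given an input sample $x_i$, the full autoencoder is the composite function that outputs the reconstruction:

$$\tilde{x}_i = d(e(x_i; \theta_e); \theta_d)$$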
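The composition of encoder and decoder can be illustrated with a small NumPy sketch. It is not the model used later in this article: the single-layer `tanh` encoder, the linear decoder, the dimensions, and the random weights are all assumptions chosen only to show how $d(e(x))$ is wired together.

```python
import numpy as np


def encoder(x, W_e, b_e):
    """e(x): map n-dimensional inputs to p-dimensional codes."""
    return np.tanh(x @ W_e + b_e)


def decoder(z, W_d, b_d):
    """d(z): map p-dimensional codes back to n dimensions."""
    return z @ W_d + b_d


def autoencoder(x, params):
    """Full autoencoder: the composition d(e(x))."""
    W_e, b_e, W_d, b_d = params
    return decoder(encoder(x, W_e, b_e), W_d, b_d)


rng = np.random.default_rng(0)
n, p = 8, 3  # input and code dimensions (illustrative values)
params = (
    rng.normal(scale=0.1, size=(n, p)), np.zeros(p),  # encoder weights and bias
    rng.normal(scale=0.1, size=(p, n)), np.zeros(n),  # decoder weights and bias
)

x = rng.normal(size=(5, n))              # a batch of 5 samples
code = encoder(x, params[0], params[1])  # shape (5, p)
x_rec = autoencoder(x, params)           # shape (5, n)
mse = float(np.mean((x - x_rec) ** 2))   # mean squared reconstruction error
print(code.shape, x_rec.shape, mse)
```

With untrained weights the reconstruction error is large. Training adjusts both sets of parameters to reduce it.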
Because autoencoders are usually implemented as neural networks, they are trained with the backpropagation algorithm, often using a mean squared error cost function.
Alternatively, if you consider the data-generating process, you can restate the objective in terms of a parameterized conditional distribution $q(\cdot)$. This turns the cost function into the Kullback-Leibler divergence between $p_{data}(\cdot)$ and $q(\cdot)$:

$$D_{KL}(p_{data} \,\|\, q) = \mathbb{E}_{x \sim p_{data}}\left[\log p_{data}(x) - \log q(x)\right]$$

Because the entropy of $p_{data}$ is constant, it can be excluded from the optimization, so minimizing the divergence is equivalent to minimizing the cross-entropy between $p_{data}$ and $q$. If you assume that $p_{data}$ and $q$ are Gaussian, the Kullback-Leibler cost function is equivalent to the mean squared error, so the two approaches are interchangeable.
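The Gaussian case can be made concrete with a short derivation sketch. It assumes, purely for illustration, that $q$ is an isotropic Gaussian centered on the reconstruction $\tilde{x}_i$ of each sample with a fixed variance $\sigma^2$, and that the expectation is approximated by an average over $M$ samples. The symbols $\sigma$, $M$, and $\tilde{x}_i$ are notation introduced for this sketch.

```latex
\begin{aligned}
H(p_{data}, q) &= -\mathbb{E}_{x \sim p_{data}}\left[\log q(x)\right]
  \approx -\frac{1}{M} \sum_{i=1}^{M} \log q(x_i) \\
q(x_i) &= \mathcal{N}\!\left(x_i;\ \tilde{x}_i,\ \sigma^2 I\right)
  \;\Rightarrow\;
  -\log q(x_i) = \frac{1}{2\sigma^2} \lVert x_i - \tilde{x}_i \rVert_2^2
  + \frac{n}{2} \log\!\left(2\pi\sigma^2\right) \\
H(p_{data}, q) &\approx \frac{1}{2\sigma^2 M} \sum_{i=1}^{M}
  \lVert x_i - \tilde{x}_i \rVert_2^2 + \text{const}
\end{aligned}
```

Under these assumptions, the only term that depends on the model parameters is the sum of squared reconstruction errors, which is why minimizing the cross-entropy and minimizing the mean squared error select the same parameters.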
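In some cases, you can assume a Bernoulli distribution for either $p_{data}$ or $q$. This is only possible when the data range is normalized to (0, 1). Formally, this is not entirely correct, because the Bernoulli distribution is binary, with $x_i \in \{0, 1\}^d$. However, using sigmoid output units also leads to effective optimization for continuous samples $x_i \in (0, 1)^d$. The cost function then becomes the following cross-entropy, where $\tilde{x}_i$ is the reconstruction of $x_i$, $M$ is the number of samples, and the superscript $(j)$ indexes the $d$ components:

$$C = -\frac{1}{M} \sum_{i=1}^{M} \sum_{j=1}^{d} \left[ x_i^{(j)} \log \tilde{x}_i^{(j)} + \left(1 - x_i^{(j)}\right) \log\left(1 - \tilde{x}_i^{(j)}\right) \right]$$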
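As a rough illustration of a Bernoulli-style cost, the following NumPy sketch computes the per-sample binary cross-entropy between targets in [0, 1] and sigmoid outputs in (0, 1). The function names and the clipping constant `eps` are assumptions made for this example.

```python
import numpy as np


def sigmoid(a):
    """Logistic function, mapping real values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))


def bernoulli_cross_entropy(x, x_rec, eps=1e-7):
    """Average per-sample binary cross-entropy.

    x:     targets in [0, 1], shape (M, d)
    x_rec: reconstructions in (0, 1), shape (M, d)
    """
    x_rec = np.clip(x_rec, eps, 1.0 - eps)  # avoid log(0)
    per_sample = -np.sum(
        x * np.log(x_rec) + (1.0 - x) * np.log(1.0 - x_rec), axis=1
    )
    return float(np.mean(per_sample))


x = np.array([[0.0, 1.0, 0.5]])
close = sigmoid(np.array([[-6.0, 6.0, 0.0]]))  # reconstruction near the target
far = sigmoid(np.array([[6.0, -6.0, 0.0]]))    # reconstruction far from the target

loss_close = bernoulli_cross_entropy(x, close)
loss_far = bernoulli_cross_entropy(x, far)
print(loss_close, loss_far)
```

For a continuous target such as 0.5, the cost does not reach zero even for a perfect reconstruction (it equals log 2 per component). This reflects the formal mismatch mentioned above, but the minimum is still at the correct value.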
Now let's look at an example of a deep convolutional autoencoder based on TensorFlow. We will use the Olivetti faces dataset, as it is small, fit for purpose, and contains many expressions.
Step # 1: Load the 400 grayscale 64 × 64 samples to prepare the training set:
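A minimal sketch of this step, assuming scikit-learn is installed (its `fetch_olivetti_faces` loader downloads the data on first use). The helper names `load_olivetti_faces` and `to_training_tensor`, the shuffling, and the seed are illustrative choices, not taken from the original listing.

```python
import numpy as np


def load_olivetti_faces():
    """Fetch the Olivetti faces as an array of shape (400, 64, 64).

    scikit-learn is imported lazily so the rest of this module can be
    used without it. The pixel values are already scaled to [0, 1].
    """
    from sklearn.datasets import fetch_olivetti_faces

    faces = fetch_olivetti_faces(shuffle=True, random_state=1000)
    return faces["images"].astype(np.float32)


def to_training_tensor(images):
    """Add a trailing channel axis: (N, H, W) -> (N, H, W, 1)."""
    images = np.asarray(images, dtype=np.float32)
    return images[..., np.newaxis]


# Typical use (triggers a download the first time it runs):
# X_train = to_training_tensor(load_olivetti_faces())  # (400, 64, 64, 1)
```

The trailing channel axis is added because 2D convolution layers expect inputs of shape (batch, height, width, channels).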
Step # 2: To speed up the computation and avoid memory problems, we will resize the images to 32×32. This costs a little visual fidelity. Note that you can skip this step if you have plenty of computational resources.
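One simple way to halve the resolution is to average non-overlapping 2×2 pixel blocks. The sketch below does this in NumPy. It is a stand-in chosen for this example and may differ from the resize operation used in the original pipeline (for instance, an interpolation-based resize inside the TensorFlow graph).

```python
import numpy as np


def downsample_2x(images):
    """Halve height and width by averaging non-overlapping 2x2 blocks.

    images: array of shape (N, H, W, C) with even H and W.
    """
    n, h, w, c = images.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    # Split each spatial axis into (blocks, 2) and average over the 2s.
    blocks = images.reshape(n, h // 2, 2, w // 2, 2, c)
    return blocks.mean(axis=(2, 4))


batch = np.arange(2 * 64 * 64, dtype=np.float32).reshape(2, 64, 64, 1)
small = downsample_2x(batch)
print(small.shape)
```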
Step # 3: Let's define the basic constants:
-> number of epochs (nb_epochs)
-> batch_size
-> code_length
-> graph

Step # 4: Using 50 samples per batch, we will train the model for 600 epochs. With an image size of 64×64 = 4,096 and a code length of 256, we get a compression ratio of 4,096/256 = 16. You can always try different configurations to maximize the convergence rate and the final accuracy.
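The constants and the compression-ratio arithmetic can be written down as follows. The helper `compression_ratio` is a hypothetical name used only here, and the `graph` object (a TensorFlow graph) is left out so that the snippet has no dependencies.

```python
nb_epochs = 600    # number of training epochs
batch_size = 50    # samples per batch
code_length = 256  # size of the code produced by the encoder


def compression_ratio(height, width, channels=1, code_len=code_length):
    """Number of input values per code value."""
    return (height * width * channels) / code_len


ratio_64 = compression_ratio(64, 64)   # 4096 / 256
ratio_32 = compression_ratio(32, 32)   # 1024 / 256
batches_per_epoch = 400 // batch_size  # 400 samples in the dataset

print(ratio_64, ratio_32, batches_per_epoch)
```

Note that the ratio of 16 refers to the original 64×64 images. For inputs resized to 32×32, the same code length gives a ratio of 4.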
Step # 5: Model the encoder with these layers:
-> 2D convolution with 16 (3×3) filters, (2×2) strides, ReLU activation, and the same padding.
-> 2D convolution with 32 (3×3) filters, (1×1) strides, ReLU activation, and the same padding.
-> 2D convolution with 64 (3×3) filters, (1×1) strides, ReLU activation, and the same padding.
-> 2D convolution with 128 (3×3) filters, (1×1) strides, ReLU activation, and the same padding.
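To see what these layers do to the tensor shape, the sketch below traces the output shape through the encoder stack. It relies on the rule that a convolution with "same" padding produces ceil(input / stride) positions along each spatial axis, and it assumes a 32×32×1 input (the resized images). The helper name is illustrative.

```python
import math


def conv2d_same_shape(shape, filters, strides):
    """Output (H, W, C) of a 2D convolution with 'same' padding."""
    h, w, _ = shape
    return (math.ceil(h / strides[0]), math.ceil(w / strides[1]), filters)


# (filters, strides) for the four encoder layers listed above
ENCODER_LAYERS = [(16, (2, 2)), (32, (1, 1)), (64, (1, 1)), (128, (1, 1))]

shape = (32, 32, 1)  # assumed input: resized grayscale image
encoder_shapes = [shape]
for filters, strides in ENCODER_LAYERS:
    shape = conv2d_same_shape(shape, filters, strides)
    encoder_shapes.append(shape)

flattened_size = shape[0] * shape[1] * shape[2]
print(encoder_shapes, flattened_size)
```

Only the first layer reduces the spatial resolution. The remaining layers increase the number of feature maps while keeping the 16×16 grid.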
Step # 6: The decoder is built from deconvolutions (a sequence of transpose convolutions):
-> 2D transpose convolution with 128 (3×3) filters, (2×2) strides, ReLU activation, and the same padding.
-> 2D transpose convolution with 64 (3×3) filters, (1×1) strides, ReLU activation, and the same padding.
-> 2D transpose convolution with 32 (3×3) filters, (1×1) strides, ReLU activation, and the same padding.
-> 2D transpose convolution with 1 (3×3) filter, (1×1) strides, Sigmoid activation, and the same padding.
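The same kind of shape trace can be done for the decoder. With "same" padding, a transpose convolution multiplies each spatial size by its stride. The starting shape is an assumption: the text does not say how the code is fed to the decoder, so this sketch supposes that a 256-value code is reshaped to a 16×16×1 tensor.

```python
def conv2d_transpose_same_shape(shape, filters, strides):
    """Output (H, W, C) of a 2D transpose convolution with 'same' padding."""
    h, w, _ = shape
    return (h * strides[0], w * strides[1], filters)


# (filters, strides) for the four decoder layers listed above
DECODER_LAYERS = [(128, (2, 2)), (64, (1, 1)), (32, (1, 1)), (1, (1, 1))]

shape = (16, 16, 1)  # assumption: 256-value code reshaped to a 16x16 grid
decoder_shapes = [shape]
for filters, strides in DECODER_LAYERS:
    shape = conv2d_transpose_same_shape(shape, filters, strides)
    decoder_shapes.append(shape)

print(decoder_shapes)
```

Under this assumption the final shape is 32×32×1, which matches the resized input images, so the L2 difference between reconstruction and input is well defined.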
The loss function is based on the L2 difference between the reconstructions and the original images. The optimizer is Adam, with a learning rate of α = 0.001. Now let's take a look at the encoder part of the TensorFlow DAG:

Below is the code part of the DAG:

Now let's take a look at the decoder part of the DAG:
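The encoder, code, and decoder parts described above can be sketched together as a single model. This is a sketch under assumptions, not the original listing: it uses `tf.keras` rather than the graph-mode API that appears in the later steps, and the sigmoid code layer, the reshape of the code to a square grid, and all function and constant names are choices made for this example.

```python
import math

# (filters, strides) for the encoder convolutions
ENCODER_SPEC = [(16, (2, 2)), (32, (1, 1)), (64, (1, 1)), (128, (1, 1))]
# (filters, strides, activation) for the decoder transpose convolutions
DECODER_SPEC = [
    (128, (2, 2), "relu"),
    (64, (1, 1), "relu"),
    (32, (1, 1), "relu"),
    (1, (1, 1), "sigmoid"),
]


def code_grid_side(code_length):
    """Side of the square grid the code is reshaped to (assumption)."""
    side = math.isqrt(code_length)
    if side * side != code_length:
        raise ValueError("code_length must be a perfect square")
    return side


def build_autoencoder(input_shape=(32, 32, 1), code_length=256):
    """Build a Keras model following the layer lists in Steps 5 and 6."""
    import tensorflow as tf  # imported lazily; requires TensorFlow 2.x

    layers = tf.keras.layers
    inputs = tf.keras.Input(shape=input_shape)

    # Encoder part
    x = inputs
    for filters, strides in ENCODER_SPEC:
        x = layers.Conv2D(filters, (3, 3), strides=strides,
                          activation="relu", padding="same")(x)

    # Code part: flatten and project to the code (sigmoid is an assumption)
    x = layers.Flatten()(x)
    code = layers.Dense(code_length, activation="sigmoid")(x)

    # Decoder part: reshape the code to a grid, then upsample
    side = code_grid_side(code_length)
    x = layers.Reshape((side, side, 1))(code)
    for filters, strides, activation in DECODER_SPEC:
        x = layers.Conv2DTranspose(filters, (3, 3), strides=strides,
                                   activation=activation, padding="same")(x)

    return tf.keras.Model(inputs, x)
```

Calling `build_autoencoder()` returns a model whose output has the same 32×32×1 shape as its input. It could then be compiled with an Adam optimizer and a squared-error loss, which plays the same role as the L2 loss described above but differs from it by a constant scaling factor.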

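Step # 7: This is how you define the loss function and the Adam optimizer:
```python
import tensorflow as tf  # TensorFlow 1.x graph-mode API

# graph, convt_3 and input_images are defined in the previous steps
with graph.as_default():
    # Loss: L2 difference between reconstructions and original images
    loss = tf.nn.l2_loss(convt_3 - input_images)

    # Training step
    training_step = tf.train.AdamOptimizer(0.001).minimize(loss)
```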
Step # 8: Now that we have defined the full DAG, we can start a session and initialize all the variables.
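```python
import tensorflow as tf  # TensorFlow 1.x graph-mode API

# Open a session on the graph and initialize all the variables
session = tf.InteractiveSession(graph=graph)
tf.global_variables_initializer().run()
```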
Step # 9: We can start the training process once the TensorFlow variables are initialized:
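The loop below is a minimal, framework-agnostic sketch of such a training process. The callable `train_step` is hypothetical: in the TensorFlow 1.x setting of this article it would wrap a `session.run` call that executes the training step on one batch and returns the batch loss and the mean of the codes. How the original code averages the loss is not shown, so dividing the summed loss by the number of samples is an assumption.

```python
import numpy as np


def train(train_step, data, nb_epochs, batch_size, seed=1000, log=print):
    """Run mini-batch training and return the average loss per epoch.

    train_step: hypothetical callable, batch -> (batch_loss, code_mean)
    data:       array whose first axis indexes the samples
    """
    rng = np.random.default_rng(seed)
    n_samples = data.shape[0]
    history = []

    for epoch in range(nb_epochs):
        order = rng.permutation(n_samples)  # reshuffle every epoch
        total_loss = 0.0
        code_means = []

        for start in range(0, n_samples, batch_size):
            batch = data[order[start:start + batch_size]]
            batch_loss, code_mean = train_step(batch)
            total_loss += batch_loss
            code_means.append(code_mean)

        avg_loss = total_loss / n_samples
        history.append(avg_loss)
        log("Epoch {}) Average loss per sample: {} (Code mean: {})".format(
            epoch + 1, avg_loss, np.mean(code_means)))

    return history


# Dummy step for illustration: "loss" is the sum of squares of the batch
def dummy_step(batch):
    return float(np.sum(batch ** 2)), 0.5


history = train(dummy_step, np.ones((10, 4)), nb_epochs=2, batch_size=4)
```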

Output:
Epoch 1) Average loss per sample: 11.933397521972656 (Code mean: 0.5420681238174438)
Epoch 2) Average loss per sample: 10.294102325439454 (Code mean: 0.4132006764411926)
Epoch 3) Average loss per sample: 9.917563934326171 (Code mean: 0.38105469942092896)
...
Epoch 600) Average loss per sample: 0.4635812330245972 (Code mean: 0.42368677258491516)
When the training process ends, the average loss per sample is about 0.46 (for 32×32 images) and the mean of the codes is about 0.42. Since this mean is close to 0.5, the encoding is relatively dense. Our goal is to use this value as a reference when looking at sparsity.
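To make the notion of a dense versus a sparse code more tangible, the following sketch computes two simple statistics on a matrix of codes: the mean activation and the fraction of activations close to zero. The threshold of 0.05 and the two synthetic code matrices are assumptions made for illustration and are not results from the model above.

```python
import numpy as np


def code_statistics(codes, threshold=0.05):
    """Mean activation and fraction of near-zero activations."""
    codes = np.asarray(codes, dtype=np.float64)
    return {
        "mean": float(codes.mean()),
        "fraction_near_zero": float(np.mean(codes < threshold)),
    }


# Synthetic examples: every unit active vs. only 16 of 256 units active
dense_codes = np.full((4, 256), 0.42)
sparse_codes = np.zeros((4, 256))
sparse_codes[:, :16] = 0.9

dense_stats = code_statistics(dense_codes)
sparse_stats = code_statistics(sparse_codes)
print(dense_stats, sparse_stats)
```

A dense code has a mean well away from zero and almost no inactive units, while a sparse code has a low mean and a large fraction of units near zero.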
Some sample images led to the following autoencoder output:
When the image size is increased to 64x64, the reconstruction quality is partially degraded. However, we can decrea