Generative Adversarial Network (GAN)

Why were GANs designed in the first place?
It has been observed that most mainstream neural networks can easily be fooled into misclassifying an input when only a small amount of noise is added to the original data. Surprisingly, after the noise is added the model is often more confident in the wrong prediction than it was in the correct one. One reason for this vulnerability is that most machine learning models learn from a limited amount of data, which is a major drawback because it makes them prone to overfitting. In addition, the mapping between input and output is almost linear. Although the boundaries separating the classes may appear simple, they are in fact composed of linear pieces, and even a small change to a point in the feature space can lead to misclassification of the data.
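The effect of near-linearity can be illustrated with a minimal sketch. The linear classifier, its weights, and the input below are all hypothetical values chosen for illustration; they do not come from a trained network. Each feature is moved by only 0.1, yet the small changes add up across 100 features and flip the sign of the score.

```python
# Hypothetical linear classifier: score = w . x + b, class 1 if score > 0.
def score(w, x, b):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sign(v):
    return (v > 0) - (v < 0)

w = [0.5, -0.5] * 50   # 100 illustrative weights (assumption)
b = 0.0
x = [0.1, 0.0] * 50    # illustrative input, classified as class 1

clean = score(w, x, b)

# Move every feature by eps against the sign of its weight.
eps = 0.1
x_adv = [xi - eps * sign(wi) for xi, wi in zip(x, w)]
adv = score(w, x_adv, b)

print("clean score:", clean)      # positive, so class 1
print("perturbed score:", adv)    # negative, so class 0
```

The score changes by eps times the sum of the absolute weights, so the more input dimensions a roughly linear model has, the smaller the per-feature noise needed to change its decision.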

How does a GAN work?

The name Generative Adversarial Network (GAN) can be broken down into three parts:

  • Generative: The aim is to learn a generative model, which describes how the data is generated in terms of a probabilistic model.
  • Adversarial: The model is trained in an adversarial (competitive) setting.
  • Networks: Deep neural networks are used as the learning algorithms.

A GAN has a Generator and a Discriminator. The Generator produces fake data samples (images, audio, etc.) and tries to fool the Discriminator. The Discriminator, on the other hand, tries to distinguish between real and fake samples. The Generator and the Discriminator are both neural networks, and they compete with each other during the training phase. The steps are repeated many times, and with each repetition the Generator and the Discriminator get better at their respective jobs. The process can be visualized using the diagram below:

Here the generative model captures the distribution of the data and is trained so as to maximize the probability that the Discriminator makes a mistake. The Discriminator, in turn, is a model that estimates the probability that a given sample came from the training data rather than from the Generator.
GANs are formulated as a minimax game in which the Discriminator tries to maximize its reward V(D, G) and the Generator tries to minimize the Discriminator's reward or, in other words, maximize its loss. This can be described mathematically by the formula below:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim P(z)}[\log(1 - D(G(z)))]$$

where:

G = generator
D = discriminator
Pdata(x) = distribution of real data
P(z) = distribution of the generator's input noise
x = sample from Pdata(x)
z = sample from P(z)
D(x) = discriminator network's output for x (the estimated probability that x is real)
G(z) = generator network's output for z
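To make the value function concrete, here is a minimal sketch that estimates V(D, G) from a handful of discriminator outputs. The function name `value_function` and the sample probabilities are illustrative assumptions; in practice the expectations are averaged over mini-batches.

```python
import math

def value_function(d_real, d_fake):
    """Sample estimate of V(D, G) from discriminator outputs.

    d_real: values of D(x) for real samples
    d_fake: values of D(G(z)) for generated samples
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A confident, correct discriminator: V is close to its upper bound of 0.
v_good = value_function([0.99, 0.98], [0.01, 0.02])

# A discriminator that cannot tell real from fake outputs 0.5 everywhere,
# which gives V = log(0.5) + log(0.5) = -2 log 2.
v_fooled = value_function([0.5, 0.5], [0.5, 0.5])

print(v_good, v_fooled)
```

The Discriminator pushes V up toward 0 by being right, while the Generator pulls it down by producing samples that the Discriminator scores as real.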

GAN training therefore consists of two alternating parts:

  • Part 1: The Discriminator is trained while the Generator is idle. At this stage the Generator is only run in the forward direction; no backpropagation is applied to it. The Discriminator is trained on real data for n epochs to check whether it can correctly predict it as real. At the same stage, the Discriminator is also trained on fake data produced by the Generator to check whether it can correctly predict it as fake.
  • Part 2: The Generator is trained while the Discriminator is idle. Once the Discriminator has been trained on the Generator's fake data, its predictions are used as the training signal for the Generator, which improves on its previous state and tries again to fool the Discriminator.

These two parts are repeated for several epochs, after which the fake data is checked manually to see whether it looks genuine. If the result is acceptable, training is stopped; otherwise it continues for several more epochs.
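The alternation between the two parts can be sketched on a toy one-dimensional problem. Everything here is an illustrative assumption rather than the article's model: the real data is drawn from a normal distribution with mean 4, the Generator is the shift G(z) = z + theta, the Discriminator is D(x) = sigmoid(a * x + c), and the gradients are written out by hand. Toy games like this may oscillate rather than settle, so the sketch shows the update order and makes no claim about convergence.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

random.seed(0)
a, c = 0.0, 0.0   # discriminator parameters: D(x) = sigmoid(a * x + c)
theta = 0.0       # generator parameter: G(z) = z + theta
lr, batch_size, steps = 0.05, 64, 100
history = []

for step in range(steps):
    real = [random.gauss(4.0, 1.0) for _ in range(batch_size)]
    noise = [random.gauss(0.0, 1.0) for _ in range(batch_size)]

    # Part 1: update the Discriminator; the Generator only runs forward.
    fake = [z + theta for z in noise]
    grad_a = grad_c = 0.0
    for x, g in zip(real, fake):
        d_real, d_fake = sigmoid(a * x + c), sigmoid(a * g + c)
        # gradients of -[log D(x) + log(1 - D(G(z)))] w.r.t. a and c
        grad_a += (d_real - 1.0) * x + d_fake * g
        grad_c += (d_real - 1.0) + d_fake
    a -= lr * grad_a / batch_size
    c -= lr * grad_c / batch_size

    # Part 2: update the Generator; the Discriminator is held fixed.
    grad_theta = 0.0
    for z in noise:
        d_fake = sigmoid(a * (z + theta) + c)
        # gradient of the generator loss -log D(G(z)) w.r.t. theta
        grad_theta += (d_fake - 1.0) * a
    theta -= lr * grad_theta / batch_size

    history.append((a, c, theta))

print("after first step:", history[0])
print("after last step:", history[-1])
```

After the very first step the Discriminator's slope `a` is positive (real samples lie to the right of the fakes), and the Generator responds by increasing `theta`, moving its samples toward the real data.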

Different types of GANs:
GANs are currently a very active research topic, and there are many different types of GAN implementations. Some of the important ones in active use today are described below:

  1. Vanilla GAN: This is the simplest type of GAN. Here the Generator and the Discriminator are simple multilayer perceptrons. The algorithm is straightforward: it tries to optimize the minimax objective using stochastic gradient descent.
  2. Conditional GAN (CGAN): CGAN can be described as a deep learning method in which some conditional parameters are put in place. In CGAN, an additional parameter 'y' is added to the Generator's input so that it generates the corresponding data. Labels are also fed into the Discriminator's input to help it distinguish real data from fake data.
  3. Deep Convolutional GAN (DCGAN): DCGAN is one of the most popular and most successful GAN implementations. It is composed of ConvNets instead of multilayer perceptrons. The ConvNets are implemented without max pooling, which is replaced by strided convolutions. Also, the layers are not fully connected.
  4. Laplacian Pyramid GAN (LAPGAN): The Laplacian pyramid is a linear, invertible image representation consisting of a set of band-pass images spaced an octave apart, plus a low-frequency residual. This approach uses multiple Generator and Discriminator networks at different levels of the Laplacian pyramid. It is mainly used because it produces very high-quality images. The image is first downsampled at each layer of the pyramid and then upscaled again at each layer in a backward pass, where the image acquires some noise from the Conditional GAN at that layer, until it reaches its original size.
  5. Super Resolution GAN (SRGAN): SRGAN, as the name suggests, is a way of designing a GAN in which a deep neural network is used together with an adversarial network to produce higher-resolution images. This type of GAN is especially useful for upscaling native low-resolution images to enhance their detail while minimizing errors.
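The "linear, invertible" property of the Laplacian pyramid mentioned for LAPGAN can be sketched on a one-dimensional signal. This is a simplified illustration under stated assumptions: it uses pair averaging and value repetition in place of the blur-based filters used on real images, and the helper names are hypothetical. In LAPGAN the detail bands are produced by conditional GANs instead of being computed from a known image.

```python
def downsample(signal):
    """Halve the resolution by averaging neighbouring pairs."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal), 2)]

def upsample(signal):
    """Double the resolution by repeating each value."""
    return [v for v in signal for _ in range(2)]

def build_pyramid(signal, levels):
    """Return (detail bands from fine to coarse, low-frequency residual)."""
    bands = []
    current = signal
    for _ in range(levels):
        coarse = downsample(current)
        # detail lost by downsampling at this level
        bands.append([c - u for c, u in zip(current, upsample(coarse))])
        current = coarse
    return bands, current

def reconstruct(bands, low):
    """Invert build_pyramid by upsampling and adding the detail bands back."""
    current = low
    for band in reversed(bands):
        current = [u + b for u, b in zip(upsample(current), band)]
    return current

signal = [1.0, 3.0, 2.0, 6.0, 5.0, 5.0, 0.0, 4.0]
bands, low = build_pyramid(signal, levels=2)
restored = reconstruct(bands, low)

print(low)                  # coarsest level of the pyramid
print(restored == signal)   # the representation is invertible
```

Starting from the coarse residual and adding one detail band per level recovers the original signal exactly, which is the coarse-to-fine order in which the description above says the image is upscaled.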

Sample Python code that implements a generative adversarial network:
GANs are very computationally expensive. They require powerful GPUs and a lot of time (many epochs) to produce good results. In our example, we'll use the well-known MNIST dataset to generate a clone of a random digit.

# Import the required libraries and the MNIST dataset.
# Note: this example targets the TensorFlow 1.x API.
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data")

# Define the functions for the two networks.
# Both networks have two hidden layers and a
# dense (fully connected) output layer.

# Generator network function definition
def generator(z, reuse=None):
    with tf.variable_scope('gen', reuse=reuse):
        hidden1 = tf.layers.dense(inputs=z, units=128,
                                  activation=tf.nn.leaky_relu)
        hidden2 = tf.layers.dense(inputs=hidden1, units=128,
                                  activation=tf.nn.leaky_relu)
        # tanh keeps the generated pixel values in [-1, 1]
        output = tf.layers.dense(inputs=hidden2, units=784,
                                 activation=tf.nn.tanh)
        return output

# Discriminator network function definition
def discriminator(X, reuse=None):
    with tf.variable_scope('dis', reuse=reuse):
        hidden1 = tf.layers.dense(inputs=X, units=128,
                                  activation=tf.nn.leaky_relu)
        hidden2 = tf.layers.dense(inputs=hidden1, units=128,
                                  activation=tf.nn.leaky_relu)
        logits = tf.layers.dense(hidden2, units=1)
        # probability that the input is real
        output = tf.sigmoid(logits)
        return output, logits

# Create the input placeholders
tf.reset_default_graph()

real_images = tf.placeholder(tf.float32, shape=[None, 784])
z = tf.placeholder(tf.float32, shape=[None, 100])

G = generator(z)
D_output_real, D_logits_real = discriminator(real_images)
# reuse=True shares the discriminator weights between real and fake inputs
D_output_fake, D_logits_fake = discriminator(G, reuse=True)

# Loss function definition
def loss_func(logits_in, labels_in):
    return tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        logits=logits_in, labels=labels_in))

# Label smoothing (0.9 instead of 1.0 for real images) for generalization
D_real_loss = loss_func(D_logits_real, tf.ones_like(D_logits_real) * 0.9)
D_fake_loss = loss_func(D_logits_fake, tf.zeros_like(D_logits_fake))
D_loss = D_real_loss + D_fake_loss

# The generator wants its fakes to be labelled as real
G_loss = loss_func(D_logits_fake, tf.ones_like(D_logits_fake))

# Set the learning rate, batch size and number of epochs,
# and use the Adam optimizer
lr = 0.001  # learning rate

# Because the two networks interact with each other, each
# optimizer must update only its own network's variables.
# tf.trainable_variables() returns all variables created with
# trainable=True (here, those in the two variable scopes).
tvars = tf.trainable_variables()
d_vars = [var for var in tvars if 'dis' in var.name]
g_vars = [var for var in tvars if 'gen' in var.name]

D_trainer = tf.train.AdamOptimizer(lr).minimize(D_loss, var_list=d_vars)
G_trainer = tf.train.AdamOptimizer(lr).minimize(G_loss, var_list=g_vars)

batch_size = 100  # batch size
epochs = 500  # number of epochs; more epochs generally give better results

init = tf.global_variables_initializer()

# Create a session to train the networks
samples = []  # generator samples, one per epoch

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(epochs):
        num_batches = mnist.train.num_examples // batch_size

        for i in range(num_batches):
            batch = mnist.train.next_batch(batch_size)
            batch_images = batch[0].reshape((batch_size, 784))
            # rescale pixels from [0, 1] to [-1, 1] to match the tanh output
            batch_images = batch_images * 2 - 1
            batch_z = np.random.uniform(-1, 1, size=(batch_size, 100))

            _ = sess.run(D_trainer, feed_dict={real_images: batch_images, z: batch_z})
            _ = sess.run(G_trainer, feed_dict={z: batch_z})

        print("on epoch {}".format(epoch))

        # generate one sample at the end of every epoch
        sample_z = np.random.uniform(-1, 1, size=(1, 100))
        gen_sample = sess.run(generator(z, reuse=True),
                              feed_dict={z: sample_z})
        samples.append(gen_sample)

# Result after epoch 0
plt.imshow(samples[0].reshape(28, 28))
plt.show()

# Result after epoch 499 (the last of the 500 stored samples)
plt.imshow(samples[499].reshape(28, 28))
plt.show()


Output:

on epoch 0
on epoch 1
...
on epoch 498
on epoch 499

Result after epoch 0:

Result after epoch 499:

From the example above, we can see that in the first image, produced after epoch 0, the pixels are scattered all over the place and nothing can be made out of it.
In the second image, however, the pixels are organized far more systematically, and we can make out the digit "7", which the code picked at random and the network tried to clone. In our example we used 500 epochs, but you can increase this number to improve the result.
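The losses used in the code above can also be checked by hand. The sketch below re-implements the per-example sigmoid cross-entropy in plain Python, using the numerically stable form max(x, 0) - x * z + log(1 + exp(-|x|)), and applies it to a few made-up logits. The logit values and helper names are illustrative assumptions, not outputs of the trained network.

```python
import math

def sigmoid_cross_entropy_with_logits(logit, label):
    """Numerically stable form: max(x, 0) - x * z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def mean_loss(logits, labels):
    losses = [sigmoid_cross_entropy_with_logits(x, z)
              for x, z in zip(logits, labels)]
    return sum(losses) / len(losses)

real_logits = [2.0, 0.0]    # hypothetical discriminator logits on real images
fake_logits = [-2.0, 0.0]   # hypothetical discriminator logits on fakes

d_real_loss = mean_loss(real_logits, [0.9, 0.9])   # smoothed "real" labels
d_fake_loss = mean_loss(fake_logits, [0.0, 0.0])   # "fake" labels
d_loss = d_real_loss + d_fake_loss
g_loss = mean_loss(fake_logits, [1.0, 1.0])        # generator wants "real"

print(d_loss, g_loss)
```

With these logits the Discriminator is doing fairly well, so its loss is small while the Generator's loss is larger. As the Generator improves and the fake logits rise, the Generator's loss falls and the Discriminator's fake loss grows.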