Explanation of the fundamental functions involved in the A3C algorithm

However complex an implementation of the Asynchronous Advantage Actor-Critic (A3C) algorithm may be, all implementations have one thing in common: the presence of a global network and a Worker class.

  1. The global network class: contains all the TensorFlow operations needed to build the neural networks themselves.
  2. The Worker class: simulates the training process of a single worker, each of which has its own copy of the environment and a "personal" neural network.

The implementation requires the following modules:

  1. NumPy
  2. TensorFlow
  3. Multiprocessing

The following snippets show the basic functionality required to create the corresponding classes.
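The snippets also call tf.contrib.slim, threading, and NumPy directly, so a typical import block might look like this (a sketch; the exact set depends on how the rest of the script is organized):

    import multiprocessing
    import threading

    import numpy as np
    import tensorflow as tf
    import tensorflow.contrib.slim as slim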

    The global network class:

    # Network class definition
    class AC_Network():

    The following snippets walk through the member functions of the class defined above.

    Class initialization:

    # Class initialization
    def __init__(self, s_size, a_size, scope, trainer):
        with tf.variable_scope(scope):

            # Input and visual encoding layers
            self.inputs = tf.placeholder(shape=[None, s_size],
                                         dtype=tf.float32)
            self.imageIn = tf.reshape(self.inputs, shape=[-1, 84, 84, 1])
            self.conv1 = slim.conv2d(activation_fn=tf.nn.elu,
                                     inputs=self.imageIn, num_outputs=16,
                                     kernel_size=[8, 8],
                                     stride=[4, 4], padding='VALID')
            self.conv2 = slim.conv2d(activation_fn=tf.nn.elu,
                                     inputs=self.conv1, num_outputs=32,
                                     kernel_size=[4, 4],
                                     stride=[2, 2], padding='VALID')
            hidden = slim.fully_connected(slim.flatten(self.conv2),
                                          256, activation_fn=tf.nn.elu)

     -"  tf.placeholder ()  - Inserts a placeholder for a tensor that will always be fed. -"  tf.reshape ()  - Reshapes the input tensor -"  slim.conv2d ()  - Adds an n-dimensional convolutional network -"  slim.fully_connected ()  - Adds a fully connected layer 

    Note the following definitions:

    • Filter: a small matrix used to apply an effect (such as blurring or edge detection) to a given image.
    • Padding: the process of adding extra rows or columns at the edges of an image so that the filter convolution can be computed for every pixel.
    • Stride: the number of pixels by which the filter is shifted in the given direction between applications.
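    As a quick sanity check on the layer shapes above: with VALID padding, the output size of a convolution is (input - kernel) / stride + 1 per dimension. A small sketch:

    # Output size of a VALID convolution along one dimension
    def conv_output_size(input_size, kernel_size, stride):
        return (input_size - kernel_size) // stride + 1

    # conv1: 84x84 input, 8x8 kernel, stride 4 -> 20x20
    print(conv_output_size(84, 8, 4))   # 20
    # conv2: 20x20 input, 4x4 kernel, stride 2 -> 9x9
    print(conv_output_size(20, 4, 2))   # 9

    The flattened conv2 output therefore has 9 * 9 * 32 = 2592 features, which the fully connected layer maps down to 256.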

    Building the recurrent network:

    def __init__(self, s_size, a_size, scope, trainer):
        with tf.variable_scope(scope):

            # ... (convolutional layers from the previous snippet) ...

            # Building a recurrent network for temporal dependencies
            lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(256, state_is_tuple=True)
            # Zeroed initial state, used at the start of each episode
            c_init = np.zeros((1, lstm_cell.state_size.c), np.float32)
            h_init = np.zeros((1, lstm_cell.state_size.h), np.float32)
            self.state_init = [c_init, h_init]
            # Placeholders so the previous step's state can be fed back in
            c_in = tf.placeholder(tf.float32, [1, lstm_cell.state_size.c])
            h_in = tf.placeholder(tf.float32, [1, lstm_cell.state_size.h])
            self.state_in = (c_in, h_in)
            rnn_in = tf.expand_dims(hidden, [0])
            step_size = tf.shape(self.imageIn)[:1]
            state_in = tf.nn.rnn_cell.LSTMStateTuple(c_in, h_in)
            lstm_outputs, lstm_state = tf.nn.dynamic_rnn(
                lstm_cell, rnn_in,
                initial_state=state_in,
                sequence_length=step_size,
                time_major=False)
            lstm_c, lstm_h = lstm_state
            self.state_out = (lstm_c[:1, :], lstm_h[:1, :])
            rnn_out = tf.reshape(lstm_outputs, [-1, 256])

     -"  tf.nn.rnn_cell.BasicLSTMCell ()  - Builds a basic LSTM Recurrent network cell -"  tf.expand_dims ()  - Inserts a dimension of 1 at the dimension index axis of input’s shape -"  tf.shape ()  - returns the shape of the tensor -"  tf.nn.rnn_cell.LSTMStateTuple ()  - Creates a tuple to be used by the LSTM cells for state_size, zero_state and output state. -"  tf.nn.dynamic_rnn ()  - Builds a Recurrent network according to the Recurrent network cell 

    Generating the value and policy output layers:

    def __init__(self, s_size, a_size, scope, trainer):
        with tf.variable_scope(scope):

            # ... (convolutional and recurrent layers from the previous snippets) ...

            # Create output layers for the value and policy estimates
            self.policy = slim.fully_connected(rnn_out, a_size,
                activation_fn=tf.nn.softmax,
                weights_initializer=normalized_columns_initializer(0.01),
                biases_initializer=None)
            self.value = slim.fully_connected(rnn_out, 1,
                activation_fn=None,
                weights_initializer=normalized_columns_initializer(1.0),
                biases_initializer=None)
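    Note that normalized_columns_initializer is not a TensorFlow built-in but a custom helper that must be defined before building the network. A common definition, as seen in public A3C implementations, is:

    # Custom initializer: scales each column of a random matrix to a fixed norm
    def normalized_columns_initializer(std=1.0):
        def _initializer(shape, dtype=None, partition_info=None):
            out = np.random.randn(*shape).astype(np.float32)
            out *= std / np.sqrt(np.square(out).sum(axis=0, keepdims=True))
            return tf.constant(out)
        return _initializer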

    Build a master network and deploy workers:

    # Top-level training script (outside the AC_Network class):
    # build everything on the CPU
    with tf.device("/cpu:0"):

      

        # Generate the global network
        master_network = AC_Network(s_size, a_size, 'global', None)

        # Set the number of workers
        # to the number of available CPU threads
        num_workers = multiprocessing.cpu_count()

        # Create and deploy the workers, each with its own environment copy
        workers = []
        for i in range(num_workers):
            workers.append(Worker(DoomGame(), i, s_size, a_size,
                                  trainer, saver, model_path))
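    The Worker class itself is not listed in this article. A minimal sketch of its constructor (assuming an update_target_graph helper that copies the global parameters into the worker's scope; see the sketch at the end of this section) illustrates the "own environment plus personal network" idea:

    # Sketch of a Worker: its own environment copy and a "personal" network
    class Worker():
        def __init__(self, game, name, s_size, a_size, trainer, saver, model_path):
            self.name = "worker_" + str(name)
            self.trainer = trainer
            self.model_path = model_path
            # Local network, built in this worker's own variable scope
            self.local_AC = AC_Network(s_size, a_size, self.name, trainer)
            # Ops that copy the global parameters into the local network
            self.update_local_ops = update_target_graph('global', self.name)
            # Each worker owns its own copy of the environment
            self.env = game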

    Performing parallel TensorFlow operations:

    # Top-level training script: launch the session and the worker threads
    with tf.Session() as sess:
        coord = tf.train.Coordinator()
        if load_model == True:
            ckpt = tf.train.get_checkpoint_state(model_path)
            saver.restore(sess, ckpt.model_checkpoint_path)
        else:
            sess.run(tf.global_variables_initializer())

        # Start each worker in its own thread
        worker_threads = []
        for worker in workers:
            # Bind the loop variable so each thread gets its own worker
            worker_work = lambda worker=worker: worker.work(max_episode_length,
                gamma, master_network, sess, coord)
            t = threading.Thread(target=worker_work)
            t.start()
            worker_threads.append(t)
        coord.join(worker_threads)

     -"  tf.Session ()  - A class to run the Tensorflow operations -"  tf.train.Coordinator ()  - Returns a coordinator for the multiple threads -"  tf.train.get_checkpoint_state ()  - Returns a valid checkpoint state from the "checkpoint" file -"  saver.restore ()  - Is used to store and restore the models -"  sess.run ()  - Outputs the tensors and metadata obtained from running a session 

    Updating the global network parameters:

    def __init__(self, s_size, a_size, scope, trainer):
        with tf.variable_scope(scope):

            # ... (network layers from the previous snippets) ...

            # Only worker networks need loss and gradient ops
            if scope != 'global':
                self.actions = tf.placeholder(shape=[None], dtype=tf.int32)
                self.actions_onehot = tf.one_hot(self.actions,
                                                 a_size, dtype=tf.float32)
                self.target_v = tf.placeholder(shape=[None], dtype=tf.float32)
                self.advantages = tf.placeholder(shape=[None], dtype=tf.float32)

                # Probability the policy assigned to the actions actually taken
                self.responsible_outputs = tf.reduce_sum(self.policy *
                                                         self.actions_onehot, [1])

                # Loss calculations
                self.value_loss = 0.5 * tf.reduce_sum(tf.square(self.target_v -
                    tf.reshape(self.value, [-1])))
                self.entropy = -tf.reduce_sum(self.policy * tf.log(self.policy))
                self.policy_loss = -tf.reduce_sum(tf.log(self.responsible_outputs)
                                                  * self.advantages)
                self.loss = (0.5 * self.value_loss +
                             self.policy_loss - self.entropy * 0.01)

                # Get gradients from the local network
                local_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                               scope)
                self.gradients = tf.gradients(self.loss, local_vars)
                self.var_norms = tf.global_norm(local_vars)
                grads, self.grad_norms = tf.clip_by_global_norm(self.gradients,
                                                                40.0)

                # Apply local gradients to the global network
                global_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                                'global')

                self.apply_grads = trainer.apply_gradients(
                    zip(grads, global_vars))
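    The reverse direction, copying the updated global parameters back into a worker's local network, is usually handled by a small helper like the following (a sketch; this is the update_target_graph assumed in the Worker constructor above):

    # Returns ops that copy the variables of one scope into another
    def update_target_graph(from_scope, to_scope):
        from_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, from_scope)
        to_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, to_scope)
        op_holder = []
        for from_var, to_var in zip(from_vars, to_vars):
            op_holder.append(to_var.assign(from_var))
        return op_holder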
