Change language

Algorithm Asynchronous Advantages Actor Critic (A3C)

|

Deciphering the different parts of the algorithm name:

  • Asynchronous: unlike other popular deep learning algorithms such as Deep Q-Learning that use one agent and one environment, this algorithm uses multiple agents, each with its own network parameters and a copy of the environment. These agents interact with their respective environments asynchronously , learning with each interaction. Each agent is controlled by the global network. As each agent gains more knowledge, he contributes to the overall knowledge of the global network. The presence of a global network allows each agent to have more varied training data. This scheme simulates the real environment in which people live, since each person gains knowledge from the experience of some other person, which allows the entire "global network" to become better.
  • Actor-Critic: in Unlike some simpler methods based on either Value-Iteration or Policy-Gradient methods, the A3C algorithm combines the best parts of both methods, i.e. the algorithm predicts both the function V (s) and the optimal function policy function , The training agent uses the value of the Value function to update the optimal policy function (Actor). Note that the policy function here stands for the probability distribution of the action space . More precisely, the training agent determines the conditional probability P (a | s; ) that is, the parameterized probability that the agent chooses action a in state s.

Advantage: As a rule of thumb, when implementing a gradient policy , the discounted income value () tell the agent which of his actions were helpful and which were fined. Using the Advantage value instead, the agent also knows how much better the reward was than expected. This gives the beginner an understanding of the agent in the environment, and therefore the learning process is better. Advantage metric is defined by the following expression:

Advantage: A = Q (s, a) — V (s)

The following pseudocode is from the research paper referenced above.

  Define global shared parameter vectors   and   Define global shared counter T = 0 Define thread specific parameter vectors   and   Define thread step counter t = 1 while ( ) {            while (  is not terminal  ) {Simulate action   according to   Receive reward   and next state   t ++ T ++} if (  is terminal) {R = 0} else {R =  } for (i = t-1; i" =  ; i--) {R =      }    }  

Where,

— Maximum number of iterations

— change the global parameter vector

— Overall reward

— Political function

— Value function

— discount factor

Benefits :

  • This algorithm is faster and more reliable than standard reinforcement learning algorithms.
  • It performs better than other reinforcement learning methods due to the diversity of knowledge as described above.
  • It can be used on both discrete and continuous action spaces.

Shop

Learn programming in R: courses

$

Best Python online courses for 2022

$

Best laptop for Fortnite

$

Best laptop for Excel

$

Best laptop for Solidworks

$

Best laptop for Roblox

$

Best computer for crypto mining

$

Best laptop for Sims 4

$

Latest questions

NUMPYNUMPY

Common xlabel/ylabel for matplotlib subplots

12 answers

NUMPYNUMPY

How to specify multiple return types using type-hints

12 answers

NUMPYNUMPY

Why do I get "Pickle - EOFError: Ran out of input" reading an empty file?

12 answers

NUMPYNUMPY

Flake8: Ignore specific warning for entire file

12 answers

NUMPYNUMPY

glob exclude pattern

12 answers

NUMPYNUMPY

How to avoid HTTP error 429 (Too Many Requests) python

12 answers

NUMPYNUMPY

Python CSV error: line contains NULL byte

12 answers

NUMPYNUMPY

csv.Error: iterator should return strings, not bytes

12 answers


Wiki

Python | How to copy data from one Excel sheet to another

Common xlabel/ylabel for matplotlib subplots

Check if one list is a subset of another in Python

sin

How to specify multiple return types using type-hints

exp

Printing words vertically in Python

exp

Python Extract words from a given string

Cyclic redundancy check in Python

Finding mean, median, mode in Python without libraries

cos

Python add suffix / add prefix to strings in a list

Why do I get "Pickle - EOFError: Ran out of input" reading an empty file?

Python - Move item to the end of the list

Python - Print list vertically