The SARSA algorithm is a small variation of the popular Q-Learning algorithm. In any reinforcement learning algorithm, the agent's policy can be of one of two types:
- On-policy: the agent learns the value function according to the current action derived from the policy it is currently using.
- Off-policy: the agent learns the value function according to the action derived from a different policy.
Q-Learning is an off-policy technique and uses the greedy approach to learn the Q-value. SARSA, on the other hand, is an on-policy technique and uses the action taken by the current policy to learn the Q-value.
This difference is visible in the update rules of the two methods:
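Q-Learning: Q(s, a) ← Q(s, a) + α · [r + γ · max_a' Q(s', a') − Q(s, a)]

SARSA: Q(s, a) ← Q(s, a) + α · [r + γ · Q(s', a') − Q(s, a)]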
The update equation for SARSA depends on the current state, the current action, the reward received, the next state, and the next action. This is where the technique gets its name: SARSA stands for State-Action-Reward-State-Action, the tuple (s, a, r, s', a').
The following Python code demonstrates how to implement the SARSA algorithm, using the OpenAI gym module to load the environment.
Step 1: Import the required libraries
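A minimal sketch of the imports used throughout this tutorial: numpy for the Q-table and gym for the environment.

```python
import numpy as np
import gym
```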
Step 2: Create Environment
Here we will use the FrozenLake-v0 environment that comes preloaded with gym. A full description of the environment is available in the gym documentation.
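A sketch of the setup, assuming the classic gym API (versions before 0.26, where FrozenLake-v0 is available and env.step returns a 4-tuple):

```python
# Create the FrozenLake environment (classic gym API)
env = gym.make('FrozenLake-v0')
```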
Step 3: Initialize various parameters
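The values below are illustrative choices, not prescribed by the algorithm; tune them for your own runs.

```python
# Hyperparameters (illustrative values)
epsilon = 0.1         # exploration rate for the epsilon-greedy policy
total_episodes = 10000
max_steps = 100       # cap on the number of steps per episode
alpha = 0.85          # learning rate
gamma = 0.95          # discount factor

# Q-table: one row per state, one column per action, initialized to zeros
Q = np.zeros((env.observation_space.n, env.action_space.n))
```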
Step 4: Define the utility functions to be used in the learning process
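One possible pair of utilities: an epsilon-greedy action selector and the SARSA update itself. The function names choose_action and update are our own choices for this sketch.

```python
def choose_action(state):
    # Epsilon-greedy selection: explore with probability epsilon, else exploit
    if np.random.uniform(0, 1) < epsilon:
        return env.action_space.sample()
    return np.argmax(Q[state, :])

def update(state, action, reward, next_state, next_action):
    # SARSA update: the target uses the action actually chosen by the policy,
    # not the greedy maximum (which would make this Q-Learning instead)
    target = reward + gamma * Q[next_state, next_action]
    Q[state, action] += alpha * (target - Q[state, action])
```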
Step 5: Train the Learning Agent
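A sketch of the training loop, building on the blocks above and again assuming the classic gym API. Note that the next action is chosen with the same policy that is being learned, which is exactly what makes SARSA on-policy.

```python
total_reward = 0

for episode in range(total_episodes):
    state = env.reset()
    action = choose_action(state)

    for _ in range(max_steps):
        env.render()  # print the grid with the agent's current position

        # Take the action chosen by the current policy
        next_state, reward, done, info = env.step(action)

        # Choose the next action with the same policy (on-policy)
        next_action = choose_action(next_state)

        # Update Q using the full (s, a, r, s', a') tuple
        update(state, action, reward, next_state, next_action)

        state, action = next_state, next_action
        total_reward += reward

        if done:
            break
```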
In the above output, the red mark shows the agent's current position in the environment, while the direction in parentheses is the action the agent has just taken. Note that the agent stays in place if the chosen move would take it outside the grid.
Step 6: Evaluate performance
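A simple evaluation sketch, reusing the total_reward accumulated during training above.

```python
# In FrozenLake the only reward is 1 for reaching the goal, so the mean
# episode reward is the fraction of episodes the agent solved
print("Performance:", total_reward / total_episodes)

# Inspect the learned Q-values
print(Q)
```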