
SARSA Reinforcement Learning


The SARSA algorithm is a slight variation of the popular Q-Learning algorithm. A learning agent in any reinforcement learning algorithm can follow one of two types of policy:

  1. On-policy: the learning agent learns the value function according to the current action derived from the policy currently being used.
  2. Off-policy: the learning agent learns the value function according to the action derived from another policy.

Q-Learning is an off-policy technique and uses a greedy approach to learn the Q-value. SARSA, on the other hand, is an on-policy technique and uses the action performed by the current policy to learn the Q-value.

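This difference is visible in the update rules of the two methods:

  1. Q-Learning: $Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$
  2. SARSA: $Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma Q(s', a') - Q(s, a) \right]$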

Here, the update equation for SARSA depends on the current state, the current action, the reward received, the next state, and the next action. This observation gives the technique its name: SARSA stands for State Action Reward State Action, which corresponds to the tuple (s, a, r, s', a').
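To make the difference between the two update targets concrete, the following minimal sketch computes both for the same transition. The tiny Q-table, the reward, the discount factor, and the chosen next action are made-up values for illustration only; plain Python lists are used so the snippet has no dependencies.

```python
gamma = 0.95  # assumed discount factor for this illustration

# Hypothetical Q-table: 2 states x 2 actions
Q = [
    [0.0, 0.0],
    [0.2, 0.8],
]

reward = 1.0      # r: reward observed after taking action a in state s
next_state = 1    # s'
next_action = 0   # a': the action the current policy actually picked in s'

# SARSA (on-policy): bootstrap from the action that will really be taken
sarsa_target = reward + gamma * Q[next_state][next_action]

# Q-Learning (off-policy): bootstrap from the best action available in s'
q_learning_target = reward + gamma * max(Q[next_state])

print(sarsa_target, q_learning_target)
```

With these numbers the SARSA target is about 1.19 and the Q-Learning target is about 1.76. The two only coincide when the action chosen in s' happens to be the greedy one.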

The following Python code demonstrates how to implement the SARSA algorithm, using the OpenAI gym module to load the environment.

Step 1: Import the required libraries

import numpy as np
import gym

Step 2: Create Environment

Here we will use the FrozenLake-v0 environment, which comes preloaded in gym. The environment is described in the gym documentation.

# Create the environment
env = gym.make('FrozenLake-v0')

Step 3: Initialize various parameters

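# Define the parameters
epsilon = 0.9           # probability of taking a random (exploratory) action
total_episodes = 10000  # number of training episodes
max_steps = 100         # step limit per episode
alpha = 0.85            # learning rate
gamma = 0.95            # discount factor

# Initialize the Q-matrix with one row per state and one column per action
Q = np.zeros((env.observation_space.n, env.action_space.n))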

Step 4: Define the utility functions to be used in the learning process

# Choose the next action with an epsilon-greedy rule
def choose_action(state):
    if np.random.uniform(0, 1) < epsilon:
        # Explore: pick a random action
        action = env.action_space.sample()
    else:
        # Exploit: pick the action with the highest Q-value in this state
        action = np.argmax(Q[state, :])
    return action
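As a rough check of what epsilon = 0.9 means in practice, the sketch below re-implements the same epsilon-greedy rule with the standard library and counts how often it explores. The function name `epsilon_greedy`, the seeded generator, and the Q-values are hypothetical and not part of the tutorial's code.

```python
import random

def epsilon_greedy(q_row, epsilon, rng):
    """Return (action, explored) for one state's list of Q-values."""
    if rng.random() < epsilon:
        # Explore: uniformly random action
        return rng.randrange(len(q_row)), True
    # Exploit: index of the largest Q-value (first one on ties)
    best = max(range(len(q_row)), key=lambda a: q_row[a])
    return best, False

rng = random.Random(0)        # seeded only so the sketch is repeatable
q_row = [0.0, 0.3, 0.1, 0.0]  # hypothetical Q-values for a single state

draws = [epsilon_greedy(q_row, 0.9, rng) for _ in range(10_000)]
explore_rate = sum(explored for _, explored in draws) / len(draws)
print(f"fraction of exploratory moves: {explore_rate:.2f}")
```

With epsilon this high, roughly nine out of ten moves are random, so the agent behaves almost like a random walker while it learns.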

# Update the Q-value with the SARSA rule
def update(state, state2, reward, action, action2):
    predict = Q[state, action]
    target = reward + gamma * Q[state2, action2]
    Q[state, action] = Q[state, action] + alpha * (target - predict)
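A single update can be traced by hand. The sketch below applies the same rule to a small made-up table; `sarsa_update` and `Q_demo` are hypothetical names, and the function takes the table as an argument instead of using the global Q so that it can run on its own.

```python
import numpy as np

alpha = 0.85  # learning rate, as in Step 3
gamma = 0.95  # discount factor, as in Step 3

def sarsa_update(Q, state, action, reward, state2, action2):
    """Apply one SARSA update in place and return the TD error."""
    predict = Q[state, action]
    target = reward + gamma * Q[state2, action2]
    td_error = target - predict
    Q[state, action] = predict + alpha * td_error
    return td_error

# Hypothetical 2-state, 2-action table used only for this illustration
Q_demo = np.zeros((2, 2))
Q_demo[1, 0] = 0.5

td = sarsa_update(Q_demo, state=0, action=1, reward=1.0, state2=1, action2=0)
print(td, Q_demo[0, 1])
```

Here the target is 1.0 + 0.95 * 0.5 = 1.475, the old estimate is 0, and the new estimate becomes 0.85 * 1.475 = 1.25375.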

Step 5: Train the Learning Agent

# Cumulative reward collected over all episodes
reward = 0

# Start SARSA learning
for episode in range(total_episodes):
    t = 0
    state1 = env.reset()
    action1 = choose_action(state1)

    while t < max_steps:
        # Visualize the training
        env.render()

        # Take the action and observe the next state and the step reward
        state2, step_reward, done, info = env.step(action1)

        # Choose the next action
        action2 = choose_action(state2)

        # Learn the Q-value
        update(state1, state2, step_reward, action1, action2)

        state1 = state2
        action1 = action2

        # Update the step counter and the cumulative reward
        t += 1
        reward += step_reward

        # Stop the episode once a terminal state is reached
        if done:
            break

In the rendered output, the red mark shows the agent's current position in the environment, while the direction shown in parentheses is the most recent move the agent made. Note that the agent stays in place if a move would take it beyond the edge of the grid.

Step 6: Evaluate performance

# Evaluate the performance
print("Performance:", reward / total_episodes)

# Display the Q-matrix
print(Q)
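The printed Q-matrix is easier to read once it is turned into a greedy policy, that is, the best-valued action for every state. The sketch below shows one way to do this on a small made-up table; `Q_demo` is a hypothetical stand-in for the learned Q, and the action names follow FrozenLake's indexing (0 = Left, 1 = Down, 2 = Right, 3 = Up).

```python
import numpy as np

# Hypothetical Q-table (3 states x 4 actions) standing in for the learned Q
Q_demo = np.array([
    [0.1, 0.5, 0.2, 0.0],
    [0.0, 0.0, 0.9, 0.3],
    [0.4, 0.1, 0.1, 0.2],
])

# FrozenLake action indices: 0 = Left, 1 = Down, 2 = Right, 3 = Up
action_names = ["Left", "Down", "Right", "Up"]

# Greedy policy: index of the largest Q-value in each row
greedy_actions = np.argmax(Q_demo, axis=1)
policy = [action_names[a] for a in greedy_actions]
print(policy)  # ['Down', 'Right', 'Left']
```

Applied to the tutorial's table, the same call would be np.argmax(Q, axis=1). Rows that are still all zeros map to action 0 simply because argmax returns the first maximum.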

