Depending on the number of training examples considered when updating the model parameters, there are 3 types of gradient descent:

Batch Gradient Descent: The entire training set is considered before taking a step in the direction of the gradient, so a single update takes a long time. It makes smooth updates to the model parameters.

Stochastic Gradient Descent: Only a single training example is considered before taking a step in the direction of the gradient, so we are forced to loop over the training set and cannot exploit the speed of vectorized code. It makes very noisy updates to the parameters.

Mini-Batch Gradient Descent: A subset of training examples is considered, so it can make quick updates to the model parameters while still exploiting the speed of vectorized code. The noise of the updates depends on the batch size: the greater the batch size, the less noisy the update.

So mini-batch gradient descent makes a tradeoff between fast convergence and the noise associated with gradient updates, making it a more flexible and robust algorithm.
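The noise claim above can be checked numerically. The following is a minimal sketch (the synthetic data and helper names are illustrative, not from the article) that measures the spread of the gradient estimate across random batches of different sizes:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy linear data: y = 3x + noise
X = rng.normal(size=(1000, 1))
y = 3 * X + rng.normal(scale=0.5, size=(1000, 1))
theta = np.zeros((1, 1))

def batch_gradient(idx):
    # gradient of the squared error on the selected examples, averaged
    h = X[idx] @ theta
    return (X[idx].T @ (h - y[idx])) / len(idx)

def gradient_std(batch_size, trials=200):
    # spread of the gradient estimate across random batches of this size
    grads = [batch_gradient(rng.choice(1000, batch_size, replace=False))[0, 0]
             for _ in range(trials)]
    return np.std(grads)

# larger batches give a less noisy gradient estimate
print(gradient_std(1) > gradient_std(32) > gradient_std(256))  # True
```

The single-example (stochastic) gradient has the largest spread, and the spread shrinks roughly with the square root of the batch size, which is exactly the batch-size/noise tradeoff described above.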
Mini-Batch Gradient Descent:
Algorithm

Let theta = model parameters and max_iters = number of epochs.

for itr = 1, 2, 3, ..., max_iters:
    for each mini-batch (X_mini, y_mini):

        Forward pass on the batch X_mini:
            Make predictions on the mini-batch
            Compute the error of the predictions (J(theta)) with the current values of the parameters

        Backward pass:
            Compute gradient(theta) = partial derivative of J(theta) w.r.t. theta

        Update parameters:
            theta = theta - learning_rate * gradient(theta)
Below is the Python implementation:
Step #1: The first step is to import dependencies, generate data for linear regression, and visualize the generated data. We create 8000 examples, each with 2 attributes/features. The examples are split into a training set (X_train, y_train) and a test set (X_test, y_test), with 7200 and 800 examples respectively.
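The data-generation code for this step is not reproduced here; a minimal sketch consistent with the description (8000 examples, a 7200/800 split, a bias column as the first attribute; the exact data distribution and seed are assumptions) could look like:

```python
import numpy as np

np.random.seed(0)

# 8000 samples of a noisy line; a column of ones acts as the bias attribute
n_samples = 8000
x = 2 * np.random.rand(n_samples, 1)
y = 1 + x + 0.2 * np.random.randn(n_samples, 1)
X = np.hstack((np.ones((n_samples, 1)), x))  # 2 attributes per example

# 90/10 train/test split
split = int(0.9 * n_samples)
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

print("Number of examples in training set =", X_train.shape[0])
print("Number of examples in test set =", X_test.shape[0])
```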

Output:
Number of examples in training set = 7200
Number of examples in test set = 800
Step #2: Next, we write the code to implement linear regression using mini-batch gradient descent. gradientDescent() is the main driver function; the others are helper functions: hypothesis() for making predictions, gradient() for computing gradients, cost() for computing the error, and create_mini_batches() for creating the mini-batches. The driver function initializes the parameters, computes the best set of parameters for the model, and returns those parameters along with a list containing the history of the error as the parameters were updated.
# linear regression using mini-batch gradient descent

# function for computing the hypothesis / predictions
def hypothesis(X, theta):
    return np.dot(X, theta)

# function to compute the gradient of the error w.r.t. theta
def gradient(X, y, theta):
    h = hypothesis(X, theta)
    grad = np.dot(X.transpose(), (h - y))
    return grad

# function to compute the error for the current values of theta
def cost(X, y, theta):
    h = hypothesis(X, theta)
    J = np.dot((h - y).transpose(), (h - y))
    J /= 2
    return J[0]

# function to create a list containing mini-batches
def create_mini_batches(X, y, batch_size):
    mini_batches = []
    data = np.hstack((X, y))
    np.random.shuffle(data)
    n_minibatches = data.shape[0] // batch_size

    for i in range(n_minibatches):
        mini_batch = data[i * batch_size:(i + 1) * batch_size, :]
        X_mini = mini_batch[:, :-1]
        Y_mini = mini_batch[:, -1].reshape((-1, 1))
        mini_batches.append((X_mini, Y_mini))
    # collect any leftover examples in a final, smaller batch
    if data.shape[0] % batch_size != 0:
        mini_batch = data[n_minibatches * batch_size:, :]
        X_mini = mini_batch[:, :-1]
        Y_mini = mini_batch[:, -1].reshape((-1, 1))
        mini_batches.append((X_mini, Y_mini))
    return mini_batches
# function to perform mini-batch gradient descent
def gradientDescent(X, y, learning_rate=0.001, batch_size=32):
    theta = np.zeros((X.shape[1], 1))
    error_list = []
    max_iters = 3
    for itr in range(max_iters):
        mini_batches = create_mini_batches(X, y, batch_size)
        for mini_batch in mini_batches:
            X_mini, y_mini = mini_batch
            theta = theta - learning_rate * gradient(X_mini, y_mini, theta)
            error_list.append(cost(X_mini, y_mini, theta))
    return theta, error_list
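As a quick sanity check of the batching scheme (the helper below is written for this article, not part of the code above), the slicing used in create_mini_batches produces full batches of batch_size rows plus one smaller final batch whenever batch_size does not divide the number of rows:

```python
import numpy as np

def batch_sizes(n_rows, batch_size):
    # sizes produced by data[i*batch_size:(i+1)*batch_size] slicing
    # plus the leftover remainder batch, mirroring create_mini_batches
    full = n_rows // batch_size
    sizes = [batch_size] * full
    if n_rows % batch_size != 0:
        sizes.append(n_rows % batch_size)
    return sizes

print(len(batch_sizes(7200, 32)))  # 225 full batches, no remainder
print(batch_sizes(100, 32))        # [32, 32, 32, 4] - last batch is smaller
```

With the article's 7200 training examples and the default batch_size of 32, every epoch therefore performs exactly 225 parameter updates.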
Call the gradientDescent() function to compute the model parameters (theta) and visualize the change in the error function.
theta, error_list = gradientDescent(X_train, y_train)
print("Bias =", theta[0])
print("Coefficients =", theta[1:])

# visualizing gradient descent
plt.plot(error_list)
plt.xlabel("Number of iterations")
plt.ylabel("Cost")
plt.show()
Output:
Bias = [0.81830471]
Coefficients = [[1.04586595]]
Step #3: Finally, we make predictions on the test set and calculate the mean absolute error of the predictions.
# predicting output for X_test
y_pred = hypothesis(X_test, theta)
plt.scatter(X_test[:, 1], y_test[:, ], marker='.')
plt.plot(X_test[:, 1], y_pred, color='orange')
plt.show()

# calculating error in predictions
error = np.sum(np.abs(y_test - y_pred) / y_test.shape[0])
print("Mean absolute error =", error)
Output:
Mean absolute error = 0.4366644295854125
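Note that the expression used above, np.sum(np.abs(y_test - y_pred) / y_test.shape[0]), is just the mean absolute error written as a sum; a quick check on synthetic data (the names and values here are illustrative, not from the article) confirms it matches the direct mean:

```python
import numpy as np

rng = np.random.default_rng(1)
y_true = rng.normal(size=(800, 1))
y_hat = y_true + rng.normal(scale=0.5, size=(800, 1))

# the article's formulation vs. the direct mean
mae_article = np.sum(np.abs(y_true - y_hat) / y_true.shape[0])
mae_direct = np.mean(np.abs(y_true - y_hat))
print(np.isclose(mae_article, mae_direct))  # True
```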
The orange line represents the final hypothesis function: y_pred = theta[0] + theta[1] * X_test[:, 1].