# ML | Mini-batch gradient descent with Python


Depending on the number of training examples considered when updating model parameters, we have 3 types of gradient descent:

1. Batch gradient descent: parameters are updated after the error gradient is computed over the entire training set
2. Stochastic gradient descent: parameters are updated after the error gradient is computed with respect to a single training example
3. Mini-batch gradient descent: parameters are updated after the error gradient is computed with respect to a subset of the training set
| Batch Gradient Descent | Stochastic Gradient Descent | Mini-Batch Gradient Descent |
| --- | --- | --- |
| Since the entire training set is considered before taking a step in the direction of the gradient, a single update takes a lot of time. | Since only a single training example is considered before taking a step in the direction of the gradient, we are forced to loop over the training set and cannot exploit the speed associated with vectorizing the code. | Since a subset of training examples is considered, it can make quick updates to the model parameters and can also exploit the speed associated with vectorizing the code. |
| It makes smooth updates to the model parameters. | It makes very noisy updates to the parameters. | Depending on the batch size, the updates can be made less noisy: the greater the batch size, the less noisy the update. |

So mini-batch gradient descent strikes a trade-off between fast convergence and the noise associated with each gradient update, making it a more flexible and robust algorithm.
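To make the distinction concrete, all three variants can be viewed as the same update loop with a different batch size. Below is a minimal sketch of one training epoch; the function name, the gradient averaging, and the squared-error gradient are illustrative choices of mine, not taken from the implementation later in this article:

```
import numpy as np

def one_epoch(X, y, theta, learning_rate, batch_size):
    # batch_size == len(X)    -> batch gradient descent (one update per epoch)
    # batch_size == 1         -> stochastic gradient descent (one update per example)
    # 1 < batch_size < len(X) -> mini-batch gradient descent
    indices = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = indices[start:start + batch_size]
        X_b, y_b = X[batch], y[batch]
        # average gradient of the squared error on this batch
        grad = X_b.T @ (X_b @ theta - y_b) / len(batch)
        theta = theta - learning_rate * grad
    return theta
```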

Algorithm:

Let theta = model parameters and max_iters = number of epochs (the update equations behind these steps are written out after the pseudocode).

for itr = 1, 2, 3, …, max_iters:
    for each mini-batch (X_mini, y_mini):

• Forward pass on the batch X_mini:
  • Make predictions on the mini-batch
  • Compute the error in the predictions (J(theta)) with the current values of the parameters
• Backward pass:
  • Compute gradient(theta) = partial derivative of J(theta) w.r.t. theta
• Update parameters:
  • theta = theta - learning_rate * gradient(theta)
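For the squared-error cost used in the implementation below, the quantities appearing in this pseudocode have the following standard form (the derivation is not spelled out in the original article):

$$
J(\theta) = \tfrac{1}{2}\,(X\theta - y)^{\top}(X\theta - y),
\qquad
\nabla_{\theta} J(\theta) = X^{\top}(X\theta - y),
\qquad
\theta \leftarrow \theta - \alpha\,\nabla_{\theta} J(\theta),
$$

where X and y denote the features and targets of the current mini-batch and alpha is the learning rate.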

Below is the Python implementation:

Step #1: Import the dependencies, generate the linear regression data, and visualize the generated data. We create 8000 samples, each with 2 attributes/features. The samples are then split into a training set (X_train, y_train) and a test set (X_test, y_test) of 7200 and 800 examples, respectively.

```
# import dependencies
import numpy as np
import matplotlib.pyplot as plt

# data creation
mean = np.array([5.0, 6.0])
cov = np.array([[1.0, 0.95], [0.95, 1.2]])
data = np.random.multivariate_normal(mean, cov, 8000)

# data visualization
plt.scatter(data[:500, 0], data[:500, 1], marker='.')
plt.show()

# train-test split
data = np.hstack((np.ones((data.shape[0], 1)), data))

split_factor = 0.90
split = int(split_factor * data.shape[0])

X_train = data[:split, :-1]
y_train = data[:split, -1].reshape((-1, 1))
X_test = data[split:, :-1]
y_test = data[split:, -1].reshape((-1, 1))

print("Number of examples in training set = %d" % (X_train.shape[0]))
print("Number of examples in testing set = %d" % (X_test.shape[0]))
```

Output:

Number of examples in training set = 7200
Number of examples in testing set = 800
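A note on the `np.hstack` call above: the prepended column of ones acts as a constant bias feature, so the first entry of the parameter vector learned in Step #2 plays the role of the intercept. A tiny illustration with made-up numbers:

```
# [bias feature, x] dotted with [intercept, slope]
row = np.array([1.0, 5.2])
theta_example = np.array([0.8, 1.05])
print(np.dot(row, theta_example))   # 0.8 * 1 + 1.05 * 5.2 = 6.26
```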

Step #2: Next, we write the code to implement linear regression using mini-batch gradient descent.
`gradientDescent()` is the main driver function; the others are helpers: `hypothesis()` makes predictions, `gradient()` computes gradients, `cost()` computes the error, and `create_mini_batches()` builds the mini-batches. The driver function initializes the parameters, computes the best set of parameters for the model, and returns those parameters along with a list containing the history of errors as the parameters were updated.


```
# linear regression using "mini-batch" gradient descent

# function to compute the hypothesis / predictions
def hypothesis(X, theta):
    return np.dot(X, theta)

# function to compute the gradient of the error w.r.t. theta
def gradient(X, y, theta):
    h = hypothesis(X, theta)
    grad = np.dot(X.transpose(), (h - y))
    return grad

# function to compute the error for the current values of theta
def cost(X, y, theta):
    h = hypothesis(X, theta)
    J = np.dot((h - y).transpose(), (h - y))
    J /= 2
    return J[0]

# function to create a list containing mini-batches
def create_mini_batches(X, y, batch_size):
    mini_batches = []
    data = np.hstack((X, y))
    np.random.shuffle(data)
    n_minibatches = data.shape[0] // batch_size

    for i in range(n_minibatches):
        mini_batch = data[i * batch_size:(i + 1) * batch_size, :]
        X_mini = mini_batch[:, :-1]
        Y_mini = mini_batch[:, -1].reshape((-1, 1))
        mini_batches.append((X_mini, Y_mini))
    # keep the leftover examples as a final, smaller mini-batch
    if data.shape[0] % batch_size != 0:
        mini_batch = data[n_minibatches * batch_size:, :]
        X_mini = mini_batch[:, :-1]
        Y_mini = mini_batch[:, -1].reshape((-1, 1))
        mini_batches.append((X_mini, Y_mini))
    return mini_batches

# function to perform mini-batch gradient descent
def gradientDescent(X, y, learning_rate=0.001, batch_size=32):
    theta = np.zeros((X.shape[1], 1))
    error_list = []
    max_iters = 3
    for itr in range(max_iters):
        mini_batches = create_mini_batches(X, y, batch_size)
        for mini_batch in mini_batches:
            X_mini, y_mini = mini_batch
            theta = theta - learning_rate * gradient(X_mini, y_mini, theta)
            error_list.append(cost(X_mini, y_mini, theta))

    return theta, error_list
```
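As an optional sanity check (not part of the original article), plain linear regression also has a closed-form least-squares solution, which mini-batch gradient descent should approximately recover on this data; comparing the two printed parameter vectors is a quick way to validate the implementation:

```
# hypothetical check: closed-form least-squares fit on the same training data
theta_exact, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
print("Closed-form parameters:", theta_exact.ravel())
```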

Call `gradientDescent()` to compute the model parameters (theta) and visualize the change in the error function.

```
theta, error_list = gradientDescent(X_train, y_train)
print("Bias =", theta[0])
print("Coefficients =", theta[1:])

# visualising gradient descent
plt.plot(error_list)
plt.xlabel("Number of iterations")
plt.ylabel("Cost")
plt.show()
```

Output:

Bias = [0.81830471]
Coefficients = [[1.04586595]]

Step #3: Finally, we make predictions on the test set and calculate the mean absolute error of those predictions.

```
# predicting output for X_test
y_pred = hypothesis(X_test, theta)
plt.scatter(X_test[:, 1], y_test[:, 0], marker='.')
plt.plot(X_test[:, 1], y_pred, color='orange')
plt.show()

# calculating the error in the predictions
error = np.sum(np.abs(y_test - y_pred) / y_test.shape[0])
print("Mean absolute error =", error)
```

Output:

Mean absolute error = 0.4366644295854125

The orange line represents the final hypothesis function, theta[0] + theta[1] * X_test[:, 1], i.e. the learned linear fit drawn over the test data.
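To see the noise versus batch-size trade-off from the comparison table above in practice, one option (not shown in the original article) is to rerun `gradientDescent()` with a few different batch sizes and overlay the resulting cost curves; the batch sizes below are arbitrary illustrative choices:

```
# compare cost curves for a few (illustrative) batch sizes
for bs in (8, 32, 256):
    _, errors = gradientDescent(X_train, y_train, batch_size=bs)
    plt.plot(errors, label="batch_size = %d" % bs)
plt.xlabel("Number of updates")
plt.ylabel("Cost")
plt.legend()
plt.show()
```

Smaller batches produce more updates per epoch and a noisier curve; larger batches produce fewer, smoother updates.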