Brief description of logistic regression:
Logistic regression is a classification algorithm commonly used in machine learning. It assigns data points to distinct classes by learning from a set of labeled examples. The model first computes a linear function of the input features and then introduces nonlinearity by passing the result through the sigmoid function.
In logistic regression, the hypothesis is the sigmoid of a linear function of the input, i.e.
$$h(x) = \sigma(w \cdot x + b)$$
where
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
Here the vector w represents the weights, and the scalar b represents the bias (offset) of the model.
Let’s plot the sigmoid function:
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

plt.plot(np.arange(-5, 5, 0.1), sigmoid(np.arange(-5, 5, 0.1)))
plt.title('Visualization of the Sigmoid Function')
plt.show()
Output:
Note that the range of the sigmoid function is (0, 1), so its output always lies strictly between 0 and 1. This property makes the sigmoid a good choice of activation function for binary classification, since the output can be read as a probability. Also, for z = 0, sigmoid(z) = 0.5, which is the midpoint of the range of the sigmoid function.
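The properties noted above can be checked numerically. The following is a minimal sketch that redefines the same `sigmoid` function so the block is self-contained; the grid of z values is an arbitrary choice for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, 1 / (1 + e^(-z))."""
    return 1 / (1 + np.exp(-z))

# Arbitrary sample of inputs, chosen only for illustration
z = np.arange(-5, 5, 0.1)
values = sigmoid(z)

# Every output lies strictly between 0 and 1
all_in_range = bool(np.all((values > 0) & (values < 1)))

# The midpoint of the range is reached at z = 0
midpoint = float(sigmoid(0.0))

# The curve is symmetric about (0, 0.5): sigmoid(-z) = 1 - sigmoid(z)
is_symmetric = bool(np.allclose(sigmoid(-z), 1 - values))

print(all_in_range, midpoint, is_symmetric)
```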
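As with linear regression, we need to find the values of w and b for which the cost function J is minimal. In this case, we will use the sigmoid cross-entropy cost function, which is defined as
$$J(w, b) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log h(x_i) + (1 - y_i) \log\left(1 - h(x_i)\right) \right]$$
where m is the number of training examples and $y_i \in \{0, 1\}$ is the label of example $x_i$.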
This cost function is then minimized using gradient descent.
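To make the optimization step concrete, here is a minimal NumPy sketch of the mean sigmoid cross-entropy cost and a plain gradient descent update. It is an illustration only, not the TensorFlow implementation used later in the article; the toy data, the learning rate of 0.1, and the helper names `cross_entropy_cost` and `gradient_step` are assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cross_entropy_cost(w, b, x, y):
    """Mean sigmoid cross-entropy over all examples."""
    h = sigmoid(x @ w + b)
    return float(-np.mean(y * np.log(h) + (1 - y) * np.log(1 - h)))

def gradient_step(w, b, x, y, alpha):
    """One gradient descent update of w and b."""
    m = x.shape[0]
    error = sigmoid(x @ w + b) - y      # h(x_i) - y_i for every example
    grad_w = x.T @ error / m            # gradient of the cost w.r.t. w
    grad_b = float(np.mean(error))      # gradient of the cost w.r.t. b
    return w - alpha * grad_w, b - alpha * grad_b

# Hypothetical toy data: 4 examples, 2 features, binary labels
x_toy = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 3.0], [3.0, 2.0]])
y_toy = np.array([0.0, 0.0, 1.0, 1.0])

w, b = np.zeros(2), 0.0
initial_cost = cross_entropy_cost(w, b, x_toy, y_toy)
for _ in range(100):
    w, b = gradient_step(w, b, x_toy, y_toy, alpha=0.1)
final_cost = cross_entropy_cost(w, b, x_toy, y_toy)

print(initial_cost, final_cost)
```

With all parameters at zero, every prediction is 0.5, so the initial cost equals log 2; the cost after the updates should be lower.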
Implementation:
Let’s start by importing the required libraries. We’ll use NumPy along with TensorFlow for computation, Pandas for basic data analysis, and Matplotlib for plotting. We will also use the Scikit-Learn preprocessing module for one-hot encoding the data.
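# Import modules
# Note: the code in this article uses the TensorFlow 1.x graph API
# (tf.placeholder, tf.Session). Under TensorFlow 2.x, one option is
# `import tensorflow.compat.v1 as tf` followed by `tf.disable_v2_behavior()`.
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
from sklearn.preprocessing import OneHotEncoder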
Next, we will load the dataset. We will use a subset of the well-known Iris dataset.
data = pd.read_csv('dataset.csv', header=None)
print("Data Shape:", data.shape)
print(data.head())
Output:
Data Shape: (100, 4)
   0    1    2  3
0  0  5.1  3.5  1
1  1  4.9  3.0  1
2  2  4.7  3.2  1
3  3  4.6  3.1  1
4  4  5.0  3.6  1
Now let’s extract the feature matrix and the corresponding labels.
# Feature matrix
x_orig = data.iloc[:, 1:-1].values

# Data labels
y_orig = data.iloc[:, -1:].values

print("Shape of Feature Matrix:", x_orig.shape)
print("Shape Label Vector:", y_orig.shape)
Output:
Shape of Feature Matrix: (100, 2)
Shape Label Vector: (100, 1)
Visualize the given data.
# Positive data points
x_pos = np.array([x_orig[i] for i in range(len(x_orig)) if y_orig[i] == 1])

# Negative data points
x_neg = np.array([x_orig[i] for i in range(len(x_orig)) if y_orig[i] == 0])

# Plot positive data points
plt.scatter(x_pos[:, 0], x_pos[:, 1], color='blue', label='Positive')

# Plot negative data points
plt.scatter(x_neg[:, 0], x_neg[:, 1], color='red', label='Negative')

plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Plot of given data')
plt.legend()
plt.show()
Now we will one-hot encode the data so that it works with the algorithm. One-hot encoding converts categorical features into a format that works better with classification and regression algorithms. We will also set the learning rate and the number of epochs.
# Create the one-hot encoder
oneHot = OneHotEncoder()

# Encode x_orig
oneHot.fit(x_orig)
x = oneHot.transform(x_orig).toarray()

# Encode y_orig
oneHot.fit(y_orig)
y = oneHot.transform(y_orig).toarray()

alpha, epochs = 0.0035, 500
m, n = x.shape
print('m =', m)
print('n =', n)
print('Learning Rate =', alpha)
print('Number of Epochs =', epochs)
Output:
m = 100
n = 7
Learning Rate = 0.0035
Number of Epochs = 500
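To illustrate what one-hot encoding does to the labels, here is a minimal NumPy sketch that is independent of Scikit-Learn. The label values are hypothetical, and indexing an identity matrix is just one convenient way to build the encoding.

```python
import numpy as np

# Hypothetical binary labels, for illustration only
labels = np.array([1, 0, 1, 1, 0])

# Row i of the identity matrix is the one-hot vector for class i
num_classes = 2
one_hot = np.eye(num_classes)[labels]

print(one_hot)
```

Each label becomes a row with a single 1 in the column of its class, which is why the label placeholder in the model has two columns.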
Now we will start building the model by defining the placeholders X and Y, so that we can feed our training examples x and y into the optimizer during training. We will also create the trainable variables W and b, which can be optimized by the gradient descent optimizer.
# There are n columns in the feature matrix
# after one-hot encoding.
X = tf.placeholder(tf.float32, [None, n])

# Since this is a binary classification problem,
# Y can only take 2 values.
Y = tf.placeholder(tf.float32, [None, 2])

# Trainable variable: weights
W = tf.Variable(tf.zeros([n, 2]))

# Trainable variable: bias
b = tf.Variable(tf.zeros([2]))
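The shapes chosen above determine the shape of the model output. The following NumPy sketch mirrors them with plain arrays; the values of m and n are taken from the output printed earlier, while the array names `x_batch`, `W0`, and `b0` are hypothetical stand-ins for the TensorFlow objects.

```python
import numpy as np

m, n = 100, 7                 # sizes reported earlier in the article
x_batch = np.ones((m, n))     # dummy values; only the shape matters here
W0 = np.zeros((n, 2))         # same shape as the variable W
b0 = np.zeros(2)              # same shape as the variable b

# (m, n) @ (n, 2) -> (m, 2); b0 is broadcast across the m rows
linear_output = x_batch @ W0 + b0
print(linear_output.shape)
```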
Now declare the hypothesis, the cost function, the optimizer, and an initializer for the global variables.
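# Hypothesis
Y_hat = tf.nn.sigmoid(tf.add(tf.matmul(X, W), b))

# Sigmoid cross-entropy cost function
# Note: this TensorFlow function expects unscaled logits and applies the
# sigmoid internally, whereas Y_hat has already been passed through the
# sigmoid. The code is kept as written so that it matches the output shown
# in this article.
cost = tf.nn.sigmoid_cross_entropy_with_logits(logits=Y_hat, labels=Y)

# Gradient descent optimizer
optimizer = tf.train.GradientDescentOptimizer(learning_rate=alpha).minimize(cost)

# Global variables initializer
init = tf.global_variables_initializer()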
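For reference, the TensorFlow documentation describes the per-element loss of `tf.nn.sigmoid_cross_entropy_with_logits` for logits x and labels z as max(x, 0) - x * z + log(1 + exp(-|x|)). The NumPy sketch below checks that this form agrees with applying the sigmoid first and then taking the cross-entropy. The logits and labels are hypothetical, and the function names are chosen only for this example.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def naive_cross_entropy(logits, labels):
    """Apply the sigmoid explicitly, then take the cross-entropy."""
    p = sigmoid(logits)
    return -(labels * np.log(p) + (1 - labels) * np.log(1 - p))

def stable_cross_entropy(logits, labels):
    """Equivalent form that avoids overflow for large |logits|."""
    return (np.maximum(logits, 0) - logits * labels
            + np.log1p(np.exp(-np.abs(logits))))

# Hypothetical logits and labels, for illustration only
logits = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
labels = np.array([0.0, 1.0, 1.0, 0.0, 1.0])

naive = naive_cross_entropy(logits, labels)
stable = stable_cross_entropy(logits, labels)
print(np.allclose(naive, stable))
```

Because the sigmoid is already built into this loss, the input is expected to be the raw linear output rather than a value that has been squashed into (0, 1).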
Start the training process inside a TensorFlow session.
# Start a TensorFlow session
with tf.Session() as sess:

    # Initialize the variables
    sess.run(init)

    # Lists for storing the cost and accuracy at each epoch
    cost_history, accuracy_history = [], []

    # Loop over all epochs
    for epoch in range(epochs):

        # Run the optimizer
        sess.run(optimizer, feed_dict={X: x, Y: y})

        # Calculate the cost for the current epoch
        c = sess.run(cost, feed_dict={X: x, Y: y})

        # Calculate the accuracy for the current epoch
        correct_prediction = tf.equal(tf.argmax(Y_hat, 1), tf.argmax(Y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

        # Save the cost and accuracy in the history
        cost_history.append(sum(sum(c)))
        accuracy_history.append(accuracy.eval({X: x, Y: y}) * 100)

        # Display the result for the current epoch
        if epoch % 100 == 0 and epoch != 0:
            print("Epoch " + str(epoch) + " Cost: " + str(cost_history[-1]))

    Weight = sess.run(W)  # Optimized weights
    Bias = sess.run(b)    # Optimized bias

    # Final accuracy
    correct_prediction = tf.equal(tf.argmax(Y_hat, 1), tf.argmax(Y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print("Accuracy:", accuracy_history[-1], "%")
Output:
Epoch 100 Cost: 125.700202942
Epoch 200 Cost: 120.647117615
Epoch 300 Cost: 118.151592255
Epoch 400 Cost: 116.549999237

Accuracy: 91.0000026226%
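The accuracy reported during training compares the column with the largest model output against the column marked in the one-hot label. A minimal NumPy sketch of the same computation, using hypothetical outputs and labels for four examples:

```python
import numpy as np

# Hypothetical model outputs (one row per example, one column per class)
y_hat = np.array([[0.9, 0.1], [0.3, 0.7], [0.6, 0.4], [0.2, 0.8]])
# Hypothetical one-hot labels for the same four examples
y_true = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])

# A prediction is correct when the largest output is in the labeled column
correct = np.argmax(y_hat, axis=1) == np.argmax(y_true, axis=1)
accuracy_percent = float(np.mean(correct)) * 100

print(accuracy_percent)
```

Here three of the four predictions match their labels, so the result is 75.0.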
Let’s plot how the cost changes over the epochs.
plt.plot(list(range(epochs)), cost_history)
plt.xlabel('Epochs')
plt.ylabel('Cost')
plt.title('Decrease in Cost with Epochs')
plt.show()
Plot how the accuracy changes over the epochs.
plt.plot(list(range(epochs)), accuracy_history)
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.title('Increase in Accuracy with Epochs')
plt.show()
We will now plot the decision boundary for our trained classifier. A decision boundary is a hypersurface that partitions the underlying vector space into two sets, one for each class.
# Calculate the decision boundary
decision_boundary_x = np.array([np.min(x_orig[:, 0]), np.max(x_orig[:, 0])])
decision_boundary_y = (-1.0 / Weight[0]) * (decision_boundary_x * Weight + Bias)
decision_boundary_y = [sum(decision_boundary_y[:, 0]), sum(decision_boundary_y[:, 1])]

# Positive data points
x_pos = np.array([x_orig[i] for i in range(len(x_orig)) if y_orig[i] == 1])

# Negative data points
x_neg = np.array([x_orig[i] for i in range(len(x_orig)) if y_orig[i] == 0])

# Plot positive data points
plt.scatter(x_pos[:, 0], x_pos[:, 1], color='blue', label='Positive')

# Plot negative data points
plt.scatter(x_neg[:, 0], x_neg[:, 1], color='red', label='Negative')

# Plot the decision boundary
plt.plot(decision_boundary_x, decision_boundary_y)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Plot of Decision Boundary')
plt.legend()
plt.show()
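As a simpler reference case, for a model with two input features and a single output, the decision boundary is the line where the linear part equals zero, w1 * x1 + w2 * x2 + b = 0, because the sigmoid is exactly 0.5 there. The sketch below illustrates this general case and is not the computation used in the code above; the parameter values are hypothetical.

```python
import numpy as np

# Hypothetical parameters of a two-feature model, for illustration only
w1, w2, bias = 1.5, -2.0, 0.5

def boundary_x2(x1):
    """x2 on the line w1*x1 + w2*x2 + bias = 0 (requires w2 != 0)."""
    return -(w1 * x1 + bias) / w2

x1_values = np.array([4.0, 7.0])
x2_values = boundary_x2(x1_values)

# Points on the boundary give a linear output of 0, i.e. sigmoid = 0.5
linear_output = w1 * x1_values + w2 * x2_values + bias
print(x2_values, linear_output)
```

Points on one side of this line get a sigmoid output above 0.5 and points on the other side get an output below 0.5.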