
Understanding Logistic Regression


Contrary to popular belief, logistic regression is a regression model. The model builds a regression to predict the probability that a given data entry belongs to the category numbered "1". Just as linear regression assumes that the data follow a linear function, logistic regression models the data using the sigmoid function.

Logistic regression becomes a classification technique only when a decision threshold is introduced into the picture. Setting the threshold is a very important aspect of logistic regression and depends on the classification problem itself.

The decision on the threshold value is mainly affected by the values of precision and recall. Ideally, we want both precision and recall to be 1, but this is rarely the case. In the case of a precision-recall trade-off, we use the following arguments to decide the threshold (a small numerical sketch of the trade-off follows this list):

1. Low precision / high recall: in applications where we want to reduce the number of false negatives without necessarily reducing the number of false positives, we choose a decision value that has a low value of precision or a high value of recall. For example, in a cancer diagnosis application we do not want any affected patient to be classified as not affected, without paying much attention to whether a patient is wrongly diagnosed with cancer. This is because the absence of cancer can be detected by further medical tests, but the presence of the disease cannot be detected in a candidate who has already been rejected.

2. High precision / low recall: in cases where we want to reduce the number of false positives without necessarily reducing the number of false negatives, we choose a decision value that has a high value of precision or a low value of recall. For example, if we are classifying customers by whether they will react positively or negatively to a personalized advertisement, we want to be absolutely sure that the customer will react positively to the advertisement, because otherwise a negative reaction can cause a loss of potential sales from the customer.
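As a concrete illustration of this trade-off, here is a small NumPy sketch; the probability scores and labels in it are made-up values (not taken from the article's dataset), and precision_recall is just a local helper defined for the example:

import numpy as np

# hypothetical predicted probabilities and true labels
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.45, 0.6, 0.55, 0.2, 0.7, 0.9])

def precision_recall(y_true, y_prob, threshold):
    """Compute precision and recall for a given decision threshold."""
    y_pred = (y_prob >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# a low threshold favours recall, a high threshold favours precision
print(precision_recall(y_true, y_prob, 0.3))  # more positives predicted
print(precision_recall(y_true, y_prob, 0.7))  # fewer, more confident positives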

Based on the number of categories, logistic regression can be classified as:

  1. Binomial: the target variable can have only 2 possible types: "0" or "1", which may represent "win" vs. "loss", "pass" vs. "fail", "dead" vs. "alive", etc.
  2. Multinomial: the target variable can have 3 or more possible types that are not ordered (i.e. the types have no quantitative significance), such as "disease A" vs. "disease B" vs. "disease C".
  3. Ordinal: it deals with target variables that have ordered categories. For example, a test score can be categorized as "very poor", "poor", "good", "very good". Here, each category can be given a score like 0, 1, 2, 3.

First of all, we will investigate the simplest form of logistic regression, i.e. binomial logistic regression.

Binomial Logistic Regression

Let’s look at an example dataset that compares the number of training hours to exam results. The result can only take two values: passed (1) or failed (0):

 
Hours (x)  0.50  0.75  1.00  1.25  1.50  1.75  2.00  2.25  2.50  2.75  3.00  3.25  3.50  3.75  4.00  4.25  4.50  4.75  5.00  5.50
Pass (y)      0     0     0     0     0     0     1     0     1     0     1     0     1     0     1     1     1     1     1     1
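For reference, the same toy dataset can be written down directly as NumPy arrays (a sketch; the full script further below loads it from a CSV file instead):

import numpy as np

# hours studied (feature) and exam result (target: 1 = passed, 0 = failed)
hours = np.array([0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 2.00, 2.25, 2.50, 2.75,
                  3.00, 3.25, 3.50, 3.75, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50])
passed = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
                   1, 0, 1, 0, 1, 1, 1, 1, 1, 1])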

So we have a set of n = 20 training examples $(x_i, y_i)$, where $x_i$ is the number of hours and $y_i \in \{0, 1\}$,
i.e. y is a categorical target variable that can take only two possible types: "0" or "1".
To generalize our model, we assume that:

  • The dataset has "p" feature variables and "n" observations.
  • The feature matrix is represented as:

    $$X = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{bmatrix}$$

    Here, $x_{ij}$ denotes the value of the $j^{th}$ feature for the $i^{th}$ observation.
    Here, we keep the convention of setting $x_{i0} = 1$. (Keep reading; you will understand the logic in a few minutes.)
  • The $i^{th}$ observation, $x_i$, can be represented as $x_i = [x_{i0}, x_{i1}, x_{i2}, \ldots, x_{ip}]^T$.
  • $h(x_i)$ represents the predicted response for the $i^{th}$ observation, i.e. $\hat{y}_i$. The formula we use for calculating $h(x_i)$ is called the hypothesis.

If you went through linear regression, you should recall that in linear regression the hypothesis we used for prediction was:

$h(x_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}$

where $\beta_0, \beta_1, \ldots, \beta_p$ are the regression coefficients.
Let the regression coefficient matrix/vector, $\beta$, be:

$\beta = [\beta_0, \beta_1, \beta_2, \ldots, \beta_p]^T$

Then, in a more compact form,

$h(x_i) = \beta^T x_i$

The reason for taking $x_{i0} = 1$ is pretty clear now: we needed to do a matrix product, but there was no actual $x_{i0}$ multiplied by $\beta_0$ in the original hypothesis formula, so we defined $x_{i0} = 1$.
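The same trick appears in the full script further below, where a column of ones is stacked onto the feature matrix. Here is a minimal sketch of the idea, using a made-up 3-observation, 2-feature matrix and made-up coefficients (both are illustrative values, not part of the article's dataset):

import numpy as np

# hypothetical feature matrix with n = 3 observations and p = 2 features
X = np.array([[2.0, 1.5],
              [0.5, 3.0],
              [1.0, 1.0]])

# prepend x_i0 = 1 to every observation so that beta_0 takes part in the matrix product
X = np.hstack((np.ones((X.shape[0], 1)), X))

# hypothetical coefficient vector [beta_0, beta_1, beta_2]
beta = np.array([0.1, 0.5, -0.3])

# h(x_i) = beta^T x_i for all observations at once (the linear-regression hypothesis)
h = X @ beta
print(h)  # one predicted value per observation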

Now, if we try to apply linear regression to the above problem, we are likely to get continuous values using the hypothesis we discussed above. Moreover, it does not make sense for $h(x_i)$ to take values larger than 1 or smaller than 0.
So, some modifications are made to the hypothesis for classification:

$h(x_i) = g(\beta^T x_i)$

where

$g(z) = \frac{1}{1 + e^{-z}}$

is called the logistic function or the sigmoid function.
A plot of g(z) shows an S-shaped curve rising from 0 to 1.

From this, we can infer that (a small numerical check follows this list):

  • g(z) tends towards 1 as $z \to +\infty$
  • g(z) tends towards 0 as $z \to -\infty$
  • g(z) is always bounded between 0 and 1
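These properties are easy to check numerically; a minimal sketch (sigmoid here is just a local helper defined for this illustration):

import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# g(z) -> 1 for large positive z, g(z) -> 0 for large negative z, and it stays in (0, 1)
print(sigmoid(10))   # close to 1
print(sigmoid(-10))  # close to 0
print(sigmoid(0))    # exactly 0.5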

So now we can define the conditional probabilities of the 2 labels (0 and 1) for the $i^{th}$ observation as:

$P(y_i = 1 \mid x_i; \beta) = h(x_i)$
$P(y_i = 0 \mid x_i; \beta) = 1 - h(x_i)$

We can write this more compactly as:

$P(y_i \mid x_i; \beta) = h(x_i)^{y_i} \, (1 - h(x_i))^{1 - y_i}$
We now define another term, the likelihood of the parameters, as:

$L(\beta) = \prod_{i=1}^{n} P(y_i \mid x_i; \beta) = \prod_{i=1}^{n} h(x_i)^{y_i} \, (1 - h(x_i))^{1 - y_i}$

Likelihood is nothing but the probability of the data (the training examples) given the model and specific parameter values (here, $\beta$). It measures the support provided by the data for each possible value of $\beta$. We obtain it by multiplying all $P(y_i \mid x_i; \beta)$ for a given $\beta$.
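In code, the compact expression covers both labels at once, and the likelihood is just the product of those per-observation probabilities. A short sketch, with made-up probabilities h and labels y (not values from the article's dataset):

import numpy as np

# hypothetical predicted probabilities h(x_i) and observed labels y_i
h = np.array([0.9, 0.2, 0.6])
y = np.array([1, 0, 0])

# P(y_i | x_i; beta) = h(x_i)^y_i * (1 - h(x_i))^(1 - y_i)
p = h ** y * (1 - h) ** (1 - y)
print(p)           # per-observation probabilities: [0.9, 0.8, 0.4]

# L(beta) = product of the per-observation probabilities
likelihood = np.prod(p)
print(likelihood)  # 0.9 * 0.8 * 0.4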

And for easier calculations, we take the log-likelihood:

$\ell(\beta) = \log L(\beta) = \sum_{i=1}^{n} \left[\, y_i \log h(x_i) + (1 - y_i) \log\bigl(1 - h(x_i)\bigr) \,\right]$
The cost function for logistic regression is proportional to the negative of the log-likelihood of the parameters. Hence, we can get an expression for the cost function J using the log-likelihood equation as:

$J(\beta) = -\frac{1}{n} \sum_{i=1}^{n} \left[\, y_i \log h(x_i) + (1 - y_i) \log\bigl(1 - h(x_i)\bigr) \,\right]$

and our aim is to estimate $\beta$ so that the cost function is minimized!
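This is exactly what the cost_func routine in the script below computes. As a stand-alone sketch, reusing the made-up h and y from the previous snippet:

import numpy as np

# hypothetical predicted probabilities and labels (same illustrative values as above)
h = np.array([0.9, 0.2, 0.6])
y = np.array([1, 0, 0])

# J(beta) = -(1/n) * sum( y*log(h) + (1 - y)*log(1 - h) )
J = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
print(J)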

Using the gradient descent algorithm

First, we take partial derivatives of $J(\beta)$ with respect to each $\beta_j \in \beta$ to derive the stochastic gradient descent rule (we present only the final derived value here):

$\frac{\partial J(\beta)}{\partial \beta_j} = \frac{1}{n} \sum_{i=1}^{n} \bigl(h(x_i) - y_i\bigr)\, x_{ij} = \frac{1}{n} \bigl(h(x) - y\bigr)^T x_j$

Here, y and h(x) represent the response vector and the predicted response vector (respectively). Also, $x_j$ is the vector representing the observation values for the $j^{th}$ feature.
Now, in order to get the minimum of $J(\beta)$, we repeatedly apply the update

$\beta_j := \beta_j - \alpha \cdot \frac{\partial J(\beta)}{\partial \beta_j}$

where $\alpha$ is called the learning rate and needs to be set explicitly.
Let’s see the implementation of the above technique in Python using a dataset as an example (download it from here):


import csv
import numpy as np
import matplotlib.pyplot as plt


def loadCSV(filename):
    '''
    function to load the dataset
    '''
    with open(filename, "r") as csvfile:
        lines = csv.reader(csvfile)
        dataset = list(lines)
        for i in range(len(dataset)):
            dataset[i] = [float(x) for x in dataset[i]]
    return np.array(dataset)


def normalize(X):
    '''
    function to normalize the feature matrix, X
    '''
    mins = np.min(X, axis=0)
    maxs = np.max(X, axis=0)
    rng = maxs - mins
    norm_X = 1 - ((maxs - X) / rng)
    return norm_X


def logistic_func(beta, X):
    '''
    logistic (sigmoid) function
    '''
    return 1.0 / (1 + np.exp(-np.dot(X, beta.T)))


def log_gradient(beta, X, y):
    '''
    logistic gradient function
    '''
    first_calc = logistic_func(beta, X) - y.reshape(X.shape[0], -1)
    final_calc = np.dot(first_calc.T, X)
    return final_calc


def cost_func(beta, X, y):
    '''
    cost function, J
    '''
    log_func_v = logistic_func(beta, X)
    y = np.squeeze(y)
    step1 = y * np.log(log_func_v)
    step2 = (1 - y) * np.log(1 - log_func_v)
    final = -step1 - step2
    return np.mean(final)


def grad_desc(X, y, beta, lr=.01, converge_change=.001):
    '''
    gradient descent function
    '''
    cost = cost_func(beta, X, y)
    change_cost = 1
    num_iter = 1

    # iterate until the decrease in cost becomes smaller than converge_change
    while change_cost > converge_change:
        old_cost = cost
        beta = beta - (lr * log_gradient(beta, X, y))
        cost = cost_func(beta, X, y)
        change_cost = old_cost - cost
        num_iter += 1

    return beta, num_iter


def pred_values(beta, X):
    '''
    function to predict labels
    '''
    pred_prob = logistic_func(beta, X)
    pred_value = np.where(pred_prob >= .5, 1, 0)
    return np.squeeze(pred_value)


def plot_reg(X, y, beta):
    '''
    function to plot the decision boundary
    '''
    # labelled observations
    x_0 = X[np.where(y == 0.0)]
    x_1 = X[np.where(y == 1.0)]

    # plotting points with different colors for the two labels
    plt.scatter([x_0[:, 1]], [x_0[:, 2]], c='b', label='y = 0')
    plt.scatter([x_1[:, 1]], [x_1[:, 2]], c='r', label='y = 1')

    # plotting decision boundary
    x1 = np.arange(0, 1, 0.1)
    x2 = -(beta[0, 0] + beta[0, 1] * x1) / beta[0, 2]
    plt.plot(x1, x2, c='k', label='reg line')

    plt.xlabel('x1')
    plt.ylabel('x2')
    plt.legend()
    plt.show()


if __name__ == "__main__":
    # load the dataset
    dataset = loadCSV('dataset1.csv')

    # normalizing feature matrix
    X = normalize(dataset[:, :-1])

    # stacking a column of ones onto the feature matrix
    X = np.hstack((np.matrix(np.ones(X.shape[0])).T, X))

    # response vector
    y = dataset[:, -1]

    # initial beta values
    beta = np.matrix(np.zeros(X.shape[1]))

    # beta values after running gradient descent
    beta, num_iter = grad_desc(X, y, beta)

    # estimated beta values and number of iterations
    print("Estimated regression coefficients:", beta)
    print("No. of iterations:", num_iter)

    # predicted labels
    y_pred = pred_values(beta, X)

    # number of correctly predicted labels
    print("Correctly predicted labels:", np.sum(y == y_pred))

    # plotting the decision boundary
    plot_reg(X, y, beta)
