Contrary to popular belief, logistic regression is a regression model: it builds a regression model to predict the probability that a given data record belongs to the category labelled "1". Just as linear regression assumes that the data follow a linear function, logistic regression models the data using the sigmoid function.
Logistic regression becomes a classification technique only when a decision threshold is brought into the picture. Setting the threshold is a very important aspect of logistic regression and depends on the classification problem itself.
The choice of threshold value depends mainly on the desired precision and recall. Ideally, we want both precision and recall to be 1, but this is rarely the case. In a precision-recall trade-off, we use the following arguments to decide the threshold (the sketch after this list illustrates the trade-off):
1. Low Precision / High Recall: when we want to reduce the number of false negatives without necessarily reducing the number of false positives, we choose a threshold that gives a low precision or a high recall. For example, when diagnosing cancer, we do not want any sick patient to be classified as healthy, even at the cost of mistakenly diagnosing some healthy patients with cancer. This is because the absence of cancer can be ruled out by further medical examinations, whereas the disease cannot be detected in a candidate who has already been rejected.
2. High Precision / Low Recall: when we want to reduce the number of false positives without necessarily reducing the number of false negatives, we choose a threshold that gives a high precision or a low recall. For example, if we classify customers by whether they will respond positively or negatively to a personalized ad, we want to be absolutely sure that a customer will respond positively, because otherwise a negative reaction could cost us potential sales from that customer.
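Here is a minimal sketch of that trade-off, using made-up predicted probabilities and labels (the numbers below are purely illustrative, not from any real model):

import numpy as np

# hypothetical model outputs: predicted probabilities and true labels
probs  = np.array([0.95, 0.85, 0.80, 0.55, 0.45, 0.30, 0.20, 0.10])
labels = np.array([1,    1,    1,    0,    1,    0,    0,    0])

def precision_recall(probs, labels, threshold):
    preds = (probs >= threshold).astype(int)
    tp = np.sum((preds == 1) & (labels == 1))
    fp = np.sum((preds == 1) & (labels == 0))
    fn = np.sum((preds == 0) & (labels == 1))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

for t in (0.25, 0.65):
    p, r = precision_recall(probs, labels, t)
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
# threshold=0.25 catches every positive (recall 1.00) but lets two false positives through;
# threshold=0.65 removes all false positives (precision 1.00) but misses one true positive.

Lowering the threshold pushes the classifier toward high recall; raising it pushes it toward high precision.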
Based on the number of categories, logistic regression can be classified as:
1. Binomial: the target variable can have only two possible types, e.g. "0" or "1" (representing, say, pass/fail or win/lose).
2. Multinomial: the target variable can have three or more possible types that are not ordered, e.g. "disease A" vs "disease B" vs "disease C".
3. Ordinal: the target variable deals with ordered categories, e.g. a test result graded as "very poor", "poor", "good" or "very good".
First of all, we will explore the simplest form of logistic regression, i.e. binomial logistic regression.
Binomial Logistic Regression
Let's look at an example dataset that maps the number of study hours to exam results. The result can take only two values: passed (1) or failed (0):
Hours (x)  0.50  0.75  1.00  1.25  1.50  1.75  2.00  2.25  2.50  2.75  3.00  3.25  3.50  3.75  4.00  4.25  4.50  4.75  5.00  5.50
Pass (y)      0     0     0     0     0     0     1     0     1     0     1     0     1     0     1     1     1     1     1     1
So we have $y \in \{0, 1\}$, i.e. y is a categorical target variable that can take only two possible values: "0" or "1".
To generalize our model, we assume that the dataset has $p$ feature variables and $n$ observations, where $x_{ij}$ denotes the value of the $j$-th feature for the $i$-th observation and $x_i$ denotes the $i$-th observation vector.
If you have gone through linear regression, you will remember that the hypothesis we used for prediction was:

$$h(x_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip}$$

where $\beta_0, \beta_1, \dots, \beta_p$ are the regression coefficients.

Let the regression coefficient matrix/vector $\beta$ be:

$$\beta = \begin{bmatrix} \beta_0 & \beta_1 & \dots & \beta_p \end{bmatrix}^T$$

Then, in a more compact form,

$$h(x_i) = \beta^T x_i, \quad \text{where } x_i = \begin{bmatrix} 1 & x_{i1} & x_{i2} & \dots & x_{ip} \end{bmatrix}^T$$

The reason for taking $x_{i0} = 1$ is now pretty clear: we needed a matrix product, but there was no actual $x$ multiplied by $\beta_0$ in the original hypothesis formula, so we define $x_{i0} = 1$.
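A minimal NumPy sketch of this intercept trick, with made-up numbers (the coefficients and observation below are purely illustrative):

import numpy as np

# hypothetical coefficients [beta_0, beta_1, beta_2] and one observation [x_1, x_2]
beta = np.array([0.5, 2.0, -1.0])
x = np.array([3.0, 4.0])

# prepend x_0 = 1 so the intercept beta_0 survives the dot product
x_i = np.hstack(([1.0], x))

# h(x_i) = beta^T x_i reproduces the expanded hypothesis
print(np.dot(beta, x_i))             # 0.5 + 2.0*3.0 + (-1.0)*4.0 = 2.5
print(0.5 + 2.0 * 3.0 - 1.0 * 4.0)   # 2.5, the term-by-term form

This is exactly why the implementation later in this section stacks a column of ones onto the feature matrix before running gradient descent.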
Now, if we try to apply linear regression to the problem above, the hypothesis we discussed will produce continuous values; besides, it makes no sense for $h(x_i)$ to take values greater than 1 or less than 0. So, we make a change to the hypothesis for classification:

$$h(x_i) = g(\beta^T x_i)$$

where

$$g(z) = \frac{1}{1 + e^{-z}}$$

is called the logistic function or sigmoid function.
Here is a graph showing g(z):

[Figure: the S-shaped curve of the sigmoid function g(z)]

From the graph above, we can conclude that:
- $g(z) \to 1$ as $z \to +\infty$
- $g(z) \to 0$ as $z \to -\infty$
- $g(z)$ is always bounded between 0 and 1
- $g(0) = 0.5$, so predicting $y = 1$ when $h(x) \ge 0.5$ corresponds to the threshold $\beta^T x \ge 0$
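These properties are easy to check numerically with a few sample points:

import numpy as np

def g(z):
    # the sigmoid function g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

for z in (-10, -1, 0, 1, 10):
    print(f"g({z:>3}) = {g(z):.4f}")
# g(-10) is close to 0, g(0) is exactly 0.5, g(10) is close to 1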
So now we can define the conditional probabilities of the two labels (0 and 1) for the $i$-th observation as:

$$P(y_i = 1 \mid x_i; \beta) = h(x_i) \qquad P(y_i = 0 \mid x_i; \beta) = 1 - h(x_i)$$

We can write this more compactly as:

$$P(y_i \mid x_i; \beta) = h(x_i)^{y_i} \, (1 - h(x_i))^{1 - y_i}$$
We now define another term, the likelihood of the parameters, as:

$$L(\beta) = \prod_{i=1}^{n} P(y_i \mid x_i; \beta) = \prod_{i=1}^{n} h(x_i)^{y_i} \, (1 - h(x_i))^{1 - y_i}$$

The likelihood is nothing but the probability of the data (the training examples) given the model and specific parameter values (here, $\beta$). It measures the support provided by the data for each possible value of $\beta$. We obtain it by multiplying all $P(y_i \mid x_i; \beta)$ for a given $\beta$.

For simpler calculations, we take the log likelihood:

$$\ell(\beta) = \log L(\beta) = \sum_{i=1}^{n} \left[ y_i \log h(x_i) + (1 - y_i) \log(1 - h(x_i)) \right]$$
The cost function for logistic regression is proportional to the negative of the log likelihood of the parameters. Therefore, we can get an expression for the cost function J using the log likelihood equation as:

$$J(\beta) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log h(x_i) + (1 - y_i) \log(1 - h(x_i)) \right]$$

and our goal is to estimate $\beta$ so that the cost function is minimized!
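As a quick worked example (with made-up probabilities), consider a single training example with $y_i = 1$:

$$h(x_i) = 0.8 \;\Rightarrow\; -\log 0.8 \approx 0.223 \qquad\qquad h(x_i) = 0.1 \;\Rightarrow\; -\log 0.1 \approx 2.303$$

A confident, correct prediction contributes little to $J(\beta)$, while a confident, wrong one is penalized heavily; this is what drives the estimate of $\beta$ toward values that separate the classes.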
Using the gradient descent algorithm

First, we take the partial derivative of $J(\beta)$ with respect to each $\beta_j$ to get the gradient descent rule (here we present only the final derived value):

$$\frac{\partial J(\beta)}{\partial \beta_j} = \frac{1}{n} \sum_{i=1}^{n} \left( h(x_i) - y_i \right) x_{ij}$$

Here $y$ and $h(x)$ represent the response vector and the predicted response vector, respectively, while $x_j$ is the vector of observed values for the $j$-th feature.

Now, to get $\min_{\beta} J(\beta)$, we repeatedly apply the update rule

$$\beta_j := \beta_j - \alpha \, \frac{\partial J(\beta)}{\partial \beta_j}$$

where $\alpha$ is called the learning rate and must be set explicitly.
Let's see an implementation of the above technique in Python on a sample dataset (download it from here):
import csv
import numpy as np
import matplotlib.pyplot as plt


def loadCSV(filename):
    '''
    function to load the dataset
    '''
    with open(filename, "r") as csvfile:
        lines = csv.reader(csvfile)
        dataset = list(lines)
        for i in range(len(dataset)):
            dataset[i] = [float(x) for x in dataset[i]]
    return np.array(dataset)


def normalize(X):
    '''
    function to normalize the feature matrix, X
    '''
    mins = np.min(X, axis=0)
    maxs = np.max(X, axis=0)
    rng = maxs - mins
    norm_X = 1 - ((maxs - X) / rng)
    return norm_X


def logistic_func(beta, X):
    '''
    logistic (sigmoid) function
    '''
    return 1.0 / (1 + np.exp(-np.dot(X, beta.T)))


def log_gradient(beta, X, y):
    '''
    logistic gradient function
    '''
    first_calc = logistic_func(beta, X) - y.reshape(X.shape[0], -1)
    final_calc = np.dot(first_calc.T, X)
    return final_calc


def cost_func(beta, X, y):
    '''
    cost function, J
    '''
    # flatten to 1-D arrays so the element-wise products align
    log_func_v = np.squeeze(np.asarray(logistic_func(beta, X)))
    y = np.squeeze(y)
    step1 = y * np.log(log_func_v)
    step2 = (1 - y) * np.log(1 - log_func_v)
    final = -step1 - step2
    return np.mean(final)


def grad_desc(X, y, beta, lr=.01, converge_change=.001):
    '''
    gradient descent function
    '''
    cost = cost_func(beta, X, y)
    change_cost = 1
    num_iter = 1

    while change_cost > converge_change:
        old_cost = cost
        beta = beta - (lr * log_gradient(beta, X, y))
        cost = cost_func(beta, X, y)
        change_cost = old_cost - cost
        num_iter += 1

    return beta, num_iter


def pred_values(beta, X):
    '''
    function for predicting labels
    '''
    pred_prob = logistic_func(beta, X)
    pred_value = np.where(pred_prob >= .5, 1, 0)
    return np.squeeze(pred_value)


def plot_reg(X, y, beta):
    '''
    function for plotting the decision boundary
    '''
    # labelled observations
    x_0 = X[np.where(y == 0.0)]
    x_1 = X[np.where(y == 1.0)]

    # plotting points with diff color for diff label
    plt.scatter([x_0[:, 1]], [x_0[:, 2]], c='b', label='y = 0')
    plt.scatter([x_1[:, 1]], [x_1[:, 2]], c='r', label='y = 1')

    # plotting decision boundary
    x1 = np.arange(0, 1, 0.1)
    x2 = -(beta[0, 0] + beta[0, 1] * x1) / beta[0, 2]
    plt.plot(x1, x2, c='k', label='reg line')

    plt.xlabel('x1')
    plt.ylabel('x2')
    plt.legend()
    plt.show()


if __name__ == "__main__":
    # load the dataset
    dataset = loadCSV('dataset1.csv')

    # normalizing feature matrix
    X = normalize(dataset[:, :-1])

    # stacking a column of ones onto the feature matrix
    X = np.hstack((np.matrix(np.ones(X.shape[0])).T, X))

    # response vector
    y = dataset[:, -1]

    # initial beta values
    beta = np.matrix(np.zeros(X.shape[1]))

    # beta values after running gradient descent
    beta, num_iter = grad_desc(X, y, beta)

    # estimated beta values and number of iterations
    print("Estimated regression coefficients:", beta)
    print("No. of iterations:", num_iter)

    # predicted labels
    y_pred = pred_values(beta, X)

    # number of correctly predicted labels
    print("Correctly predicted labels:", np.sum(y == y_pred))

    # plotting regression line
    plot_reg(X, y, beta)
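Since the linked dataset is not reproduced here, below is a small sketch that exercises the same functions on synthetic data (the blob centers, scales, and sample sizes are arbitrary choices; normalize, grad_desc, and pred_values are the functions defined above):

import numpy as np

# synthetic stand-in for dataset1.csv: two Gaussian blobs, one per class
rng = np.random.RandomState(0)
class0 = rng.normal(loc=[2.0, 2.0], scale=0.8, size=(50, 2))
class1 = rng.normal(loc=[5.0, 5.0], scale=0.8, size=(50, 2))
features = np.vstack((class0, class1))
labels = np.hstack((np.zeros(50), np.ones(50)))

# same preprocessing as the main block above
X = normalize(features)
X = np.hstack((np.matrix(np.ones(X.shape[0])).T, X))
beta = np.matrix(np.zeros(X.shape[1]))

# fit and evaluate on the training data
beta, num_iter = grad_desc(X, labels, beta)
y_pred = pred_values(beta, X)
print("iterations:", num_iter)
print("training accuracy:", np.mean(y_pred == labels))

With two well-separated blobs, the learned decision boundary should classify nearly all training points correctly.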