Change language

ML | Logistic regression v / s Decision tree classification

| | |

We can compare two algorithms in different categories —

Criteria Logistic Regression Decision Tree Classification
Interpretability Less interpretable More interpretable
Decision Boundaries Linear and single decision boundary Bisects the space into smaller spaces
Ease of Decision Making A decision threshold has to be set Automatically handles decision making
Overfitting Not prone to overfitting Prone to overfitting
Robustness to noise Robust to noise Majorly affected by noise
Scalability Requires a large enough training set Can be train ed on a small training set

As a simple experiment, we run two models on the same dataset and compare their characteristics.

Step 1: Import the required libraries

import numpy as np

import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LogisticRegression

from sklearn.tree import DecisionTreeC lassifier

Step 2: Read and clear the dataset

cd C: UsersDevDesktopKaggleSinking Titanic
# Change workplace to file location

df = pd.read_csv ( ’_train.csv’ )

y = df [ ’Survived’ ]

  

X = df.drop ( ’Survived’ , axis = 1 )

X = X.drop ([ ’Name’ , ’Ticket’ , ’ Cabin’ , ’Embarked’ ], axis = 1 )

 

X = X.replace ([ ’ male’ , ’female’ ], [ 2 , 3 ])

# Hot coding categorical variables

  

X.fillna (method = ’ ffill’ , inplace = True )

# Handling missing values ​​

Step 3: Train and evaluate the Logisitc regression model

X_train, X_test, y_train, y_test = train_test_split (

X, y, test_size = 0.3 , random_state = 0 )

  

lr = LogisticRegression ( )

lr.fit (X_train, y_train)

print (lr.score (X_test, y_test))

Step 4: Train and evaluate the decision tree classifier model

criteria = [ ’gini’ , ’ entropy’ ]

scores = { }

 

for c in criteria:

dt = DecisionTreeClassifier (criterion = c)

  dt.fit (X_train, y_train)

  test_score = dt.score (X_test, y_test)

scores = test_score

 

print ( scores)

Comparing the scores, we see that the logistic regression model performed better in the current dataset, but this may not always be the case.

Shop

Learn programming in R: courses

$

Best Python online courses for 2022

$

Best laptop for Fortnite

$

Best laptop for Excel

$

Best laptop for Solidworks

$

Best laptop for Roblox

$

Best computer for crypto mining

$

Best laptop for Sims 4

$

Latest questions

NUMPYNUMPY

psycopg2: insert multiple rows with one query

12 answers

NUMPYNUMPY

How to convert Nonetype to int or string?

12 answers

NUMPYNUMPY

How to specify multiple return types using type-hints

12 answers

NUMPYNUMPY

Javascript Error: IPython is not defined in JupyterLab

12 answers

News


Wiki

Python OpenCV | cv2.putText () method

numpy.arctan2 () in Python

Python | os.path.realpath () method

Python OpenCV | cv2.circle () method

Python OpenCV cv2.cvtColor () method

Python - Move item to the end of the list

time.perf_counter () function in Python

Check if one list is a subset of another in Python

Python os.path.join () method