
# ML | Logistic Regression v/s Decision Tree Classification


The two algorithms can be compared on several criteria:

| Criteria | Logistic Regression | Decision Tree Classification |
| --- | --- | --- |
| Interpretability | Less interpretable | More interpretable |
| Decision boundaries | A single, linear decision boundary | Bisects the space into smaller spaces |
| Ease of decision making | A decision threshold has to be set | Automatically handles decision making |
| Overfitting | Not prone to overfitting | Prone to overfitting |
| Robustness to noise | Robust to noise | Majorly affected by noise |
| Scalability | Requires a large enough training set | Can be trained on a small training set |
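The "decision threshold" row can be made concrete with a small sketch. Logistic regression produces a probability, and the class label only appears once that probability is compared against a cut-off. The weights, bias, and sample below are hypothetical values chosen for illustration, not parameters fitted on any dataset.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(features, weights, bias, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return int(sigmoid(z) >= threshold)


# Hypothetical model parameters and input (assumptions for illustration only)
weights = [0.8, -0.4]
bias = -0.2
sample = [1.0, 0.5]

# The linear score is 0.8 - 0.2 - 0.2 = 0.4, so the probability is about 0.60
label_default = classify(sample, weights, bias)                # threshold 0.5
label_strict = classify(sample, weights, bias, threshold=0.7)  # stricter cut-off
print(label_default, label_strict)
```

The same sample is labelled differently under the two thresholds, which is why the cut-off has to be chosen deliberately. A decision tree returns the class of the leaf a sample falls into, so no separate threshold is needed.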

As a simple experiment, we train both models on the same dataset and compare their test accuracy.

Step 1: Import the required libraries

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
```

Step 2: Read and clean the dataset

```python
# In IPython/Jupyter, change the working directory to the file location first:
# %cd C:\Users\Dev\Desktop\Kaggle\Sinking Titanic

df = pd.read_csv('_train.csv')
y = df['Survived']

# Drop the target and the free-text / high-cardinality columns
X = df.drop('Survived', axis=1)
X = X.drop(['Name', 'Ticket', 'Cabin', 'Embarked'], axis=1)

# Encode the categorical 'Sex' values as integers
X = X.replace(['male', 'female'], [2, 3])

# Handle missing values by forward-filling from the previous row
X = X.ffill()
```
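To see what the encoding and forward-fill steps do, here is a minimal sketch on a made-up three-row frame. The `toy` DataFrame and its values are assumptions for illustration, and `Series.map` is used in place of `replace` for the encoding, which gives the same integer codes for these two categories.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature dataset (not the Titanic file)
toy = pd.DataFrame({
    "Sex": ["male", "female", "male"],
    "Age": [22.0, np.nan, 35.0],
    "Fare": [np.nan, 7.25, np.nan],
})

# Encode the two categories as integers, mirroring the 2/3 codes used above
toy["Sex"] = toy["Sex"].map({"male": 2, "female": 3})

# Forward fill copies the last seen value downwards within each column
toy = toy.ffill()

# A missing value in the first row has nothing above it to copy from
remaining_missing = int(toy.isna().sum().sum())
print(toy)
print("Missing values left:", remaining_missing)
```

Forward fill cannot repair a gap in the very first row, so it is worth checking `isna().sum()` after cleaning. Scikit-learn's `LogisticRegression` rejects inputs that still contain NaN.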

Step 3: Train and evaluate the logistic regression model

```python
# Hold out 30% of the rows for testing; fix the seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

lr = LogisticRegression()
lr.fit(X_train, y_train)

# Mean accuracy on the held-out test set
print(lr.score(X_test, y_test))
```

Step 4: Train and evaluate the decision tree classifier model

```python
criteria = ['gini', 'entropy']
scores = {}

for c in criteria:
    dt = DecisionTreeClassifier(criterion=c)
    dt.fit(X_train, y_train)
    test_score = dt.score(X_test, y_test)
    scores[c] = test_score  # store the test accuracy under its criterion

print(scores)
```

Comparing the scores, we see that the logistic regression model performed better on this dataset, but this may not always be the case.
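One way to probe the overfitting row of the comparison table is to look at training accuracy next to test accuracy for each model. The sketch below uses synthetic data from `make_classification` as a stand-in, since it is an assumption that the Titanic file is not at hand. The sample size, noise level (`flip_y`), and `max_iter` are illustrative choices, so the numbers will not match the scores above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with 10% label noise (an assumption, not the Titanic set)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=8, n_informative=4, flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),  # no depth limit
}

results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    results[name] = {
        "train": model.score(X_tr, y_tr),
        "test": model.score(X_te, y_te),
    }
    print(name, results[name])
```

An unrestricted tree keeps splitting until every training sample is classified correctly, so a large gap between its training and test accuracy is a sign that it has memorised noise. Limiting `max_depth` or raising `min_samples_leaf` are common ways to narrow that gap.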
