# ML | Logistic regression vs Decision tree classification

We can compare the two algorithms on several criteria:

| Criteria | Logistic Regression | Decision Tree Classification |
| --- | --- | --- |
| Interpretability | Less interpretable | More interpretable |
| Decision boundaries | A single, linear decision boundary | Recursively bisects the feature space into smaller regions |
| Ease of decision making | A decision threshold has to be set | Handles decision making automatically |
| Overfitting | Less prone to overfitting | Prone to overfitting |
| Robustness to noise | Robust to noise | Strongly affected by noise |
| Scalability | Requires a large enough training set | Can be trained on a small training set |
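The overfitting and robustness rows can be seen directly in code. Below is a minimal sketch (not part of the original experiment) that fits both models on a synthetic, deliberately noisy dataset: an unrestricted decision tree memorises the training set almost perfectly, while logistic regression's single linear boundary cannot, which is exactly why the tree is more exposed to noise.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data; flip_y adds 20% label noise
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)
dt = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)  # unrestricted depth

# The tree's train score is (near-)perfect despite the noisy labels,
# while its test score drops; the gap for logistic regression is smaller.
print("LR train/test:", lr.score(X_train, y_train), lr.score(X_test, y_test))
print("DT train/test:", dt.score(X_train, y_train), dt.score(X_test, y_test))
```

Limiting the tree (e.g. `max_depth`) narrows this gap, which is the usual remedy for its tendency to overfit.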

As a simple experiment, we run two models on the same dataset and compare their characteristics.

Step 1: Import the required libraries

```
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
```

Step 2: Read and clean the dataset


```
# Change the working directory to the file's location, e.g.
# cd C:\Users\Dev\Desktop\Kaggle\Sinking Titanic

df = pd.read_csv('train.csv')
y = df['Survived']
X = df.drop('Survived', axis=1)
X = X.drop(['Name', 'Ticket', 'Cabin', 'Embarked'], axis=1)

# Encoding the categorical Sex variable
X = X.replace(['male', 'female'], [2, 3])

# Handling missing values with a forward fill
X.fillna(method='ffill', inplace=True)
```


Step 3: Train and evaluate the logistic regression model


```
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

lr = LogisticRegression()
lr.fit(X_train, y_train)
print(lr.score(X_test, y_test))
```

Step 4: Train and evaluate the decision tree classifier model

```
criteria = ['gini', 'entropy']
scores = {}

for c in criteria:
    dt = DecisionTreeClassifier(criterion=c)
    dt.fit(X_train, y_train)
    scores[c] = dt.score(X_test, y_test)

print(scores)
```

Comparing the scores, we see that the logistic regression model performed better on this dataset, but this may not always be the case.
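A single train/test split can be misleading, so a steadier comparison uses cross-validation. The sketch below (an addition, not part of the original experiment) runs 5-fold cross-validation on a bundled scikit-learn dataset as a stand-in, since the Titanic CSV is not shipped with the library:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset bundled with scikit-learn
X, y = load_breast_cancer(return_X_y=True)

# Mean accuracy over 5 folds gives a more stable estimate than one split
lr_scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=5)
dt_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)

print("Logistic regression: %.3f +/- %.3f" % (lr_scores.mean(), lr_scores.std()))
print("Decision tree:       %.3f +/- %.3f" % (dt_scores.mean(), dt_scores.std()))
```

Which model wins depends on the dataset; reporting the fold-to-fold standard deviation alongside the mean makes that uncertainty visible.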