
ML | Extra Trees Classifier for feature selection


The Extra Trees Classifier is an ensemble learning technique that aggregates the results of several de-correlated decision trees collected in a "forest" to produce a classification result. It is conceptually very similar to the Random Forest classifier and differs from it only in how the decision trees in the forest are constructed.

Each decision tree in the Extra Trees forest is built from the original training sample. At each test node, the tree is given a random sample of k features from the feature set, and it must choose the best of those features to split the data on, according to some mathematical criterion (usually the Gini index). This random sampling of features yields many de-correlated decision trees.
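As a rough sketch of this node-splitting rule (plain Python, not scikit-learn's actual implementation; the helper names here are made up for illustration):

```python
import random

def gini(labels):
    """Gini impurity of a sequence of class labels."""
    labels = list(labels)
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels, feature_names, k, seed=0):
    """Pick the best feature to split on, considering only a random
    sample of k features -- the core Extra Trees idea."""
    rng = random.Random(seed)
    candidates = rng.sample(feature_names, k)
    best_feature, best_score = None, float("inf")
    for f in candidates:
        # Weighted Gini impurity of the partition induced by feature f
        score = 0.0
        for v in {row[f] for row in rows}:
            subset = [lab for row, lab in zip(rows, labels) if row[f] == v]
            score += len(subset) / len(labels) * gini(subset)
        if score < best_score:
            best_feature, best_score = f, score
    return best_feature, best_score
```

Because each tree sees a different random sample of candidate features at each node, the trees in the forest end up largely uncorrelated with one another.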

To perform feature selection with this forest structure, during the forest's construction the total decrease in the splitting criterion (the Gini index, if the Gini index is used when building the forest) is computed for each feature. This value is called the Gini importance of the feature. Features are then sorted in descending order of Gini importance, and the user selects the top k features of their choice.
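A minimal sketch of this ranking-and-selection step, using scikit-learn's `ExtraTreesClassifier` on synthetic data (the dataset and the choice of k = 3 here are illustrative, not from the article):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic data stands in for the real feature matrix
X, y = make_classification(n_samples=200, n_features=6,
                           n_informative=3, random_state=42)

forest = ExtraTreesClassifier(n_estimators=100, random_state=42)
forest.fit(X, y)

# Rank features by Gini importance (descending) and keep the best k
k = 3
ranking = np.argsort(forest.feature_importances_)[::-1]
top_k = ranking[:k]
X_selected = X[:, top_k]
print("selected feature indices:", top_k)
```

`feature_importances_` already sums the impurity decreases across all trees, so ranking it directly implements the selection rule described above.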

Consider the following data:

Let’s build a hypothetical Extra Trees forest for the above data with five decision trees and k = 2, i.e. each random sample of features contains two features. Here, information gain will be used as the splitting criterion. First, let’s calculate the entropy of the data. The formula for entropy is:

E(S) = − Σᵢ₌₁ᶜ pᵢ · log₂(pᵢ)

where c is the number of unique class labels and pᵢ is the proportion of rows with output label i.

Therefore, the entropy of this data is E(S) ≈ 0.940.
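To make the calculation concrete, here is a small Python helper; the 9-"Yes" / 5-"No" label counts are assumed from the classic 14-row play-tennis sample that this worked example appears to use:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a sequence of class labels, in bits."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Assumed label distribution: 9 "Yes" rows and 5 "No" rows
labels = ["Yes"] * 9 + ["No"] * 5
print(round(entropy(labels), 3))  # 0.94
```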

Let the decision trees be built so that:

  • The 1st decision tree receives data with the Outlook and Temperature features:

    Note that the formula for information gain is:

    IG(S, A) = E(S) − Σᵥ (|Sᵥ| / |S|) · E(Sᵥ)

    where the sum runs over each value v of feature A, and Sᵥ is the subset of rows for which A takes the value v.

  • The 2nd decision tree receives data with the Temperature and Wind features:

    Using the above formulas:

  • The 3rd decision tree receives data with the Outlook and Humidity features:
  • The 4th decision tree receives data with the Temperature and Humidity features:
  • The 5th decision tree receives data with the Wind and Humidity features:

    Calculate the total information gain for each feature:

      Total Info Gain for Outlook     = 0.246 + 0.246 = 0.492
      Total Info Gain for Temperature = 0.029 + 0.029 + 0.029 = 0.087
      Total Info Gain for Humidity    = 0.151 + 0.151 + 0.151 = 0.453
      Total Info Gain for Wind        = 0.048 + 0.048 = 0.096
  • Thus, according to the Extra Trees forest built above, the most important feature for determining the output label is Outlook.
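The bookkeeping above can be verified with a few lines of Python (the per-tree gains are copied from the worked example; each feature contributes one gain per tree whose random sample included it):

```python
# Per-tree information gains for each feature, from the worked example
gains_per_tree = {
    "Outlook":     [0.246, 0.246],
    "Temperature": [0.029, 0.029, 0.029],
    "Humidity":    [0.151, 0.151, 0.151],
    "Wind":        [0.048, 0.048],
}

# Total information gain per feature, and the winner
total_gain = {f: round(sum(g), 3) for f, g in gains_per_tree.items()}
best = max(total_gain, key=total_gain.get)
print(total_gain)  # {'Outlook': 0.492, 'Temperature': 0.087, 'Humidity': 0.453, 'Wind': 0.096}
print(best)        # Outlook
```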

    The code below demonstrates how to perform feature selection using the Extra Trees Classifier.

    Step 1: Import the required libraries

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.ensemble import ExtraTreesClassifier

    Step 2: Load and clean the data

    # Change the working directory to the file location
    # (IPython/Jupyter magic command)
    %cd C:\Users\Dev\Desktop\Kaggle

    # Load the data
    df = pd.read_csv('data.csv')

    # Separate the dependent and independent variables
    y = df['Play Tennis']
    X = df.drop('Play Tennis', axis=1)

    X.head()

    Step 3: Build the Extra Trees forest and compute the individual feature importances

    # Build the model
    extra_tree_forest = ExtraTreesClassifier(n_estimators=5,
                                             criterion='entropy',
                                             max_features=2)

    # Train the model
    extra_tree_forest.fit(X, y)

    # Compute the importance of each feature
    feature_importance = extra_tree_forest.feature_importances_

    # Normalize the individual importances across the trees
    feature_importance_normalized = np.std([tree.feature_importances_
                                            for tree in extra_tree_forest.estimators_],
                                           axis=0)

    Step 4: Visualize and compare the results

    # Plot a bar chart comparing the feature importances
    plt.bar(X.columns, feature_importance_normalized)
    plt.xlabel('Feature Labels')
    plt.ylabel('Feature Importances')
    plt.title('Comparison of different Feature Importances')
    plt.show()

    Thus, the above output confirms our reasoning about feature selection using the Extra Trees Classifier. The feature importances may vary somewhat from run to run because of the random sampling of features.
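    Since data.csv is not distributed with the article, here is a self-contained variant of Steps 1-4 that substitutes an in-memory stand-in for the file. The rows below are the classic textbook play-tennis sample (an assumption, not taken from the article's file), and the categorical features are one-hot encoded so the trees can consume them:

```python
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier

# Hypothetical stand-in for data.csv: the classic play-tennis table
df = pd.DataFrame({
    "Outlook":     ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
                    "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"],
    "Temperature": ["Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool",
                    "Mild", "Cool", "Mild", "Mild", "Mild", "Hot", "Mild"],
    "Humidity":    ["High", "High", "High", "High", "Normal", "Normal", "Normal",
                    "High", "Normal", "Normal", "Normal", "High", "Normal", "High"],
    "Wind":        ["Weak", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong",
                    "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong"],
    "Play Tennis": ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                    "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
})

y = df["Play Tennis"]
# One-hot encode the categorical features (e.g. Outlook -> Outlook_Sunny, ...)
X = pd.get_dummies(df.drop("Play Tennis", axis=1))

model = ExtraTreesClassifier(n_estimators=100, criterion="entropy",
                             max_features=2, random_state=0)
model.fit(X, y)

# Sum the one-hot column importances back to the four original features
importance = pd.Series(model.feature_importances_, index=X.columns)
per_feature = importance.groupby(importance.index.str.split("_").str[0]).sum()
print(per_feature.sort_values(ascending=False))
```

With more trees (100 instead of 5) the aggregated importances are more stable, though the exact ranking can still shift with the random seed.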

