The Extra Trees Classifier (Extremely Randomized Trees Classifier) is an ensemble learning technique that aggregates the results of multiple de-correlated decision trees collected in a “forest” to produce its classification result. It is conceptually very similar to a Random Forest Classifier and differs from it only in the way the decision trees in the forest are constructed.
Each decision tree in the Extra Trees forest is built from the original training sample. Then, at each test node, each tree is given a random sample of k features from the feature set, from which it must choose the best feature to split the data according to some mathematical criterion (typically the Gini index). This random sampling of features leads to multiple de-correlated decision trees.
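The node-level step described above (draw k features at random, then keep the one that splits best) can be sketched in a few lines of Python. This is only an illustration of the description in this article, not the library's implementation: the helper names (`gini`, `weighted_gini`, `choose_split_feature`) and the toy rows are hypothetical, and features are treated as categorical for simplicity.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(rows, labels, feature):
    """Size-weighted Gini impurity after partitioning the rows by one feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    n = len(labels)
    return sum(len(group) / n * gini(group) for group in groups.values())

def choose_split_feature(rows, labels, k, rng):
    """Sample k candidate features at random and return (candidates, best)."""
    candidates = rng.sample(sorted(rows[0]), k)
    best = min(candidates, key=lambda f: weighted_gini(rows, labels, f))
    return candidates, best

# Toy rows invented purely for illustration (not the article's dataset).
rows = [
    {"a": 0, "b": 0, "c": 1},
    {"a": 0, "b": 1, "c": 0},
    {"a": 1, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 0},
]
labels = ["no", "no", "yes", "yes"]

candidates, best = choose_split_feature(rows, labels, k=2, rng=random.Random(0))
print("candidates:", candidates, "chosen:", best)
```

Because each node only sees its own random subset of features, two trees grown on the same rows can end up splitting on different features, which is what keeps them de-correlated.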
To perform feature selection with this forest structure, the total decrease in the mathematical criterion used to decide the split feature (the Gini index, if that is what the forest was built with) is computed for each feature during construction of the forest. This value is called the Gini importance of the feature. The features are then sorted in descending order of Gini importance, and the user selects the top k features as desired.
Consider the following data:
Let's build a hypothetical Extra Trees forest for the above data with five decision trees, where k, the number of features in each random sample of features, is two. Here, information gain will be used as the decision criterion. First, let's calculate the entropy of the data. Note the formula for calculating entropy:
$$\text{Entropy}(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$
where c is the number of unique class labels and p_i is the proportion of rows with output label i.
Therefore, the entropy of the given data is:
Let the decision trees be built so that:
Note that the formula for information gain is:
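In its standard form, writing S for the dataset, A for a feature, and S_v for the rows of S in which A takes the value v (this notation is assumed here, not taken from the text), the gain is
$$\text{Gain}(S, A) = \text{Entropy}(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|}\,\text{Entropy}(S_v)$$
A minimal Python sketch of this formula, together with the entropy it relies on, might look as follows. The function names and the toy columns are hypothetical and chosen only so the results are easy to check by hand.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the size-weighted entropy after splitting."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Toy columns invented for illustration (not the article's dataset).
labels = ["yes", "yes", "no", "no"]
perfect_feature = ["x", "x", "y", "y"]  # separates the classes completely
useless_feature = ["x", "y", "x", "y"]  # tells us nothing about the class

print(information_gain(perfect_feature, labels))
print(information_gain(useless_feature, labels))
```

A feature that separates the classes completely recovers all of the entropy as gain, while a feature that is independent of the label has a gain of zero.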
Using the above formulas:
Calculate the total information gain for each feature:
Total Info Gain for Outlook = 0.246 + 0.246 = 0.492
Total Info Gain for Temperature = 0.029 + 0.029 + 0.029 = 0.087
Total Info Gain for Humidity = 0.151 + 0.151 + 0.151 = 0.453
Total Info Gain for Wind = 0.048 + 0.048 = 0.096
Thus, according to the Extra Trees forest built above, the most important variable for determining the output label is the Outlook feature.
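The same aggregation and ranking can be written as a short Python sketch. The per-tree gains are the figures quoted in the totals above; the variable names and the choice of keeping the top two features are illustrative assumptions.

```python
# Information gain contributed by each feature in the trees where it was used,
# as quoted in the totals above.
per_tree_gain = {
    "Outlook": [0.246, 0.246],
    "Temperature": [0.029, 0.029, 0.029],
    "Humidity": [0.151, 0.151, 0.151],
    "Wind": [0.048, 0.048],
}

# Sum the gains per feature; rounding only hides floating-point noise.
total_gain = {feature: round(sum(gains), 3) for feature, gains in per_tree_gain.items()}

# Rank the features from most to least important and keep the top k.
ranking = sorted(total_gain, key=total_gain.get, reverse=True)
top_k = ranking[:2]  # k = 2 is an arbitrary choice for this illustration

for feature in ranking:
    print(f"{feature}: {total_gain[feature]}")
```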
The code below demonstrates how to perform feature selection using the Extra Trees Classifier.
Step 1: Import required libraries
Step 2: Load and clean the data
Step 3: Build the Extra Trees forest and compute the individual feature importances
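A minimal sketch of Steps 1 to 3 with scikit-learn is shown here. The article's own dataset file is not reproduced in this text, so synthetic data from `make_classification` stands in for it, and the feature names `f0` to `f3` are hypothetical. The forest settings (five trees, entropy criterion, two candidate features per split) mirror the worked example above.

```python
# Step 1: import the required libraries
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Step 2: prepare the data.
# Assumption: synthetic stand-in data, since the original dataset is not available here.
X, y = make_classification(
    n_samples=200, n_features=4, n_informative=2, n_redundant=0, random_state=0
)
feature_names = ["f0", "f1", "f2", "f3"]  # hypothetical names

# Step 3: build the forest and compute the feature importances
forest = ExtraTreesClassifier(
    n_estimators=5, criterion="entropy", max_features=2, random_state=0
)
forest.fit(X, y)

# Impurity-based importance of each feature, averaged over the trees
importances = forest.feature_importances_

# Rank the features from most to least important
order = np.argsort(importances)[::-1]
ranked = [(feature_names[i], float(importances[i])) for i in order]
for name, value in ranked:
    print(f"{name}: {value:.3f}")
```

Fixing `random_state` makes the run repeatable; dropping it will generally give somewhat different importances on each run.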
Step 4: Visualize and compare results
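One possible way to draw the comparison is a bar chart with matplotlib. The helper `plot_importances` is a hypothetical name, and the values passed to it here are the total information gains from the worked example rather than the output of a fitted forest; in practice the array of importances computed in Step 3 would be passed instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

def plot_importances(names, importances):
    """Draw a bar chart of importances, sorted from highest to lowest."""
    order = sorted(range(len(names)), key=lambda i: importances[i], reverse=True)
    sorted_names = [names[i] for i in order]
    sorted_values = [importances[i] for i in order]

    fig, ax = plt.subplots()
    ax.bar(sorted_names, sorted_values)
    ax.set_xlabel("Feature")
    ax.set_ylabel("Importance")
    ax.set_title("Comparison of feature importances")
    return fig, sorted_names

# Totals taken from the worked example above.
fig, ranked_names = plot_importances(
    ["Outlook", "Temperature", "Humidity", "Wind"],
    [0.492, 0.087, 0.453, 0.096],
)
print(ranked_names)
# fig.savefig("importances.png")  # uncomment to write the chart to a file
```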
Thus, the above output confirms our theory of feature selection using the Extra Trees Classifier. The feature importances may take different values from run to run because of the random nature of the feature samples.