Random forest regression in Python



Random forest is an ensemble method that can perform both regression and classification tasks using multiple decision trees and a technique called bootstrap aggregation, commonly known as bagging. The basic idea is to combine the predictions of many decision trees to determine the final result, rather than relying on a single decision tree.
Approach:

  1. Pick K data points at random from the training set.
  2. Build a decision tree on these K data points.
  3. Choose the number of trees, Ntree, that you want to build and repeat steps 1 and 2.
  4. For a new data point, have each of the Ntree trees predict the Y value, and assign the new data point the average of all predicted Y values.
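The steps above can be sketched by hand with individual decision trees. This is only an illustration under stated assumptions, not how scikit-learn implements its forest: the helper names `fit_bagged_trees` and `predict_bagged_trees` and the toy data are hypothetical, the K points are drawn with replacement (the usual bootstrap convention), and the per-split feature subsampling that a full random forest adds is left out.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_bagged_trees(X, y, n_trees=10, k=None, seed=0):
    """Fit n_trees decision trees, each on K points drawn at random (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    k = n if k is None else k
    trees = []
    for _ in range(n_trees):
        # Step 1: pick K data points at random (with replacement)
        idx = rng.integers(0, n, size=k)
        # Step 2: build a decision tree on those K points
        tree = DecisionTreeRegressor(random_state=0)
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    # Step 3 is the loop itself: repeat for the chosen number of trees
    return trees


def predict_bagged_trees(trees, X):
    """Step 4: average the predictions of all trees."""
    return np.mean([tree.predict(X) for tree in trees], axis=0)


# Toy data, made up for illustration only
X_demo = np.arange(1, 11, dtype=float).reshape(-1, 1)
y_demo = X_demo.ravel() ** 2

trees = fit_bagged_trees(X_demo, y_demo, n_trees=25)
pred = predict_bagged_trees(trees, np.array([[6.5]]))
print(pred)
```

Because every tree predicts an average of training targets, the combined prediction always stays within the range of the training Y values.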

Below is a step-by-step Python implementation. 
Step 1: Import the required libraries.

# Import the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Step 2: Import and print the dataset.

data = pd.read_csv('Salaries.csv')
print(data)


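Step 3: Select all rows of column 1 as x and all rows of column 2 as y (column indices are zero-based).

# the slice 1:2 keeps x two-dimensional (a single column)
x = data.iloc[:, 1:2].values
print(x)
# a single index gives a one-dimensional target array
y = data.iloc[:, 2].values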
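The difference between slicing with `1:2` and indexing with `2` is easy to miss, so here is a small sketch of the resulting shapes. The DataFrame below is a made-up stand-in, not the contents of Salaries.csv; its column names and values are assumptions chosen only to mirror a three-column layout.

```python
import pandas as pd

# Hypothetical stand-in for the dataset (names and values are illustrative)
demo = pd.DataFrame({
    "Position": ["Analyst", "Manager", "CEO"],
    "Level": [1, 5, 10],
    "Salary": [45000, 110000, 1000000],
})

x_demo = demo.iloc[:, 1:2].values  # slice -> 2D array with one column
y_demo = demo.iloc[:, 2].values    # single index -> 1D array

print(x_demo.shape)
print(y_demo.shape)
```

The two-dimensional shape matters later, because scikit-learn estimators expect the features as a 2D array of samples by features.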



Step 4: Fit the random forest regressor to the dataset.

# Fitting random forest regression to the dataset
# import the regressor
from sklearn.ensemble import RandomForestRegressor

# create the regressor object with 100 trees
regressor = RandomForestRegressor(n_estimators=100, random_state=0)

# fit the regressor to the x and y data
regressor.fit(x, y)
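To tie the fitted model back to the averaging idea, the sketch below checks that a forest's prediction matches the mean of its individual trees' predictions. It uses made-up data rather than Salaries.csv, and the names `forest`, `x_demo` and `y_demo` are illustrative; `estimators_` is the fitted attribute that holds the individual trees.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-in data (not the Salaries.csv values)
x_demo = np.arange(1, 11, dtype=float).reshape(-1, 1)
y_demo = x_demo.ravel() ** 2

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(x_demo, y_demo)

query = np.array([[6.5]])
forest_pred = forest.predict(query)

# Average the predictions of the individual fitted trees
tree_mean = np.mean([tree.predict(query) for tree in forest.estimators_], axis=0)

print(len(forest.estimators_), forest_pred, tree_mean)
```

Setting `random_state` makes the bootstrap samples, and therefore the fitted trees, reproducible from run to run.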


Step 5: Predict a new result.

# predict expects a 2D array; change 6.5 to check other values
y_pred = regressor.predict(np.array([[6.5]]))
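scikit-learn estimators expect the input to `predict` as a 2D array of shape (n_samples, n_features), so a bare value or a flat 1D array is rejected. The sketch below, again on made-up data with illustrative names, shows one way to wrap a single value or several values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-in data (not the Salaries.csv values)
x_demo = np.arange(1, 11, dtype=float).reshape(-1, 1)
y_demo = x_demo.ravel() ** 2
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(x_demo, y_demo)

# One sample with one feature: shape (1, 1)
single = model.predict(np.array([[6.5]]))

# Several samples at once: reshape a flat array into one column
several = model.predict(np.array([6.5, 7.0, 9.5]).reshape(-1, 1))

# A 1D array is not accepted as input
try:
    model.predict(np.array([6.5]))
    rejected = False
except ValueError:
    rejected = True

print(single.shape, several.shape, rejected)
```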

Step 6: Visualize the result.

# Visualize the random forest regression results

# arange creates a range of values from the minimum x
# to the maximum x with a step of 0.01
# between two consecutive values
X_grid = np.arange(x.min(), x.max(), 0.01)

# reshape to convert the data to an array of shape len(X_grid) x 1,
# i.e. make a column out of the X_grid values
X_grid = X_grid.reshape((len(X_grid), 1))

 
# Scatter plot of the source data
plt.scatter(x, y, color='blue')

# plot of the predicted data
plt.plot(X_grid, regressor.predict(X_grid), color='green')
plt.title('Random Forest Regression')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
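The fine 0.01 grid is useful because a forest of trees predicts a piecewise-constant (step-like) function, which a coarse grid would hide. As a rough check that needs no plot, the sketch below counts how many distinct prediction values appear over such a grid. The data and names are made up for illustration and are not the Salaries.csv values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-in data (not the Salaries.csv values)
x_demo = np.arange(1, 11, dtype=float).reshape(-1, 1)
y_demo = x_demo.ravel() ** 2
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(x_demo, y_demo)

# Same kind of fine grid as in the plotting step
grid = np.arange(x_demo.min(), x_demo.max(), 0.01).reshape(-1, 1)
grid_pred = model.predict(grid)

# Count the distinct levels the prediction takes across the grid
n_levels = len(np.unique(grid_pred))
print(len(grid), n_levels)
```

The grid holds hundreds of points, but the predictions take only a handful of distinct values, which is why the plotted curve looks like a staircase.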