Python | Decision tree regression using sklearn

The decision tree algorithm falls under the category of supervised learning algorithms. It works for both continuous and categorical output variables.

The branches/edges represent the outcome of a node, and the nodes are either:

  1. Conditions [Decision Nodes]
  2. Results [End Nodes]

The branches/edges carry true/false outcomes, and a decision is reached by following them from the root. For example, a decision tree can find the smallest of three numbers by comparing them pairwise at each decision node.
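The figure originally shown at this point is not reproduced here, but the same idea can be written as plain Python: every `if` is a decision node and every `return` an end node. This is an illustrative sketch, not part of the article's code.

```python
# A decision tree for the smallest of three numbers, written as code:
# each `if` is a decision node (condition), each `return` an end node (result).
def smallest_of_three(a, b, c):
    if a <= b:            # decision node: is a <= b?
        if a <= c:        # decision node: is a <= c?
            return a      # end node: a is the smallest
        return c          # end node: c is the smallest
    if b <= c:            # decision node: is b <= c?
        return b          # end node: b is the smallest
    return c              # end node: c is the smallest
```

For example, `smallest_of_three(7, 2, 9)` follows the branches a <= b? (false) and b <= c? (true) and returns 2.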

Decision Tree Regression:
Decision tree regression observes the features of an object and trains a model in the structure of a tree to predict future data and produce meaningful continuous output. Continuous output means that the output/result is not discrete, i.e. it is not drawn from a known, discrete set of numbers or values.

Example of discrete output: a weather-forecasting model that predicts whether or not it will rain on a particular day.
Example of continuous output: a profit-forecasting model that states the probable profit that can be made from the sale of a product.

Here, continuous values are predicted with the help of a decision tree regression model.
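The distinction can be sketched with scikit-learn itself: a `DecisionTreeClassifier` returns a discrete label, while a `DecisionTreeRegressor` returns a continuous value. The tiny weather and profit datasets below are made up purely for illustration; they are not the article's data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# hypothetical weather features: [humidity %, pressure hPa] -> rain yes/no
X_weather = np.array([[90, 1000], [40, 1020], [85, 1005], [30, 1025]])
y_rain = np.array([1, 0, 1, 0])        # discrete labels: rain / no rain

clf = DecisionTreeClassifier(random_state=0).fit(X_weather, y_rain)
print(clf.predict([[88, 1002]]))       # discrete output: 0 or 1

# hypothetical sales feature: [production cost] -> profit
X_cost = np.array([[100], [500], [1500]])
y_profit = np.array([1000.0, 3000.0, 5000.0])

reg = DecisionTreeRegressor(random_state=0).fit(X_cost, y_profit)
print(reg.predict([[400]]))            # continuous output: a real-valued profit
```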

Let's see a step-by-step implementation.

  • Step 1: Import the required libraries.

    # import numpy package for arrays and stuff
    import numpy as np

    # import matplotlib.pyplot to plot our result
    import matplotlib.pyplot as plt

    # import pandas for importing CSV files
    import pandas as pd

  • Step 2: Initialize and print the dataset.

    # import the dataset
    # dataset = pd.read_csv('Data.csv')
    # alternatively open up a .csv file to read data

    dataset = np.array(
    [['Asset Flip', 100, 1000],
     ['Text Based', 500, 3000],
     ['Visual Novel', 1500, 5000],
     ['2D Pixel Art', 3500, 8000],
     ['2D Vector Art', 5000, 6500],
     ['Strategy', 6000, 7000],
     ['First Person Shooter', 8000, 15000],
     ['Simulator', 9500, 20000],
     ['Racing', 12000, 21000],
     ['RPG', 14000, 25000],
     ['Sandbox', 15500, 27000],
     ['Open-World', 16500, 30000],
     ['MMOFPS', 25000, 52000],
     ['MMORPG', 30000, 80000]])

    # print the dataset
    print(dataset)

  • Step 3: Select all rows and column 1 from the dataset into "X".

    # select all rows by : and column 1
    # by 1:2 representing features
    X = dataset[:, 1:2].astype(int)

    # print X
    print(X)

  • Step 4: Select all rows and column 2 from the dataset into "y".

    # select all rows by : and column 2
    # by 2 representing the labels
    y = dataset[:, 2].astype(int)

    # print y
    print(y)

  • Step 5: Fit the decision tree regressor to the dataset.

    # import the regressor
    from sklearn.tree import DecisionTreeRegressor

    # create a regressor object
    regressor = DecisionTreeRegressor(random_state=0)

    # fit the regressor with X and y data
    regressor.fit(X, y)

     

  • Step 6: Predicting a new value.

    # predicting a new value

    # test the output by changing values, like 3750
    # (predict expects a 2D array of samples)
    y_pred = regressor.predict([[3750]])

    # print the predicted price
    print("Predicted price: %d" % y_pred)

  • Step 7: Visualising the result.

    # arange for creating a range of values
    # from the min value of X to the max value of X
    # with a difference of 0.01 between two
    # consecutive values
    X_grid = np.arange(X.min(), X.max(), 0.01)

    # reshape for converting the data into
    # a len(X_grid)*1 array, i.e. to make
    # a column out of the X_grid values
    X_grid = X_grid.reshape((len(X_grid), 1))

    # scatter plot for original data
    plt.scatter(X, y, color='red')

    # plot predicted data
    plt.plot(X_grid, regressor.predict(X_grid), color='blue')

    # specify title
    plt.title('Profit to Production Cost (Decision Tree Regression)')

    # specify X axis label
    plt.xlabel('Production Cost')

    # specify Y axis label
    plt.ylabel('Profit')

    # show the plot
    plt.show()

  • Step 8: The tree is finally exported and shown in the TREE STRUCTURE below, visualised using http://www.webgraphviz.com/ by copying the data from the 'tree.dot' file.


Output (Decision Tree):


    # import export_graphviz
    from sklearn.tree import export_graphviz

    # export the decision tree to a tree.dot file
    # for visualising the plot anywhere
    export_graphviz(regressor, out_file='tree.dot',
                    feature_names=['Production Cost'])
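If Graphviz or the website above is not at hand, the fitted tree can also be printed as plain text with `sklearn.tree.export_text`. The sketch below refits the same regressor on the cost/profit columns of the dataset above; it assumes scikit-learn >= 0.21, where `export_text` was added.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# cost (X) and profit (y) columns from the dataset above
X = np.array([[100], [500], [1500], [3500], [5000], [6000], [8000],
              [9500], [12000], [14000], [15500], [16500], [25000], [30000]])
y = np.array([1000, 3000, 5000, 8000, 6500, 7000, 15000,
              20000, 21000, 25000, 27000, 30000, 52000, 80000])

# fit the same regressor as in Step 5
regressor = DecisionTreeRegressor(random_state=0).fit(X, y)

# print a plain-text rendering of the fitted tree
print(export_text(regressor, feature_names=['Production Cost']))
```

Each line of the report shows one decision node (`Production Cost <= threshold`) or one leaf (`value: [...]`), mirroring the structure that webgraphviz would draw from tree.dot.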