Python | Decision tree regression using sklearn

Decision tree algorithm falls under the category of supervised learning algorithms. It works for both continuous and categorical output variables.

The branches / edges represent the outcome of a node, and the nodes are either:

  1. Conditions [Decision Nodes]
  2. Results [End Nodes]

Each branch / edge carries the true / false outcome of its parent node's condition, and the tree makes its decision by following these outcomes. As an example, consider a decision tree that evaluates the smallest of three numbers:
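Such a tree corresponds directly to nested conditions in code. The sketch below is illustrative only (the function name is ours, not from the article) and returns the smallest of three numbers:

```python
# Each `if` is a decision node (a condition); each `return` is an end node
# (a result). The branch taken depends on the true / false outcome.
def smallest_of_three(a, b, c):
    if a <= b:            # decision node: is a <= b?
        if a <= c:        # decision node: is a <= c?
            return a      # end node
        return c          # end node
    if b <= c:            # decision node: is b <= c?
        return b          # end node
    return c              # end node

print(smallest_of_three(7, 3, 9))  # → 3
```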

Decision Tree Regression:
Decision tree regression observes the features of an object and trains a model in the form of a tree to predict future data and produce meaningful continuous output. Continuous output means that the output / result is not discrete, i.e. it is not limited to a known, discrete set of numbers or values.

Example of discrete output: a weather forecasting model that predicts whether or not it will rain on a particular day.
Example of continuous output: a profit forecasting model that estimates the probable profit from the sale of a product.

Here, continuous values are predicted with a decision tree regression model.
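The distinction can be seen directly in sklearn: a classifier returns one of the known class labels, while a regressor returns a real-valued estimate. A minimal sketch with toy data invented purely for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Toy data for illustration: one feature (humidity),
# a discrete label (rain: yes / no) and a continuous target (profit).
X = np.array([[10], [30], [60], [90]])
rain = np.array([0, 0, 1, 1])             # discrete output
profit = np.array([1.5, 2.0, 3.2, 4.8])   # continuous output

clf = DecisionTreeClassifier(random_state=0).fit(X, rain)
reg = DecisionTreeRegressor(random_state=0).fit(X, profit)

print(clf.predict([[70]]))  # one of the known classes: 0 or 1
print(reg.predict([[70]]))  # a real-valued estimate
```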

Let's walk through a step-by-step implementation.

  • Step 1: Import the required libraries.

    # import numpy package for arrays and stuff

    import numpy as np 

     
    # import matplotlib.pyplot to plot our result

    import matplotlib.pyplot as plt

     
    # import pandas for importing CSV files

    import pandas as pd 

  • Step 2: Initialize and print the dataset.

    # import the dataset
    # dataset = pd.read_csv('Data.csv')
    # alternatively, open the .csv file to read the data

    dataset = np.array(
        [['Asset Flip', 100, 1000],
         ['Text Based', 500, 3000],
         ['Visual Novel', 1500, 5000],
         ['2D Pixel Art', 3500, 8000],
         ['2D Vector Art', 5000, 6500],
         ['Strategy', 6000, 7000],
         ['First Person Shooter', 8000, 15000],
         ['Simulator', 9500, 20000],
         ['Racing', 12000, 21000],
         ['RPG', 14000, 25000],
         ['Sandbox', 15500, 27000],
         ['Open-World', 16500, 30000],
         ['MMOFPS', 25000, 52000],
         ['MMORPG', 30000, 80000]
        ])

    # print the dataset
    print(dataset)

  • Step 3: Select all rows and column 1 from the dataset into "X".

    # select all rows (:) and column 1;
    # the slice 1:2 keeps the features as a 2-D array

    X = dataset[:, 1:2].astype(int)

    # print X
    print(X)

  • Step 4: Select all rows and column 2 from the dataset into "y".

    # select all rows (:) and column 2,
    # which holds the labels

    y = dataset[:, 2].astype(int)

    # print y
    print(y)

  • Step 5: Fit the decision tree regressor to the dataset.

    # import the regressor
    from sklearn.tree import DecisionTreeRegressor

    # create a regressor object
    regressor = DecisionTreeRegressor(random_state=0)

    # fit the regressor with the X and y data
    regressor.fit(X, y)

     

  • Step 6: Predict a new value.

    # predicting a new value

    # test the output by changing the value, e.g. 3750
    # predict expects a 2-D array: one row per sample
    y_pred = regressor.predict([[3750]])

    # print the predicted price
    print("Predicted price: %d" % y_pred[0])
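Because predict takes a 2-D array of shape (n_samples, n_features), several production costs can also be scored in one call. A self-contained sketch using a small subset of the dataset from Step 2:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# a small subset of the (cost, profit) data from Step 2
X = np.array([[100], [500], [1500], [3500]])
y = np.array([1000, 3000, 5000, 8000])

regressor = DecisionTreeRegressor(random_state=0).fit(X, y)

# one row per query: score two production costs at once
costs = np.array([[250], [3750]])
print(regressor.predict(costs))
```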

  • Step 7: Visualize the result.

    # arange creates a range of values
    # from the minimum X value to the maximum X value
    # with a step of 0.01 between two
    # consecutive values

    X_grid = np.arange(min(X), max(X), 0.01)

    # reshape to convert the data into a
    # len(X_grid) * 1 array, i.e. a
    # column of X_grid values

    X_grid = X_grid.reshape((len(X_grid), 1))

    # scatter plot for the original data
    plt.scatter(X, y, color='red')

    # plot the predicted data
    plt.plot(X_grid, regressor.predict(X_grid), color='blue')

    # specify the title
    plt.title('Profit to Production Cost (Decision Tree Regression)')

    # specify the X-axis label
    plt.xlabel('Production Cost')

    # specify the Y-axis label
    plt.ylabel('Profit')

    # show the plot
    plt.show()

  • Step 8: Finally, export the decision tree to a tree.dot file. The tree structure can then be rendered with http://www.webgraphviz.com/ by pasting in the contents of tree.dot.


    # import export_graphviz
    from sklearn.tree import export_graphviz

    # export the decision tree to tree.dot
    # so the plot can be rendered anywhere
    export_graphviz(regressor, out_file='tree.dot',
                    feature_names=['Production Cost'])
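As an alternative to rendering tree.dot externally, newer versions of scikit-learn can also print the fitted tree as plain text with export_text (a minimal sketch using a small subset of the data):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# a small subset of the (cost, profit) data from Step 2
X = np.array([[100], [500], [1500], [3500]])
y = np.array([1000, 3000, 5000, 8000])

regressor = DecisionTreeRegressor(random_state=0).fit(X, y)

# text rendering of the same tree structure that tree.dot encodes
tree_text = export_text(regressor, feature_names=['Production Cost'])
print(tree_text)
```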