
Python | Decision tree regression using sklearn

The decision tree algorithm is a supervised learning algorithm. It works for both continuous and categorical output variables.

In a decision tree, the branches / edges represent the outcome of a node, and each node holds either:

  1. A condition [Decision Node]
  2. A result [End Node]

Each branch corresponds to the true or false outcome of a node's condition, and following these outcomes from the root leads to a decision. For example, a decision tree can evaluate to the smallest of three numbers by chaining comparisons.
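The smallest-of-three-numbers tree can be written as nested conditions. This is only a minimal sketch: the function name `smallest_of_three` is made up for illustration, each `if` plays the role of a decision node, and each `return` plays the role of an end node.

```python
def smallest_of_three(a, b, c):
    """Walk a tiny hand-written decision tree to find the smallest value."""
    if a <= b:          # decision node: is a no larger than b?
        if a <= c:      # decision node: is a no larger than c?
            return a    # end node
        return c        # end node
    if b <= c:          # decision node: is b no larger than c?
        return b        # end node
    return c            # end node


print(smallest_of_three(7, 3, 5))
```

A learned decision tree has the same shape, except that the conditions are chosen from the training data instead of being written by hand.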

Decision Tree Regression:
Decision tree regression observes the features of an object and trains a model in the structure of a tree to predict future data and produce meaningful continuous output. Continuous output means that the output / result is not discrete, that is, it is not represented by just a discrete, known set of numbers or values.
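As a rough illustration of how a regression tree could pick one of its conditions, the sketch below searches for the single production-cost threshold that minimizes the summed squared error when each side of the split predicts the mean of its own profits. The numbers are the production cost and profit columns of the dataset in Step 2; the helper names `sse` and `best_split` are made up for this illustration, and the squared-error criterion is an assumption. A library implementation repeats this kind of search recursively on each side.

```python
def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) for the single split with the lowest error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between identical feature values
        # candidate threshold: midpoint between two neighbouring feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


costs = [100, 500, 1500, 3500, 5000, 6000, 8000,
         9500, 12000, 14000, 15500, 16500, 25000, 30000]
profits = [1000, 3000, 5000, 8000, 6500, 7000, 15000,
           20000, 21000, 25000, 27000, 30000, 52000, 80000]

threshold, total = best_split(costs, profits)
print("Best first split: Production Cost <=", threshold)
print("Error before split:", sse(profits), "after split:", total)
```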

Example of discrete output: a weather forecasting model that predicts whether it will rain on a specific day.
Example of continuous output: a profit forecasting model that indicates the probable profit that can be made from the sale of a product.
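The difference between the two kinds of output can be seen by fitting a classifier and a regressor side by side. This is a minimal sketch: the humidity readings and rain labels are hypothetical toy values invented for illustration, while the four cost / profit pairs are taken from the dataset in Step 2.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Hypothetical toy data: humidity (%) -> did it rain? (0 = no, 1 = yes)
humidity = [[30], [45], [60], [80], [90]]
rained = [0, 0, 0, 1, 1]

# A few (production cost, profit) pairs from the dataset in Step 2
cost = [[100], [500], [1500], [3500]]
profit = [1000, 3000, 5000, 8000]

classifier = DecisionTreeClassifier(random_state=0).fit(humidity, rained)
regressor = DecisionTreeRegressor(random_state=0).fit(cost, profit)

rain_pred = classifier.predict([[85]])     # one of the known classes: 0 or 1
profit_pred = regressor.predict([[1200]])  # a real-valued number

print("Rain class:", rain_pred[0])
print("Predicted profit:", profit_pred[0])
```

The classifier can only return one of the labels it was trained on, whereas the regressor returns a floating-point value.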

Here, continuous values are predicted using a decision tree regression model.

Let's look at the implementation step by step:

  • Step 1: Import the required libraries.

    # import numpy for working with arrays
    import numpy as np

    # import matplotlib.pyplot to plot our result
    import matplotlib.pyplot as plt

    # import pandas for importing CSV files
    import pandas as pd

  • Step 2: Initialize and print the dataset.

    # import the data
    # dataset = pd.read_csv('Data.csv')
    # alternatively, open a .csv file to read the data

    dataset = np.array([
        ['Asset Flip', 100, 1000],
        ['Text Based', 500, 3000],
        ['Visual Novel', 1500, 5000],
        ['2D Pixel Art', 3500, 8000],
        ['2D Vector Art', 5000, 6500],
        ['Strategy', 6000, 7000],
        ['First Person Shooter', 8000, 15000],
        ['Simulator', 9500, 20000],
        ['Racing', 12000, 21000],
        ['RPG', 14000, 25000],
        ['Sandbox', 15500, 27000],
        ['Open-World', 16500, 30000],
        ['MMOFPS', 25000, 52000],
        ['MMORPG', 30000, 80000]
    ])

    # print the dataset
    print(dataset)

  • Step 3: Select all rows and column 1 of the dataset as "X".

    # select all rows with : and column 1 with 1:2,
    # which represents the features
    X = dataset[:, 1:2].astype(int)

    # print X
    print(X)

  • Step 4: Select all rows and column 2 of the dataset as "y".

    # select all rows with : and column 2 with 2
    # as y, which represents the labels
    y = dataset[:, 2].astype(int)

    # print y
    print(y)
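The `astype(int)` calls are needed because a NumPy array holds a single data type, so mixing names and numbers stores every entry as a string. If the data is loaded with pandas instead, each column keeps its own type. The sketch below shows both on the first three rows of the dataset; the column name "Game" is a made-up label, and "Production Cost" and "Profit" follow the axis labels used in Step 7.

```python
import numpy as np
import pandas as pd

# Mixing strings and integers in one NumPy array turns everything into strings
mixed = np.array([["Asset Flip", 100, 1000]])
print(mixed.dtype)

# The first three rows of the dataset as a DataFrame with per-column types
df = pd.DataFrame(
    [["Asset Flip", 100, 1000],
     ["Text Based", 500, 3000],
     ["Visual Novel", 1500, 5000]],
    columns=["Game", "Production Cost", "Profit"],
)

# Double brackets keep X two-dimensional: (n_samples, 1)
X = df[["Production Cost"]].to_numpy()
# Single brackets give a one-dimensional target vector
y = df["Profit"].to_numpy()

print(X.dtype, X.shape)
print(y.dtype, y.shape)
```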

  • Step 5: Fit the decision tree regressor to the dataset

    # import the regressor
    from sklearn.tree import DecisionTreeRegressor

    # create a regressor object
    regressor = DecisionTreeRegressor(random_state=0)

    # fit the regressor with the X and y data
    regressor.fit(X, y)

     

  • Step 6: Predict a new value

    # predict a new value
    # check the output by changing the value, e.g. 3750
    # (predict expects a 2D array: one row per sample)
    y_pred = regressor.predict([[3750]])

    # print the predicted price
    print("Predicted price: %d" % y_pred[0])
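To see why a particular value is predicted, it can help to query a few inputs at once. The following self-contained sketch refits the same regressor on the production cost and profit columns from Step 2; the query values 4300 and 40000 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Production cost and profit columns from the dataset in Step 2
cost = np.array([100, 500, 1500, 3500, 5000, 6000, 8000,
                 9500, 12000, 14000, 15500, 16500, 25000, 30000])
profit = np.array([1000, 3000, 5000, 8000, 6500, 7000, 15000,
                   20000, 21000, 25000, 27000, 30000, 52000, 80000])

regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(cost.reshape(-1, 1), profit)

# predict() expects a 2D array of shape (n_samples, n_features)
queries = np.array([[3750], [4300], [40000]])
predictions = regressor.predict(queries)

for q, p in zip(queries.ravel(), predictions):
    print(f"cost {q:>6} -> predicted profit {p:.0f}")
```

Because the tree is grown without a depth limit, every training point ends up in its own leaf, so each query returns the profit of the training point whose interval it falls into: 3750 shares a leaf with 3500, 4300 with 5000, and anything beyond the largest cost with 30000. The predictions are therefore piecewise constant, not interpolated.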

  • Step 7: Visualize the result

    # arange creates a range of values
    # from the minimum X value to the maximum X value
    # with a difference of 0.01 between two
    # consecutive values
    X_grid = np.arange(X.min(), X.max(), 0.01)

    # reshape the data into a
    # len(X_grid) * 1 array, i.e. make
    # a column out of the X_grid values
    X_grid = X_grid.reshape((len(X_grid), 1))

    # scatter plot for the original data
    plt.scatter(X, y, color='red')

    # plot the predicted data
    plt.plot(X_grid, regressor.predict(X_grid), color='blue')

    # specify the title
    plt.title('Profit to Production Cost (Decision Tree Regression)')

    # specify the X-axis label
    plt.xlabel('Production Cost')

    # specify the Y-axis label
    plt.ylabel('Profit')

    # show the plot
    plt.show()

  • Step 8: Finally, export the tree to a tree.dot file. The tree structure can then be rendered by copying the contents of that file into http://www.webgraphviz.com/.

    # import export_graphviz
    from sklearn.tree import export_graphviz

    # export the decision tree to a tree.dot file
    # so the plot can be rendered anywhere
    export_graphviz(regressor, out_file='tree.dot',
                    feature_names=['Production Cost'])
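If an external renderer is not at hand, the fitted tree can also be inspected as plain text. The sketch below is an alternative to the .dot export, not part of the original walkthrough: it refits the regressor on the Step 2 columns and prints its rules with scikit-learn's `export_text`.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Production cost and profit columns from the dataset in Step 2
cost = np.array([100, 500, 1500, 3500, 5000, 6000, 8000,
                 9500, 12000, 14000, 15500, 16500, 25000, 30000])
profit = np.array([1000, 3000, 5000, 8000, 6500, 7000, 15000,
                   20000, 21000, 25000, 27000, 30000, 52000, 80000])

regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(cost.reshape(-1, 1), profit)

# Render the decision nodes and end nodes as an indented text outline
rules = export_text(regressor, feature_names=["Production Cost"])
print(rules)
```

Each indented line is either a condition on Production Cost (a decision node) or a predicted value (an end node).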