ML | Boston Housing Kaggle Challenge with Linear Regression

Dataset description taken from

Let's build a linear regression model that predicts housing prices.

Importing the libraries and the dataset.

# Importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Data import
# Note: load_boston was removed in scikit-learn 1.2, so this requires an older version
from sklearn.datasets import load_boston
boston = load_boston()

Shape of the Boston data and its feature names

boston.data.shape

boston.feature_names

Converting the data from an ndarray to a DataFrame and adding the feature names as column names

data = pd.DataFrame(boston.data)
data.columns = boston.feature_names

data.head(10)

Adding the “Price” column to the dataset

# Adding the "Price" column (target) to the data
boston.target.shape

data['Price'] = boston.target
data.head()

Boston dataset description

data.describe()

Boston dataset information

data.info()

Getting the input and output data, then splitting them into a training set and a test set.

# Input data
x = boston.data

# Output data
y = boston.target

# Splitting the data into training and test sets
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2,
                                                random_state=0)

print("xtrain shape:", xtrain.shape)
print("xtest shape:", xtest.shape)
print("ytrain shape:", ytrain.shape)
print("ytest shape:", ytest.shape)
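To make the split step more concrete, here is a minimal pure-Python sketch of what a shuffled 80/20 split does. The helper `simple_train_test_split` and the toy data are hypothetical names introduced only for illustration. This is not scikit-learn's implementation, and the row order it produces will differ from `train_test_split` even with the same seed. The sketch assumes the test size is rounded up.

```python
import math
import random


def simple_train_test_split(x, y, test_size=0.2, seed=0):
    """Shuffle sample indices and split x and y into train and test parts."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of samples")
    n_samples = len(x)
    # Assumption: round the test-set size up (20% of 506 samples -> 102 rows).
    n_test = math.ceil(test_size * n_samples)
    indices = list(range(n_samples))
    # A seeded generator makes the shuffle repeatable between runs.
    random.Random(seed).shuffle(indices)
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    xtrain = [x[i] for i in train_idx]
    xtest = [x[i] for i in test_idx]
    ytrain = [y[i] for i in train_idx]
    ytest = [y[i] for i in test_idx]
    return xtrain, xtest, ytrain, ytest


# Made-up toy data with 506 rows, the same row count as the Boston dataset.
x_demo = [[float(i), float(i) * 2.0] for i in range(506)]
y_demo = [float(i) for i in range(506)]
xtr, xte, ytr, yte = simple_train_test_split(x_demo, y_demo, test_size=0.2, seed=0)
print(len(xtr), len(xte))
```

Because the same shuffled indices are used for both `x` and `y`, every input row stays paired with its own target value after the split.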

Fitting a linear regression model to the training data and predicting prices.

# Fitting the linear regression model to the training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(xtrain, ytrain)

# Predicting the test set results
y_pred = regressor.predict(xtest)
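`LinearRegression` fits an ordinary least squares model on all the features at once. As a simplified illustration of that idea, the sketch below fits a line to a single feature using the closed-form least squares solution. The functions `fit_simple_linear_regression` and `predict` and the toy values are hypothetical, chosen only for this example; they are not Boston data and not scikit-learn's code.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(xs, slope, intercept):
    """Apply the fitted line to new inputs."""
    return [slope * xi + intercept for xi in xs]


# Made-up toy data lying exactly on y = 3x + 2.
x_toy = [1.0, 2.0, 3.0, 4.0, 5.0]
y_toy = [5.0, 8.0, 11.0, 14.0, 17.0]
slope, intercept = fit_simple_linear_regression(x_toy, y_toy)
y_hat = predict([6.0], slope, intercept)
print(slope, intercept, y_hat)
```

With several features the same principle applies, but the slope becomes a vector of coefficients, one per feature.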

Plotting a scatter plot of the results: true values (ytest) versus predicted values (y_pred)

# Scatter plot to display the results:
# true values (ytest) versus predicted values (y_pred)
plt.scatter(ytest, y_pred, c='green')
plt.xlabel("Price: in $1000's")
plt.ylabel("Predicted value")
plt.title("True value vs predicted value: Linear Regression")
plt.show()


Linear regression results, i.e. the mean squared error.

# Linear regression results
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(ytest, y_pred)
print("Mean Square Error:", mse)
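The mean squared error is expressed in squared units of the target. Taking its square root gives the root mean squared error (RMSE), which is in the same units as the price (thousands of dollars). The sketch below computes both by hand. The helper `mean_squared_error_manual` and the sample values are made up for illustration and are not results from the model above.

```python
import math


def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up values, not taken from the Boston dataset.
y_true_demo = [20.0, 25.0, 30.0, 35.0]
y_pred_demo = [21.0, 24.0, 32.0, 33.0]

mse_demo = mean_squared_error_manual(y_true_demo, y_pred_demo)
rmse_demo = math.sqrt(mse_demo)  # back in the units of the target
print("MSE:", mse_demo)
print("RMSE:", rmse_demo)
```

In the tutorial's own variables, the equivalent step would be `np.sqrt(mse)` applied to the value returned by `mean_squared_error`.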