Linear regression is a supervised machine learning algorithm that performs a regression task. Regression models a prediction target as a function of one or more explanatory variables. It is mainly used to find relationships between variables and to make forecasts. Regression models differ in the type of relationship they assume between the dependent and explanatory variables, and in the number of explanatory variables they use.
This article demonstrates how to use several Python libraries to implement linear regression on a given dataset. We will demonstrate a binary linear model (one explanatory variable and one target), as it is easier to visualize.
In this demo, the model is trained with scikit-learn's LinearRegression, which fits the coefficients by ordinary least squares rather than by gradient descent.
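To make the training idea concrete, here is a minimal sketch that is not part of the original demo. It fits a one-variable linear model by gradient descent and compares the result with the closed-form least-squares fit from numpy.polyfit. The synthetic data, the learning rate, the iteration count, and the function name fit_gradient_descent are all assumptions chosen for illustration.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 2x + 1 plus small noise
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.05, size=200)


def fit_gradient_descent(x, y, lr=0.5, n_iters=5000):
    """Fit y ~ w * x + b by minimizing the mean squared error with gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(n_iters):
        residual = w * x + b - y
        # Gradients of the mean squared error with respect to w and b
        grad_w = 2.0 * np.mean(residual * x)
        grad_b = 2.0 * np.mean(residual)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w_gd, b_gd = fit_gradient_descent(x, y)

# Closed-form least-squares fit of a degree-1 polynomial, for comparison
w_ls, b_ls = np.polyfit(x, y, deg=1)

print(f"gradient descent: slope={w_gd:.3f}, intercept={b_gd:.3f}")
print(f"least squares:    slope={w_ls:.3f}, intercept={b_ls:.3f}")
```

Both approaches minimize the same squared-error objective, so on well-conditioned data like this they should agree closely on the slope and intercept.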
Step 1: Import the required libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import preprocessing, svm
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
Step 2: Read the dataset
You can download the dataset here.
cd C:\Users\Dev\Desktop\Kaggle\Salinity
# Change the file reading location to match the dataset location
df = pd.read_csv('bottle.csv')
df_binary = df[['Salnty', 'T_degC']]  # Take only the two selected attributes from the dataset
df_binary.columns = ['Sal', 'Temp']  # Rename the columns for easier coding
df_binary.head()  # Display only the first rows along with the column names
Step 3: Examining data scatter
sns.lmplot(x="Sal", y="Temp", data=df_binary, order=2, ci=None)  # Plot the data scatter
Step 4: Clean up the data
df_binary.fillna(method=…)
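The fillna call in this step is incomplete, so the exact fill method is not known. As a minimal sketch, assuming a forward fill was intended, the example below shows two common ways to handle missing values on a small made-up DataFrame. The name df_toy and its values are hypothetical stand-ins for df_binary. DataFrame.ffill() is used here because recent pandas versions deprecate the method argument of fillna.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_binary (values are made up for illustration)
df_toy = pd.DataFrame({
    "Sal": [33.4, np.nan, 33.5, 33.9, np.nan],
    "Temp": [10.5, 10.4, np.nan, 9.8, 9.7],
})

# Count the missing values per column before cleaning
print(df_toy.isna().sum())

# Option 1 (assumed): forward-fill, i.e. copy the last valid value downward
df_filled = df_toy.ffill()

# Option 2: drop every row that contains a missing value
df_dropped = df_toy.dropna()

print(df_filled)
print(df_dropped)
```

Forward fill keeps every row but cannot fill a missing value in the very first row, while dropna shrinks the dataset. Which one suits the real data is a judgment call.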
Step 5: Train Our Model
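The training code for this step is not shown, so what follows is only a sketch of how the imported train_test_split and LinearRegression are typically combined. It runs on synthetic data rather than the real dataset. The 25% test split, the random seeds, and the names df_demo and regr are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned df_binary (not the real dataset)
rng = np.random.default_rng(42)
sal = rng.uniform(33.0, 35.0, size=400)
temp = 150.0 - 4.0 * sal + rng.normal(0.0, 0.5, size=400)
df_demo = pd.DataFrame({"Sal": sal, "Temp": temp})

# scikit-learn expects a 2-D feature matrix, hence the double brackets
X = df_demo[["Sal"]].to_numpy()
y = df_demo["Temp"].to_numpy()

# Hold out 25% of the rows for testing (the split ratio is an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the linear model on the training rows only
regr = LinearRegression()
regr.fit(X_train, y_train)

print("slope:", regr.coef_[0])
print("intercept:", regr.intercept_)
```

With the real data, df_demo would be replaced by the cleaned df_binary. The rest of the sketch only relies on the Sal and Temp column names.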

Step 6: Examine our results
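The evaluation code for this step is also missing, so here is a hedged sketch of one way to examine a fitted model. It uses synthetic, deliberately curved data (an assumption, not the real dataset) to show what a low score looks like when a straight line is fitted to a non-linear relationship. The score method of LinearRegression returns the coefficient of determination R^2.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data with a deliberately curved relationship (not the real dataset)
rng = np.random.default_rng(1)
X = rng.uniform(-3.0, 3.0, size=(500, 1))
y = X[:, 0] ** 2 + rng.normal(0.0, 0.3, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
regr = LinearRegression().fit(X_train, y_train)

# score() returns the coefficient of determination R^2 on the given data
r2 = regr.score(X_test, y_test)

# Predictions on the held-out rows, and their mean squared error
y_pred = regr.predict(X_test)
mse = mean_squared_error(y_test, y_pred)

print(f"R^2 on the test set: {r2:.3f}")
print(f"Mean squared error:  {mse:.3f}")
```

An R^2 close to 1 means the line explains most of the variation in the target. A value near 0, or below it, means the line does little better than predicting the mean.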

The low accuracy score of our model indicates that the regression model did not fit the existing data very well. This suggests that the data is not well suited to linear regression. Sometimes, however, a dataset can accommodate a linear regressor if we consider only part of it. Let's check.
Step 7: Working with a smaller dataset
df_binary500 = df_binary[:500]  # Select the first 500 rows of data
sns.lmplot(x="Sal", y="Temp", data=df_binary500, order=2, ci=None)
We can already see that the first 500 rows follow a linear model. Continue with the same steps as before.
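Repeating "the same steps" on the subset can be sketched as a small helper that cleans, splits, fits, and scores in one call. This is only an illustration on synthetic data. The helper name fit_and_score, its parameters, and the choice to drop rows with missing values (rather than fill them) are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def fit_and_score(df, feature="Sal", target="Temp", test_size=0.25, seed=0):
    """Clean df, fit a one-feature linear regression, return (model, test R^2)."""
    clean = df[[feature, target]].dropna()
    X = clean[[feature]].to_numpy()
    y = clean[target].to_numpy()
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    return model, model.score(X_test, y_test)


# Synthetic stand-in for df_binary: any DataFrame with "Sal" and "Temp" columns works
rng = np.random.default_rng(7)
sal = rng.uniform(33.0, 35.0, size=2000)
temp = 150.0 - 4.0 * sal + rng.normal(0.0, 0.5, size=2000)
df_demo = pd.DataFrame({"Sal": sal, "Temp": temp})

df_demo500 = df_demo[:500]  # first 500 rows, mirroring df_binary500
model500, r2_500 = fit_and_score(df_demo500)
print(f"R^2 on the held-out part of the first 500 rows: {r2_500:.3f}")
```

Calling the same helper on the full frame and on the 500-row subset gives two scores that can be compared directly.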

