
Python | Linear regression using sklearn


Linear regression is a supervised machine learning algorithm that performs a regression task: it models a target prediction value based on one or more explanatory (independent) variables. It is mainly used to find out the relationship between variables and for forecasting. Regression models differ in the kind of relationship between the dependent and explanatory variables that they consider, and in the number of explanatory variables they use.
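For the simple case used in this article (one explanatory variable x and one target y), the model and its ordinary least-squares estimates can be written as below. These are the standard textbook formulas, added for reference; ε is the error term and the bars denote sample means.

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n \\
\hat{\beta}_1 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \\
\hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x}
\end{aligned}
```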

This article demonstrates how to use various Python libraries to implement linear regression on a given dataset. We will build a simple linear model with a single explanatory variable and a single target, since that is the easiest case to visualize.

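In this demo, the model is scikit-learn's LinearRegression, which fits the coefficients by ordinary least squares, solving the least-squares problem directly rather than iterating with gradient descent.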
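One way to see that LinearRegression computes the ordinary least-squares solution is to compare its coefficients with the closed-form estimates computed by hand. The sketch below uses synthetic data; the slope 2.0, intercept -1.0 and noise level are arbitrary assumptions, not values from the dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic data: y = -1 + 2x + noise (arbitrary illustrative values)
x = rng.uniform(0, 10, size=200)
y = -1.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)

# Closed-form ordinary least-squares estimates for one explanatory variable
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# scikit-learn expects a 2-D feature matrix, hence reshape(-1, 1)
model = LinearRegression().fit(x.reshape(-1, 1), y)

print(f"closed form:  slope={slope:.4f}, intercept={intercept:.4f}")
print(f"scikit-learn: slope={model.coef_[0]:.4f}, intercept={model.intercept_:.4f}")
```

Both lines print the same slope and intercept (up to floating-point rounding), close to the values used to generate the data.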

Step 1: Import the required libraries

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

Step 2: Read the dataset

You can download the dataset here.

# Change the working directory to the folder that contains the dataset
# (in a Jupyter notebook, for example: cd C:\Users\Dev\Desktop\Kaggle\Salinity)
df = pd.read_csv('bottle.csv')

# Take only the two selected attributes from the dataset;
# .copy() makes df_binary an independent frame, so later in-place edits
# do not trigger pandas' SettingWithCopyWarning
df_binary = df[['Salnty', 'T_degC']].copy()

# Rename the columns for easier coding
df_binary.columns = ['Sal', 'Temp']

# Display the first rows along with the column names
df_binary.head()

Step 3: Examine the data scatter

# Plot the data scatter; order=2 overlays a second-order (quadratic) regression fit
sns.lmplot(x="Sal", y="Temp", data=df_binary, order=2, ci=None)
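The scatter plot gives a visual impression of how linear the relationship is. A single number that summarizes it is the Pearson correlation coefficient, which pandas computes with Series.corr. The sketch below uses a small made-up DataFrame with the same column names as df_binary (the values are illustrative only); on the real data the equivalent call would be df_binary['Sal'].corr(df_binary['Temp']).

```python
import pandas as pd

# Made-up values with the same column names as df_binary (illustration only)
df_demo = pd.DataFrame({
    "Sal": [33.4, 33.5, 33.6, 33.8, 34.0, 34.2],
    "Temp": [14.2, 13.1, 12.5, 10.9, 9.4, 8.1],
})

# Pearson correlation: values near +1 or -1 indicate a strong linear
# relationship, values near 0 indicate little linear relationship
r = df_demo["Sal"].corr(df_demo["Temp"])
print(f"Pearson r = {r:.3f}")
```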

Step 4: Clean up the data

# Eliminate NaNs (missing values) by forward-filling from the previous row
df_binary.ffill(inplace=True)
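Forward filling copies the last valid value downward, so it cannot fill a NaN that appears before any valid value in a column. The sketch below, on a small made-up frame, shows why a dropna() call is still needed after ffill().

```python
import numpy as np
import pandas as pd

# Made-up frame: row 0 has no earlier value to copy from
df_demo = pd.DataFrame({
    "Sal": [np.nan, 33.5, np.nan, 33.8],
    "Temp": [np.nan, 13.1, 12.5, np.nan],
})

# NaNs at index 2 (Sal) and 3 (Temp) are filled from the row above;
# row 0 stays NaN because nothing precedes it
df_demo = df_demo.ffill()
print(df_demo)

# Drop the remaining incomplete row
df_demo = df_demo.dropna()
print(df_demo)
```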

Step 5: Train Our Model

# Delete any rows that still contain NaN values before building the arrays
df_binary.dropna(inplace=True)

# Separate the data into independent (X) and dependent (y) variables.
# Each selection is a single column, so convert it to a NumPy array
# and reshape it into a 2-D column, the shape scikit-learn expects.
X = np.array(df_binary['Sal']).reshape(-1, 1)
y = np.array(df_binary['Temp']).reshape(-1, 1)

# Divide the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

regr = LinearRegression()
regr.fit(X_train, y_train)

# score() returns the coefficient of determination (R^2) on the test set
print(regr.score(X_test, y_test))
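regr.score returns the coefficient of determination R², defined as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below, on synthetic data with arbitrary parameters, computes it by hand and checks it against sklearn.metrics.r2_score. It also passes random_state to train_test_split, an optional addition not in the original code, which makes the split (and therefore the score) reproducible between runs.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Synthetic feature, already 2-D as scikit-learn expects
X = rng.uniform(30, 35, size=(300, 1))
y = 5.0 - 0.8 * X[:, 0] + rng.normal(scale=0.3, size=300)

# random_state fixes the shuffle so repeated runs give the same split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

regr = LinearRegression().fit(X_train, y_train)
y_pred = regr.predict(X_test)

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_test, y_pred), regr.score(X_test, y_test))
```

All three printed values agree, which confirms that score() reports R² rather than a classification-style accuracy.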

Step 6: Examine our results

y_pred = regr.predict(X_test)

# Scatter the test data (blue) and draw the predicted values as a line (black)
plt.scatter(X_test, y_test, color='b')
plt.plot(X_test, y_pred, color='k')
plt.show()
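Because the fitted model is a straight line, the black line in the plot is fully described by regr.intercept_ and regr.coef_. The sketch below uses synthetic data with a 2-D target, as produced by the article's reshape(-1, 1) calls, and shows that predict() simply evaluates intercept + slope * x.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.uniform(0, 5, size=(50, 1))
# 2-D target of shape (50, 1), like the article's y
y = (3.0 + 1.5 * X[:, 0] + rng.normal(scale=0.2, size=50)).reshape(-1, 1)

regr = LinearRegression().fit(X, y)

# With a 2-D y, coef_ has shape (1, 1) and intercept_ has shape (1,)
slope = regr.coef_[0, 0]
intercept = regr.intercept_[0]
print(f"fitted line: y = {intercept:.3f} + {slope:.3f} * x")

# Predictions are just the line evaluated at each x
manual = intercept + slope * X
print(np.allclose(manual, regr.predict(X)))
```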

The low score (R²) indicates that our regression model does not fit the data very well, which suggests that this data is not well suited to linear regression. Sometimes, however, part of a dataset follows a linear trend even when the whole does not. Let's check that.

Step 7: Working with a smaller dataset

# Select the first 500 rows of data (as an independent copy)
df_binary500 = df_binary.iloc[:500].copy()

sns.lmplot(x="Sal", y="Temp", data=df_binary500,
           order=2, ci=None)

We can already see that the first 500 rows follow a linear model. Continue with the same steps as before.

# Fill and then drop missing values before building the arrays
df_binary500.ffill(inplace=True)
df_binary500.dropna(inplace=True)

X = np.array(df_binary500['Sal']).reshape(-1, 1)
y = np.array(df_binary500['Temp']).reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

regr = LinearRegression()
regr.fit(X_train, y_train)
print(regr.score(X_test, y_test))

y_pred = regr.predict(X_test)

plt.scatter(X_test, y_test, color='b')
plt.plot(X_test, y_pred, color='k')
plt.show()
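Step 7 repeats the code of Steps 4 to 6 almost line for line. One way to avoid the duplication is to wrap it in a function. The helper below, fit_linear_subset, is a hypothetical name introduced for this sketch and is not part of the original article; it assumes a DataFrame with the 'Sal' and 'Temp' columns created in Step 2 and returns the test R² together with the fitted model.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def fit_linear_subset(df, n_rows=None, test_size=0.25, random_state=None):
    """Hypothetical helper: clean, split and fit a Sal -> Temp linear model.

    If n_rows is given, only the first n_rows rows of df are used.
    Returns (test R^2, fitted LinearRegression).
    """
    data = df if n_rows is None else df.iloc[:n_rows]
    # Same cleaning as the article: forward-fill, then drop leftover NaNs
    data = data[["Sal", "Temp"]].ffill().dropna()

    # 2-D column arrays, the shape scikit-learn expects
    X = data["Sal"].to_numpy().reshape(-1, 1)
    y = data["Temp"].to_numpy().reshape(-1, 1)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )

    regr = LinearRegression().fit(X_train, y_train)
    return regr.score(X_test, y_test), regr
```

With df_binary from Step 2, fit_linear_subset(df_binary) and fit_linear_subset(df_binary, n_rows=500) would run the two experiments from this article. The helper works on copies of the data, so the original frame is left unchanged.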
