Python | Polynomial Regression Implementation

Why Polynomial Regression:

  • Some relationships are hypothesized by the researcher to be curvilinear. Such cases clearly call for a polynomial term.
  • Inspection of residuals. If we try to fit a linear model to curved data, a scatter plot of the residuals (y-axis) against the predictor (x-axis) will show a patch of many positive residuals in the middle. A linear model is therefore not appropriate in such a situation.
  • Conventional multiple linear regression analysis assumes that all explanatory variables are independent. This assumption is not fulfilled in a polynomial regression model, because the terms x, x^2, ..., x^n are all functions of the same variable.
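
The residual check described in the list above can be sketched with NumPy alone. The data here is synthetic and noise-free (a hump-shaped quadratic chosen purely for illustration), and the split into "middle" and "ends" of the x range is an arbitrary choice for this sketch.

```python
import numpy as np

# Synthetic, noise-free curved data (an assumption for illustration):
# a hump-shaped quadratic that peaks at x = 5.
x = np.linspace(0.0, 10.0, 21)
y = -(x - 5.0) ** 2

# Fit a straight line by least squares and compute the residuals.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# Split the residuals into the middle and the two ends of the x range.
middle = (x > 3.0) & (x < 7.0)
print("mean residual, middle:", residuals[middle].mean())
print("mean residual, ends:  ", residuals[~middle].mean())
```

The residuals in the middle are all positive and those at the ends are negative on average, which is the systematic pattern that signals a missing polynomial term.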

Uses of polynomial regression:
It is mainly used to define or describe non-linear phenomena such as:

  • The rate of tissue growth.
  • The progression of disease epidemics.
  • The distribution of carbon isotopes in lake sediments.

The main purpose of regression analysis is to model the expected value of the dependent variable y in terms of the value of the independent variable x. In simple linear regression, we use the following equation:

  y  = a + bx + e 

Here y is the dependent variable, a is the y-intercept, b is the slope, and e is the error term.

In many cases, this linear model will not work. For example, if we analyze the yield of a chemical synthesis in terms of the temperature at which the synthesis takes place, a quadratic model is more suitable:

  y  = a + b1x + b2x ^ 2 + e 

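Here y is the dependent variable, x is the independent variable, a is the y-intercept, b1 and b2 are the coefficients of the linear and quadratic terms, and e is the error term.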

In general, we can model it as a polynomial of degree n:

  y  = a + b1x + b2x ^ 2 + ... + bnx ^ n + e 

Because the regression function is linear in the unknown coefficients a, b1, ..., bn, these models are linear from the point of view of estimation.

We can therefore use the least squares method to estimate the coefficients and then compute the response value y.
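
As a minimal sketch of that least squares step, the quadratic model can be written as a linear system whose columns are 1, x and x^2. The data below is synthetic (generated from assumed coefficients a = 1, b1 = 2, b2 = 3 with no noise), so the recovered values can be checked by eye.

```python
import numpy as np

# Synthetic data generated from a known quadratic (assumed for illustration):
# y = 1 + 2x + 3x^2, with no noise so the fit can be checked exactly.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x + 3.0 * x ** 2

# Design matrix with columns [1, x, x^2]; the model is linear in (a, b1, b2).
A = np.vander(x, N=3, increasing=True)

# Ordinary least squares: minimize ||A @ coef - y||^2.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
a, b1, b2 = coef
print(a, b1, b2)  # approximately 1.0, 2.0, 3.0
```

The polynomial terms only change the columns of the design matrix; the estimation itself is the same linear least squares problem as in simple regression.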

Polynomial Regression in Python:
The dataset used for this polynomial regression analysis is read from a CSV file named data.csv.

Step 1: Importing libraries and datasets
Import the libraries and the dataset that we will use to perform polynomial regression.

# Importing libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
datas = pd.read_csv('data.csv')


Step 2: Split the dataset into 2 components

Split the dataset into two components, X and y. X will contain the column at index 1 (selected with the slice 1:2), and y will contain the column at index 2.

X = datas.iloc[:, 1:2].values
y = datas.iloc[:, 2].values
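
The reason X is taken with a slice while y is taken with a single index can be seen on a small stand-in frame. The frame below is hypothetical: the "id" column and all values are made up, and only the Temperature and Pressure names are borrowed from the plot labels used later.

```python
import pandas as pd

# Hypothetical three-column frame standing in for data.csv; the "id"
# column and all values are made up for illustration.
demo = pd.DataFrame({
    "id": [0, 1, 2],
    "Temperature": [0.0, 20.0, 40.0],
    "Pressure": [0.0002, 0.0012, 0.0060],
})

X_demo = demo.iloc[:, 1:2].values  # slice -> 2-D array, shape (3, 1)
y_demo = demo.iloc[:, 2].values    # single index -> 1-D array, shape (3,)

print(X_demo.shape, y_demo.shape)
```

scikit-learn estimators expect the feature matrix to be two-dimensional (samples by features), which is why the slice 1:2 is used for X even though it selects a single column.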

Step 3: Fitting the linear regression to the dataset

Fit a linear regression model to the two components X and y.

# Fit linear regression to the dataset
from sklearn.linear_model import LinearRegression

lin = LinearRegression()
lin.fit(X, y)
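
To connect the fitted model back to the equation y = a + bx, here is a small sketch on synthetic data. The points lie on an assumed line with a = 3 and b = 2, and the name line_model is used only for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic points on the exact line y = 3 + 2x (assumed for illustration).
X_line = np.array([[0.0], [1.0], [2.0], [3.0]])
y_line = 3.0 + 2.0 * X_line.ravel()

line_model = LinearRegression()
line_model.fit(X_line, y_line)

print(line_model.intercept_)  # a, close to 3.0
print(line_model.coef_)       # [b], close to [2.0]
```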

Step 4: Fitting a polynomial regression to a dataset

Fit a polynomial regression model to the two components X and y.

# Fit polynomial regression to the dataset
from sklearn.preprocessing import PolynomialFeatures

# Expand X into the columns [1, x, x^2, x^3, x^4]
poly = PolynomialFeatures(degree=4)
X_poly = poly.fit_transform(X)

# Fit an ordinary linear model on the expanded features
lin2 = LinearRegression()
lin2.fit(X_poly, y)
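
To make the feature expansion concrete, the sketch below applies PolynomialFeatures with degree 4 to two made-up input values and prints the resulting columns.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two made-up sample values, shaped (n_samples, 1) as scikit-learn expects.
X_small = np.array([[2.0], [3.0]])

expanded = PolynomialFeatures(degree=4).fit_transform(X_small)
print(expanded)
# Each row is [1, x, x^2, x^3, x^4]:
# 2.0 -> [1, 2, 4, 8, 16]
# 3.0 -> [1, 3, 9, 27, 81]
```

The leading column of ones comes from the default include_bias=True. LinearRegression fits its own intercept by default, so that column is redundant here but harmless.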

Step 5: Visualizing the linear regression results using a scatter plot.

# Visualize linear regression results
plt.scatter(X, y, color='blue')
plt.plot(X, lin.predict(X), color='red')
plt.title('Linear Regression')
plt.xlabel('Temperature')
plt.ylabel('Pressure')
plt.show()

Step 6: Render the polynomial regression results using a scatter plot.

# Visualize polynomial regression results
plt.scatter(X, y, color='blue')
plt.plot(X, lin2.predict(poly.transform(X)), color='red')
plt.title('Polynomial Regression')
plt.xlabel('Temperature')
plt.ylabel('Pressure')
plt.show()

Step 7: Predicting a new result with linear and polynomial regression.

# Predict a new result using linear regression
# (scikit-learn expects a 2-D array of shape (n_samples, n_features))
lin.predict([[110.0]])

# Predict a new result using polynomial regression
lin2.predict(poly.transform([[110.0]]))
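
As an alternative to calling the transformer and the model separately, the two stages can be chained with scikit-learn's make_pipeline so that prediction is a single call. This is a sketch on synthetic data, not the dataset above: the values follow an assumed exact quadratic, and degree 2 is chosen here only so the result is easy to verify by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for the temperature/pressure data (assumed values):
# pressure follows an exact quadratic, 0.001 * temperature^2.
X_syn = np.linspace(0.0, 100.0, 11).reshape(-1, 1)
y_syn = 0.001 * X_syn.ravel() ** 2

# The pipeline applies the polynomial expansion, then the linear fit.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_syn, y_syn)

# New inputs must be 2-D: one row per sample, one column per feature.
prediction = model.predict([[110.0]])
print(prediction)  # close to 12.1, since 0.001 * 110^2 = 12.1
```

Note that 110 lies outside the range of the training inputs, so this is an extrapolation. It is exact here only because the synthetic data is itself a quadratic.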

Benefits of using polynomial regression:

  • A wide range of functions can be fit with it.
  • Polynomials can fit a wide range of curvature.
  • A polynomial can provide a good approximation of the relationship between the dependent and independent variables.

Disadvantages of using polynomial regression:

  • Polynomial models are very sensitive to outliers.
  • The presence of one or two outliers in the data can seriously affect the results of a nonlinear analysis.
  • Unfortunately, there are also fewer model validation tools for detecting outliers in nonlinear regression than in linear regression.
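
The sensitivity to outliers can be illustrated with a deliberately small sketch. The five points below are synthetic and lie exactly on an assumed quadratic, and a single value is then corrupted. With so few points the effect is exaggerated, but it shows how one observation can move every coefficient.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Five synthetic points on the exact quadratic y = 1 + 2x + 3x^2
# (assumed for illustration).
x_pts = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y_clean = 1.0 + 2.0 * x_pts + 3.0 * x_pts ** 2

# Corrupt a single observation: add 35 to the last point.
y_outlier = y_clean.copy()
y_outlier[-1] += 35.0

# Least-squares quadratic fits; coefficients are ordered [a, b1, b2].
coef_clean = P.polyfit(x_pts, y_clean, 2)
coef_outlier = P.polyfit(x_pts, y_outlier, 2)

print(coef_clean)    # approximately [ 1.  2.  3.]
print(coef_outlier)  # approximately [-2.  9.  8.]
```

A single bad value changes the intercept, the slope and the curvature at once, so the fitted curve no longer passes through any of the four uncorrupted points.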