ML | Implementing L1 and L2 regularization using Sklearn



This article implements L2 and L1 regularization for linear regression using the Ridge and Lasso classes of Python's Sklearn library.
Dataset: House Prices dataset.
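For reference, the objectives that scikit-learn documents for these two estimators can be written as below. This is a sketch of the notation only: $X$ is the feature matrix, $y$ the target, $w$ the coefficient vector, $n$ the number of samples, and $\alpha$ the regularization strength passed as the `alpha` argument (the intercept is left out for brevity).

```latex
% Ridge (L2 penalty)
\min_{w} \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2

% Lasso (L1 penalty)
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
```

With $\alpha = 0$ both reduce to ordinary least squares; larger values of $\alpha$ shrink the coefficients more strongly.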

Step 1: Import required libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split, cross_val_score
from statistics import mean

Step 2: Loading and cleaning data

# Change to the directory that holds the data
# (IPython/Jupyter shell command; adjust the path for your machine)
cd C:\Users\Dev\Desktop\Kaggle\House Prices

 
# Load the data into a pandas DataFrame
data = pd.read_csv('kc_house_data.csv')

# Discard variables that carry no numerical meaning
dropColumns = ['id', 'date', 'zipcode']
data = data.drop(dropColumns, axis=1)

# Separate the dependent and independent variables
y = data['price']
X = data.drop('price', axis=1)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
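The split above is random, so the scores in the following steps will vary from run to run. As a minimal sketch of one way to make the split repeatable, the example below passes `random_state` to `train_test_split`. The synthetic arrays `X_demo` and `y_demo` and the seed value 42 are illustrative assumptions, not part of the original walkthrough.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the features and target (illustrative only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = rng.normal(size=100)

# test_size=0.25 holds out a quarter of the rows;
# random_state fixes the shuffle so the split is the same on every run
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42
)
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42
)

print(X_tr.shape, X_te.shape)
print(np.array_equal(X_te, X_te2))
```

Two calls with the same seed return identical partitions, which makes the model scores comparable between runs.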

Step 3: Build and evaluate different models

a) Linear regression:

# Build and fit a linear regression model
linearModel = LinearRegression()
linearModel.fit(X_train, y_train)

# Evaluate the linear regression model
print(linearModel.score(X_test, y_test))
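The value printed by `score` for a scikit-learn regressor is the coefficient of determination (R²) on the data passed in. The sketch below checks this against `r2_score` on synthetic data; the arrays, the coefficients, and the 150/50 split are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic linear data with a little noise (illustrative only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = X_demo @ np.array([3.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)

# Fit on the first 150 rows, evaluate on the remaining 50
model = LinearRegression().fit(X_demo[:150], y_demo[:150])
score = model.score(X_demo[150:], y_demo[150:])
r2 = r2_score(y_demo[150:], model.predict(X_demo[150:]))

# Both numbers are the same R^2 value
print(score, r2)
```

A score close to 1 means the model explains most of the variance in the target; a score can be negative when the model does worse than predicting the mean.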

b) Ridge (L2) regression:

# List to hold the cross-validation scores
cross_val_scores_ridge = []

# List to hold the alpha values tried
alpha = []

# Loop to compute the cross-validation score for each alpha value
for i in range(1, 9):
    ridgeModel = Ridge(alpha=i * 0.25)
    ridgeModel.fit(X_train, y_train)
    # cross_val_score refits a clone of the model on each of the 10 folds
    scores = cross_val_score(ridgeModel, X, y, cv=10)
    avg_cross_val_score = mean(scores) * 100
    cross_val_scores_ridge.append(avg_cross_val_score)
    alpha.append(i * 0.25)

# Loop to print the cross-validation score for each alpha value
for i in range(0, len(alpha)):
    print(str(alpha[i]) + ':' + str(cross_val_scores_ridge[i]))

From the above output, we can conclude that the best alpha value for the data is 2.
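Reading the best value off the printed list works, but the choice can also be made in code. The following is a minimal sketch, assuming the same grid of alpha values (0.25 to 2.0) and 10-fold cross-validation; the synthetic data from `make_regression` stands in for the house-price data, so the alpha it selects says nothing about the real dataset.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic regression problem (illustrative only)
X_demo, y_demo = make_regression(
    n_samples=200, n_features=10, noise=15.0, random_state=0
)

# Same grid as in the loop above: 0.25, 0.5, ..., 2.0
alphas = [i * 0.25 for i in range(1, 9)]

# Mean 10-fold R^2 (as a percentage) for each alpha
cv_scores = [
    cross_val_score(Ridge(alpha=a), X_demo, y_demo, cv=10).mean() * 100
    for a in alphas
]

# Pick the alpha whose mean cross-validation score is highest
best_alpha, best_score = max(zip(alphas, cv_scores), key=lambda pair: pair[1])
print(best_alpha, best_score)
```

When the best value sits at the edge of the grid, as alpha = 2 does here, it may be worth extending the grid to check whether larger values score higher still.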

# Build and fit the Ridge regression model
ridgeModelChosen = Ridge(alpha=2)
ridgeModelChosen.fit(X_train, y_train)

# Evaluate the Ridge regression model
print(ridgeModelChosen.score(X_test, y_test))

c) Lasso (L1) regression:

# List to hold the cross-validation scores
cross_val_scores_lasso = []

# List to hold the lambda values tried
Lambda = []

# Loop to compute the cross-validation score for each lambda value
for i in range(1, 9):
    lassoModel = Lasso(alpha=i * 0.25, tol=0.0925)
    lassoModel.fit(X_train, y_train)
    # cross_val_score refits a clone of the model on each of the 10 folds
    scores = cross_val_score(lassoModel, X, y, cv=10)
    avg_cross_val_score = mean(scores) * 100
    cross_val_scores_lasso.append(avg_cross_val_score)
    Lambda.append(i * 0.25)

# Loop to print the cross-validation score for each lambda value
for i in range(0, len(Lambda)):
    print(str(Lambda[i]) + ':' + str(cross_val_scores_lasso[i]))

From the above output, we can conclude that the best lambda value is 2.

# Build and fit the Lasso regression model
lassoModelChosen = Lasso(alpha=2, tol=0.0925)
lassoModelChosen.fit(X_train, y_train)

# Evaluate the Lasso regression model
print(lassoModelChosen.score(X_test, y_test))
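One practical difference between the two penalties is that L1 can set coefficients exactly to zero, while L2 only shrinks them. The sketch below illustrates this on synthetic data in which only 5 of 20 features are informative; the dataset and its parameters are assumptions chosen for the illustration, not the house-price data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 20 features, only 5 of which influence the target
X_demo, y_demo = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=10.0, random_state=0
)

# Fit both models with the same regularization strength
ridge = Ridge(alpha=2).fit(X_demo, y_demo)
lasso = Lasso(alpha=2).fit(X_demo, y_demo)

# Count coefficients that are exactly zero in each model
n_zero_ridge = int(np.sum(ridge.coef_ == 0))
n_zero_lasso = int(np.sum(lasso.coef_ == 0))
print(n_zero_ridge, n_zero_lasso)
```

Inspecting `lassoModelChosen.coef_` in the same way shows which house-price features, if any, the Lasso model has dropped.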

Step 4: Compare and visualize results

# Build two lists for plotting
models = ['Linear Regression', 'Ridge Regression', 'Lasso Regression']
scores = [linearModel.score(X_test, y_test),
          ridgeModelChosen.score(X_test, y_test),
          lassoModelChosen.score(X_test, y_test)]

# Create a dictionary for comparing scores
mapping = {}
mapping['Linear Regression'] = linearModel.score(X_test, y_test)
mapping['Ridge Regression'] = ridgeModelChosen.score(X_test, y_test)
mapping['Lasso Regression'] = lassoModelChosen.score(X_test, y_test)

# Print the scores for the different models
for key, val in mapping.items():
    print(str(key) + ': ' + str(val))

# Plot the results
plt.bar(models, scores)
plt.xlabel('Regression Models')
plt.ylabel('Score')
plt.show()