
Setting SVM Hyperparameter Using GridSearchCV | ML

SVM also has hyperparameters (for example, which C or gamma values to use), and finding the optimal hyperparameters in advance is very difficult. But you can find them by trying all the combinations and seeing which parameters work best. The basic idea is to create a grid of hyperparameter values and simply try every combination of them (hence this method is called grid search). Don't worry, we don't have to do it manually, because Scikit-learn has this functionality built in with GridSearchCV.

GridSearchCV takes a dictionary describing the parameters to try when training the model. The parameter grid is defined as a dictionary in which the keys are parameter names and the values are lists of candidate settings to test.
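For example, a minimal grid for an SVC might look like this (the values below are purely illustrative, not a recommendation):

```python
# A hypothetical parameter grid: keys are SVC hyperparameter names,
# values are the candidate settings GridSearchCV will try.
param_grid = {
    "C": [0.1, 1, 10],           # regularization strength candidates
    "gamma": [1, 0.01, 0.0001],  # RBF kernel coefficient candidates
}

# Grid search evaluates every combination: 3 * 3 = 9 candidates here.
n_candidates = len(param_grid["C"]) * len(param_grid["gamma"])
print(n_candidates)  # 9
```

The number of fits grows multiplicatively with each parameter added, so grids are usually kept small.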

This article shows how to use GridSearchCV to find the optimal hyperparameters and thereby improve accuracy and prediction results.

Import the required libraries and get the data —

We will use the built-in breast cancer dataset from Scikit-learn. We can load it with the load_breast_cancer function:

import pandas as pd
import numpy as np

from sklearn.metrics import classification_report, confusion_matrix
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC

cancer = load_breast_cancer()

# The dataset is represented as a dictionary:
print(cancer.keys())

dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])

Now we will extract all features into a new dataframe and our target into a separate dataframe.

df_feat = pd.DataFrame(cancer['data'],
                       columns=cancer['feature_names'])

# the Cancer column is our target
df_target = pd.DataFrame(cancer['target'],
                         columns=['Cancer'])

print("Feature Variables:")
print(df_feat.info())
print("Dataframe looks like:")
print(df_feat.head())

Train Test Split

Now we will split our data into training and test sets in a 70:30 ratio.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    df_feat, np.ravel(df_target),
    test_size=0.30, random_state=101)

Train the support vector classifier without tuning the hyperparameters —

First, we will train our model by calling the standard SVC() constructor without setting any hyperparameters, and look at its classification report and confusion matrix.

# train the model on the training set
model = SVC()
model.fit(X_train, y_train)

# print prediction results
predictions = model.predict(X_test)
print(classification_report(y_test, predictions))

We got 61% accuracy, but did you notice something strange?
Note that the recall and precision for class 0 are both 0. This means the classifier is putting everything into a single class, class 1! Our model clearly needs its parameters tuned.
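This kind of collapse onto one class is easy to see in a confusion matrix (which is why confusion_matrix was imported earlier). A small self-contained sketch of the pattern, using toy labels rather than the breast cancer data:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy illustration: the true labels contain both classes, but the
# classifier predicted class 1 for every single sample.
y_true = np.array([0, 0, 1, 1, 1])
y_pred = np.ones(5, dtype=int)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(cm)
# The first column (predictions of class 0) is all zeros:
# [[0 2]
#  [0 3]]
```

An all-zero column means the model never predicts that class, which is exactly what zero precision and recall for class 0 indicate.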

That's where the usefulness of grid search comes into the picture: we can search for the best parameters with GridSearchCV!

Using GridSearchCV

One of the great things about GridSearchCV is that it is a meta-estimator. It takes an estimator like SVC and creates a new estimator that behaves exactly the same, in this case as a classifier. You should add refit=True and choose a value for verbose: the larger the number, the more verbose the output (verbose means text output describing the process).

from sklearn.model_selection import GridSearchCV

# define a grid of parameters
param_grid = {'C': [0.1, 1, 10, 100, 1000],
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
              'kernel': ['rbf']}

grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=3)

# fitting the model for grid search
grid.fit(X_train, y_train)

What fit does here is a little more involved than usual. First, it runs the training loop with cross-validation to find the best combination of parameters. Having found the best combination, it runs fit again on all the data passed to it (without cross-validation) to build a single new model using the best parameter settings.
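The two-step process described above can be sketched by hand. The following is a hypothetical, simplified version of what GridSearchCV does internally (on synthetic data, so it runs standalone), not its actual implementation:

```python
from itertools import product

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# Step 1: score every parameter combination with cross-validation.
best_score, best_params = -np.inf, None
for C, gamma in product([0.1, 1, 10], [1, 0.01]):
    score = cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()
    if score > best_score:
        best_score, best_params = score, {"C": C, "gamma": gamma}

# Step 2: refit the winning combination on all the data (no CV).
final_model = SVC(**best_params).fit(X, y)
print(best_params, round(best_score, 3))
```

This is what refit=True automates: the refitted winner is what grid.predict uses afterwards.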

You can check the best parameters found by GridSearchCV in the best_params_ attribute, and the best estimator in the best_estimator_ attribute:

# display the best parameters after tuning
print(grid.best_params_)

# show what our model looks like after hyperparameter tuning
print(grid.best_estimator_)
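A fitted GridSearchCV also exposes best_score_, the mean cross-validated score of the winning combination. A minimal self-contained sketch on synthetic data (the grid here is illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=120, random_state=0)

grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=3)
grid.fit(X, y)

# Mean cross-validated score of the best combination found:
print(grid.best_score_)
print(grid.best_params_)
```

Note that best_score_ is a cross-validation score on the training data, not the held-out test accuracy, so the two numbers can differ.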

You can then rerun the predictions and view the classification report on this grid object just as you would with a regular model:

grid_predictions = grid.predict(X_test)

# print classification report
print(classification_report(y_test, grid_predictions))

We now get almost 95% accuracy in our predictions.
