Setting SVM Hyperparameters Using GridSearchCV | ML

SVM has hyperparameters (for example, which C or gamma values to use), and finding the optimal ones is difficult. But you can find them simply by trying all the combinations and seeing which parameters work best. The basic idea is to create a grid of hyperparameters and try every combination of them (hence this method is called Grid Search). But don't worry: we don't have to do it manually, because Scikit-learn has this functionality built in with GridSearchCV.

GridSearchCV takes a dictionary that describes the parameters to try on the model. The parameter grid is defined as a dictionary where the keys are the parameter names and the values are lists of settings to test.
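For example, a minimal grid for an SVC might look like the sketch below (the values here are illustrative placeholders; the grid actually used in this article is defined later):

# keys are SVC parameter names, values are lists of settings to try
example_grid = {'C': [0.1, 1, 10],
                'gamma': [1, 0.01]}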

This article shows you how to use GridSearchCV to find the optimal hyperparameters and thereby improve accuracy and prediction results.

Import the required libraries and get the data —

We will use the built-in breast cancer dataset from Scikit-learn. We can load it with the load_breast_cancer function:

import pandas as pd
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC

cancer = load_breast_cancer()

# The dataset is represented as a dictionary:
print(cancer.keys())

dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])

Now we will extract all the features into a new dataframe and the target into a separate dataframe.

df_feat = pd.DataFrame(cancer['data'],
                       columns=cancer['feature_names'])

# the 'Cancer' column is our target
df_target = pd.DataFrame(cancer['target'],
                         columns=['Cancer'])

print("Feature Variables:")
print(df_feat.info())
print("Dataframe looks like:")
print(df_feat.head())

Train Test Split

Now we will split our data into training and test sets in a 70:30 ratio.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    df_feat, np.ravel(df_target),
    test_size=0.30, random_state=101)

Train the support vector classifier without tweaking the hyperparameters —

First, we will train our model by calling the standard SVC() with default hyperparameters, and look at its confusion matrix and classification report.

# train the model on the training set
model = SVC()
model.fit(X_train, y_train)

# print prediction results
predictions = model.predict(X_test)
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))

We got 61% accuracy, but did you notice something strange?
Note that recall and precision for class 0 are both 0. This means the classifier puts everything into a single class, class 1! So our model needs its parameters tuned.

That's where GridSearch comes into the picture: we can search for better parameters with it!

Use GridSearchCV

One of the great things about GridSearchCV is that it is a meta-estimator. It takes an estimator like SVC and creates a new estimator that behaves exactly the same, in this case as a classifier. You should add refit=True and choose a verbose number: the larger the number, the more verbose the output (verbose means text output describing the process).

from sklearn.model_selection import GridSearchCV

# define the parameter grid
param_grid = {'C': [0.1, 1, 10, 100, 1000],
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
              'kernel': ['rbf']}

grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=3)

# fit the model for grid search
grid.fit(X_train, y_train)

What fit does here is a bit more involved than usual. First, it runs the same loop with cross-validation to find the best parameter combination. Having obtained the best combination, it fits again on all the data passed to fit (this time without cross-validation) to build a single new model using the best parameter settings.
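Since the cross-validation scores are stored on the fitted object, you can inspect them directly; a small sketch (best_score_ and cv_results_ are standard GridSearchCV attributes, and 'mean_test_score' is one of the keys cv_results_ exposes):

# mean cross-validated score of the best parameter combination
print(grid.best_score_)

# mean test score for every combination in the grid
print(grid.cv_results_['mean_test_score'])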

You can check the best parameters found by GridSearchCV in the best_params_ attribute, and the best estimator in the best_estimator_ attribute:

# print the best parameters after tuning
print(grid.best_params_)

# print how our model looks after hyperparameter tuning
print(grid.best_estimator_)

Then you can rerun the predictions and view the classification report on this grid object just as you would with a regular model.

grid_predictions = grid.predict(X_test)

# print classification report
print(classification_report(y_test, grid_predictions))

We got almost 95% accuracy on the predictions.
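If you want the exact figure rather than reading it off the report, you can compute it directly; a small sketch using sklearn's accuracy_score with the variables defined above:

from sklearn.metrics import accuracy_score

# fraction of test samples the tuned model classified correctly
print(accuracy_score(y_test, grid_predictions))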