SVM also has some hyperparameters (for example, which C or gamma values to use), and finding the optimal values is difficult. A straightforward way to find them is to try every combination and see which works best. The basic idea is to create a grid of hyperparameter values and evaluate each combination in turn, which is why this method is called grid search. But don’t worry! We don’t have to do it manually, because Scikit-learn has this functionality built into GridSearchCV.
GridSearchCV takes a dictionary that describes the parameters to try when training the model. In this parameter grid, the keys are the parameter names and the values are the lists of settings to test.
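To make the idea of a parameter grid concrete, here is a minimal sketch. The specific C and gamma values are illustrative assumptions, not values given in the text; ParameterGrid is the Scikit-learn helper that enumerates the combinations a grid search would visit.

```python
from sklearn.model_selection import ParameterGrid

# Keys are SVC parameter names; values are the candidate settings to try.
# The values below are assumed for illustration only.
param_grid = {
    "C": [0.1, 1, 10, 100, 1000],
    "gamma": [1, 0.1, 0.01, 0.001, 0.0001],
    "kernel": ["rbf"],
}

# Enumerate every combination in the grid.
combinations = list(ParameterGrid(param_grid))
print(len(combinations))  # 5 * 5 * 1 = 25 combinations
print(combinations[0])    # one candidate setting, as a plain dict
```

Each extra value multiplies the number of fits, so the grid is usually kept coarse at first and refined later.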
This article shows how to use GridSearchCV to search for the best hyperparameters and thereby improve the accuracy of the predictions.
Import the required libraries and get the data
We will use the built-in breast cancer dataset from Scikit-learn. We can get it with the load function and inspect its keys:
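A minimal sketch of this step, assuming the load function meant here is load_breast_cancer from sklearn.datasets (the variable name cancer is also an assumption):

```python
from sklearn.datasets import load_breast_cancer

# Load the bundled breast cancer dataset as a dictionary-like object.
cancer = load_breast_cancer()

# Inspect which fields the dataset object provides.
print(cancer.keys())
```

The exact set of keys can vary between Scikit-learn versions.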
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])
Now we will extract all the features into a new dataframe and the target into a separate dataframe.
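One way this could look with pandas is sketched below. The names df_feat and df_target and the column label "Cancer" are assumptions chosen for illustration.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# Features: one column per measurement, labelled with the dataset's feature names.
df_feat = pd.DataFrame(cancer.data, columns=cancer.feature_names)

# Target: a single column of 0/1 class labels (the column name is assumed).
df_target = pd.DataFrame(cancer.target, columns=["Cancer"])

print(df_feat.shape, df_target.shape)
```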
Train Test Split
Now we will split our data into training and test sets with a 70:30 ratio.
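A sketch of the split using train_test_split. The random_state value is an assumed seed, used only to make the split repeatable. For brevity the data is loaded as plain arrays rather than dataframes.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# test_size=0.30 gives the 70:30 split; random_state=101 is an assumed seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=101
)

print(X_train.shape, X_test.shape)
```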
Train the support vector classifier without tweaking the hyperparameters
First, we will train our model by calling SVC() with its default hyperparameters, and then look at its classification report and confusion matrix.
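A minimal sketch of the baseline model follows; the split seed is an assumption. Recent Scikit-learn releases use gamma="scale" as the SVC default, whereas older releases used gamma="auto". The scores you see may therefore differ from the ones quoted in this article, and passing gamma="auto" should approximate the older behaviour.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=101  # assumed seed
)

# Train an SVC with whatever defaults the installed version provides.
model = SVC()
model.fit(X_train, y_train)

# Evaluate on the held-out test set.
predictions = model.predict(X_test)
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))
```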
We got 61% accuracy, but did you notice something strange?
Note that the recall and precision for class 0 are both 0. This means that the classifier puts everything into a single class, namely class 1! In other words, the model's parameters need tuning.
This is where grid search comes into the picture: we can use it to search for better parameters.
Use GridSearchCV
One of the great things about GridSearchCV is that it is a meta-estimator. It takes an estimator like SVC and creates a new estimator that behaves exactly the same, in this case as a classifier. Set refit=True (which is also the default) and choose whatever number you like for verbose: the larger the number, the more detailed the output (verbose here means text output describing the process).
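The construction and fit could be sketched as follows. The grid values, the verbose level and the split seed are assumptions made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=101  # assumed seed
)

# Candidate settings (assumed values): 5 * 5 * 1 = 25 combinations.
param_grid = {
    "C": [0.1, 1, 10, 100, 1000],
    "gamma": [1, 0.1, 0.01, 0.001, 0.0001],
    "kernel": ["rbf"],
}

# Wrap SVC in the meta-estimator; refit=True retrains the best setting at the end.
grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=1)

# Fit exactly as you would fit a plain classifier.
grid.fit(X_train, y_train)
```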
What fit does here is a little more involved than usual. First, it runs the same loop with cross-validation to find the best combination of parameters. Once it has the best combination, it runs fit again on all the data passed to fit (this time without cross-validation) to build a single new model using the best parameter setting.
You can check the best parameters found by GridSearchCV in the best_params_ attribute, and the best estimator in the best_estimator_ attribute.
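A short sketch of reading these attributes after fitting. To keep it quick it uses a deliberately small grid, and both the grid values and the split seed are assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=101  # assumed seed
)

# Small assumed grid so the example runs quickly.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.001, 0.0001]}
grid = GridSearchCV(SVC(), param_grid, refit=True)
grid.fit(X_train, y_train)

# The winning parameter combination, as a dict.
print(grid.best_params_)

# The SVC refitted on all of X_train with those parameters.
print(grid.best_estimator_)
```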
Then you can rerun the predictions and view the classification report on this grid object, just as you would with a normal model.
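Predicting with the fitted grid object could be sketched like this. It again uses a small assumed grid and an assumed split seed, so the exact scores it prints may not match the figures quoted in this article.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=101  # assumed seed
)

# Small assumed grid so the example runs quickly.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.001, 0.0001]}
grid = GridSearchCV(SVC(), param_grid, refit=True)
grid.fit(X_train, y_train)

# predict() delegates to the refitted best estimator.
grid_predictions = grid.predict(X_test)

print(confusion_matrix(y_test, grid_predictions))
print(classification_report(y_test, grid_predictions))
```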
We got a prediction accuracy of almost 95%.