Supervised learning falls into two categories: classification and regression.
We have seen that supervised learning requires labeled data. How can we obtain it? Labeled data can be gathered in several ways, such as manual annotation, crowdsourcing, or using existing public datasets.
Now it is time to look at algorithms that can be used to solve supervised machine learning problems. In this post, we will use the popular scikit-learn library.
Note: There are other packages as well, such as TensorFlow and Keras, that can be used for supervised learning.
The K-Nearest Neighbors (KNN) algorithm is used to solve classification problems. KNN essentially creates a decision boundary for classifying data; when a new data point arrives, the algorithm predicts its class based on which side of that boundary it falls on, i.e. its nearest neighbors.
Therefore, a larger k value means smoother decision boundaries and a less complex model, whereas a smaller k value tends to overfit the data and leads to a more complex model.
Note: Choosing the right k value when analyzing a dataset is very important to avoid overfitting or underfitting.
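To see this effect concretely, here is a small sketch (using scikit-learn's built-in iris dataset, an assumption for illustration) comparing training and test accuracy for a small and a larger k:

```python
# Compare a small k (flexible, prone to overfitting) with a larger k
# (smoother boundary, simpler model) on the iris dataset.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

irisData = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    irisData.data, irisData.target, test_size=0.2, random_state=42)

for k in (1, 15):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(f"k={k}: train accuracy={knn.score(X_train, y_train):.3f}, "
          f"test accuracy={knn.score(X_test, y_test):.3f}")
```

With k=1 the model effectively memorizes the training set, while a larger k smooths the boundary; a large gap between training and test accuracy is a hint of overfitting.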
Using the k-nearest neighbors algorithm, we fit historical data (that is, train the model) and use it to predict new data points.
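As a minimal sketch of this idea (the iris dataset here is an assumption for illustration), we can train a KNN model on labeled data and predict the class of an unseen point:

```python
# Fit a KNN classifier on labeled (historical) data and predict a new point.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris

irisData = load_iris()
X, y = irisData.data, irisData.target

knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X, y)  # "fit historical data", i.e. train the model

# Predict the class of a new, unseen flower measurement
new_point = [[5.0, 3.4, 1.5, 0.2]]
print(knn.predict(new_point))  # prints [0], the setosa class
```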

The above example takes the following steps: load the dataset, create feature and target arrays, fit a KNN classifier, and predict labels for new data points.
We have seen how we can use the KNN algorithm to solve supervised machine learning problems. But how do you measure the accuracy of a model?
Consider the example below, where we measure the accuracy of the model described above:
```python
# Import required modules
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

# Loading data
irisData = load_iris()

# Create feature and target arrays
X = irisData.data
y = irisData.target

# Split into training and test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)

# Calculate model accuracy
print(knn.score(X_test, y_test))
```
Model accuracy:
So far, so good. But how do you determine the correct k value for a dataset? We need to be familiar with the data to narrow down the range of plausible k values, but to find the best k value we need to test the model for each candidate. Refer to the example shown below.
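A sketch of such a test (assuming the same iris split as in the earlier example, and matplotlib for the graph):

```python
# Sweep k from 1 to 8, recording train and test accuracy for each value.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

irisData = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    irisData.data, irisData.target, test_size=0.2, random_state=42)

neighbors = np.arange(1, 9)
train_accuracy = np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))

for i, k in enumerate(neighbors):
    knn = KNeighborsClassifier(n_neighbors=int(k))
    knn.fit(X_train, y_train)
    train_accuracy[i] = knn.score(X_train, y_train)
    test_accuracy[i] = knn.score(X_test, y_test)

# Plot accuracy against k to spot the best-performing value
plt.plot(neighbors, test_accuracy, label="Testing dataset accuracy")
plt.plot(neighbors, train_accuracy, label="Training dataset accuracy")
plt.xlabel("n_neighbors")
plt.ylabel("Accuracy")
plt.legend()
plt.show()
```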

Output:
In the example shown above, we create a graph to see the k value for which we get high accuracy.
Note: This brute-force search is not the method typically used in industry to select the correct value of n_neighbors. Instead, we tune the hyperparameter to select the value that provides the best performance. We will cover this in future posts.
Summary —
In this post, we learned what supervised learning is and what its categories are. With that basic understanding, we examined the k-nearest neighbors algorithm, which is used to solve supervised machine learning problems, and we also explored how to measure a model's accuracy.