Supervised learning falls into two categories:
- Classification: Here our target variable consists of discrete categories.
- Regression: Here our target variable is continuous, and we usually try to find the line (or curve) of best fit.
We have seen that supervised learning requires labeled data. How can we get labeled data? There are several ways:
- Historically labeled data
- Experiments: we can run experiments, such as A/B testing, to create labeled data.
 Crowdsourcing
Now it is time to look at algorithms that can be used to solve supervised machine learning problems. In this post, we will use the popular scikit-learn package.
Note: There are other packages as well, such as TensorFlow and Keras, that can be used for supervised learning.
K-Nearest Neighbors Algorithm:
This algorithm is used to solve classification problems. The K-Nearest Neighbors (KNN) algorithm essentially creates an imaginary boundary to classify the data. When new data points arrive, the algorithm assigns them to the nearest boundary region.
A larger k value produces smoother decision boundaries and therefore a less complex model, whereas a smaller k value tends to overfit the data and lead to a more complex model.
Note: Choosing the correct k value is very important when analyzing a dataset, to avoid overfitting and underfitting.
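To make the effect of k concrete, here is a minimal sketch (using scikit-learn's iris dataset purely for illustration; the small/large k values are arbitrary choices, not from the original post) comparing a very small and a very large k:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# k=1: very complex boundary; the model memorizes the training set (overfitting risk)
small_k = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
# k=100: very smooth boundary; the model may be too simple (underfitting risk)
large_k = KNeighborsClassifier(n_neighbors=100).fit(X_train, y_train)

print(small_k.score(X_train, y_train))
print(large_k.score(X_train, y_train))
```

With k=1, the nearest neighbor of any training point is the point itself, so training accuracy is (near-)perfect even when the model generalizes poorly; very large k smooths the boundary so much that even training accuracy drops.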
Using the k-nearest neighbors algorithm, we fit the model on historical data (train the model) and predict new values.
Example of the k-nearest neighbors algorithm

The above example takes the following steps:
- The k-nearest neighbors algorithm is imported from the scikit-learn package.
- Create feature and target variables.
- Split the data into training and test sets.
- Create a KNN model using the chosen number of neighbors.
- Train, or fit, the model on the data.
- Predict new values.
We have seen how we can use the KNN algorithm to solve supervised machine learning problems. But how do you measure the accuracy of a model?
Consider the example below, where we measure the accuracy of the above model:
# Import required modules
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

# Load the data
irisData = load_iris()

# Create feature and target arrays
X = irisData.data
y = irisData.target

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)

# Calculate model accuracy
print(knn.score(X_test, y_test))
Model accuracy:
Everything is going fine so far. But how do you determine the correct k value for a dataset? Obviously, we need to be familiar with the data to estimate the range of plausible k values, but to find the best k value we need to test the model for each candidate k. Refer to the example shown below.

Output:
Here, in the example shown above, we create a plot to see which k value gives the highest accuracy.
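The loop over candidate k values can be sketched as follows (a minimal example, assuming the same iris split as above; the range of k values and variable names are illustrative):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

irisData = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    irisData.data, irisData.target, test_size=0.2, random_state=42)

# Candidate k values and arrays to store the accuracy for each
neighbors = np.arange(1, 9)
train_accuracy = np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))

# Fit one model per candidate k and record its accuracy
for i, k in enumerate(neighbors):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    train_accuracy[i] = knn.score(X_train, y_train)
    test_accuracy[i] = knn.score(X_test, y_test)

print(test_accuracy)
```

The two accuracy arrays can then be plotted against `neighbors` (for example with matplotlib) to see which k performs best.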
Note: This is not the method used in industry to select the correct value of n_neighbors. Instead, we tune the hyperparameter to select the value that gives the best performance. We will cover this in future posts.
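As a preview, hyperparameter tuning can be sketched with scikit-learn's GridSearchCV (one common approach; the post itself does not name a specific tool), which cross-validates the model for each candidate value:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Candidate values for the n_neighbors hyperparameter
param_grid = {"n_neighbors": list(range(1, 11))}

# 5-fold cross-validation over every candidate value
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid.fit(X, y)

print(grid.best_params_)
```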
Summary —
In this post, we learned what supervised learning is and what its categories are. With that basic understanding, we explored the k-nearest neighbors algorithm, which is used to solve supervised machine learning problems, and we also saw how to measure a model's accuracy.