Supervised learning falls into two categories: classification and regression.
We have seen that supervised learning needs labeled data. So how can we get labeled data? There are various ways to get it:
Now is the time to understand the algorithms that can be used to solve supervised machine learning problems. In this post, we will be using the popular
Note: There are a few other packages as well, such as TensorFlow and Keras, that can be used for supervised learning.
This algorithm is used to solve classification problems … The K-Nearest Neighbor or K-NN algorithm effectively creates an imaginary boundary for classifying data. When a new data point arrives, the algorithm looks at the k labeled points closest to it and assigns the class that most of those neighbors share, so the point ends up on one side of the boundary.
Therefore, a larger k value means smoother decision boundaries, resulting in a less complex model, whereas a smaller k value tends to overfit the data and leads to a more complex model.
Note. It is very important to choose the right k value when analyzing a dataset, to avoid both overfitting and underfitting.
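To make the voting idea concrete, here is a minimal from-scratch sketch using only the Python standard library. It is not the code of any particular package; the function name `knn_predict`, the Euclidean distance, and the toy points are all assumptions made for illustration.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep only the k closest labeled points.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # The most common label among those neighbors is the prediction.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up toy data: two well-separated groups of 2-D points.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_labels = ["red", "red", "red", "blue", "blue", "blue"]

near_red = knn_predict(train_points, train_labels, (2.0, 2.0))
near_blue = knn_predict(train_points, train_labels, (6.0, 7.0))
print(near_red, near_blue)
```

Nothing is learned ahead of time here: every prediction re-measures the distance to all stored points, which is why the "boundary" is only imaginary.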
Using the k-nearest neighbor algorithm, we fit historical data (or train the model) and predict the future.
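As a rough sketch of what "fit, then predict" means for k-NN, the hypothetical class below mimics a fit/predict interface in plain Python. The class name `TinyKNN`, the `n_neighbors` argument, and the study-hours data are invented for illustration and are not taken from any library.

```python
from collections import Counter
import math


class TinyKNN:
    """Hypothetical minimal k-NN classifier with a fit/predict interface."""

    def __init__(self, n_neighbors=3):
        self.n_neighbors = n_neighbors

    def fit(self, points, labels):
        # k-NN is a "lazy" learner: training only stores the labeled data.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict(self, queries):
        return [self._predict_one(query) for query in queries]

    def _predict_one(self, query):
        # Rank the stored points by Euclidean distance to the query.
        ranked = sorted(
            zip(self.points, self.labels),
            key=lambda pair: math.dist(pair[0], query),
        )
        # Majority vote among the n_neighbors closest points.
        votes = Counter(label for _, label in ranked[: self.n_neighbors])
        return votes.most_common(1)[0][0]


# Made-up historical data: (hours studied, hours slept) -> exam outcome.
history = [(1, 4), (2, 5), (2, 3), (7, 7), (8, 6), (9, 8)]
outcomes = ["fail", "fail", "fail", "pass", "pass", "pass"]

model = TinyKNN(n_neighbors=3).fit(history, outcomes)
predictions = model.predict([(3, 4), (8, 7)])
print(predictions)
```

The split into `fit` and `predict` keeps the historical data and the new, unseen data clearly apart, even though fitting here amounts to remembering the examples.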
The above example takes the following steps:
We have seen how we can use the K-NN algorithm to solve supervised machine learning problems. But how do you measure the accuracy of a model?
Consider the example below, where we measure the performance of the above model:
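One common way to do this is to hold back some labeled points, predict them, and report the fraction predicted correctly. The sketch below assumes that approach and reuses a from-scratch k-NN; the helper names and the toy data are illustrative assumptions, not the output of a real model.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, query, k):
    # Majority vote among the k training points closest to the query.
    ranked = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], query))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def accuracy(true_labels, predicted_labels):
    # Fraction of predictions that match the known labels.
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Made-up training data the model is allowed to see.
train_X = [(1, 1), (2, 1), (1, 2), (6, 6), (7, 6), (6, 7)]
train_y = ["a", "a", "a", "b", "b", "b"]

# Made-up held-out data; the last point is labeled "b" but sits near the "a" group.
test_X = [(2, 2), (7, 7), (1, 3), (3, 3)]
test_y = ["a", "b", "a", "b"]

predicted = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
score = accuracy(test_y, predicted)
print(predicted, score)
```

Scoring on points the model never stored matters: scoring on the training data itself would flatter the model, especially for small k.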
So far, so good. But how do you determine the right k value for a dataset? Familiarity with the data helps narrow down the range of plausible k values, but to find the best one we need to test the model with each candidate k. Refer to the example shown below.
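A minimal sketch of such a sweep, assuming a from-scratch k-NN and a made-up dataset with one deliberately mislabeled point. It collects the accuracies as numbers rather than drawing a graph; the helper names, data, and candidate k values are all illustrative assumptions.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, query, k):
    # Majority vote among the k training points closest to the query.
    ranked = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], query))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def accuracy_for_k(train_X, train_y, test_X, test_y, k):
    # Fraction of held-out points predicted correctly for this k.
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


# Made-up training data: five "a" points, three "b" points,
# and one noisy "b" label at (2, 2) inside the "a" group.
train_X = [(1, 1), (1, 3), (3, 1), (3, 3), (2, 1), (2, 2), (7, 7), (7, 9), (9, 7)]
train_y = ["a", "a", "a", "a", "a", "b", "b", "b", "b"]

# Made-up held-out data.
test_X = [(2.1, 2.1), (1.2, 1.2), (8, 8), (7.5, 7.5)]
test_y = ["a", "a", "b", "b"]

# Try each candidate k and record the resulting accuracy.
results = {k: accuracy_for_k(train_X, train_y, test_X, test_y, k) for k in (1, 3, 5, 7, 9)}
# Pick the k with the highest accuracy; ties go to the smallest k tried.
best_k = max(results, key=results.get)

for k, acc in results.items():
    print(f"k={k}: accuracy={acc:.2f}")
print("best k:", best_k)
```

In this toy setup k=1 follows the noisy point and misclassifies its neighbor, while k=9 uses every training point and always predicts the larger class, so the middle values score best.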
Here, in the example shown above, we create a graph to see the k value for which we get the highest accuracy.
Note. This is not how the value of n_neighbors is usually selected in industry. Instead, hyperparameter tuning is used to select the value that provides the best performance. We will cover this in future posts.
In this post, we understood what supervised learning is and what its categories are. With a basic understanding of supervised learning, we examined the k-nearest neighbor algorithm that is used to solve supervised machine learning problems. We also explored measuring the accuracy of the model.