The Cumulative Accuracy Profile (CAP) is used to evaluate the performance of a classification model. It helps us judge how robust the model is. To visualize this, the graph shows three curves:
- Random plot
- Plot generated using an SVM classifier or a random forest classifier
- Ideal plot (ideal line)
We work through an example dataset to understand the concept.
Code: load dataset.
Data Head:

        User ID  Gender  Age  EstimatedSalary  Purchased
    0  15624510    Male   19            19000          0
    1  15810944    Male   35            20000          0
    2  15668575  Female   26            43000          0
    3  15603246  Female   27            57000          0
    4  15804002    Male   19            76000          0
Code: data input / output.
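As a rough sketch of this step, assuming pandas and using only the five rows shown in the data head (not the full dataset), the input features and the output label could be separated as follows. The choice of `Age` and `EstimatedSalary` as inputs matches the table that follows; treating `Purchased` as the output is an assumption based on the rest of the article.

```python
import pandas as pd

# Only the five rows visible in the data head; the full dataset is larger.
data = pd.DataFrame({
    "User ID": [15624510, 15810944, 15668575, 15603246, 15804002],
    "Gender": ["Male", "Male", "Female", "Female", "Male"],
    "Age": [19, 35, 26, 27, 19],
    "EstimatedSalary": [19000, 20000, 43000, 57000, 76000],
    "Purchased": [0, 0, 0, 0, 0],
})

# Input features: age and estimated salary.
X = data[["Age", "EstimatedSalary"]]

# Output (assumed dependent variable): 1 if the user purchased, else 0.
y = data["Purchased"]
```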
Input:

       Age  EstimatedSalary
    0   19            19000
    1   35            20000
    2   26            43000
    3   27            57000
    4   19            76000
    5   27            58000
    6   27            84000
    7   32           150000
    8   25            33000
    9   35            65000
Code: Split dataset for training and testing.
Code: random forest classifier
Code: Finding the accuracy of the classifier.
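A minimal sketch of the three steps above (split, fit, score), assuming scikit-learn is available. The data here is synthetic stand-in data, and the 25% test split, the number of trees, and the labelling rule are all assumptions made for illustration, not the article's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical): age, salary, and a purchase label.
rng = np.random.default_rng(0)
age = rng.integers(18, 60, size=400)
salary = rng.integers(15000, 150000, size=400)
X = np.column_stack([age, salary])
y = ((age > 40) | (salary > 90000)).astype(int)

# Hold out 25% of the rows for testing (the ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a random forest and predict labels for the held-out rows.
classifier = RandomForestClassifier(n_estimators=100, random_state=0)
classifier.fit(X_train, y_train)
pred = classifier.predict(X_test)

# Fraction of test rows classified correctly.
accuracy = accuracy_score(y_test, pred)
print(f"accuracy: {accuracy:.2f}")
```

The names `y_test` and `pred` are the two arrays the CAP curve is later built from.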
For the random plot, the x-axis covers the number of points in the range from 0 to the total number of data points in the dataset. The y-axis covers the number of points for which the dependent variable in our dataset has a value of 1, from 0 up to their total. The random plot can be understood as a linearly increasing relationship. For example, consider a model that predicts whether each person in a group will purchase a product (a positive result) based on factors such as gender, age, and income. If group members are contacted at random, the cumulative number of products sold grows linearly to a maximum equal to the total number of buyers in the group. This distribution is called the "random" CAP.
Code: random model
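A minimal sketch of the random line, assuming a small hypothetical list of test labels (`y_test` below is invented for illustration and is not the article's data). The helper `expected_found` is also hypothetical; it only restates the linear relationship described above.

```python
# Hypothetical test labels: 1 = purchased, 0 = not purchased.
y_test = [0, 1, 0, 0, 1, 0, 1, 0, 0, 0]

total = len(y_test)      # total number of data points
positives = sum(y_test)  # number of points whose label is 1

# The random CAP is the straight line from (0, 0) to (total, positives).
random_x = [0, total]
random_y = [0, positives]

def expected_found(k):
    """Expected number of buyers found after contacting k people at random."""
    return k * positives / total

print(random_x, random_y)  # [0, 10] [0, 3]
print(expected_found(5))   # 1.5
```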
Random forest classifier line
Code: The random forest classifier is applied to the dataset to plot the classifier line.
Explanation: pred is the prediction made by the random forest classifier. We zip the predicted values with the test values and sort the pairs in reverse order, so that the higher predictions come first and the lower ones follow. We then extract only the y_test values from the sorted pairs and store them in lm. np.cumsum() creates an array in which each element is the sum of the current value and all previous values. The x values are generated with np.arange(0, total + 1). We add one because np.arange() excludes the stop value, and we want the x-axis to run from 0 to the total.
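The steps in this explanation can be sketched as follows, assuming NumPy. The labels and predictions are hypothetical values chosen for illustration, and prepending a 0 so the curve starts at the origin is a choice made in this sketch.

```python
import numpy as np

# Hypothetical test labels and classifier outputs (scores or 0/1 predictions).
y_test = np.array([0, 1, 0, 0, 1, 0, 1, 0, 0, 0])
pred = np.array([0.1, 0.9, 0.2, 0.6, 0.8, 0.3, 0.4, 0.1, 0.2, 0.05])

total = len(y_test)

# Pair each prediction with its true label and sort, highest prediction first.
pairs = sorted(zip(pred, y_test), reverse=True)

# Keep only the true labels, now ordered by the model's ranking.
lm = [label for _, label in pairs]

# Running count of positives found; the leading 0 anchors the curve at (0, 0).
y_values = np.append([0], np.cumsum(lm))

# arange excludes its stop value, hence total + 1 to cover 0..total.
x_values = np.arange(0, total + 1)

print(y_values.tolist())  # [0, 1, 2, 2, 3, 3, 3, 3, 3, 3, 3]
```

Plotting `x_values` against `y_values` gives the classifier's CAP line.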
Then we build the ideal plot (or ideal line). A perfect prediction identifies exactly which group members will buy the product, so the maximum number of sales is reached with the minimum number of calls. The result is a CAP curve that stays flat after reaching the maximum (contacting everyone else in the group will not increase sales). This is the "perfect" CAP.
Explanation: the ideal model finds all positive results in as many attempts as there are positive results. In our dataset there are only 41 positive results, so the maximum is reached after exactly 41 attempts.
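A minimal sketch of the ideal line, again using hypothetical labels rather than the article's data. With 3 positives among 10 points, the line climbs to 3 in 3 attempts and then stays flat.

```python
# Hypothetical test labels: 1 = positive result, 0 = negative result.
y_test = [0, 1, 0, 0, 1, 0, 1, 0, 0, 0]

total = len(y_test)
positives = sum(y_test)

# The ideal CAP finds one positive per attempt, then stays flat at the maximum.
ideal_x = [0, positives, total]
ideal_y = [0, positives, positives]

print(list(zip(ideal_x, ideal_y)))  # [(0, 0), (3, 3), (10, 3)]
```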
Under no circumstances should the classifier's line lie below the random line; a model that does so is considered really bad. Since the plotted line of our classifier is close to the ideal line, we can say that our model fits really well. To quantify this, take the area between the ideal line and the random line and call it aP. Take the area between the classifier's line and the random line and call it aR. The ratio aR / aP is called the accuracy ratio. The closer the value is to 1, the better the model. This is one way to analyze it.
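The ratio can be computed with a simple trapezoidal area, as in the sketch below. It assumes the common convention in which both areas are measured above the random line. The helper `area_under` and the ranked labels `lm` are hypothetical names and values introduced for illustration.

```python
from itertools import accumulate

def area_under(xs, ys):
    """Trapezoidal area under the piecewise-linear curve through (xs, ys)."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
    )

# True labels already sorted by the model's prediction, highest first (hypothetical).
lm = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
total, positives = len(lm), sum(lm)

# Classifier CAP: cumulative positives found, starting from the origin.
x_values = list(range(total + 1))
model_y = [0] + list(accumulate(lm))

area_random = area_under([0, total], [0, positives])
area_model = area_under(x_values, model_y)
area_ideal = area_under([0, positives, total], [0, positives, positives])

# Both areas are taken relative to the random line (assumed convention).
aR = area_model - area_random
aP = area_ideal - area_random
accuracy_ratio = aR / aP

print(round(accuracy_ratio, 3))  # 0.905
```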
Another way to analyze the model: draw a vertical line at 50% on the x-axis up to the classifier's curve, then project the point of intersection onto the y-axis. Let's say the projected value is X%.
- X < 60%: it is a really bad model
- 60% < X < 70%: it is still a bad model, but obviously better than the first case
- 70% < X < 80%: it is a good model
- 80% < X < 90%: it is a very good model
- 90% < X < 100%: it is extraordinarily good and might be a case of overfitting
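The 50% check can be sketched as below. The function names `cap_at_half` and `grade` are hypothetical. The bands above use strict inequalities, so assigning an exact boundary value (for example 70%) to the higher band is an assumption of this sketch.

```python
def cap_at_half(lm):
    """Percentage of all positives found in the top 50% of ranked points."""
    total, positives = len(lm), sum(lm)
    half = total // 2  # the 50% mark on the x-axis (rounded down for odd totals)
    return 100 * sum(lm[:half]) / positives

def grade(x_percent):
    """Map the projected value X% to the bands listed above."""
    if x_percent < 60:
        return "really bad"
    if x_percent < 70:
        return "bad"
    if x_percent < 80:
        return "good"
    if x_percent < 90:
        return "very good"
    return "extraordinarily good (possible overfitting)"

# True labels sorted by the model's prediction, highest first (hypothetical).
lm = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

x_percent = cap_at_half(lm)
print(x_percent, grade(x_percent))  # 75.0 good
```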
Thus, according to this analysis, we can determine how accurate our model is.