Deploying a machine learning model using Flask



Training a machine learning model is only half the job: to use it to predict on new data, we have to deploy it over the Internet so that the outside world can use it. In this article, we will walk through training a machine learning model and building a web application around it using Flask.

First, we have to install the necessary libraries that will be used in this project. Use pip to install them all. We will also need Flask for the web application later; it can be installed the same way with pip install flask.

pip install pandas
pip install numpy
pip install scikit-learn

Decision tree —
The decision tree is a well-known supervised machine learning algorithm because it is easy to use, resilient, and flexible. We have implemented the algorithm on the Adult dataset from the UCI Machine Learning Repository.

Retrieve the data —
You can retrieve the dataset from this link.

Getting the dataset is not the end. We still have to preprocess the data, which means we need to clean the dataset. Dataset cleaning involves various kinds of processing, such as removing missing values, filling in NA values, and so on.

# import the libraries and load the dataset
import pandas
import numpy
from sklearn import preprocessing

df = pandas.read_csv('adult.csv')
df.head()

Output:

Dataset preprocessing —
The dataset consists of 14 attributes and a class label indicating whether an individual's income is less than or greater than 50K a year. The attributes range from a person's age and working-class label to their relationship status and race. Information about all the attributes can be found here.
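Before cleaning anything, it can help to take a quick look at the shape of the data and the distribution of the class label. The short sketch below is illustrative; it assumes the df loaded above and the income column name used by this dataset.

# quick overview of the data (assumes df from above)
print(df.shape)                        # number of rows and columns
print(df.dtypes)                       # which columns are numeric vs. categorical
print(df['income'].value_counts())     # distribution of the class label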

First, we find all the missing values in the data and replace them with the mode (most frequent value) of the respective column. There are many other ways to handle missing values, but this seemed to be the best fit for this type of dataset.

# drop two columns that are not needed for the model
df = df.drop(['fnlwgt', 'educational-num'], axis=1)

col_names = df.columns

# replace the '?' placeholder with NaN, then fill each column's
# missing values with that column's mode (most frequent value)
for c in col_names:
    df = df.replace("?", numpy.NaN)
df = df.apply(lambda x: x.fillna(x.value_counts().index[0]))

Machine learning algorithms cannot handle categorical values directly; they only work with numeric values.
So, to fit the data into the prediction model, we need to convert the categorical values to numeric ones. Before doing this, we will assess whether any transformations are needed on the categorical columns.
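One quick way to assess the categorical columns is to list them along with the number of distinct values each one holds. The following sketch is an illustration, not part of the original code, and assumes the df from above.

# list the categorical (object-typed) columns and how many distinct values each holds
for col in df.select_dtypes(include='object').columns:
    print(col, df[col].nunique(), 'unique values')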

Discretization is a common way to make categorical data more tidy and meaningful. We have applied discretization to the marital-status column, narrowing it down to just married or not married values. Later, we will apply a label encoder to the remaining categorical columns. Also, there are two redundant columns, 'education' and 'educational-num', so we removed one of them.

df.replace(['Divorced', 'Married-AF-spouse',
            'Married-civ-spouse', 'Married-spouse-absent',
            'Never-married', 'Separated', 'Widowed'],
           ['divorced', 'married', 'married', 'married',
            'not married', 'not married', 'not married'],
           inplace=True)

category_col = ['workclass', 'race', 'education', 'marital-status',
                'occupation', 'relationship', 'gender',
                'native-country', 'income']
labelEncoder = preprocessing.LabelEncoder()

# encode each categorical column and remember the label -> number mapping
mapping_dict = {}
for col in category_col:
    df[col] = labelEncoder.fit_transform(df[col])

    le_name_mapping = dict(zip(labelEncoder.classes_,
                               labelEncoder.transform(labelEncoder.classes_)))

    mapping_dict[col] = le_name_mapping

print(mapping_dict)

Output:

{'workclass': {'?': 0, 'Federal-gov': 1, 'Local-gov': 2, 'Never-worked': 3, 'Private': 4, 'Self-emp-inc': 5, 'Self-emp-not-inc': 6, 'State-gov': 7, 'Without-pay': 8},
 'race': {'Amer-Indian-Eskimo': 0, 'Asian-Pac-Islander': 1, 'Black': 2, 'Other': 3, 'White': 4},
 'education': {'10th': 0, '11th': 1, '12th': 2, '1st-4th': 3, '5th-6th': 4, '7th-8th': 5, '9th': 6, 'Assoc-acdm': 7, 'Assoc-voc': 8, 'Bachelors': 9, 'Doctorate': 10, 'HS-grad': 11, 'Masters': 12, 'Preschool': 13, 'Prof-school': 14, 'Some-college': 15},
 'marital-status': {'Divorced': 0, 'Married-AF-spouse': 1, 'Married-civ-spouse': 2, 'Married-spouse-absent': 3, 'Never-married': 4, 'Separated': 5, 'Widowed': 6},
 'occupation': {'?': 0, 'Adm-clerical': 1, 'Armed-Forces': 2, 'Craft-repair': 3, 'Exec-managerial': 4, 'Farming-fishing': 5, 'Handlers-cleaners': 6, 'Machine-op-inspct': 7, 'Other-service': 8, 'Priv-house-serv': 9, 'Prof-specialty': 10, 'Protective-serv': 11, 'Sales': 12, 'Tech-support': 13, 'Transport-moving': 14},
 'relationship': {'Husband': 0, 'Not-in-family': 1, 'Other-relative': 2, 'Own-child': 3, 'Unmarried': 4, 'Wife': 5},
 'gender': {'Female': 0, 'Male': 1},
 'native-country': {'?': 0, 'Cambodia': 1, 'Canada': 2, 'China': 3, 'Columbia': 4, 'Cuba': 5, 'Dominican-Republic': 6, 'Ecuador': 7, 'El-Salvador': 8, 'England': 9, 'France': 10, 'Germany': 11, 'Greece': 12, 'Guatemala': 13, 'Haiti': 14, 'Holand-Netherlands': 15, 'Honduras': 16, 'Hong': 17, 'Hungary': 18, 'India': 19, 'Iran': 20, 'Ireland': 21, 'Italy': 22, 'Jamaica': 23, 'Japan': 24, 'Laos': 25, 'Mexico': 26, 'Nicaragua': 27, 'Outlying-US(Guam-USVI-etc)': 28, 'Peru': 29, 'Philippines': 30, 'Poland': 31, 'Portugal': 32, 'Puerto-Rico': 33, 'Scotland': 34, 'South': 35, 'Taiwan': 36, 'Thailand': 37, 'Trinadad&Tobago': 38, 'United-States': 39, 'Vietnam': 40, 'Yugoslavia': 41},
 'income': {'<=50K': 0, '>50K': 1}}

Fitting the model —
After preprocessing, the data is ready to be fed to the machine learning algorithm. We first slice the data by separating the class labels from the attributes, and then split the dataset into two parts, one for training and one for testing. This is done with sklearn's train_test_split() function.

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# separate the attributes (X) from the class label (Y)
X = df.values[:, 0:12]
Y = df.values[:, 12]

Here we use the decision tree classifier as the prediction model and feed it the training portion of the data.
Once training is complete, we check the accuracy of the model by giving it the held-out test data.
With this, we achieve an accuracy of approximately 83%. Now, to use this model on new, unseen data, we need to save it so that we can load it and predict values later. For this we use Pickle in Python, a module for serializing and deserializing Python object structures.

# split into 70% training and 30% test data
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, random_state=100)

dt_clf_gini = DecisionTreeClassifier(criterion="gini",
                                     random_state=100,
                                     max_depth=5,
                                     min_samples_leaf=5)

dt_clf_gini.fit(X_train, y_train)
y_pred_gini = dt_clf_gini.predict(X_test)

print("Decision Tree using Gini Index Accuracy is",
      accuracy_score(y_test, y_pred_gini) * 100)

Output:

Decision Tree using Gini Index Accuracy is 83.13031016480704
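As mentioned above, we save the trained classifier with Pickle so it can be loaded later by the web application. Below is a minimal sketch of this step; the file name model.pkl is an assumption carried over from the next section.

import pickle

# serialize the trained decision tree to disk (file name is an assumption)
with open('model.pkl', 'wb') as model_file:
    pickle.dump(dt_clf_gini, model_file)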

Bringing in Flask —
Flask is a Python-based micro-framework used for developing small web applications, and it makes it very easy to build RESTful APIs with Python. At this point, we have a trained model saved as model.pkl that can predict the class of the data based on various attribute values. The class label is Salary >=50K or <50K.
Now we will design a web application in which the user enters all the attribute values; the data is then passed to the model and, based on the training it was given, the model predicts what the salary of the person whose data was fed in should be.
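Below is a minimal sketch of what such a Flask application (say, an app.py file) could look like. It assumes the model was pickled as model.pkl, that the HTML form from the next section is saved as templates/index.html, and that the browser submits the form fields in the same order as the training columns; those names and assumptions are illustrative rather than part of the original code.

import pickle

import numpy as np
from flask import Flask, render_template, request

app = Flask(__name__)

# load the decision tree trained earlier (file name is an assumption)
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

@app.route('/')
def index():
    # serve the HTML form shown in the next section
    return render_template('index.html')

@app.route('/result', methods=['POST'])
def result():
    # collect the already-encoded attribute values from the form,
    # in the same order the model was trained on
    features = [float(value) for value in request.form.values()]
    prediction = model.predict(np.array(features).reshape(1, -1))[0]
    if prediction == 1:
        return 'Income more than 50K'
    return 'Income less than 50K'

if __name__ == '__main__':
    app.run(debug=True)

Running python app.py starts a local development server; the form's action="/result" posts the selected values to the /result route, which returns the predicted income class.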

HTML Form —
To predict income from various attributes, we first need to collect the data (new attribute values) and then use the decision tree model built above to predict whether the income will be more than 50K or less. To collect the data, we create an HTML form that contains the different options to select from for each attribute. Here we have created a simple form using only HTML; if you want to make the form more interactive, you can do so.

<html>
<body>
<h3>Income Prediction Form</h3>

<div>
  <form action="/result" method="POST">
    <label for="age">Age</label>
    <input type="text" id="age" name="age">
    <br>
    <label for="w_class">Working Class</label>
    <select id="w_class" name="w_class">
      <option value="0">Federal-gov</option>
      <option value="1">Local-gov</option>
      <option value="2">Never-worked</option>
      <option value="3">Private</option>
      <option value="4">Self-emp-inc</option>
      <option value="5">Self-emp-not-inc</option>
      <option value="6">State-gov</option>
      <option value="7">Without-pay</option>
    </select>
    <br>
    <label for="edu">Education</label>
    <select id="edu" name="edu">
      <option value="0">10th</option>
      <option value="1">11th</option>
      <option value="2">12th</option>
      <option value="3">1st-4th</option>
      <option value="4">5th-6th</option>
      <option value="5">7th-8th</option>
      <option value="6">9th</option>
      <option value="7">Assoc-acdm</option>
      <option value="8">Assoc-voc</option>
      <option value="9">Bachelors</option>
      <option value="10">Doctorate</option>
      <option value="11">HS-grad</option>
      <option value="12">Masters</option>
      <option value="13">Preschool</option>
      <option value="14">Prof-school</option>
      <option value="15">Some-college</option>
    </select>
    <br>
    <label for="martial_stat">Marital Status</label>
    <select id="martial_stat" name="martial_stat">
      <option value="0">divorced</option>
      <option value="1">married</option>
      <option value="2">not married</option>
    </select>
    <br>
