Handwritten Equation Solver in Python


Retrieving training data

  • Loading dataset
      Load the dataset using this link. Unzip the zip file. It contains a separate folder of images for each mathematical symbol. For simplicity, our equation solver uses only the digits 0-9 and the +, - and times symbols. Looking at the dataset, we can see that it is imbalanced: it contains 12000 images for some characters and 3000 images for others. To correct this imbalance, reduce the number of images in each folder to approximately 4000.
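      The balancing step can be sketched as follows. This is a minimal illustration, not the article's code: the function name `cap_per_class`, the dictionary layout (symbol name mapped to a list of file names) and the fixed random seed are all assumptions made for the example.

```python
import random


def cap_per_class(files_by_class, limit=4000, seed=0):
    """Keep at most `limit` randomly chosen files for each symbol.

    `files_by_class` is a hypothetical mapping from a symbol name to the
    list of image file names found in that symbol's folder.
    """
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    capped = {}
    for symbol, files in files_by_class.items():
        files = sorted(files)
        if len(files) > limit:
            # Oversized classes are sampled down without replacement.
            files = rng.sample(files, limit)
        capped[symbol] = files
    return capped
```

      Classes that already have fewer than 4000 images are left unchanged, so the result is only approximately balanced.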
  • Feature Extraction
      We can use contour extraction to get features.

    1. Invert the image and then convert it to a binary image, because contour extraction works best when the object is white and the background is black.
    2. Use the findContours function to find the contours. For each contour, get its bounding rectangle using the boundingRect function (the bounding rectangle is the smallest upright rectangle that encloses the entire contour).
    3. Since each image in our dataset contains only one character / digit, only the largest bounding rectangle is needed. To find it, we calculate the area of the bounding rectangle of each contour and select the rectangle with the maximum area.
    4. Now resize the region inside the largest bounding rectangle to 28 by 28 and reshape it to 784 by 1. This gives 784 pixel values, or features. Then assign a label to it (for the digits 0-9, the label is the digit itself; for - assign label 10, for + assign label 11, and for times assign label 12). Our dataset now contains 784 feature columns and one label column. After extracting the features, save the data to a CSV file.
  • Train the data using a convolutional neural network

      A convolutional neural network operates on 2D data, but each row of our dataset has the shape 785 by 1, so we need to reshape it. First, assign the label column of the dataset to the y_train variable. Then drop the label column from the dataset and reshape each remaining row of 784 values to 28 by 28. Our dataset is now ready for the CNN.
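      A minimal NumPy sketch of this reshaping is shown below. It assumes the label is stored in the last of the 785 columns, and it adds a leading channel axis so that each image matches the (1, 28, 28) input shape used by the model; the function name is hypothetical.

```python
import numpy as np


def split_features_and_labels(data):
    """Split an (n, 785) array into images and labels.

    Assumes columns 0..783 hold pixel values and column 784 the label.
    """
    data = np.asarray(data)
    y_train = data[:, -1].astype(int)
    # Drop the label column, then give every row the shape (1, 28, 28):
    # one channel, 28 rows, 28 columns.
    x_train = data[:, :784].reshape(-1, 1, 28, 28)
    return x_train, y_train
```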
  • Building a convolutional neural network
      To create a CNN, import all required libraries.

      Convert the y_train data to categorical data using the to_categorical function. Use the following lines of code to create the model.

    import pandas as pd
    import numpy as np
    import pickle

    np.random.seed(1212)

    import keras
    from keras.models import Model
    from keras.layers import *
    from keras import optimizers
    from keras.layers import Input, Dense
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.layers import Dropout
    from keras.layers import Flatten
    from keras.layers.convolutional import Conv2D
    from keras.layers.convolutional import MaxPooling2D
    from keras.utils import np_utils
    from keras import backend as K

    # Use channels-first ordering so each input has shape (1, 28, 28).
    # This call belongs to older Keras releases; newer ones expose it as
    # K.set_image_data_format('channels_first').
    K.set_image_dim_ordering('th')

    from keras.utils.np_utils import to_categorical
    from keras.models import model_from_json

    # Two convolution + pooling stages followed by dense layers.
    model = Sequential()
    model.add(Conv2D(30, (5, 5), input_shape=(1, 28, 28), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(15, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(50, activation='relu'))
    # 13 output classes: the digits 0-9 plus -, + and times.
    model.add(Dense(13, activation='softmax'))

    # Compile the model
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam', metrics=['accuracy'])
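      To see how the layer sizes fit together, the spatial dimensions can be traced by hand. The sketch below assumes the Keras defaults for these layers (no padding and stride 1 for Conv2D, pooling stride equal to the pool size); the helper names are made up for the illustration.

```python
def conv_out(size, kernel):
    """Output width of an unpadded, stride-1 convolution."""
    return size - kernel + 1


def pool_out(size, pool):
    """Output width of max pooling whose stride equals the pool size."""
    return size // pool


def flattened_size():
    """Number of values that reach the first Dense layer."""
    size = 28
    size = pool_out(conv_out(size, 5), 2)  # 28 -> 24 -> 12
    size = pool_out(conv_out(size, 3), 2)  # 12 -> 10 -> 5
    return 15 * size * size                # 15 feature maps of 5 x 5
```

      Under these assumptions the Flatten layer passes 375 values to the Dense layer with 128 units.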

  • Fitting the model to the data
      Use the following lines of code to fit the CNN to the data.

    # l is assumed to hold the reshaped training images and cat the
    # labels converted with to_categorical.
    model.fit(np.array(l), cat, epochs=10, batch_size=200,
              shuffle=True, verbose=1)

      Training our model takes about three hours and reaches an accuracy of 98.46%. After training, we can save the model as a JSON file for future use, so that we don't have to retrain it and wait three hours each time. To save the model, we can use the following lines of code.

    # serialize the model architecture to JSON
    model_json = model.to_json()
    with open("model_final.json", "w") as json_file:
        json_file.write(model_json)

    # serialize weights to HDF5
    model.save_weights("model_final.h5")

  • Testing our model or solving an equation with it

      First, load the saved model using the following lines of code.

    # read the model architecture back from JSON
    json_file = open('model_final.json', 'r')
    loaded_model_json = json_file.read()
    json_file.close()
    loaded_model = model_from_json(loaded_model_json)

    # load weights into the new model
    loaded_model.load_weights("model_final.h5")

  • Now take an input image containing a handwritten equation. Convert the image to binary and then invert it (if the digits / characters are black).
  • Now get the contours of the image. The symbols must be read from left to right, so order the contours by the x coordinate of their bounding rectangles.
  • Get the bounding rectangle for each contour.
  • This sometimes results in two or more contours for the same digit / character. To avoid this, check whether the bounding rectangles of any two contours overlap. If they overlap, drop the smaller rectangle.
  • Now resize each remaining bounding rectangle to 28 by 28.
  • Using the model, predict the corresponding digit / symbol for each bounding rectangle and save the predictions as a string.
  • Then use the 'eval' function on the string to solve the equation.
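      The overlap filtering and the final evaluation can be sketched in plain Python as follows. This is an illustration only: the function names are hypothetical, rectangles are assumed to be (x, y, w, h) tuples as returned by boundingRect, and the label mapping (10 for -, 11 for +, 12 for times) follows the labels assigned during feature extraction. Because 'eval' executes arbitrary code, the sketch first checks that the string contains only digits and the three operators.

```python
SYMBOLS = {10: "-", 11: "+", 12: "*"}  # labels 0-9 map to their own digit


def overlaps(a, b):
    """True if two (x, y, w, h) rectangles intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def drop_smaller_overlaps(rects):
    """Drop the smaller rectangle of every overlapping pair, sort by x."""
    dropped = set()
    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if overlaps(rects[i], rects[j]):
                area_i = rects[i][2] * rects[i][3]
                area_j = rects[j][2] * rects[j][3]
                dropped.add(j if area_j <= area_i else i)
    kept = [r for k, r in enumerate(rects) if k not in dropped]
    return sorted(kept, key=lambda r: r[0])  # left to right


def labels_to_expression(labels):
    """Turn predicted class labels into an expression string."""
    return "".join(SYMBOLS.get(int(label), str(int(label))) for label in labels)


def solve(expression):
    """Evaluate an expression made only of digits and + - * characters."""
    if not expression or set(expression) - set("0123456789+-*"):
        raise ValueError("unexpected character in expression")
    # Builtins are disabled as an extra precaution around eval.
    return eval(expression, {"__builtins__": {}})
```

      For example, the predicted labels 2, 3, 11, 4, 12, 5 become the string "23+4*5", which `solve` evaluates using Python's usual operator precedence.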
    1. Download the complete code for solving handwritten equations here .




