I hope you have seen my previous video on what is BERT in this video explain how BERT works the fundamentals of it in todays video we are going to do email classification whether its a spam or non-spam using BERT now BERT will convert an email sentence you know the the whole email into an embedding vector. So we saw in a previous video that the purpose of BERT is to generate if is embedding vector for the entire sentence and that is something that we can feed into our neural network and do the training.
So here we will generate a vector of 768 length y 768 we have covered that in a previous video and then we will supply that to a very simple neural network with only one dense layer, one neuron in the dense layer as an output will also put a dropout layer in between just to tackle the overfitting.
Now if you open this BERT box by the way it has two components pre-processing and encoding and we talked about that in a previous video as well so previous video watching that is quite a prerequisite so lets jump into coding. Now here I have downloaded this file from kegel simple two columns category hammer spam and here is the content of your email I have imported few basic libraries here in my jupyter notebook and Im going to simply read this CSV file into my Pandas data frame which looks like that and then I will do some basic analysis you know I will do df group by lets say category so here I have four eight two five ham emails and 747 spam emails you can clearly see there is some imbalance in our data set so we need to take care of that but before we do that we will create a new column in my data frame you know well call it spam so lets create a new column and if the value spam the value of this spam column will be 1 ham it will be 0 and you all know if you want to create a new column in a data frame from an existing column.
You can use apply function and that will take lambda and what we you are doing is if x is equal to spam then the value is one you see this is how ternary operator in python works else value is zero.
And now if you do df dot head see we simply create a new column zero one say spam one hem zero all right so far so good now lets do train test split.
So Im going to use our standard sql entry in this split function and in that my x is actually a message and my y is the spam okay the spam column and Im going to store the output into these variables this is pretty much a standard practice in machine learning world and okay I will do our test size to be point two so eighty percent training samples twenty percent taste sample let let me check how it split the spam and non-spam so value counts okay so Im checking this to make sure there is a balance okay so lets see okay 149 divided by 967 okay around 15 percent spam in test and 3859 okay so it is good balance but still to be on a safe side I will say stratify so when you do stratify it will make sure there is a balance you know its not like in your training data set if all the samples are zero and there are lesser two samples which has spam value then model will not be good in terms of detecting the spam.
Okay so thats why I supply this stratify I mean before stratify also it already did a good job but this is just to be on a safe side now comes the most interesting part which is creating the embedding using BERT.
Okay so how do you do that so for that you have to go to this tensorflow hub website and click on text and go to BERT model now in but we are going to use this first model so we saw in a previous video that there is an encoder and there is pre-processing step so first you do pre-processing so you click here you copy the URL okay so this is my pre-process URL all right and you go back you go to text bird and you go here and copy this URL. This is your main encoder okay so this is your pre-processing URL and this is your main encoder URL so Im going to use keras layer hub keras layer basically okay and call that bird pre-process and then I will use the same hope keras layer here and call it BERT encoder see we saw in a presentation there are two steps invert encoding pre-processing and encoding so thats what we did exactly.
Okay when you run it its gonna take some time because it is downloading the BERT model you know its somewhat around 300 megabytes so based on your internet speed it might take some time but essentially you are downloading a train model which is trained on all the wikipedia and book corpus so now in our task well be just directly using that train model to generate the embedding vectors after model is downloaded I am going to define a simple function that takes couple of sentences as an input and returns me an embedding vector so basically the way Ill use this function is okay supply an array of sentences and any sentence okay and that should return me the embedding vector for the entire sentence and if youve seen again my previous video this pre-process handle that you get you can use it as a normal function and you supply your sentences here and it should return you the pre-process text so I will just call it pre-process text and then use the BERT encoder okay and when you use the about encoder it returns a dictionary out of which you need to use pulled out pull output is basically the encoding for the entire sentence again if you want to know what other elements are there in the dictionary you need to watch my previous video its sort of like a prerequisite all right now see when I run this it is generating for this sentence this is my embedding vector the size is 768 for this second sentence this is my embedding vector so we have achieved the major goal here which is generating the vector you know vector using the BERT and I just give you a simple function but in reality we will be using tensorflow layers okay but before we go there let me generate some embedding vectors so for some more you know words lets just generate it for words lets say banana I want to see what kind of embedding vectors it generates for a couple of fruits and then you know what I will compare the fruits with Jeff Bezos and Bill Gates so these three are people these three are fruits so lets see what kind of embedding vector it generates and now I have all these embedding vectors right so if you do e 6 by 7 68 okay i am going to use cosine similarity so if you have seen my cosine similarity video if you do cosine similarity you will find this video where I have explained you know what is exactly cosine similarity so if you dont know watch it. It is used to compare two vectors so here i will compare lets say banana bananas embedding vector with uh grapes embedding vector now this takes a two dimensional array so Im just going to wrap it up in a two dimensional array okay you see 0.99 so if it is near to 1 it means these two vectors are very similar so banana and grapes are similar because they are fruits banana and mango is also similar because they are fruit but let me compare banana with Jeff Bzos its kind of weird right comparing banana with Jeff Bezos see 0.84 but still theyre not 0.99 theyre not as similar as banana and grapes okay and by the way you have to use this with a with a little caution I mean cosine similarity is not exact vector similarity okay so sometimes you might see some unexpected result but thats okay now let me compare Jeff Bezos with Elon Musk say again 0.98 so you get the point behind BERT now okay now lets build a model so far in our deep learning series we have built tensorflow models using sequential model okay we are going to now use functional models so there are two type of model sequential and functional okay so what is the difference between the two? Im going to link a good article here so in the sequential model you add layers one by one as a sequence you see but in a functional model you create your input then you create a hidden layer lets say and supply input as a function argument then you create hidden one then you supply that into hidden twos argument and so on and then you create model using inputs and outputs now this allows you to create a model which might have multiple inputs multiple outputs like something like rest net you know you can also share network layers with other other models so there are there are some differences you read the article you will get an idea so here Im going to build a functional model okay so the first step is you create your input layer the shape is going to be this because the sentence length is varying and my data type is string and name of this layer I will call it text or input whatever you know you can give it the name that you like the most and this will be my input layer then we are going to do this these two things so here I supply input okay same thing and then BERT encoder and BERT encoder takes let me just do output here so outputs okay and from the outputs I get pulled output so pull output will be the sentence encoding so pool output will be this okay this I need to I will create one dropout layer which I have not shown in the picture so let me feed that pulled output into a dropout layer and then last layer will be one neuron dense layer okay so lets create dropout layer here dropout layer is used to tackle the overfitting sometimes even if you dont do it its okay it helps and in that dropout layer you pass this as an input okay so now im going to drop 10 of neuron okay and I will call it this dropout and lets call it l l is the layer and the second one is the dense layer with one neuron and since its a binary classification you know like one zero kind of thing I will do sigmoid and the name of this layer is output and again we are using functional API so you need to treat this as like a function and pass in the previous layer here and then I will say overwrite the same variable you know okay and then in the end my model is nothing but this model which has two parameters inputs outputs now the inputs will be this so its an array you can supply multiple inputs as well so here this is the input and output will be l okay and you can do model dot summary okay output is not defined so output outputs great now here my trainable parameters are 769 because I have 768 neuron here and this one so total 769 my non-trainable parameters are so much so these are the parameters from my birth model bird is already trained so I dont need to train them again and when you are doing you know model building you know that you do model compile where optimizer loss these are like pretty much standard things that we use in all our tutorials loss is binary cross entropy because we are doing binary classification here and then Im going to now run the training so model dot fit x train y train epochs lets do tan epochs now this is gonna take time because the whole encoding process is little slow and we have so many samples so based on your computer it might take time I have a powerful computer and gpu but still takes few minutes so you to be little passion you can reduce epochs if you want okay so I reduced epochs to five and I got ninety three percent accuracy then I do model dot evaluate on my extra sweaters I got ninety five percent accuracy which is which is so good actually so now I do inference so I have a couple of emails actually its not reviews its emails and on that email when I do predict see the first three emails are spam the second the rest of them are not spam theyre legit emails and in sigmoid whenever the value is more than 0.5 it means its a spam and it is less than 0.5 it means its not a spam. So you see so these things worked out really well for us all right so this tutorial provided you a very like a simple explanation of how you can do text classification using BERT you can use BERT for variety of other problems as well just such as movie review classification or name entity recognization and by the way I have an exercise for you and the exercise is actually very simple you have to just do copy paste so go to Google red tensorflow tutorial and in that go to text tutorial and look at classified text with BERT so what you need to do is you need to just run this code on your computer so just copy paste these these lines you know step by step in your notebook and just run it and try to understand it this tutorial is similar to what we did but the data set is much bigger they are using tensorflow data set API so in terms of API also they are using little different we use Pandas and they are also using some caching the model is also little different so if you practice this you will consolidate your learning from this particular video so I hope youre youre going to practices I trust you all youre a census student so please open a notebook copy paste these lines one by one try to understand it and see how it works if you are confident you can just load the data set and finish rest of the tutorials by referring to this page but without referring to this page okay so thank you very much for watching I will see you in the next video if you like this particular video give it a thumbs up and share it with your friends.