Problem statement. For any input word and text file, predict the next n words that may appear after the input word in the text file.
Examples:
Input: is
Output: is it simply makes sure that there are never

Input: is
Output: is split, all the maximum amount of objects, it

Input: the
Output: the exact same position. There will be some.
Note. To illustrate the example, I've assigned some text to a variable named body. If you want to check the results against real text data, you can find it here.
Solution: We can approach this problem using probability. First, we calculate the frequency of every word that occurs immediately after the input word in the text file. This is an n-gram model; here it is a bigram (2-gram) model, because we use one preceding word to predict the next one. Then, using those frequencies, we compute the CDF (cumulative distribution function) over those words and pick a random word from it. To select this word, we draw a uniform random number between 0 and 1 and find the smallest CDF value greater than or equal to it; the word at that position is our prediction. We do this so that each word is chosen with probability equal to its relative frequency: words that follow the input more often cover a larger slice of the CDF and are therefore more likely to be picked. The CDF makes this easy because it gives the cumulative probability for each word in the list.
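The frequency and CDF steps above can be sketched in Python as follows. This is a minimal sketch, not the author's original code: the function names successor_counts and sample_from_cdf are hypothetical, and splitting the text on whitespace is an assumption (so punctuation stays attached to words, as in the example outputs).

```python
import bisect
import random
from collections import Counter


def successor_counts(text, word):
    """Count how often each token appears immediately after `word` in `text`."""
    tokens = text.split()  # assumption: simple whitespace tokenization
    return Counter(nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == word)


def sample_from_cdf(counts, rng=random):
    """Pick a word with probability proportional to its count, using the CDF."""
    if not counts:
        return None  # the word has no recorded successor
    words = list(counts)
    total = sum(counts.values())
    cdf = []
    running = 0.0
    for w in words:
        running += counts[w] / total  # cumulative probability up to word w
        cdf.append(running)
    r = rng.random()  # uniform draw in [0, 1)
    idx = bisect.bisect_left(cdf, r)  # first index whose CDF value is >= r
    # Guard against the last CDF value rounding to slightly below 1.0.
    return words[min(idx, len(words) - 1)]


if __name__ == "__main__":
    body = "the cat sat on the mat and the cat ran"
    counts = successor_counts(body, "the")
    print(counts)  # Counter({'cat': 2, 'mat': 1})
    print(sample_from_cdf(counts))  # 'cat' about 2/3 of the time, 'mat' about 1/3
```

Using bisect_left on the list of cumulative probabilities finds the smallest CDF value greater than or equal to the random draw in O(log k) time for k distinct successors, instead of scanning the list linearly.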
Having found the CDF value, we can easily find the matching word and append it to the output string. Then, if you want, you can also append that word to the input string and send the whole string back through the process to find the next word, or you can simply output the single word you found using the CDF. I used the former approach.
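Note. If you enter the same word multiple times, you may get a different output each time. How much the outputs vary depends on the size of your data file: the larger the file, the more distinct words are likely to follow the input word, and the more likely you are to get a different output.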
Code for the above algorithm
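A minimal Python sketch of the whole algorithm follows; it is not the author's original code. Assumptions: build_successors, pick_next and predict are hypothetical names, tokens come from whitespace splitting, each new word is predicted from the last word of the growing output string (one interpretation of appending the word to the input string), and the optional seed parameter is an addition for reproducible runs.

```python
import bisect
import random
from collections import Counter, defaultdict


def build_successors(text):
    """Map each token to a Counter of the tokens that immediately follow it."""
    tokens = text.split()  # assumption: whitespace tokenization
    table = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        table[cur][nxt] += 1
    return table


def pick_next(counts, rng):
    """Sample one successor: the smallest cumulative probability >= a uniform draw."""
    words = list(counts)
    total = sum(counts.values())
    cdf, running = [], 0.0
    for w in words:
        running += counts[w] / total
        cdf.append(running)
    idx = bisect.bisect_left(cdf, rng.random())
    return words[min(idx, len(words) - 1)]  # guard against float round-off


def predict(text, word, n, seed=None):
    """Return `word` followed by up to `n` predicted words.

    Each new word is predicted from the last word of the output so far
    (an assumption about how the growing input string is used).
    """
    table = build_successors(text)
    rng = random.Random(seed)
    output = [word]
    for _ in range(n):
        counts = table.get(output[-1])  # .get does not insert into the defaultdict
        if not counts:
            # No recorded successor (e.g. the word only occurs at the end
            # of the text, or not at all), so stop early.
            break
        output.append(pick_next(counts, rng))
    return " ".join(output)


if __name__ == "__main__":
    print(predict("a b c d", "a", 5))  # prints: a b c d
    body = "the cat sat on the mat and the cat ran away"
    print(predict(body, "the", 5))  # random; may differ between runs
```

Because the next word is sampled rather than chosen by maximum frequency, repeated calls with the same input can return different sentences unless a seed is fixed, which matches the note above about getting different outputs.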
The above concept is used in areas such as Natural Language Processing. This is a naive approach meant only to illustrate the concept; in practice, there are many more sophisticated algorithms for word prediction. You can find one of them here.