You can install num2words with pip:
pip install num2words
Consider the following two excerpts from different files in 20 Newsgroups, a popular NLP dataset whose preprocessing remains a topic of interest.
In article, Martin Preston writes: Why not use the PD C library for reading / writing TIFF files? It took me a good 20 minutes to start using them in your own app.
ISCIS VIII is the eighth of a series of meetings which have brought together computer scientists and engineers from about twenty countries. This year’s conference will be held in the beautiful Mediterranean resort city of Antalya, in a region rich in natural as well as historical sites.
In the two excerpts above, the number "20" appears in both numeric and written-out form. Standard preprocessing steps such as tokenization and lemmatization will not map "20" and "twenty" to the same token, even though the two carry the same meaning in context. Fortunately, the num2words library (a third-party package, installed as shown above) solves this problem in one line.
Below is an example of using the tool.
thirty-six
thirty-sixth
36th
zero euro, thirty-six cents
treinta y seis
Therefore, in the preprocessing step, you can convert all numeric values to words so that later steps treat "20" and "twenty" as the same token.
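One way to apply this during preprocessing is to replace every standalone integer in a string before tokenizing. The sketch below is an illustration, not the author's code: the function name `numbers_to_words` and its `convert` parameter are hypothetical, and it handles only whole runs of digits.

```python
import re


def numbers_to_words(text, convert=None):
    """Replace each standalone integer in `text` with its spelled-out form.

    `convert` is a hypothetical hook: any callable that maps an int to a
    string. When omitted, num2words is used.
    """
    if convert is None:
        # Imported lazily so another converter can be supplied instead.
        from num2words import num2words  # third-party: pip install num2words
        convert = num2words
    # \b\d+\b matches whole runs of digits only. Tokens such as "4th" are
    # left alone; decimals and dates would need extra handling.
    return re.sub(r"\b\d+\b", lambda m: convert(int(m.group())), text)
```

With num2words installed, `numbers_to_words("It took me a good 20 minutes")` should return "It took me a good twenty minutes", so the first excerpt now shares the token "twenty" with the second.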