NLP | Wordlist Corpus



How do I create a dictionary corpus?

  • WordListCorpusReader is one of the simplest CorpusReader classes.
  • It provides access to files that contain a list of words, one word per line.
  • A wordlist file can be a CSV file or a txt file containing one word per line. Our wordlist file
     contains the words: geeks for geeks welcomes you to nlp articles
  • The constructor takes two arguments:
      • the path to the directory that contains the files
      • a list of file names

Code # 1: Create a wordlist corpus

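from nltk.corpus.reader import WordListCorpusReader

# First argument: corpus root directory; second argument: list of wordlist files
x = WordListCorpusReader('.', [r'C:\Users\dell\Desktop\wordlist.txt'])

# All words in the corpus, one per line of the file
x.words()

# The file identifiers that make up the corpus
x.fileids()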

Output:

 ['geeks', 'for', 'geeks', 'welcomes', 'you', 'to', 'nlp', 'articles']
 ['C:\\Users\\dell\\Desktop\\wordlist.txt']
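
To try the same steps without relying on a file at a fixed location, the following sketch writes a small wordlist to a temporary directory and reads it back. The temporary directory, the file name and the variable names are assumptions made for illustration; only WordListCorpusReader, fileids() and words() come from the article.

```python
import tempfile
from pathlib import Path

from nltk.corpus.reader import WordListCorpusReader

# Hypothetical sample data: one word per line, as the reader expects
sample_words = "geeks\nfor\ngeeks\nwelcomes\nyou\nto\nnlp\narticles\n"

with tempfile.TemporaryDirectory() as corpus_dir:
    # Write the wordlist file into the temporary corpus directory
    wordlist_path = Path(corpus_dir) / "wordlist.txt"
    wordlist_path.write_text(sample_words, encoding="utf-8")

    # Root directory first, then the list of file names relative to it
    reader = WordListCorpusReader(corpus_dir, ["wordlist.txt"])
    file_ids = reader.fileids()
    word_list = reader.words()

print(file_ids)
print(word_list)
```

Passing the directory as the root and a bare file name as the file identifier keeps the identifiers short and independent of where the corpus is stored.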

Code # 2: Access the raw text

# The whole file as a single string (the words are separated by line breaks)
x.raw()

from nltk.tokenize import line_tokenize

# Split the raw text into one token per line
print("Wordlist:", line_tokenize(x.raw()))

Output:

 'geeks for geeks welcomes you to nlp articles'
 Wordlist: ['geeks', 'for', 'geeks', 'welcomes', 'you', 'to', 'nlp', 'articles']
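
As a rough illustration of what line tokenization does to the raw text, the sketch below splits a string on line breaks and drops blank lines using only the standard library. The helper split_into_lines and the sample string are hypothetical and not part of NLTK; this is meant to approximate the behaviour for a simple wordlist, not to reproduce the library's implementation.

```python
def split_into_lines(raw_text):
    """Return the non-blank lines of raw_text, one list item per line."""
    return [line for line in raw_text.splitlines() if line.strip()]


# Hypothetical raw text: one word per line, with one blank line in the middle
raw_text = "geeks\nfor\ngeeks\n\nwelcomes\nyou\nto\nnlp\narticles\n"

wordlist = split_into_lines(raw_text)
print("Wordlist:", wordlist)
```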

Code # 3: Access the names wordlist corpus

# Access a predefined wordlist corpus of names
from nltk.corpus import names

print("Path:", names.fileids())

print("No. of female names:", len(names.words('female.txt')))

print("No. of male names:", len(names.words('male.txt')))

Output:

 Path: ['female.txt', 'male.txt']
 No. of female names: 5001
 No. of male names: 2943

Code # 4: Access the English words wordlist corpus

# Access a predefined list of English words
from nltk.corpus import words

print("File:", words.fileids())

print("No. of basic English words:", len(words.words('en-basic')))

print("No. of English words:", len(words.words('en')))

Output:

 File: ['en', 'en-basic']
 No. of basic English words: 850
 No. of English words: 235886