With the NLTK tokenize.regexp module, we can extract tokens from a string by matching a regular expression with RegexpTokenizer().
Syntax: tokenize.RegexpTokenizer(pattern, gaps=False)
Return: a list of tokens extracted from the string using the regular expression
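RegexpTokenizer can work in two modes: by default (gaps=False) the pattern describes the tokens themselves, while with gaps=True the pattern describes the separators between tokens. A minimal sketch of the difference (the sample string here is illustrative, not from the examples below):

# Sketch: the two matching modes of RegexpTokenizer
from nltk.tokenize import RegexpTokenizer

# gaps=False (default): the pattern matches the tokens themselves
word_tk = RegexpTokenizer(r'\w+')
print(word_tk.tokenize("Hello, world!"))  # ['Hello', 'world']

# gaps=True: the pattern matches the gaps between tokens
gap_tk = RegexpTokenizer(r'\s+', gaps=True)
print(gap_tk.tokenize("Hello, world!"))   # ['Hello,', 'world!']

Note that in gap mode the punctuation stays attached to the words, because only whitespace is treated as a boundary.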
Example #1:
In this example, we use RegexpTokenizer() to split a string into tokens by treating whitespace as the gap between tokens.
# import RegexpTokenizer() method from nltk
from nltk.tokenize import RegexpTokenizer

# Create a reference variable for class RegexpTokenizer
tk = RegexpTokenizer(r'\s+', gaps=True)

# Create the input string
gfg = "I love Python"

# Use the tokenize method
geek = tk.tokenize(gfg)

print(geek)
Output:
['I', 'love', 'Python']
Example #2:
# import RegexpTokenizer() method from nltk
from nltk.tokenize import RegexpTokenizer

# Create a reference variable for class RegexpTokenizer
tk = RegexpTokenizer(r'\s+', gaps=True)

# Create the input string
gfg = "Geeks for Geeks"

# Use the tokenize method
geek = tk.tokenize(gfg)

print(geek)
Output:
['Geeks', 'for', 'Geeks']
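For one-off calls, nltk.tokenize also exposes a functional wrapper, regexp_tokenize(), which applies the same tokenization without constructing the class first. A short sketch using the same whitespace pattern as the examples above:

# Sketch: functional wrapper equivalent to the class-based examples
from nltk.tokenize import regexp_tokenize

print(regexp_tokenize("Geeks for Geeks", r'\s+', gaps=True))
# ['Geeks', 'for', 'Geeks']

This is convenient for a single call; constructing RegexpTokenizer once and reusing it is preferable when tokenizing many strings with the same pattern.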