Python NLTK | tokenize.regexp()

Using NLTK's tokenize.regexp() module, we can extract tokens from a string with RegexpTokenizer(), which splits text according to a regular expression.

Syntax: tokenize.RegexpTokenizer()
Return: Returns a list of tokens extracted using the regular expression.
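
RegexpTokenizer() can work in two modes: by default the pattern describes the tokens themselves, while gaps=True (used in both examples below) makes the pattern describe the separators between tokens. As a minimal sketch of the default matching mode (the pattern r'\w+' and the input string here are our own illustrative choices, not from the examples below):

# import the RegexpTokenizer class from nltk
from nltk.tokenize import RegexpTokenizer

# The pattern describes the tokens themselves: \w+ matches runs
# of word characters, so punctuation is simply dropped
tk = RegexpTokenizer(r'\w+')

print(tk.tokenize("I love Python!"))
# ['I', 'love', 'Python']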

Example #1:
In this example, we use RegexpTokenizer() to extract tokens from a string using a regular expression.

# import the RegexpTokenizer class from nltk

from nltk.tokenize import RegexpTokenizer

# Create a reference variable for the RegexpTokenizer class;
# the pattern matches runs of whitespace, and gaps=True treats
# each match as a separator rather than as a token

tk = RegexpTokenizer(r'\s+', gaps=True)

# Create input string

gfg = "I love Python"

# Use the tokenize method

geek = tk.tokenize(gfg)

print(geek)

Output:

['I', 'love', 'Python']

Example #2:
In this example, we apply the same whitespace tokenizer to a different input string.

# import the RegexpTokenizer class from nltk

from nltk.tokenize import RegexpTokenizer

# Create a reference variable for the RegexpTokenizer class

tk = RegexpTokenizer(r'\s+', gaps=True)

# Create input string

gfg = "Geeks for Geeks"

# Use the tokenize method

geek = tk.tokenize(gfg)

print(geek)

Output:

['Geeks', 'for', 'Geeks']
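
NLTK also exposes the same behaviour as a module-level function, regexp_tokenize(), which is convenient when you only need a single call and don't want to build a tokenizer object. A minimal sketch, reusing the input from Example #2:

# import the regexp_tokenize() function from nltk
from nltk.tokenize import regexp_tokenize

# Equivalent to Example #2: the text comes first, then the pattern
print(regexp_tokenize("Geeks for Geeks", r'\s+', gaps=True))
# ['Geeks', 'for', 'Geeks']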