
Python | Extract words from a given string

Method #1: Using split()
The split() function splits a string into a list of words on whitespace, and it is the most common and recommended approach for this task. Its downside is that it doesn't handle strings containing punctuation marks well: punctuation stays attached to the neighboring words.

# Python3 demo code
# extract words from string
# using split()

# initialize the string
test_string = "Geeksforgeeks is best Computer Science Portal"

# print the original string
print("The original string is: " + test_string)

# using split()
# extract words from string
res = test_string.split()

# print result
print("The list of words is: " + str(res))

Output:

The original string is: Geeksforgeeks is best Computer Science Portal
The list of words is: ['Geeksforgeeks', 'is', 'best', 'Computer', 'Science', 'Portal']
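
The punctuation limitation described for Method #1 can be illustrated with a short sketch. The sample sentence here is hypothetical and not part of the original example; it only shows that split() separates on whitespace and leaves punctuation attached to the words.

```python
# Hypothetical sample sentence (not from the article) that contains punctuation.
sample = "Geeksforgeeks, is best. Really!"

# split() with no arguments separates on runs of whitespace only,
# so commas, periods and exclamation marks stay glued to the words.
words = sample.split()

print(words)
# ['Geeksforgeeks,', 'is', 'best.', 'Really!']
```

If clean words are needed from text like this, one of the regex-based methods is usually a better fit.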

Method #2: Using regex (findall())
When the string contains special characters and punctuation marks, as discussed above, the traditional split-based approach to finding words can fail, so regular expressions are needed for this task. The findall() function returns a list of the words it extracts from the string, ignoring punctuation marks.

# Python3 demo code
# extract words from string
# using regular expression (findall())

import re

# initialize the string
test_string = "Geeksforgeeks, is best @ # Computer Science Portal. !!!"

# print the original string
print("The original string is: " + test_string)

# using regular expression (findall())
# extract words from string; \w+ matches runs of letters, digits and underscores
res = re.findall(r'\w+', test_string)

# print result
print("The list of words is: " + str(res))

Output:

The original string is: Geeksforgeeks, is best @ # Computer Science Portal. !!!
The list of words is: ['Geeksforgeeks', 'is', 'best', 'Computer', 'Science', 'Portal']
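
It may help to see what a `\w+` pattern counts as a "word". The sketch below uses a hypothetical sample string (not from the original example) to show that `\w` matches letters, digits and underscores, so an apostrophe splits a contraction in two while an underscore does not split an identifier.

```python
import re

# Hypothetical sample (not from the article) with an apostrophe,
# an underscore and a digit.
sample = "Don't split snake_case or Python3!"

# \w+ matches runs of letters, digits and underscores; everything else
# (spaces, apostrophes, '!') acts as a separator.
tokens = re.findall(r"\w+", sample)

print(tokens)
# ['Don', 't', 'split', 'snake_case', 'or', 'Python3']
```

Whether splitting "Don't" into two tokens is acceptable depends on the task; a different pattern would be needed to keep contractions together.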

Method #3: Using regex() + string.punctuation
This method also uses regular expressions, but it relies on string.punctuation, which holds all punctuation characters, to strip punctuation from the string before splitting the filtered result into words.

# Python3 demo code
# extract words from string
# using regex() + string.punctuation

import re
import string

# initialize the string
test_string = "Geeksforgeeks, is best @ # Computer Science Portal. !!!"

# print the original string
print("The original string is: " + test_string)

# using regex() + string.punctuation
# remove every punctuation character, then split on whitespace;
# re.escape() keeps characters such as ] and \ literal inside the character class
res = re.sub('[' + re.escape(string.punctuation) + ']', '', test_string).split()

# print result
print("The list of words is: " + str(res))

Output:

The original string is: Geeksforgeeks, is best @ # Computer Science Portal. !!!
The list of words is: ['Geeksforgeeks', 'is', 'best', 'Computer', 'Science', 'Portal']
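
To compare the three methods side by side, they can be wrapped in small helper functions and run on the same input. This is only a sketch: the function names are hypothetical, and the use of re.escape() around string.punctuation is an assumption made here so that every punctuation character is treated literally inside the character class.

```python
import re
import string


def words_split(text):
    # Method #1: split on whitespace only.
    return text.split()


def words_findall(text):
    # Method #2: collect runs of letters, digits and underscores.
    return re.findall(r"\w+", text)


def words_strip_punct(text):
    # Method #3: delete punctuation characters, then split on whitespace.
    # re.escape() is an assumption added here for safety inside [...].
    pattern = "[" + re.escape(string.punctuation) + "]"
    return re.sub(pattern, "", text).split()


sample = "Geeksforgeeks, is best @ # Computer Science Portal. !!!"

for extract in (words_split, words_findall, words_strip_punct):
    print(extract.__name__, extract(sample))
```

On this input the two regex-based helpers return the same six words, while the split-based helper keeps tokens such as 'Geeksforgeeks,' and '!!!'. The regex methods can still differ on other inputs: the underscore is part of string.punctuation, so Method #3 removes it, whereas `\w+` keeps it.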
