Python | Pandas Series.str.extract()

Series.str can be used to access the values of a Series as strings and apply string methods to them. Pandas Series.str.extract() extracts the capture groups of a regular expression as columns of a DataFrame. For each subject string in the Series, it extracts the groups from the first match of the regular expression pat.
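The "one column per capture group" and "first match only" behaviour can be seen in a minimal sketch. The sample strings and the variable names (codes, parts, named) are made up for illustration and are not part of the original examples.

```python
import pandas as pd

# Hypothetical sample data: the first string contains two possible matches
codes = pd.Series(['a1 b2', 'c3', 'no match'])

# Two capture groups give two columns, labelled 0 and 1.
# Only the first match in each string is used, so 'b2' is ignored.
parts = codes.str.extract(r'([a-z])(\d)')
print(parts)

# Named groups are used as the column names
named = codes.str.extract(r'(?P<letter>[a-z])(?P<digit>\d)')
print(named)
```

Strings without a match, such as 'no match' here, produce NaN in every column.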

Syntax: Series.str.extract(pat, flags=0, expand=True)

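Parameters:
pat: str. Regular expression pattern with capturing groups.
flags: int, default 0 (no flags). Flags from the re module, such as re.IGNORECASE.
expand: bool, default True. If True, return a DataFrame with one column per capture group. If False, return a Series or Index if there is one capture group, or a DataFrame if there are multiple capture groups.

Returns: DataFrame or Series or Index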
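The effect of expand and flags can be illustrated with a short sketch. The city names and variable names below are assumptions chosen for the demonstration, not taken from the examples in this article.

```python
import re
import pandas as pd

# Hypothetical sample data with mixed letter case
cities = pd.Series(['Lisbon', 'PARIS', 'Oslo'])

# expand=True (the default): a DataFrame, even with a single capture group
as_frame = cities.str.extract(r'([aeiou].)')

# expand=False with a single capture group: a Series instead
as_series = cities.str.extract(r'([aeiou].)', expand=False)

# flags come from the re module; here matching ignores letter case
ignore_case = cities.str.extract(r'([aeiou].)', flags=re.IGNORECASE, expand=False)

print(type(as_frame).__name__)
print(type(as_series).__name__)
print(ignore_case)
```

Without re.IGNORECASE only 'Lisbon' matches, because 'PARIS' has no lowercase vowel and the only lowercase vowel in 'Oslo' is its last character, so nothing follows it.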

Example #1: Use Series.str.extract() to extract groups from the strings in the underlying data of a Series object.

# Import pandas as pd
import pandas as pd

# Import re for regular expressions
# (only needed when passing flags such as re.IGNORECASE)
import re

# Create the Series
sr = pd.Series(['New_York', 'Lisbon', 'Tokyo', 'Paris', 'Munich'])

# Create the index
idx = ['City 1', 'City 2', 'City 3', 'City 4', 'City 5']

# Set the index
sr.index = idx

# Print the Series
print(sr)

Output:

We will now use Series.str.extract() to extract groups from the strings in the given Series object.

# Extract groups consisting of a vowel followed by
# any character
result = sr.str.extract(pat='([aeiou].)')

# Print the result
print(result)

Output:

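As we can see from the output, Series.str.extract() returned a DataFrame with a single column, labelled 0, containing the extracted group for each city: 'ew', 'is', 'ok', 'ar' and 'un'. Only the first lowercase vowel and the character after it are captured from each string.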

Example #2: Use Series.str.extract() to extract groups from the strings in the underlying data of a given Series object.

# Import pandas as pd
import pandas as pd

# Import re for regular expressions
# (only needed when passing flags such as re.IGNORECASE)
import re

# Create the Series
sr = pd.Series(['Mike', 'Alessa', 'Nick', 'Kim', 'Britney'])

# Create the index
idx = ['Name 1', 'Name 2', 'Name 3', 'Name 4', 'Name 5']

# Set the index
sr.index = idx

# Print the Series
print(sr)

Output:

We will now use Series.str.extract() to extract groups from the strings in the given Series object.

# Extract groups consisting of a capital letter
# followed by 'i' and any other character
result = sr.str.extract(pat='([A-Z]i.)')

# Print the result
print(result)

Output:

As we can see from the output, Series.str.extract() returned a DataFrame with a single column, labelled 0, containing the extracted group: 'Mik', 'Nic' and 'Kim' for 'Mike', 'Nick' and 'Kim'. 'Alessa' and 'Britney' contain no capital letter followed by 'i', so their rows hold NaN.
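Elements that do not match the pattern come back as NaN, so a common follow-up step is to drop or filter them. The sketch below is one possible way to do that, assuming the capital-letter pattern described in Example 2; the variable names (names, matches, matched_only, matching_names) are illustrative.

```python
import pandas as pd

names = pd.Series(['Mike', 'Alessa', 'Nick', 'Kim', 'Britney'])

# expand=False returns a Series because the pattern has one capture group
matches = names.str.extract(r'([A-Z]i.)', expand=False)

# dropna() keeps only the rows where the pattern matched
matched_only = matches.dropna()
print(matched_only)

# notna() gives a boolean mask for filtering the original Series
matching_names = names[matches.notna()]
print(matching_names)
```

Using expand=False here is a design choice: a Series is easier to use as a mask than a one-column DataFrame.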




