
Split string into columns using regex in pandas DataFrame

Method #1:
In this method we will use re.search(pattern, string, flags=0). Here, pattern refers to the pattern we want to find. It is a string that can contain the following special sequences:

  • \w matches alphanumeric characters and the underscore
  • \d matches digits, which means 0-9
  • \s matches whitespace
  • \S matches non-whitespace characters
  • . matches any character other than the newline character \n
  • * matches 0 or more instances of the preceding pattern
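
As a quick illustration of these sequences, here is a minimal sketch that applies each one to a single string. The sample string is made up for this example (it only mimics the shape of the movie data used below), and re.findall is used alongside re.search purely to show every match at once.

```python
import re

# Hypothetical sample line, shaped like the movie data used in this article
sample = "Name: Bird_Box Year: 2018"

words = re.findall(r"\w+", sample)     # runs of letters, digits and underscores
digits = re.findall(r"\d", sample)     # individual digits 0-9
spaces = re.findall(r"\s", sample)     # whitespace characters
chunks = re.findall(r"\S+", sample)    # runs of non-whitespace characters
rest = re.search(r"Year:.*", sample)   # . followed by * runs to the end of the line

print(words)         # ['Name', 'Bird_Box', 'Year', '2018']
print(digits)        # ['2', '0', '1', '8']
print(len(spaces))   # 3
print(chunks)        # ['Name:', 'Bird_Box', 'Year:', '2018']
print(rest.group())  # Year: 2018
```

Note that \w treats the underscore as a word character, which is why Bird_Box comes back as a single match.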

# Import pandas and the regex library
import pandas as pd
import re

# Create a list with all lines
movie_data = ["Name: The_Godfather Year: 1972 Rating: 9.2",
              "Name: Bird_Box Year: 2018 Rating: 6.8",
              "Name: Fight_Club Year: 1999 Rating: 8.8"]

# Create a dictionary with the required columns
# Used later to convert to DataFrame
movies = {"Name": [], "Year": [], "Rating": []}

for item in movie_data:

    # For the name field
    name_field = re.search(r"Name:.*", item)
    if name_field is not None:
        name = re.search(r"\w*\s\w*", name_field.group())
    else:
        name = None
    # strip() removes the leading space; store None if nothing matched
    movies["Name"].append(name.group().strip() if name is not None else None)

    # For the year field
    year_field = re.search(r"Year:.*", item)
    if year_field is not None:
        year = re.search(r"\s\d\d\d\d", year_field.group())
    else:
        year = None
    movies["Year"].append(year.group().strip() if year is not None else None)

    # For the rating field
    rating_field = re.search(r"Rating:.*", item)
    if rating_field is not None:
        rating = re.search(r"\s\d\.\d", rating_field.group())
    else:
        rating = None
    movies["Rating"].append(rating.group().strip() if rating is not None else None)

# Create DataFrame
df = pd.DataFrame(movies)
print(df)

Output:

            Name  Year  Rating
0  The_Godfather  1972     9.2
1       Bird_Box  2018     6.8
2     Fight_Club  1999     8.8

Explanation:

  • In the above code, we use a for loop to iterate over the movie data so that we can work with each movie in turn. We create a dictionary, movies, that will store the details of every movie: name, year and rating.
  • We then find the complete Name field using re.search(). The . means any character except \n, and * extends the match to the end of the line. The result is assigned to the variable name_field.
  • But data is not always straightforward and may contain surprises. For example, what if there is no Name: field? re.search() would return None, and calling group() on None raises an error that stops the script. We anticipate this by checking for None.
  • We use re.search() again to extract the final required string from name_field. For the name, we use \w*\s\w*. Because \w cannot match the colon, the match begins at the space after Name:, and the second \w* captures the title (the underscore counts as a word character). The leading space can be removed with strip().
  • Do the same for the year and the rating to get the final required values.
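
The two-step lookup and the None check described above can be folded into one small function. The following is only a sketch: extract_field is a hypothetical helper name that does not appear in the original script, and the value patterns passed to it are assumptions chosen to fit the sample line.

```python
import re

def extract_field(label, value_pattern, line):
    """Return the value that follows 'label:' in line, or None if it is absent.

    extract_field is a hypothetical helper, not part of the original script.
    """
    # Step 1: grab everything from the label to the end of the line
    field = re.search(re.escape(label) + r":.*", line)
    if field is None:
        return None
    # Step 2: pull the value itself out of that field
    value = re.search(value_pattern, field.group())
    return value.group().strip() if value is not None else None

line = "Name: Fight_Club Year: 1999 Rating: 8.8"
name = extract_field("Name", r"\s\w+", line)
year = extract_field("Year", r"\s\d\d\d\d", line)
missing = extract_field("Director", r"\s\w+", line)  # no such field in the line

print(name, year, missing)  # Fight_Club 1999 None
```

Returning None instead of raising lets the loop keep going when a field is missing, and pandas will show the gap as a missing value in the resulting column.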

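Method #2:
To split a string, we will use Series.str.extract(pat, flags=0, expand=True). Here, pat refers to the pattern we want to find. It must contain at least one capture group, written in parentheses; the text matched by the group is what gets returned.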

import pandas as pd

# Raw movie strings; "data" avoids shadowing the built-in dict
data = {'movie_data': ['The Godfather 1972 9.2',
                       'Bird Box 2018 6.8',
                       'Fight Club 1999 8.8']}

# Convert dictionary to data frame
df = pd.DataFrame(data)

# Extract name from string: two words separated by whitespace
df['Name'] = df['movie_data'].str.extract(r'(\w*\s\w*)', expand=True)

# Extract year from string: four consecutive digits
df['Year'] = df['movie_data'].str.extract(r'(\d\d\d\d)', expand=True)

# Extract rating from string: digit, literal dot, digit
df['Rating'] = df['movie_data'].str.extract(r'(\d\.\d)', expand=True)

print(df)

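Output:

               movie_data           Name  Year  Rating
0  The Godfather 1972 9.2  The Godfather  1972     9.2
1       Bird Box 2018 6.8       Bird Box  2018     6.8
2     Fight Club 1999 8.8     Fight Club  1999     8.8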
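
As a variation on Method #2, all three columns can be pulled out in a single call by giving str.extract one pattern with several named capture groups; each group name becomes a column name in the result. This is a sketch rather than the approach the article uses, and the combined pattern is an assumption that every row follows the "two words, year, rating" layout of the sample data.

```python
import pandas as pd

df = pd.DataFrame({"movie_data": ["The Godfather 1972 9.2",
                                  "Bird Box 2018 6.8",
                                  "Fight Club 1999 8.8"]})

# One pattern with three named groups; each group name becomes a column
pattern = r"(?P<Name>\w+\s\w+)\s(?P<Year>\d{4})\s(?P<Rating>\d\.\d)"
parts = df["movie_data"].str.extract(pattern)

# Attach the extracted columns to the original frame (aligned on the index)
result = df.join(parts)
print(result)
```

Rows that do not match the whole pattern get missing values in all three columns, so this form is stricter than extracting each column separately. The extracted values are strings; convert them with astype if numeric columns are needed.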
