Python | Pandas: Reverse splitting strings into lists / columns using str.rsplit()



Pandas provides a method to split a string around a passed separator or delimiter. The resulting pieces can be stored as a list in a series, or they can be used to create a multi-column data frame from a single column of strings. rsplit() works like .split(), but it starts splitting from the right side. This is especially useful when the separator occurs more than once in a string.
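The difference between the two directions only shows up when the number of splits is limited. A minimal sketch, assuming a small hand-made series (the sample strings are hypothetical and not taken from the NBA file):

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
s = pd.Series(["a-b-c", "x-y"])

left = s.str.split("-", n=1)    # one split, counted from the left
right = s.str.rsplit("-", n=1)  # one split, counted from the right

print(left.tolist())   # [['a', 'b-c'], ['x', 'y']]
print(right.tolist())  # [['a-b', 'c'], ['x', 'y']]
```

When a string contains the separator only once, both methods give the same result.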

.str has to be prefixed every time before calling this method, to differentiate it from Python's built-in string method. A series has no rsplit() method of its own, so calling it without .str raises an error.
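The role of the .str accessor can be checked directly. The sketch below assumes a hypothetical one-row series and only inspects which object carries the method:

```python
import pandas as pd

# Hypothetical one-row series, used only for illustration
s = pd.Series(["Boston Celtics"])

# The series itself does not expose rsplit; the .str accessor does
has_direct = hasattr(s, "rsplit")
has_accessor = hasattr(s.str, "rsplit")

print(has_direct, has_accessor)  # False True
```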

Syntax:
Series.str.rsplit(pat=None, n=-1, expand=False)

Parameters:
pat: String value, the separator or delimiter to split the string at. If None, the string is split on whitespace.
n: Maximum number of splits to make in a single string. The default is -1, which means all.
expand: Boolean value. If True, returns a data frame with each piece in a separate column. Otherwise returns a series of lists of strings.

Return type: Series of lists or data frame, depending on the expand parameter
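How the parameters interact can be seen on a small case. This is a sketch on a hypothetical two-row series, not on the NBA data:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
s = pd.Series(["a b c", "d e"])

# expand=False: a series whose elements are lists
as_lists = s.str.rsplit(" ", n=1, expand=False)

# expand=True: a data frame with one column per piece
as_frame = s.str.rsplit(" ", n=1, expand=True)

# default n=-1: split at every separator, so the longest row sets the width
all_splits = s.str.rsplit(" ", expand=True)

print(as_lists.tolist())  # [['a b', 'c'], ['d', 'e']]
print(as_frame.shape)     # (2, 2)
print(all_splits.shape)   # (2, 3)
```

Rows that yield fewer pieces than the widest row are padded with missing values when expand=True.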


In the following examples, the data frame used contains data about some NBA players. An image of the data frame before any operations is attached below. 

Example #1: Splitting a string from the right side into a list

In this example, each string in the Team column is split at an occurrence of "t". The parameter n is kept at 1, so at most one split is made per string. Since rsplit() is used, the split is made at the last "t", counting from the right side.

# import pandas module
import pandas as pd

# read CSV file from URL
data = pd.read_csv("https://media.python.engineering/wp-content/uploads/nba.csv")

# drop rows with null values to avoid errors
data.dropna(inplace=True)

# replace the Team column with lists split at the last "t"
data["Team"] = data["Team"].str.rsplit("t", n=1, expand=False)

# display
data

Output:
As shown in the output image, the string was split at the "t" in "Celtics" and not at the "t" in "Boston". This is because the split was made from the right. Since the expand parameter was left False, a list was returned.
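The same behaviour can be reproduced without downloading the file. A minimal sketch, assuming two hand-typed team names in place of the Team column:

```python
import pandas as pd

# Hand-typed stand-in for the Team column (an assumption, not the real file)
teams = pd.Series(["Boston Celtics", "Brooklyn Nets"])

# One split per string, made at the last "t"
parts = teams.str.rsplit("t", n=1, expand=False)

print(parts.tolist())
# [['Boston Cel', 'ics'], ['Brooklyn Ne', 's']]
```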

Example #2: Creating separate columns from a string using .rsplit()

In this example, the Name column is split at a space (" ") and the expand parameter is set to True, which means a data frame is returned with each piece in its own column. That data frame is then used to create new columns, and the old Name column is dropped using the .drop() method.

The n parameter is kept at 1 because a name can also contain a middle name (more than one space per string). In this case rsplit() is useful because it counts from the right side, so with a maximum of one split the middle name stays with the first name in the First Name column.

# import pandas module
import pandas as pd

# read CSV file from URL
data = pd.read_csv("https://media.python.engineering/wp-content/uploads/nba.csv")

# drop rows with null values to avoid errors
data.dropna(inplace=True)

# new data frame with the name split once, from the right, at a space
new = data["Name"].str.rsplit(" ", n=1, expand=True)

# create a separate first name column from the new data frame
data["First Name"] = new[0]

# create a separate last name column from the new data frame
data["Last Name"] = new[1]

# remove the old Name column
data.drop(columns=["Name"], inplace=True)

# display
data

Output:
As shown in the output image, two new columns have been created and the old Name column has been removed.
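The effect of counting from the right is easiest to see on a name with a middle name. The sketch below uses a hypothetical two-row data frame (the names are made up) and compares .rsplit() with .split() for the same n:

```python
import pandas as pd

# Made-up names, used only for illustration
df = pd.DataFrame({"Name": ["John Smith", "Mary Ann Lee"]})

# Split once from the right: the middle name stays with the first name
from_right = df["Name"].str.rsplit(" ", n=1, expand=True)

# Split once from the left: the middle name goes with the last name
from_left = df["Name"].str.split(" ", n=1, expand=True)

print(from_right.values.tolist())
# [['John', 'Smith'], ['Mary Ann', 'Lee']]
print(from_left.values.tolist())
# [['John', 'Smith'], ['Mary', 'Ann Lee']]
```

Which direction is appropriate depends on which column the extra words should end up in.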