Pandas provides a method to split a string around a passed separator or delimiter. The resulting pieces can then be stored as a list in a Series, or used to create a multi-column data frame from a single string.
rsplit() works similarly to .split(), but rsplit() starts splitting from the right side. This function is also useful when the separator/delimiter occurs more than once.
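The difference between the two directions only shows up when the number of splits is limited. A minimal sketch, assuming a small made-up Series (the values are illustrative and not taken from the article's data):

```python
import pandas as pd

# Illustrative strings in which the delimiter "_" occurs more than once
s = pd.Series(["a_b_c", "x_y_z"])

# With n=1, split() cuts at the first "_" counted from the left ...
left = s.str.split("_", n=1)
# ... while rsplit() cuts at the first "_" counted from the right
right = s.str.rsplit("_", n=1)

print(left.tolist())   # [['a', 'b_c'], ['x', 'y_z']]
print(right.tolist())  # [['a_b', 'c'], ['x_y', 'z']]
```

Without a limit on n, both methods return the same pieces.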
.str must be prefixed each time before calling this method to distinguish it from Python's built-in string function of the same name. A Series has no rsplit() method of its own, so calling it without .str throws an error.
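As a quick illustration of why the prefix is needed, the sketch below calls rsplit() with and without .str on a small made-up Series (the team names are stand-in values):

```python
import pandas as pd

teams = pd.Series(["Boston Celtics", "Utah Jazz"])

# Calling rsplit() directly on the Series fails: the method only
# exists on the .str accessor, not on the Series object itself
try:
    teams.rsplit(" ")
    raised = False
except AttributeError:
    raised = True

# With the .str prefix, the split is applied to every string element
parts = teams.str.rsplit(" ")

print(raised)          # True
print(parts.tolist())  # [['Boston', 'Celtics'], ['Utah', 'Jazz']]
```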
Syntax: Series.str.rsplit(pat=None, n=-1, expand=False)
pat: String value, separator or delimiter to separate string at.
n: Maximum number of splits to make in a single string; the default is -1, which means all.
expand: Boolean value. If True, returns a data frame with each piece in a separate column; otherwise returns a Series of lists of strings.
Return type: Series of lists or data frame, depending on the expand parameter.
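The effect of n and expand can be seen side by side in a short sketch. The date-like strings below are made up purely for illustration:

```python
import pandas as pd

s = pd.Series(["2021-03-15", "2022-11-02"])

# Default n=-1: split at every occurrence of the delimiter
all_parts = s.str.rsplit("-")

# n=1: at most one split per string, counted from the right
one_split = s.str.rsplit("-", n=1)

# expand=True: the pieces become columns of a data frame
frame = s.str.rsplit("-", n=1, expand=True)

print(all_parts.tolist())  # [['2021', '03', '15'], ['2022', '11', '02']]
print(one_split.tolist())  # [['2021-03', '15'], ['2022-11', '02']]
print(type(frame).__name__, frame.shape)  # DataFrame (2, 2)
```

With expand=True the resulting columns are labeled 0, 1, ... in left-to-right order.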
In the following examples, the data frame used contains data about some NBA players. An image of the data frame before any operations is attached below.
Example #1: Splitting a string from the right side into a list
In this example, the string in the Team column is split at an occurrence of "t". The parameter n is kept at 1, so the maximum number of splits per string is 1. Since rsplit() is used, the string is split from the right side.
As shown in the output image, the string was split at the "t" in "Celtics" and not at the "t" in "Boston". This is because the splitting proceeds from right to left and stops after one split. Since the expand parameter was left False, a list was returned.
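A minimal sketch of this example follows. Because the CSV file is not available here, it builds a small stand-in data frame by hand; the rows are assumptions chosen to resemble the NBA data described above, not the actual file contents.

```python
import pandas as pd

# Stand-in for the NBA data frame (assumed rows, not the real CSV)
data = pd.DataFrame({
    "Name": ["Avery Bradley", "Jae Crowder", "Trey Burke"],
    "Team": ["Boston Celtics", "Boston Celtics", "Utah Jazz"],
})

# Split each team name at "t", at most once, starting from the right
data["Team"] = data["Team"].str.rsplit("t", n=1)

print(data["Team"].tolist())
# [['Boston Cel', 'ics'], ['Boston Cel', 'ics'], ['U', 'ah Jazz']]
```

Each cell of the Team column now holds a Python list, which is what expand=False produces.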
Example #2: Creating separate columns from a string using .rsplit()
In this example, the Name column is split at a space (" ") and the expand parameter is set to True, which means a data frame is returned with each piece of the string in its own column. That data frame is then used to create new columns, and the old Name column is dropped using the .drop() method.
The n parameter is kept at 1 because there can also be middle names (more than one space per string). In this case rsplit() is useful because it counts from the right side: with the maximum number of splits kept at 1, only the last name is split off, and any middle name stays in the first name column.
As shown in the output image, two new columns have been created and the old Name column has been removed.
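The steps of this example can be sketched as follows. The data frame is again a hand-built stand-in for the CSV, the third name is hypothetical and added only to show the middle-name case, and the new column names "First Name" and "Last Name" are assumptions.

```python
import pandas as pd

# Stand-in data; "John Michael Smith" is a hypothetical entry
data = pd.DataFrame({
    "Name": ["Avery Bradley", "Jae Crowder", "John Michael Smith"],
    "Team": ["Boston Celtics", "Boston Celtics", "Utah Jazz"],
})

# Split once at the last space; expand=True returns a two-column frame
new = data["Name"].str.rsplit(" ", n=1, expand=True)

# Build the new columns from the split pieces
data["First Name"] = new[0]
data["Last Name"] = new[1]

# Drop the original Name column
data.drop(columns=["Name"], inplace=True)

print(data["First Name"].tolist())  # ['Avery', 'Jae', 'John Michael']
print(data["Last Name"].tolist())   # ['Bradley', 'Crowder', 'Smith']
```

Using split() with n=1 instead would have put "Michael Smith" in the last name column for the third row.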