It works similarly to Python's built-in split() method, which operates on a single string, whereas Series.str.split() can be applied to an entire series. The .str accessor must be prefixed every time before calling this method to distinguish it from Python's built-in function; otherwise an error is raised.
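To illustrate the difference, here is a minimal sketch that contrasts the built-in method with the pandas accessor. The sample names are illustrative values chosen for this sketch, not taken from the CSV.

```python
import pandas as pd

# Built-in str.split() works on one string at a time.
single = "Avery Bradley"
print(single.split(" "))  # ['Avery', 'Bradley']

# Series.str.split() applies the same operation to every element.
names = pd.Series(["Avery Bradley", "Jae Crowder"])  # illustrative data
print(names.str.split(" ").tolist())
# [['Avery', 'Bradley'], ['Jae', 'Crowder']]

# Calling split() directly on the Series, without .str, raises an error.
try:
    names.split(" ")
except AttributeError as err:
    print("AttributeError:", err)
```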
Syntax: Series.str.split(pat=None, n=-1, expand=False)
Parameters:
pat: String value, the separator or delimiter at which to split each string. If not given, strings are split on whitespace.
n: Maximum number of splits to make in a single string. The default is -1, which means all.
expand: Boolean value. If True, returns a data frame with each piece in a separate column. Otherwise it returns a series containing lists of strings.
Return Type: Series of lists or data frame, depending on the expand parameter.
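The effect of each parameter can be sketched on a tiny series. The hyphen-separated strings below are made up for illustration.

```python
import pandas as pd

s = pd.Series(["a-b-c", "d-e-f"])  # illustrative data

as_lists = s.str.split("-")               # expand=False: Series of lists
limited = s.str.split("-", n=1)           # at most one split per string
as_frame = s.str.split("-", expand=True)  # expand=True: DataFrame

print(as_lists.tolist())  # [['a', 'b', 'c'], ['d', 'e', 'f']]
print(limited.tolist())   # [['a', 'b-c'], ['d', 'e-f']]
print(type(as_frame).__name__, as_frame.shape)  # DataFrame (2, 3)
```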
In the following examples, the data frame used contains details of some NBA players. An image of the data frame before any operations is attached below.
Example #1: Splitting a string into a list
In this example, the split function is used to split the "Team" column at each "t". The parameter n is set to 1, so the maximum number of splits per string is 1. The expand parameter is False, so a series containing lists of strings is returned instead of a data frame.
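A minimal sketch of this example follows. It assumes a small hand-built data frame in place of the NBA CSV, so the rows below are illustrative stand-ins rather than the actual file contents.

```python
import pandas as pd

# Illustrative stand-in for the NBA data (assumption, not the real CSV).
data = pd.DataFrame({
    "Name": ["Avery Bradley", "Rudy Gobert"],
    "Team": ["Boston Celtics", "Utah Jazz"],
})

# Split each team name at the first "t" only (n=1) and keep the
# pieces as a list in the same column (expand=False).
data["Team"] = data["Team"].str.split("t", n=1, expand=False)

print(data["Team"].tolist())
# [['Bos', 'on Celtics'], ['U', 'ah Jazz']]
```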
Output:
As shown in the output image, the Team column now holds a list in each row. Each string was split on the first occurrence of "t" and not on subsequent occurrences, because the parameter n was set to 1 (at most one split per string).
Example #2: Creating separate columns from a string
In this example, the Name column is split at a space (" ") and the expand parameter is set to True, which means a data frame is returned with the separated strings in different columns. That data frame is then used to create new columns, and the old Name column is dropped using the .drop() method.
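A minimal sketch of this example follows, again assuming a small hand-built data frame instead of the NBA CSV. Using n=1 is an assumption made here so that every name yields exactly two pieces, and the variable name `new` is chosen for illustration.

```python
import pandas as pd

# Illustrative stand-in for the NBA data (assumption, not the real CSV).
data = pd.DataFrame({
    "Name": ["Avery Bradley", "Jae Crowder"],
    "Team": ["Boston Celtics", "Boston Celtics"],
})

# expand=True returns a DataFrame with one column per piece.
new = data["Name"].str.split(" ", n=1, expand=True)

# Use the pieces to build two new columns.
data["First Name"] = new[0]
data["Last Name"] = new[1]

# Drop the original column now that it has been split.
data = data.drop(columns=["Name"])

print(list(data.columns))  # ['Team', 'First Name', 'Last Name']
```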
Output:
As shown in the output image, str.split() returned a new data frame, which was used to create two new columns (First Name and Last Name) in the original data frame.
New dataframe
Data frame with added columns