The Pandas function DataFrame.groupby() is used to split data into groups based on some criteria. Pandas objects can be split along any of their axes. In the abstract, grouping means providing a mapping from labels to group names.
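The idea of mapping labels to group names can be illustrated with a minimal sketch. The table, the `mapping` dict, and the group names below are hypothetical and chosen only for illustration; the sketch relies on `groupby()` accepting either a dict or a function that is applied to the index labels.

```python
import pandas as pd

# Hypothetical table with string index labels.
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=["a", "b", "c", "d"])

# A dict maps each index label to a group name.
mapping = {"a": "vowel", "b": "consonant", "c": "consonant", "d": "consonant"}
by_dict = df.groupby(mapping).sum()

# A function is called on each index label and returns the group name.
by_func = df.groupby(lambda label: "early" if label < "c" else "late").sum()

print(by_dict)
print(by_func)
```

In both cases the result is indexed by group name rather than by the original labels.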
Syntax: DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, **kwargs)
Parameters:
by: mapping, function, str, or iterable. Used to determine the groups.
axis: int, default 0. Split along rows (0) or columns (1).
level: If the axis is a MultiIndex (hierarchical), group by a particular level or levels.
as_index: For aggregated output, return an object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output.
sort: Sort group keys. Turning this off gives better performance. This does not affect the order of observations within each group; groupby preserves the order of rows within each group.
group_keys: When calling apply, add group keys to the index to identify pieces.
squeeze: Reduce the dimensionality of the return type if possible; otherwise return a consistent type. Note that squeeze was deprecated in pandas 1.1 and removed in pandas 2.0, so the signature above reflects older releases.
Returns: GroupBy object
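The effect of as_index and sort is easiest to see side by side. The following is a minimal sketch, assuming a small hypothetical table with `Team` and `Salary` columns (the values are made up):

```python
import pandas as pd

# Hypothetical sample data; values are illustrative only.
df = pd.DataFrame({
    "Team": ["Celtics", "Nets", "Celtics", "Nets", "Bulls"],
    "Salary": [100, 200, 300, 400, 500],
})

# Default: group labels become the index and the keys are sorted.
indexed = df.groupby("Team")["Salary"].sum()

# as_index=False: group labels stay in a regular column ("SQL-style" output).
flat = df.groupby("Team", as_index=False)["Salary"].sum()

# sort=False: groups appear in order of first appearance in the data.
unsorted = df.groupby("Team", sort=False)["Salary"].sum()

print(indexed)
print(flat)
print(unsorted)
```

With the defaults the result is a Series indexed by team name, while as_index=False yields a DataFrame with `Team` as an ordinary column.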
The examples below load their data from a CSV file.
Example #1: Use groupby() to group data based on Team.
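As a minimal sketch of the setup step, assume a small hand-made table stands in for the CSV file. Only the `Team` column name comes from the text; the other column names and all values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data. With a real file, the frame
# would instead be loaded with pd.read_csv() and the path to that file.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid", "Dee", "Eve"],
    "Team": ["Celtics", "Nets", "Celtics", "Bulls", "Nets"],
    "Position": ["PG", "SG", "C", "C", "SG"],
    "Salary": [100, 200, 300, 400, 500],
})

# Inspect the data before grouping.
print(df)
```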
Now apply groupby().
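A minimal sketch of this step, again using a hypothetical in-memory table in place of the CSV data. The variable name `gk` and every column other than `Team` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid", "Dee", "Eve"],
    "Team": ["Celtics", "Nets", "Celtics", "Bulls", "Nets"],
    "Position": ["PG", "SG", "C", "C", "SG"],
    "Salary": [100, 200, 300, 400, 500],
})

# Group the rows by the values in the Team column.
gk = df.groupby("Team")

# first() returns the first entry of each group, one row per team.
print(gk.first())

# size() reports how many rows fall into each group.
print(gk.size())
```

The GroupBy object itself is lazy: nothing is computed until a method such as first(), size(), or an aggregation is called on it.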
Let’s print the rows contained in any one of the groups. To do this, pass the name of the group (here, a team name) to the get_group() function, which returns the records belonging to that group.
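A minimal sketch of get_group(), assuming the same hypothetical table as a stand-in for the CSV data. The team name "Celtics" is simply one of the made-up values.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid", "Dee", "Eve"],
    "Team": ["Celtics", "Nets", "Celtics", "Bulls", "Nets"],
    "Position": ["PG", "SG", "C", "C", "SG"],
    "Salary": [100, 200, 300, 400, 500],
})

gk = df.groupby("Team")

# Select every row whose Team value is "Celtics".
celtics = gk.get_group("Celtics")
print(celtics)
```

Asking for a name that is not among the groups raises a KeyError, so it can be useful to check `gk.groups` first.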
Example #2: Use groupby() to form groups based on more than one category (i.e. use more than one column to perform the split).
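A minimal sketch of grouping by more than one column. The second key, `Position`, is a hypothetical column chosen for illustration, as are the variable name `gkk` and the data values.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid", "Dee", "Eve"],
    "Team": ["Celtics", "Nets", "Celtics", "Bulls", "Nets"],
    "Position": ["PG", "SG", "C", "C", "SG"],
    "Salary": [100, 200, 300, 400, 500],
})

# Passing a list of column names creates one group per unique
# (Team, Position) combination that occurs in the data.
gkk = df.groupby(["Team", "Position"])

# The result is indexed by a MultiIndex of (Team, Position).
print(gkk.first())

# With several keys, a single group is selected with a tuple.
print(gkk.get_group(("Nets", "SG")))
```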
groupby() is a very powerful function with many variations. It makes the task of splitting a DataFrame according to some criteria simple and efficient.