Python | Pandas dataframe.resample()


The Pandas dataframe.resample() function is mainly used for time series data.
A time series is a series of data points indexed (or listed or plotted) in time order. Most often, a time series is a sequence taken at successive, equally spaced points in time. resample() is a convenience method for frequency conversion and resampling of time series. The object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or datetime-like values must be passed through the on or level keyword.
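The index requirement can be sketched on a small, made-up DataFrame. The column names ("date", "price") and the values are assumptions chosen only for illustration. The sketch shows the two routes described above: resampling on a DatetimeIndex, and pointing resample() at a datetime column with the on keyword.

```python
import pandas as pd

# Six daily observations; the values are invented for illustration.
df = pd.DataFrame(
    {
        "date": pd.date_range("2024-01-01", periods=6, freq="D"),
        "price": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
    }
)

# Route 1: move the datetime column into the index, then resample.
by_index = df.set_index("date")["price"].resample("3D").mean()

# Route 2: keep the column where it is and name it with `on`.
by_column = df.resample("3D", on="date")["price"].mean()

print(by_index)
print(by_column)
```

Both routes should group the six days into the same two 3-day bins.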

Syntax: DataFrame.resample(rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0, on=None, level=None)

Parameters:
rule: the offset string or object representing the target frequency
axis: int, optional, default 0
closed: {'right', 'left'}, which side of each bin interval is closed
label: {'right', 'left'}, which bin edge is used to label each bin
convention: For PeriodIndex only, controls whether to use the start or end of rule
loffset: Adjust the resampled time labels
base: For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for '5min' frequency, base could range from 0 through 4. Defaults to 0.
on: For a DataFrame, column to use instead of index for resampling. Column must be datetime-like.
level: For a MultiIndex, level (name or number) to use for resampling. Level must be datetime-like.
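To make the closed and label parameters more concrete, here is a minimal sketch on an invented minute-level series (the timestamps and values are assumptions for illustration only). It compares the default left-closed, left-labeled 3-minute bins with right-closed, right-labeled ones.

```python
import pandas as pd

# Six one-minute observations starting at midnight; values are made up.
idx = pd.date_range("2024-01-01 00:00", periods=6, freq="min")
s = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Default for "3min": bins are [00:00, 00:03), [00:03, 00:06),
# each labeled with its left edge.
left = s.resample("3min").sum()

# Right-closed bins are (23:57, 00:00], (00:00, 00:03], (00:03, 00:06],
# each labeled with its right edge.
right = s.resample("3min", closed="right", label="right").sum()

print(left)
print(right)
```

With right-closed bins, the observation at exactly 00:00 falls into a bin of its own that ends at 00:00, which is why the second result has three rows instead of two.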

Resampling regroups the observations into time bins of a new frequency and then summarizes each bin, for example with a mean or a sum. We can apply different frequencies to resample our time series data. It is a widely used method in time series analytics.
Most commonly used time series frequencies:
W: weekly frequency
M: month-end frequency
SM: semi-month-end frequency (15th and end of month)
Q: quarter-end frequency
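As a rough illustration of the weekly, month-end and quarter-end frequencies listed above, the sketch below resamples invented daily data covering January through March 2024. It passes offset objects (pd.offsets.MonthEnd, pd.offsets.QuarterEnd) instead of the 'M' and 'Q' strings. This is a deliberate choice, on the assumption that the string spellings differ between pandas versions (newer releases, 2.2 and later as far as I know, prefer 'ME' and 'QE'), while the offset objects should behave the same across versions.

```python
import pandas as pd

# 91 daily values for 2024-01-01 .. 2024-03-31; the numbers are made up.
days = pd.date_range("2024-01-01", periods=91, freq="D")
s = pd.Series(range(91), index=days, dtype="float64")

# Weekly bins (weeks ending on Sunday).
weekly = s.resample("W").mean()

# Month-end bins, via an offset object rather than the "M" string.
monthly = s.resample(pd.offsets.MonthEnd()).mean()

# Quarter-end bins (quarters ending in March, June, September, December).
quarterly = s.resample(pd.offsets.QuarterEnd(startingMonth=12)).mean()

print(len(weekly), len(monthly), len(quarterly))
print(monthly)
```

The coarser the frequency, the fewer rows come back: the same 91 days collapse into 13 weekly values, 3 monthly values, or a single quarterly value.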

There are many other frequency aliases available. Let's see how to apply these frequencies to the data and change its frequency.

The examples below use a CSV file, apple.csv, which holds Apple stock price data for a period of 1 year, from 13-11-17 to 13-11-18.
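Since the CSV file itself is not reproduced here, a small in-memory stand-in can show what parse_dates and index_col do when the file is loaded. The column names (date, open, close) follow the examples in this article, but the rows and prices below are invented, and reading from io.StringIO instead of a file is an assumption made only to keep the sketch self-contained.

```python
import io

import pandas as pd

# Hypothetical stand-in for apple.csv; the prices are made up.
csv_text = """date,open,close
2017-11-13,100.0,101.0
2017-11-14,101.0,103.0
2017-12-01,104.0,102.0
2017-12-04,102.0,106.0
"""

# parse_dates converts the "date" strings to datetimes;
# index_col makes that column the DataFrame index.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

# The index is now a DatetimeIndex, which resample() can work with.
print(type(df.index).__name__)

# Monthly mean of the closing price, using a month-end offset object.
monthly_close = df["close"].resample(pd.offsets.MonthEnd()).mean()
print(monthly_close)
```

Without parse_dates the date column would stay as plain strings, and resample() would raise an error because the index would not be datetime-like.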

Example #1: Resampling the data to a monthly frequency

# Import pandas
import pandas as pd

# By default, the "date" column is read as strings,
# so we need to convert it to a datetime type.
# parse_dates=["date"] parses the "date" column as datetimes.
# Resampling only works with a datetime-like index,
# so index_col="date" makes the "date" column the DataFrame index.
df = pd.read_csv("apple.csv", parse_dates=["date"], index_col="date")

# Print the first 10 rows of data
df[:10]

# Resample the time series data by month
# and apply it to the stock closing price.
# "M" stands for month-end frequency
# (pandas 2.2 and later spell this alias "ME")
monthly_resampled_data = df.close.resample('M').mean()

# The command above computes the average closing price
# of each month in the 1-year period.
monthly_resampled_data

Output:

Example #2: Resampling the data to a weekly frequency

# Import pandas
import pandas as pd

# Resampling only works with a datetime-like index,
# so index_col="date" makes the "date" column the DataFrame index.
df = pd.read_csv("apple.csv", parse_dates=["date"], index_col="date")

# Resample the time series data to a weekly frequency
# and apply it to the opening price of the stock.
# 'W' stands for week
weekly_resampled_data = df.open.resample('W').mean()

# The average opening price of each week
# in the 1-year period.
weekly_resampled_data

Output:

Example #3: Resampling the data to a quarterly frequency

# Import pandas
import pandas as pd

# Resampling only works with a datetime-like index,
# so index_col="date" makes the "date" column the DataFrame index.
df = pd.read_csv("apple.csv", parse_dates=["date"], index_col="date")

# Resample the time series data to a quarterly frequency.
# 'Q' stands for quarter-end frequency
# (pandas 2.2 and later spell this alias "QE")
quarterly_resampled_data = df.open.resample('Q').mean()

# The average opening price of each quarter
# in the 1-year period.
quarterly_resampled_data

Output:




