In this tutorial, we will learn about the built-in data visualization capabilities of pandas. They are built on top of matplotlib, but baked into pandas for easier use.
Let's take a look!
Install
The easiest way to install pandas is with pip:
pip install pandas
or download it from here
This article demonstrates how to use the built-in data visualization functions in pandas by building various types of charts.
The sample CSV files df1 and df2 used in this tutorial can be downloaded here.

Matplotlib has style sheets that you can use to make your plots look a little nicer. These style sheets include bmh, fivethirtyeight, ggplot and others. They define a set of style rules that your plots follow. We recommend using them: they give all your plots a consistent, more professional look. You can even create your own if you want all of your company's plots to look the same (although that is a bit tedious).
Here's how to use them.
Before calling plt.style.use(), plots look like this:

Output:
Apply a style by name:
After applying the ggplot style, the plots look like this:
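The code for this step is easy to reconstruct as a minimal sketch. The DataFrame below is a hypothetical stand-in for df1 (random data with an assumed column "A"), not the tutorial's actual CSV file.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for df1: 500 rows of random normal data
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.normal(size=(500, 4)), columns=["A", "B", "C", "D"])

# Style names accepted by plt.style.use are listed in plt.style.available
wanted = ["ggplot", "bmh", "dark_background", "fivethirtyeight"]
missing = [name for name in wanted if name not in plt.style.available]

# Apply a style sheet by name; it affects every plot created afterwards
plt.style.use("ggplot")
ax = df1["A"].hist()  # histogram with the default 10 bins

# In a script, call plt.show() to display the figure
```

Swapping "ggplot" for "bmh", "dark_background" or "fivethirtyeight" changes the look of the same histogram without touching the plotting call.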

Output:
The plots look like this after applying the bmh style:

Output:
The plots look like this after applying the dark_background style:

Output:
The plots look like this after applying the fivethirtyeight style:

Output:
There are several plot types built into pandas, most of them statistical in nature, including area, bar, barh, box, density, hexbin, hist, kde, line and scatter.
You can also simply call df.plot(kind='hist'), or replace the kind argument with any of the key terms shown in the list above (for example, 'box', 'barh', etc.). Let's go through them!
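As a small sketch of the two equivalent call styles, the example below draws the same histogram once through the kind argument and once through the accessor method. The DataFrame is hypothetical random data, used only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two columns of random normal values
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(200, 2)), columns=["A", "B"])

# Generic entry point: the plot type is chosen with the kind argument
ax_kind = df.plot(kind="hist", bins=20)

# Accessor method: the same plot type, spelled as a method name
ax_attr = df.plot.hist(bins=20)
```

Both calls return a matplotlib Axes object, so either result can be customized further with the usual matplotlib methods.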
An area chart or area graph displays quantitative data graphically. It is based on a line chart. The area between the axis and the line is usually highlighted with colors, textures, or shading. An area chart is commonly used to compare two or more quantities.
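A minimal sketch of an area chart follows. The DataFrame is a hypothetical stand-in for df2 with assumed column names; the values are kept non-negative because pandas stacks area plots by default.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df2: 10 rows of values in [0, 1)
rng = np.random.default_rng(2)
df2 = pd.DataFrame(rng.random((10, 4)), columns=["a", "b", "c", "d"])

# One filled band per column; alpha makes the bands semi-transparent
ax = df2.plot.area(alpha=0.4)
```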

Output:
A bar chart or bar graph is a chart that presents categorical data with rectangular bars whose heights or lengths are proportional to the values they represent. Bars can be plotted vertically or horizontally. A vertical bar chart is sometimes called a column chart.
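The sketch below shows grouped, stacked and horizontal bars. The DataFrame is a hypothetical stand-in for df2 (five rows, three assumed columns), where each row is treated as one category.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df2: each row is one category
rng = np.random.default_rng(3)
df2 = pd.DataFrame(rng.random((5, 3)), columns=["a", "b", "c"])

# Grouped vertical bars: one bar per cell, grouped by row
ax_bar = df2.plot.bar()

# Stacked bars: the columns of each row are piled on top of each other
ax_stacked = df2.plot.bar(stacked=True)

# Horizontal bars
ax_barh = df2.plot.barh()
```

In the stacked version, the total height of each bar equals the sum of that row's values.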

Output:

Output:

Output:
A histogram is a graph that lets you discover and show the underlying frequency distribution (shape) of a continuous dataset. It allows you to inspect the data for its underlying distribution (e.g. normal distribution), outliers, skewness, etc.
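A minimal histogram sketch, assuming a hypothetical stand-in for df1 with a normally distributed column "A". The bins argument controls how finely the value range is divided.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df1: 1000 normally distributed values
rng = np.random.default_rng(4)
df1 = pd.DataFrame({"A": rng.normal(size=1000)})

# 50 bins; each bar's height is the number of values in that bin
ax = df1["A"].plot.hist(bins=50)
```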

Output:
A line chart is a graph that connects a series of data points with line segments. It works best when the data is a time series. It's a quick and easy way to see trends in your data.
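A sketch of a line chart on time series data follows. The date range, the random-walk values and the column names are all hypothetical; pandas uses the DataFrame index as the x-axis by default.

```python
import numpy as np
import pandas as pd

# Hypothetical time series: 100 daily observations of a random walk
rng = np.random.default_rng(5)
dates = pd.date_range("2000-01-01", periods=100, freq="D")
df1 = pd.DataFrame(
    rng.normal(size=(100, 2)).cumsum(axis=0), index=dates, columns=["A", "B"]
)

# Plot column B against the index; figsize and lw are passed on to matplotlib
ax = df1.plot.line(y="B", figsize=(12, 3), lw=1)
```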

Output:
Scatter plots are used when you want to show the relationship between two variables. Scatterplots are sometimes called correlation plots because they show how two variables are correlated.
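A basic scatter plot can be sketched as follows, using hypothetical random data with assumed column names "A" and "B".

```python
import numpy as np
import pandas as pd

# Hypothetical data: 200 points with two numeric columns
rng = np.random.default_rng(6)
df1 = pd.DataFrame(rng.normal(size=(200, 2)), columns=["A", "B"])

# x and y name the columns to plot against each other
ax = df1.plot.scatter(x="A", y="B")
```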

Output:
You can use c to color the points based on the values of another column. Use cmap to specify the color map to use. For all color maps, check: http://matplotlib.org/users/colormaps.html
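As a sketch, the example below colors each point by a third column. The data and the column name "C" are hypothetical; "coolwarm" is one of matplotlib's built-in color maps.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column C drives the color of each point
rng = np.random.default_rng(7)
df1 = pd.DataFrame(rng.normal(size=(200, 3)), columns=["A", "B", "C"])

# c names the column used for color, cmap selects the color map
ax = df1.plot.scatter(x="A", y="B", c="C", cmap="coolwarm")
```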

Output:
Or use s to set the marker size based on another column. Here s is passed as an array of values, not as a column name:
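A sketch of size-scaled markers follows. The data is hypothetical and kept positive so that every marker size is valid; the scale factor 200 is an arbitrary choice for readability.

```python
import numpy as np
import pandas as pd

# Hypothetical data: positive values so they can serve as marker sizes
rng = np.random.default_rng(8)
df1 = pd.DataFrame(rng.random((50, 3)), columns=["A", "B", "C"])

# s takes one size per point (in points squared), here derived from column C
sizes = (df1["C"] * 200).to_numpy()
ax = df1.plot.scatter(x="A", y="B", s=sizes)
```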

Output:
A box plot is a graph that draws a rectangle to represent the second and third quartiles, usually with a vertical line inside to represent the median value. The lower and upper quartiles are shown with horizontal lines on either side of the rectangle.
A box plot is a standardized way to display the distribution of data based on a five-number summary (minimum, first quartile (Q1), median, third quartile (Q3), and maximum). It can tell you about your outliers and their values. It can also tell you whether your data is symmetric, how tightly it is grouped, and whether and how it is skewed.
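To make the five-number summary concrete, here is a small worked sketch on a hypothetical ten-value series. It computes the summary with pandas and flags outliers with the common 1.5 × IQR rule, which is also the default whisker length in matplotlib box plots.

```python
import pandas as pd

# Hypothetical data: nine evenly spaced values and one extreme value
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 50], name="a")

# Five-number summary: minimum, Q1, median, Q3, maximum
summary = values.quantile([0, 0.25, 0.5, 0.75, 1.0])

# Points beyond 1.5 * IQR from the quartiles are treated as outliers
iqr = summary.loc[0.75] - summary.loc[0.25]
lower_fence = summary.loc[0.25] - 1.5 * iqr
upper_fence = summary.loc[0.75] + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(summary.tolist())   # [1.0, 3.25, 5.5, 7.75, 50.0]
print(outliers.tolist())  # [50]

# The same information, drawn as a box plot
ax = values.plot.box()
```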
df2.plot.box()
# You can also pass a by= argument to group the data
Output:
Hexagonal binning is another way to handle the problem of having so many points that they start to overlap. It plots density rather than individual points. The points are binned into a grid of hexagons, and the distribution (the number of points per hexagon) is displayed using either the color or the area of the hexagons.
It is useful for bivariate data as an alternative to a scatter plot:
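A minimal hexbin sketch follows, using hypothetical random data with assumed column names "a" and "b". The gridsize argument sets the number of hexagons along the x-axis.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 1000 points, too dense for a readable scatter plot
rng = np.random.default_rng(9)
df = pd.DataFrame(rng.normal(size=(1000, 2)), columns=["a", "b"])

# Each hexagon is colored by the number of points that fall inside it
ax = df.plot.hexbin(x="a", y="b", gridsize=25, cmap="Oranges")

# The per-hexagon counts add up to the number of rows
total_points = int(ax.collections[0].get_array().sum())
```

A smaller gridsize gives larger hexagons and a coarser picture of the density.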

Output:
Kernel density estimation (KDE) is a technique that lets you create a smooth curve from a dataset.
This can be useful if you only want to visualize the "shape" of some data, as a kind of continuous replacement for a discrete histogram. It can also be used to generate points that look like they came from a specific dataset. This can power simple simulations in which the simulated objects are modeled on real data.
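A sketch of a KDE plot for a single column follows. The DataFrame is a hypothetical stand-in for df2. Note that pandas relies on SciPy for the density estimate, so SciPy must be installed for this to run.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df2: three columns of random normal data
rng = np.random.default_rng(10)
df2 = pd.DataFrame(rng.normal(size=(300, 3)), columns=["a", "b", "c"])

# Smooth density estimate for one column (requires SciPy)
ax = df2["a"].plot.kde()
```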

Output:
df2.plot.density()
Output:
That's it! Hopefully you can see why this way of plotting is much easier to use than full matplotlib: it balances ease of use with control over the figure. Many plot calls also accept the additional arguments of their underlying matplotlib plt call.
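As an illustration of that last point, the sketch below passes a few matplotlib line properties (lw, ls and alpha) straight through a pandas plot call. The data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a single random-walk column
rng = np.random.default_rng(11)
df = pd.DataFrame({"A": rng.normal(size=50).cumsum()})

# lw, ls and alpha are matplotlib line properties forwarded by pandas
ax = df.plot.line(y="A", lw=3, ls="--", alpha=0.5)
```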