Exploring Data Distribution | Set 2



Terms related to exploring data distribution

 -> Boxplot
 -> Frequency Table
 -> Histogram
 -> Density Plot


Loading Libraries

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Loading data

data = pd.read_csv("../data/state.csv")

# Adding new derived data column
data["PopulationInMillions"] = data["Population"] / 1000000

print(data.head(10))

Output:
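The snippets in this article read `state.csv` from disk. If that file is not at hand, a minimal sketch like the one below builds a small stand-in DataFrame so the derived-column step can be tried directly. The state names and population figures are hypothetical placeholders, not values from the real file; only the `Population` column and the `PopulationInMillions` derivation come from the code above.

```python
import pandas as pd

# Hypothetical stand-in for state.csv: the names and populations below
# are made up for illustration and are NOT taken from the real file
data = pd.DataFrame(
    {
        "State": ["State A", "State B", "State C", "State D"],
        "Population": [4_500_000, 710_000, 12_800_000, 2_950_000],
    }
)

# Derived column: population expressed in millions
data["PopulationInMillions"] = data["Population"] / 1_000_000

print(data.head(10))
```

The plotting code that follows only needs the `Population` and `PopulationInMillions` columns, so it should also run against this stand-in frame.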

  • Histogram: a way to visualize the distribution of data as a frequency table, with the bins (intervals of values) along the X-axis and the number of values falling in each bin (the count, or frequency) along the Y-axis.

    Code — histogram

    # Histogram of population in millions
    fig, ax2 = plt.subplots()
    fig.set_size_inches(9, 15)

    # sns.histplot replaces the deprecated sns.distplot(..., kde=False)
    sns.histplot(data.PopulationInMillions, ax=ax2)
    ax2.set_ylabel("Frequency", fontsize=15)
    ax2.set_xlabel("Population by State in Millions", fontsize=15)
    ax2.set_title("Population - Histogram", fontsize=20)
    plt.show()

    Output:

  • Density plot: closely related to the histogram, it shows the distribution of data values as a continuous line and can be thought of as a smoothed version of the histogram. The output below shows the density curve superimposed on the histogram.

    Code — Data density plot

    # Density plot - population
    fig, ax3 = plt.subplots()
    fig.set_size_inches(7, 9)

    # Histogram scaled to a density with a KDE curve superimposed;
    # replaces the deprecated sns.distplot(..., kde=True)
    sns.histplot(data.Population, kde=True, stat="density", ax=ax3)
    ax3.set_ylabel("Density", fontsize=15)
    ax3.set_xlabel("Population by State", fontsize=15)
    ax3.set_title("Density Plot - Population", fontsize=20)
    plt.show()

    Output:
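To see what the histogram bars actually count, the frequency table behind them can be computed with `np.histogram`, which returns the count in each bin together with the bin edges. This is a sketch on a hypothetical array of populations in millions (the values are made up for illustration), with explicitly chosen bin edges so the counts are easy to check by hand.

```python
import numpy as np

# Hypothetical populations in millions (illustrative values, not from state.csv)
values = np.array([0.6, 1.2, 1.9, 2.4, 3.1, 4.5, 5.0, 6.7, 9.9, 12.8])

# Explicit bin edges: [0, 3), [3, 6), [6, 9), [9, 12), [12, 15]
edges_in = [0, 3, 6, 9, 12, 15]

# np.histogram returns the count per bin plus the bin edges it used;
# together they form the frequency table that a histogram draws
counts, edges = np.histogram(values, bins=edges_in)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{int(left)} to {int(right)}: {int(count)}")
# counts -> [4 3 1 1 1]
```

Passing the same edges as `bins=` to `sns.histplot` should give bars with these heights. Every bin except the last is half-open, so a value equal to a bin's left edge is counted in that bin; the last bin also includes its right edge.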
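The density curve is a kernel density estimate (KDE): a small bell-shaped kernel is centered on each data point and the kernels are averaged, which is why the result reads as a smoothed histogram. The sketch below implements a Gaussian KDE directly in NumPy on hypothetical values (made up for illustration). The helper name `kde_gaussian` is illustrative, and the bandwidth uses sigma * n^(-1/5), the factor SciPy calls Scott's rule; seaborn's own default may differ in detail.

```python
import numpy as np


def kde_gaussian(values, grid, bandwidth):
    """Gaussian kernel density estimate of `values`, evaluated at each point of `grid`."""
    values = np.asarray(values, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Distance from every grid point to every data point, in bandwidth units
    z = (grid[:, None] - values[None, :]) / bandwidth
    # Standard normal kernel centered on each data point
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Average the kernels; dividing by the bandwidth keeps the total area at 1
    return kernels.sum(axis=1) / (len(values) * bandwidth)


# Hypothetical populations in millions (illustrative values, not from state.csv)
values = np.array([0.6, 1.2, 1.9, 2.4, 3.1, 4.5, 5.0, 6.7, 9.9, 12.8])

# Bandwidth = sample std * n**(-1/5) (the factor SciPy calls Scott's rule)
bandwidth = values.std(ddof=1) * len(values) ** (-1 / 5)

# Evaluate on a grid that extends well past the data so the tails are included
grid = np.linspace(values.min() - 4 * bandwidth, values.max() + 4 * bandwidth, 1000)
density = kde_gaussian(values, grid, bandwidth)

# Riemann-sum approximation of the area under the curve (should be close to 1)
area = density.sum() * (grid[1] - grid[0])
print(f"bandwidth = {bandwidth:.3f}, area under curve = {area:.3f}")
```

A larger bandwidth gives a smoother, flatter curve; a smaller one follows individual points more closely. In seaborn, the `bw_adjust` parameter of `sns.kdeplot` scales the default bandwidth in the same spirit.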