Data Analysis and Visualization with Python



Installation
The easiest way to install pandas is to use pip:

 pip install pandas 

or download it from here
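A quick way to confirm that the installation worked is to import the library and print its version. This is a minimal sketch; the version string it prints depends on your environment.

```python
# Quick check that pandas is importable after installation.
import pandas as pd

# __version__ is a plain string; its exact value depends on the installed release.
print(pd.__version__)
```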

Creating a DataFrame in Pandas

A DataFrame can be created by passing multiple Series objects, each built with pd.Series, to the pd.DataFrame constructor. Here two Series objects are passed in: s1 becomes the first row and s2 the second row. 
Example:

# import pandas under its usual alias
import pandas as pd

# create two Series, s1 and s2
s1 = pd.Series([1, 2])
s2 = pd.Series(["Ashish", "Sid"])

# combine the Series objects into a DataFrame, one row per Series
df = pd.DataFrame([s1, s2])

# show the DataFrame
df

 
# build the DataFrame in a different way,
# specifying the index and column labels
dframe = pd.DataFrame([[1, 2], ["Ashish", "Sid"]],
                      index=["r1", "r2"],
                      columns=["c1", "c2"])

dframe

 
# build the DataFrame from a dict-like container,
# where each key becomes a column label
dframe = pd.DataFrame({
    "c1": [1, "Ashish"],
    "c2": [2, "Sid"]})

dframe
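The difference between the row-wise and column-wise constructions can be checked directly. The sketch below reuses the values from the examples above; the names by_rows and by_cols are introduced here only for illustration.

```python
import pandas as pd

s1 = pd.Series([1, 2])
s2 = pd.Series(["Ashish", "Sid"])

# A list of Series: each Series becomes one row.
by_rows = pd.DataFrame([s1, s2])

# A dict of lists: each key becomes one column.
by_cols = pd.DataFrame({"c1": [1, "Ashish"], "c2": [2, "Sid"]})

print(by_rows.shape)          # (2, 2)
print(list(by_cols.columns))  # ['c1', 'c2']
print(by_cols.loc[1, "c1"])   # Ashish
```

Both frames hold the same four values; they differ only in how the row and column labels were assigned.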


Importing data using pandas

The first step is to read the data. The data is stored as comma-separated values in a CSV file, where each row is separated by a newline and each column by a comma (,). To work with the data in Python, you need to read the CSV file into a pandas DataFrame. A DataFrame is a way of representing and working with tabular data. Tabular data has rows and columns, just like this CSV file. 
Example:

# Import the pandas library under the alias pd
import pandas as pd

# Read IND_data.csv into a DataFrame assigned to df
df = pd.read_csv("IND_data.csv")

# Show the first 5 rows of the DataFrame (5 is the default)
df.head()

# Number of rows and columns of the DataFrame
df.shape

Output:

(29, 10)
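If IND_data.csv is not at hand, the same steps can be tried on an in-memory CSV. The sketch below is only an illustration: the three data rows are made up, and the column names are borrowed from the later examples in this article.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file such as IND_data.csv.
csv_text = (
    "Time period,Observation Value\n"
    "2001,10.5\n"
    "2002,11.0\n"
    "2003,9.8\n"
)

# read_csv accepts a path or any file-like object, so StringIO works here.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.head(2))  # first two rows only
print(sample.shape)    # (3, 2): three rows, two columns
```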

Indexing data frames with pandas

Indexing is possible with pandas.DataFrame.iloc. The iloc indexer lets you select any number of rows and columns by integer position. 
Examples:

# selects the first 5 rows and every column, the same as df.head()
df.iloc[0:5, :]

# selects all rows and all columns
df.iloc[:, :]

# selects the rows from position 5 onward and the first 5 columns
df.iloc[5:, :5]
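The slicing rules are easier to verify on a small frame. The following is a minimal sketch on a hypothetical 4x3 DataFrame named demo, which is not part of the original data set.

```python
import pandas as pd

# Hypothetical 4x3 frame used only for illustration.
demo = pd.DataFrame(
    {"a": [10, 20, 30, 40], "b": [1, 2, 3, 4], "c": [5, 6, 7, 8]}
)

first_two = demo.iloc[0:2, :]  # positions 0 and 1; the end position 2 is excluded
tail_part = demo.iloc[2:, :2]  # rows from position 2 onward, first two columns
single = demo.iloc[1, 0]       # one value: row position 1, column position 0

print(first_two.shape)  # (2, 3)
print(tail_part.shape)  # (2, 2)
print(single)           # 20
```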

Indexing using labels in Pandas

For label-based indexing, use pandas.DataFrame.loc, which selects rows and columns by their labels instead of their positions. 
Examples:

# selects the rows labeled 0 through 5 (the end label 5 is included) and all columns
df.loc[0:5, :]

# selects the rows from label 5 onward and all columns
df.loc[5:, :]

The above does not differ much from df.iloc[0:5, :], except that loc includes the end label 5 while iloc excludes position 5. This is because, although row labels can be anything, our row labels correspond exactly to the positions. Column labels, however, can make working with the data a lot easier. Example:

# selects the "Time period" column for the rows
# up to and including label 5
df.loc[:5, "Time period"]
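The inclusive end label of loc versus the exclusive end position of iloc can be seen side by side. This sketch uses a made-up four-row frame; only the column names are taken from the examples above.

```python
import pandas as pd

# Hypothetical data; only the column names come from the article.
demo = pd.DataFrame(
    {
        "Time period": [2001, 2002, 2003, 2004],
        "Observation Value": [10.5, 11.0, 9.8, 12.1],
    }
)

by_label = demo.loc[0:2, "Time period"]  # labels 0, 1, 2 (end label included)
by_position = demo.iloc[0:2, 0]          # positions 0, 1 (end position excluded)

print(len(by_label))     # 3
print(len(by_position))  # 2

# With non-default row labels, loc looks rows up by label rather than by position.
relabeled = demo.set_index("Time period")
print(relabeled.loc[2002, "Observation Value"])  # 11.0
```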

     

DataFrame Math with pandas

Computations on DataFrames can be done using the statistical functions built into pandas. 
Examples:

# calculates various summary statistics, excluding NaN values
df.describe()
# calculates pairwise correlations between columns
df.corr()
# ranks the values in each column
df.rank()
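To see what these functions return, it helps to run them on data small enough to check by hand. The sketch below uses a hypothetical frame named demo; the column t contains a tie to show how rank averages tied positions by default.

```python
import pandas as pd

# Hypothetical numeric data: y is exactly 2 * x, and t contains a tie.
demo = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 4.0, 6.0, 8.0], "t": [30, 10, 30, 20]}
)

summary = demo.describe()        # count, mean, std, min, quartiles, max per column
print(summary.loc["mean", "x"])  # 2.5

corr = demo.corr()                   # pairwise Pearson correlation by default
print(round(corr.loc["x", "y"], 6))  # 1.0

ranks = demo.rank()         # ties share the average of their ranks by default
print(ranks["t"].tolist())  # [3.5, 1.0, 3.5, 2.0]
```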

           

Pandas Plotting

The plots in these examples follow the standard convention for referencing the matplotlib API. pandas builds on matplotlib to provide basic plotting methods that make it easy to create decent-looking graphs. 
Examples:

# import the required module
import matplotlib.pyplot as plt

# plot a histogram of the observation values
df["Observation Value"].hist(bins=10)

# box plot per time period; points drawn beyond the whiskers
# indicate outliers / extremes
df.boxplot(column="Observation Value", by="Time period")

  
# draw the points as a scatter plot
x = df["Observation Value"]
y = df["Time period"]
plt.scatter(x, y, label="stars", color="m",
            marker="*", s=30)

# x-axis label
plt.xlabel("Observation Value")

# y-axis label
plt.ylabel("Time period")

# display the plot
plt.show()
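When no display is available (for example on a server), the same plots can be written to an image instead of shown in a window. The sketch below is one possible approach under stated assumptions: it uses made-up values, selects matplotlib's non-interactive "Agg" backend, and saves to an in-memory buffer instead of calling plt.show().

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values; only the column name comes from the article.
demo = pd.DataFrame({"Observation Value": [10.5, 11.0, 9.8, 12.1, 30.0]})

# Series.hist draws on a matplotlib Axes and returns it.
ax = demo["Observation Value"].hist(bins=5)
ax.set_xlabel("Observation Value")
ax.set_ylabel("Frequency")

# Save the figure to an in-memory PNG instead of displaying it.
buffer = io.BytesIO()
ax.figure.savefig(buffer, format="png")
plt.close(ax.figure)

print(buffer.getbuffer().nbytes > 0)  # True
```

To write a file instead, pass a file name to savefig in place of the buffer.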