Python | Data analysis using pandas

Pandas is one of the most popular Python libraries for data analysis. It offers highly optimized performance, with its back-end source code written in C or Python.

 We can analyze data in pandas with: 
  1. Series
  2. DataFrames

Series:

A Series is a one-dimensional (1-D) array defined in pandas that can be used to store any type of data.

Code # 1: Creating a Series

# Program for creating a Series
import pandas as pd  # Import the pandas library

# Create a Series from data and an index
# (Data and Index are placeholders for your own values)
a = pd.Series(Data, index=Index)

Here, Data can be:

  1. A scalar value, such as an integer or a string
  2. A Python dictionary of key-value pairs
  3. An ndarray

Note: The default index is 0, 1, 2, ..., (n-1), where n is the length of the data.
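As a minimal sketch of how the kind of input affects the index (the variable names and values here are illustrative, not from the original examples), the following builds a Series from a list, a dictionary, and a single scalar:

```python
import pandas as pd

# A list of values: pandas assigns the default index 0 .. n-1
from_list = pd.Series([10, 20, 30])
print(from_list.index.tolist())   # [0, 1, 2]

# A dictionary: the keys become the index labels
from_dict = pd.Series({"x": 1, "y": 2})
print(from_dict.index.tolist())   # ['x', 'y']

# A single scalar: the value is repeated for every label in the index
from_scalar = pd.Series(5, index=["a", "b", "c"])
print(from_scalar.tolist())       # [5, 5, 5]
```

Note that a lone scalar needs an explicit index so pandas knows how many entries to create.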

Code # 2: When the data contains scalar values

# Program for creating a Series from scalar values
import pandas as pd

Data = [1, 3, 4, 5, 6, 2, 9]  # Numeric data

# Create a Series with default index values
s = pd.Series(Data)

# Predefined index values
Index = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

# Create a Series with predefined index values
si = pd.Series(Data, Index)

Output:

Scalar Data with default Index

Scalar Data with Index

Code # 3: When the data contains a dictionary

# Program for creating a Series from a dictionary
dictionary = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

# Create a Series from the dictionary
sd = pd.Series(dictionary)

Output:

Dictionary type data
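To check what the Series from Code # 2 and Code # 3 contain, one option is to rebuild them and look at their index labels and a few values. This is only a sketch: the variable names follow the examples above, and `list("abcdefg")` is shorthand for the list of seven single-letter labels.

```python
import pandas as pd

data = [1, 3, 4, 5, 6, 2, 9]

s = pd.Series(data)                            # default index
si = pd.Series(data, index=list("abcdefg"))    # predefined index
sd = pd.Series({"a": 1, "b": 2, "c": 3, "d": 4, "e": 5})  # from a dictionary

print(s.index.tolist())    # [0, 1, 2, 3, 4, 5, 6]
print(si.index.tolist())   # ['a', 'b', 'c', 'd', 'e', 'f', 'g']

# Values can be looked up by their index label
print(si["c"])             # 4
print(sd["e"])             # 5
```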

Code # 4: When the data contains an ndarray

# Program for creating a Series from array-like data
Data = [[2, 3, 4], [5, 6, 7]]  # Two-dimensional data as a nested list

# Create a Series; each inner list becomes one element
snd = pd.Series(Data)

Output:

Data as Ndarray
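Because a Series is one-dimensional, it may help to compare a true 1-D NumPy array with the nested list used above. The sketch below assumes NumPy is installed alongside pandas; the names `arr`, `from_array` and `nested` are illustrative.

```python
import numpy as np
import pandas as pd

# A 1-D ndarray: each array element becomes one Series element
arr = np.array([2, 3, 4, 5, 6, 7])
from_array = pd.Series(arr)
print(len(from_array))      # 6

# A nested list: each inner list is stored as a single element
nested = pd.Series([[2, 3, 4], [5, 6, 7]])
print(len(nested))          # 2
print(nested[0])            # [2, 3, 4]
print(nested.dtype)         # object
```

The nested version holds two Python lists rather than six numbers, so numeric operations on it will not behave like those on `from_array`.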

DataFrames:

A DataFrame is a two-dimensional (2-D) data structure defined in pandas that consists of rows and columns.

Code # 1: Creating a DataFrame

# Program for creating a DataFrame
import pandas as pd  # Library import

# Create a DataFrame from data (Data is a placeholder for your own values)
a = pd.DataFrame(Data)

Here data can be:

  1. One or more dictionaries
  2. One or more Series
  3. A 2-D NumPy ndarray
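As a small sketch of the rows-and-columns structure, the following builds a DataFrame directly from a 2-D NumPy ndarray. The column names `x`, `y` and `z` are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# A 2-D ndarray with 2 rows and 3 columns
arr = np.array([[2, 3, 4], [5, 6, 7]])

# Each row of the array becomes a row of the DataFrame
df = pd.DataFrame(arr, columns=["x", "y", "z"])  # hypothetical column names

print(df.shape)              # (2, 3)
print(df.columns.tolist())   # ['x', 'y', 'z']
print(df.index.tolist())     # [0, 1]
```

If `columns` is omitted, pandas numbers the columns 0, 1, 2 in the same way it numbers the rows.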

Code # 2: When the data is dictionaries

# Program for creating a DataFrame from two dictionaries
dict1 = {'a': 1, 'b': 2, 'c': 3, 'd': 4}  # Define dictionary 1
dict2 = {'a': 5, 'b': 6, 'c': 7, 'd': 8, 'e': 9}  # Define dictionary 2

Data = {'first': dict1, 'second': dict2}  # Define data using dict1 and dict2
df = pd.DataFrame(Data)  # Create DataFrame

Output:

DataFrame with two dictionaries
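Note that the two dictionaries do not have the same keys: `dict1` has no `'e'`. A minimal sketch of how pandas handles this is shown below: the outer keys become the columns, the inner keys become the row labels, and the missing entry is filled with NaN.

```python
import pandas as pd

dict1 = {"a": 1, "b": 2, "c": 3, "d": 4}
dict2 = {"a": 5, "b": 6, "c": 7, "d": 8, "e": 9}

df = pd.DataFrame({"first": dict1, "second": dict2})

print(df.shape)                         # (5, 2)
print(df.index.tolist())                # ['a', 'b', 'c', 'd', 'e']

# dict1 has no key 'e', so that cell is NaN
print(pd.isna(df.loc["e", "first"]))    # True
print(df.loc["e", "second"])            # 9
```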

Code # 3: When the data is Series

# Program for creating a DataFrame from three Series
import pandas as pd

s1 = pd.Series([1, 3, 4, 5, 6, 2, 9])  # Define Series 1
s2 = pd.Series([1.1, 3.5, 4.7, 5.8, 2.9, 9.3])  # Define Series 2
s3 = pd.Series(['a', 'b', 'c', 'd', 'e'])  # Define Series 3

Data = {'first': s1, 'second': s2, 'third': s3}  # Define data
dfseries = pd.DataFrame(Data)  # Create DataFrame

Output:

DataFrame with three series
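The three Series have different lengths (7, 6 and 5). As a sketch of what happens in that case, pandas aligns them on their index and fills the shorter columns with NaN, which can be confirmed by counting the missing values per column:

```python
import pandas as pd

s1 = pd.Series([1, 3, 4, 5, 6, 2, 9])
s2 = pd.Series([1.1, 3.5, 4.7, 5.8, 2.9, 9.3])
s3 = pd.Series(["a", "b", "c", "d", "e"])

dfseries = pd.DataFrame({"first": s1, "second": s2, "third": s3})

# The longest Series determines the number of rows
print(dfseries.shape)                   # (7, 3)

# Number of missing values in each column
print(dfseries.isna().sum().tolist())   # [0, 1, 2]
```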

Code # 4: When the data is two-dimensional
Note: One limitation applies when creating a DataFrame from 2-D arrays: the arrays must all have the same dimensions.

# Program for creating a DataFrame from 2-D arrays
import pandas as pd  # Library import

d1 = [[2, 3, 4], [5, 6, 7]]  # Define 2-D array 1
d2 = [[2, 4, 8], [1, 3, 9]]  # Define 2-D array 2

Data = {'first': d1, 'second': d2}  # Define data
df2d = pd.DataFrame(Data)  # Create DataFrame

Output:

DataFrame with 2d ndarray
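To illustrate the limitation noted for Code # 4, the sketch below rebuilds the same DataFrame and then tries a second one whose arrays have different numbers of rows. The `mismatch_ok` flag is a hypothetical helper used only to record whether construction succeeded.

```python
import pandas as pd

d1 = [[2, 3, 4], [5, 6, 7]]
d2 = [[2, 4, 8], [1, 3, 9]]

df2d = pd.DataFrame({"first": d1, "second": d2})
print(df2d.shape)             # (2, 2)

# Each cell holds one inner list
print(df2d.loc[0, "first"])   # [2, 3, 4]

# Arrays with different numbers of rows cannot be combined
try:
    pd.DataFrame({"first": d1, "second": [[2, 4, 8]]})
    mismatch_ok = True
except ValueError:
    mismatch_ok = False
print(mismatch_ok)            # False
```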