Data Analysis and Visualization with Python | Set 2

Prerequisites:

import pandas as pd

# create three Series s1, s2, s3
s1 = pd.Series([0, 4, 8])
s2 = pd.Series([1, 5, 9])
s3 = pd.Series([2, 6, 10])

# build a DataFrame; each Series becomes one row
dframe = pd.DataFrame([s1, s2, s3])

# assign column names
dframe.columns = ['Geeks', 'For', 'Geeks']

# write the data to CSV files, without and with the index column
dframe.to_csv('pythonengineering.csv', index=False)
dframe.to_csv('pythonengineering1.csv', index=True)

Output:

[Figures: contents of pythonengineering1.csv and pythonengineering2.csv]
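To see what the index argument changes without opening the files, the CSV text can be written to an in-memory buffer instead. This is a minimal sketch: the column names a, b, c and the helper csv_text are hypothetical and chosen only for illustration.

```python
import io
import pandas as pd

# Same values as the example above, with distinct (hypothetical) column names
dframe = pd.DataFrame([[0, 4, 8], [1, 5, 9], [2, 6, 10]], columns=["a", "b", "c"])

def csv_text(frame, index):
    """Return the text that to_csv would write, without touching the disk."""
    buffer = io.StringIO()
    frame.to_csv(buffer, index=index)
    return buffer.getvalue()

without_index = csv_text(dframe, index=False)
with_index = csv_text(dframe, index=True)

print(without_index)  # header row is "a,b,c"
print(with_index)     # header row starts with an empty field for the index
```

With index = True, every line gains a leading field holding the row label, which is why the two files differ by one column.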

2. Handling Missing Data

The data analysis phase also includes handling missing data in our dataset, and it's no surprise that Pandas lives up to that expectation. This is where dropna and/or fillna come into play. When dealing with missing data, you, as a data analyst, have to either drop the rows or columns containing NaN values (dropna method) or fill the missing data with the mean or mode of the column (fillna method). This decision is important: it depends on the data and on the impact it will have on our results.
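Since the choice depends on the data, it can help to count the missing values first. The following is a rough sketch using the same small DataFrame as the examples in this section. The variable names missing_per_column and rows_kept are illustrative only.

```python
import numpy as np
import pandas as pd

dframe = pd.DataFrame({
    "Geeks": [23, 24, 22],
    "For": [10, 12, np.nan],
    "geeks": [0, np.nan, np.nan],
})

# Number of missing values in each column
missing_per_column = dframe.isna().sum()

# Fraction of rows that would survive a row-wise dropna()
rows_kept = len(dframe.dropna()) / len(dframe)

print(missing_per_column)
print(rows_kept)
```

If dropping rows would discard most of the data, as it does here, filling is usually worth considering instead.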

  • Dropping missing data:
    Consider the DataFrame generated by the code below:

    import numpy as np
    import pandas as pd

    # Create DataFrame
    dframe = pd.DataFrame({'Geeks': [23, 24, 22],
                           'For': [10, 12, np.nan],
                           'geeks': [0, np.nan, np.nan]},
                          columns=['Geeks', 'For', 'geeks'])

    # Drop every row that contains a NaN value.
    # If axis is not given, dropna works row by row, i.e. axis = 0.
    # dropna returns a new DataFrame here, so dframe itself is unchanged
    # and both calls start from the same data.
    print(dframe.dropna())

    # With axis = 1, drop every column that contains a NaN value
    print(dframe.dropna(axis=1))

    Output:

    [Figures: result for axis = 0, result for axis = 1]
  • Fill in missing values:
    To replace NaN values with the mean or mode of the data, fillna is used. It can replace all NaN values in a specific column, or even in the whole DataFrame, as required.

    import numpy as np
    import pandas as pd

    # Create DataFrame
    dframe = pd.DataFrame({'Geeks': [23, 24, 22],
                           'For': [10, 12, np.nan],
                           'geeks': [0, np.nan, np.nan]},
                          columns=['Geeks', 'For', 'geeks'])

    # Fill the whole DataFrame:
    # each column's mean replaces the NaN values in that column.
    # fillna returns a new DataFrame here; dframe still holds its NaN values.
    print(dframe.fillna(value=dframe.mean()))

    # Fill the NaN values of a single column with that column's mean
    dframe['For'] = dframe['For'].fillna(value=dframe['For'].mean())
    print(dframe)

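The text mentions filling with the mode as well as the mean, while the example code only uses the mean. Below is a minimal sketch of a mode fill. It assumes a hypothetical DataFrame with an extra row so that one value in the For column repeats.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the value 12 appears twice, so the mode is well defined
dframe = pd.DataFrame({
    "Geeks": [23, 24, 22, 24],
    "For": [10, 12, np.nan, 12],
})

# Series.mode() ignores NaN and returns a Series (ties are possible),
# so take its first entry
for_mode = dframe["For"].mode().iloc[0]

# Replace the NaN values in the column with the mode
dframe["For"] = dframe["For"].fillna(for_mode)
print(dframe)
```

The mode is often a more natural choice than the mean for categorical or discrete columns, where an average may not be a valid value.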

3. Groupby Method (Aggregation)

The groupby method allows us to group data by the values of one or more columns, or by index labels, so that we can then apply aggregate functions to each group. Rows can be grouped using a mapper (a dict or a key function that is applied to the index labels to decide each row's group) or by a list of column names.
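Grouping by a mapper can be sketched as follows. The data, the index labels and the team_of dict are hypothetical and exist only to show how a dict and a function are each applied to the index.

```python
import pandas as pd

# Hypothetical data: the index labels are what the mapper is applied to
scores = pd.DataFrame(
    {"score": [10, 12, 13, 14]},
    index=["ann", "bob", "amy", "ben"],
)

# A dict mapper sends each index label to a group name
team_of = {"ann": "red", "bob": "blue", "amy": "red", "ben": "blue"}
by_dict = scores.groupby(team_of).sum()

# A function mapper is called on each index label; here it returns the first letter
by_func = scores.groupby(lambda name: name[0]).sum()

print(by_dict)
print(by_func)
```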

Consider the DataFrame generated by the following code:

import pandas as pd
import numpy as np

# create DataFrame
dframe = pd.DataFrame({'Geeks': [23, 24, 22, 22, 23, 24],
                       'For': [10, 12, 13, 14, 15, 16],
                       'geeks': [122, 142, 112, 122, 114, 112]},
                      columns=['Geeks', 'For', 'geeks'])

# Apply groupby and an aggregate function:
# max() finds the maximum value of the "For" and "geeks" columns
# for each distinct value of the "Geeks" column.
print(dframe.groupby(['Geeks']).max())

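Different aggregate functions can also be applied to different columns in one call. The sketch below uses the same data as the groupby example in this section. The choice of max for one column and mean for the other is only an illustration.

```python
import pandas as pd

dframe = pd.DataFrame({
    "Geeks": [23, 24, 22, 22, 23, 24],
    "For": [10, 12, 13, 14, 15, 16],
    "geeks": [122, 142, 112, 122, 114, 112],
})

# One aggregate per column: the largest "For" and the mean "geeks"
# for each distinct value of "Geeks"
summary = dframe.groupby("Geeks").agg({"For": "max", "geeks": "mean"})
print(summary)
```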



