In Pandas, missing data is represented by two values:
None: a Python singleton object that is often used to represent missing data in Python code.
NaN (short for Not a Number): a special floating-point value recognized by all systems that use the IEEE standard floating-point representation.
Pandas treats None and NaN as essentially interchangeable for indicating missing or null values. To support this convention, the Pandas DataFrame provides several useful functions for detecting, removing and replacing null values.
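A quick way to see this equivalence is to put both values into a Series. The sketch below uses small hand-made Series, not data from the article. In a numeric Series, Pandas converts None to NaN. In either case, isnull() reports both values as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric Series: None is converted to NaN and the dtype becomes float64
numeric = pd.Series([1, None, np.nan, 4])
print(numeric.dtype)              # float64
print(numeric.isnull().tolist())  # [False, True, True, False]

# Hypothetical Series of strings: both None and NaN are still reported as missing
labels = pd.Series(["a", None, np.nan])
print(labels.isnull().tolist())   # [False, True, True]
```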
In this article, we use a CSV file, employees.csv, for the examples.
Checking for missing values using isnull() and notnull()
To check for missing values in a Pandas DataFrame, we use the isnull() and notnull() functions. Both check whether each value is NaN. isnull() returns True for missing values, and notnull() returns True for values that are present. These functions can also be used on a Pandas Series to find the null values in that Series.
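On a Series, the two methods are exact complements of each other. The following minimal sketch uses a hypothetical Series of ages rather than the article's CSV file.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with two missing entries (NaN and None)
ages = pd.Series([25, np.nan, 31, None])

missing = ages.isnull()   # True where the value is missing
present = ages.notnull()  # True where the value is present

print(missing.tolist())   # [False, True, False, True]
print(present.tolist())   # [True, False, True, False]

# notnull() is the element-wise negation of isnull()
print((present == ~missing).all())  # True
```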
Checking for missing values with isnull()
To check for null values in a Pandas DataFrame, we use isnull(). This function returns a DataFrame of Boolean values that are True for NaN values and False otherwise.
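The following minimal sketch applies isnull() to a whole DataFrame. The column names and values are hypothetical stand-ins, not the contents of employees.csv. Because True counts as 1, summing the Boolean result gives the number of missing values in each column. Indexing with a single column's isnull() result selects the rows where that column is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame standing in for a few rows of a CSV file
df = pd.DataFrame({
    "Name": ["Alice", None, "Carol"],
    "Gender": ["F", np.nan, np.nan],
    "Salary": [50000, 62000, np.nan],
})

# Boolean DataFrame with the same shape: True marks a missing value
mask = df.isnull()
print(mask)

# Summing each Boolean column counts its missing values
print(mask.sum().to_dict())  # {'Name': 1, 'Gender': 2, 'Salary': 1}

# Keep only the rows whose Gender is missing
print(df[df["Gender"].isnull()])
```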
# using the notnull() function
df.notnull()
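On a DataFrame, notnull() marks the values that are present with True. In the sketch below, which uses a small hypothetical DataFrame, notnull() is combined with all(axis=1). This keeps only the rows that have no missing value in any column. These are the same rows that df.dropna() keeps with its default settings.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame; only row 0 has every value present
df = pd.DataFrame({
    "Name": ["Alice", "Bob", None],
    "Team": ["Sales", np.nan, "Legal"],
})

present = df.notnull()               # True where a value is present
complete_rows = present.all(axis=1)  # True only if every column in the row is present

print(df[complete_rows])
```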
Code #4:
# import the pandas package
import pandas as pd
# create a DataFrame from the CSV file
data = pd.read_csv("employees.csv")
# create a Boolean Series that is True where Gender is not NaN
bool_series = pd.notnull(data["Gender"])
# filter the data: keep only the rows whose Gender is not NaN
print(data[bool_series])
Output: only the rows where Gender is not null are displayed.
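If employees.csv is not at hand, you can try the same filtering on a few made-up rows. The sketch below reads hypothetical CSV text through io.StringIO in place of the real file. By default, read_csv parses an empty field as NaN, so the row with a blank Gender is dropped.

```python
import io

import pandas as pd

# Hypothetical CSV text; Bob's Gender field is empty
csv_text = """Name,Gender
Alice,Female
Bob,
Carol,Female
"""

data = pd.read_csv(io.StringIO(csv_text))

# True where Gender is present, False where read_csv produced NaN
bool_series = pd.notnull(data["Gender"])

# Keep only the rows whose Gender is not NaN
print(data[bool_series])
```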
Filling in missing values with fillna(), replace() and interpolate()
To fill in null values in a dataset, we use fillna(), replace() and interpolate(). These functions replace NaN values in a DataFrame with other values. fillna() and replace() substitute a value that you specify. The interpolate() function also fills in NA values in a DataFrame, but instead of using a hard-coded value, it estimates each missing value using one of several interpolation techniques.
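The following minimal sketch compares the three approaches on one hypothetical numeric Series. The fill values 0 and -1 are arbitrary choices made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric Series with two gaps
s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

# fillna(): replace every NaN with a fixed value
print(s.fillna(0).tolist())            # [1.0, 0.0, 3.0, 0.0, 5.0]

# replace(): map NaN to a chosen value
print(s.replace(np.nan, -1).tolist())  # [1.0, -1.0, 3.0, -1.0, 5.0]

# interpolate(): estimate each NaN from its neighbours (linear by default)
print(s.interpolate().tolist())        # [1.0, 2.0, 3.0, 4.0, 5.0]
```

With its default method="linear", interpolate() treats the values as equally spaced and ignores the index. Other methods are available when the spacing matters.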