ML | Handling Missing Data with a Simple Imputer



SimpleImputer is a scikit-learn class that helps handle missing data in a dataset used for predictive modeling. It replaces missing values (NaN by default) with a specified placeholder.
It is used by creating a SimpleImputer() object, whose constructor takes the following arguments:

missing_values : The placeholder for the missing values that have to be imputed. By default it is NaN.
strategy : The statistic that will replace the NaN values in the dataset. The strategy argument can take the values 'mean' (default), 'median', 'most_frequent' and 'constant'.
fill_value : The constant value that replaces the NaN entries when the 'constant' strategy is used.
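
As a rough sketch of how these arguments interact, the snippet below applies each of the four strategies to the same small array. The array `X` and the fill value `0` are illustrative choices made for this example and are not part of the article's own data.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Illustrative data (an assumption for this sketch): one NaN in each column
X = np.array([[1.0, 2.0],
              [np.nan, 2.0],
              [1.0, np.nan],
              [7.0, 8.0]])

results = {}
for strategy in ("mean", "median", "most_frequent"):
    imputer = SimpleImputer(missing_values=np.nan, strategy=strategy)
    results[strategy] = imputer.fit_transform(X)

# fill_value only takes effect when strategy="constant"
constant_imputer = SimpleImputer(missing_values=np.nan,
                                 strategy="constant", fill_value=0)
results["constant"] = constant_imputer.fit_transform(X)

for name, imputed in results.items():
    print(name)
    print(imputed)
```

In the first column the known values are 1, 1 and 7, so the missing entry becomes 3 under 'mean', 1 under 'median' and 'most_frequent', and 0 under 'constant'.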

Code: Python code illustrating the use of the SimpleImputer class.

import numpy as np

# Importing the SimpleImputer class
from sklearn.impute import SimpleImputer

# Imputer object using the mean strategy and
# NaN as the missing-value marker
imputer = SimpleImputer(missing_values=np.nan,
                        strategy='mean')

data = [[12, np.nan, 34], [10, 32, np.nan],
        [np.nan, 11, 20]]

print("Original Data:", data)

# Fitting the imputer to the data
imputer = imputer.fit(data)

# Imputing the data
data = imputer.transform(data)

print("Imputed Data:", data)

Output:

Original Data:
[[12, nan, 34], [10, 32, nan], [nan, 11, 20]]
Imputed Data:
[[12, 21.5, 34], [10, 32, 27], [11, 11, 20]]
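
The separate fit and transform calls can also be combined. The following is a minimal sketch, reusing the article's data, that compares the two-step form with the `fit_transform` shortcut; the variable names `two_step` and `one_step` are chosen here for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

data = [[12, np.nan, 34], [10, 32, np.nan], [np.nan, 11, 20]]

# Two-step version: learn the column means, then fill the gaps
two_step = SimpleImputer(missing_values=np.nan,
                         strategy="mean").fit(data).transform(data)

# One-step version: fit and transform in a single call
one_step = SimpleImputer(missing_values=np.nan,
                         strategy="mean").fit_transform(data)

# Both approaches should give the same imputed array
print(np.array_equal(two_step, one_step))
```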

Remember: the mean or median is computed along each column of the matrix.
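
One way to check the note above is to compare the values the imputer learned with NumPy's NaN-aware column mean. This is a small sketch on the article's data; `column_means` is a name introduced here for the comparison.

```python
import numpy as np
from sklearn.impute import SimpleImputer

data = np.array([[12, np.nan, 34], [10, 32, np.nan], [np.nan, 11, 20]])

imputer = SimpleImputer(missing_values=np.nan, strategy="mean").fit(data)

# statistics_ holds the fill value learned for each column
print(imputer.statistics_)

# NaN-aware mean taken down the columns (axis=0) for comparison
column_means = np.nanmean(data, axis=0)
print(column_means)
```

Both printed arrays should contain the per-column means 11, 21.5 and 27, which are the values that fill the gaps in the output shown earlier.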