Python | Replace NaN values with column mean

One way to handle NaN values in a NumPy array is to replace each of them with the mean of its column, computed from the non-NaN entries of that column. Here are several ways to do this.

Method #1: Using np.nanmean and np.take

# Python code to demonstrate
# replacing NaN values
# with the column mean

import numpy as np

# Initialize the NumPy array
ini_array = np.array([[1.3, 2.5, 3.6, np.nan],
                      [2.6, 3.3, np.nan, 5.5],
                      [2.1, 3.2, 5.4, 6.5]])

# Print the original array
print("initial array", ini_array)

# Compute the mean of each column, ignoring NaN
col_mean = np.nanmean(ini_array, axis=0)

# Print the column means
print("columns mean", str(col_mean))

# Find the indices where NaN is present
inds = np.where(np.isnan(ini_array))

# Replace each NaN with the mean of its column
ini_array[inds] = np.take(col_mean, inds[1])

# Print the final array
print("final array", ini_array)

Output:

initial array [[1.3 2.5 3.6 nan] [2.6 3.3 nan 5.5] [2.1 3.2 5.4 6.5]]
columns mean [2. 3. 4.5 6.]
final array [[1.3 2.5 3.6 6.] [2.6 3.3 4.5 5.5] [2.1 3.2 5.4 6.5]]
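The key step in this method is the lookup: np.where returns one array of row indices and one of column indices for the NaN entries, and np.take picks the mean that belongs to each NaN's column. As a minimal sketch, the same steps can be wrapped in a helper that works on a copy. The function name fill_nan_with_col_mean is hypothetical and not part of NumPy.

```python
import numpy as np

def fill_nan_with_col_mean(arr):
    """Return a float copy of arr with each NaN replaced by its column mean.

    Hypothetical helper for illustration; it is not a NumPy function.
    """
    out = np.array(arr, dtype=float)           # copy, so the input stays unchanged
    col_mean = np.nanmean(out, axis=0)         # per-column mean, ignoring NaN
    rows, cols = np.where(np.isnan(out))       # coordinates of every NaN
    out[rows, cols] = np.take(col_mean, cols)  # mean of the column each NaN sits in
    return out

data = np.array([[1.3, 2.5, 3.6, np.nan],
                 [2.6, 3.3, np.nan, 5.5],
                 [2.1, 3.2, 5.4, 6.5]])
filled = fill_nan_with_col_mean(data)
print(filled)
```

Working on a copy is a design choice made here; the listing above modifies ini_array in place instead.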

Method #2: Using np.ma and np.where

# Python code to demonstrate
# replacing NaN values
# with the column mean

import numpy as np

# Initialize the NumPy array
ini_array = np.array([[1.3, 2.5, 3.6, np.nan],
                      [2.6, 3.3, np.nan, 5.5],
                      [2.1, 3.2, 5.4, 6.5]])

# Print the original array
print("initial array", ini_array)

# Replace NaN with the column mean, computed on a masked array
res = np.where(np.isnan(ini_array),
               np.ma.array(ini_array, mask=np.isnan(ini_array)).mean(axis=0),
               ini_array)

# Print the final array
print("final array", res)

Output:

initial array [[1.3 2.5 3.6 nan] [2.6 3.3 nan 5.5] [2.1 3.2 5.4 6.5]]
final array [[1.3 2.5 3.6 6.] [2.6 3.3 4.5 5.5] [2.1 3.2 5.4 6.5]]
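This method relies on broadcasting: the masked mean has one value per column, and np.where stretches it across every row when choosing between the mean and the original value. The sketch below splits the one-liner into steps to make that visible. It uses np.ma.masked_invalid as a shortcut, which is a substitution made here and masks infinities as well as NaN, and it converts the result to a plain array with filled before calling np.where.

```python
import numpy as np

data = np.array([[1.3, 2.5, 3.6, np.nan],
                 [2.6, 3.3, np.nan, 5.5],
                 [2.1, 3.2, 5.4, 6.5]])

# Mask NaN (and inf) entries so they are skipped by mean()
masked = np.ma.masked_invalid(data)

# One mean per column; filled() turns the masked result into a plain ndarray
col_mean = masked.mean(axis=0).filled(np.nan)
print(col_mean.shape)  # one entry per column

# col_mean is broadcast across the rows of data
res = np.where(np.isnan(data), col_mean, data)
print(res)
```

With this arrangement a column that contains only NaN keeps its NaN values, since its mean is filled with NaN.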

Method #3: Using a naive loop and zip

# Python code to demonstrate
# replacing NaN values
# with the column mean

import numpy as np

# Initialize the NumPy array
ini_array = np.array([[1.3, 2.5, 3.6, np.nan],
                      [2.6, 3.3, np.nan, 5.5],
                      [2.1, 3.2, 5.4, 6.5]])

# Print the original array
print("initial array", ini_array)

# Indices where the array holds NaN
indices = np.where(np.isnan(ini_array))

# Loop over the NaN positions and replace each one
# with the mean of the non-NaN values in its column
for row, col in zip(*indices):
    ini_array[row, col] = np.mean(
        ini_array[~np.isnan(ini_array[:, col]), col])

# Print the final array
print("final array", ini_array)

Output:

initial array [[1.3 2.5 3.6 nan] [2.6 3.3 nan 5.5] [2.1 3.2 5.4 6.5]]
final array [[1.3 2.5 3.6 6.] [2.6 3.3 4.5 5.5] [2.1 3.2 5.4 6.5]]
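The loop above recomputes a column mean for every NaN it visits. A possible variation, sketched here as an assumption rather than as part of the original method, is to loop over columns instead, so each mean is computed once. The guard that skips columns with no valid values is also an assumption about the desired behavior.

```python
import numpy as np

data = np.array([[1.3, 2.5, 3.6, np.nan],
                 [2.6, 3.3, np.nan, 5.5],
                 [2.1, 3.2, 5.4, 6.5]])

for col in range(data.shape[1]):
    column = data[:, col]          # a view, so writes change data itself
    missing = np.isnan(column)     # True where this column holds NaN
    # Skip columns with nothing to fill or with no valid values to average
    if missing.any() and not missing.all():
        column[missing] = column[~missing].mean()

print(data)
```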
