StackOverflow

### Answer rating: 176

I'm looking for the fastest way to check for the occurrence of NaN (`np.nan`) in a NumPy array `X`. `np.isnan(X)` is out of the question, since it builds a boolean array of shape `X.shape`, which is potentially gigantic.

I tried `np.nan in X`, but that seems not to work because `np.nan != np.nan`. Is there a fast and memory-efficient way to do this at all?
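For illustration, here is a minimal sketch of why the `in` test comes up empty. The three-element array and the variable names are made up for this example; the point is only that membership on an ndarray goes through elementwise `==`, and NaN never compares equal to itself.

```python
import numpy as np

# Hypothetical small array standing in for the (possibly huge) input X.
X = np.array([1.0, np.nan, 3.0])

# NaN never compares equal to anything, including itself.
nan_equals_itself = np.nan == np.nan

# `in` on an ndarray is evaluated through elementwise ==, so it cannot find NaN.
found_with_in = np.nan in X

# np.isnan does find it, at the cost of a temporary boolean array of X.shape.
found_with_isnan = bool(np.isnan(X).any())

print(nan_equals_itself, found_with_in, found_with_isnan)  # False False True
```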

(To those who would ask "how gigantic": I can't tell. This is input validation for library code.)

Ray's solution is good. However, on my machine it is about 2.5x faster to use `numpy.sum` in place of `numpy.min`:

```
In [13]: %timeit np.isnan(np.min(x))
1000 loops, best of 3: 244 us per loop
In [14]: %timeit np.isnan(np.sum(x))
10000 loops, best of 3: 97.3 us per loop
```

Unlike `min`, `sum` doesn't require branching, which on modern hardware tends to be pretty expensive. This is probably the reason why `sum` is faster.
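For input validation, the `sum` check could be wrapped roughly as follows. This is a sketch, not a definitive implementation: `has_nan` is a name invented here, and it assumes a floating-point array. One caveat worth knowing is that `+inf` and `-inf` in the same array also sum to NaN, so the check can report a NaN that is not literally present.

```python
import numpy as np

def has_nan(x):
    """Return True if the float array x sums to NaN (hypothetical helper)."""
    # NaN propagates through addition, so the scalar sum is NaN whenever
    # any element is NaN. No boolean array of x.shape is allocated.
    return bool(np.isnan(np.sum(x)))

clean = np.random.rand(100000)
dirty = clean.copy()
dirty[50000] = np.nan
print(has_nan(clean), has_nan(dirty))  # False True

# Caveat: inf + (-inf) is NaN, so mixed infinities also trigger the check.
with np.errstate(invalid="ignore"):
    print(has_nan(np.array([np.inf, -np.inf])))  # True
```

If mixed infinities are valid input for your library, a positive result from this check would need a second, more precise test before rejecting the array.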

**edit** The above test was performed with a single NaN right in the middle of the array.

It is interesting to note that `min` is slower in the presence of NaNs than in their absence. It also seems to get slower as NaNs get closer to the start of the array. On the other hand, `sum`'s throughput seems constant regardless of whether there are NaNs and where they're located:

```
In [40]: x = np.random.rand(100000)
In [41]: %timeit np.isnan(np.min(x))
10000 loops, best of 3: 153 us per loop
In [42]: %timeit np.isnan(np.sum(x))
10000 loops, best of 3: 95.9 us per loop
In [43]: x[50000] = np.nan
In [44]: %timeit np.isnan(np.min(x))
1000 loops, best of 3: 239 us per loop
In [45]: %timeit np.isnan(np.sum(x))
10000 loops, best of 3: 95.8 us per loop
In [46]: x[0] = np.nan
In [47]: %timeit np.isnan(np.min(x))
1000 loops, best of 3: 326 us per loop
In [48]: %timeit np.isnan(np.sum(x))
10000 loops, best of 3: 95.9 us per loop
```
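To repeat this comparison outside IPython, something like the script below should work. It is a sketch using the standard `timeit` module; the helper `time_us` and the repeat counts are choices made here, not part of the session above. Absolute and relative timings depend on the NumPy version and hardware, so the numbers may not match those shown.

```python
import timeit
import numpy as np

def time_us(stmt, x, repeat=3, number=200):
    """Best-of-`repeat` time per call, in microseconds (hypothetical helper)."""
    timer = timeit.Timer(stmt, globals={"np": np, "x": x})
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6

data = np.random.rand(100000)
results = {}
# NaNs are added cumulatively, mirroring the session: none, middle, then start.
for label, nan_index in [("no NaN", None), ("NaN in middle", 50000), ("NaN at start", 0)]:
    if nan_index is not None:
        data[nan_index] = np.nan
    results[label] = (
        time_us("np.isnan(np.min(x))", data),
        time_us("np.isnan(np.sum(x))", data),
    )

for label, (t_min, t_sum) in results.items():
    print(f"{label:15s} min: {t_min:8.1f} us   sum: {t_sum:8.1f} us")
```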
