# matrix operations | randn() function


`numpy.matlib.randn()` is another function for matrix operations in NumPy. It returns a matrix of random values drawn from the univariate "normal" (Gaussian) distribution with mean 0 and variance 1.

Syntax: numpy.matlib.randn(*args)

Parameters:
*args : [Arguments] Shape of the output matrix. If given as N integers, each integer specifies the size of one dimension. If given as a tuple, the tuple gives the complete shape. If more than one argument is given and the first argument is a tuple, the remaining arguments are ignored.

Return: The matrix of random values drawn from the standard normal distribution.

Code # 1:

```python
# Python program explaining
# numpy.matlib.randn() function

# import numpy and the matrix library
import numpy as geek
import numpy.matlib

# desired 3 x 4 random output matrix
out_mat = geek.matlib.randn((3, 4))
print("Output matrix:", out_mat)
```

Output:

```
Output matrix: [[ 0.78620217  0.41624612 -0.28417131  0.1071018 ]
 [ 0.77645105  0.30858858 -1.98901344  1.25977209]
 [ 0.26279443 -0.41026178 -0.60834494  2.82552737]]
```

Code # 2:

```python
# Python program explaining
# numpy.matlib.randn() function

# import numpy and the matrix library
import numpy as geek
import numpy.matlib

# desired random output matrix 1 x 5
out_mat = geek.matlib.randn(5)
print("Output matrix:", out_mat)
```

Output:

```
Output matrix: [[ 0.34973625  0.28159132  0.72581405 -1.17511692  1.96773952]]
```

Code # 3:

```python
# Python program explaining
# numpy.matlib.randn() function

# import numpy and the matrix library
import numpy as geek
import numpy.matlib

# more than one argument given; since the first
# argument is a tuple, the trailing 4 is ignored
out_mat = geek.matlib.randn((5, 3), 4)
print("Output matrix:", out_mat)
```

Output:

```
Output matrix: [[ 0.56784957  0.82980325  1.16683558]
 [-1.53444326 -0.27743273  0.65819067]
 [ 0.99654573 -1.20399432 -0.25603147]
 [ 1.74931585  0.58413453  1.67820029]
 [-1.25643231  0.21610229  0.21694595]]
```

Note: For random samples drawn from N(mu, sigma^2), we can use `sigma * geek.matlib.randn(...) + mu`.
For example, creating a 3 x 3 matrix with samples from N(3, 4):

Code # 4:

```python
# Python program explaining
# numpy.matlib.randn() function

# import numpy and the matrix library
import numpy as geek
import numpy.matlib

# So here mu = 3, sigma = 2
out_mat = 2 * geek.matlib.randn((3, 3)) + 3
print("Output matrix:", out_mat)
```

Output:

```
Output matrix: [[ 4.04967121  0.26982021  2.3503067 ]
 [ 5.57757131  2.40051874 -0.84588539]
 [ 7.43715651  3.84004412  1.40514615]]
```
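As a quick sanity check (not part of the original article), the mu/sigma scaling can be verified with plain `numpy.random.randn`: a large sample of `2 * randn() + 3` should have a mean near 3 and a standard deviation near 2.

```python
import numpy as np

# Sketch: verify that sigma * randn(...) + mu shifts and scales
# the standard normal distribution (here mu = 3, sigma = 2)
np.random.seed(0)
sample = 2 * np.random.randn(100_000) + 3

print(round(sample.mean(), 2))  # close to 3
print(round(sample.std(), 2))   # close to 2
```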

## matrix operations | randn() function: StackOverflow Questions

The simplest way to get row counts per group is by calling `.size()`, which returns a `Series`:

```python
df.groupby(["col1", "col2"]).size()
```

Usually you want this result as a `DataFrame` (instead of a `Series`) so you can do:

```python
df.groupby(["col1", "col2"]).size().reset_index(name="counts")
```

If you want to find out how to calculate the row counts and other statistics for each group continue reading below.

## Detailed example:

Consider the following example dataframe:

```
In : df
Out:
  col1 col2  col3  col4  col5  col6
0    A    B  0.20 -0.61 -0.49  1.49
1    A    B -1.53 -1.01 -0.39  1.82
2    A    B -0.44  0.27  0.72  0.11
3    A    B  0.28 -1.32  0.38  0.18
4    C    D  0.12  0.59  0.81  0.66
5    C    D -0.13 -1.65 -1.64  0.50
6    C    D -1.42 -0.11 -0.18 -0.44
7    E    F -0.00  1.42 -0.26  1.17
8    E    F  0.91 -0.47  1.35 -0.34
9    G    H  1.48 -0.63 -1.14  0.17
```

First, let's use `.size()` to get the row counts:

```
In : df.groupby(["col1", "col2"]).size()
Out:
col1  col2
A     B       4
C     D       3
E     F       2
G     H       1
dtype: int64
```

Then let's use `.size().reset_index(name="counts")` to get the row counts as a `DataFrame`:

```
In : df.groupby(["col1", "col2"]).size().reset_index(name="counts")
Out:
  col1 col2  counts
0    A    B       4
1    C    D       3
2    E    F       2
3    G    H       1
```

### Including results for more statistics

When you want to calculate statistics on grouped data, it usually looks like this:

```
In : (df
...: .groupby(["col1", "col2"])
...: .agg({
...:     "col3": ["mean", "count"],
...:     "col4": ["median", "min", "count"]
...: }))
Out:
            col4                  col3
          median   min count      mean count
col1 col2
A    B    -0.810 -1.32     4 -0.372500     4
C    D    -0.110 -1.65     3 -0.476667     3
E    F     0.475 -0.47     2  0.455000     2
G    H    -0.630 -0.63     1  1.480000     1
```

The result above is a little annoying to deal with because of the nested column labels, and also because row counts are on a per column basis.

To gain more control over the output I usually split the statistics into individual aggregations that I then combine using `join`. It looks like this:

```
In : gb = df.groupby(["col1", "col2"])
...: counts = gb.size().to_frame(name="counts")
...: (counts
...:  .join(gb.agg({"col3": "mean"}).rename(columns={"col3": "col3_mean"}))
...:  .join(gb.agg({"col4": "median"}).rename(columns={"col4": "col4_median"}))
...:  .join(gb.agg({"col4": "min"}).rename(columns={"col4": "col4_min"}))
...:  .reset_index()
...: )
...:
Out:
  col1 col2  counts  col3_mean  col4_median  col4_min
0    A    B       4  -0.372500       -0.810     -1.32
1    C    D       3  -0.476667       -0.110     -1.65
2    E    F       2   0.455000        0.475     -0.47
3    G    H       1   1.480000       -0.630     -0.63
```

### Footnotes

The code used to generate the test data is shown below:

```
In : import numpy as np
...: import pandas as pd
...:
...: keys = np.array([
...:         ["A", "B"],
...:         ["A", "B"],
...:         ["A", "B"],
...:         ["A", "B"],
...:         ["C", "D"],
...:         ["C", "D"],
...:         ["C", "D"],
...:         ["E", "F"],
...:         ["E", "F"],
...:         ["G", "H"]
...:         ])
...:
...: df = pd.DataFrame(
...:     np.hstack([keys, np.random.randn(10, 4).round(2)]),
...:     columns = ["col1", "col2", "col3", "col4", "col5", "col6"]
...: )
...:
...: df[["col3", "col4", "col5", "col6"]] = \
...:     df[["col3", "col4", "col5", "col6"]].astype(float)
...:
```

Disclaimer:

If some of the columns that you are aggregating have null values, then you really want to be looking at the group row counts as an independent aggregation for each column. Otherwise you may be misled as to how many records are actually being used to calculate things like the mean because pandas will drop `NaN` entries in the mean calculation without telling you about it.
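That pitfall can be sketched in a few lines (the column names below are invented for illustration): `.size()` counts rows, while a per-column `count` ignores `NaN`s, so the two can disagree for the same group.

```python
import numpy as np
import pandas as pd

# One group of 3 rows, but "val" has a missing entry
df = pd.DataFrame({"grp": ["A", "A", "A"],
                   "val": [1.0, np.nan, 3.0]})
gb = df.groupby("grp")

print(gb.size())          # 3 rows in group A
print(gb["val"].count())  # only 2 non-null values feed the aggregation
print(gb["val"].mean())   # (1.0 + 3.0) / 2 = 2.0 -- the NaN is silently dropped
```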

The idiomatic way to do this with Pandas is to use the `.sample` method of your dataframe to sample all rows without replacement:

```python
df.sample(frac=1)
```

The `frac` keyword argument specifies the fraction of rows to return in the random sample, so `frac=1` means return all rows (in random order).
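For instance (a small sketch; the `random_state` seed is only there to make the shuffle reproducible):

```python
import pandas as pd

df = pd.DataFrame({"a": range(5)})

# frac=1 keeps every row, just in a random order
shuffled = df.sample(frac=1, random_state=0)
print(len(shuffled))          # 5
print(sorted(shuffled["a"]))  # [0, 1, 2, 3, 4] -- same rows, new order
```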

Note: If you wish to shuffle your dataframe in-place and reset the index, you could do e.g.

```python
df = df.sample(frac=1).reset_index(drop=True)
```

Here, specifying `drop=True` prevents `.reset_index` from creating a column containing the old index entries.

Follow-up note: Although it may not look like the above operation is in-place, python/pandas is smart enough not to do another malloc for the shuffled object. That is, even though the reference object has changed (by which I mean `id(df_old)` is not the same as `id(df_new)`), the underlying C object is still the same. To show that this is indeed the case, you could run a simple memory profiler:

```
$ python3 -m memory_profiler .\test.py
Filename: .\test.py

Line #    Mem usage    Increment   Line Contents
================================================
     5     68.5 MiB     68.5 MiB   @profile
     6                             def shuffle():
     7    847.8 MiB    779.3 MiB       df = pd.DataFrame(np.random.randn(100, 1000000))
     8    847.9 MiB      0.1 MiB       df = df.sample(frac=1).reset_index(drop=True)
```

This post aims to give readers a primer on SQL-flavored merging with Pandas, how to use it, and when not to use it.

In particular, here's what this post will go through:

• The basics - types of joins (LEFT, RIGHT, OUTER, INNER)

• merging with different column names
• merging with multiple columns
• avoiding duplicate merge key column in output

What this post (and other posts by me on this thread) will not go through:

• Performance-related discussions and timings (for now). Mostly notable mentions of better alternatives, wherever appropriate.
• Handling suffixes, removing extra columns, renaming outputs, and other specific use cases. There are other (read: better) posts that deal with that, so figure it out!

Note Most examples default to INNER JOIN operations while demonstrating various features, unless otherwise specified.

Furthermore, all the DataFrames here can be copied and replicated so you can play with them. Also, see this post on how to read DataFrames from your clipboard.

Lastly, all visual representation of JOIN operations have been hand-drawn using Google Drawings. Inspiration from here.

# Enough talk - just show me how to use `merge`!

### Setup & Basics

```
np.random.seed(0)
left = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": np.random.randn(4)})
right = pd.DataFrame({"key": ["B", "D", "E", "F"], "value": np.random.randn(4)})

left

  key     value
0   A  1.764052
1   B  0.400157
2   C  0.978738
3   D  2.240893

right

  key     value
0   B  1.867558
1   D -0.977278
2   E  0.950088
3   F -0.151357
```

For the sake of simplicity, the key column has the same name (for now).

An INNER JOIN keeps only keys present in both frames. Note: this, along with the forthcoming join illustrations, follows this convention:

• blue indicates rows that are present in the merge result
• red indicates rows that are excluded from the result (i.e., removed)
• green indicates missing values that are replaced with `NaN`s in the result

To perform an INNER JOIN, call `merge` on the left DataFrame, specifying the right DataFrame and the join key (at the very least) as arguments.

```
left.merge(right, on="key")
# Or, if you want to be explicit
# left.merge(right, on="key", how="inner")

  key   value_x   value_y
0   B  0.400157  1.867558
1   D  2.240893 -0.977278
```

This returns only rows from `left` and `right` which share a common key (in this example, "B" and "D").

A LEFT OUTER JOIN, or LEFT JOIN, can be performed by specifying `how="left"`.

```
left.merge(right, on="key", how="left")

  key   value_x   value_y
0   A  1.764052       NaN
1   B  0.400157  1.867558
2   C  0.978738       NaN
3   D  2.240893 -0.977278
```

Carefully note the placement of NaNs here. If you specify `how="left"`, then only keys from `left` are used, and missing data from `right` is replaced by NaN.

And similarly, for a RIGHT OUTER JOIN, or RIGHT JOIN, specify `how="right"`:

```
left.merge(right, on="key", how="right")

  key   value_x   value_y
0   B  0.400157  1.867558
1   D  2.240893 -0.977278
2   E       NaN  0.950088
3   F       NaN -0.151357
```

Here, keys from `right` are used, and missing data from `left` is replaced by NaN.

Finally, for the FULL OUTER JOIN, specify `how="outer"`.

```
left.merge(right, on="key", how="outer")

  key   value_x   value_y
0   A  1.764052       NaN
1   B  0.400157  1.867558
2   C  0.978738       NaN
3   D  2.240893 -0.977278
4   E       NaN  0.950088
5   F       NaN -0.151357
```

This uses the keys from both frames, and NaNs are inserted for missing rows in both.

The documentation summarizes these various merges nicely:

### Other JOINs - LEFT-Excluding, RIGHT-Excluding, and FULL-Excluding/ANTI JOINs

LEFT-Excluding JOINs and RIGHT-Excluding JOINs can be performed in two steps.

For the LEFT-Excluding JOIN, start by performing a LEFT OUTER JOIN and then filtering down to (keeping only!) the rows coming from `left` only,

```
(left.merge(right, on="key", how="left", indicator=True)
     .query("_merge == 'left_only'")
     .drop("_merge", axis=1))

  key   value_x  value_y
0   A  1.764052      NaN
2   C  0.978738      NaN
```

Where,

```
left.merge(right, on="key", how="left", indicator=True)

  key   value_x   value_y     _merge
0   A  1.764052       NaN  left_only
1   B  0.400157  1.867558       both
2   C  0.978738       NaN  left_only
3   D  2.240893 -0.977278       both
```

And similarly, for a RIGHT-Excluding JOIN,

```
(left.merge(right, on="key", how="right", indicator=True)
     .query("_merge == 'right_only'")
     .drop("_merge", axis=1))

  key  value_x   value_y
2   E      NaN  0.950088
3   F      NaN -0.151357
```

Lastly, if you are required to do a merge that only retains keys from the left or right, but not both (in other words, an ANTI-JOIN), you can do this in similar fashion:

```
(left.merge(right, on="key", how="outer", indicator=True)
     .query("_merge != 'both'")
     .drop("_merge", axis=1))

  key   value_x   value_y
0   A  1.764052       NaN
2   C  0.978738       NaN
4   E       NaN  0.950088
5   F       NaN -0.151357
```

### Different names for key columns

If the key columns are named differently (for example, `left` has `keyLeft`, and `right` has `keyRight` instead of `key`), then you will have to specify `left_on` and `right_on` as arguments instead of `on`:

```
left2 = left.rename({"key": "keyLeft"}, axis=1)
right2 = right.rename({"key": "keyRight"}, axis=1)

left2

  keyLeft     value
0       A  1.764052
1       B  0.400157
2       C  0.978738
3       D  2.240893

right2

  keyRight     value
0        B  1.867558
1        D -0.977278
2        E  0.950088
3        F -0.151357
```
```
left2.merge(right2, left_on="keyLeft", right_on="keyRight", how="inner")

  keyLeft   value_x keyRight   value_y
0       B  0.400157        B  1.867558
1       D  2.240893        D -0.977278
```

### Avoiding duplicate key column in output

When merging on `keyLeft` from `left` and `keyRight` from `right`, if you only want either of the `keyLeft` or `keyRight` (but not both) in the output, you can start by setting the index as a preliminary step.

```
left3 = left2.set_index("keyLeft")
left3.merge(right2, left_index=True, right_on="keyRight")

    value_x keyRight   value_y
0  0.400157        B  1.867558
1  2.240893        D -0.977278
```

Contrast this with the output of the previous command (that is, the output of `left2.merge(right2, left_on="keyLeft", right_on="keyRight", how="inner")`): you'll notice `keyLeft` is missing. You can decide which column to keep based on which frame's index is set as the key. This may matter when, say, performing some OUTER JOIN operation.

### Merging only a single column from one of the `DataFrames`

For example, consider

```
right3 = right.assign(newcol=np.arange(len(right)))
right3

  key     value  newcol
0   B  1.867558       0
1   D -0.977278       1
2   E  0.950088       2
3   F -0.151357       3
```

If you are required to merge only `newcol` (without any of the other columns), you can usually just subset columns before merging:

```
left.merge(right3[["key", "newcol"]], on="key")

  key     value  newcol
0   B  0.400157       0
1   D  2.240893       1
```

If you're doing a LEFT OUTER JOIN, a more performant solution would involve `map`:

```
# left["newcol"] = left["key"].map(right3.set_index("key")["newcol"])
left.assign(newcol=left["key"].map(right3.set_index("key")["newcol"]))

  key     value  newcol
0   A  1.764052     NaN
1   B  0.400157     0.0
2   C  0.978738     NaN
3   D  2.240893     1.0
```

As mentioned, this is similar to, but faster than

```
left.merge(right3[["key", "newcol"]], on="key", how="left")

  key     value  newcol
0   A  1.764052     NaN
1   B  0.400157     0.0
2   C  0.978738     NaN
3   D  2.240893     1.0
```

### Merging on multiple columns

To join on more than one column, specify a list for `on` (or `left_on` and `right_on`, as appropriate).

```python
left.merge(right, on=["key1", "key2"], ...)
```

Or, in the event the names are different,

```python
left.merge(right, left_on=["lkey1", "lkey2"], right_on=["rkey1", "rkey2"])
```
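For a runnable illustration (the frames and column names below are invented for this example), an inner merge on two keys only matches rows where both columns agree:

```python
import pandas as pd

left = pd.DataFrame({"key1": ["A", "A", "B"],
                     "key2": ["x", "y", "x"],
                     "lval": [1, 2, 3]})
right = pd.DataFrame({"key1": ["A", "B", "B"],
                      "key2": ["x", "x", "z"],
                      "rval": [10, 20, 30]})

merged = left.merge(right, on=["key1", "key2"])
print(merged)
# Only the (A, x) and (B, x) key pairs appear in both frames,
# so 2 rows survive the inner join.
```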

### Other useful `merge*` operations and functions

This section only covers the very basics, and is designed to only whet your appetite. For more examples and cases, see the documentation on `merge`, `join`, and `concat` as well as the links to the function specifications.


jwilner's response is spot on. I was exploring to see if there's a faster option, since in my experience, summing flat arrays is (strangely) faster than counting. This code seems faster:

```python
df.isnull().values.any()
```

```python
import numpy as np
import pandas as pd
import perfplot

def setup(n):
    df = pd.DataFrame(np.random.randn(n))
    df[df > 0.9] = np.nan
    return df

def isnull_any(df):
    return df.isnull().any()

def isnull_values_sum(df):
    return df.isnull().values.sum() > 0

def isnull_sum(df):
    return df.isnull().sum() > 0

def isnull_values_any(df):
    return df.isnull().values.any()

perfplot.save(
    "out.png",
    setup=setup,
    kernels=[isnull_any, isnull_values_sum, isnull_sum, isnull_values_any],
    n_range=[2 ** k for k in range(25)],
)
```

`df.isnull().sum().sum()` is a bit slower, but of course, has additional information -- the number of `NaNs`.

It's the index column; pass `index=False` to `to_csv(...)` to not write out an unnamed index column in the first place. See the `to_csv()` docs.

Example:

```
In :
df = pd.DataFrame(np.random.randn(5,3), columns=list("abc"))

Out:
  Unnamed: 0         a         b         c
0           0  0.109066 -1.112704 -0.545209
1           1  0.447114  1.525341  0.317252
2           2  0.507495  0.137863  0.886283
3           3  1.452867  1.888363  1.168101
4           4  0.901371 -0.704805  0.088335
```

compare with:

```
In :

Out:
          a         b         c
0  0.109066 -1.112704 -0.545209
1  0.447114  1.525341  0.317252
2  0.507495  0.137863  0.886283
3  1.452867  1.888363  1.168101
4  0.901371 -0.704805  0.088335
```

You could also optionally tell `read_csv` that the first column is the index column by passing `index_col=0`:

```
In :

Out:
          a         b         c
0  0.109066 -1.112704 -0.545209
1  0.447114  1.525341  0.317252
2  0.507495  0.137863  0.886283
3  1.452867  1.888363  1.168101
4  0.901371 -0.704805  0.088335
```
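The full round trip can be sketched in-memory, with `io.StringIO` standing in for a real file (the buffer is only for illustration):

```python
import io
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 3), columns=list("abc"))

# Writing with index=False means read_csv gets no "Unnamed: 0" column back
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)

df2 = pd.read_csv(buf)
print(df2.columns.tolist())  # ['a', 'b', 'c']
```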

In PyTorch, for every mini-batch during the training phase, we need to explicitly set the gradients to zero before starting backpropagation (i.e., the update of weights and biases), because PyTorch accumulates the gradients on subsequent backward passes. This accumulating behavior is convenient while training RNNs, so the default action is to accumulate (i.e. sum) the gradients on every `loss.backward()` call.

Because of this, when you start your training loop, you should ideally zero out the gradients so that you do the parameter update correctly. Otherwise, the gradient would point in some direction other than the intended direction towards the minimum (or maximum, in the case of maximization objectives).

Here is a simple example:

```python
import torch
from torch.autograd import Variable
import torch.optim as optim

def linear_model(x, W, b):
    return torch.matmul(x, W) + b

data, targets = ...

W = Variable(torch.randn(4, 3), requires_grad=True)
b = Variable(torch.randn(3), requires_grad=True)
optimizer = optim.Adam([W, b])

for sample, target in zip(data, targets):
    # clear out the gradients of all Variables
    # in this optimizer (i.e. W, b)
    optimizer.zero_grad()
    output = linear_model(sample, W, b)
    loss = (output - target) ** 2
    loss.backward()
    optimizer.step()
```

Alternatively, if you're doing a vanilla gradient descent, then:

```python
W = Variable(torch.randn(4, 3), requires_grad=True)
b = Variable(torch.randn(3), requires_grad=True)

for sample, target in zip(data, targets):
    # clear out the gradients of Variables
    # (i.e. W, b)
    W.grad.data.zero_()
    b.grad.data.zero_()

    output = linear_model(sample, W, b)
    loss = (output - target) ** 2
    loss.backward()

    W -= learning_rate * W.grad.data
    b -= learning_rate * b.grad.data
```

Note:

There are a few operations on Tensors in PyTorch that do not change the contents of a tensor, but change the way the data is organized. These operations include:

`narrow()`, `view()`, `expand()` and `transpose()`

For example: when you call `transpose()`, PyTorch doesn't generate a new tensor with a new layout; it just modifies meta information in the Tensor object so that the offset and stride describe the desired new shape. In this example, the transposed tensor and original tensor share the same memory:

```python
x = torch.randn(3, 2)
y = torch.transpose(x, 0, 1)
x[0, 0] = 42
print(y[0, 0])
# prints 42
```

This is where the concept of contiguous comes in. In the example above, `x` is contiguous but `y` is not, because its memory layout differs from that of a tensor of the same shape made from scratch. Note that the word "contiguous" is a bit misleading: it's not that the content of the tensor is spread out around disconnected blocks of memory. The bytes are still allocated in one block of memory, but the order of the elements is different!

When you call `contiguous()`, it actually makes a copy of the tensor such that the order of its elements in memory is the same as if it had been created from scratch with the same data.

Normally you don't need to worry about this. You're generally safe to assume everything will work, and wait until you get a `RuntimeError: input is not contiguous`, where PyTorch expects a contiguous tensor, before adding a call to `contiguous()`.
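A short sketch of that check-and-fix pattern (using a transposed tensor, as above):

```python
import torch

x = torch.randn(3, 2)
y = x.transpose(0, 1)     # same storage as x, but different strides

print(x.is_contiguous())  # True
print(y.is_contiguous())  # False

z = y.contiguous()        # copies into a fresh, contiguous layout
print(z.is_contiguous())  # True
print(torch.equal(y, z))  # True: identical values, new memory order
```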

Pandas will recognise a value as null if it is an `np.nan` object, which will print as `NaN` in the DataFrame. Your missing values are probably empty strings, which Pandas doesn't recognise as null. To fix this, you can convert the empty strings (or whatever is in your empty cells) to `np.nan` objects using `replace()`, and then call `dropna()` on your DataFrame to delete rows with null tenants.

To demonstrate, we create a DataFrame with some random values and some empty strings in a `Tenants` column:

```python
>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame(np.random.randn(10, 2), columns=list("AB"))
>>> df["Tenant"] = np.random.choice(["Babar", "Rataxes", ""], 10)
>>> print(df)

          A         B   Tenant
0 -0.588412 -1.179306    Babar
1 -0.008562  0.725239
2  0.282146  0.421721  Rataxes
3  0.627611 -0.661126    Babar
4  0.805304 -0.834214
5 -0.514568  1.890647    Babar
6 -1.188436  0.294792  Rataxes
7  1.471766 -0.267807    Babar
8 -1.730745  1.358165  Rataxes
9  0.066946  0.375640
```

Now we replace any empty strings in the `Tenants` column with `np.nan` objects, like so:

```python
>>> df["Tenant"].replace("", np.nan, inplace=True)
>>> print(df)

          A         B   Tenant
0 -0.588412 -1.179306    Babar
1 -0.008562  0.725239      NaN
2  0.282146  0.421721  Rataxes
3  0.627611 -0.661126    Babar
4  0.805304 -0.834214      NaN
5 -0.514568  1.890647    Babar
6 -1.188436  0.294792  Rataxes
7  1.471766 -0.267807    Babar
8 -1.730745  1.358165  Rataxes
9  0.066946  0.375640      NaN
```

Now we can drop the null values:

```python
>>> df.dropna(subset=["Tenant"], inplace=True)
>>> print(df)

          A         B   Tenant
0 -0.588412 -1.179306    Babar
2  0.282146  0.421721  Rataxes
3  0.627611 -0.661126    Babar
5 -0.514568  1.890647    Babar
6 -1.188436  0.294792  Rataxes
7  1.471766 -0.267807    Babar
8 -1.730745  1.358165  Rataxes
```

For people looking at this today, I would recommend the Seaborn `heatmap()` as documented here.

The example above would be done as follows:

```python
import numpy as np
from pandas import DataFrame
import seaborn as sns
%matplotlib inline

Index = ["aaa", "bbb", "ccc", "ddd", "eee"]
Cols = ["A", "B", "C", "D"]
df = DataFrame(abs(np.random.randn(5, 4)), index=Index, columns=Cols)

sns.heatmap(df, annot=True)
```

Where `%matplotlib` is an IPython magic function, for those unfamiliar.

You have a couple of options.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 6))
# Make a few areas have NaN values
df.iloc[1:3, 1] = np.nan
df.iloc[5, 3] = np.nan
df.iloc[7:9, 5] = np.nan
```

Now the data frame looks something like this:

```
          0         1         2         3         4         5
0  0.520113  0.884000  1.260966 -0.236597  0.312972 -0.196281
1 -0.837552       NaN  0.143017  0.862355  0.346550  0.842952
2 -0.452595       NaN -0.420790  0.456215  1.203459  0.527425
3  0.317503 -0.917042  1.780938 -1.584102  0.432745  0.389797
4 -0.722852  1.704820 -0.113821 -1.466458  0.083002  0.011722
5 -0.622851 -0.251935 -1.498837       NaN  1.098323  0.273814
6  0.329585  0.075312 -0.690209 -3.807924  0.489317 -0.841368
7 -1.123433 -1.187496  1.868894 -2.046456 -0.949718       NaN
8  1.133880 -0.110447  0.050385 -1.158387  0.188222       NaN
9 -0.513741  1.196259  0.704537  0.982395 -0.585040 -1.693810
```
• Option 1: `df.isnull().any().any()` - This returns a boolean value

You know of the `isnull()` which would return a dataframe like this:

```
       0      1      2      3      4      5
0  False  False  False  False  False  False
1  False   True  False  False  False  False
2  False   True  False  False  False  False
3  False  False  False  False  False  False
4  False  False  False  False  False  False
5  False  False  False   True  False  False
6  False  False  False  False  False  False
7  False  False  False  False  False   True
8  False  False  False  False  False   True
9  False  False  False  False  False  False
```

If you make it `df.isnull().any()`, you can find just the columns that have `NaN` values:

```
0    False
1     True
2    False
3     True
4    False
5     True
dtype: bool
```

One more `.any()` will tell you if any of the above are `True`

```
> df.isnull().any().any()
True
```
• Option 2: `df.isnull().sum().sum()` - This returns an integer of the total number of `NaN` values:

This operates the same way as the `.any().any()` does, by first giving a summation of the number of `NaN` values in a column, then the summation of those values:

```
df.isnull().sum()
0    0
1    2
2    0
3    1
4    0
5    2
dtype: int64
```

Finally, to get the total number of NaN values in the DataFrame:

```
df.isnull().sum().sum()
5
```