How do I get the row count of a Pandas DataFrame?

StackOverflow

I"m trying to get the number of rows of dataframe df with Pandas, and here is my code.

Method 1:

total_rows = df.count
print total_rows + 1

Method 2:

total_rows = df["First_columnn_label"].count
print total_rows + 1

Both the code snippets give me this error:

TypeError: unsupported operand type(s) for +: "instancemethod" and "int"

What am I doing wrong?

Answer rating: 1933

For a dataframe df, one can use any of the following:

  • len(df.index)
  • df.shape[0]
  • df[df.columns[0]].count() (slowest, but avoids counting NaN values in the first column)

Performance plot


Code to reproduce the plot:

import numpy as np
import pandas as pd
import perfplot

perfplot.save(
    "out.png",
    setup=lambda n: pd.DataFrame(np.arange(n * 3).reshape(n, 3)),
    n_range=[2**k for k in range(25)],
    kernels=[
        lambda df: len(df.index),
        lambda df: df.shape[0],
        lambda df: df[df.columns[0]].count(),
    ],
    labels=["len(df.index)", "df.shape[0]", "df[df.columns[0]].count()"],
    xlabel="Number of rows",
)

Answer rating: 400

Suppose df is your dataframe then:

count_row = df.shape[0]  # Gives number of rows
count_col = df.shape[1]  # Gives number of columns

Or, more succinctly,

r, c = df.shape

Answer rating: 201

Use len(df) :-).

__len__() is documented with "Returns length of index".

Timing info, set up the same way as in root"s answer:

In [7]: timeit len(df.index)
1000000 loops, best of 3: 248 ns per loop

In [8]: timeit len(df)
1000000 loops, best of 3: 573 ns per loop

Due to one additional function call, it is of course correct to say that it is a bit slower than calling len(df.index) directly. But this should not matter in most cases. I find len(df) to be quite readable.

Answer rating: 120

How do I get the row count of a Pandas DataFrame?

This table summarises the different situations in which you"d want to count something in a DataFrame (or Series, for completeness), along with the recommended method(s).

Enter image description here

Footnotes

  1. DataFrame.count returns counts for each column as a Series since the non-null count varies by column.
  2. DataFrameGroupBy.size returns a Series, since all columns in the same group share the same row-count.
  3. DataFrameGroupBy.count returns a DataFrame, since the non-null count could differ across columns in the same group. To get the group-wise non-null count for a specific column, use df.groupby(...)["x"].count() where "x" is the column to count.

#Minimal Code Examples

Below, I show examples of each of the methods described in the table above. First, the setup -

df = pd.DataFrame({
    "A": list("aabbc"), "B": ["x", "x", np.nan, "x", np.nan]})
s = df["B"].copy()

df

   A    B
0  a    x
1  a    x
2  b  NaN
3  b    x
4  c  NaN

s

0      x
1      x
2    NaN
3      x
4    NaN
Name: B, dtype: object

Row Count of a DataFrame: len(df), df.shape[0], or len(df.index)

len(df)
# 5

df.shape[0]
# 5

len(df.index)
# 5

It seems silly to compare the performance of constant time operations, especially when the difference is on the level of "seriously, don"t worry about it". But this seems to be a trend with other answers, so I"m doing the same for completeness.

Of the three methods above, len(df.index) (as mentioned in other answers) is the fastest.

Note

  • All the methods above are constant time operations as they are simple attribute lookups.
  • df.shape (similar to ndarray.shape) is an attribute that returns a tuple of (# Rows, # Cols). For example, df.shape returns (8, 2) for the example here.

Column Count of a DataFrame: df.shape[1], len(df.columns)

df.shape[1]
# 2

len(df.columns)
# 2

Analogous to len(df.index), len(df.columns) is the faster of the two methods (but takes more characters to type).

Row Count of a Series: len(s), s.size, len(s.index)

len(s)
# 5

s.size
# 5

len(s.index)
# 5

s.size and len(s.index) are about the same in terms of speed. But I recommend len(df).

Note size is an attribute, and it returns the number of elements (=count of rows for any Series). DataFrames also define a size attribute which returns the same result as df.shape[0] * df.shape[1].

Non-Null Row Count: DataFrame.count and Series.count

The methods described here only count non-null values (meaning NaNs are ignored).

Calling DataFrame.count will return non-NaN counts for each column:

df.count()

A    5
B    3
dtype: int64

For Series, use Series.count to similar effect:

s.count()
# 3

Group-wise Row Count: GroupBy.size

For DataFrames, use DataFrameGroupBy.size to count the number of rows per group.

df.groupby("A").size()

A
a    2
b    2
c    1
dtype: int64

Similarly, for Series, you"ll use SeriesGroupBy.size.

s.groupby(df.A).size()

A
a    2
b    2
c    1
Name: B, dtype: int64

In both cases, a Series is returned. This makes sense for DataFrames as well since all groups share the same row-count.

Group-wise Non-Null Row Count: GroupBy.count

Similar to above, but use GroupBy.count, not GroupBy.size. Note that size always returns a Series, while count returns a Series if called on a specific column, or else a DataFrame.

The following methods return the same thing:

df.groupby("A")["B"].size()
df.groupby("A").size()

A
a    2
b    2
c    1
Name: B, dtype: int64

Meanwhile, for count, we have

df.groupby("A").count()

   B
A
a  2
b  1
c  0

...called on the entire GroupBy object, vs.,

df.groupby("A")["B"].count()

A
a    2
b    1
c    0
Name: B, dtype: int64

Called on a specific column.





Tutorials