Python | Pandas Index.nunique()


Pandas Index.nunique() returns the number of unique elements in an object. It returns a scalar value that is the count of all unique values in the index. By default, NaN values are ignored; if dropna is False, NaN is included in the count.

Syntax: Index.nunique(dropna=True)

Parameters:
dropna: bool, default True. Don't include NaN values in the count.

Returns: nunique: int
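
A compact sketch of the dropna behaviour, using the same index the examples below build step by step:

import pandas as pd

# Two duplicated labels and one missing value (None)
idx = pd.Index(['Beagle', 'Pug', 'Labrador', 'Pug', 'Mastiff', None, 'Beagle'])

print(idx.nunique())              # 4 -> NaN/None ignored by default
print(idx.nunique(dropna=False))  # 5 -> NaN/None counted as a value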

Example #1: Use Index.nunique() to find the number of unique values in the index, excluding NaN values from the count.

# import pandas
import pandas as pd

# Create the index
idx = pd.Index(['Beagle', 'Pug', 'Labrador', 'Pug',
                'Mastiff', None, 'Beagle'])

# Print the index
idx

Output:

Index(['Beagle', 'Pug', 'Labrador', 'Pug', 'Mastiff', None, 'Beagle'], dtype='object')

Let's find the number of unique values in the index.

# Find the number of unique values, ignoring NaN (the default)
idx.nunique(dropna=True)

Output:

4

As we can see in the output, the function returned 4, indicating that there are only 4 unique values in the index (the missing value is ignored).

Example #2: Use Index.nunique() to count all unique values in the index, this time including missing (NaN) values in the count.

# import pandas
import pandas as pd

# Create the index
idx = pd.Index(['Beagle', 'Pug', 'Labrador', 'Pug',
                'Mastiff', None, 'Beagle'])

# Print the index
idx

Output:

Index(['Beagle', 'Pug', 'Labrador', 'Pug', 'Mastiff', None, 'Beagle'], dtype='object')

Let's find the number of unique values in the index.

# Find the number of unique values, including NaN
idx.nunique(dropna=False)

Output:

5

As we can see in the output, the function returned 5, indicating that there are 5 unique values in the index once missing values are included in the count.





Python | Pandas Index.nunique(): StackOverflow Questions

Answer #1

You need nunique:

df = df.groupby("domain")["ID"].nunique()

print(df)
domain
"facebook.com"    1
"google.com"      1
"twitter.com"     2
"vk.com"          3
Name: ID, dtype: int64

If you need to strip " characters:

df = df.ID.groupby([df.domain.str.strip('"')]).nunique()
print(df)
domain
facebook.com    1
google.com      1
twitter.com     2
vk.com          3
Name: ID, dtype: int64

Or as Jon Clements commented:

df.groupby(df.domain.str.strip('"'))["ID"].nunique()

You can retain the column name like this:

df = df.groupby(by="domain", as_index=False).agg({"ID": pd.Series.nunique})
print(df)
    domain  ID
0       fb   1
1      ggl   1
2  twitter   2
3       vk   3

The difference is that nunique() returns a Series and agg() returns a DataFrame.
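
A minimal sketch of that difference, assuming a small frame with the same column names as above:

import pandas as pd

df = pd.DataFrame({"domain": ["vk.com", "vk.com", "twitter.com"],
                   "ID": [1, 2, 2]})

s = df.groupby("domain")["ID"].nunique()                         # keeps only the counts
d = df.groupby("domain", as_index=False).agg({"ID": "nunique"})  # keeps domain as a column

print(type(s).__name__)  # Series
print(type(d).__name__)  # DataFrame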

Answer #2

Generally, to count distinct values in a single column, you can use Series.value_counts:

df.domain.value_counts()

#"vk.com"          5
#"twitter.com"     2
#"facebook.com"    1
#"google.com"      1
#Name: domain, dtype: int64

To see how many unique values a column has, use Series.nunique:

df.domain.nunique()
# 4

To get all the distinct values, you can use unique or drop_duplicates; the slight difference between the two functions is that unique returns a numpy.array while drop_duplicates returns a pandas.Series:

df.domain.unique()
# array([""vk.com"", ""twitter.com"", ""facebook.com"", ""google.com""], dtype=object)

df.domain.drop_duplicates()
#0          "vk.com"
#2     "twitter.com"
#4    "facebook.com"
#6      "google.com"
#Name: domain, dtype: object

As for this specific problem, since you'd like to count distinct values with respect to another variable, besides the groupby method provided by other answers here, you can also simply drop duplicates first and then do value_counts():

import pandas as pd
df.drop_duplicates().domain.value_counts()

# "vk.com"          3
# "twitter.com"     2
# "facebook.com"    1
# "google.com"      1
# Name: domain, dtype: int64

Answer #3

Count distinct values, use nunique:

df["hID"].nunique()
5

Count only non-null values, use count:

df["hID"].count()
8

Count total values including null values, use the size attribute:

df["hID"].size
8
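
A short sketch with made-up values (the question's frame is not reproduced here) shows how the three can differ once duplicates and nulls are present:

import pandas as pd
import numpy as np

s = pd.Series([1, 1, 2, 3, np.nan])

print(s.nunique())  # 3 -> distinct non-null values
print(s.count())    # 4 -> non-null values
print(s.size)       # 5 -> every value, including NaN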

Edit to add condition

Use boolean indexing:

df.loc[df["mID"]=="A","hID"].agg(["nunique","count","size"])

OR using query:

df.query("mID == "A"")["hID"].agg(["nunique","count","size"])

Output:

nunique    5
count      5
size       5
Name: hID, dtype: int64

Answer #4

"nunique" is an option for .agg() since pandas 0.20.0, so:

df.groupby("date").agg({"duration": "sum", "user_id": "nunique"})

Answer #5

I believe this is what you want:

table.groupby("YEARMONTH").CLIENTCODE.nunique()

Example:

In [2]: table
Out[2]: 
   CLIENTCODE  YEARMONTH
0           1     201301
1           1     201301
2           2     201301
3           1     201302
4           2     201302
5           2     201302
6           3     201302

In [3]: table.groupby("YEARMONTH").CLIENTCODE.nunique()
Out[3]: 
YEARMONTH
201301       2
201302       3

Answer #6

How about either of:

>>> df
         date  duration user_id
0  2013-04-01        30    0001
1  2013-04-01        15    0001
2  2013-04-01        20    0002
3  2013-04-02        15    0002
4  2013-04-02        30    0002
>>> df.groupby("date").agg({"duration": np.sum, "user_id": pd.Series.nunique})
            duration  user_id
date                         
2013-04-01        65        2
2013-04-02        45        1
>>> df.groupby("date").agg({"duration": np.sum, "user_id": lambda x: x.nunique()})
            duration  user_id
date                         
2013-04-01        65        2
2013-04-02        45        1
