Python | pandas.factorize()

The pandas.factorize() method encodes an array as integers by assigning one code to each distinct value. It is available both as the top-level function pandas.factorize() and as the method Series.factorize().
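The two forms can be compared side by side. The following is a minimal sketch; the Series `s` and its values are hypothetical sample data chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
s = pd.Series(["b", "a", "b", "c"])

# Function form: accepts any 1D sequence, including a Series.
codes_func, uniques_func = pd.factorize(s)

# Method form: called directly on the Series.
codes_meth, uniques_meth = s.factorize()

print(codes_func)    # one integer code per element
print(uniques_meth)  # distinct values in order of first appearance
```

Both calls should give the same codes; for a Series input the unique values come back as an Index.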

Parameters:
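values: 1D sequence to encode.
sort: [bool, default False] Sort the uniques and reorder the codes so that each code still points to its value.
na_sentinel: [int, default -1] Code used to mark missing ("not found") values. This parameter was deprecated in pandas 1.5 and removed in pandas 2.0 in favor of use_na_sentinel.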

Return: a tuple (codes, uniques), where codes is the integer representation of the array and uniques holds the distinct values.
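The relationship between the two returned objects can be checked directly: indexing the uniques with the codes gives back the original values. The sketch below assumes a small object array named `values`, which is hypothetical sample data, and also shows how sort=True changes the codes.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration.
values = np.array(["b", "d", "d", "c", "a"], dtype=object)

# Default: uniques are listed in order of first appearance.
codes, uniques = pd.factorize(values)

# Taking the uniques at each code rebuilds the input.
restored = uniques.take(codes)

# sort=True: uniques are sorted and the codes are remapped to match.
codes_sorted, uniques_sorted = pd.factorize(values, sort=True)
restored_sorted = uniques_sorted.take(codes_sorted)

print(codes, uniques)
print(codes_sorted, uniques_sorted)
```

In both cases the round trip returns the same values; only the numbering of the codes differs.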

Code: demonstration of the factorize() method

# Import libraries
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype

# Encode each distinct value as an integer, in order of first appearance
labels, uniques = pd.factorize(['b', 'd', 'd', 'c', 'a', 'c', 'a', 'b'])

print("Numeric Representation:", labels)
print("Unique Values:", uniques)

# Sorting the unique values
label1, unique1 = pd.factorize(['b', 'd', 'd', 'c', 'a', 'c', 'a', 'b'],
                               sort=True)

print("Numeric Representation:", label1)
print("Unique Values:", unique1)

# Missing values marked with a custom sentinel
# (na_sentinel was removed in pandas 2.0, so this call needs an older version)
label2, unique2 = pd.factorize(['b', None, 'd', 'c', None, 'a'],
                               na_sentinel=-101)

print("Numeric Representation:", label2)
print("Unique Values:", unique2)

# When factorizing a pandas object, the type of the uniques differs:
# a Categorical input yields Categorical uniques
a = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])

label3, unique3 = pd.factorize(a)

print("Numeric Representation:", label3)
print("Unique Values:", unique3)
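The missing-value example relies on na_sentinel, which pandas 2.0 and later no longer accept. The sketch below is one possible workaround, not the library's prescribed replacement: it uses the default sentinel of -1 and then recodes it with numpy.where. The name `codes_custom` and the reuse of -101 are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Same input as the missing-value example in the listing.
codes, uniques = pd.factorize(['b', None, 'd', 'c', None, 'a'])

# By default, missing entries receive the code -1 and are left out of uniques.
print("Numeric Representation:", codes)
print("Unique Values:", uniques)

# Illustrative assumption: recode the default sentinel to -101 by hand.
codes_custom = np.where(codes == -1, -101, codes)
print("Custom sentinel:", codes_custom)
```

Recoding after the call keeps the example independent of which sentinel-related parameters a given pandas version supports.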