Label encoding converts categorical labels into numeric form so that they become machine-readable. Machine learning algorithms can then decide how those labels should be handled. It is an important preprocessing step for structured datasets in supervised learning.
Example:
Suppose a dataset has a Height column whose values are tall, medium, and short.
After label encoding, the Height column holds the numbers 0, 1, and 2,
where 0 is the label for tall, 1 the label for medium, and 2 the label for short.
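The mapping described above can be sketched in a few lines of plain Python. The sample values in `heights` and the names `height_codes` and `encoded_heights` are made up for illustration; only the three categories and their codes come from the example.

```python
# Hypothetical Height column; the rows are invented for illustration.
heights = ["tall", "medium", "short", "tall", "medium"]

# Explicit mapping that mirrors the example: tall -> 0, medium -> 1, short -> 2.
height_codes = {"tall": 0, "medium": 1, "short": 2}

# Replace every label with its numeric code.
encoded_heights = [height_codes[h] for h in heights]
print(encoded_heights)  # [0, 1, 2, 0, 1]
```

The mapping is written out by hand here so that it matches the example. An encoder that assigns codes automatically may pick a different order (scikit-learn's `LabelEncoder`, for instance, sorts the labels, so tall would receive 2 rather than 0).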
We apply label encoding to the Iris dataset
on the target column, Species. It contains three species: Iris-setosa, Iris-versicolor, and Iris-virginica.
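A minimal sketch of how the distinct labels could be inspected, assuming pandas is available. The small `df` frame is a hypothetical stand-in for the real Iris data, which would normally be loaded from a file (for example with `pd.read_csv`); the column name `Species` is likewise an assumption.

```python
import pandas as pd

# Hypothetical stand-in for the Iris data: only the Species column is built
# here, with two rows per class, so the example is self-contained.
df = pd.DataFrame({
    "Species": [
        "Iris-setosa", "Iris-setosa",
        "Iris-versicolor", "Iris-versicolor",
        "Iris-virginica", "Iris-virginica",
    ]
})

# Distinct labels in order of first appearance.
species_before = df["Species"].unique()
print(species_before)
```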
Output:
array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype=object)
After applying label encoding:
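The encoding step itself might look like the following sketch. It assumes scikit-learn's `LabelEncoder`, which the text does not name explicitly, and again uses a small hypothetical `df` in place of the real Iris data.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the Iris data (Species column only).
df = pd.DataFrame({
    "Species": [
        "Iris-setosa", "Iris-setosa",
        "Iris-versicolor", "Iris-versicolor",
        "Iris-virginica", "Iris-virginica",
    ]
})

label_encoder = LabelEncoder()

# fit_transform learns the sorted set of classes and returns integer codes.
df["Species"] = label_encoder.fit_transform(df["Species"])

# Distinct codes now stored in the column.
species_after = df["Species"].unique()
print(species_after)

# classes_ keeps the original labels; position i is the label for code i.
print(label_encoder.classes_)
```

Keeping the fitted `label_encoder` around is useful because its `inverse_transform` method can turn the codes back into the original species names.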
Output:
array([0, 1, 2], dtype=int64)
Limitation of Label Encoding
Label encoding converts data into machine-readable form, but it does so by assigning a unique number (starting from 0) to each class. This can introduce a priority problem during training: a label with a high value may be treated as having higher priority than a label with a lower value.
Example:
Consider an attribute with the output classes Mexico, Paris, and Dubai. When this column is label encoded, suppose Mexico is replaced with 0, Paris with 1, and Dubai with 2.
A model may then interpret Dubai as having higher priority than Mexico and Paris, even though no such priority relationship exists between these cities.
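The following plain-Python sketch illustrates why the numeric codes can mislead. The sample rows in `cities` and all variable names are hypothetical; only the mapping Mexico = 0, Paris = 1, Dubai = 2 comes from the example.

```python
# Hypothetical city column; the rows are invented for illustration.
cities = ["Mexico", "Paris", "Dubai", "Paris"]

# Mapping taken from the example: Mexico -> 0, Paris -> 1, Dubai -> 2.
city_codes = {"Mexico": 0, "Paris": 1, "Dubai": 2}
encoded_cities = [city_codes[c] for c in cities]
print(encoded_cities)  # [0, 1, 2, 1]

# Once encoded, the labels are ordinary numbers, so comparisons succeed
# even though "Dubai > Paris" has no real meaning.
dubai_outranks_paris = city_codes["Dubai"] > city_codes["Paris"]
print(dubai_outranks_paris)  # True

# Arithmetic works too: the midpoint of Mexico and Dubai equals Paris's code,
# a relationship that exists only in the encoding, not between the cities.
midpoint = (city_codes["Mexico"] + city_codes["Dubai"]) / 2
midpoint_is_paris = midpoint == city_codes["Paris"]
print(midpoint_is_paris)  # True
```

Any algorithm that relies on the magnitude of or distance between feature values can pick up these artificial relationships.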