Datasets sometimes contain columns of numbers that have no inherent order. Such a column usually denotes a category, for example when categorical data has been label encoded into integers. A machine learning model can misread these numbers as ordered quantities (treating category 3 as "greater than" category 1), so to avoid this, the column should be one hot encoded.
One hot encoding splits a column that contains numeric categorical data into multiple columns, one for each category present in that column. Each new column holds a 1 for the rows that belong to its category and a 0 for all other rows.
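To make the mechanism concrete, here is a minimal from-scratch sketch using NumPy. The function name `one_hot` and the input labels are hypothetical, chosen only for illustration; each distinct label becomes one column, and every row gets a 1 in exactly one of them.

```python
import numpy as np

def one_hot(labels):
    """Return (categories, matrix) with one 0/1 column per distinct label."""
    categories = np.unique(labels)  # sorted distinct categories
    # Compare every label against every category via broadcasting:
    # row i gets a 1 in the column of its own category, 0 elsewhere.
    matrix = (np.asarray(labels)[:, None] == categories[None, :]).astype(int)
    return categories, matrix

# Hypothetical integer-coded categories
cats, encoded = one_hot([1, 2, 1, 3])
print(cats)  # [1 2 3]
print(encoded)
```

Each row of `encoded` sums to 1, which is what lets a model treat the categories as independent rather than ordered.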
Consider the data that lists fruits and their respective categorical values and prices.
| Fruit | Categorical value of fruit | Price |
|---|---|---|
| apple | 1 | 5 |
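After one hot encoding, the "Categorical value of fruit" column is replaced by one 0/1 column per fruit category, while the price column is kept unchanged.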
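A quick way to reproduce this in Python is pandas' `get_dummies`. In this sketch only the first row (apple, 1, 5) comes from the table above; the mango and orange rows are hypothetical, added so there is more than one category to expand. Because `get_dummies` only expands text and category columns by default, the integer-coded column is named explicitly with `columns=`.

```python
import pandas as pd

# First row from the table above; remaining rows are hypothetical.
df = pd.DataFrame({
    "Fruit": ["apple", "mango", "apple", "orange"],
    "Categorical value of fruit": [1, 2, 1, 3],
    "Price": [5, 10, 15, 20],
})

# Expand the integer-coded column into one 0/1 column per category.
# dtype=int gives 0/1 integers (recent pandas defaults to booleans).
encoded = pd.get_dummies(df, columns=["Categorical value of fruit"], dtype=int)
print(encoded)
```

The untouched columns (`Fruit`, `Price`) stay first, followed by `Categorical value of fruit_1`, `_2` and `_3`.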
The examples below implement this in Python.
The following example uses customer data containing zones and credit ratings; zone is a categorical value that should be one hot encoded.
To one hot encode the zone column:
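The original dataset is not reproduced here, so this sketch uses a small hypothetical table with an integer `zone` column (zones 1 to 4) and a numeric `credit_score` column. It uses scikit-learn's `OneHotEncoder`, whose default output is a SciPy sparse matrix, hence the `.toarray()` call.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical customer data: zone is an integer category (1-4),
# credit_score is an ordinary numeric feature.
data = pd.DataFrame({
    "zone": [1, 2, 3, 4, 2],
    "credit_score": [0.9, 0.6, 0.7, 0.8, 0.5],
})

encoder = OneHotEncoder()  # returns a sparse matrix by default
zone_encoded = encoder.fit_transform(data[["zone"]]).toarray()

# Numeric column first, then one 0/1 column per zone.
result = np.hstack([data[["credit_score"]].to_numpy(), zone_encoded])
print(result.shape)  # (5, 5)
```

`encoder.categories_` records which zone each output column stands for, so the same mapping can be reused on new data with `encoder.transform`.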
The output contains 5 columns: one for the credit rating and the remaining 4 representing the 4 zones.
In older versions of scikit-learn (before 0.20), `OneHotEncoder` only accepted numeric categorical values, so any string values had to be label encoded before one hot encoding. Newer versions also accept string categories directly.
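As a sketch of the newer behaviour, assuming scikit-learn 0.20 or later, `OneHotEncoder` can be fitted on a string column without a label-encoding step. The country values below are hypothetical sample data.

```python
from sklearn.preprocessing import OneHotEncoder

# Hypothetical string-valued column (2-D: one row per sample).
# Assumes scikit-learn >= 0.20, which accepts string categories.
geography = [["France"], ["Spain"], ["Germany"], ["France"]]

encoder = OneHotEncoder()
encoded = encoder.fit_transform(geography).toarray()
print(encoder.categories_[0])  # ['France' 'Germany' 'Spain']
print(encoded)
```

Categories are sorted, so the columns correspond to France, Germany and Spain in that order.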
The example below contains the Geography and Gender fields of customer data, which must be label encoded first.
Label encoding the data:
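A minimal sketch of this step with scikit-learn's `LabelEncoder`, using hypothetical customer records since the original dataset is not shown. A separate encoder is fitted per column so each keeps its own mapping; `LabelEncoder` assigns codes in sorted order of the class names.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical customer records
customers = pd.DataFrame({
    "Geography": ["France", "Spain", "Germany", "France", "Spain"],
    "Gender": ["Female", "Male", "Female", "Male", "Male"],
})

# One LabelEncoder per column: France=0, Germany=1, Spain=2; Female=0, Male=1
encoders = {}
for column in ["Geography", "Gender"]:
    encoders[column] = LabelEncoder()
    customers[column] = encoders[column].fit_transform(customers[column])

print(customers)
```

Keeping the fitted encoders in `encoders` allows `inverse_transform` to map the integer codes back to the original strings later.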
One hot encoding the Gender and Geography columns:
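The sketch below chains both steps on the same kind of hypothetical data: each string column is label encoded, then the two integer columns are one hot encoded together. Gender is placed first so the output columns come out as Female, Male, France, Germany, Spain.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical customer data
geography = np.array(["France", "Spain", "Germany", "France", "Spain"])
gender = np.array(["Female", "Male", "Female", "Male", "Male"])

# Step 1: label encode each string column (codes follow sorted class names).
gender_codes = LabelEncoder().fit_transform(gender)
geo_codes = LabelEncoder().fit_transform(geography)

# Step 2: one hot encode both integer columns in a single encoder.
codes = np.column_stack([gender_codes, geo_codes])
encoder = OneHotEncoder()
one_hot = encoder.fit_transform(codes).toarray()

print(one_hot.shape)  # (5, 5): 2 gender columns + 3 country columns
```

The first row (a female customer from France) becomes `[1, 0, 1, 0, 0]`.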
The output contains 5 columns, 2 columns representing gender, male and female, and the remaining 3 columns represent the countries France, Germany and Spain.