Replacing strings with numbers in Python for data analysis



Note: Before running the code, create an example.csv file in the working directory that contains some names and genders, with the column headers Name and Gender.
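One way to create the file is with the standard csv module. This is only a sketch: the rows are copied from the example table in this article, and the helper name write_example_csv is made up for illustration.

```python
import csv

# Rows copied from the example table in this article; the header
# names match the column names used by the code that follows.
ROWS = [
    ("Name", "Gender"),
    ("Ram", "Male"),
    ("Seeta", "Female"),
    ("Kartik", "Male"),
    ("Niti", "Female"),
    ("Naitik", "Male"),
]


def write_example_csv(path="example.csv"):
    """Write the sample table to a comma-separated file (hypothetical helper)."""
    # newline="" lets the csv module control the line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(ROWS)


write_example_csv()
```

The file is written without spaces after the commas, so it can be read with the default comma separator.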

Let's say we have a table with names and genders. The Gender column has two categories, Male and Female, and suppose we want to assign 1 to Male and 2 to Female.

Examples:

Input:

       Name    Gender
    0  Ram     Male
    1  Seeta   Female
    2  Kartik  Male
    3  Niti    Female
    4  Naitik  Male

Output:

       Name    Gender
    0  Ram     1
    1  Seeta   2
    2  Kartik  1
    3  Niti    2
    4  Naitik  1

Method 1:

First, create a dictionary with the following two key-value pairs:

    Key     Value
    Male    1
    Female  2

Then loop over the Gender column of the DataFrame and replace each value with the number stored under the matching key. The keys must match the strings in the column exactly, including case.

# import the pandas library
import pandas as pd

# open example.csv in read mode
file_handler = open("example.csv", "r")

# create a pandas DataFrame with the read_csv function,
# which reads from a CSV file
data = pd.read_csv(file_handler, sep=",")

# close the file handler
file_handler.close()

# create the dictionary; the keys must match the strings
# in the Gender column exactly, including case
gender = {"Male": 1, "Female": 2}

# traverse the Gender column and replace each string
# with the value stored under the matching key
data["Gender"] = [gender[item] for item in data["Gender"]]

print(data)

Output:

       Name    Gender
    0  Ram     1
    1  Seeta   2
    2  Kartik  1
    3  Niti    2
    4  Naitik  1
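The same dictionary lookup can also be written without an explicit loop by using the pandas Series.map method. The following is a minimal sketch, not part of the original method: it builds the DataFrame in memory (an assumption, so that it runs without example.csv) and reuses the column names from the example.

```python
import pandas as pd

# In-memory copy of the example table, so no CSV file is needed here
data = pd.DataFrame({
    "Name": ["Ram", "Seeta", "Kartik", "Niti", "Naitik"],
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
})

# Same dictionary as in Method 1
gender = {"Male": 1, "Female": 2}

# map() looks up every value of the column in the dictionary
data["Gender"] = data["Gender"].map(gender)

print(data)
```

One difference from the list comprehension: a value that is missing from the dictionary becomes NaN with map(), whereas gender[item] raises a KeyError.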

Method 2:
Method 2 is similar, but it does not require a dictionary and takes fewer lines of code. Here pandas iterates over the Gender column of the DataFrame internally, and we change the values wherever the condition matches.

# import the pandas library
import pandas as pd

# open example.csv in read mode
file_handler = open("example.csv", "r")

# create a pandas DataFrame with the read_csv function,
# which reads from a CSV file
data = pd.read_csv(file_handler, sep=",")

# close the file handler
file_handler.close()

# store the column with the generic object dtype so that it can
# hold strings and numbers at the same time during the replacement
data["Gender"] = data["Gender"].astype(object)

# select the rows of the Gender column where the condition
# matches and write the new value into them
data.loc[data["Gender"] == "Male", "Gender"] = 1
data.loc[data["Gender"] == "Female", "Gender"] = 2

print(data)

Output:

       Name    Gender
    0  Ram     1
    1  Seeta   2
    2  Kartik  1
    3  Niti    2
    4  Naitik  1
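To see what the condition in Method 2 does, it can help to look at the boolean mask on its own. The sketch below assumes an in-memory copy of the example table instead of example.csv. It also adds a final astype(int) step that is not part of the original method: after the replacement the column still has the generic object dtype, and the conversion turns it into a real integer column.

```python
import pandas as pd

# In-memory copy of the example table, so no CSV file is needed here
data = pd.DataFrame({
    "Name": ["Ram", "Seeta", "Kartik", "Niti", "Naitik"],
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
})

# Generic object dtype: the column can hold strings and integers
# at the same time while the values are being replaced
data["Gender"] = data["Gender"].astype(object)

# The comparison returns one True/False flag per row
is_male = data["Gender"] == "Male"
print(is_male.tolist())  # [True, False, True, False, True]

# Write the new values only into the rows selected by each mask
data.loc[is_male, "Gender"] = 1
data.loc[data["Gender"] == "Female", "Gender"] = 2

# Added step (assumption): convert the column to an integer dtype
data["Gender"] = data["Gender"].astype(int)

print(data)
```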

Applications

  1. This method can be applied in data science. If we are working with a dataset that stores gender as “male” and “female”, we can assign numbers, for example “0” and “1” respectively, so that our algorithms can work with the data.
  2. This method can also be used to replace specific values in a dataset with new values.
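Both applications can be sketched together. The data below is hypothetical and only for illustration: it assumes a column with inconsistent labels ("M" next to "Male"), which is first cleaned with the pandas Series.replace method and then encoded as 0 and 1 with Series.map.

```python
import pandas as pd

# Hypothetical data with inconsistent labels, for illustration only
data = pd.DataFrame({
    "Name": ["Ram", "Seeta", "Kartik"],
    "Gender": ["M", "Female", "Male"],
})

# Application 2: replace specific values with new values.
# Only cells equal to a key are changed; other cells stay as they are.
data["Gender"] = data["Gender"].replace({"M": "Male", "F": "Female"})
labels = data["Gender"].tolist()

# Application 1: encode the cleaned labels as numbers
data["Gender"] = data["Gender"].map({"Male": 0, "Female": 1})

print(data)
```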
