
Replacing strings with numbers in Python for data analysis


Note: Before running the code, create an example.csv file that contains some names and genders.

Let's say we have a table of names and genders. The Gender column has two categories, male and female, and suppose we want to assign 1 to male and 2 to female.

Examples:

Input:

      | Name   | Gender
    ---------------------
    0   Ram      Male
    1   Seeta    Female
    2   Kartik   Male
    3   Niti     Female
    4   Naitik   Male

Output:

      | Name   | Gender
    ---------------------
    0   Ram      1
    1   Seeta    2
    2   Kartik   1
    3   Niti     2
    4   Naitik   1
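The note at the top asks for an example.csv file. A minimal sketch that writes the sample table to disk could look like the following; the helper name `write_example_csv` and the header names `Name` and `Gender` are assumptions made for illustration, taken from the table shown above.

```python
import csv

# Rows copied from the input table; adjust them to your own data.
ROWS = [
    ("Ram", "Male"),
    ("Seeta", "Female"),
    ("Kartik", "Male"),
    ("Niti", "Female"),
    ("Naitik", "Male"),
]


def write_example_csv(path="example.csv"):
    """Write the sample table to `path` as a comma-separated file."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Name", "Gender"])  # header row
        writer.writerows(ROWS)


if __name__ == "__main__":
    write_example_csv()
```

Running the script once creates example.csv in the current directory, which the methods below can then read.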

Method 1:

Create a dictionary with the following two key-value pairs:

    Key      Value
    Male     1
    Female   2

Then loop over the Gender column of the DataFrame and replace each value with the number stored under the matching key.

# import the pandas library
import pandas as pd

# open example.csv in read mode
file_handler = open("example.csv", "r")

# create a pandas DataFrame with the read_csv
# function, which reads from a CSV file
data = pd.read_csv(file_handler, sep=",")

# close the file handler
file_handler.close()

# dictionary that maps each gender string to a number;
# the keys must match the values in the CSV exactly
gender = {"Male": 1, "Female": 2}

# go through the Gender column and replace each
# value with the number stored under that key
data.Gender = [gender[item] for item in data.Gender]

print(data)

Output:

      | Name   | Gender
    ---------------------
    0   Ram      1
    1   Seeta    2
    2   Kartik   1
    3   Niti     2
    4   Naitik   1
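The same dictionary lookup can also be written with `Series.map`, which applies the dictionary to the whole column without an explicit loop. This is an alternative sketch, not the method shown above; the in-memory `csv_text` string is an assumed stand-in for example.csv so that the snippet runs on its own.

```python
import io

import pandas as pd

# In-memory stand-in for example.csv (assumed contents).
csv_text = (
    "Name,Gender\n"
    "Ram,Male\n"
    "Seeta,Female\n"
    "Kartik,Male\n"
    "Niti,Female\n"
    "Naitik,Male\n"
)
data = pd.read_csv(io.StringIO(csv_text))

gender = {"Male": 1, "Female": 2}

# Series.map looks up every value of the column in the dictionary.
# A value with no matching key becomes NaN instead of raising KeyError.
data["Gender"] = data["Gender"].map(gender)

print(data)
```

One design difference is worth noting: the list comprehension stops with a KeyError on an unexpected value, while `map` silently produces NaN, so it may be worth checking the result for missing values afterwards.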

Method 2:
Method 2 is similar, but it does not need a dictionary and takes fewer lines of code. Here pandas iterates over the Gender column of the DataFrame internally and changes the values wherever the condition matches.

# import the pandas library
import pandas as pd

# open example.csv in read mode
file_handler = open("example.csv", "r")

# create a pandas DataFrame with the read_csv
# function, which reads from a CSV file
data = pd.read_csv(file_handler, sep=",")

# close the file handler
file_handler.close()

# object dtype lets the column hold strings and numbers
# side by side while the values are being replaced
data.Gender = data.Gender.astype(object)

# select the rows of the Gender column where the
# condition matches and write the new value there;
# .loc avoids chained assignment on data.Gender
data.loc[data.Gender == "Male", "Gender"] = 1
data.loc[data.Gender == "Female", "Gender"] = 2

print(data)

Output:

      | Name   | Gender
    ---------------------
    0   Ram      1
    1   Seeta    2
    2   Kartik   1
    3   Niti     2
    4   Naitik   1
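When there are several conditions, the condition-based replacement can also be done in one step with NumPy's `np.select`. The sketch below is an alternative to the approach above, not part of it; the `csv_text` string stands in for example.csv, and the default value 0 for unmatched rows is an assumed placeholder.

```python
import io

import numpy as np
import pandas as pd

# In-memory stand-in for example.csv (assumed contents).
csv_text = (
    "Name,Gender\n"
    "Ram,Male\n"
    "Seeta,Female\n"
    "Kartik,Male\n"
    "Niti,Female\n"
    "Naitik,Male\n"
)
data = pd.read_csv(io.StringIO(csv_text))

# One boolean mask per condition, in the same order as the replacement values.
conditions = [
    (data["Gender"] == "Male").to_numpy(dtype=bool),
    (data["Gender"] == "Female").to_numpy(dtype=bool),
]
choices = [1, 2]

# Rows that match neither condition receive the default value 0.
data["Gender"] = np.select(conditions, choices, default=0)

print(data)
```

Because the whole column is rebuilt from the masks, it ends up as a numeric column rather than a mix of strings and numbers.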

Applications

  1. This method can be applied in data science. If we are working with a dataset that stores gender as "male" and "female", we can assign numbers, for example 0 and 1 respectively, so that our algorithms can work with the data.
  2. This method can also be used to replace specific values in a dataset with new values.
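For the 0 and 1 encoding mentioned in the first point, one possible sketch uses `pd.Categorical` with an explicit category order, so that the first label gets 0 and the second gets 1. The sample values and the variable names `genders` and `codes` are assumptions made for illustration.

```python
import pandas as pd

# Sample values standing in for a gender column (assumed data).
genders = pd.Series(["Male", "Female", "Male", "Female", "Male"])

# The position of each label in `categories` becomes its code:
# "Male" -> 0, "Female" -> 1. Labels not listed here get the code -1.
codes = pd.Categorical(genders, categories=["Male", "Female"]).codes

print(list(codes))
```

Listing the categories explicitly keeps the numbering stable; without it, pandas derives the category order from the data itself, so the codes could differ between datasets.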
