Replacing strings with numbers in Python for data analysis


Note: Before running the code, create an example.csv file containing some names and genders.

Let's say we have a table with names and genders. The Gender column has two categories, Male and Female, and suppose we want to assign 1 to Male and 2 to Female.

Examples:

 Input:

     | Name   | Gender
 ---------------------
 0     Ram      Male
 1     Seeta    Female
 2     Kartik   Male
 3     Niti     Female
 4     Naitik   Male

 Output:

     | Name   | Gender
 ---------------------
 0     Ram      1
 1     Seeta    2
 2     Kartik   1
 3     Niti     2
 4     Naitik   1
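The example.csv file mentioned in the note can be written by hand or generated. Below is a minimal sketch that generates it with Python's standard csv module. It assumes a comma-separated file with a Name,Gender header row, which the article does not spell out, and it reuses the five rows from the input table.

```python
import csv

# Rows taken from the input table; the header names are an assumption.
rows = [
    ("Ram", "Male"),
    ("Seeta", "Female"),
    ("Kartik", "Male"),
    ("Niti", "Female"),
    ("Naitik", "Male"),
]

# newline="" lets the csv module control line endings itself.
with open("example.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Gender"])
    writer.writerows(rows)
```

Writing the file without spaces after the commas keeps it readable by pd.read_csv with a plain "," separator.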

Method 1:

 Create a dictionary containing two elements with the following key-value pairs. The keys must match the values in the Gender column exactly, including capitalization.

 Key      Value
 Male     1
 Female   2

Then loop over the Gender column of the DataFrame and replace each value with the number stored under the matching key.

# import the pandas library

import pandas as pd

# open our example.csv file
# in read mode

file_handler = open("example.csv", "r")

# create a pandas DataFrame
# using the read_csv function,
# which reads from a CSV file

data = pd.read_csv(file_handler, sep=",")

# close the file handler

file_handler.close()

# create the dictionary; the keys must match
# the values in the Gender column exactly

gender = {"Male": 1, "Female": 2}

# traverse the Gender column and
# look up the number stored under
# each value in the dictionary

data["Gender"] = [gender[item] for item in data["Gender"]]

print(data)

Output:

     | Name   | Gender
 ---------------------
 0     Ram      1
 1     Seeta    2
 2     Kartik   1
 3     Niti     2
 4     Naitik   1
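The same dictionary lookup can also be done without an explicit loop. The sketch below is an alternative to Method 1, not part of it, and uses the pandas Series.map method. To stay self-contained, it builds the DataFrame in memory instead of reading example.csv.

```python
import pandas as pd

# In-memory stand-in for the contents of example.csv.
data = pd.DataFrame({
    "Name": ["Ram", "Seeta", "Kartik", "Niti", "Naitik"],
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
})

gender = {"Male": 1, "Female": 2}

# map() looks up every value of the column in the dictionary.
# Values that are not keys of the dictionary become NaN
# instead of raising a KeyError as the list comprehension does.
data["Gender"] = data["Gender"].map(gender)

print(data)
```

Which behavior is preferable depends on the data: the list comprehension fails loudly on an unexpected label, while map leaves a missing value that can be checked afterwards.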

Method 2:
Method 2 is similar, but it does not require a dictionary and takes fewer lines of code. Here we select the rows of the Gender column that satisfy a condition and overwrite their values.

# import the pandas library

import pandas as pd

# open our example.csv file
# in read mode

file_handler = open("example.csv", "r")

# create a pandas DataFrame
# using the read_csv function,
# which reads from a CSV file

data = pd.read_csv(file_handler, sep=",")

# close the file handler

file_handler.close()

# store the column as generic Python objects
# so that it can hold integers as well as strings

data["Gender"] = data["Gender"].astype(object)

# select the rows of the Gender column
# where the condition matches and
# overwrite their values; .loc avoids
# chained assignment on a copy

data.loc[data["Gender"] == "Male", "Gender"] = 1

data.loc[data["Gender"] == "Female", "Gender"] = 2

print(data)

Output:

     | Name   | Gender
 ---------------------
 0     Ram      1
 1     Seeta    2
 2     Kartik   1
 3     Niti     2
 4     Naitik   1
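Condition-based replacement only works when the compared string matches the stored value exactly, so labels with different capitalization or stray spaces are silently skipped. The following is a minimal sketch of one way to guard against that. It assumes messy labels such as " female" or "MALE ", which are made up for illustration and do not come from the article's data.

```python
import pandas as pd

# Hypothetical data with inconsistent capitalization and whitespace.
data = pd.DataFrame({
    "Name": ["Ram", "Seeta", "Kartik"],
    "Gender": ["Male", " female", "MALE "],
})

# Build a cleaned copy of the labels to compare against.
normalized = data["Gender"].str.strip().str.lower()

# Let the column hold integers as well as strings.
data["Gender"] = data["Gender"].astype(object)

# Apply the conditions using the cleaned labels.
data.loc[normalized == "male", "Gender"] = 1
data.loc[normalized == "female", "Gender"] = 2

print(data)
```

The conditions are evaluated on the cleaned copy, so the order of the two assignments does not matter.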

Applications

  1. This method can be applied in data science. If we are working with a dataset that stores gender as "male" and "female", we can assign numbers, for example "0" and "1" respectively, so that our algorithms can work with the data.
  2. This method can also be used to replace specific values in a dataset with new values.
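As an illustration of the second application, here is a minimal sketch that swaps specific values for new ones with the pandas Series.replace method. The City column and its values are hypothetical and are not part of the article's example.csv.

```python
import pandas as pd

# Hypothetical dataset; the column and values are made up for illustration.
df = pd.DataFrame({
    "Name": ["Ram", "Seeta", "Kartik"],
    "City": ["Delhi", "N/A", "Bombay"],
})

# Each dictionary key is a value to find; its value is the replacement.
# Entries that are not listed are left unchanged.
df["City"] = df["City"].replace({"N/A": "Unknown", "Bombay": "Mumbai"})

print(df)
```

Unlike a dictionary lookup over every row, replace leaves unlisted values as they are, which suits partial clean-ups.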
