Clean string data in a given Pandas DataFrame



Suppose we are dealing with data from an e-commerce site in which the product names are not in the correct format. The task is to clean the data so that the product names have no leading or trailing spaces and the first letter of each product name is capitalized.
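Both operations the task calls for are built-in Python string methods. As a minimal sketch on a single value (the sample string is illustrative and not part of the data set):

```python
# A hypothetical raw product name with stray whitespace and mixed case
raw_name = "  maTress "

# strip() removes leading and trailing whitespace
stripped = raw_name.strip()

# capitalize() upper-cases the first character and lower-cases the rest
clean_name = stripped.capitalize()

print(repr(clean_name))  # 'Matress'
```

Note that capitalize() also lower-cases everything after the first character, which is what turns mixed-case input into a uniform form.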

Solution #1: In many cases, we need to write our own custom function suited to the task at hand.

import pandas as pd

# Create the data frame
df = pd.DataFrame({'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
                   'Product': ['UMbreLla', ' maTress', 'BaDmintoN', 'Shuttle '],
                   'Updated_Price': [1250, 1450, 1550, 400],
                   'Discount': [10, 8, 15, 10]})

# Print the data frame
print(df)


Now we will write our own custom function to solve this problem.

def Format_data(df):
    # Iterate over all rows
    for i in range(df.shape[0]):
        # Reassign the value in the Product column (position 1):
        # first remove surrounding spaces with strip(),
        # then capitalize the first letter with capitalize(),
        # which also lower-cases the remaining letters
        df.iat[i, 1] = df.iat[i, 1].strip().capitalize()

# Call the function
Format_data(df)

# Print the data frame
print(df)
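The function in Solution #1 addresses the Product column by its integer position, 1, so it would silently target a different column if the column order changed. One possible variant, sketched here with a hypothetical helper name format_column and a small illustrative frame, looks the position up from the column name instead:

```python
import pandas as pd


def format_column(df, column):
    """Strip and capitalize every value of `column` in place (hypothetical helper)."""
    # Look up the integer position of the column by its name
    col_idx = df.columns.get_loc(column)
    for i in range(df.shape[0]):
        df.iat[i, col_idx] = df.iat[i, col_idx].strip().capitalize()


# Small illustrative frame, not the article's full data set
sample = pd.DataFrame({"Product": [" maTress", "Shuttle "], "Discount": [8, 10]})
format_column(sample, "Product")
print(sample["Product"].tolist())
```

This assumes the column names are unique, in which case get_loc() returns a single integer.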


Solution #2: Now we will see a more concise and more efficient approach using the Pandas apply() function.

import pandas as pd

# Create the data frame
df = pd.DataFrame({'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
                   'Product': ['UMbreLla', ' maTress', 'BaDmintoN', 'Shuttle'],
                   'Updated_Price': [1250, 1450, 1550, 400],
                   'Discount': [10, 8, 15, 10]})

# Print the data frame
print(df)


Let's use apply() to convert the product names to the desired format. Because df['Product'] selects a single column (a Series), the method actually called is Series.apply(); we pass it a lambda function that is run on each value in the column.

# Apply the lambda to every value in the Product column
df['Product'] = df['Product'].apply(lambda x: x.strip().capitalize())

# Print the data frame
print(df)
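As an alternative to a lambda, pandas also exposes vectorized string methods through the .str accessor. The sketch below uses a small illustrative frame that includes a missing value (an assumption, since the data above has none) to show that .str methods pass missing values through, whereas the lambda would raise an AttributeError on them:

```python
import pandas as pd

# Illustrative data; the None entry is an assumption to show missing-value handling
products = pd.DataFrame({"Product": ["UMbreLla", " maTress", None, "Shuttle "]})

# Chain vectorized string methods; missing values stay missing
products["Product"] = products["Product"].str.strip().str.capitalize()

print(products["Product"].tolist())
```

For a column that is known to contain only strings, this gives the same result as the apply() call and tends to read more directly.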
