Removing rows from a data frame based on specific conditions applied to a column



Solution #1: We will use a vectorized comparison to keep only the rows of the dataset that satisfy the condition.

# Import pandas
import pandas as pd

# Read the CSV file into a data frame
df = pd.read_csv("nba.csv")

# Display the first 15 rows
print(df.head(15))

# Print the shape of the data frame (rows, columns)
print(df.shape)

Output:

This data frame has 458 rows and 9 columns. Let's use a vectorized operation to keep only the rows that match a given condition.

# Keep all rows for which the player's
# age is greater than or equal to 25
df_filtered = df[df["Age"] >= 25]

# Print the first 15 rows of the new data frame
print(df_filtered.head(15))

# Print the shape of the new data frame
print(df_filtered.shape)

Output:


As we can see from the output, the returned data frame contains only players aged 25 or over.
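
To see what the vectorized comparison produces, the following minimal sketch applies the same filter to a small hand-made data frame. The player names and ages are invented for illustration and are not taken from nba.csv.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration (not from nba.csv)
players = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Age": [22, 25, 31, 19],
})

# The comparison is evaluated for the whole column at once and
# yields a boolean Series with one True/False value per row
mask = players["Age"] >= 25

# Indexing with the mask keeps only the rows where it is True
older_players = players[mask]

print(mask.tolist())
print(older_players)
```

The original data frame is left unchanged here; the filtered rows are returned as a new object.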

Solution #2: We can use DataFrame.drop() to remove the rows that do not meet this condition.

# Import pandas
import pandas as pd

# Read the CSV file into a data frame
df = pd.read_csv("nba.csv")

# First, remove rows that do not contain any data
df = df.dropna(how="all")

# Drop all rows for which the player's
# age is less than 25
df.drop(df[df["Age"] < 25].index, inplace=True)

# Print the first 15 rows of the modified data frame
print(df.head(15))

# Print the shape of the modified data frame
print(df.shape)

Output:


As we can see from the output, we have successfully dropped all rows that do not satisfy the condition applied to the "Age" column.
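
One difference between the two solutions is easy to miss: rows with a missing age. A comparison against NaN evaluates to False, so the mask in Solution #1 discards such rows, while the drop() call in Solution #2 leaves them in place unless they were removed beforehand. The sketch below illustrates this on invented data; the names and ages are assumptions, not values from nba.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing age (not from nba.csv)
players = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Age": [22, 25, np.nan, 31],
})

# Solution #1 style: keep rows where Age >= 25 (NaN >= 25 is False,
# so the row with the missing age is excluded)
kept_by_mask = players[players["Age"] >= 25]

# Solution #2 style: drop rows where Age < 25 (NaN < 25 is also False,
# so the row with the missing age is not dropped)
kept_by_drop = players.drop(players[players["Age"] < 25].index)

print(kept_by_mask["Name"].tolist())
print(kept_by_drop["Name"].tolist())
```

If rows with a missing age should be removed as well, calling dropna(subset=["Age"]) before the drop is one option.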




