How to randomly select rows from Pandas DataFrame



First, create a simple DataFrame from a dictionary of lists.

Method #1: Using the sample() method

# Import the pandas package
import pandas as pd

# Define a dictionary containing employee data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age': [27, 24, 22, 32, 15],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd', '10th']}

# Convert the dictionary to a DataFrame
df = pd.DataFrame(data)

# Display the whole DataFrame
df

Example 1: Using sample() without parameters.

Called with no arguments, sample() returns one randomly chosen row of the DataFrame df created above.

# Pick one random row using sample()
# without specifying any parameters
df.sample()

Example 2: Using the n parameter, which randomly selects n rows.

Select n rows at random using sample(n) or sample(n=n). Because the selection is random, each run will usually return a different set of n rows.

# Get 3 random rows; the rows usually differ from run to run
# df.sample(3) is equivalent
df.sample(n=3)
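As a quick check of what the n parameter returns, here is a minimal sketch. The two-column frame is a cut-down version of the tutorial data and the variable name picked is only illustrative.

```python
import pandas as pd

# Cut-down version of the tutorial data (illustrative only).
df = pd.DataFrame({
    "Name": ["Jai", "Princi", "Gaurav", "Anuj", "Geeku"],
    "Age": [27, 24, 22, 32, 15],
})

# Draw three rows at random, without replacement (the default).
picked = df.sample(n=3)

# Exactly three rows come back, each a distinct row of df.
print(len(picked))
print(picked.index.is_unique)
print(picked.index.isin(df.index).all())
```

The sampled rows keep their original index labels, which makes it easy to trace them back to df.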

Example 3: Using the frac parameter.

Instead of a row count, you can pass frac, the fraction of rows to return. For example, if frac=0.5, sample() returns 50% of the rows.

# Sample a fraction of the rows
# Here you get 50% of the rows
df.sample(frac=0.5)
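The sketch below illustrates frac on a frame whose size makes the arithmetic obvious. The ten-row frame and the names half and shuffled are assumptions made for this illustration. Passing frac=1 is a common way to shuffle all rows.

```python
import pandas as pd

# Ten-row frame chosen so that the fractions give whole numbers.
df = pd.DataFrame({"value": range(10)})

half = df.sample(frac=0.5)     # 10 rows * 0.5 = 5 rows
shuffled = df.sample(frac=1)   # every row, in random order

print(len(half))
print(len(shuffled))
# Shuffling keeps the same values; only the order changes.
print(sorted(shuffled["value"].tolist()) == list(range(10)))
```

Note that n and frac are alternatives: pass one or the other, not both.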

Example 4: Sampling from a sample.

First, 70% of the rows of the whole DataFrame df are sampled and stored in another DataFrame, df1. Then 50% of the rows of df1 are sampled.

# Take 70% of the rows from df
# and store them in another DataFrame, df1
df1 = df.sample(frac=0.7)

# Now select 50% of the rows from df1
df1.sample(frac=0.5)
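To make the row counts in a two-stage sample concrete, here is a small sketch on a hypothetical twenty-row frame. The names df2 and rest are introduced only for this illustration; using drop() to recover the rows that were not sampled is one possible approach, not something the example above requires.

```python
import pandas as pd

# Twenty-row frame chosen so that both fractions give whole numbers.
df = pd.DataFrame({"value": range(20)})

df1 = df.sample(frac=0.7)    # 14 of the 20 rows
df2 = df1.sample(frac=0.5)   # 7 of those 14 rows
rest = df.drop(df1.index)    # the 6 rows that were not sampled into df1

print(len(df1), len(df2), len(rest))
# Every row of the second sample also belongs to the first one.
print(df2.index.isin(df1.index).all())
```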

Example 5: Selecting multiple rows at random with replace=False.

The replace parameter controls whether the same row may be selected more than once. Its default value in sample() is False, so you cannot select more rows than the DataFrame contains.

# DataFrame df1 has only 4 rows.
# If we try to select more than 4 rows, an error is raised:
# Cannot take a larger sample than population when 'replace=False'
df1.sample(n=3, replace=False)
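The following sketch shows the failure mode described above, using a hypothetical four-row stand-in for df1. Capping n with min() is just one way to avoid the error; the flag oversample_failed exists only so the outcome can be inspected.

```python
import pandas as pd

# Four-row stand-in for df1 (illustrative values).
df1 = pd.DataFrame({"value": [10, 20, 30, 40]})

try:
    # Asking for 6 rows out of 4 without replacement is impossible.
    df1.sample(n=6, replace=False)
    oversample_failed = False
except ValueError as err:
    oversample_failed = True
    print("ValueError:", err)

# One way to stay safe: never ask for more rows than exist.
capped = df1.sample(n=min(6, len(df1)))
print(len(capped))
```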

Example 6: Selecting more than n rows, where n is the total number of rows, using replace=True.

# Select more rows than df1 contains by sampling with replacement
# (replace defaults to False)
df1.sample(n=6, replace=True)
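Sampling with replacement repeats rows, and the repeated rows keep their original index labels. A minimal sketch, again with a hypothetical four-row df1; renumbering with reset_index(drop=True) is optional and shown only as one way to get a clean index.

```python
import pandas as pd

# Four-row stand-in for df1 (illustrative values).
df1 = pd.DataFrame({"value": [10, 20, 30, 40]})

# Six draws from four rows: at least one row must appear twice.
drawn = df1.sample(n=6, replace=True)
print(len(drawn))
print(drawn.index.is_unique)   # duplicates in the index are expected here

# Optionally replace the repeated labels with a fresh 0..5 index.
renumbered = drawn.reset_index(drop=True)
print(renumbered.index.is_unique)
```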

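Example 7: Using weights.

The weights parameter gives each row its own probability of being selected.

# One weight per row of df1; weights that do not sum to 1
# are normalized automatically
test_weights = [0.2, 0.2, 0.2, 0.4]

df1.sample(n=3, weights=test_weights)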
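To see the effect of weights, the sketch below gives one row a weight of zero and checks that it is never drawn. The column name w, the weight values and the loop over 50 seeds are all assumptions made for this illustration. It also passes the weights as a column name, which sample() accepts when sampling rows of a DataFrame.

```python
import pandas as pd

# Four-row stand-in for df1 with a hypothetical weight column "w".
df1 = pd.DataFrame({
    "Name": ["Jai", "Princi", "Gaurav", "Anuj"],
    "w": [0, 1, 1, 2],   # do not sum to 1; pandas normalizes them
})

# Draw one row many times and record which names ever show up.
seen = set()
for seed in range(50):
    row = df1.sample(n=1, weights="w", random_state=seed)
    seen.update(row["Name"])

# A row with weight 0 can never be selected.
print("Jai" in seen)
```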

Example 8: Using axis.

The axis parameter accepts an axis number or name. With it, sample() can select random columns instead of rows.

# axis=0 (or 'index') samples rows, which is the default
df1.sample(axis=0)

# axis=1 (or 'columns') samples columns instead of rows
df1.sample(axis=1)

Example 9: Using random_state.

With a given seed, sample() always returns the same rows for the same DataFrame. If random_state is None (the default), no seed is fixed, so the result changes from run to run.

# With this seed, the sample always draws the same rows
df1.sample(n=2, random_state=2)
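The two parameters above can be checked with a short sketch. The frame repeats the tutorial data, and the names rows_a, rows_b and cols are illustrative. The seed value 2 is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jai", "Princi", "Gaurav", "Anuj", "Geeku"],
    "Age": [27, 24, 22, 32, 15],
    "Address": ["Delhi", "Kanpur", "Allahabad", "Kannauj", "Noida"],
    "Qualification": ["Msc", "MA", "MCA", "Phd", "10th"],
})

# Same seed, same DataFrame: both calls return identical rows.
rows_a = df.sample(n=2, random_state=2)
rows_b = df.sample(n=2, random_state=2)
print(rows_a.equals(rows_b))

# axis=1 samples columns: all 5 rows are kept, but only 2 columns.
cols = df.sample(n=2, axis=1, random_state=2)
print(cols.shape)
```

Fixing random_state is mainly useful when a result needs to be reproduced, for example in tests or when sharing an analysis.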

Method #2: Using NumPy

NumPy's random.choice() lets us choose how many row positions to draw for the random selection, and it can draw them with replacement.

# Import the NumPy and pandas packages
import numpy as np
import pandas as pd

# Define a dictionary containing employee data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age': [27, 24, 22, 32, 15],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd', '10th']}

# Convert the dictionary to a DataFrame
df = pd.DataFrame(data)

# Draw 6 row positions from 0 to len(df) - 1, with replacement
chosen_idx = np.random.choice(len(df), replace=True, size=6)

# Select the rows at those positions
df2 = df.iloc[chosen_idx]

df2
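As a variation on the NumPy approach, the sketch below draws positions without replacement using NumPy's Generator interface (np.random.default_rng). The seed 0 and the names rng, positions and subset are assumptions made for this illustration; the two-column frame is a cut-down version of the tutorial data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jai", "Princi", "Gaurav", "Anuj", "Geeku"],
    "Age": [27, 24, 22, 32, 15],
})

# A seeded generator makes the draw reproducible.
rng = np.random.default_rng(0)

# Draw 3 distinct row positions from 0 to len(df) - 1.
positions = rng.choice(len(df), size=3, replace=False)

# iloc selects rows by position, so this returns 3 different rows.
subset = df.iloc[positions]
print(len(subset))
print(subset.index.is_unique)
```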