How to randomly select rows from a Pandas DataFrame

First, create a simple DataFrame from a dictionary of lists.

Method #1: Using the sample() method

# Import pandas package
import pandas as pd

# Define a dictionary containing employee data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age': [27, 24, 22, 32, 15],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd', '10th']}

# Convert dictionary to DataFrame
df = pd.DataFrame(data)

# Display the whole DataFrame
df

Example 1: Select one random row using sample() without specifying any parameters.

# Import pandas package
import pandas as pd

# Define a dictionary containing employee data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age': [27, 24, 22, 32, 15],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd', '10th']}

# Convert dictionary to DataFrame
df = pd.DataFrame(data)

# Pick one random row using sample()
# without specifying any parameters
df.sample()

Output:

Example 2: Using the n parameter, which randomly selects n rows.

Select n rows at random using sample(n) or sample(n=n). Each run is likely to return a different set of n rows.

# To get 3 random rows
# (the selection changes from run to run)

# df.sample(3) or
df.sample(n=3)

Output:
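As a quick check of the behaviour described in Example 2, the sketch below rebuilds a trimmed copy of the employee table and inspects what sample(n=3) returns. The variable name picked is only illustrative.

```python
import pandas as pd

# Trimmed copy of the employee table used in this article
df = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
    'Age': [27, 24, 22, 32, 15],
})

picked = df.sample(n=3)

# Exactly three rows come back
print(len(picked))
# Without replacement, no row is returned twice
print(picked.index.is_unique)
# Every sampled row keeps its original index label from df
print(picked.index.isin(df.index).all())
```

Because the sampled rows keep their original index labels, the result can be matched back to df later if needed.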

Example 3: Using the frac parameter.

Instead of a row count, you can pass the fraction of axis items to return. For example, if frac=0.5, sample() returns 50% of the rows (as a whole number of rows).

# Fraction of rows
# here you get 50% of the rows
df.sample(frac=0.5)

Output:
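To make the link between frac and the number of returned rows concrete, here is a minimal sketch on a 5-row frame. The fractions 0.4 and 0.8 were picked for illustration because they map to whole row counts.

```python
import pandas as pd

df = pd.DataFrame({'Age': [27, 24, 22, 32, 15]})  # 5 rows

# 0.4 * 5 = 2 rows and 0.8 * 5 = 4 rows
two_rows = df.sample(frac=0.4)
four_rows = df.sample(frac=0.8)
print(len(two_rows), len(four_rows))

# frac=1 returns every row in random order,
# which is a common way to shuffle a DataFrame
shuffled = df.sample(frac=1)
print(sorted(shuffled['Age']) == sorted(df['Age']))
```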

Example 4:
First, 70% of the rows of the whole DataFrame df are sampled and stored in another DataFrame, df1; then 50% of the rows of df1 are sampled.

# Fraction of rows
# here you get 70% of the rows from df
# and store them in another DataFrame, df1
df1 = df.sample(frac=0.7)

# Now choose 50% of the rows from df1
df1.sample(frac=0.50)

Output:

Example 5: Select multiple rows at random with replace=False

The replace parameter controls whether the same row may be selected more than once. Its default value in sample() is False, so you can never select more rows than the DataFrame contains.

# DataFrame df1 has only 4 rows

# if we try to select more than 4 rows, we get an error:
# Cannot take a larger sample than population when 'replace=False'
df1.sample(n=3, replace=False)

Output:
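The limit described in Example 5 can be seen directly. The sketch below uses a hypothetical 4-row frame named small as a stand-in for df1 and catches the error that pandas raises when more rows are requested than exist.

```python
import pandas as pd

# Hypothetical 4-row stand-in for df1
small = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj']})

# Asking for all 4 rows without replacement is fine
all_rows = small.sample(n=4, replace=False)
print(len(all_rows))

# Asking for 5 rows out of 4 without replacement raises ValueError
try:
    small.sample(n=5, replace=False)
    raised = False
except ValueError as err:
    raised = True
    print('ValueError:', err)
```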

Example 6: Select more than n rows, where n is the total number of rows, using replace.

# Select more rows than the DataFrame has, with replacement
# (the default is replace=False)
df1.sample(n=6, replace=True)

Output:
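When sampling with replacement, repeated rows keep their original index labels, so the result has duplicate labels. A small sketch, again on a hypothetical 4-row frame named small, shows this along with one possible way to renumber the result.

```python
import pandas as pd

# Hypothetical 4-row stand-in for df1
small = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj']})

# Six draws from four rows: at least one row must repeat
oversampled = small.sample(n=6, replace=True)
print(len(oversampled))
print(oversampled.index.is_unique)

# Optionally replace the repeated labels with a fresh 0..5 index
renumbered = oversampled.reset_index(drop=True)
print(list(renumbered.index))
```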

Example 7: Using weights

# Weights that do not sum to 1 are normalized automatically
test_weights = [0.2, 0.2, 0.2, 0.4]

df1.sample(n=3, weights=test_weights)

Output:
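Two properties of weights are easy to verify in a short sketch: weights that do not sum to 1 are accepted, and a row with weight 0 is never drawn. The frame small and both weight lists are made up for illustration; the weights list needs one entry per row.

```python
import pandas as pd

# Hypothetical 4-row frame; one weight is needed per row
small = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj']})

# These weights sum to 5, not 1; pandas rescales them internally
unnormalized = [1, 1, 1, 2]
weighted = small.sample(n=3, weights=unnormalized)
print(len(weighted))

# Rows with weight 0 can never be selected, so only
# the last two rows are eligible here
only_last_two = small.sample(n=2, weights=[0, 0, 1, 1])
print(sorted(only_last_two['Name']))
```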

Example 8: Using axis

The axis parameter accepts an axis number or name. The sample() method also allows users to select columns instead of rows using the axis argument.

# Accepts an axis number or name.

# sample() also allows users to select columns
# instead of rows using the axis argument:
# axis=0 (the default) samples rows, axis=1 samples columns.
df1.sample(axis=0)

Output:

Example 9: Using random_state

With a fixed random_state seed, sample() always fetches the same rows from a given DataFrame. If random_state is None (the default), sample() does not set a seed of its own, so the selection generally varies from run to run.

# With this seed, sample() will always draw the same rows.

# If random_state is None, the draw is not seeded by sample()
# and generally changes from run to run.
df1.sample(n=2, random_state=2)

Output:
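The following sketch illustrates the last two examples together: a repeated call with the same random_state gives an identical result, and axis=1 samples a column rather than a row. The seed value 2 is arbitrary, and the frame is a trimmed copy of the employee table.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
    'Age': [27, 24, 22, 32, 15],
    'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
})

# Same seed, same DataFrame: the two samples are identical
first = df.sample(n=2, random_state=2)
second = df.sample(n=2, random_state=2)
print(first.equals(second))

# axis=1 picks a random column; all 5 rows are kept
one_col = df.sample(n=1, axis=1, random_state=2)
print(one_col.shape)
print(one_col.columns[0] in df.columns)
```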

Method #2: Using NumPy

NumPy's np.random.choice selects which index positions to include in the random selection, and it can optionally sample with replacement.

# Import NumPy and pandas packages
import numpy as np
import pandas as pd

# Define a dictionary containing employee data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age': [27, 24, 22, 32, 15],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd', '10th']}

# Convert dictionary to DataFrame
df = pd.DataFrame(data)

# Choose which row positions to include in the random selection
# (len(df) makes every row eligible; replace=True allows repeats)
chosen_idx = np.random.choice(len(df), replace=True, size=6)

df2 = df.iloc[chosen_idx]

df2

Output:
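As a variation on Method #2, the same idea can be written with NumPy's newer Generator interface, np.random.default_rng, which makes seeding explicit. This is a sketch rather than the approach used above; the seed value 0 and the trimmed table are arbitrary choices.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
    'Age': [27, 24, 22, 32, 15],
})

# Seeded generator so the selection is repeatable (seed is arbitrary)
rng = np.random.default_rng(0)

# Draw 6 row positions from 0..len(df)-1, repeats allowed
chosen_idx = rng.choice(len(df), size=6, replace=True)

# Positional lookup, so the result has 6 rows with possible repeats
df2 = df.iloc[chosen_idx]
print(df2)
```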




