Interquartile range and quartile deviation using NumPy and SciPy



Algorithm for finding quartiles:
Quartiles are calculated using the median. If the number of records is even, i.e. 2n, then the first quartile (Q1) is the median of the n smallest records, and the third quartile (Q3) is the median of the n largest records.

If the number of records is odd, i.e. of the form 2n + 1, then

  • the first quartile (Q1) is the median of the n smallest records
  • the third quartile (Q3) is the median of the n largest records
  • the second quartile (Q2) is the ordinary median of all records.
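
The rule above, for both the even and the odd case, can be sketched as a small function. This is only a minimal illustration: the helper name `quartiles_by_median` is hypothetical (it is not part of NumPy or SciPy), and it assumes the input may be unsorted and contains at least two records.

```python
import numpy as np


def quartiles_by_median(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule.

    Hypothetical helper for illustration only.
    """
    ordered = np.sort(np.asarray(values, dtype=float))
    if ordered.size < 2:
        raise ValueError("need at least two records")
    n = ordered.size // 2            # size of each half (middle record is skipped when odd)
    q1 = np.median(ordered[:n])      # median of the n smallest records
    q2 = np.median(ordered)          # ordinary median of all records
    q3 = np.median(ordered[-n:])     # median of the n largest records
    return float(q1), float(q2), float(q3)


print(quartiles_by_median([1, 2, 3, 4, 5, 6, 7]))  # (2.0, 4.0, 6.0)
```

With an odd number of records the middle value belongs to neither half, which is why both slices have length n.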

Range: the difference between the largest value and the smallest value in a given dataset.
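
As a quick illustration of the range, the sketch below uses the 20 enrollment counts from the example further down in this article. The variable names are arbitrary.

```python
import numpy as np

data = [75, 69, 56, 46, 47, 79, 92, 97, 89, 88,
        36, 96, 105, 32, 116, 101, 79, 93, 91, 112]

# Range = largest value - smallest value
data_range = int(np.max(data) - np.min(data))
print(data_range)  # 84
```
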
Interquartile range:
The interquartile range (IQR), also called the midspread or middle 50%, or technically the H-spread, is the difference between the third quartile (Q3) and the first quartile (Q1). It covers the center of the distribution and contains 50% of the observations.  IQR = Q3 - Q1

Uses:

  • The interquartile range has a breakdown point of 25%, which is why it is often preferred over the total range.
  • IQR is used to draw box plots, simple graphical representations of a distribution.
  • IQR can also be used to identify outliers in a given dataset.
  • IQR describes the spread of the central 50% of the data.
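
Outlier detection with the IQR can be sketched as follows. The 1.5 × IQR fences are the common box-plot convention and are not specified in the text above, so treat the multiplier `k` and the helper name `iqr_outliers` as assumptions. The quartiles here use NumPy's default linear interpolation, which can differ slightly from the median-of-halves rule.

```python
import numpy as np


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper; k=1.5 is an assumed convention.
    """
    arr = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(arr, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return arr[(arr < lower) | (arr > upper)]


# Made-up sample with one extreme value
sample = [10, 12, 13, 14, 15, 16, 18, 95]
print(iqr_outliers(sample))  # [95.]
```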

Decision making

  • A dataset with a higher interquartile range (IQR) has more variability.
  • A dataset with a lower interquartile range (IQR) is preferred when less variability is desired.

Suppose we have two datasets whose interquartile ranges are IR1 and IR2. If IR1 > IR2, the first dataset has more variability than the second, and the second dataset is preferable.
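
A comparison of this kind might look like the sketch below. Both datasets are invented purely for illustration, and the quartiles use NumPy's default linear interpolation.

```python
import numpy as np

# Hypothetical datasets, invented for illustration only
scores_a = [40, 45, 50, 55, 60, 65, 70, 75]
scores_b = [54, 55, 56, 57, 58, 59, 60, 61]


def iqr(values):
    """IQR = Q3 - Q1, with NumPy's default (linear) interpolation."""
    q1, q3 = np.percentile(values, [25, 75])
    return float(q3 - q1)


ir1, ir2 = iqr(scores_a), iqr(scores_b)
more_variable = "A" if ir1 > ir2 else "B"
print(ir1, ir2, more_variable)  # 17.5 3.5 A
```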

Example:

  • Below is the number of candidates enrolled each day over the last 20 days for the course
    Data Structures and Algorithms, DSA Online 3 in Python.Engineering:
    75, 69, 56, 46, 47, 79, 92, 97, 89, 88, 36, 96, 105, 32, 116, 101, 79, 93, 91, 112
  • After sorting the above dataset:
    32, 36, 46, 47, 56, 69, 75, 79, 79, 88, 89, 91, 92, 93, 96, 97, 101, 105, 112, 116
  • The total number of terms here is 20, so n = 10.
  • The second quartile (Q2), or median, of the above data is (88 + 89) / 2 = 88.5
  • The first quartile (Q1) is the median of the first n terms, i.e. the 10 smallest values: (56 + 69) / 2 = 62.5
  • The third quartile (Q3) is the median of the last n terms, i.e. the 10 largest values: (96 + 97) / 2 = 96.5
  • Then IQR = Q3 - Q1 = 96.5 - 62.5 = 34.0

Interquartile range using numpy.median

# Import the numpy library as np
import numpy as np

# The data must already be sorted for the slices below to give the two halves
data = [32, 36, 46, 47, 56, 69, 75, 79, 79, 88,
        89, 91, 92, 93, 96, 97, 101, 105, 112, 116]

# First quartile (Q1): median of the 10 smallest values
Q1 = np.median(data[:10])

# Third quartile (Q3): median of the 10 largest values
Q3 = np.median(data[10:])

# Interquartile range (IQR)
IQR = Q3 - Q1

print(IQR)

Output: 34.0

Interquartile range using numpy.percentile

# Import the numpy library as np
import numpy as np

data = [32, 36, 46, 47, 56, 69, 75, 79, 79, 88,
        89, 91, 92, 93, 96, 97, 101, 105, 112, 116]

# NumPy >= 1.22 names this argument `method`;
# older versions call it `interpolation`.

# First quartile (Q1)
Q1 = np.percentile(data, 25, method='midpoint')

# Third quartile (Q3)
Q3 = np.percentile(data, 75, method='midpoint')

# Interquartile range (IQR)
IQR = Q3 - Q1

print(IQR)

Output: 34.0

Interquartile range using scipy.stats.iqr

# Import the stats module from the SciPy library
from scipy import stats

data = [32, 36, 46, 47, 56, 69, 75, 79, 79, 88,
        89, 91, 92, 93, 96, 97, 101, 105, 112, 116]

# Interquartile range (IQR)
IQR = stats.iqr(data, interpolation='midpoint')

print(IQR)

Output: 34.0

Quartile Deviation
The quartile deviation is half the difference between the third quartile (Q3) and the first quartile (Q1), i.e. half of the interquartile range (IQR).  (Q3 - Q1) / 2 = IQR / 2

Decision making
A dataset with a higher quartile deviation has higher variability.

Quartile deviation using numpy.median

# Import the numpy library as np
import numpy as np

# The data must already be sorted for the slices below to give the two halves
data = [32, 36, 46, 47, 56, 69, 75, 79, 79, 88,
        89, 91, 92, 93, 96, 97, 101, 105, 112, 116]

# First quartile (Q1): median of the 10 smallest values
Q1 = np.median(data[:10])

# Third quartile (Q3): median of the 10 largest values
Q3 = np.median(data[10:])

# Interquartile range (IQR)
IQR = Q3 - Q1

# Quartile deviation
qd = IQR / 2

print(qd)

Output: 17.0
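
The same quartile deviation can also be obtained without slicing the sorted data by hand. The sketch below is one possible variant, assuming NumPy 1.22 or newer, where the `method` argument of `np.percentile` is available. The data is deliberately left unsorted to show that `np.percentile` does not require sorted input.

```python
import numpy as np

# Same 20 values as above, in their original (unsorted) order
data = [75, 69, 56, 46, 47, 79, 92, 97, 89, 88,
        36, 96, 105, 32, 116, 101, 79, 93, 91, 112]

# Q1 and Q3 in one call; "midpoint" averages the two neighbouring values
q1, q3 = np.percentile(data, [25, 75], method="midpoint")

# Quartile deviation = IQR / 2
qd = float(q3 - q1) / 2
print(qd)  # 17.0
```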