Variability estimation | Set 2

Terms related to variability metrics:

 -> Deviation
 -> Variance
 -> Standard Deviation
 -> Mean Absolute Deviation
 -> Median Absolute Deviation
 -> Order Statistics
 -> Range
 -> Percentile
 -> Inter-quartile Range
  • Median absolute deviation: Mean absolute deviation, variance and standard deviation (discussed in the previous section) are not robust to extreme values and outliers. The median absolute deviation is more robust: we take the median of the absolute deviations from the median.

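    Example:

     Sequence: [2, 4, 6, 8]
     Mean = 5
     Deviation around mean = [-3, -1, 1, 3]
     Mean Absolute Deviation = (3 + 1 + 1 + 3) / 4 = 2
     Median = 5
     Absolute deviation around median = [3, 1, 1, 3]
     Median Absolute Deviation = median of [1, 1, 3, 3] = 2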

    # Median Absolute Deviation
    import numpy as np


    def mad(data):
        # Median of the absolute deviations from the median
        data = np.asarray(data)
        return np.median(np.absolute(data - np.median(data)))


    Sequence = [2, 4, 10, 6, 8, 11]

    print("Median Absolute Deviation:", mad(Sequence))

    Output:

     Median Absolute Deviation: 3.0 
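
To see why the median absolute deviation is less sensitive to outliers than the standard deviation, here is a minimal sketch that replaces one value of the sequence with an extreme one. The value 1100 and the variable names are made up for illustration.

```python
import numpy as np


def mad(data):
    """Median absolute deviation: median of |x - median(x)|."""
    data = np.asarray(data, dtype=float)
    return np.median(np.abs(data - np.median(data)))


clean = [2, 4, 10, 6, 8, 11]
# Hypothetical outlier: 11 replaced by 1100 (illustrative value only)
with_outlier = [2, 4, 10, 6, 8, 1100]

std_clean = np.std(clean)
std_outlier = np.std(with_outlier)
mad_clean = mad(clean)
mad_outlier = mad(with_outlier)

print("Standard deviation:", std_clean, "->", std_outlier)
print("Median absolute deviation:", mad_clean, "->", mad_outlier)
```

In this sketch the standard deviation grows by more than two orders of magnitude, while the median absolute deviation stays at 3.0.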
  • Order statistics: This approach to measuring variability is based on the spread of ranked (sorted) data.
  • Range: This is the most basic measure based on order statistics. It is the difference between the largest and the smallest value in the dataset. It gives a quick idea of the spread of the data, but it is very sensitive to outliers. We can make it more robust by dropping the extreme values first.
    Example:
     Sequence: [2, 30, 50, 46, 37, 91]
     Here, 2 and 91 are outliers
     Range = 91 - 2 = 89
     Range without outliers = 50 - 30 = 20
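
The range example can be reproduced with a short sketch. It assumes that "dropping the extremes" means removing the single smallest and the single largest value; the variable names are illustrative.

```python
sequence = [2, 30, 50, 46, 37, 91]

# Plain range: largest value minus smallest value
full_range = max(sequence) - min(sequence)

# Assumption: drop exactly one value from each end of the sorted data
trimmed = sorted(sequence)[1:-1]
trimmed_range = max(trimmed) - min(trimmed)

print("Range:", full_range)
print("Range without extremes:", trimmed_range)
```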
  • Percentile: This is a good way to measure the variability of data while limiting the influence of outliers. The P-th percentile of the data is a value such that at least P% of the values are less than or equal to it and at least (100 - P)% of the values are greater than or equal to it.
    The median is the 50th percentile of the data.
    Example:
     Sequence: [2, 30, 50, 46, 37, 91]
     Sorted: [2, 30, 37, 46, 50, 91]
     50th percentile = (37 + 46) / 2 = 41.5

    Code:

    # Percentile
    import numpy as np

    Sequence = [2, 30, 50, 46, 37, 91]

    print("50th Percentile:", np.percentile(Sequence, 50))
    print("60th Percentile:", np.percentile(Sequence, 60))

    Output:

     50th Percentile: 41.5
     60th Percentile: 46.0
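
To show where these numbers come from, here is a minimal sketch of a percentile computed by linear interpolation between the two nearest sorted values, which is NumPy's default method. The helper `percentile_linear` is a hypothetical name and not part of any library.

```python
import math

import numpy as np


def percentile_linear(data, p):
    """P-th percentile by linear interpolation between neighbouring ranks."""
    values = sorted(data)
    pos = (len(values) - 1) * p / 100  # fractional index into the sorted data
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    return values[lower] + (values[upper] - values[lower]) * frac


sequence = [2, 30, 50, 46, 37, 91]
p50 = percentile_linear(sequence, 50)
p60 = percentile_linear(sequence, 60)

print("50th Percentile:", p50, "NumPy:", np.percentile(sequence, 50))
print("60th Percentile:", p60, "NumPy:", np.percentile(sequence, 60))
```

For the 50th percentile the fractional index is 2.5, so the result lies halfway between 37 and 46. For the 60th percentile the index is exactly 3, so the result is the sorted value 46.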

  • Inter-quartile range (IQR): This measure works on ranked (sorted) data. The data is divided at three quartiles: Q1 (25th percentile), Q2 (50th percentile) and Q3 (75th percentile). The inter-quartile range is the difference between Q3 and Q1.

    Example:

     Sequence: [2, 30, 50, 46, 37, 91]
     Q1 (25th percentile): 31.75
     Q2 (50th percentile): 41.5
     Q3 (75th percentile): 49
     IQR = Q3 - Q1 = 17.25

    Code 1:

    # Interquartile range
    from scipy.stats import iqr

    Sequence = [2, 30, 50, 46, 37, 91]

    print("IQR:", iqr(Sequence))

    Output:

     IQR: 17.25 

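    Code 2:

    # Interquartile range
    import numpy as np

    Sequence = [2, 30, 50, 46, 37, 91]

    # Q3 - Q1: np.percentile returns [Q3, Q1], unpacked into np.subtract
    iqr_value = np.subtract(*np.percentile(Sequence, [75, 25]))

    print("IQR:", iqr_value)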

    Output:

     IQR: 17.25
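
The range example above treats 2 and 91 as outliers. One common way to check such a claim is to flag values that lie more than 1.5 x IQR below Q1 or above Q3. That rule is a widely used convention and an assumption here; it is not taken from this article. A minimal sketch:

```python
import numpy as np

sequence = [2, 30, 50, 46, 37, 91]

q1, q3 = np.percentile(sequence, [25, 75])
iqr_value = q3 - q1

# Assumption: 1.5 is a conventional multiplier, not a value from the article
k = 1.5
lower_fence = q1 - k * iqr_value
upper_fence = q3 + k * iqr_value

outliers = [x for x in sequence if x < lower_fence or x > upper_fence]

print("Fences:", lower_fence, upper_fence)
print("Outliers:", outliers)
```

With Q1 = 31.75 and Q3 = 49, the fences are 5.875 and 74.875, so this sketch flags 2 and 91.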