Understanding the types of means | Set 1

The mean is one of the most important concepts in statistics, a subject essential to the study of machine learning.

  • Arithmetic mean: the average of a discrete set of numbers.
    It is denoted $\bar{x}$, pronounced "x-bar", and is the sum of all values in the set divided by the total number of values in the set.
    Formula for the arithmetic mean of n values x_1, x_2, ..., x_n: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}$

    Example —

     Sequence = {1, 5, 6, 4, 4}
     Sum = 20
     n, total values = 5
     Arithmetic mean = 20/5 = 4

    Code —

    # Arithmetic mean
    import statistics

    # discrete set of numbers
    data1 = [1, 5, 6, 4, 4]

    x = statistics.mean(data1)

    # print the mean
    print("Mean is:", x)

    Output:

     Mean is: 4 
  • Trimmed mean: the arithmetic mean is sensitive to outliers (extreme values) in the data, so the trimmed (truncated) mean is often used during preprocessing when handling such data in machine learning.
    It is a variation of the arithmetic mean: the values are sorted, a fixed number of them (given here as a proportion p of the data) is discarded at each end of the sorted sequence, and the average of the remaining values is calculated.

    Example —

     Sequence = {0, 2, 1, 3}
     p = 0.25
     Sorted sequence = {0, 1, 2, 3}
     Remaining sequence = {1, 2}
     n, total values remaining = 2
     Mean = 3/2 = 1.5

    Code —

    # Trimmed mean
    from scipy import stats

    # a discrete set of numbers
    data = [0, 2, 1, 3]

    # cut 25% of the values from each end before averaging
    x = stats.trim_mean(data, 0.25)

    # print the trimmed mean
    print("Trimmed Mean is:", x)

    Output:

     Trimmed Mean is: 1.5 
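
The sort-and-discard procedure described for the trimmed mean can also be written out by hand. The following is a minimal sketch, not SciPy's implementation: the helper name `trimmed_mean` and the sample list containing an outlier are hypothetical, and dropping `int(p * n)` values from each end is an assumption about how to round when `p * n` is not a whole number.

```python
import statistics


def trimmed_mean(values, p):
    """Mean after dropping a fraction p of the sorted values from each end.

    Hypothetical helper for illustration; assumes 0 <= p < 0.5.
    """
    ordered = sorted(values)
    k = int(p * len(ordered))  # number of values to drop at each end (assumed rounding)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# the sequence from the trimmed mean example
print(trimmed_mean([0, 2, 1, 3], 0.25))

# illustrative data with one outlier (100)
data = [1, 5, 6, 4, 4, 100]
print(statistics.mean(data))    # plain mean, pulled upward by the outlier
print(trimmed_mean(data, 0.2))  # smallest and largest value dropped before averaging
```

Comparing the last two printed values shows how a single extreme value shifts the plain mean, while the trimmed mean stays close to the bulk of the data.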
  • Weighted mean: the arithmetic mean and the trimmed mean give equal importance to every value involved. In machine learning predictions, however, some parameter values may be more important than others, so we assign larger weights to the values of such parameters. Conversely, the dataset may contain a parameter whose values are highly variable, so we assign smaller weights to the values of such parameters. The weighted mean is the sum of each value multiplied by its weight, divided by the sum of the weights.

    Example —

     Sequence = [0, 2, 1, 3]
     Weight = [1, 0, 1, 1]
     Sum (Weight * Sequence) = 0 * 1 + 2 * 0 + 1 * 1 + 3 * 1 = 4
     Sum (Weight) = 3
     Weighted Mean = 4/3 = 1.3333333333333333

    Code 1 —

    # Weighted mean
    import numpy as np

    # discrete set of numbers
    data = [0, 2, 1, 3]

    x = np.average(data, weights=[1, 0, 1, 1])

    # print the weighted mean
    print("Weighted Mean is:", x)

    Output 1:

     Weighted Mean is: 1.3333333333333333 

    Code 2 —

    # Weighted mean without NumPy
    data = [0, 2, 1, 3]
    weights = [1, 0, 1, 1]

    # sum of value * weight, divided by the sum of the weights
    x = sum(d * w for d, w in zip(data, weights)) / sum(weights)

    print("Weighted Mean is:", x)

    Output 2:

     Weighted Mean is: 1.3333333333333333 
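
As a quick check on the idea that the arithmetic mean treats every value equally, the sketch below shows that a weighted mean with equal weights gives the same result as the arithmetic mean, while a smaller weight reduces one value's influence. The helper name `weighted_mean` and the last set of weights are hypothetical, chosen only for illustration.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


data = [0, 2, 1, 3]

# equal weights: same result as the arithmetic mean, sum(data) / len(data)
print(weighted_mean(data, [1, 1, 1, 1]))

# the weights from the weighted mean example: the value 2 is ignored entirely
print(weighted_mean(data, [1, 0, 1, 1]))

# illustrative weights: the value 3 counts half as much as the others
print(weighted_mean(data, [1, 1, 1, 0.5]))
```

Only the ratio between the weights matters here: multiplying every weight by the same positive constant leaves the result unchanged, because the constant cancels between the numerator and the sum of the weights.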



