  # Python | Binning method for data smoothing


Binning is a method for smoothing data or handling noisy data. The data is first sorted, and the sorted values are then distributed into a number of buckets, or bins. Because binning methods consult only the neighborhood of values inside each bin, they perform local smoothing.
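
The sort-then-partition step can be sketched in a few lines of NumPy. This is a minimal sketch, assuming equal-depth bins; `equal_depth_bins` is a hypothetical helper name. It uses `np.array_split` because, unlike `np.split`, it accepts a bin count that does not divide the number of values evenly. The sample input is the twelve prices from this article's example, shuffled.

```python
import numpy as np


def equal_depth_bins(values, n_bins):
    """Sort `values` and split them into `n_bins` bins of (nearly) equal size.

    Hypothetical helper for illustration. When len(values) is not a multiple
    of n_bins, np.array_split gives the first len(values) % n_bins bins one
    extra element.
    """
    sorted_values = np.sort(np.asarray(values, dtype=float))
    return np.array_split(sorted_values, n_bins)


# The twelve prices from this article's example, in unsorted order
prices = [21, 4, 34, 8, 26, 9, 15, 28, 21, 29, 24, 25]
bins = equal_depth_bins(prices, 3)
for idx, b in enumerate(bins, start=1):
    print(f"Bin {idx}: {b.tolist()}")
```

Running it prints three bins of four prices each, starting with `Bin 1: [4.0, 8.0, 9.0, 15.0]`.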

There are three ways to smooth the values in a bin:

- Smoothing by bin means: each value in a bin is replaced by the mean of the bin.
- Smoothing by bin medians: each value in a bin is replaced by the median of the bin.
- Smoothing by bin boundaries: the minimum and maximum values in a given bin are taken as the bin boundaries, and each value in the bin is replaced by the closer boundary value.
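
The three rules can be sketched as per-bin functions, assuming NumPy and a bin whose values are already sorted. The function names are hypothetical. Sending ties in boundary smoothing to the upper boundary is an arbitrary choice, since the definition does not say which boundary wins when a value is equidistant from both.

```python
import numpy as np


def smooth_by_mean(bin_values):
    """Replace every value in the bin with the bin mean."""
    b = np.asarray(bin_values, dtype=float)
    return np.full_like(b, b.mean())


def smooth_by_median(bin_values):
    """Replace every value in the bin with the bin median."""
    b = np.asarray(bin_values, dtype=float)
    return np.full_like(b, np.median(b))


def smooth_by_boundaries(bin_values):
    """Replace every value with the closer of the bin's min and max.

    Ties go to the upper boundary (a choice made here, not part of the rule).
    """
    b = np.asarray(bin_values, dtype=float)
    lo, hi = b.min(), b.max()
    return np.where(b - lo < hi - b, lo, hi)


bin2 = [21, 21, 24, 25]
print(smooth_by_mean(bin2))        # [22.75 22.75 22.75 22.75]
print(smooth_by_median(bin2))      # [22.5 22.5 22.5 22.5]
print(smooth_by_boundaries(bin2))  # [21. 21. 25. 25.]
```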

Approach:

1. Sort the values of the given dataset.
2. Divide the sorted values into N bins, each containing approximately the same number of samples (equal-depth partitioning).
3. Replace the values in each bin with the bin's mean, median, or boundary values.

Example:

```
Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34

Partition into equal-depth bins:
- Bin 1: 4, 8, 9, 15
- Bin 2: 21, 21, 24, 25
- Bin 3: 26, 28, 29, 34

Smoothing by bin means (rounded to the nearest dollar):
- Bin 1: 9, 9, 9, 9
- Bin 2: 23, 23, 23, 23
- Bin 3: 29, 29, 29, 29

Smoothing by bin boundaries:
- Bin 1: 4, 4, 4, 15
- Bin 2: 21, 21, 25, 25
- Bin 3: 26, 26, 26, 34

Smoothing by bin medians:
- Bin 1: 8.5, 8.5, 8.5, 8.5
- Bin 2: 22.5, 22.5, 22.5, 22.5
- Bin 3: 28.5, 28.5, 28.5, 28.5
```
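
The example's arithmetic can be checked with only the standard library. This sketch hard-codes an equal depth of 4 (12 prices in 3 bins) and prints exact, unrounded statistics: Bin 2's mean, for instance, is 22.75, and the median of four values is the average of the two middle values.

```python
from statistics import mean, median

prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]  # already sorted
depth = 4  # equal depth: 12 prices / 3 bins
bins = [prices[i:i + depth] for i in range(0, len(prices), depth)]

for n, b in enumerate(bins, start=1):
    lo, hi = b[0], b[-1]  # bins are sorted, so these are the min and max
    # closer boundary; ties would go to the upper boundary
    boundaries = [lo if v - lo < hi - v else hi for v in b]
    print(f"Bin {n}: mean={mean(b)}, median={median(b)}, boundaries={boundaries}")
```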

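Below is a Python implementation of the above algorithm. It applies equal-depth binning to the second feature (sepal width) of the iris dataset, using 30 bins of 5 values each: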

```python
import numpy as np
from sklearn.datasets import load_iris

# Load the iris dataset: 150 samples with 4 features each
dataset = load_iris()
a = dataset.data

# Take the second of the 4 columns (index 1, sepal width) and sort it
b = np.sort(a[:, 1])

# Create the bins: 150 values split into 30 bins of 5 values (equal depth)
bin1 = np.zeros((30, 5))  # smoothing by bin means
bin2 = np.zeros((30, 5))  # smoothing by bin boundaries
bin3 = np.zeros((30, 5))  # smoothing by bin medians

# Bin means
for i in range(0, 150, 5):
    k = i // 5
    mean = (b[i] + b[i + 1] + b[i + 2] + b[i + 3] + b[i + 4]) / 5
    for j in range(5):
        bin1[k, j] = mean
print("Bin Mean:", bin1)

# Bin boundaries
for i in range(0, 150, 5):
    k = i // 5
    for j in range(5):
        # b[i] is the bin minimum and b[i + 4] the maximum;
        # pick whichever is closer (ties go to the maximum)
        if (b[i + j] - b[i]) < (b[i + 4] - b[i + j]):
            bin2[k, j] = b[i]
        else:
            bin2[k, j] = b[i + 4]
print("Bin Boundaries:", bin2)

# Bin medians
for i in range(0, 150, 5):
    k = i // 5
    for j in range(5):
        # each bin holds 5 sorted values, so the median is the middle one
        bin3[k, j] = b[i + 2]
print("Bin Median:", bin3)
```
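
The three loops can also be collapsed into vectorized NumPy operations. This is a sketch, assuming the number of values is an exact multiple of the bin depth (150 iris samples with a depth of 5 satisfies this); `smooth_all` is a hypothetical name. Reshaping the sorted array to one row per bin lets each statistic be computed along `axis=1` with `keepdims=True` and then repeated across its row.

```python
import numpy as np


def smooth_all(values, depth):
    """Equal-depth binning with smoothing by means, boundaries and medians.

    Hypothetical helper; assumes len(values) is a multiple of `depth`.
    Returns three arrays of shape (n_bins, depth).
    """
    b = np.sort(np.asarray(values, dtype=float))
    if b.size % depth != 0:
        raise ValueError("number of values must be a multiple of depth")
    bins = b.reshape(-1, depth)  # one row per bin

    by_mean = np.repeat(bins.mean(axis=1, keepdims=True), depth, axis=1)
    by_median = np.repeat(np.median(bins, axis=1, keepdims=True), depth, axis=1)

    lo = bins[:, :1]   # each row's minimum (rows are sorted)
    hi = bins[:, -1:]  # each row's maximum
    # closer boundary; ties go to the upper boundary
    by_boundary = np.where(bins - lo < hi - bins, lo, hi)
    return by_mean, by_boundary, by_median


# Small demonstration on 15 evenly spaced values, 3 bins of 5
mean_bins, boundary_bins, median_bins = smooth_all(np.arange(15), depth=5)
print("Bin Mean:\n", mean_bins)
print("Bin Boundaries:\n", boundary_bins)
print("Bin Median:\n", median_bins)
```

With scikit-learn installed, `smooth_all(load_iris().data[:, 1], depth=5)` should give the same three arrays as the loop version, up to floating-point rounding in the means.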