There are three data smoothing methods:
Binning method for data smoothing
This article covers the binning method for data smoothing. In this method, the data is first sorted, and the sorted values are then distributed across a number of segments, or bins. Because binning methods consult the neighborhood of values, they perform local smoothing.
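There are basically two types of binning: equal-width binning and equal-frequency binning.
In equal-width binning, the range [A, B] of the variable is divided into k intervals of equal width w:
$w = (B - A) / k$
Thus, the i-th interval is $[A + (i - 1)w,\ A + iw]$
where $i = 1, 2, 3, \dots, k$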
Skewed data cannot be handled well with this method.
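The equal-width formula can be sketched in a few lines of Python. This is a minimal illustration, not part of the original listing: the function names `equal_width_edges` and `equal_width_bin_index` and the sample `values` are hypothetical, and the choice to treat intervals as half-open (with the maximum placed in the last interval) is an assumption.

```python
import bisect


def equal_width_edges(values, k):
    """Return the k + 1 edges A, A + w, ..., A + k*w with w = (B - A) / k."""
    a, b = min(values), max(values)
    w = (b - a) / k
    return [a + i * w for i in range(k + 1)]


def equal_width_bin_index(value, edges):
    """Return the 0-based index of the interval that holds value.

    Assumption: intervals are half-open [lower, upper), except the last,
    which also includes the maximum B.
    """
    k = len(edges) - 1
    return min(bisect.bisect_right(edges, value) - 1, k - 1)


# Hypothetical sample data, chosen only for illustration.
values = [0, 4, 12, 16, 16, 18, 24, 26, 28]
edges = equal_width_edges(values, 4)
print(edges)                                              # [0.0, 7.0, 14.0, 21.0, 28.0]
print([equal_width_bin_index(v, edges) for v in values])  # [0, 0, 1, 2, 2, 2, 3, 3, 3]
```

The intervals have the same width but hold different numbers of values, which is why strongly skewed data tends to end up crowded into a few bins.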
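There are three approaches to performing smoothing: smoothing by bin mean, smoothing by bin median, and smoothing by bin boundary. In smoothing by bin boundary, the minimum and maximum values of a bin are its boundaries, and each value is replaced by the closer boundary.
Sorted data by price (in dollars): 2, 6, 7, 9, 13, 20, 21, 24, 30
Partition using the equal-frequency approach:
Bin 1: 2, 6, 7
Bin 2: 9, 13, 20
Bin 3: 21, 24, 30
Smoothing by bin mean:
Bin 1: 5, 5, 5
Bin 2: 14, 14, 14
Bin 3: 25, 25, 25
Smoothing by bin median:
Bin 1: 6, 6, 6
Bin 2: 13, 13, 13
Bin 3: 24, 24, 24
Smoothing by bin boundary:
Bin 1: 2, 7, 7
Bin 2: 9, 9, 20
Bin 3: 21, 21, 30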
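The worked example can be checked with a short Python sketch. The helper names (`equal_frequency_bins`, `smooth_by_mean`, `smooth_by_median`, `smooth_by_boundary`) are hypothetical and introduced only for illustration. The tie rule in boundary smoothing (a value exactly halfway goes to the lower boundary) is an assumption, since the example contains no ties.

```python
import statistics


def equal_frequency_bins(values, n_bins):
    """Sort the values and cut them into bins of ceil(n / n_bins) items."""
    ordered = sorted(values)
    size = -(-len(ordered) // n_bins)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def smooth_by_mean(bins):
    """Replace every value with the mean of its bin."""
    return [[statistics.mean(b)] * len(b) for b in bins]


def smooth_by_median(bins):
    """Replace every value with the median of its bin."""
    return [[statistics.median(b)] * len(b) for b in bins]


def smooth_by_boundary(bins):
    """Replace every value with the nearer of its bin's minimum and maximum.

    Assumption: a value exactly halfway goes to the lower boundary.
    """
    return [[b[0] if v - b[0] <= b[-1] - v else b[-1] for v in b] for b in bins]


prices = [2, 6, 7, 9, 13, 20, 21, 24, 30]
bins = equal_frequency_bins(prices, 3)
print(bins)                      # [[2, 6, 7], [9, 13, 20], [21, 24, 30]]
print(smooth_by_mean(bins))      # [[5, 5, 5], [14, 14, 14], [25, 25, 25]]
print(smooth_by_median(bins))    # [[6, 6, 6], [13, 13, 13], [24, 24, 24]]
print(smooth_by_boundary(bins))  # [[2, 7, 7], [9, 9, 20], [21, 21, 30]]
```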
Binning can also be used as a discretization technique ... Here, discretization refers to the process of transforming or breaking down continuous attributes, features, or variables into discrete or nominal ones (intervals).
For example, attribute values can be discretized by applying equal-width or equal-frequency binning and then replacing each value in a bin with the bin mean or the bin median, as in smoothing by bin means or smoothing by bin medians, respectively. Continuous values can then be converted to a nominal or discretized value that matches the corresponding bin value.
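As a rough sketch of discretization by binning, the function below assigns each continuous value a nominal bin label and a representative value for its bin. The name `discretize`, the `bin_N` label format, and the sample `ages` are all hypothetical. Equal-frequency bins and the bin median are used here, but the bin mean or equal-width bins would work the same way.

```python
import statistics


def discretize(values, n_bins, representative=statistics.median):
    """Map each value to (bin label, bin representative), in input order.

    Bins are equal-frequency bins of ceil(n / n_bins) values each.
    """
    # Positions of the values, ordered by value.
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    size = -(-len(values) // n_bins)  # ceiling division
    result = [None] * len(values)
    for start in range(0, len(order), size):
        members = order[start:start + size]
        rep = representative([values[idx] for idx in members])
        label = "bin_{}".format(start // size + 1)
        for idx in members:
            result[idx] = (label, rep)
    return result


# Hypothetical continuous attribute, chosen only for illustration.
ages = [41.0, 18.0, 63.0, 25.0, 33.0, 57.0]
for age, (label, rep) in zip(ages, discretize(ages, 3)):
    print(age, label, rep)
```

Each input value keeps its position in the list, so the labels can be attached back to the original records as a nominal attribute.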
Below is the Python implementation:

bin_median

import math
import statistics
from collections import OrderedDict

print("enter the data")
x = list(map(float, input().split()))

print("enter the number of bins")
bi = int(input())

# X_dict will store data in sorted order
X_dict = OrderedDict()
# x_old will store the original data, keyed by input position
x_old = {}
# x_new will store data after binning, keyed by input position
x_new = {}

for i in range(len(x)):
    X_dict[i] = x[i]
    x_old[i] = x[i]

# sort by value so that each bin holds neighboring values
X_dict = OrderedDict(sorted(X_dict.items(), key=lambda item: item[1]))

# list of bin medians, one entry per bin
binn = []
# values collected for the current bin
avrg = []
i = 0
num_of_data_in_each_bin = int(math.ceil(len(x) / bi))

# executing binning
for g, h in X_dict.items():
    if i < num_of_data_in_each_bin:
        avrg.append(h)
        i = i + 1
    elif i == num_of_data_in_each_bin:
        # the current bin is full: store its median and start a new bin
        i = 0
        binn.append(statistics.median(avrg))
        avrg = []
        avrg.append(h)
        i = i + 1
# median of the last bin
binn.append(statistics.median(avrg))

# save the new value of each of the data
i = 0
j = 0
for g, h in X_dict.items():
    if i < num_of_data_in_each_bin:
        x_new[g] = round(binn[j], 3)
        i = i + 1
    else:
        # move on to the next bin
        i = 0
        j = j + 1
        x_new[g] = round(binn[j], 3)
        i = i + 1

print("number of data in each bin")
print(math.ceil(len(x) / bi))

for i in range(0, len(x)):
    print('index {2} old value {0} new value {1}'.format(x_old[i], x_new[i], i))

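bin_boundary

import math
from collections import OrderedDict

print("enter the data")
x = list(map(float, input().split()))

print("enter the number of bins")
bi = int(input())

# X_dict will store data in sorted order
X_dict = OrderedDict()
# x_old will store the original data, keyed by input position
x_old = {}
# x_new will store data after binning, keyed by input position
x_new = {}

for i in range(len(x)):
    X_dict[i] = x[i]
    x_old[i] = x[i]

# sort by value so that each bin holds neighboring values
X_dict = OrderedDict(sorted(X_dict.items(), key=lambda item: item[1]))

# list of [lower boundary, upper boundary] pairs, one per bin
binn = []
# values collected for the current bin
avrg = []
i = 0
num_of_data_in_each_bin = int(math.ceil(len(x) / bi))

for g, h in X_dict.items():
    if i < num_of_data_in_each_bin:
        avrg.append(h)
        i = i + 1
    elif i == num_of_data_in_each_bin:
        i = 0
        # NOTE: the source listing breaks off at this point. Everything from
        # here on is a reconstruction that assumes the same structure as
        # bin_median, with the bin's minimum and maximum as its boundaries.
        binn.append([min(avrg), max(avrg)])
        avrg = []
        avrg.append(h)
        i = i + 1
# boundaries of the last bin
binn.append([min(avrg), max(avrg)])

# replace each value with the nearer boundary of its bin
i = 0
j = 0
for g, h in X_dict.items():
    if i == num_of_data_in_each_bin:
        # move on to the next bin
        i = 0
        j = j + 1
    low, high = binn[j]
    # assumption: a value exactly halfway goes to the lower boundary
    x_new[g] = low if abs(h - low) <= abs(h - high) else high
    i = i + 1

print("number of data in each bin")
print(math.ceil(len(x) / bi))

for i in range(0, len(x)):
    print('index {2} old value {0} new value {1}'.format(x_old[i], x_new[i], i))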