Binning: binning methods smooth sorted data values by consulting their "neighborhood", that is, the values around them.
Regression: fits the data values to a function. Linear regression involves finding the "best" line to fit two attributes (or variables) so that one attribute can be used to predict the other.
Outlier analysis: outliers can be detected by clustering, where similar values are organized into groups or "clusters". Intuitively, values that fall outside the set of clusters can be considered outliers.
Binning method for data smoothing: here we look at the binning method for data smoothing. In this method, the data is first sorted and then the sorted values are distributed into a number of buckets or bins. Since binning methods consult a neighborhood of values, they perform local smoothing.
There are basically two types of binning:
Equal-width (or distance) binning: the simplest approach is to divide the range of the variable into k intervals of equal width. The interval width is simply the range [A, B] of the variable divided by k,
w = (B - A) / k
Thus, the i-th interval is [A + (i-1)w, A + iw], where i = 1, 2, 3, ..., k. This method does not handle skewed data well, because most values fall into a few intervals while the others stay nearly empty.
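The equal-width rule can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `equal_width_bins` and the sample price list are assumptions made for the example, the intervals are indexed from 0 rather than from 1, and the maximum value B is placed in the last interval.

```python
def equal_width_bins(values, k):
    """Distribute values into k intervals of equal width over [A, B]."""
    a, b = min(values), max(values)
    w = (b - a) / k  # interval width: w = (B - A) / k
    bins = [[] for _ in range(k)]
    for v in values:
        # 0-based index of the interval containing v.
        # The maximum value B is clamped into the last interval.
        i = min(int((v - a) / w), k - 1) if w > 0 else 0
        bins[i].append(v)
    return bins


prices = [2, 6, 7, 9, 13, 20, 21, 24, 30]  # illustrative data
print(equal_width_bins(prices, 3))
```

With these values the width is 28 / 3, so the first interval receives four values and the second only two, which shows how unevenly equal-width bins can fill up.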
Equal-depth (or frequency) binning: in equal-frequency binning, we divide the range [A, B] of a variable into intervals that contain (approximately) the same number of points; exactly equal frequencies may not be possible because of duplicate values.
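A rough sketch of equal-frequency partitioning, assuming the simplest possible strategy: sort the values and cut the sorted list by position. The name `equal_frequency_bins` is hypothetical, and this version does not treat duplicates specially, so equal values may end up in neighboring bins.

```python
def equal_frequency_bins(values, k):
    """Split the sorted values into k bins of (approximately) equal size."""
    data = sorted(values)
    n = len(data)
    # Bin i holds the sorted values at positions i*n//k up to (i+1)*n//k.
    return [data[i * n // k:(i + 1) * n // k] for i in range(k)]


prices = [2, 6, 7, 9, 13, 20, 21, 24, 30]  # illustrative data
print(equal_frequency_bins(prices, 3))
```

When the number of values is not a multiple of k, the bin sizes differ by at most one.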
How to smooth the data?
There are three approaches to performing smoothing:
Smoothing by bin means: each value in a bin is replaced by the mean of the bin.
Smoothing by bin medians: each value in a bin is replaced by the median of the bin.
Smoothing by bin boundaries: the minimum and maximum values in a given bin are taken as the bin boundaries. Each value in the bin is then replaced by the closest boundary value.
Sorted data for price (in dollars): 2, 6, 7, 9, 13, 20, 21, 24, 30
Partition using the equal-frequency approach:
Bin 1: 2, 6, 7
Bin 2: 9, 13, 20
Bin 3: 21, 24, 30
Smoothing by bin means:
Bin 1: 5, 5, 5
Bin 2: 14, 14, 14
Bin 3: 25, 25, 25
Smoothing by bin medians:
Bin 1: 6, 6, 6
Bin 2: 13, 13, 13
Bin 3: 24, 24, 24
Smoothing by bin boundaries:
Bin 1: 2, 7, 7
Bin 2: 9, 9, 20
Bin 3: 21, 21, 30
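The three smoothing approaches can be checked against the bins above with a short Python sketch. The function names are assumptions made for illustration, and the tie-breaking rule (a value exactly halfway between the two boundaries goes to the lower one) is also an assumption, since the text does not specify it.

```python
from statistics import mean, median


def smooth_by_means(bins):
    # Replace every value in a bin with the bin's mean.
    return [[mean(b)] * len(b) for b in bins]


def smooth_by_medians(bins):
    # Replace every value in a bin with the bin's median.
    return [[median(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    # Replace every value with the closer of the bin's minimum and maximum.
    # Assumption: ties go to the lower boundary.
    smoothed = []
    for b in bins:
        lo, hi = min(b), max(b)
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed


bins = [[2, 6, 7], [9, 13, 20], [21, 24, 30]]
print(smooth_by_means(bins))
print(smooth_by_medians(bins))
print(smooth_by_boundaries(bins))
```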
Binning can also be used as a discretization technique. Here discretization refers to the process of converting or partitioning continuous attributes, features, or variables into discrete or nominal attributes, features, variables, or intervals. For example, attribute values can be discretized by applying equal-width or equal-frequency binning and then replacing each value in a bin with the bin mean or bin median, as in smoothing by bin means or smoothing by bin medians, respectively. The continuous values can then be converted to a nominal or discretized value that is the same as the value of their corresponding bin.
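As a small sketch of discretization into nominal values, the code below maps each continuous value to a label for its equal-width interval. The function name `discretize` and the labels "low", "medium", and "high" are hypothetical and chosen only for this example.

```python
def discretize(values, labels):
    """Map each value to the label of its equal-width interval."""
    k = len(labels)  # one interval per label
    a, b = min(values), max(values)
    w = (b - a) / k  # interval width: w = (B - A) / k
    result = []
    for v in values:
        # 0-based interval index; the maximum value goes in the last interval.
        i = min(int((v - a) / w), k - 1) if w > 0 else 0
        result.append(labels[i])
    return result


prices = [2, 6, 7, 9, 13, 20, 21, 24, 30]  # illustrative data
labels = ["low", "medium", "high"]          # hypothetical nominal values
print(discretize(prices, labels))
```

Equal-frequency bins could be used in the same way; only the rule that assigns a value to a bin would change.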