# ML | Binning or Discretization


There are three data smoothing methods:

1. Binning: binning methods smooth sorted data values by consulting their "neighborhood", that is, the values around them.
2. Regression: fits the data values to a function. Linear regression finds the "best" line relating two attributes (or variables), so that one attribute can be used to predict the other.
3. Outlier analysis: outliers can be detected by clustering, where similar values are organized into groups, or "clusters". Intuitively, values that fall outside the set of clusters can be considered outliers.

#### Binning method for data smoothing

Here we focus on the binning method for data smoothing. The data is first sorted, and the sorted values are then distributed across a number of segments, or bins. Because binning methods consult a neighborhood of values, they perform local smoothing.

There are basically two types of binning:

1. Equal-width (or distance) binning: the simplest approach is to divide the range of the variable into k intervals of equal width. The interval width is the range [A, B] of the variable divided by k:
` w = (B - A) / k `

Thus, the i-th interval is ` [A + (i-1)w, A + iw] `, where i = 1, 2, 3, ..., k.
Skewed data is not handled well by this method, because most values end up in a few intervals while the others stay nearly empty.

2. Equal-depth (or frequency) binning: the range [A, B] of the variable is divided into intervals that each contain (approximately) the same number of points. Exactly equal frequencies may not be possible because of duplicate values.
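The two partitioning strategies above can be sketched side by side. This is a minimal illustration, not a reference implementation: the function names, the skewed sample data, and the convention that the maximum value B goes into the last equal-width interval are all assumptions made for the example.

```python
def equal_width_bins(values, k):
    """Group values into k intervals of equal width w = (B - A) / k."""
    a, b = min(values), max(values)
    if a == b:
        raise ValueError("all values are identical; the width would be zero")
    w = (b - a) / k
    bins = [[] for _ in range(k)]
    for v in sorted(values):
        # Assumed convention: intervals are half-open, and the maximum B
        # is placed in the last interval instead of opening a (k+1)-th one.
        i = min(int((v - a) // w), k - 1)
        bins[i].append(v)
    return bins


def equal_frequency_bins(values, k):
    """Group sorted values into k bins whose sizes differ by at most one."""
    ordered = sorted(values)
    base, extra = divmod(len(ordered), k)
    bins, start = [], 0
    for i in range(k):
        # The first `extra` bins receive one additional value.
        size = base + (1 if i < extra else 0)
        bins.append(ordered[start:start + size])
        start += size
    return bins


# Hypothetical, deliberately skewed sample.
data = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]

print(equal_width_bins(data, 3))
print(equal_frequency_bins(data, 3))
```

On this sample, equal-width binning puts nine of the twelve values into the first interval, while equal-frequency binning yields three bins of four values each, which illustrates why skewed data suits the second approach better.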

#### How to smooth the data?

There are three approaches to performing smoothing:

1. Smoothing by bin means: each value in a bin is replaced by the mean of the bin.
2. Smoothing by bin medians: each value in a bin is replaced by the median of the bin.
3. Smoothing by bin boundaries: the minimum and maximum values in a bin are taken as the bin boundaries, and each value in the bin is replaced by the closest boundary value.

Sorted data by price (in dollars): 2, 6, 7, 9, 13, 20, 21, 24, 30

Partition using the equal-frequency approach:

- Bin 1: 2, 6, 7
- Bin 2: 9, 13, 20
- Bin 3: 21, 24, 30

Smoothing by bin mean:

- Bin 1: 5, 5, 5
- Bin 2: 14, 14, 14
- Bin 3: 25, 25, 25

Smoothing by bin median:

- Bin 1: 6, 6, 6
- Bin 2: 13, 13, 13
- Bin 3: 24, 24, 24

Smoothing by bin boundary:

- Bin 1: 2, 7, 7
- Bin 2: 9, 9, 20
- Bin 3: 21, 21, 30
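The worked example can be checked with a short sketch. The helper names are hypothetical, and the tie-breaking rule for boundary smoothing (ties go to the upper boundary) is an assumption, since the example contains no ties.

```python
import statistics


def smooth_by_mean(bin_values):
    """Replace every value in the bin with the bin mean."""
    m = statistics.mean(bin_values)
    return [m] * len(bin_values)


def smooth_by_median(bin_values):
    """Replace every value in the bin with the bin median."""
    m = statistics.median(bin_values)
    return [m] * len(bin_values)


def smooth_by_boundary(bin_values):
    """Replace every value with the closer of the bin minimum and maximum."""
    low, high = min(bin_values), max(bin_values)
    # Assumption: a value equally far from both boundaries goes to the upper one.
    return [low if v - low < high - v else high for v in bin_values]


# The equal-frequency partition from the example above.
bins = [[2, 6, 7], [9, 13, 20], [21, 24, 30]]

for name, smooth in [("mean", smooth_by_mean),
                     ("median", smooth_by_median),
                     ("boundary", smooth_by_boundary)]:
    print(name, [smooth(b) for b in bins])
```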

Binning can also be used as a discretization technique. Here, discretization refers to the process of converting or partitioning continuous attributes, features, or variables into discrete or nominal attributes, features, variables, or intervals.
For example, attribute values can be discretized by applying equal-width or equal-frequency binning and then replacing each value in a bin with the bin mean or bin median, as in smoothing by bin means or smoothing by bin medians, respectively. The continuous values are then converted to a nominal or discretized value that matches the value of the corresponding bin.
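As a rough sketch of discretization into nominal values, the snippet below maps each price to a label using the standard-library `bisect` module. The cut points, the label names, and the `discretize` helper are hypothetical choices made for illustration.

```python
from bisect import bisect_right


def discretize(values, cut_points, labels):
    """Map each value to the label of the interval it falls into.

    cut_points are the interior boundaries in ascending order,
    so len(labels) must equal len(cut_points) + 1.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    # bisect_right counts how many cut points are <= the value,
    # which is the index of the interval containing it.
    return [labels[bisect_right(cut_points, v)] for v in values]


prices = [2, 6, 7, 9, 13, 20, 21, 24, 30]
# Hypothetical cut points placed between the three equal-frequency bins.
cut_points = [8, 20.5]
labels = ["low", "medium", "high"]

print(discretize(prices, cut_points, labels))
```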

Below is the Python implementation:

bin_mean

```python
import math

print("enter the data")
x = list(map(float, input().split()))

print("enter the number of bins")
bi = int(input())

# Pair each value with its original index, then sort by value.
x_sorted = sorted(enumerate(x), key=lambda item: item[1])

# Number of values per bin (the last bin may hold fewer).
num_of_data_in_each_bin = math.ceil(len(x) / bi)

# x_new maps each original index to its smoothed value.
x_new = {}

# Walk through the sorted data one bin at a time.
for start in range(0, len(x_sorted), num_of_data_in_each_bin):
    current_bin = x_sorted[start:start + num_of_data_in_each_bin]
    # Mean of the values actually present in this bin.
    bin_mean = round(sum(value for _, value in current_bin) / len(current_bin), 3)
    for index, _ in current_bin:
        x_new[index] = bin_mean

print("number of data in each bin")
print(num_of_data_in_each_bin)

for i in range(len(x)):
    print("index {2} old value {0} new value {1}".format(x[i], x_new[i], i))
```

bin_median

```python
import math
import statistics

print("enter the data")
x = list(map(float, input().split()))

print("enter the number of bins")
bi = int(input())

# Pair each value with its original index, then sort by value.
x_sorted = sorted(enumerate(x), key=lambda item: item[1])

# Number of values per bin (the last bin may hold fewer).
num_of_data_in_each_bin = math.ceil(len(x) / bi)

# x_new maps each original index to its smoothed value.
x_new = {}

# Walk through the sorted data one bin at a time.
for start in range(0, len(x_sorted), num_of_data_in_each_bin):
    current_bin = x_sorted[start:start + num_of_data_in_each_bin]
    # Median of the values in this bin.
    bin_median = round(statistics.median([value for _, value in current_bin]), 3)
    for index, _ in current_bin:
        x_new[index] = bin_median

print("number of data in each bin")
print(num_of_data_in_each_bin)

for i in range(len(x)):
    print("index {2} old value {0} new value {1}".format(x[i], x_new[i], i))
```

bin_boundary

```python
import math

print("enter the data")
x = list(map(float, input().split()))

print("enter the number of bins")
bi = int(input())

# Pair each value with its original index, then sort by value.
x_sorted = sorted(enumerate(x), key=lambda item: item[1])

# Number of values per bin (the last bin may hold fewer).
num_of_data_in_each_bin = math.ceil(len(x) / bi)

# x_new maps each original index to its smoothed value.
x_new = {}

# Walk through the sorted data one bin at a time.
for start in range(0, len(x_sorted), num_of_data_in_each_bin):
    current_bin = x_sorted[start:start + num_of_data_in_each_bin]
    # The data is sorted, so the boundaries are the first and last values.
    low, high = current_bin[0][1], current_bin[-1][1]
    for index, value in current_bin:
        # Replace each value with the closer boundary; ties go to the upper one.
        x_new[index] = low if value - low < high - value else high

print("number of data in each bin")
print(num_of_data_in_each_bin)

for i in range(len(x)):
    print("index {2} old value {0} new value {1}".format(x[i], x_new[i], i))
```