# Chi-square test for feature selection — mathematical explanation

Mathematically, the chi-square test is performed for two distributions, two of which determine the similarity level of their respective variances. In his null hypothesis , he assumes that these distributions are independent. Thus, this test can be used to determine the best performance for a given dataset by identifying the characteristics on which the output class label is most dependent. For each function in the dataset, is calculated and then ordered in descending order according to value. The higher the value the more the output label depends on the function and the higher the value that the function has to determine the output.

Suppose the object under consideration has m attribute values, and the output data has k class labels. Then the value is given by the following expression: where — observed frequency — expected frequency

A contingency table with m rows and k columns is created for each object. Each cell (i, j) denotes the number of rows that have the attribute attribute as i and the class label as k. Thus, each cell in this table represents the observed frequency. To calculate the expected frequency for each cell, the fraction of the function value in the total dataset is first calculated and then multiplied by the total number of labels in the current class.

Solved example:

Consider the following table: The output variable here is a column named “ PlayTennis ", which determines whether you played tennis on a given day, taking into account the weather.

The contingency table for the Outlook function is structured as follows: ## Shop Learn programming in R: courses

\$FREE Best Python online courses for 2022

\$FREE Best laptop for Fortnite

\$399+ Best laptop for Excel

\$ Best laptop for Solidworks

\$399+ Best laptop for Roblox

\$399+ Best computer for crypto mining

\$499+ Best laptop for Sims 4

\$

Latest questions

PythonStackOverflow

Common xlabel/ylabel for matplotlib subplots

PythonStackOverflow

Check if one list is a subset of another in Python

PythonStackOverflow

How to specify multiple return types using type-hints

PythonStackOverflow

Printing words vertically in Python

PythonStackOverflow

Python Extract words from a given string

PythonStackOverflow

Why do I get "Pickle - EOFError: Ran out of input" reading an empty file?

PythonStackOverflow

Python os.path.join () method

PythonStackOverflow

Flake8: Ignore specific warning for entire file

## Wiki

Python | How to copy data from one Excel sheet to another

Common xlabel/ylabel for matplotlib subplots

Check if one list is a subset of another in Python

How to specify multiple return types using type-hints

Printing words vertically in Python

Python Extract words from a given string

Cyclic redundancy check in Python

Finding mean, median, mode in Python without libraries