Change language

Chi-square test for feature selection — mathematical explanation

Mathematically, the chi-square test is performed for two distributions, two of which determine the similarity level of their respective variances. In his null hypothesis , he assumes that these distributions are independent. Thus, this test can be used to determine the best performance for a given dataset by identifying the characteristics on which the output class label is most dependent. For each function in the dataset, is calculated and then ordered in descending order according to value. The higher the value the more the output label depends on the function and the higher the value that the function has to determine the output.

Suppose the object under consideration has m attribute values, and the output data has k class labels. Then the value is given by the following expression:

where

— observed frequency

— expected frequency

A contingency table with m rows and k columns is created for each object. Each cell (i, j) denotes the number of rows that have the attribute attribute as i and the class label as k. Thus, each cell in this table represents the observed frequency. To calculate the expected frequency for each cell, the fraction of the function value in the total dataset is first calculated and then multiplied by the total number of labels in the current class.

Solved example:

Consider the following table:

The output variable here is a column named “ PlayTennis ", which determines whether you played tennis on a given day, taking into account the weather.

The contingency table for the Outlook function is structured as follows:

Shop

Gifts for programmers

Learn programming in R: courses

$FREE
Gifts for programmers

Best Python online courses for 2022

$FREE
Gifts for programmers

Best laptop for Fortnite

$399+
Gifts for programmers

Best laptop for Excel

$
Gifts for programmers

Best laptop for Solidworks

$399+
Gifts for programmers

Best laptop for Roblox

$399+
Gifts for programmers

Best computer for crypto mining

$499+
Gifts for programmers

Best laptop for Sims 4

$

Latest questions

PythonStackOverflow

Common xlabel/ylabel for matplotlib subplots

1947 answers

PythonStackOverflow

Check if one list is a subset of another in Python

1173 answers

PythonStackOverflow

How to specify multiple return types using type-hints

1002 answers

PythonStackOverflow

Printing words vertically in Python

909 answers

PythonStackOverflow

Python Extract words from a given string

798 answers

PythonStackOverflow

Why do I get "Pickle - EOFError: Ran out of input" reading an empty file?

606 answers

PythonStackOverflow

Python os.path.join () method

384 answers

PythonStackOverflow

Flake8: Ignore specific warning for entire file

360 answers

News


Wiki

Python | How to copy data from one Excel sheet to another

Common xlabel/ylabel for matplotlib subplots

Check if one list is a subset of another in Python

How to specify multiple return types using type-hints

Printing words vertically in Python

Python Extract words from a given string

Cyclic redundancy check in Python

Finding mean, median, mode in Python without libraries

Python add suffix / add prefix to strings in a list

Why do I get "Pickle - EOFError: Ran out of input" reading an empty file?

Python - Move item to the end of the list

Python - Print list vertically