# How does the class_weight parameter in scikit-learn work?



I am having a lot of trouble understanding how the `class_weight` parameter in scikit-learn's Logistic Regression operates.

The Situation

I want to use logistic regression for binary classification on a very imbalanced data set. The classes are labelled 0 (negative) and 1 (positive), and the observed data is in a ratio of about 19:1, with the majority of samples having a negative outcome.

First Attempt: Manually Preparing Training Data

I split the data I had into disjoint training and test sets (about 80/20). I then randomly subsampled the training data by hand to get training sets with proportions other than 19:1, ranging from 2:1 to 16:1.
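The manual rebalancing described above can be sketched in plain Python. This is a minimal sketch, assuming labels 0/1 and that the proportions are changed only by dropping negatives (undersampling); the helper name `undersample_negatives` and the toy data are hypothetical.

```python
import random

def undersample_negatives(samples, labels, neg_per_pos, seed=0):
    """Hypothetical helper: randomly drop negatives (label 0) so the result
    has about neg_per_pos negatives for every positive (label 1)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Keep every positive; keep at most neg_per_pos * (#positives) negatives.
    n_keep = min(len(neg), int(neg_per_pos * len(pos)))
    keep = sorted(pos + rng.sample(neg, n_keep))
    return [samples[i] for i in keep], [labels[i] for i in keep]

# Toy training set in the observed 19:1 proportion (hypothetical data):
# 10 positives and 190 negatives.
X_train = list(range(200))
y_train = [1 if i % 20 == 0 else 0 for i in range(200)]

for ratio in (2, 4, 8, 16):
    X_sub, y_sub = undersample_negatives(X_train, y_train, ratio)
    print(f"{ratio}:1 ->", y_sub.count(0), "negatives,", y_sub.count(1), "positives")
```

Only the training split is resampled this way; the test split keeps its observed proportions.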

I then trained logistic regression on each of these training subsets and plotted recall (= TP/(TP+FN)) as a function of the training proportion. Of course, recall was computed on the disjoint TEST samples, which had the observed proportions of 19:1. Note that although I trained the models on different training data, I computed recall for all of them on the same (disjoint) test data.
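The evaluation step can be sketched as follows: every model is scored against the same held-out labels in the observed proportions. This is a minimal sketch; the test labels and the per-ratio predictions are hypothetical stand-ins for real model output. For binary 0/1 labels, scikit-learn's `sklearn.metrics.recall_score` computes the same quantity.

```python
def recall(y_true, y_pred):
    """Recall = TP / (TP + FN) for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical held-out test labels in the observed 19:1 proportion.
y_test = [0] * 38 + [1] * 2

# Hypothetical predictions from models trained on different proportions.
preds_by_ratio = {
    "2:1": [1] * 3 + [0] * 35 + [1, 1],   # flags more positives (3 false positives)
    "16:1": [0] * 38 + [1, 0],            # flags fewer positives
}

for ratio, y_pred in preds_by_ratio.items():
    # Same y_test for every model, so the recalls are directly comparable.
    print(ratio, recall(y_test, y_pred))
```

False positives do not affect recall, which is why it should be read alongside a precision-type metric.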

The results were as expected: recall was about 60% at 2:1 training proportions and fell off rather quickly by the time it reached 16:1. For several proportions between 2:1 and 6:1, recall was decently above 5%.

Second Attempt: Grid Search

Next, I wanted to test different regularization parameters, so I used GridSearchCV with a grid over several values of the `C` parameter as well as the `class_weight` parameter. To translate my n:m proportions of negative:positive training samples into the dictionary form that `class_weight` expects, I thought I could just specify several dictionaries as follows:

```python
{0: 0.67, 1: 0.33}  # expected 2:1
{0: 0.75, 1: 0.25}  # expected 3:1
{0: 0.8, 1: 0.2}    # expected 4:1
```

and I also included `None` and `auto`.
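A quick way to see what such dictionaries do is to compare total weight per class. This is a minimal sketch, assuming (as in scikit-learn's `LogisticRegression`) that each sample's loss term is multiplied by the weight of its class, so the effective balance is count × weight for each class. The helpers `effective_ratio` and `weights_for_ratio` are hypothetical, and `counts` only needs the observed proportion.

```python
def effective_ratio(counts, class_weight):
    """Total negative weight divided by total positive weight, assuming each
    sample's loss is multiplied by its class weight."""
    total = {c: counts[c] * class_weight[c] for c in counts}
    return total[0] / total[1]

def weights_for_ratio(counts, n, m):
    """Hypothetical helper: class weights (negative weight fixed at 1.0) that
    give an effective negative:positive ratio of n:m on data with these counts."""
    return {0: 1.0, 1: (counts[0] / counts[1]) * (m / n)}

counts = {0: 19, 1: 1}   # observed 19:1 training proportions

# The dictionaries from the question all put MORE weight on the majority class.
for cw in ({0: 0.67, 1: 0.33}, {0: 0.75, 1: 0.25}, {0: 0.8, 1: 0.2}):
    print(cw, round(effective_ratio(counts, cw), 1))

# Weights that would mimic 2:1 training proportions under the same assumption.
print(weights_for_ratio(counts, 2, 1))
```

Under this assumption the three dictionaries make the effective ratio roughly 38.6:1, 57:1 and 76:1 rather than 2:1, 3:1 and 4:1, which is consistent with recall collapsing. In scikit-learn's formulation the weighted loss is also multiplied by `C`, so rescaling every weight by the same factor behaves like rescaling `C`; only the ratio sets the class balance.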

This time the results were way off. All my recalls came out tiny (< 0.05) for every value of `class_weight` except `auto`, so I can only assume that my understanding of how to set the `class_weight` dictionary is wrong. Interestingly, with `class_weight` set to "auto" the recall in the grid search was around 59% for all values of `C`; I guessed that it balances the classes to 1:1?

My Questions

1. How do you properly use `class_weight` to achieve different balances in training data from what you actually give it? Specifically, what dictionary do I pass to `class_weight` to use n:m proportions of negative:positive training samples?

2. If you pass various `class_weight` dictionaries to GridSearchCV, during cross-validation will it rebalance the training fold data according to the dictionary but use the true given sample proportions for computing my scoring function on the test fold? This is critical since any metric is only useful to me if it comes from data in the observed proportions.

3. What does the `auto` value of `class_weight` do as far as proportions? I read the documentation and I assume "balances the data inversely proportional to their frequency" just means it makes it 1:1. Is this correct? If not, can someone clarify?
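For question 3, the formula that current scikit-learn documents for `class_weight='balanced'` is `n_samples / (n_classes * np.bincount(y))`. As far as I know, `'balanced'` replaced the older `'auto'` option in scikit-learn 0.17; both weight classes inversely to their frequency, though `'auto'` normalized the weights differently. The sketch below reimplements the `'balanced'` formula in plain Python (the function name and labels are hypothetical) and checks that each class ends up with the same total weight, i.e. an effective 1:1 balance.

```python
from collections import Counter

def balanced_weights(y):
    """Reimplementation of the documented 'balanced' formula:
    weight_c = n_samples / (n_classes * count_c)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}

y = [0] * 190 + [1] * 10          # hypothetical 19:1 training labels
w = balanced_weights(y)
print(w)

# Total weight per class = count * weight; both equal n_samples / n_classes.
totals = {c: k * w[c] for c, k in Counter(y).items()}
print(totals)
```

In scikit-learn itself, `sklearn.utils.class_weight.compute_class_weight` computes these values.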


## How does the class_weight parameter in scikit-learn work? __del__: Questions

How can I make a time delay in Python?

I would like to know how to put a time delay in a Python script.


```python
import time

time.sleep(5)   # Delays for 5 seconds. You can also use a float value.
```

Here is another example where something is run approximately once a minute:

```python
import time

while True:
    print("This prints once a minute.")
    time.sleep(60)  # Delay for 1 minute (60 seconds).
```
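The loop above drifts slightly, because each iteration takes 60 seconds plus however long the loop body runs. A minimal sketch of a drift-compensating alternative, assuming `time.monotonic()` as the clock: it sleeps until the next scheduled tick instead of for a fixed interval. The name `run_every` and its `iterations` parameter are hypothetical.

```python
import time

def run_every(interval, task, iterations):
    """Call task() roughly every `interval` seconds, `iterations` times.
    Sleeping until the next scheduled tick keeps the time spent inside
    task() from accumulating as drift."""
    next_tick = time.monotonic()
    for _ in range(iterations):
        task()
        next_tick += interval
        # Sleep only for what is left of this interval (never negative).
        time.sleep(max(0.0, next_tick - time.monotonic()))

# Short demo: record when each call happens, relative to the start.
calls = []
start = time.monotonic()
run_every(0.05, lambda: calls.append(time.monotonic() - start), iterations=3)
print(calls)
```

For the once-a-minute case, something like `run_every(60, lambda: print("This prints once a minute."), iterations=10)` would apply the same idea.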


You can use the `sleep()` function in the `time` module. It can take a float argument for sub-second resolution.

```python
from time import sleep

sleep(0.1)  # Time in seconds
```
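To check a sub-second delay, the elapsed time can be measured around the call. This is a minimal sketch using `time.perf_counter()`; the actual pause may be slightly longer than requested, depending on the operating system's scheduler.

```python
import time

start = time.perf_counter()
time.sleep(0.25)                      # a float argument gives a sub-second delay
elapsed = time.perf_counter() - start
print(f"slept for about {elapsed:.3f} s")
```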


How to delete a file or folder in Python?

How do I delete a file or folder in Python?


`Path` objects from the Python 3.4+ `pathlib` module expose two instance methods for this: `Path.unlink()` removes a file or symbolic link, and `Path.rmdir()` removes an empty directory.
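A minimal sketch of deleting files and folders, run against a throwaway temporary directory so it is safe to execute; the file and folder names are hypothetical. `shutil.rmtree` from the standard library handles directories that are not empty.

```python
import shutil
import tempfile
from pathlib import Path

# Build a throwaway directory tree to delete (illustration only).
base = Path(tempfile.mkdtemp())
file_path = base / "note.txt"
file_path.write_text("temporary")
empty_dir = base / "empty"
empty_dir.mkdir()
full_dir = base / "full"
full_dir.mkdir()
(full_dir / "inner.txt").write_text("x")

file_path.unlink()        # remove a single file (or symbolic link)
empty_dir.rmdir()         # remove a directory, which must be empty
shutil.rmtree(full_dir)   # remove a directory and everything inside it
base.rmdir()              # clean up the now-empty temporary directory
```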


