Welcome to GrabNGoInfo.

Today I will talk about how to apply class weights to a neural network model for an imbalanced dataset.

When using a neural network model to classify imbalanced data, we can adjust the class weights in the cost function so that the model pays more attention to the minority class.

Python's Keras library has a built-in option called class_weight to help us achieve this quickly.

One benefit of using the balanced weight adjustment is that we can use the imbalanced data to build the model directly without oversampling or under-sampling before training the model.

In this tutorial, we will go over the following topics:

1. Baseline neural network model for imbalanced classification.
2. How to calculate class weight using sklearn.
3. How to apply class weight to a neural network model.
4. How to apply manual class weight on a neural network model.

Let's get started! The first step is to import libraries.

We need to import make_classification from sklearn to create the modeling dataset.

We import pandas and numpy for data processing, and Counter to help us count the number of records in each class.

Matplotlib and seaborn are for visualization.

We also need train_test_split to create the training and validation datasets.

cross_validate and StratifiedKFold are for k-fold cross-validation.

Dense and Sequential from keras are for building the neural network model.

class_weight is for calculating the balanced weights.

classification_report and roc_auc_score are for model performance evaluation.

In the second step, we use make_classification from the sklearn library to create an imbalanced dataset with two classes.

We specify the minority class to be 0.5% of the dataset.

The dataset has two features for predicting which class each data point belongs to.

The resulting dataset has around 1% of its data points in the minority class.

This is higher than the specified weight of 0.5%, but it works for demonstrating the rare event modeling process.
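As a rough sketch of this step, the code below creates a dataset of this kind. The sample size of 100,000 is inferred from the train test split numbers in this tutorial, and the values for class_sep, n_clusters_per_class, and random_state are my own assumptions, so your class counts may differ slightly. One likely reason the minority share ends up near 1% instead of 0.5% is that make_classification randomly reassigns the labels of about 1% of the records by default through its flip_y parameter.

```python
from collections import Counter

from sklearn.datasets import make_classification

# Two features, two classes, minority class specified as 0.5%.
# n_samples, class_sep, n_clusters_per_class and random_state are assumed values.
X, y = make_classification(
    n_samples=100000,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    n_repeated=0,
    n_classes=2,
    n_clusters_per_class=1,
    weights=[0.995, 0.005],
    class_sep=0.5,
    random_state=0,
)

# Count the records in each class and compute the minority share.
class_counts = Counter(y)
minority_share = class_counts[1] / len(y)
print(class_counts)
print(f"Minority share: {minority_share:.2%}")
```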

The third step is to do a train test split on the imbalanced data.

In this step, we split the dataset into 80% training data and 20% validation data.

random_state ensures that we have the same train test split every time.

The seed number for random_state does not have to be 42; it can be any number.

The train test split gives us 80,000 records for the training dataset and 20,000 for the validation dataset.

In the training dataset, we have 79,183 data points from the majority class and 817 from the minority class.
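Here is a minimal sketch of the split. The dataset settings are the same assumed values as before, so the exact class counts you see may not match the numbers I mentioned.

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Recreate an imbalanced dataset (assumed settings, see the lead-in).
X, y = make_classification(
    n_samples=100000, n_features=2, n_informative=2, n_redundant=0,
    n_repeated=0, n_classes=2, n_clusters_per_class=1,
    weights=[0.995, 0.005], class_sep=0.5, random_state=0,
)

# 80% training data and 20% validation data; random_state fixes the split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(f"Training records: {X_train.shape[0]}")
print(f"Validation records: {X_test.shape[0]}")
print(f"Training class counts: {sorted(Counter(y_train).items())}")
```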

In step 4, we create a neural network model on the imbalanced training dataset as the baseline model.

The neural network model has one input layer, one hidden layer, and one output layer.

Since we have two features, the input_dim is 2.

We set the input layer to have two neurons, the hidden layer to have two neurons, and the output layer to have one neuron.

The activation function for the input and hidden layers is relu, a popular activation function with good performance.

The output activation function is sigmoid, which is used for binary classification.

We set the loss to be binary_crossentropy when compiling the model.

This is because we are building a binary classification model.

For a multi-class classification model, the loss is usually categorical_crossentropy, and for a regression model, the loss is usually mean_squared_error.

The optimizer is responsible for changing the weights and the learning rate to reduce the loss.

adam is a widely used optimizer.

After compiling the model, we fit the neural network model on the training dataset.

Setting epochs to 50 means that the model will go through the training dataset 50 times.

The batch_size of 100 means that each time the weights are updated, 100 data points are used.
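A minimal sketch of the baseline model could look like this. I use keras.Input(shape=(2,)) as an equivalent of input_dim=2, and a small random stand-in dataset (hypothetical, only there so the snippet runs quickly) in place of the real training data.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical stand-in for X_train / y_train: 1,000 records, about 1% minority.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 2)).astype("float32")
y_demo = (rng.random(1000) < 0.01).astype("int32")

# Two-neuron relu layer, two-neuron relu hidden layer, one sigmoid output neuron.
model = keras.Sequential([
    keras.Input(shape=(2,)),
    layers.Dense(2, activation="relu"),
    layers.Dense(2, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

# Binary classification, so the loss is binary_crossentropy.
model.compile(loss="binary_crossentropy", optimizer="adam")

# 50 passes over the data, with 100 records per weight update.
history = model.fit(X_demo, y_demo, epochs=50, batch_size=100, verbose=0)
```

To train on the real data, replace X_demo and y_demo with the training features and labels from the train test split.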

Now let's make predictions on the validation dataset and check the model performance.

We got a recall of 0, which means that the neural network model did not predict any minority data correctly.
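To illustrate what a recall of 0 looks like, here is a small sketch with made-up labels and probabilities. The array y_prob is a hypothetical stand-in for the model's predicted probabilities, and the 0.5 threshold is an assumption. Notice that the accuracy still looks great even though no minority record is caught.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, recall_score

# Hypothetical labels: 99 majority records and 1 minority record.
y_true = np.array([0] * 99 + [1])

# Stand-in for the predicted probabilities: every record gets a low score.
y_prob = np.full(100, 0.01)

# Convert probabilities to class labels with a 0.5 threshold.
y_pred = (y_prob > 0.5).astype(int)

accuracy = accuracy_score(y_true, y_pred)
minority_recall = recall_score(y_true, y_pred, zero_division=0)

print(classification_report(y_true, y_pred, zero_division=0))
print(f"Accuracy: {accuracy:.2f}, minority recall: {minority_recall:.2f}")
```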

Let's see if the balanced weight can help us.

In step 5, we will calculate class weight using sklearn.

sklearn has a built-in utility function compute_class_weight to calculate the class weights.

The weights are calculated using the inverse proportion of class frequencies.

The computed weights from sklearn are in array format.

We need to transform them into a dictionary because Keras expects the class weights as a dictionary.
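Here is a small sketch of this calculation. To keep it self-contained, I build a label array with the same class counts as our training dataset (79,183 and 817) instead of reusing the real labels. With the balanced option, each weight is the number of samples divided by the number of classes times the class count.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Stand-in labels with the same class counts as the training dataset.
y_train = np.array([0] * 79183 + [1] * 817)

# Balanced weights: n_samples / (n_classes * count of each class).
classes = np.unique(y_train)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)

# Keras expects a dictionary that maps each class label to its weight.
class_weight = {int(c): float(w) for c, w in zip(classes, weights)}
print(class_weight)
```

The minority class weight comes out roughly 97 times larger than the majority class weight, which matches the ratio of the class counts.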

In step 6, we keep all the hyperparameters the same as in the baseline model.

The only difference is that we pass the balanced class weight dictionary to the class_weight argument when fitting the model.

We can see that the minority recall value increased from 0 to 56%, which is a significant improvement.

Note that your results can be different from mine because of the randomness in training the neural network model, but the difference should be small.
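The sketch below shows one way to fit the same network with balanced class weights. The data here is a small hypothetical stand-in (1,900 majority and 100 minority records) so that the snippet runs quickly, so the recall it prints will not match the numbers in this tutorial.

```python
import numpy as np
from sklearn.metrics import recall_score
from sklearn.utils.class_weight import compute_class_weight
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical stand-in data: two shifted clusters, 5% minority class.
rng = np.random.default_rng(42)
X_demo = np.vstack([
    rng.normal(loc=0.0, size=(1900, 2)),
    rng.normal(loc=2.0, size=(100, 2)),
]).astype("float32")
y_demo = np.array([0] * 1900 + [1] * 100)

# Balanced class weights as a dictionary for Keras.
classes = np.unique(y_demo)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_demo)
class_weight = {int(c): float(w) for c, w in zip(classes, weights)}

# Same architecture and compile settings as the baseline model.
model = keras.Sequential([
    keras.Input(shape=(2,)),
    layers.Dense(2, activation="relu"),
    layers.Dense(2, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam")

# The only change from the baseline: pass class_weight to fit.
history = model.fit(
    X_demo, y_demo, epochs=50, batch_size=100,
    class_weight=class_weight, verbose=0,
)

# Predict classes with a 0.5 threshold and check the minority recall.
y_pred = (model.predict(X_demo, verbose=0) > 0.5).astype(int).ravel()
print(f"Minority recall: {recall_score(y_demo, y_pred, zero_division=0):.2f}")
```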

In step 7, we will apply manual class weights on the neural network model.

Although the class weights are commonly calculated using the inverse proportion of class frequencies, we can set our own class weights and tune them as a hyperparameter.

For example, we can set the cost penalty ratio between the majority class and the minority class to be 1:200.

We are able to capture 98% of the minority class after increasing the cost penalty for the minority class.
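To get a feel for why a 1:200 penalty ratio pushes the model toward the minority class, here is a simplified numpy sketch of a class-weighted binary cross-entropy. The function weighted_binary_crossentropy is a hypothetical helper written for illustration, not a Keras API, and the exact way Keras averages the weighted loss may differ.

```python
import numpy as np

def weighted_binary_crossentropy(y_true, y_prob, class_weight):
    """Mean binary cross-entropy with each record scaled by its class weight."""
    eps = 1e-7
    p = np.clip(y_prob, eps, 1 - eps)
    per_sample = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    sample_weight = np.where(y_true == 1, class_weight[1], class_weight[0])
    return float(np.mean(sample_weight * per_sample))

# Manual class weights with a 1:200 cost penalty ratio.
manual_class_weight = {0: 1, 1: 200}

# Made-up batch: three majority records and one missed minority record.
y_true = np.array([0, 0, 0, 1])
y_prob = np.array([0.1, 0.1, 0.1, 0.1])

unweighted = weighted_binary_crossentropy(y_true, y_prob, {0: 1, 1: 1})
weighted = weighted_binary_crossentropy(y_true, y_prob, manual_class_weight)
print(f"Unweighted loss: {unweighted:.3f}")
print(f"Weighted loss:   {weighted:.3f}")
```

With the manual weights, missing the single minority record dominates the loss, so the optimizer has a strong incentive to fix it. In Keras, the same dictionary can be passed to the class_weight argument of fit.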

In this tutorial, we built the neural network models with and without the balanced weight for imbalanced classification.

Results show that the balanced weight significantly improved the model's ability to capture the minority class.

If you found the information in this tutorial helpful, please click the like button and subscribe to the channel.

I publish tutorials on machine learning, deep learning, and natural language processing every week.

If you prefer the written version of the tutorial, please go to GrabNGoInfo.com.

I will put the link in the video description.

This is the blog post for this tutorial.

It has all the code and explanations in this video.

If you are interested in learning about the oversampling and under-sampling methods, please refer to my videos "Four Oversampling And Under-Sampling Methods For Imbalanced Classification Using Python" and "Ensemble Oversampling And Under-Sampling For Imbalanced Classification Using Python".

Thank you for watching.

See you in the next video.