ML | Credit Card Fraud Detection


The main challenges in detecting credit card fraud are:

  1. Enormous volumes of data are processed every day, so the model must be fast enough to respond to fraud in a timely manner.
  2. Imbalanced data: most transactions (99.8%) are not fraudulent, which makes the fraudulent ones difficult to detect.
  3. Data availability, because the data is mostly private.
  4. Misclassified data can be another major concern, as not every fraudulent transaction is detected and recorded.
  5. Adaptive techniques used against the model by fraudsters.

How to solve these problems?

  1. The model must be simple and fast enough to detect the anomaly and classify it as a fraudulent transaction as soon as possible.
  2. Imbalance can be handled with dedicated methods; this article first trains a model without balancing and comes back to the question only if the results require it.
  3. The dimensionality of the data can be reduced to protect user privacy.
  4. A more trustworthy source should be used, one that double-checks the data, at least for training the model.
  5. We can keep the model simple and interpretable so that when a fraudster adapts to it, a few tweaks are enough to get a new model up and running.

    Before getting into the code, we recommend working in a Jupyter notebook. If it is not installed on your computer, you can use Google Colab.
    You can download the dataset from this link
    If the link doesn't work, go to this link and log in to Kaggle to download the dataset.
    Code: import all required libraries

    # import required packages
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    from matplotlib import gridspec

    Code: data loading

    # Load the dataset from the CSV file using pandas.
    # On Colab, the easiest way is to mount your drive and
    # copy the path to the CSV file.
    path = "credit.csv"
    data = pd.read_csv(path)
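Before exploring further, it can help to confirm that the file loaded as expected. The sketch below is only an illustration: it reads a tiny made-up CSV from memory (the rows are invented, not taken from the dataset) and checks for missing values and for the presence of the Class column.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for credit.csv (values invented).
sample_csv = io.StringIO(
    "Time,V1,Amount,Class\n"
    "0,-1.36,149.62,0\n"
    "1,1.19,2.69,0\n"
    "2,-2.31,0.00,1\n"
)
sample = pd.read_csv(sample_csv)

# Total number of missing cells across the whole frame.
missing_cells = int(sample.isnull().sum().sum())

# The label column the rest of the walkthrough relies on.
has_label = "Class" in sample.columns

print(sample.shape, missing_cells, has_label)
```

The same two checks can be run on the real `data` frame right after `pd.read_csv(path)`.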

    Code: Understanding Data

    # Look at the first rows of the data
    data.head()

    Code: data description

    # Print the shape of the data and its summary statistics
    # data = data.sample(frac=0.1, random_state=48)
    print(data.shape)
    print(data.describe())

    Output:

    (284807, 31)
                    Time            V1  ...         Amount          Class
    count  284807.000000  2.848070e+05  ...  284807.000000  284807.000000
    mean    94813.859575  3.919560e-15  ...      88.349619       0.001727
    std     47488.145955  1.958696e+00  ...     250.120109       0.0415
    min         0.000000 -5.640751e+01  ...       0.000000       0.000000
    25%     54201.500000 -9.203734e-01  ...       5.600000       0.000000
    50%     84692.000000  1.810880e-02  ...      22.000000       0.000000
    75%    139320.500000  1.315642e+00  ...      77.165000       0.000000
    max    172792.000000  2.454930e+00  ...   25691.160000       1.000000

    [8 rows x 31 columns]

    Code: data imbalance
    Time to examine the class balance of the data we are dealing with

    # Determine the number of fraud cases in the dataset
    fraud = data[data['Class'] == 1]
    valid = data[data['Class'] == 0]
    # Ratio of fraudulent to valid transactions
    outlierFraction = len(fraud) / float(len(valid))
    print(outlierFraction)
    print('Fraud Cases: {}'.format(len(fraud)))
    print('Valid Transactions: {}'.format(len(valid)))


    Only 0.17% of all transactions are fraudulent, so the data is highly imbalanced. Let's first apply our model without balancing the data; if the results are not good enough, we can then look for a way to balance the dataset.
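With a class split like this, accuracy on its own says little. As a rough check, using the counts that appear in the describe() outputs of this article (492 fraud cases, 284,315 valid transactions), a trivial "model" that labels every transaction as valid would already reach a very high accuracy. The variable names below are illustrative.

```python
# Class counts reported by the describe() outputs in this article.
n_fraud = 492
n_valid = 284315
n_total = n_fraud + n_valid

# Share of fraudulent transactions in the whole dataset.
fraud_share = n_fraud / n_total

# Accuracy of a trivial baseline that predicts "valid" for every row:
# it is right on all valid rows and wrong on all fraud rows.
baseline_accuracy = n_valid / n_total

# Recall of that baseline on the fraud class: it never flags fraud.
baseline_recall = 0 / n_fraud

print(f"fraud share:       {fraud_share:.4%}")
print(f"baseline accuracy: {baseline_accuracy:.4%}")
print(f"baseline recall:   {baseline_recall:.0%}")
```

This is why precision, recall, F1 and the Matthews correlation coefficient are worth reading alongside accuracy when the model is evaluated.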

    Code: print information about the amounts of the fraudulent transactions

    print("Amount details of the fraudulent transaction")
    fraud.Amount.describe()

    Output:

    Amount details of the fraudulent transaction
    count     492.000000
    mean      122.211321
    std       256.683288
    min         0.000000
    25%         1.000000
    50%         9.250000
    75%       105.890000
    max      2125.870000
    Name: Amount, dtype: float64

    Code: print amount information for the valid transactions

    print("Amount details of valid transaction")
    valid.Amount.describe()

    Output:

    Amount details of valid transaction
    count    284315.000000
    mean         88.291022
    std         250.105092
    min           0.000000
    25%           5.650000
    50%          22.000000
    75%          77.050000
    max       25691.160000
    Name: Amount, dtype: float64

    As we can see, the average amount of a fraudulent transaction (about 122) is higher than that of a valid one (about 88). This suggests that the Amount feature carries some signal for telling the two classes apart.
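The mean is only one view of the two distributions. Putting the summary statistics from the two outputs above side by side, a small sketch (the dictionary layout is just for illustration) shows that the medians point the other way, and that both distributions have a long right tail.

```python
# Summary statistics copied from the two describe() outputs above.
fraud_stats = {"mean": 122.211321, "median": 9.25, "q75": 105.89}
valid_stats = {"mean": 88.291022, "median": 22.00, "q75": 77.05}

# Difference in means: fraudulent transactions are larger on average.
mean_gap = fraud_stats["mean"] - valid_stats["mean"]

# Difference in medians: the typical fraudulent transaction is smaller.
median_gap = fraud_stats["median"] - valid_stats["median"]

# A mean above the 75th percentile is a sign of a long right tail.
fraud_right_skewed = fraud_stats["mean"] > fraud_stats["q75"]
valid_right_skewed = valid_stats["mean"] > valid_stats["q75"]

print(f"mean gap:   {mean_gap:+.2f}")
print(f"median gap: {median_gap:+.2f}")
print(fraud_right_skewed, valid_right_skewed)
```

So Amount alone is unlikely to separate the classes cleanly; it is one input among the thirty features.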

    Code: Building a correlation matrix
    A correlation matrix shows graphically how the features correlate with each other and can help us judge which features are most relevant for the prediction.

    # Correlation matrix
    corrmat = data.corr()
    fig = plt.figure(figsize=(12, 9))
    sns.heatmap(corrmat, vmax=.8, square=True)
    plt.show()


    In the heatmap we can see that most of the features do not correlate with one another, but a few features correlate positively or negatively with each other. For example, V2 and V5 are strongly negatively correlated with the feature called Amount. We also see some correlation between V20 and Amount. This gives us a deeper understanding of the data available to us.
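Reading exact values off a heatmap is hard, so it can be useful to list the correlations with one column directly. The following is a minimal sketch on synthetic data, not on the real dataset: the column names V_neg, V_pos and V_noise are invented, and the data is constructed so that the ranking is known in advance.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows = 500

# Synthetic stand-in data (not the real dataset): "V_neg" is built to move
# against Amount, "V_pos" to move with it, and "V_noise" is unrelated.
amount = rng.exponential(scale=80.0, size=n_rows)
toy = pd.DataFrame({
    "Amount": amount,
    "V_neg": -0.05 * amount + rng.normal(size=n_rows),
    "V_pos": 0.02 * amount + rng.normal(size=n_rows),
    "V_noise": rng.normal(size=n_rows),
})

# Correlation of every other column with Amount, strongest first.
corr_with_amount = toy.corr()["Amount"].drop("Amount")
ranked = corr_with_amount.reindex(
    corr_with_amount.abs().sort_values(ascending=False).index
)
print(ranked)
```

Applied to the real frame, the same two lines on `data.corr()` would list which of V1 to V28 move most strongly with Amount.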

    Code: Split X and Y values
    Split the data into input features and the output target

    # separate X and Y from the dataset
    X = data.drop(['Class'], axis=1)
    Y = data["Class"]
    print(X.shape)
    print(Y.shape)
    # get only the values for processing
    # (NumPy arrays without column labels)
    xData = X.values
    yData = Y.values

    Output:

    (284807, 30)
    (284807,)


    Splitting the data into training and test sets

    We will divide the dataset into two main groups: one for training the model and the other for testing the performance of our trained model.

    # Using scikit-learn to split the data into training and test sets
    from sklearn.model_selection import train_test_split
    # Hold out 20% of the data for testing
    xTrain, xTest, yTrain, yTest = train_test_split(
        xData, yData, test_size=0.2, random_state=42)
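With so few fraud cases, a purely random split can leave the test set with a somewhat different fraud share than the training set. One option, which the code above does not use, is the `stratify` argument of `train_test_split`. A minimal sketch on synthetic data (the array names are made up):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 1000 rows, 2% positives (not the real dataset).
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(1000, 3))
y_toy = np.zeros(1000, dtype=int)
y_toy[:20] = 1

# stratify=y_toy asks for the same class proportions in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

print("train fraud share:", y_tr.mean())
print("test fraud share: ", y_te.mean())
```

On the real data this would correspond to adding `stratify=yData` to the call above; whether it changes the reported scores would have to be checked by rerunning the evaluation.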

    Code: Building a random forest model using scikit-learn

    # Building a random forest classifier
    from sklearn.ensemble import RandomForestClassifier
    # create and train the random forest model
    rfc = RandomForestClassifier()
    rfc.fit(xTrain, yTrain)
    # predictions on the test set
    yPred = rfc.predict(xTest)

    Code: computing the evaluation metrics

    # Evaluate the classifier:
    # compute and print each score
    from sklearn.metrics import classification_report, accuracy_score
    from sklearn.metrics import precision_score, recall_score
    from sklearn.metrics import f1_score, matthews_corrcoef
    from sklearn.metrics import confusion_matrix

    n_outliers = len(fraud)
    n_errors = (yPred != yTest).sum()
    print("The model used is Random Forest classifier")

    acc = accuracy_score(yTest, yPred)
    print("The accuracy is {}".format(acc))

    prec = precision_score(yTest, yPred)
    print("The precision is {}".format(prec))

    rec = recall_score(yTest, yPred)
    print("The recall is {}".format(rec))

    f1 = f1_score(yTest, yPred)
    print("The F1-Score is {}".format(f1))

    MCC = matthews_corrcoef(yTest, yPred)
    print("The Matthews correlation coefficient is {}".format(MCC))

    Output:

    The model used is Random Forest classifier
    The accuracy is 0.9995611109160493
    The precision is 0.9866666666666667
    The recall is 0.7551020408163265
    The F1-Score is 0.8554913294797689
    The Matthews correlation coefficient is 0.8629589216367891
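These scores all derive from the four cells of the confusion matrix. The counts below are not printed anywhere in this article; they are inferred as integer counts consistent with the reported scores, assuming the 20% test split holds 56,962 rows. Treat them as a worked illustration of the formulas rather than as reported results.

```python
import math

# Inferred counts (assumption): chosen to match the scores printed above.
tp, fp, fn = 74, 1, 24
n_test = 56962
tn = n_test - tp - fp - fn

accuracy = (tp + tn) / n_test
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

for name, value in [("accuracy", accuracy), ("precision", precision),
                    ("recall", recall), ("F1", f1), ("MCC", mcc)]:
    print(f"{name:9s} {value:.6f}")
```

Under these assumed counts, the high accuracy comes almost entirely from the true negatives, while the recall of about 0.755 means roughly one in four fraudulent transactions in the test set is missed.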

    Code: visualizing the confusion matrix

    # plot the confusion matrix
    LABELS = ['Normal', 'Fraud']
    conf_matrix = confusion_matrix(yTest, yPred)
    plt.figure(figsize=(12, 12))
    sns.heatmap(conf_matrix, xticklabels=LABELS,
                yticklabels=LABELS, annot=True, fmt="d")
    plt.title("Confusion matrix")
    plt.ylabel('True class')
    plt.xlabel('Predicted class')
    plt.show()
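In the heatmap, rows are the true classes and columns are the predicted classes, which is the layout scikit-learn's `confusion_matrix` returns. A small sketch with made-up labels shows how to read the four cells:

```python
from sklearn.metrics import confusion_matrix

# Toy labels (made up): 0 = Normal, 1 = Fraud.
y_true_toy = [0, 0, 0, 0, 1, 1, 1]
y_pred_toy = [0, 0, 0, 1, 1, 1, 0]

# For binary labels 0/1, ravel() yields the cells in this order.
tn, fp, fn, tp = confusion_matrix(y_true_toy, y_pred_toy).ravel()

print("true negatives: ", tn)   # Normal predicted as Normal
print("false positives:", fp)   # Normal predicted as Fraud
print("false negatives:", fn)   # Fraud predicted as Normal
print("true positives: ", tp)   # Fraud predicted as Fraud
```

For fraud detection the bottom-left cell (false negatives, fraud predicted as normal) is usually the one to watch most closely.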


    Comparison with other algorithms without dealing with the data imbalance.

    As you can see, our random forest model gets better results, even for recall, which is the hardest part.
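The comparison above leaves the class imbalance untouched. If recall needs to improve, one common option (not used in this article) is to weight the classes inversely to their frequency. The sketch below only works through the arithmetic of the "balanced" weighting heuristic with the class counts reported earlier; the variable names are illustrative.

```python
# Class counts reported earlier in this article.
class_counts = {0: 284315, 1: 492}
n_samples = sum(class_counts.values())
n_classes = len(class_counts)

# "Balanced" heuristic: weight = n_samples / (n_classes * class_count),
# so the rare class receives a proportionally larger weight.
class_weights = {
    label: n_samples / (n_classes * count)
    for label, count in class_counts.items()
}

# With these weights both classes contribute equally in total.
weighted_totals = {
    label: class_weights[label] * class_counts[label]
    for label in class_counts
}

print(class_weights)
print(weighted_totals)
```

In scikit-learn, passing `class_weight="balanced"` to `RandomForestClassifier` applies this heuristic to the training labels. Whether it actually improves recall on this dataset would have to be checked by rerunning the evaluation.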

