ML | Credit Card Fraud Detection



The main challenges in detecting credit card fraud are:

  1. Enormous amounts of data are processed every day, so the model must be fast enough to respond to fraud in a timely manner.
  2. Imbalanced data: most transactions (99.8%) are not fraudulent, which makes the fraudulent ones difficult to detect.
  3. Limited data availability, since the data is mostly private.
  4. Mislabeled data can be another major concern, as not every fraudulent transaction is detected and recorded.
  5. Adaptive techniques used against the model by fraudsters.

How can these problems be solved?

  1. The model must be simple and fast enough to detect an anomaly and classify it as a fraudulent transaction as soon as possible.
  2. The imbalance can be dealt with by several methods, which we will discuss below.
  3. The dimensionality of the data can be reduced to protect user privacy.
  4. A more reliable source, one that at least double-checks the data, should be used to train the model.
  5. We can keep the model simple and straightforward, so that when fraudsters adapt to it, a new model can be trained and deployed with just a few tweaks.
  6. Before getting into the code, it is recommended to work in a Jupyter notebook. If it is not installed on your computer, you can use Google Colab.
    You can download the dataset from this link.
    If the link doesn’t work, go to this link and log in to Kaggle to download the dataset.
    Code: import all required libraries

    # import the required packages
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    from matplotlib import gridspec

    Code: data loading

    # Load the dataset from the CSV file using pandas;
    # the best way is to mount the drive on Colab and
    # copy the path to the CSV file
    path = "credit.csv"
    data = pd.read_csv(path)

    Code: understanding the data

    # Look at the data
    data.head()

    Code: data description

    # Print the shape of the data
    # data = data.sample(frac=0.1, random_state=48)
    print(data.shape)
    print(data.describe())

     (284807, 31)
                     Time            V1  ...         Amount          Class
     count  284807.000000  2.848070e+05  ...  284807.000000  284807.000000
     mean    94813.859575  3.919560e-15  ...      88.349619       0.001727
     std     47488.145955  1.958696e+00  ...     250.120109       0.041527
     min         0.000000 -5.640751e+01  ...       0.000000       0.000000
     25%     54201.500000 -9.203734e-01  ...       5.600000       0.000000
     50%     84692.000000  1.810880e-02  ...      22.000000       0.000000
     75%    139320.500000  1.315642e+00  ...      77.165000       0.000000
     max    172792.000000  2.454930e+00  ...   25691.160000       1.000000

     [8 rows x 31 columns]

    Code: data imbalance
    Time to explore the data we are dealing with.

    # Determine the number of fraud cases in the dataset
    fraud = data[data['Class'] == 1]
    valid = data[data['Class'] == 0]
    outlierFraction = len(fraud) / float(len(valid))
    print(outlierFraction)
    print('Fraud Cases: {}'.format(len(data[data['Class'] == 1])))
    print('Valid Transactions: {}'.format(len(data[data['Class'] == 0])))

    Only 0.17% of all transactions are fraudulent, so the data is highly imbalanced. Let’s first apply our models without balancing the data; if we don’t get good accuracy, we can then find a way to balance the dataset.
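    If balancing does turn out to be necessary, one simple option is random undersampling: keep every fraud row and draw an equally sized random sample of valid rows. A minimal sketch, shown on a toy DataFrame since the real data lives in credit.csv:

```python
import pandas as pd

# Toy stand-in for the real dataset (credit.csv)
data = pd.DataFrame({
    "Amount": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    "Class":  [0,  0,  0,  0,  0,  0,  0,  0,  1,  1],
})

fraud = data[data["Class"] == 1]
valid = data[data["Class"] == 0]

# Keep all fraud rows plus an equal-sized random sample of valid rows
balanced = pd.concat([fraud, valid.sample(n=len(fraud), random_state=42)])
print(balanced["Class"].value_counts())
```

    On the real dataset this would shrink 284,315 valid rows down to 492, so it trades away a lot of data; oversampling techniques are a common alternative.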

    Code: print amount details of the fraudulent transactions

    print("Amount details of the fraudulent transaction")
    fraud.Amount.describe()

     Amount details of the fraudulent transaction
     count     492.000000
     mean      122.211321
     std       256.683288
     min         0.000000
     25%         1.000000
     50%         9.250000
     75%       105.890000
     max      2125.870000
     Name: Amount, dtype: float64

    Code: print amount details of the valid transactions

    print("Amount details of valid transaction")
    valid.Amount.describe()

     Amount details of valid transaction
     count    284315.000000
     mean         88.291022
     std         250.105092
     min           0.000000
     25%           5.650000
     50%          22.000000
     75%          77.050000
     max       25691.160000
     Name: Amount, dtype: float64

    As we can clearly see, the average amount of a fraudulent transaction is higher. This makes the problem more tractable.

    Code: building a correlation matrix
    A correlation matrix graphically shows how features correlate with each other and can help us predict which features are most relevant for classification.

    # Correlation matrix
    corrmat = data.corr()
    fig = plt.figure(figsize=(12, 9))
    sns.heatmap(corrmat, vmax=.8, square=True)
    plt.show()

    In the heatmap we can clearly see that most features do not correlate with the others, but some correlate positively or negatively with each other. For example, V2 and V5 are strongly negatively correlated with the feature Amount. We also see some correlation between V20 and Amount. This gives us a deeper understanding of the data available to us.
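    To go beyond eyeballing the heatmap, one could also rank the features by their absolute correlation with Class. A hedged sketch on synthetic data (the column names below are stand-ins for the real V1…V28 columns):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(500, 3)), columns=["V1", "V2", "Amount"])
# Make the toy target depend on V1 so one feature clearly dominates
data["Class"] = (data["V1"] > 1).astype(int)

# Absolute correlation of every column with the target, strongest first
ranking = data.corr()["Class"].abs().sort_values(ascending=False)
print(ranking)
```

    On the real dataset the same two lines, applied to the loaded DataFrame, would list which of V1…V28 are most associated with fraud.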

    Code: separating the X and Y values
    Dividing the data into input parameters and output values

    # divide X and Y from the dataset
    X = data.drop(['Class'], axis=1)
    Y = data["Class"]
    print(X.shape)
    print(Y.shape)

    # get just the values for processing
    # (these are NumPy arrays without column labels)
    xData = X.values
    yData = Y.values


     (284807, 30)
     (284807,)
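    Note that Time and Amount are on very different scales from the PCA components V1…V28. A tree ensemble like the random forest below is scale-invariant, so this step is optional here, but if you later try a distance- or gradient-based model, standardizing these columns helps. A small sketch with scikit-learn’s StandardScaler, using hypothetical values:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical (Time, Amount) rows on very different scales
X = np.array([[406.0, 0.0],
              [472.0, 529.0],
              [4462.0, 239.93]])

scaler = StandardScaler()
Xs = scaler.fit_transform(X)

# Each column now has mean 0 and unit standard deviation
print(Xs.mean(axis=0), Xs.std(axis=0))
```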

    Training and testing split

    We will divide the dataset into two main groups. One for training the model and the other for testing the performance of our trained model.

    # Using scikit-learn to split the data into training and testing sets
    from sklearn.model_selection import train_test_split

    # Split the data into training and testing sets
    xTrain, xTest, yTrain, yTest = train_test_split(
        xData, yData, test_size=0.2, random_state=42)
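    With only 0.17% positives, a plain random split can leave the test set with unusually few fraud cases. Passing stratify (not used in the original code, but worth knowing) keeps the class ratio identical in both splits:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 98 valid and 2 fraudulent toy labels
y = np.array([0] * 98 + [1] * 2)
X = np.arange(100).reshape(-1, 1)

# stratify=y preserves the 2% fraud rate in both halves
xTr, xTe, yTr, yTe = train_test_split(
    X, y, test_size=0.5, random_state=42, stratify=y)
print(yTr.sum(), yTe.sum())  # one fraud case lands in each half
```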

    Code: building a random forest model using scikit-learn

    # Building the RANDOM FOREST classifier
    from sklearn.ensemble import RandomForestClassifier

    # create and train the random forest model
    rfc = RandomForestClassifier()
    rfc.fit(xTrain, yTrain)

    # predictions
    yPred = rfc.predict(xTest)

    Code: building all kinds of evaluation metrics

    # Evaluating the classifier
    # printing every score of the classifier
    from sklearn.metrics import classification_report, accuracy_score
    from sklearn.metrics import precision_score, recall_score
    from sklearn.metrics import f1_score, matthews_corrcoef
    from sklearn.metrics import confusion_matrix

    n_outliers = len(fraud)
    n_errors = (yPred != yTest).sum()
    print("The model used is Random Forest classifier")

    acc = accuracy_score(yTest, yPred)
    print("The accuracy is {}".format(acc))

    prec = precision_score(yTest, yPred)
    print("The precision is {}".format(prec))

    rec = recall_score(yTest, yPred)
    print("The recall is {}".format(rec))

    f1 = f1_score(yTest, yPred)
    print("The F1-Score is {}".format(f1))

    MCC = matthews_corrcoef(yTest, yPred)
    print("The Matthews correlation coefficient is {}".format(MCC))

     The model used is Random Forest classifier
     The accuracy is 0.9995611109160493
     The precision is 0.9866666666666667
     The recall is 0.7551020408163265
     The F1-Score is 0.8554913294797689
     The Matthews correlation coefficient is 0.8629589216367891
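    classification_report is imported above but never called; it condenses precision, recall, and F1 per class into a single table, which is handy with imbalanced classes. A sketch with hypothetical labels in place of yTest and yPred:

```python
from sklearn.metrics import classification_report

# Hypothetical true and predicted labels
yTest = [0, 0, 0, 0, 1, 1]
yPred = [0, 0, 0, 1, 1, 0]

report = classification_report(yTest, yPred, target_names=["Valid", "Fraud"])
print(report)
```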

    Code: visualizing the confusion matrix

    # printing the confusion matrix
    LABELS = ['Normal', 'Fraud']
    conf_matrix = confusion_matrix(yTest, yPred)
    plt.figure(figsize=(12, 12))
    sns.heatmap(conf_matrix, xticklabels=LABELS,
                yticklabels=LABELS, annot=True, fmt="d")
    plt.title("Confusion matrix")
    plt.ylabel('True class')
    plt.xlabel('Predicted class')
    plt.show()



    Comparison with other algorithms, without addressing the data imbalance.

    As you can clearly see, our random forest model achieves better results, even for recall, which is the hardest metric to improve on this data.
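    The article does not show the other algorithms it compares against. A minimal, hedged way to run such a comparison yourself is to loop over a few classifiers and report recall and F1; the sketch below uses synthetic imbalanced data as a stand-in for credit.csv:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced stand-in for the credit card data (~2% positives)
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.98], random_state=42)
xTr, xTe, yTr, yTe = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

results = {}
for model in (RandomForestClassifier(random_state=42),
              LogisticRegression(max_iter=1000)):
    model.fit(xTr, yTr)
    pred = model.predict(xTe)
    name = type(model).__name__
    # (recall, F1) per model — recall matters most for catching fraud
    results[name] = (recall_score(yTe, pred), f1_score(yTe, pred))
    print(name, results[name])
```

    The same loop would work on the real xTrain/xTest split above, and other classifiers (e.g. a decision tree or k-NN) can be added to the tuple.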




