This article will show you how to use an autoencoder to classify data. The data used below represents credit card transactions, and the goal is to predict whether a given transaction is fraudulent. The dataset can be downloaded here.
Step 1: Import the required libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
import seaborn as sns
from keras.layers import Input, Dense
from keras.models import Model, Sequential
from keras import regularizers
Step 2: Load data
# Changing the working directory to the data location
cd C:\Users\Dev\Desktop\Kaggle\Credit Card Fraud
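The intermediate steps that scale the features, build and train the autoencoder, and produce `encoded_X` are elided above. Below is a minimal sketch of such a pipeline; the random stand-in data, layer sizes, and epoch count are illustrative assumptions, not the article's exact settings:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from keras.layers import Input, Dense
from keras.models import Model

# Illustrative stand-in for the loaded feature matrix: 200 samples, 30 features
X = np.random.rand(200, 30)

# Scale all features into [0, 1] so a sigmoid output layer can reconstruct them
X_scaled = MinMaxScaler().fit_transform(X)

# Encoder: 30 -> 16 -> 8; Decoder: 8 -> 16 -> 30
input_layer = Input(shape=(X_scaled.shape[1],))
encoded = Dense(16, activation='relu')(input_layer)
encoded = Dense(8, activation='relu')(encoded)
decoded = Dense(16, activation='relu')(encoded)
decoded = Dense(X_scaled.shape[1], activation='sigmoid')(decoded)

# Train the full autoencoder to reconstruct its own input
autoencoder = Model(input_layer, decoded)
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.fit(X_scaled, X_scaled, epochs=5, batch_size=32, verbose=0)

# Keep only the encoder half to produce the low-dimensional encoded_X;
# the labels (encoded_y) are unchanged by encoding
encoder = Model(input_layer, encoded)
encoded_X = encoder.predict(X_scaled, verbose=0)
print(encoded_X.shape)  # (200, 8)
```

The encoder output (`encoded_X`) is what gets visualised and classified in the later steps.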
# Plotting the encoded points
tsne_plot(encoded_X, encoded_y)
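The `tsne_plot` helper used above is defined in the elided steps. A plausible sketch, assuming it reduces the data to two dimensions with t-SNE and colours the points by class (this function body is an assumption, not the article's exact code):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def tsne_plot(x, y):
    # Reduce the (possibly encoded) features to 2-D for visualisation
    emb = TSNE(n_components=2, random_state=0, perplexity=20).fit_transform(x)
    plt.figure(figsize=(8, 6))
    # Plot non-fraud (label 0) and fraud (label 1) points in different colours
    plt.scatter(emb[y == 0, 0], emb[y == 0, 1], s=5, label='Non-fraud')
    plt.scatter(emb[y == 1, 0], emb[y == 1, 1], s=5, label='Fraud')
    plt.legend()
    plt.show()
    return emb

# Demo on random data standing in for encoded_X / encoded_y
rng = np.random.default_rng(0)
demo_emb = tsne_plot(rng.random((60, 8)), rng.integers(0, 2, 60))
```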
Note that after encoding, the data is closer to being linearly separable. Thus, in some cases, encoding the data can help make the classification boundary linear. To verify this numerically, we will fit a linear Logistic Regression model to the encoded data and a (nonlinear) Support Vector Classifier to the original data.
Step 11: Separate raw and encoded data into training and test data
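A sketch of this step and the comparison that follows, using random stand-ins for the raw and encoded matrices (the variable names `X_scaled`/`encoded_X` and the model settings here are assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Random stand-ins: 300 samples of raw (30-D) and encoded (8-D) features
rng = np.random.default_rng(0)
X_scaled = rng.random((300, 30))
encoded_X = rng.random((300, 8))
Y = rng.integers(0, 2, 300)

# Split raw and encoded data with the same seed so rows stay aligned
X_train, X_test, Y_train, Y_test = train_test_split(
    X_scaled, Y, test_size=0.2, random_state=42)
eX_train, eX_test, eY_train, eY_test = train_test_split(
    encoded_X, Y, test_size=0.2, random_state=42)

# Linear model on the encoded data vs nonlinear model on the raw data
lr = LogisticRegression(max_iter=1000).fit(eX_train, eY_train)
svc = SVC().fit(X_train, Y_train)

print('Logistic Regression (encoded):',
      accuracy_score(eY_test, lr.predict(eX_test)))
print('SVC (raw):', accuracy_score(Y_test, svc.predict(X_test)))
```

On the real dataset, comparable accuracies from the two models would support the claim that encoding makes the classes (approximately) linearly separable.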
Thus, the performance metrics support the point made above: encoding the data can sometimes make it linearly separable, since the performance of the linear Logistic Regression model on the encoded data is very close to that of the nonlinear Support Vector Classifier on the original data.