Assitan Koné
Jan 5

Credit Card Fraud Detection Part 2

Welcome to the second part of our credit card fraud detection tutorial. You can find the first part here.

Now it's time to train other models.

6. Training other models

We are gonna use:
  • Decision tree
  • Naive Bayes
  • Support vector machine (SVM)
  • Random Forest
  • Gradient Boosting

First, we split the dataset into 3 parts (train, validation, and test). We do it in two steps: a first split into train and test, then a second split to get the validation set.
We load the dataset. We create a variable named train with the file path, then a new variable named df_train that holds the dataframe.
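Here is a minimal sketch of the loading and the two-step split. The real file isn't available here, so a synthetic dataframe stands in for df_train (in the tutorial it would come from pd.read_csv(train)). The is_fraud column name, the feature names, and the 60/20/20 proportions are assumptions for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real file. In the tutorial this would be
# df_train = pd.read_csv(train), with train holding the file path.
X, y = make_classification(
    n_samples=1000, n_features=8, weights=[0.9, 0.1], random_state=42
)
df_train = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(8)])
df_train["is_fraud"] = y  # hypothetical target column name

features = df_train.drop(columns="is_fraud")
target = df_train["is_fraud"]

# Step 1: hold out 20% of the rows as the test set.
X_full_train, X_test, y_full_train, y_test = train_test_split(
    features, target, test_size=0.2, stratify=target, random_state=42
)
# Step 2: carve the validation set out of the rest (25% of 80% = 20%).
X_train, X_val, y_train, y_val = train_test_split(
    X_full_train, y_full_train, test_size=0.25,
    stratify=y_full_train, random_state=42
)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```

The stratify argument keeps the share of fraudulent rows roughly equal in the three parts, which matters when frauds are rare.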
Now we fit logistic regression again.
Then we instantiate the other algorithms and train them. Finally, we evaluate them.
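Instantiating and training the models could look like the sketch below. It runs on synthetic data, and the specific choices (GaussianNB as the Naive Bayes variant, the hyperparameter values, the dictionary keys) are assumptions, not necessarily what the original notebook used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the credit card data.
X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.9, 0.1], random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# One entry per algorithm; the names are only labels for printing.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "naive_bayes": GaussianNB(),
    "svm": SVC(probability=True, random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
}

# Train every model on the same training data, then check validation accuracy.
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, round(model.score(X_val, y_val), 3))
```

Keeping the models in a dictionary makes it easy to loop over them again for predictions and scores.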

7. Evaluating models

First, we make predictions on validation data for each model.
Then plot the confusion matrix for each model.
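As a rough sketch of these two steps, the code below predicts on validation data and computes one confusion matrix per model. It uses synthetic data and only two of the models to stay short; the variable names are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.9, 0.1], random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=42),
}

matrices = {}
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_val)
    # Rows are true classes, columns are predicted classes:
    # [[TN, FP], [FN, TP]] for labels 0 (not fraud) and 1 (fraud).
    matrices[name] = confusion_matrix(y_val, y_pred, labels=[0, 1])
    print(name)
    print(matrices[name])
```

To draw the matrix instead of printing it, scikit-learn provides ConfusionMatrixDisplay.from_predictions(y_val, y_pred), which needs matplotlib.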
There seems to be a problem with the SVM. Let's focus on the other models.
To get a clearer picture, let’s compare the F1 scores.
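A small sketch of the comparison, again on synthetic data with assumed model settings. The F1 score is computed for the positive class (fraud), and zero_division=0 avoids a warning when a model predicts no fraud at all.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.9, 0.1], random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "naive_bayes": GaussianNB(),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=42),
}

# F1 on validation data for each model (harmonic mean of precision and recall).
scores = {}
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_val)
    scores[name] = f1_score(y_val, y_pred, zero_division=0)

# Print from best to worst.
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```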
Random forest has the best score!

Hyperparameter tuning

We can do hyperparameter tuning to improve the score. I want to improve the score of logistic regression and random forest. We can use RandomizedSearchCV (for logistic regression) and GridSearchCV (for random forest). First we create a grid.
Then we set up a randomized hyperparameter search for logistic regression and train the new model.
We have to wait a few minutes.
We find the best parameters and predict on the validation data.
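The grid, the randomized search, and the validation predictions described above could be sketched like this. The grid values, n_iter, cv, and the f1 scoring are assumptions chosen so the example runs in seconds on synthetic data; the original grid is not shown in the text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.9, 0.1], random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Hypothetical grid: regularization strength and class weighting.
log_reg_grid = {
    "C": np.logspace(-3, 3, 20),
    "class_weight": [None, "balanced"],
}

# Randomized search tries n_iter random combinations instead of all of them.
search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions=log_reg_grid,
    n_iter=10,
    cv=5,
    scoring="f1",
    random_state=42,
)
search.fit(X_train, y_train)

print(search.best_params_)
# predict() uses the best model, refitted on the whole training set.
y_pred = search.predict(X_val)
print(f"validation F1: {f1_score(y_val, y_pred, zero_division=0):.3f}")
```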
Then plot the result.
We do the same for random forest with GridSearchCV.
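For random forest, a grid search could look like the sketch below. The grid is deliberately tiny and hypothetical (4 combinations, 3 folds) so it finishes quickly on synthetic data; a real grid would be larger and slower.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.9, 0.1], random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Hypothetical grid: 2 x 2 = 4 combinations, all of them are tried.
rf_grid = {
    "n_estimators": [25, 50],
    "max_depth": [None, 5],
}

grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid=rf_grid,
    cv=3,
    scoring="f1",
)
grid_search.fit(X_train, y_train)

print(grid_search.best_params_)
y_pred = grid_search.predict(X_val)
print(f"validation F1: {f1_score(y_val, y_pred, zero_division=0):.3f}")
```

Unlike the randomized search, GridSearchCV fits every combination in the grid, so the run time grows quickly with each extra value.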
We compare F1 scores. It’s better for logistic regression but worse for random forest. Finding the best grid can be complicated, and the problem is that each search takes time to process. So we stop here.
Let’s plot the ROC curve.
As seen above, the area under the ROC curve (AUC) varies from 0 to 1, with 1 being ideal and 0.5 being random. An AUC of 0.9 indicates a reasonably good model; 0.8 is alright, 0.7 is not very good, and 0.6 indicates quite poor performance. The score shows how well the model separates positive and negative labels.
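To make the AUC concrete, here is a minimal sketch that computes the ROC points and the area under the curve from predicted probabilities. It uses synthetic data and a plain logistic regression, which are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.9, 0.1], random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The ROC curve needs scores, not hard 0/1 predictions:
# here, the predicted probability of the positive class (fraud).
y_score = model.predict_proba(X_val)[:, 1]

# One (false positive rate, true positive rate) point per threshold.
fpr, tpr, thresholds = roc_curve(y_val, y_score)
auc = roc_auc_score(y_val, y_score)
print(f"AUC = {auc:.3f}")
```

Plotting fpr on the x-axis against tpr on the y-axis (for example with matplotlib) gives the curve; the diagonal from (0, 0) to (1, 1) is the random baseline with an AUC of 0.5.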
Logistic regression is better. We’re gonna use the first random forest model, the one trained before tuning.

8. Testing the model

Now let’s test the model. We do the same as for evaluation, but on the test data.
We make predictions on test data (probability).
The output is a matrix with predictions. For each credit card, it outputs two numbers: the probability of being non-fraudulent and the probability of being fraudulent. We don't need both, so we select the second column (the probability of being fraudulent).
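The shape of that matrix is easy to check with a quick sketch. The data is synthetic and the random forest settings are assumptions; the point is only what predict_proba returns.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.9, 0.1], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)

# One row per credit card, one column per class, in the order of model.classes_.
proba = model.predict_proba(X_test)
print(proba.shape)     # (100, 2)
print(model.classes_)  # [0 1]

# Each row sums to 1, so the second column alone carries all the information.
fraud_proba = proba[:, 1]
print(fraud_proba[:5])
```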
Let’s use the model. We’re gonna select a credit card from the test data. Let’s display it as a dict so you see the data.
Now we make a prediction on this credit card. This time there's just one row, so we take row zero and, as before, the second column.
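Selecting one credit card, showing it as a dict, and predicting on it could look like this. The data is synthetic, and the column names, the is_fraud label name, and the row position 10 are arbitrary assumptions.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=500, n_features=6, weights=[0.9, 0.1], random_state=42
)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(6)])
y = pd.Series(y, name="is_fraud")  # hypothetical label name

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)

# Double brackets keep a one-row DataFrame, which is what the model expects.
card = X_test.iloc[[10]]
print(card.iloc[0].to_dict())

# Row 0 is our only card, column 1 is the probability of fraud.
fraud_probability = model.predict_proba(card)[0, 1]
print("predicted fraud probability:", fraud_probability)

# Compare with the true label of the same card.
print("actual label:", y_test.iloc[10])
```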
We get zero.
Let’s check the result.
Also zero. So our model guessed right.
That's it.
What you can do now is deploy the model or improve the score. To improve the score, you can do some feature engineering or try other combinations for the hyperparameters.

Author

Assitan Koné
Founder @Codistwa