Clueless on Confusion Matrix?

A confusion matrix compares a classification model’s predicted labels with the actual labels. From it we can calculate accuracy and several other metrics that describe the model’s performance. This makes it one of the most important tools for evaluating a classification model.

I’ll be covering the following topics in this article:

  • Accuracy and Components
  • What is a confusion matrix?
  • Precision, Recall, Accuracy, Specificity, F1 Score
  • Comparing ROC Curves
  • Creating a CM using Python, pandas and sklearn


As data scientists, we do OSEMN things (Obtain, Scrub, Explore, Model, iNterpret)!

Have you ever wondered, “How can we measure the efficiency of our models?”

The better the efficiency, the better the performance, and that’s exactly what we want. This is where the confusion matrix (CM) comes in very handy for model evaluation.

After reading this post, you’ll be clear on what a CM is, the key performance metrics for measuring a classification model’s accuracy, and how to build a CM with sample Python code. Let’s get started!


Classification Accuracy: Measuring the quality of fit

  • For a regression problem, we use the mean squared error (MSE) to assess the accuracy of a statistical learning method.
  • For a classification problem, we can use the error rate, i.e.

$$\text{Error rate} = \frac{1}{n}\sum_{i=1}^{n} I(y_i \neq \hat{y}_i)$$

Here $I(y_i \neq \hat{y}_i)$ is an indicator function. It equals 1 if the predicted label $\hat{y}_i$ differs from the actual label $y_i$, and 0 otherwise.

Thus the error rate represents the fraction of incorrect classifications, or misclassifications.
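As a quick illustration, the error rate formula can be computed directly in plain Python. The labels below are made-up values used only for this sketch, and `error_rate` is a hypothetical helper, not a library function.

```python
# Hypothetical actual and predicted class labels (illustration only)
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

def error_rate(actual, predicted):
    """Fraction of misclassified observations: (1/n) * sum of I(y_i != y_hat_i)."""
    n = len(actual)
    # The indicator I(y_i != y_hat_i) contributes 1 for each misclassification, else 0
    misclassifications = sum(1 for y, y_hat in zip(actual, predicted) if y != y_hat)
    return misclassifications / n

err = error_rate(y_true, y_pred)
print(err)      # 0.25 (2 of the 8 labels are wrong)
print(1 - err)  # 0.75, the classification accuracy
```

Accuracy is simply one minus the error rate. It is the fraction of labels the model got right.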

What is Confusion Matrix?

In machine learning, and specifically in statistical classification, a confusion matrix (also known as an error matrix) is a performance measurement technique for classification models. It is a table that shows how well a classification model performs on a set of test data for which the true values are known.

For a binary classifier, it’s a table with 4 different combinations of predicted and actual values: true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN).
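For binary labels, the four cells can be counted directly. This sketch assumes that 1 denotes the positive class and 0 the negative class. The labels and the helper name `confusion_counts` are hypothetical.

```python
from collections import Counter

# Hypothetical actual and predicted labels; 1 = positive class, 0 = negative class
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification."""
    counts = Counter()
    for y, y_hat in zip(actual, predicted):
        if y_hat == positive:
            # Predicted positive: correct -> TP, wrong -> FP
            counts["TP" if y == positive else "FP"] += 1
        else:
            # Predicted negative: actually positive -> FN, actually negative -> TN
            counts["FN" if y == positive else "TN"] += 1
    return counts["TP"], counts["FP"], counts["FN"], counts["TN"]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)

# Lay the counts out as a 2x2 table: rows = actual class, columns = predicted class
print("           Pred 1  Pred 0")
print(f"Actual 1   {tp:6d}  {fn:6d}")
print(f"Actual 0   {fp:6d}  {tn:6d}")
```

Here the model makes 3 true positives, 1 false positive, 1 false negative and 3 true negatives.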

Figure: Confusion Matrix

It is extremely helpful for computing key performance metrics such as accuracy, precision, recall (sensitivity), specificity and the F1 score.
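For reference, these are the standard definitions of the metrics in terms of the four cells TP, FP, FN and TN:

```latex
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} = 1 - \text{Error rate} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall (Sensitivity, TPR)} &= \frac{TP}{TP + FN} \\
\text{Specificity (TNR)} &= \frac{TN}{TN + FP} \\
\text{FPR} &= \frac{FP}{FP + TN} = 1 - \text{Specificity} \\
\text{F1 score} &= 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{aligned}
```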

Figure: Performance Metrics
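These metrics can be computed from the four counts in a few lines of plain Python. The counts used here are hypothetical (TP = 40, FP = 10, FN = 5, TN = 45). The function name `classification_metrics` is our own, not a library API.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common metrics from the cells of a binary confusion matrix.

    Returns 0.0 when a denominator is zero (a convention chosen for this sketch).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0    # share of predicted positives that are right
    recall = tp / (tp + fn) if (tp + fn) else 0.0       # sensitivity / true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    # F1 is the harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }

# Hypothetical counts for illustration
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name:12s} {value:.2f}")
```

The F1 score is a harmonic mean, so a classifier cannot score high on it by excelling at only one of precision or recall.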

Let’s make the confusion matrix less confusing using a simple analogy 🙂

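Model Evaluation
  • ROC (receiver operating characteristic) curves are a commonly used technique for measuring the quality of a prediction algorithm.
  • An ROC curve plots the TPR (true positive rate, or sensitivity) against the FPR (false positive rate, or 1 − specificity).

Comparing ROC Curves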

ROC Curves:

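  • x-axis = 1 − specificity (the false positive rate)
  • y-axis = sensitivity (the true positive rate)
  • points plotted = one point per classification cutoff (threshold), i.e. the (FPR, TPR) combination that cutoff produces
  • area under the curve (AUC) = quantifies how well the model separates the two classes, and hence whether the prediction model is viable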
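To show where the plotted points come from, the sketch below sweeps a cutoff over a set of predicted scores. At each cutoff it builds the confusion-matrix counts, giving one (FPR, TPR) point per cutoff. The labels, scores and helper name `roc_points` are hypothetical. In practice, scikit-learn’s `sklearn.metrics.roc_curve` performs this computation.

```python
# Hypothetical actual labels (1 = positive) and model scores (e.g. predicted probabilities)
y_true = [1, 1, 0, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.1]

def roc_points(actual, scores):
    """Return a list of (cutoff, FPR, TPR), predicting positive when score >= cutoff."""
    positives = sum(1 for y in actual if y == 1)
    negatives = len(actual) - positives
    # Use every distinct score as a cutoff, plus +inf so the curve starts at (0, 0)
    cutoffs = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for cutoff in cutoffs:
        tp = sum(1 for y, s in zip(actual, scores) if s >= cutoff and y == 1)
        fp = sum(1 for y, s in zip(actual, scores) if s >= cutoff and y == 0)
        tpr = tp / positives  # sensitivity
        fpr = fp / negatives  # 1 - specificity
        points.append((cutoff, fpr, tpr))
    return points

for cutoff, fpr, tpr in roc_points(y_true, y_score):
    print(f"cutoff={cutoff:<5} FPR={fpr:.2f} TPR={tpr:.2f}")
```

Lowering the cutoff can only add predicted positives. As a result, both rates rise monotonically from (0, 0) to (1, 1).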

Higher area → better predictor, better model

  • area = 0.5 → no better than random guessing (the diagonal line in the ROC plot)
  • area = 1 → perfect classifier
  • area ≈ 0.8 or higher → generally considered good for a prediction algorithm
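The area can also be computed without drawing the curve, using its probabilistic interpretation. The AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting as half. The sketch below uses this pairwise formulation with hypothetical scores, and `auc_by_pairs` is a hypothetical helper. The usual library route is scikit-learn’s `sklearn.metrics.roc_auc_score`.

```python
def auc_by_pairs(actual, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(actual, scores) if y == 1]
    neg = [s for y, s in zip(actual, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical labels and scores
y_true = [1, 1, 0, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.1]
print(auc_by_pairs(y_true, y_score))  # 0.8125

# A perfect ranking gives 1.0; a constant score (no ability to separate classes) gives 0.5
print(auc_by_pairs([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))  # 1.0
print(auc_by_pairs([1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5]))  # 0.5
```

The pairwise count gives the same value as the area under the step-shaped ROC curve built from these scores.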

Creating a Confusion Matrix in Python

First, create a simple data set with the actual and predicted values.
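A minimal sketch of such a data set, built as a pandas DataFrame. The column names `y_Actual` and `y_Predicted` and the label values are hypothetical choices for illustration.

```python
import pandas as pd

# Hypothetical actual and predicted labels (1 = positive, 0 = negative)
data = {
    "y_Actual":    [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "y_Predicted": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
}
df = pd.DataFrame(data, columns=["y_Actual", "y_Predicted"])
print(df)
```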

To create the confusion matrix with pandas, apply pd.crosstab as follows:
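A sketch using `pd.crosstab` on a DataFrame of hypothetical labels. The column names and values are illustrative assumptions.

```python
import pandas as pd

# Hypothetical actual and predicted labels
df = pd.DataFrame({
    "y_Actual":    [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "y_Predicted": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
})

# Rows = actual class, columns = predicted class; each cell counts matching pairs
confusion_matrix = pd.crosstab(
    df["y_Actual"], df["y_Predicted"],
    rownames=["Actual"], colnames=["Predicted"],
)
print(confusion_matrix)
```

With 1 as the positive class, the cell (Actual 1, Predicted 1) holds the true positives and (Actual 0, Predicted 0) the true negatives. Passing `margins=True` to `pd.crosstab` adds row and column totals.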

Displaying the Confusion Matrix using seaborn
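One way to draw the crosstab as a heatmap with seaborn is sketched below, using hypothetical labels. The `matplotlib.use("Agg")` line only lets the script run without a display; in a Jupyter notebook you can drop it and the figure renders inline. The output file name is a hypothetical choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical actual and predicted labels
df = pd.DataFrame({
    "y_Actual":    [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "y_Predicted": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
})
confusion_matrix = pd.crosstab(
    df["y_Actual"], df["y_Predicted"],
    rownames=["Actual"], colnames=["Predicted"],
)

# annot=True writes each count inside its cell; fmt="d" formats the counts as integers
ax = sns.heatmap(confusion_matrix, annot=True, fmt="d", cmap="Blues")
ax.set_title("Confusion Matrix")
plt.savefig("confusion_matrix.png")  # hypothetical file name; use plt.show() interactively
```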

Figure: Confusion Matrix

To get additional statistics, the pandas_ml package can be used. It can be installed with the command below (the leading ! runs it from a Jupyter notebook cell):

 !pip install pandas_ml
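pandas_ml is an older package and may not install or import cleanly in every environment. If that happens, a similar summary can be obtained from scikit-learn’s `sklearn.metrics`. This is an alternative sketch, not pandas_ml’s own output, and the labels are hypothetical.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical actual and predicted labels (1 = positive, 0 = negative)
y_actual    = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]
y_predicted = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0]

# Rows = actual class, columns = predicted class, labels in sorted order (0, then 1)
cm = confusion_matrix(y_actual, y_predicted)
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")

# Specificity (equivalently, the recall reported below for class 0)
print(f"Specificity: {tn / (tn + fp):.2f}")

# Per-class precision, recall (sensitivity) and F1 score, plus overall accuracy
print(classification_report(y_actual, y_predicted))
```

Note that scikit-learn puts the negative class in the first row and column. This is the reverse of the TP-first layout often drawn in diagrams.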


© BeingDatum. All rights reserved.