Clueless on Confusion Matrix?
A confusion matrix (CM) summarizes how a classification model’s predictions compare with the true labels. From it we can calculate accuracy and other metrics that describe the model’s performance, which makes it a key step in evaluating a classifier.
I’ll be covering the following topics in this article:
- Accuracy and Components
- What is it?
- Precision, Recall, Accuracy, Specificity, F1 Score
- Comparing ROC Curve
- Creating a CM by using Python and Sklearn
As data scientists, we do OSEMN (Obtain, Scrub, Explore, Model, iNterpret) things!
Have you ever wondered, “How can we measure the efficiency of our models?”
The better the efficiency, the better the performance, and that’s exactly what we want. This is where the CM comes in very handy for model evaluation.
After reading this post, you will be clear on what a CM is, the key performance metrics for measuring a classification model’s accuracy, and sample CM Python code. Let’s get started!
Classification Accuracy: Measuring the quality of fit
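- For a regression problem, we use the mean squared error (MSE) to assess the accuracy of the statistical learning method.
- For a classification problem, we can use the error rate, i.e.

$$\text{error rate} = \frac{1}{n} \sum_{i=1}^{n} I(y_i \neq \hat{y}_i)$$

Here $I(y_i \neq \hat{y}_i)$ is an indicator function, which gives 1 if the condition holds (the predicted label $\hat{y}_i$ differs from the true label $y_i$) and 0 otherwise.
Thus the error rate represents the fraction of incorrect classifications, or misclassifications.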
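As a minimal sketch of the error rate described above, the snippet below counts mismatches between true and predicted labels. The label lists are made up purely for illustration and do not come from any real model.

```python
# Hypothetical labels, invented only to illustrate the calculation.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# The indicator is 1 for every sample whose prediction differs from the truth.
indicator = [int(t != p) for t, p in zip(y_true, y_pred)]

# Error rate = fraction of misclassified samples; accuracy is its complement.
error_rate = sum(indicator) / len(y_true)
accuracy = 1.0 - error_rate

print(f"error rate = {error_rate:.2f}, accuracy = {accuracy:.2f}")
```

Two of the eight hypothetical predictions are wrong, so the error rate here is 0.25 and the accuracy is 0.75.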
What is Confusion Matrix?
In machine learning, and specifically in statistical classification, a confusion matrix (also known as an error matrix) is a performance measurement technique for classification models. It is a table that shows how the model performs on a set of test data for which the true values are known.
For a binary classifier, it’s a table with 4 different combinations of predicted and actual values: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN).
It is extremely helpful in assessing key performance metrics such as accuracy, precision, recall (sensitivity), and specificity.
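To make these metrics concrete, here is a small sketch that derives them from the four cells of a binary confusion matrix using the standard definitions. The function name `classification_metrics` and the counts passed to it are hypothetical, chosen only for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive common metrics from the four confusion-matrix cells.

    Assumes every denominator is non-zero (no zero-division handling).
    """
    precision = tp / (tp + fp)                 # of predicted positives, how many are right
    recall = tp / (tp + fn)                    # sensitivity / true positive rate
    specificity = tn / (tn + fp)               # true negative rate
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts, not taken from any real experiment.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name:12s} {value:.3f}")
```

Keeping the four counts as the only inputs shows that each of these metrics is just a different ratio over the same table.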
Let’s make the Confusion Matrix less confusing using a simple analogy 🙂
Comparing ROC Curve
- ROC curves are a commonly used technique for measuring the quality of a prediction algorithm.
- An ROC curve is a plot of TPR (sensitivity) vs FPR (1 – specificity):
- x-axis = 1 – specificity (the false positive rate)
- y-axis = sensitivity (the true positive rate)
- points plotted = one point per classification cutoff (threshold)
- area under the curve (AUC) = quantifies how well the model separates the two classes
- higher area → better predictor, better model
- area = 0.5 → effectively random guessing (diagonal line in the ROC curve)
- area = 1 → perfect classifier
- area = 0.8 → often considered good for a prediction algorithm
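The ROC points and the area under the curve can be sketched with scikit-learn’s `roc_curve` and `roc_auc_score`. The labels and scores below are a tiny hypothetical example, not output from a trained model.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical true labels and predicted scores (e.g. probabilities of class 1).
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Each (fpr, tpr) pair is the point on the curve for one score cutoff.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Area under the ROC curve: 0.5 is random guessing, 1.0 is a perfect classifier.
auc = roc_auc_score(y_true, y_score)

for x, y in zip(fpr, tpr):
    print(f"FPR = {x:.2f}, TPR = {y:.2f}")
print(f"AUC = {auc:.2f}")
```

For these four hypothetical samples the AUC works out to 0.75. Plotting `fpr` against `tpr` gives the curve itself.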
Creating a Confusion Matrix in Python
First, create a simple data set with the actual and predicted values.
To create the confusion matrix using pandas, you’ll need to apply pd.crosstab as follows:
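A minimal sketch of both steps is shown here. The column names `y_actual` and `y_predicted` and the values themselves are assumptions made for illustration; substitute your own data.

```python
import pandas as pd

# Hypothetical data set: 12 actual labels and the corresponding predictions.
data = {
    "y_actual":    [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "y_predicted": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
}
df = pd.DataFrame(data)

# Cross-tabulate actual vs predicted: rows are actual classes, columns are predictions.
confusion_matrix = pd.crosstab(
    df["y_actual"],
    df["y_predicted"],
    rownames=["Actual"],
    colnames=["Predicted"],
)
print(confusion_matrix)
```

With this sample data the table holds 5 true negatives, 2 false positives, 1 false negative, and 4 true positives.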
Displaying the Confusion Matrix using seaborn
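One way to display the matrix is seaborn’s `heatmap`, sketched below. The data set is the same kind of hypothetical example (made-up labels and column names), rebuilt here so the snippet runs on its own.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical actual and predicted labels, for illustration only.
df = pd.DataFrame({
    "y_actual":    [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "y_predicted": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
})
confusion_matrix = pd.crosstab(
    df["y_actual"], df["y_predicted"],
    rownames=["Actual"], colnames=["Predicted"],
)

# annot=True writes the count in each cell; fmt="d" formats counts as integers.
ax = sns.heatmap(confusion_matrix, annot=True, fmt="d", cmap="Blues")
ax.set_title("Confusion matrix")
plt.show()
```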
To get additional stats, pandas_ml can be used. It can be installed using the command below:
!pip install pandas_ml
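As an alternative sketch that swaps in scikit-learn instead of pandas_ml (this is not the pandas_ml API), `confusion_matrix` and `classification_report` from `sklearn.metrics` give the matrix and per-class precision, recall, and F1. The label lists are hypothetical.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical actual and predicted labels, for illustration only.
y_actual    = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0]
y_predicted = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0]

# scikit-learn puts actual classes on the rows and predicted classes on the columns.
cm = confusion_matrix(y_actual, y_predicted)

# For binary labels [0, 1], ravel() yields the cells in the order tn, fp, fn, tp.
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")

# Per-class precision, recall, F1 score, and support.
print(classification_report(y_actual, y_predicted))
```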