The aim of this article is to help you build an intuition about what a confusion matrix is, how it works and why it is important.
What is a confusion matrix?
It is a table that compares a model's predictions with the actual outcomes, and it is widely used in Machine Learning classification problems.
Why do we need a confusion matrix in the first place?
The reason we look at confusion matrices in order to assess accuracy is class imbalance. What is that, you ask? When we have a bunch of observations and most of them are of type ‘A’ and just a few are of type ‘B’, we call it class imbalance, for example: 90 Women and 3 Men.
Let’s look at a real-life example where we would use a confusion matrix.
Let’s say you want to build a model that predicts which patients have a disease. The disease you are looking for is very rare and only appears in 1% of the patients.
Even if you don’t build any model and assume that everyone is healthy, you would be right 99% of the time and wrong 1% of the time, yet you would not find a single sick patient. Therefore simple accuracy, defined as Correct Predictions / Total Predictions, would not be a great choice.
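To make the point above concrete, here is a minimal sketch of the "assume everyone is healthy" baseline. The cohort of 1,000 patients is a hypothetical number chosen for illustration; only the 1% disease rate comes from the example.

```python
# Hypothetical cohort: 1,000 patients, 1% of whom have the disease.
actual = [1] * 10 + [0] * 990        # 1 = sick, 0 = healthy

# "No model" baseline: predict healthy for everyone.
predicted = [0] * len(actual)

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

# How many of the sick patients did the baseline actually find?
sick_found = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)

print(f"accuracy: {accuracy:.0%}")                             # accuracy: 99%
print(f"sick patients found: {sick_found} of {sum(actual)}")   # sick patients found: 0 of 10
```

The baseline scores 99% accuracy while missing every sick patient, which is why we need metrics that look at each class separately.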
How does the confusion matrix work?
As we look at the matrix we can see that we have Predictions on the left side, these are predictions made by the model. Some of them are positive and some negative. On the top we have Actuals which are the real observations, positive and negative.
Going forward we will switch the discussion from the abstract to something easier to understand.
Let’s imagine that we are trying to figure out the accuracy of a Test for a particular Disease that is very rare. We now have a confusion matrix with Test Predictions on the left and Actual Disease on the top.
We will abbreviate the matrix elements to make it easy to work with and add some numbers as follows:
True Positives = TP
True Negatives = TN
False Positives = FP
False Negatives = FN
- Positive Tests (predictions): Tests that came out positive for our disease.
We have 50 + 15 = 65 Positive Tests, of which 50 are True Positives and 15 are False Positives. Let’s clarify what a True Positive (TP) and a False Positive (FP) mean.
True Positive (TP) = Test is positive AND patient has the disease (+,+)
False Positive (FP) = Test is positive AND patient does NOT have the disease (+,-)
- Negative Tests (predictions): Tests that came out negative for our disease.
We have 10 + 25 = 35 Negative Tests, of which 10 are False Negatives and 25 are True Negatives. Let’s define False Negative (FN) and True Negative (TN).
True Negative (TN) = Test is negative AND patient does NOT have the disease (-,-)
False Negative (FN) = Test is negative AND patient has the disease (-,+)
Let’s draw a few conclusions to reinforce the concepts:
- True Positive and True Negative observations are on the diagonal and represent the correct predictions
- Positive Tests are the top row, read from left to right, and represent all the positive predictions (right and wrong)
- Negative Tests are the bottom row and represent all the negative predictions (right and wrong)
- False Positives are observations predicted to be positive but are negative in reality
- False Negatives are observations predicted to be negative but are positive in reality
In order to eliminate confusion it is important to keep in mind how the rows and columns are arranged in the matrix and what each of them means.
Using the same principle, try to identify the columns with healthy patients and sick patients in our matrix to really consolidate these concepts.
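One way to check your answer is to write the matrix down and add up its rows and columns. The sketch below assumes the layout described earlier (test results as rows, actual condition as columns, positives first) and uses the counts from the example.

```python
# Counts from the example in this article.
TP, FP = 50, 15   # positive tests: correct, wrong
FN, TN = 10, 25   # negative tests: wrong, correct

# Rows are test results (predictions), columns are the actual condition.
matrix = [
    [TP, FP],   # test positive
    [FN, TN],   # test negative
]

positive_tests = sum(matrix[0])                     # TP + FP
negative_tests = sum(matrix[1])                     # FN + TN
sick_patients = matrix[0][0] + matrix[1][0]         # left column: TP + FN
healthy_patients = matrix[0][1] + matrix[1][1]      # right column: FP + TN
correct_predictions = matrix[0][0] + matrix[1][1]   # diagonal: TP + TN

print(positive_tests, negative_tests)    # 65 35
print(sick_patients, healthy_patients)   # 60 40
print(correct_predictions)               # 75
```

The left column holds the 60 sick patients and the right column the 40 healthy ones, out of 100 patients in total.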
Let’s now dive into the metrics we can derive from a confusion matrix.
Precision
Sometimes also referred to as Positive Predictive Value (PPV). This metric is called precision because it evaluates the ‘quality’ of our positive tests. The emphasis is on how trustworthy a positive outcome is.
To further exemplify, you can think of precision as the metric we would use if we needed to tell one of our patients that he/she has a serious disease. We would need to be really, really sure before we communicate this news to the patient. Therefore the quality of the tests is most important.
Let’s see what the formula looks like and also try to understand how the quality concept relates to the matrix:
Precision = TP / (TP + FP)
Looking at the confusion matrix and formula we can see that Precision is the percentage of Accurate Tests (TP) from all conducted Tests that came out positive. It’s easy to see now why the emphasis is on ‘quality’ of the tests.
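In other words we can write our formula as:
Precision = Correct Positive Tests / All Positive Tests
Let’s calculate the precision for our matrix:
Precision = 50 / (50 + 15) = 50 / 65 ≈ 0.77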
Our test has a precision of 77%.
Of the 65 patients that tested positive, the test has correctly identified 77%.
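The same calculation as a short sketch. The function name `precision` is just an illustrative helper, not part of any library.

```python
def precision(tp: int, fp: int) -> float:
    """Share of positive tests that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)

TP, FP = 50, 15  # counts from the example matrix
print(f"{precision(TP, FP):.1%}")  # 76.9%
```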
Sensitivity or Recall
Sometimes also referred to as True Positive Rate (TPR). Sensitivity is a measurement of ‘quantity’. Sensitivity/Recall evaluates the percentage of correct predictions from the real positive observations.
In other words we look at the population that has the disease only and try to identify (predict) as many as possible.
Let’s have a look at the formula:
Sensitivity = TP / (TP + FN)
Sensitivity|Recall can be defined in our case as the number of patients that correctly tested positive (TP) from all the patients with disease (TP + FN). The question for Sensitivity|Recall would be how many patients have we identified correctly using the test?
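Let’s rewrite the formula in a more intuitive way:
Sensitivity = Patients Correctly Tested Positive / All Patients With the Disease
Let’s calculate Sensitivity|Recall for our matrix:
Sensitivity = 50 / (50 + 10) = 50 / 60 ≈ 0.83
Our test has a Sensitivity|Recall of 83%, or in other words the test was sensitive enough to identify 83% of the patients with the disease.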
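As a quick check of the number above, here is a minimal sketch. The helper name `recall` is hypothetical and chosen only for readability.

```python
def recall(tp: int, fn: int) -> float:
    """Share of actually sick patients the test catches: TP / (TP + FN)."""
    return tp / (tp + fn)

TP, FN = 50, 10  # counts from the example matrix
print(f"{recall(TP, FN):.1%}")  # 83.3%
```

Note that the denominator is a column of the matrix (all sick patients), whereas precision divides by a row (all positive tests).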
Specificity
Sometimes referred to as True Negative Rate (TNR). Specificity is very similar to Sensitivity, but instead of looking at the proportion of True Positives out of all Actual Positives it looks at the proportion of True Negatives out of all Actual Negatives.
Let’s look at the formula:
Specificity = TN / (TN + FP)
Specificity is a measure of ‘quantity’, just as Sensitivity is; however, we are now interested in the True Negatives.
Because a test rarely achieves very high Sensitivity and very high Specificity at the same time, we use the two metrics together to choose the level of False Positives and False Negatives we are willing to accept in our test. There is usually a tradeoff between the two metrics, and depending on our objective we might favor one over the other or give both equal weight.
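One way to see this tradeoff is to imagine that the test produces a score and that we choose the cutoff above which a result counts as positive. This is an assumption made purely for illustration: the scores and labels below are made up and do not come from the example matrix.

```python
# Hypothetical test scores (higher = more likely sick) and true status.
# 1 = has the disease, 0 = healthy. These values are invented.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.35, 0.30, 0.20, 0.10]
actual = [1,    1,    1,    0,    1,    0,    1,    0,    0,    0]

def rates(threshold: float) -> tuple[float, float]:
    """Return (sensitivity, specificity) when scores >= threshold count as positive."""
    pairs = list(zip(scores, actual))
    tp = sum(1 for s, a in pairs if s >= threshold and a == 1)
    fn = sum(1 for s, a in pairs if s < threshold and a == 1)
    tn = sum(1 for s, a in pairs if s < threshold and a == 0)
    fp = sum(1 for s, a in pairs if s >= threshold and a == 0)
    return tp / (tp + fn), tn / (tn + fp)

for threshold in (0.25, 0.50, 0.75):
    sensitivity, specificity = rates(threshold)
    print(f"threshold {threshold:.2f}: "
          f"sensitivity {sensitivity:.0%}, specificity {specificity:.0%}")
# threshold 0.25: sensitivity 100%, specificity 40%
# threshold 0.50: sensitivity 80%, specificity 80%
# threshold 0.75: sensitivity 60%, specificity 100%
```

In this toy data, raising the cutoff removes False Positives (higher Specificity) at the cost of more False Negatives (lower Sensitivity).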
Let’s look at our matrix example and calculate Specificity:
Specificity = 25 / (25 + 15) = 25 / 40 = 0.625
A positive result in a test with high specificity is useful for ruling in disease. Our test has a specificity of 62.5% which is not that high.
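The specificity figure can be reproduced with a short sketch. As before, `specificity` is an illustrative helper name.

```python
def specificity(tn: int, fp: int) -> float:
    """Share of actually healthy patients the test clears: TN / (TN + FP)."""
    return tn / (tn + fp)

TN, FP = 25, 15  # counts from the example matrix
print(f"{specificity(TN, FP):.1%}")  # 62.5%
```

Put differently, 15 of the 40 healthy patients (37.5%) still received a positive test.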
Precision: tells us the quality of the test (prediction), that is, how precise it is when it says positive.
Sensitivity|Recall: tells us how many (quantity) of all the Actual positive observations we correctly predict as positive, that is, how sensitive our test (prediction) is in picking up the real positives.
Specificity: tells us how many (quantity) of all the Actual negative observations we correctly predict as negative.
Some software swaps the positions of Predictions and Actuals. One example is sklearn's confusion_matrix, which puts the actual classes on the rows and the predicted classes on the columns, so make sure you keep an eye on the documentation as well.
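A minimal sketch of the difference, assuming scikit-learn is installed. The label lists are reconstructed from the example counts (1 = sick, 0 = healthy); the variable names are illustrative.

```python
from sklearn.metrics import confusion_matrix

# Rebuild the 100 patients of the example as label lists.
# Order: 50 TP, 15 FP, 10 FN, 25 TN.
y_true = [1] * 50 + [0] * 15 + [1] * 10 + [0] * 25
y_pred = [1] * 50 + [1] * 15 + [0] * 10 + [0] * 25

# scikit-learn: rows are actual classes, columns are predicted classes,
# ordered by sorted label, so the layout is [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)

# Same counts in this article's layout: rows = predictions, positives first.
article_layout = confusion_matrix(y_true, y_pred, labels=[1, 0]).T
print(article_layout)
```

Passing `labels=[1, 0]` puts the positive class first, and transposing moves the predictions onto the rows, which gives back the matrix used throughout this article.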