Introduction
- In machine learning, assessing the performance of classification models is one of the most vital steps, and that is where the confusion matrix comes into the picture.
Objectives
- Understanding the confusion matrix
- Components of the confusion matrix
- Calculating the metrics
- Assessing model performance
Understanding the confusion matrix
Definition: A performance measurement tool used in classification to assess the effectiveness of a predictive model.
Purpose: It helps us understand how well a model's predictions match the actual outcomes in a dataset.
Components of the confusion matrix
| | Actual Positive | Actual Negative |
|---|---|---|
| Predicted Positive | TP | FP |
| Predicted Negative | FN | TN |
Components:
True Positives (TP): Instances correctly predicted as positive.
True Negatives (TN): Instances correctly predicted as negative.
False Positives (FP): Instances incorrectly predicted as positive.
False Negatives (FN): Instances incorrectly predicted as negative.
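To make the mapping concrete, here is a minimal Python sketch (the labels are made up purely for illustration) showing how scikit-learn's `confusion_matrix` returns the four counts. Note that scikit-learn lays the matrix out with actual classes as rows and predicted classes as columns, i.e. the transpose of the table above.

```python
from sklearn.metrics import confusion_matrix

# Toy labels purely for illustration: 1 = positive, 0 = negative
y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # actual classes
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]   # model predictions

# For binary labels, ravel() yields the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")
```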
Metrics:
Accuracy: The number of correctly predicted instances divided by the total number of instances. In simpler terms: out of all the instances, how many were predicted correctly?
\[ \frac{(TP + TN)}{(TP + TN + FP + FN)} \]
Precision: The number of correctly predicted positive instances divided by the total number of predicted positive instances. In simpler terms: when the model predicts something is positive, how often is it right?
\[ \frac{TP}{(TP + FP)} \]
Recall (Sensitivity): The number of correctly predicted positive instances divided by the total number of actual positive instances. In simpler terms: out of all the positive instances, how many were predicted correctly?
\[ \frac{TP}{(TP + FN)} \]
Specificity: The number of correctly predicted negative instances divided by the total number of actual negative instances. In simpler terms: out of all the negative instances, how many were predicted correctly?
\[ \frac{TN}{(TN + FP)} \]
F1 Score: A balanced metric that takes both precision and recall into account; it is the harmonic mean of the two.
\[ \frac{2 \times Precision \times Recall}{Precision + Recall} \]
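These formulas translate directly into code. Below is a minimal sketch (plain Python; the helper name `confusion_metrics` is my own) that computes all five metrics from the raw counts:

```python
def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute the five metrics above from raw confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1_score": f1,
    }
```

For example, `confusion_metrics(tp=30, tn=50, fp=5, fn=7)` reproduces the figures in the worked example below.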
Example
Predict whether a client will make a travel insurance claim
| | Actual Claim | Actual No Claim |
|---|---|---|
| Predicted: Claim | 30 | 5 |
| Predicted: No Claim | 7 | 50 |
True Positives (TP): 30 clients were correctly predicted to make a claim.
True Negatives (TN): 50 clients were correctly predicted not to make a claim.
False Positives (FP): 5 clients were incorrectly predicted to make a claim when they actually did not.
False Negatives (FN): 7 clients were incorrectly predicted not to make a claim when they actually did.
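If you want to rebuild this table programmatically, one option (a sketch using numpy and scikit-learn; expanding the counts into label arrays is just for illustration) is:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Expand the four counts into label arrays (1 = claim, 0 = no claim)
y_true = np.array([1] * 30 + [0] * 5 + [1] * 7 + [0] * 50)   # actual outcomes
y_pred = np.array([1] * 30 + [1] * 5 + [0] * 7 + [0] * 50)   # model predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=30, TN=50, FP=5, FN=7
```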
Calculating the metrics
- Accuracy:
\[ \frac{(30 + 50)}{(30 + 50 + 5 + 7)} = 0.8696 \ or \ 86.96\% \]
- Precision:
\[ \frac{30}{(30 + 5)} = 0.8571\ or \ 85.71\% \]
- Recall (Sensitivity):
\[ \frac{30}{(30 + 7)} = 0.8108 \ or \ 81.08\% \]
- Specificity:
\[ \frac{50}{(50 + 5)} = 0.9091 \ or \ 90.91\% \]
- F1 Score:
\[ \frac{2 \times 0.8571 \times 0.8108}{0.8571 + 0.8108} = 0.8333 \ or \ 83.33\% \]
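The same numbers fall out of a few lines of Python, which makes a handy sanity check on the hand calculations (a standalone sketch, independent of the earlier helper):

```python
tp, tn, fp, fn = 30, 50, 5, 7

accuracy = (tp + tn) / (tp + tn + fp + fn)           # 80 / 92  ≈ 0.8696
precision = tp / (tp + fp)                           # 30 / 35  ≈ 0.8571
recall = tp / (tp + fn)                              # 30 / 37  ≈ 0.8108
specificity = tn / (tn + fp)                         # 50 / 55  ≈ 0.9091
f1 = 2 * precision * recall / (precision + recall)   # ≈ 0.8333

for name, value in [("Accuracy", accuracy), ("Precision", precision),
                    ("Recall (Sensitivity)", recall), ("Specificity", specificity),
                    ("F1 Score", f1)]:
    print(f"{name}: {value:.2%}")
```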
Assessing model performance
| Metric | Value |
|---|---|
| Accuracy | 86.96% |
| Precision | 85.71% |
| Recall (Sensitivity) | 81.08% |
| Specificity | 90.91% |
| F1 Score | 83.33% |
Accuracy: The model correctly predicts whether a client will make a travel insurance claim 86.96% of the time.
Precision: Out of all the clients predicted to make a claim, 85.71% actually did.
Recall (Sensitivity): Out of all the clients who actually made a claim, 81.08% were correctly identified.
Specificity: Out of all the clients who did not make a claim, 90.91% were correctly identified.
F1 Score: At 83.33%, the model strikes a good balance between precision and recall.
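In practice you rarely compute these by hand. Here is a sketch of the scikit-learn shortcut, reusing the same label arrays as before; note that `classification_report` does not list specificity directly, but the recall of the "No Claim" class is exactly that number:

```python
import numpy as np
from sklearn.metrics import classification_report

# Same label arrays as in the earlier sketch (1 = claim, 0 = no claim)
y_true = np.array([1] * 30 + [0] * 5 + [1] * 7 + [0] * 50)
y_pred = np.array([1] * 30 + [1] * 5 + [0] * 7 + [0] * 50)

# Per-class precision, recall and F1; the recall reported for "No Claim"
# (about 0.91) is the specificity with respect to the "Claim" class.
print(classification_report(y_true, y_pred, target_names=["No Claim", "Claim"]))
```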
If you are interested in more machine learning related blog posts/projects, feel free to explore here. If you have any questions or would like to discuss further, please reach out to me.