Evaluation indicators of binary classification models

Classification

Machine Learning

Accuracy, Precision and Recall

Author

Guofeng Lin

Published

November 8, 2021

Why talk about evaluation indicators

When discussing the quality of classification models, we usually use a test dataset to evaluate them. So, what metrics can truly indicate a model’s quality? Some might say, “Just look at the accuracy rate; if it’s all correct, then the model is pretty good.”

This approach has some merit, but accuracy depends heavily on the class distribution of the dataset, and many datasets are imbalanced. For example, suppose that out of 1000 people, 997 have had chickenpox (labeled 1) and only 3 have not (labeled 0). In this case, a model that always outputs 1 would still reach an accuracy of 99.7%, even though it has learned nothing about the negative class. Therefore, accuracy alone cannot reliably reflect the model’s generalization ability.

Basic Definitions

Let’s look at the following concepts:

True Positive (TP): means that the model predicts a positive sample, and the sample is actually a positive sample.
False Positive (FP): means that the model predicts a positive sample, but the sample is actually a negative sample.
False Negative (FN): means that the model predicts a negative sample, but the sample is actually a positive sample.
True Negative (TN): means that the model predicts a negative sample, and the sample is actually a negative sample.
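
The four definitions above can be turned into a small counting routine. The following is a minimal sketch, assuming labels are encoded as 1 (positive) and 0 (negative); the function name `confusion_counts` and the sample lists are hypothetical and only serve as illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = positive, 0 = negative)."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1  # predicted positive, actually positive
        elif predicted == 1 and actual == 0:
            fp += 1  # predicted positive, actually negative
        elif predicted == 0 and actual == 1:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


# Hypothetical toy data, not taken from any real dataset.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(confusion_counts(y_true, y_pred))  # (2, 1, 1, 2)
```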

Definition of Accuracy

Accuracy represents the proportion of samples, both positive and negative, that the model predicts correctly out of all samples. The formula is:

\[ acc=\frac{TP+TN}{TP+FP+TN+FN} \]

This is the metric used in the chickenpox example above, where the class that makes up most of the data becomes the main factor affecting accuracy.
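
To make the chickenpox example concrete, here is a minimal sketch that plugs its numbers into the accuracy definition. A model that always outputs 1 gets all 997 positive samples right (TP = 997) and all 3 negative samples wrong (FP = 3), with no negative predictions at all (FN = TN = 0). The function name `accuracy` is a hypothetical helper, and returning 0.0 for an empty dataset is an assumed convention.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all samples that were predicted correctly."""
    total = tp + fp + fn + tn
    if total == 0:
        return 0.0  # assumed convention for an empty dataset
    return (tp + tn) / total


# Always-predict-1 model on the chickenpox example from the text.
acc = accuracy(tp=997, fp=3, fn=0, tn=0)
print(f"{acc:.3f}")  # 0.997
```

The score is high even though the model never identifies a single negative sample correctly.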

Definition of Precision

Precision represents the proportion of correctly predicted positive samples out of all samples predicted as positive. The formula is:

\[ precision=\frac{TP}{TP+FP} \]
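
As a rough illustration of the precision formula, the sketch below computes it from raw counts. The function name `precision` is hypothetical, and returning 0.0 when the model makes no positive predictions is an assumed convention, since the formula itself is undefined in that case. The second call uses made-up counts.

```python
def precision(tp, fp):
    """Fraction of positive predictions that are actually positive."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        return 0.0  # assumed convention: no positive predictions were made
    return tp / predicted_positive


# Always-predict-1 model on the chickenpox example: TP = 997, FP = 3.
print(f"{precision(tp=997, fp=3):.3f}")  # 0.997

# Hypothetical counts: 8 correct positive predictions, 2 incorrect ones.
print(f"{precision(tp=8, fp=2):.3f}")  # 0.800
```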

Definition of Recall

Recall represents the proportion of correctly predicted positive samples out of all actual positive samples. The formula is:

\[ recall=\frac{TP}{TP+FN} \]
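
The recall formula can be sketched the same way. The function name `recall` is hypothetical, and returning 0.0 when the dataset contains no positive samples is an assumed convention, since the formula is undefined there. The second call uses made-up counts.

```python
def recall(tp, fn):
    """Fraction of actual positive samples that the model found."""
    actual_positive = tp + fn
    if actual_positive == 0:
        return 0.0  # assumed convention: dataset has no positive samples
    return tp / actual_positive


# Always-predict-1 model on the chickenpox example: TP = 997, FN = 0.
print(recall(tp=997, fn=0))  # 1.0

# Hypothetical counts: 8 positives found, 4 positives missed.
print(f"{recall(tp=8, fn=4):.3f}")  # 0.667
```

Note that the always-predict-1 model reaches a recall of 1.0 in the chickenpox example, because it never predicts the negative class and so can never miss a positive sample.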