The Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve is a performance measure for binary classification problems. It tells us how well a model can distinguish between the two classes: the higher the AUC, the more consistently the model assigns higher scores to positive examples than to negative ones.
What is AUC?
AUC measures the entire two-dimensional area underneath the ROC curve. It provides an aggregate measure of performance across all possible classification thresholds.
- AUC = 1.0: Perfect classifier - can perfectly distinguish between positive and negative classes
- AUC = 0.5: No better than random guessing - the model has no discrimination capacity
- AUC < 0.5: Worse than random guessing - inverting the predictions gives a model with an AUC of 1 minus this value, so you should invert them :D
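One way to make these three cases concrete is the ranking view of AUC: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The sketch below is a minimal illustration under that definition; the function name `pairwise_auc` and the toy scores are made up for this example and are not taken from the visualization.

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two negatives followed by two positives.
labels = [0, 0, 1, 1]
perfect = [0.1, 0.2, 0.8, 0.9]               # every positive outranks every negative
constant = [0.5, 0.5, 0.5, 0.5]              # no discrimination at all
reversed_scores = [1 - s for s in perfect]   # the perfect ranking, flipped

auc_perfect = pairwise_auc(perfect, labels)            # 1.0
auc_constant = pairwise_auc(constant, labels)          # 0.5
auc_reversed = pairwise_auc(reversed_scores, labels)   # 0.0
# Inverting the worse-than-random scores recovers the perfect ranking.
auc_flipped_back = pairwise_auc([1 - s for s in reversed_scores], labels)  # 1.0
```

The double loop is quadratic in the number of examples, which is fine for a demonstration but too slow for large datasets, where a sort-based computation is the usual choice.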
How to Interpret the Visualization
Distribution Plot: Shows how the model’s predicted probabilities are distributed for positive (orange) and negative (purple) classes. Move the threshold slider to see how different threshold values affect the confusion matrix and metrics.
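The effect of moving the threshold can be sketched in a few lines. The example below assumes that a score at or above the threshold is predicted positive (the visualization may use a different convention), and the helper names and toy scores are hypothetical.

```python
def confusion_at_threshold(scores, labels, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted_positive = s >= threshold  # assumed convention
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


def rates(cm):
    """Return (TPR, FPR) for a confusion matrix produced above."""
    n_pos = cm["tp"] + cm["fn"]
    n_neg = cm["fp"] + cm["tn"]
    tpr = cm["tp"] / n_pos if n_pos else 0.0
    fpr = cm["fp"] / n_neg if n_neg else 0.0
    return tpr, fpr


# Hypothetical predicted probabilities and true labels.
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 0, 1, 1]

low = confusion_at_threshold(scores, labels, 0.35)   # catches all positives, one false alarm
high = confusion_at_threshold(scores, labels, 0.65)  # no false alarms, misses one positive
```

Raising the threshold from 0.35 to 0.65 trades one false positive for one false negative here, which is the same trade-off the slider exposes.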
ROC Curve: Plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at various threshold settings. The area under this curve (AUC) represents the model’s ability to discriminate between classes.
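As a rough sketch of how such a curve can be built, the code below sweeps the threshold from high to low, records one (FPR, TPR) point per distinct score, and sums the area with the trapezoidal rule. The function names and toy data are illustrative assumptions, not the code behind the visualization.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    # Highest scores first: these are the first to be predicted positive.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # Examples sharing a score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical predicted probabilities and true labels.
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 0, 1, 1]

points = roc_points(scores, labels)
auc = trapezoid_auc(points)  # 8/9, about 0.889
```

The curve always starts at (0, 0) and ends at (1, 1); a single negative example scored above a positive one (0.6 versus 0.4) is what keeps this AUC below 1.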
Dataset Types:
- Separated: Classes have little overlap, making them easy to separate (high AUC)
- Overlapping: Classes overlap significantly, making classification more challenging (medium AUC)
- Unbalanced: One class has many more examples than the other, which can skew threshold-dependent metrics such as accuracy and precision (varying AUC)
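The three dataset types can be imitated with synthetic scores. The sketch below assumes Gaussian score distributions with made-up means, spreads, and class sizes; it is not the generator used by the visualization, and the exact AUC values depend on the random seed.

```python
import random


def pairwise_auc(pos, neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sample_scores(rng, n_pos, n_neg, pos_mean, neg_mean, spread):
    """Draw hypothetical model scores for each class from Gaussian distributions."""
    pos = [rng.gauss(pos_mean, spread) for _ in range(n_pos)]
    neg = [rng.gauss(neg_mean, spread) for _ in range(n_neg)]
    return pos, neg


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers

# All parameters below are illustrative assumptions.
datasets = {
    "separated": sample_scores(rng, 300, 300, pos_mean=0.8, neg_mean=0.2, spread=0.1),
    "overlapping": sample_scores(rng, 300, 300, pos_mean=0.6, neg_mean=0.4, spread=0.2),
    "unbalanced": sample_scores(rng, 30, 570, pos_mean=0.6, neg_mean=0.4, spread=0.2),
}

results = {name: pairwise_auc(pos, neg) for name, (pos, neg) in datasets.items()}

# On the unbalanced set, always predicting "negative" is already 95% accurate,
# which is why accuracy alone can be misleading there.
n_pos, n_neg = len(datasets["unbalanced"][0]), len(datasets["unbalanced"][1])
always_negative_accuracy = n_neg / (n_pos + n_neg)  # 0.95
```

With these assumed parameters, the separated set should score close to 1.0 and the overlapping set noticeably lower. The unbalanced set uses the same score distributions as the overlapping one, so its AUC lands in a similar range but is noisier because it rests on only 30 positives.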