By Sarah Gillespie

Published November 18, 2021

Overview

Trust scores were introduced in the paper To Trust Or Not To Trust A Classifier, presented at the 32nd Conference on Neural Information Processing Systems in 2018. They are similar in concept to the credibility and confidence scores introduced in Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning. The trust score is a remarkably flexible ratio whose goal is to describe how much a user should trust an algorithm’s classification of a specific point. It is the ratio of the distance from the testing sample to the nearest class different from the predicted class to the distance from the testing sample to the nearest class matching the predicted class. The trust score for a testing sample is likely to agree with the Bayes-optimal classifier, so it serves as a clue for deciding whether an algorithm’s prediction about a test point is correct rather than a Type I or II error. This approach combines the topology of points in a multidimensional space with simple distance ratios to gauge the accuracy of an individual machine learning prediction.


Defined variables

k: the number of neighbors, like in k-Nearest Neighbors.

\(\alpha\): the density threshold, i.e., the specific fraction of the data to filter out of the overall collection. This acts as a proxy for a density value. The paper does not recommend a specific \(\alpha\) value.

1 - \(\alpha\): the probability mass that is kept. This can be described as a fraction; for example, with \(\alpha\) = 0.1, the densest 90% of the data is retained.


Step 1: Preprocess the data

The first step is to keep only the training points that lie in a high-density area. The specific density cutoff varies and is controlled by the general variable \(\alpha\). This has the effect of removing outliers and leaving smooth clusters of points to use for calculating the trust score. The specific distribution of the smooth cluster of points is irrelevant.

The trust scores paper does not recommend an \(\alpha\) value to define what density counts as a high density set.
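As a rough sketch of how this filtering could be implemented, the snippet below approximates density with the distance to each point's k-th nearest neighbor and discards the \(\alpha\) fraction of points with the largest radii. The function name filter_high_density and the use of scikit-learn's NearestNeighbors are illustrative choices, not taken from the paper's code.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def filter_high_density(X, k=10, alpha=0.1):
        """Keep only the training points that lie in a high-density region.

        Density is approximated by the distance to each point's k-th nearest
        neighbor: a small radius means a dense neighborhood. The alpha fraction
        of points with the largest radii (lowest density) is filtered out.
        """
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # k + 1 because each point is its own nearest neighbor
        distances, _ = nn.kneighbors(X)
        knn_radius = distances[:, -1]                    # distance to the k-th true neighbor
        cutoff = np.quantile(knn_radius, 1 - alpha)      # keep the densest 1 - alpha probability mass
        keep = knn_radius <= cutoff
        return X[keep]

This filtering is typically applied to each class's training points separately, so every class ends up with its own smooth, high-density cluster.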


Step 2: Calculate the trust score

The trust score compares the distance from the testing sample to the nearest class matching its predicted class with the distance to the nearest class different from its predicted class.

The trust score is calculated as:

\[
\text{trust score} = \frac{d_{\text{different}}}{d_{\text{same}}}
\]

where \(d_{\text{different}}\) is the distance from the testing sample to the nearest class different from the predicted class, and \(d_{\text{same}}\) is the distance from the testing sample to the nearest class matching the predicted class.

Note that “class” in this definition can describe the grouping of a single point (for example, a specific penguin species if the goal is to determine the penguin’s species), a group of k-Nearest Neighbors, or a centroid (the centroid, or geometric center, of a plane figure is the arithmetic mean position of all the points in the figure). The trust score formula is flexible enough to accommodate all of those groupings and more customizations. More details are in the Variations of the trust score section.

If the trust score is less than 1, then a class different from the predicted class is closer to the testing sample in the feature space than its predicted class. This is a red flag that the testing sample’s categorization may be incorrect.
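Putting the two distances together, a minimal sketch of the ratio itself might look like the snippet below, assuming the training points have already been filtered into per-class high-density sets (for example with the filter_high_density sketch above). The function name trust_score and the use of plain Euclidean distance are illustrative assumptions, not the paper's reference implementation.

    import numpy as np

    def trust_score(x, class_points, predicted_class):
        """Compute the trust score for a single testing sample x.

        class_points maps each class label to an array of its (already
        density-filtered) training points; predicted_class is the label
        the classifier assigned to x.
        """
        # Euclidean distance from x to the nearest point of each class.
        nearest = {
            label: np.min(np.linalg.norm(points - x, axis=1))
            for label, points in class_points.items()
        }
        d_same = nearest[predicted_class]  # distance to the predicted class
        d_diff = min(d for label, d in nearest.items() if label != predicted_class)  # nearest other class
        # (A small epsilon could guard against d_same == 0 when x coincides with a training point.)
        return d_diff / d_same

A score above 1 means the predicted class is closer than any other class; a score below 1 is the red flag described above.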


Variations of the trust score

The trust score concept put forth in the To Trust Or Not To Trust A Classifier paper can vary along a few different categories. A trust score therefore encompasses a family of similar but slightly differentiated metrics, so when communicating a trust score it is important to add the fine print about which options were chosen to calculate it.

Trust scores can have variations throughout different categories, just like creating a Build-A-Bear or picking out parts for a custom computer.

  1. A specific test point. The trust score looks at the categorization of a single input compared to the training data.

  2. A specific classifier model. For example, neural network, random forest, or logistic regression.

  3. A specific neighbor metric to use to compare the different classes. For example, k-Nearest Neighbors, a single nearest neighbor, or a centroid.

  4. A specific data setting. For example, raw inputs, an unsupervised embedding of the space, or individual layers.

So a trust score has four categories, each with numerous options. It is crucial to specify the details of each of these four categories when communicating a trust score.
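To make that fine print concrete, the four categories could be recorded alongside each reported score with something like the hypothetical configuration below; the field names and example values are illustrative, not part of the paper.

    from dataclasses import dataclass

    @dataclass
    class TrustScoreReport:
        """Hypothetical record of the four choices behind a reported trust score."""
        test_point: str       # 1. which specific test point is being scored
        classifier: str       # 2. e.g. "neural network", "random forest", "logistic regression"
        neighbor_metric: str  # 3. e.g. "k-nearest neighbors", "single nearest neighbor", "centroid"
        data_setting: str     # 4. e.g. "raw inputs", "unsupervised embedding", "layer 3 activations"
        score: float

    report = TrustScoreReport(
        test_point="penguin_042",
        classifier="random forest",
        neighbor_metric="k-nearest neighbors (k=10)",
        data_setting="raw inputs",
        score=1.8,
    )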


Compare trust scores to DkNN’s confidence and credibility scores

Similarities

  • Trust, confidence, and credibility scores all can use k-Nearest Neighbors when creating the score. Trust scores have the flexibility to additionally use a single nearest neighbor or a centroid when calculating the trust score.

  • All scores use the points closest to the testing sample in the feature space in their calculation.

  • Both the confidence score and the trust score consider the distance from the testing sample to points of other classes.

  • All scores can create their respective scores on individual intermediate layers in a deep learning model. Trust scores can create additional scores using different parts of the model, such as creating a trust score using the raw input.

Differences

  • Confidence and credibility scores do not rely on removing outliers from the training data to get a smooth, high-density cluster of points. Trust score calculations rely on having the low-density points removed from the training data, which might have adverse effects if the test points are outliers themselves.

  • Using the confidence and credibility scores together is better for detecting outlying and adversarial input to the model. Because the trust score is a ratio, it cannot flag a case where both the same-class and different-class distances are large but their ratio looks reasonable. In that situation the trust score would not be extreme enough to alert the algorithm’s user that the input could be an outlier, making it less effective at detecting adversarial input than a confidence score that can describe ordinal distances between the points in a model.

  • Trust scores have a greater flexibility on how to apply the concept, detailed in the Variations of the trust score section.

  • The trust score’s guarantees assume the data points are independently and identically distributed (i.i.d.).


DIY

The code for the To Trust Or Not To Trust A Classifier paper is available on GitHub and has been updated for Python 3.