Sources of Uncertainty

(i.e. what kinds of mistakes do neural networks make?)

Error in computer vision tasks is often presented as originating from two sources of uncertainty: aleatoric and epistemic. Epistemic uncertainty (related to the Ancient Greek episteme, for knowledge) is uncertainty that is theoretically under the control of the modeller (i.e. the person designing the model) and can be reduced by decisions in the model design. Under this definition, if only epistemic uncertainty exists, then the following is true: there theoretically exists a model architecture defined by $ \mathtt{f}^* $, which can be trained with an optimal dataset and optimiser to yield parameters $ \theta^* $, such that the trained model $ \mathtt{f}^*_\theta $ reduces the test error to zero. (Strictly, this should be \(\mathtt{f}^*_{\theta^*}\), as both the architecture and the parameters are optimal; however, \(\mathtt{f}^*_\theta\) is used for simplicity of presentation.) For this reason, epistemic uncertainty is due to the suboptimality of the model and its parameters rather than the input data, and can be reduced by an improved training procedure, such as a more diverse training dataset, a better network architecture, or a better optimiser.
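
As a brief formalisation of this claim (the test distribution \(\mathcal{D}_{\text{test}}\) and the loss \(\mathcal{L}\) are notation assumed here for illustration, not defined above), the zero-test-error condition satisfied by \(\mathtt{f}^*_\theta\) can be written as

$$ \mathbb{E}_{(x,\, y) \sim \mathcal{D}_{\text{test}}} \big[ \mathcal{L}\big( \mathtt{f}^*_\theta(x),\, y \big) \big] = 0, $$

i.e. the optimal architecture and parameters leave no residual error on unseen data.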

Aleatoric Uncertainty

In contrast, aleatoric uncertainty is uncertainty that cannot be reduced by the modeller. Etymologically, aleatoric is related to the Latin alea, for dice, evoking the idea that aleatoric uncertainty causes error through the inherent randomness of the observations rather than through imperfections in the model.

This means that no combination of model, dataset, and optimiser can reduce the error on the test dataset to zero. In outdoor robotics, sources of aleatoric uncertainty include sensor noise, environmental factors such as fog or darkness, the lack of resolution of distant objects, and sensor adherents such as rain, snow, or smudges on a camera lens.
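
As one hedged sketch of how such irreducible observation noise is often represented (the heteroscedastic regression loss below, with a predicted mean \(\hat{y}(x)\) and variance \(\hat{\sigma}^2(x)\), is an assumption of this illustration rather than a method introduced in this section), the model can be made to predict the noise level alongside its output, so that noisy observations are down-weighted rather than fitted exactly:

$$ \mathcal{L}(x, y) = \frac{\lVert y - \hat{y}(x) \rVert^2}{2\hat{\sigma}^2(x)} + \frac{1}{2} \log \hat{\sigma}^2(x) $$

Here the uncertainty is attached to the observation \((x, y)\) itself, not to the parameters, which matches the definition of aleatoric uncertainty above.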

The distinction between the two types of uncertainty is important for two reasons. Firstly, knowledge of the uncertainty type should inform the design of the method used to estimate it: if uncertainty primarily originates in the data, for example, then a method should focus its attention on the data rather than on the model parameters. Secondly, since epistemic uncertainty is reducible, estimating it for a dataset of unlabelled images tells the modeller which images to use to improve the model’s accuracy, for example by labelling them and then training on them. Estimating epistemic and aleatoric uncertainty separately is therefore useful in an active learning setting for finding maximally informative training examples, i.e. those with high epistemic uncertainty and low aleatoric uncertainty. The latter condition matters because images with high aleatoric uncertainty are, by definition, uninformative and thus cannot improve the model’s accuracy.
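
As a minimal sketch of how the two estimates can be obtained separately for a pool of unlabelled images (the use of Monte Carlo dropout samples and the `trade_off` weighting are assumptions of this example, not a procedure prescribed above), one common recipe decomposes the predictive entropy of stochastic forward passes: the expected entropy acts as an aleatoric proxy and the remaining mutual information as an epistemic proxy.

```python
import numpy as np

def decompose_uncertainty(mc_probs: np.ndarray, eps: float = 1e-12):
    """Split predictive uncertainty into aleatoric and epistemic proxies.

    mc_probs: softmax outputs of shape (T, N, C) from T stochastic forward
    passes (e.g. Monte Carlo dropout) over N unlabelled images and C classes.
    Returns two arrays of shape (N,).
    """
    mean_probs = mc_probs.mean(axis=0)                                    # (N, C)
    total = -np.sum(mean_probs * np.log(mean_probs + eps), axis=-1)       # predictive entropy
    aleatoric = -np.sum(mc_probs * np.log(mc_probs + eps), axis=-1).mean(axis=0)  # expected entropy
    epistemic = total - aleatoric                                         # mutual information
    return aleatoric, epistemic

def select_for_labelling(mc_probs: np.ndarray, k: int, trade_off: float = 1.0) -> np.ndarray:
    """Rank unlabelled images: favour high epistemic, penalise high aleatoric."""
    aleatoric, epistemic = decompose_uncertainty(mc_probs)
    score = epistemic - trade_off * aleatoric
    return np.argsort(-score)[:k]  # indices of the k most informative images

# Illustrative usage with random stand-in predictions.
rng = np.random.default_rng(0)
logits = rng.normal(size=(20, 1000, 10))          # 20 MC samples, 1000 images, 10 classes
mc_probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
picked = select_for_labelling(mc_probs, k=32)
```

The linear trade-off in `select_for_labelling` is only one of several possible acquisition rules; the essential point from the paragraph above is that images with high epistemic but low aleatoric uncertainty are the ones worth sending for labelling.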