Paper Title
A Simple Approach to Improve Single-Model Deep Uncertainty via Distance-Awareness
Paper Authors
Paper Abstract
Accurate uncertainty quantification is a major challenge in deep learning, as neural networks can make overconfident errors and assign high-confidence predictions to out-of-distribution (OOD) inputs. The most popular approaches to estimating predictive uncertainty in deep learning are methods that combine predictions from multiple neural networks, such as Bayesian neural networks (BNNs) and deep ensembles. However, their practicality in real-time, industrial-scale applications is limited due to their high memory and computational cost. Furthermore, ensembles and BNNs do not necessarily fix all the issues with the underlying member networks. In this work, we study principled approaches to improving the uncertainty properties of a single network, based on a single, deterministic representation. By formalizing uncertainty quantification as a minimax learning problem, we first identify distance awareness, i.e., the model's ability to quantify the distance of a test example from the training data, as a necessary condition for a DNN to achieve high-quality (i.e., minimax-optimal) uncertainty estimation. We then propose Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs with two changes: (1) applying spectral normalization to the hidden weights to enforce bi-Lipschitz smoothness in the representations, and (2) replacing the last output layer with a Gaussian process layer. On a suite of vision and language understanding benchmarks, SNGP outperforms other single-model approaches in prediction, calibration, and out-of-domain detection. Furthermore, SNGP provides complementary benefits to popular techniques such as deep ensembles and data augmentation, making it a simple and scalable building block for probabilistic deep learning. Code is open-sourced at https://github.com/google/uncertainty-baselines
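Below is a minimal sketch of the two changes the abstract describes, written in PyTorch purely for illustration. The class and parameter names (SpecNormResidual, RandomFeatureGP, num_features) are assumptions of this sketch, not the authors' API; the reference implementation lives in the uncertainty-baselines repository linked above.

```python
# A hedged sketch of SNGP's two architectural changes:
# (1) spectral normalization on hidden residual layers,
# (2) a random-feature Gaussian process as the output layer.
import math
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm


class SpecNormResidual(nn.Module):
    """Change (1): a residual block whose hidden weight is spectrally
    normalized, bounding the layer's Lipschitz constant. Combined with the
    identity skip connection, this encourages the bi-Lipschitz (roughly
    distance-preserving) representations the paper argues are needed for
    distance awareness."""

    def __init__(self, dim: int):
        super().__init__()
        self.linear = spectral_norm(nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + torch.relu(self.linear(x))


class RandomFeatureGP(nn.Module):
    """Change (2): a last-layer GP approximated with random Fourier features.

    A frozen random projection maps the hidden representation h to
    phi(h) = sqrt(2/D) * cos(W h + b), reducing the GP to a Bayesian
    linear model whose predictive variance grows with the distance of a
    test point from the training data."""

    def __init__(self, in_dim: int, num_classes: int, num_features: int = 1024):
        super().__init__()
        # Frozen random weights define the (approximate) GP kernel.
        self.register_buffer("W", torch.randn(num_features, in_dim))
        self.register_buffer("b", 2 * math.pi * torch.rand(num_features))
        self.beta = nn.Linear(num_features, num_classes, bias=False)
        # Precision matrix of the Laplace-approximated posterior covariance.
        self.register_buffer("precision", torch.eye(num_features))
        self.num_features = num_features

    def features(self, h: torch.Tensor) -> torch.Tensor:
        return math.sqrt(2.0 / self.num_features) * torch.cos(h @ self.W.T + self.b)

    @torch.no_grad()
    def update_precision(self, h: torch.Tensor) -> None:
        # In the paper, this accumulation runs over minibatches of the
        # final training epoch (a Laplace approximation to the covariance).
        phi = self.features(h)
        self.precision += phi.T @ phi

    def forward(self, h: torch.Tensor):
        phi = self.features(h)
        logits = self.beta(phi)
        # Per-example predictive variance phi^T Sigma phi with
        # Sigma = precision^{-1}. A real implementation would cache the
        # inverse rather than solving on every call.
        cov = torch.linalg.solve(self.precision, phi.T)  # shape (D, batch)
        var = (phi * cov.T).sum(-1)
        return logits, var


# Usage sketch: a toy backbone of spectral-normalized residual blocks
# followed by the GP head.
backbone = nn.Sequential(SpecNormResidual(128), SpecNormResidual(128))
head = RandomFeatureGP(in_dim=128, num_classes=10)
logits, var = head(backbone(torch.randn(4, 128)))
```

Because the random-feature weights are frozen, the variance term depends only on how far a test representation lies from the training features accumulated in the precision matrix, which is what gives the single deterministic model its distance awareness; the paper additionally folds this variance into the softmax probabilities (via a mean-field approximation), a step omitted from this sketch.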