Plex: Towards Reliability using Pretrained Large Model Extensions

Paper title

Plex: Towards Reliability using Pretrained Large Model Extensions

Authors

Dustin Tran, Jeremiah Liu, Michael W. Dusenberry, Du Phan, Mark Collier, Jie Ren, Kehang Han, Zi Wang, Zelda Mariet, Huiyi Hu, Neil Band, Tim G. J. Rudner, Karan Singhal, Zachary Nado, Joost van Amersfoort, Andreas Kirsch, Rodolphe Jenatton, Nithum Thain, Honglin Yuan, Kelly Buchanan, Kevin Murphy, D. Sculley, Yarin Gal, Zoubin Ghahramani, Jasper Snoek, Balaji Lakshminarayanan

Abstract

A recent trend in artificial intelligence is the use of pretrained models for language and vision tasks, which have achieved extraordinary performance but also puzzling failures. Probing these models' abilities in diverse ways is therefore critical to the field. In this paper, we explore the reliability of models, where we define a reliable model as one that not only achieves strong predictive performance but also performs well consistently over many decision-making tasks involving uncertainty (e.g., selective prediction, open set recognition), robust generalization (e.g., accuracy and proper scoring rules such as log-likelihood on in- and out-of-distribution datasets), and adaptation (e.g., active learning, few-shot uncertainty). We devise 10 types of tasks over 40 datasets in order to evaluate different aspects of reliability on both vision and language domains. To improve reliability, we developed ViT-Plex and T5-Plex, pretrained large model extensions for vision and language modalities, respectively. Plex greatly improves the state-of-the-art across reliability tasks, and simplifies the traditional protocol as it improves the out-of-the-box performance and does not require designing scores or tuning the model for each task. We demonstrate scaling effects over model sizes up to 1B parameters and pretraining dataset sizes up to 4B examples. We also demonstrate Plex's capabilities on challenging tasks including zero-shot open set recognition, active learning, and uncertainty in conversational language understanding.
