Paper Title
WT5?! Training Text-to-Text Models to Explain their Predictions
Paper Authors
Paper Abstract
Neural networks have recently achieved human-level performance on various challenging natural language processing (NLP) tasks, but it is notoriously difficult to understand why a neural network produced a particular prediction. In this paper, we leverage the text-to-text framework proposed by Raffel et al. (2019) to train language models to output a natural text explanation alongside their prediction. Crucially, this requires no modifications to the loss function or training and decoding procedures -- we simply train the model to output the explanation after generating the (natural text) prediction. We show that this approach not only obtains state-of-the-art results on explainability benchmarks, but also permits learning from a limited set of labeled explanations and transferring rationalization abilities across datasets. To facilitate reproducibility and future work, we release the code used to train our models.
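The abstract's central idea -- casting explanation as extra text appended to the prediction, with no change to the loss or decoding -- can be illustrated with a small formatting sketch. The helper below is hypothetical (the function name and exact prompt/target strings are illustrative assumptions, not the paper's released code); it shows how an input/target pair might be constructed so that a standard text-to-text model learns to emit "prediction explanation: rationale" as a single output sequence.

```python
def format_example(task_prefix, input_text, label, explanation=None):
    """Hypothetical WT5-style formatter (illustrative, not the paper's code).

    When an explanation is available, the word "explain" is prepended to the
    task prefix and the target becomes the label followed by the explanation,
    so an ordinary seq2seq loss covers both prediction and rationale.
    """
    if explanation is not None:
        source = f"explain {task_prefix}: {input_text}"
        target = f"{label} explanation: {explanation}"
    else:
        # Unexplained examples train the model on the prediction alone.
        source = f"{task_prefix}: {input_text}"
        target = label
    return source, target


src, tgt = format_example(
    "sentiment",
    "this movie was great!",
    "positive",
    explanation="the reviewer calls the movie great",
)
print(src)  # explain sentiment: this movie was great!
print(tgt)  # positive explanation: the reviewer calls the movie great
```

Because explained and unexplained examples share one format, a limited set of labeled explanations can be mixed freely with ordinary labeled data, matching the abstract's claim that no special training procedure is needed.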