Paper Title


Learning to Faithfully Rationalize by Construction

Authors

Sarthak Jain, Sarah Wiegreffe, Yuval Pinter, Byron C. Wallace

Abstract


In many settings it is important for one to be able to understand why a model made a particular prediction. In NLP this often entails extracting snippets of an input text "responsible for" corresponding model output; when such a snippet comprises tokens that indeed informed the model's prediction, it is a faithful explanation. In some settings, faithfulness may be critical to ensure transparency. Lei et al. (2016) proposed a model to produce faithful rationales for neural text classification by defining independent snippet extraction and prediction modules. However, the discrete selection over input tokens performed by this method complicates training, leading to high variance and requiring careful hyperparameter tuning. We propose a simpler variant of this approach that provides faithful explanations by construction. In our scheme, named FRESH, arbitrary feature importance scores (e.g., gradients from a trained model) are used to induce binary labels over token inputs, which an extractor can be trained to predict. An independent classifier module is then trained exclusively on snippets provided by the extractor; these snippets thus constitute faithful explanations, even if the classifier is arbitrarily complex. In both automatic and manual evaluations we find that variants of this simple framework yield predictive performance superior to "end-to-end" approaches, while being more general and easier to train. Code is available at https://github.com/successar/FRESH.
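The abstract outlines a three-stage pipeline: a support model yields per-token importance scores, those scores are discretized into rationale labels, and an independent classifier is trained only on the extracted snippets. The sketch below is a rough, self-contained illustration of that flow using scikit-learn in place of the paper's neural models; the toy data, the top-k selection rule, and the use of logistic-regression coefficients as stand-ins for "arbitrary feature importance scores" are all illustrative assumptions rather than the authors' implementation (see the linked repository for that).

```python
# Minimal sketch of the FRESH-style pipeline described in the abstract:
# support model -> token importance scores -> binary rationale selection ->
# classifier trained only on the selected snippets (faithful by construction).
# Everything below is illustrative; the paper uses neural extractors/classifiers
# and gradient- or attention-based importance scores.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["the movie was wonderful and moving",
        "a dull plot and wooden acting",
        "wonderful performances throughout",
        "the acting was dull and lifeless"]
labels = [1, 0, 1, 0]  # toy sentiment labels

# Stage 1: train a support model on full inputs and derive per-token importance
# (absolute logistic-regression weights stand in for gradients here).
vec = CountVectorizer()
X = vec.fit_transform(docs)
support = LogisticRegression().fit(X, labels)
importance = {tok: abs(w)
              for tok, w in zip(vec.get_feature_names_out(), support.coef_[0])}

# Stage 2: discretize importance scores into a rationale (keep the top-k tokens).
def extract_rationale(doc, k=2):
    toks = doc.split()
    top = set(sorted(toks, key=lambda t: importance.get(t, 0.0), reverse=True)[:k])
    return " ".join(t for t in toks if t in top)

rationales = [extract_rationale(d) for d in docs]

# Stage 3 (collapsed here): FRESH trains a separate extractor to *predict* these
# binary labels from text; this sketch simply applies the labels directly.

# Stage 4: train an independent classifier that only ever sees the snippets,
# so its predictions depend solely on them.
X_rat = vec.transform(rationales)
clf = LogisticRegression().fit(X_rat, labels)

test = "wonderful acting and a moving plot"
snippet = extract_rationale(test)
print(snippet, "->", clf.predict(vec.transform([snippet]))[0])
```

Because the final classifier is fit and evaluated exclusively on the extracted snippets, whatever it predicts is, by construction, attributable to those snippets, which is the faithfulness property the paper emphasizes.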
