Paper Title
Selective Question Answering under Domain Shift
Paper Authors
Paper Abstract
To avoid giving wrong answers, question answering (QA) models need to know when to abstain from answering. Moreover, users often ask questions that diverge from the model's training data, making errors more likely and thus abstention more critical. In this work, we propose the setting of selective question answering under domain shift, in which a QA model is tested on a mixture of in-domain and out-of-domain data, and must answer (i.e., not abstain on) as many questions as possible while maintaining high accuracy. Abstention policies based solely on the model's softmax probabilities fare poorly, since models are overconfident on out-of-domain inputs. Instead, we train a calibrator to identify inputs on which the QA model errs, and abstain when it predicts an error is likely. Crucially, the calibrator benefits from observing the model's behavior on out-of-domain data, even if from a different domain than the test data. We combine this method with a SQuAD-trained QA model and evaluate on mixtures of SQuAD and five other QA datasets. Our method answers 56% of questions while maintaining 80% accuracy; in contrast, directly using the model's probabilities only answers 48% at 80% accuracy.
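The abstract describes training a calibrator to predict when the QA model errs and abstaining when an error looks likely. The sketch below is a minimal, hypothetical illustration of that idea under our own assumptions, not the authors' implementation: it fits a scikit-learn classifier on held-out examples labeled by whether the QA model answered correctly, uses the model's softmax probability as the (assumed) input feature, and abstains whenever the predicted chance of being correct falls below a threshold.

```python
# Minimal sketch of a calibrator for selective QA (assumptions, not the paper's exact method):
# feats   - per-question features of the QA model's prediction (here, just its softmax
#           probability; the real feature set is an assumption on our part)
# correct - 1 if the QA model's answer was right on that held-out question, else 0
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def train_calibrator(feats, correct):
    """Fit a classifier estimating P(QA model's answer is correct | features)."""
    calibrator = RandomForestClassifier(n_estimators=100, random_state=0)
    calibrator.fit(feats, correct)
    return calibrator


def selective_answer(calibrator, feats, answers, threshold=0.5):
    """Return the model's answer when predicted correctness >= threshold, else abstain (None)."""
    p_correct = calibrator.predict_proba(feats)[:, 1]
    return [ans if p >= threshold else None for ans, p in zip(answers, p_correct)]


if __name__ == "__main__":
    # Toy held-out data: the single feature is the QA model's softmax probability,
    # and correctness labels are synthetic for illustration only.
    rng = np.random.default_rng(0)
    train_probs = rng.uniform(size=(500, 1))
    train_correct = (train_probs[:, 0] > 0.4).astype(int)
    cal = train_calibrator(train_probs, train_correct)

    test_probs = rng.uniform(size=(5, 1))
    test_answers = ["answer_%d" % i for i in range(5)]
    print(selective_answer(cal, test_probs, test_answers, threshold=0.8))
```

Sweeping the threshold trades coverage (the fraction of questions answered) against accuracy on the answered questions, which is how figures like "answers 56% of questions while maintaining 80% accuracy" are read.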