Paper Title
Using Interactive Feedback to Improve the Accuracy and Explainability of Question Answering Systems Post-Deployment
Paper Authors
Paper Abstract
Most research on question answering focuses on the pre-deployment stage; i.e., building an accurate model for deployment. In this paper, we ask the question: Can we improve QA systems further post-deployment based on user interactions? We focus on two kinds of improvements: 1) improving the QA system's performance itself, and 2) providing the model with the ability to explain the correctness or incorrectness of an answer. We collect a retrieval-based QA dataset, FeedbackQA, which contains interactive feedback from users. We collect this dataset by deploying a base QA system to crowdworkers, who then engage with the system and provide feedback on the quality of its answers. The feedback contains both structured ratings and unstructured natural language explanations. With this feedback data, we train a neural model that generates explanations and re-scores answer candidates. We show that feedback data not only improves the accuracy of the deployed QA system but also that of other, stronger non-deployed systems. The generated explanations also help users make informed decisions about the correctness of answers. Project page: https://mcgill-nlp.github.io/feedbackqa/
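The re-scoring step described in the abstract can be pictured as a reranking pass over the base retriever's candidates. Below is a minimal sketch under assumptions of mine: `feedback_scorer`, the mixing weight `alpha`, and the toy data are all illustrative stand-ins, not the paper's actual model or training setup; in the paper, the scorer would be the neural model trained on FeedbackQA ratings.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    passage: str
    base_score: float  # relevance score from the deployed base retriever


def rerank(
    question: str,
    candidates: List[Candidate],
    feedback_scorer: Callable[[str, str], float],
    alpha: float = 0.5,  # hypothetical mixing weight between the two scores
) -> List[Candidate]:
    """Re-score candidates by mixing the base retriever score with a
    rating score from a feedback-trained model, then sort descending."""
    def mixed(c: Candidate) -> float:
        return alpha * c.base_score + (1 - alpha) * feedback_scorer(question, c.passage)

    return sorted(candidates, key=mixed, reverse=True)


def toy_feedback_scorer(question: str, passage: str) -> float:
    """Toy stand-in for a rating model trained on FeedbackQA feedback:
    here, simple word overlap between question and passage."""
    overlap = set(question.lower().split()) & set(passage.lower().split())
    return len(overlap) / max(len(question.split()), 1)


if __name__ == "__main__":
    cands = [
        Candidate("Masks are recommended in crowded indoor spaces.", 0.42),
        Candidate("Our office hours are 9am to 5pm on weekdays.", 0.55),
    ]
    for c in rerank("Do I need to wear a mask indoors?", cands, toy_feedback_scorer):
        print(f"{c.base_score:.2f}  {c.passage}")
```

The linear mixture is just one plausible way to combine the two signals; the key point from the abstract is that the feedback-trained scorer is a separate model, so it can rerank the outputs of QA systems other than the one originally deployed.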