End-to-End QA on COVID-19: Domain Adaptation with Synthetic Training

Paper Title

End-to-End QA on COVID-19: Domain Adaptation with Synthetic Training

Authors

Revanth Gangi Reddy, Bhavani Iyer, Md Arafat Sultan, Rong Zhang, Avi Sil, Vittorio Castelli, Radu Florian, Salim Roukos

Abstract

End-to-end question answering (QA) requires both information retrieval (IR) over a large document collection and machine reading comprehension (MRC) on the retrieved passages. Recent work has successfully trained neural IR systems using only supervised question answering (QA) examples from open-domain datasets. However, despite impressive performance on Wikipedia, neural IR lags behind traditional term matching approaches such as BM25 in more specific and specialized target domains such as COVID-19. Furthermore, given little or no labeled data, effective adaptation of QA systems can also be challenging in such target domains. In this work, we explore the application of synthetically generated QA examples to improve performance on closed-domain retrieval and MRC. We combine our neural IR and MRC systems and show significant improvements in end-to-end QA on the CORD-19 collection over a state-of-the-art open-domain QA baseline.
