Paper Title
Knowledge-driven Data Construction for Zero-shot Evaluation in Commonsense Question Answering
Paper Authors
Paper Abstract
Recent developments in pre-trained neural language modeling have led to leaps in accuracy on commonsense question-answering benchmarks. However, there is increasing concern that models overfit to specific tasks without learning to use external knowledge or perform general semantic reasoning. In contrast, zero-shot evaluation has shown promise as a more robust measure of a model's general reasoning abilities. In this paper, we propose a novel neuro-symbolic framework for zero-shot question answering across commonsense tasks. Guided by a set of hypotheses, the framework studies how to transform various pre-existing knowledge resources into a form that is most effective for pre-training models. We vary the set of language models, training regimes, knowledge sources, and data generation strategies, and measure their impact across tasks. Extending prior work, we devise and compare four constrained distractor-sampling strategies. We provide empirical results on five commonsense question-answering tasks with data generated from five external knowledge resources. We show that, while an individual knowledge graph is better suited to specific tasks, a global knowledge graph brings consistent gains across different tasks. In addition, both preserving the structure of the original task and generating fair and informative questions help language models learn more effectively.
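To make the data-construction idea concrete, the Python sketch below turns a ConceptNet-style (head, relation, tail) triple into a synthetic multiple-choice question, sampling distractors under a simple constraint (same relation, different tail). The question templates, function names, and this particular constraint are illustrative assumptions for exposition only; they are not the paper's exact four sampling strategies.

```python
import random

# Hypothetical templates mapping a relation to a question form;
# the relation names follow ConceptNet conventions.
TEMPLATES = {
    "UsedFor": "What is a {head} used for?",
    "AtLocation": "Where would you find a {head}?",
}

def generate_question(triple, all_triples, num_distractors=2, seed=0):
    """Turn one (head, relation, tail) triple into a multiple-choice item.

    Distractors are sampled only from tails of *other* triples that share
    the same relation, so every option is superficially plausible -- one
    simple example of a constrained distractor-sampling strategy.
    """
    head, relation, tail = triple
    rng = random.Random(seed)
    candidates = [t for (h, r, t) in all_triples
                  if r == relation and t != tail]
    distractors = rng.sample(candidates,
                             k=min(num_distractors, len(candidates)))
    options = distractors + [tail]
    rng.shuffle(options)
    return {
        "question": TEMPLATES[relation].format(head=head),
        "options": options,
        "answer": tail,
    }

if __name__ == "__main__":
    # Toy knowledge graph; real runs would draw from resources
    # such as ConceptNet or ATOMIC.
    kg = [
        ("knife", "UsedFor", "cutting"),
        ("pen", "UsedFor", "writing"),
        ("broom", "UsedFor", "sweeping"),
        ("mug", "AtLocation", "kitchen"),
    ]
    print(generate_question(kg[0], kg))
```

Constraining distractors to share the gold answer's relation is one way to keep questions "fair and informative" in the abstract's sense: options that are obviously off-topic would let a model answer without any commonsense knowledge.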