Paper Title
WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation
Paper Authors
Paper Abstract
A recurring challenge of crowdsourcing NLP datasets at scale is that human writers often rely on repetitive patterns when crafting examples, leading to a lack of linguistic diversity. We introduce a novel approach for dataset creation based on worker and AI collaboration, which brings together the generative strength of language models and the evaluative strength of humans. Starting with an existing dataset, MultiNLI for natural language inference (NLI), our approach uses dataset cartography to automatically identify examples that demonstrate challenging reasoning patterns, and instructs GPT-3 to compose new examples with similar patterns. Machine-generated examples are then automatically filtered, and finally revised and labeled by human crowdworkers. The resulting dataset, WANLI, consists of 107,885 NLI examples and presents unique empirical strengths over existing NLI datasets. Remarkably, compared to training on the 4x larger MultiNLI, training a model on WANLI improves performance on all eight out-of-domain test sets we consider, including by 11% on HANS and 9% on Adversarial NLI. Moreover, it remains more effective than MultiNLI augmented with other NLI datasets. Our results demonstrate the promise of leveraging natural language generation techniques and re-imagining the role of humans in the dataset creation process.
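The first two stages of the pipeline the abstract describes can be sketched in a few lines: dataset cartography ranks training examples by the variability of their gold-label probability across training epochs (high-variability examples are the "ambiguous" ones that demonstrate challenging reasoning patterns), and the selected examples seed a few-shot prompt for the generator. This is a minimal illustration with toy numbers; the function names, the prompt wording, and the use of variability as the selection criterion are assumptions for illustration, not the paper's exact recipe.

```python
import statistics

def variability(gold_probs):
    """Dataset-cartography variability: population std. dev. of the
    model's gold-label probability for one example across epochs."""
    return statistics.pstdev(gold_probs)

def select_ambiguous(dynamics, k):
    """Return the k example ids with the highest variability,
    i.e. the examples the model flip-flops on during training."""
    ranked = sorted(dynamics, key=lambda ex: variability(dynamics[ex]),
                    reverse=True)
    return ranked[:k]

def build_prompt(seed_examples):
    """Assemble a few-shot prompt asking the language model to write a new
    premise/hypothesis pair following the same pattern as the seeds.
    (Hypothetical prompt format, not the paper's exact template.)"""
    lines = ["Write a pair of sentences with the same relationship "
             "as the examples below.", ""]
    for premise, hypothesis in seed_examples:
        lines += [f"Premise: {premise}", f"Hypothesis: {hypothesis}", ""]
    lines.append("Premise:")  # the model continues from here
    return "\n".join(lines)

# Toy training dynamics: gold-label probability per epoch for 3 examples.
dynamics = {
    "ex1": [0.90, 0.92, 0.95],  # consistently easy -> low variability
    "ex2": [0.20, 0.80, 0.40],  # model flip-flops  -> ambiguous
    "ex3": [0.10, 0.12, 0.09],  # consistently hard -> low variability
}
picked = select_ambiguous(dynamics, k=1)   # -> ["ex2"]
prompt = build_prompt([("A dog is running.", "An animal is moving.")])
```

The later stages (automatic filtering of generations, then human revision and labeling) would consume the model's completions of `prompt`; they are omitted here since they depend on the generator's output.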