Paper Title
VirAAL: Virtual Adversarial Active Learning For NLU
Paper Authors
Paper Abstract
This paper presents VirAAL, an Active Learning framework based on Adversarial Training. VirAAL aims to reduce the annotation effort in Natural Language Understanding (NLU). VirAAL is based on Virtual Adversarial Training (VAT), a semi-supervised approach that regularizes the model through Local Distributional Smoothness: adversarial perturbations are added to the inputs, making the posterior distribution more consistent. As a result, entropy-based Active Learning becomes robust, querying more informative samples without requiring additional components. The first set of experiments studies the impact of an adapted VAT on joint-NLU tasks in low labeled data regimes. The second set shows the effect of VirAAL in an Active Learning (AL) process. Results demonstrate that VAT is robust even in multi-task training, where the adversarial noise is computed from multiple loss functions. Substantial improvements are observed when entropy-based AL uses VirAAL to query data for annotation. VirAAL is an inexpensive method in terms of AL computation, with a positive impact on data sampling. Furthermore, VirAAL decreases annotations in AL by up to 80% and shows improvements over existing data augmentation methods. The code is publicly available.
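The abstract describes entropy-based Active Learning: unlabeled samples on which the model's posterior distribution has the highest entropy (i.e., the model is least certain) are queried for annotation. A minimal sketch of such a query step, assuming the model's class posteriors are already available as a NumPy array (the function name `entropy_query` and the toy posteriors are illustrative, not from the paper):

```python
import numpy as np

def entropy_query(probs, k):
    """Select the k most informative samples by posterior entropy.

    probs: (n_samples, n_classes) array of predicted class posteriors.
    Returns the indices of the k samples with the highest entropy,
    i.e. those the model is least certain about.
    """
    eps = 1e-12  # avoid log(0)
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    # Sort by entropy, descending, and keep the top k indices.
    return np.argsort(entropy)[::-1][:k]

# Toy posteriors for 4 unlabeled samples over 3 classes:
probs = np.array([
    [0.98, 0.01, 0.01],  # confident prediction -> low entropy
    [0.34, 0.33, 0.33],  # near-uniform -> high entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.45, 0.05],
])
print(entropy_query(probs, 2))  # the two most uncertain samples
```

In VirAAL, the VAT regularization smooths these posteriors on unlabeled data, which is what makes such entropy-based queries more reliable.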