Paper Title
Exploring and Exploiting Multi-Granularity Representations for Machine Reading Comprehension
Paper Authors
Paper Abstract
Recently, the attention-enhanced multi-layer encoder, such as the Transformer, has been extensively studied in Machine Reading Comprehension (MRC). To predict the answer, it is common practice to employ a predictor that draws information only from the final encoder layer, which generates coarse-grained representations of the source sequences, i.e., the passage and the question. Analysis shows that the representation of the source sequence shifts from fine-grained to coarse-grained as the number of encoding layers increases. It is generally believed that, as the number of layers in a deep neural network grows, the encoding process gathers increasingly more relevant information for each position, resulting in more coarse-grained representations and thereby increasing the likelihood that a position's representation becomes similar to those of other positions (i.e., homogeneity). Such a phenomenon can mislead the model into making wrong judgements and degrade performance. In this paper, we argue that it would be better if the predictor could exploit representations of different granularity from the encoder, providing different views of the source sequences, so that the expressive power of the model can be fully utilized. To this end, we propose a novel approach called Adaptive Bidirectional Attention-Capsule Network (ABA-Net), which adaptively exposes source representations of different levels to the predictor. Furthermore, since better representations are at the core of boosting MRC performance, the capsule network and the self-attention module are carefully designed as the building blocks of our encoder, providing the capability to explore local and global representations, respectively. Experimental results on three benchmark datasets, i.e., SQuAD 1.0, SQuAD 2.0, and CoQA, demonstrate the effectiveness of our approach. In particular, we set new state-of-the-art performance on the SQuAD 1.0 dataset.
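The abstract's central idea is to let the predictor use encoder representations of several granularities rather than only the final layer. The PyTorch sketch below illustrates that idea with a simple learned softmax mixing over layer outputs; this gating scheme and the class name `AdaptiveMultiGranularityMixer` are illustrative assumptions, not the exact adaptive bidirectional attention or capsule mechanism of ABA-Net.

```python
# Minimal sketch: adaptively fuse multi-layer encoder outputs for the predictor.
# Assumption: ELMo-style learned scalar mixing stands in for ABA-Net's mechanism.
import torch
import torch.nn as nn


class AdaptiveMultiGranularityMixer(nn.Module):
    """Fuses representations from all encoder layers with learned weights."""

    def __init__(self, num_layers: int, hidden_dim: int):
        super().__init__()
        # One learnable logit per layer; softmax turns them into mixing weights.
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))
        self.proj = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, layer_outputs: list[torch.Tensor]) -> torch.Tensor:
        # layer_outputs: one [batch, seq_len, hidden_dim] tensor per encoder
        # layer, ordered from fine-grained (lower) to coarse-grained (higher).
        stacked = torch.stack(layer_outputs, dim=0)            # [L, B, T, H]
        weights = torch.softmax(self.layer_logits, dim=0)      # [L]
        mixed = (weights.view(-1, 1, 1, 1) * stacked).sum(0)   # [B, T, H]
        return self.proj(mixed)


if __name__ == "__main__":
    mixer = AdaptiveMultiGranularityMixer(num_layers=4, hidden_dim=128)
    fake_layers = [torch.randn(2, 50, 128) for _ in range(4)]
    print(mixer(fake_layers).shape)  # torch.Size([2, 50, 128])
```

The fused output would then feed the answer predictor in place of the final-layer representation alone, which is the contrast the abstract draws with standard Transformer-based MRC models.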