Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding

Paper title

Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding

Authors

Fenglin Liu, Xuancheng Ren, Guangxiang Zhao, Chenyu You, Xuewei Ma, Xian Wu, Xu Sun

Abstract

In sequence-to-sequence learning, e.g., natural language generation, the decoder relies on the attention mechanism to efficiently extract information from the encoder. While it is common practice to draw information from only the last encoder layer, recent work has proposed to use representations from different encoder layers for diversified levels of information. Nonetheless, the decoder still obtains only a single view of the source sequences, which might lead to insufficient training of the encoder layer stack due to the hierarchy bypassing problem. In this work, we propose layer-wise multi-view decoding, where for each decoder layer, together with the representations from the last encoder layer, which serve as a global view, those from other encoder layers are supplemented for a stereoscopic view of the source sequences. Systematic experiments and analyses show that we successfully address the hierarchy bypassing problem, require almost negligible parameter increase, and substantially improve the performance of sequence-to-sequence learning with deep representations on five diverse tasks, i.e., machine translation, abstractive summarization, image captioning, video captioning, medical report generation, and paraphrase generation. In particular, our approach achieves new state-of-the-art results on ten benchmark datasets, including a low-resource machine translation dataset and two low-resource medical report generation datasets.
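The core idea in the abstract, that each decoder layer attends to the last encoder layer (the global view) and also to another encoder layer, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say which encoder layer is paired with each decoder layer or how the two views are merged, so the sketch assumes that decoder layer i is paired with encoder layer i and that the two attention outputs are summed. It also omits learned projections and multiple heads. The function names `attend` and `multi_view_context` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, memory):
    """Scaled dot-product attention of one query vector over a list of memory vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    return [sum(w * vec[d] for w, vec in zip(weights, memory)) for d in range(dim)]


def multi_view_context(query, encoder_layers, decoder_layer_index):
    """Combine a global view and one extra view of the source for a decoder layer."""
    # Global view: attention over the output of the last encoder layer.
    global_view = attend(query, encoder_layers[-1])
    # Assumption: decoder layer i is paired with encoder layer i.
    extra_view = attend(query, encoder_layers[decoder_layer_index])
    # Assumption: the two views are merged by element-wise addition.
    return [g + e for g, e in zip(global_view, extra_view)]


# Two encoder layers, two source positions, hidden size 2 (toy values).
encoder_layers = [
    [[1.0, 0.0], [0.0, 1.0]],  # lower encoder layer
    [[0.5, 0.5], [0.5, 0.5]],  # last encoder layer
]
query = [1.0, 0.0]
context = multi_view_context(query, encoder_layers, decoder_layer_index=0)
```

In this toy setup the last encoder layer gives the same vector at every position, so the global view carries no positional preference, and any difference between the components of `context` comes from the extra view of the lower layer.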
