Paper Title
Do We Really Need That Many Parameters In Transformer For Extractive Summarization? Discourse Can Help!
Paper Authors
Paper Abstract
The multi-head self-attention of popular transformer models is widely used within Natural Language Processing (NLP), including for the task of extractive summarization. With the goal of analyzing and pruning the parameter-heavy self-attention mechanism, multiple approaches have proposed more parameter-light self-attention alternatives. In this paper, we present a novel parameter-lean self-attention mechanism using discourse priors. Our new tree self-attention is based on document-level discourse information, extending the recently proposed "Synthesizer" framework with another lightweight alternative. We show empirically that our tree self-attention approach achieves competitive ROUGE scores on the task of extractive summarization. Compared to the original single-head transformer model, the tree-attention approach reaches similar performance at both the EDU and sentence level, despite the significant reduction of parameters in the attention component. Under a more balanced hyper-parameter setting, it further significantly outperforms the 8-head transformer model at the sentence level while requiring an order of magnitude fewer parameters.
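The abstract does not state how the tree self-attention weights are derived from the discourse structure. As an illustration only, the minimal PyTorch sketch below replaces the learned query/key dot-product with a fixed attention matrix computed from pairwise distances in a document-level discourse tree, in the spirit of the Synthesizer's non-dot-product attention variants. The class name `TreeSelfAttention`, the distance-to-weight mapping, and the toy inputs are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a parameter-lean, discourse-driven self-attention layer.
# Only a value projection is learned; the attention pattern itself comes from a
# fixed discourse-tree prior (an assumption, not the paper's exact formulation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TreeSelfAttention(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        # No query/key matrices: the attention weights are given by the tree prior.
        self.value = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, tree_dist: torch.Tensor) -> torch.Tensor:
        # x:         (batch, seq_len, d_model) representations of EDUs or sentences
        # tree_dist: (batch, seq_len, seq_len) pairwise distances in the discourse tree
        # Illustrative choice: units closer in the tree receive higher attention weight.
        scores = -tree_dist.float()
        attn = F.softmax(scores, dim=-1)
        return attn @ self.value(x)


# Toy usage: 4 discourse units with hypothetical tree distances.
x = torch.randn(1, 4, 64)
dist = torch.tensor([[[0, 1, 2, 3],
                      [1, 0, 1, 2],
                      [2, 1, 0, 1],
                      [3, 2, 1, 0]]])
out = TreeSelfAttention(64)(x, dist)
print(out.shape)  # torch.Size([1, 4, 64])
```

Because the attention matrix is fixed by the discourse tree, the layer drops the query and key projections of standard self-attention, which is one way the parameter count in the attention component could be reduced as the abstract describes.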