Title
Multilingual AMR-to-Text Generation
Authors
Abstract
Generating text from structured data is challenging because it requires bridging the gap between (i) structure and natural language (NL) and (ii) semantically underspecified input and fully specified NL output. Multilingual generation brings an additional challenge: that of generating into languages with varied word order and morphological properties. In this work, we focus on Abstract Meaning Representations (AMRs) as structured input, where previous research has overwhelmingly focused on generating only into English. We leverage advances in cross-lingual embeddings, pretraining, and multilingual models to create multilingual AMR-to-text models that generate into twenty-one different languages. For eighteen languages, based on automatic metrics, our multilingual models surpass baselines that generate into a single language. We analyse the ability of our multilingual models to accurately capture morphology and word order using human evaluation, and find that native speakers judge our generations to be fluent.