Paper title
Paraphrase Generation as Zero-Shot Multilingual Translation: Disentangling Semantic Similarity from Lexical and Syntactic Diversity
Paper authors
Paper abstract
Recent work has shown that a multilingual neural machine translation (NMT) model can be used to judge how well a sentence paraphrases another sentence in the same language (Thompson and Post, 2020); however, attempting to generate paraphrases from such a model using standard beam search produces trivial copies or near copies. We introduce a simple paraphrase generation algorithm which discourages the production of n-grams that are present in the input. Our approach enables paraphrase generation in many languages from a single multilingual NMT model. Furthermore, the amount of lexical diversity between the input and output can be controlled at generation time. We conduct a human evaluation to compare our method to a paraphraser trained on the large English synthetic paraphrase database ParaBank 2 (Hu et al., 2019c) and find that our method produces paraphrases that better preserve meaning and are more grammatical, for the same level of lexical diversity. Additional smaller human assessments demonstrate our approach also works in two non-English languages.
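The idea of discouraging n-grams that appear in the input can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the penalty form `alpha * n ** beta`, the choice to sum penalties over n-gram orders, and the dictionary-based log-probabilities are all assumptions made for illustration. At each decoding step, any candidate token that would complete an n-gram found in the source sentence has its log-probability reduced.

```python
from collections import defaultdict


def source_continuations(source, max_n):
    """Map each (n-1)-token context in the source to the tokens that follow it."""
    continuations = defaultdict(set)
    for n in range(1, max_n + 1):
        for i in range(len(source) - n + 1):
            context = tuple(source[i:i + n - 1])
            continuations[context].add(source[i + n - 1])
    return continuations


def ngram_penalties(source, prefix, max_n=4, alpha=1.0, beta=1.0):
    """Penalty per candidate next token, given the output generated so far.

    Hypothetical penalty form: alpha * n ** beta for each n-gram order n
    that the candidate would complete, summed over orders (an assumption).
    """
    continuations = source_continuations(source, max_n)
    penalties = defaultdict(float)
    for n in range(1, max_n + 1):
        if n - 1 > len(prefix):
            break  # not enough generated tokens to form this context
        context = tuple(prefix[len(prefix) - (n - 1):])
        for token in continuations.get(context, ()):
            penalties[token] += alpha * n ** beta
    return dict(penalties)


def apply_penalties(log_probs, penalties):
    """Subtract penalties from a {token: log-probability} mapping."""
    return {tok: lp - penalties.get(tok, 0.0) for tok, lp in log_probs.items()}


source = ["the", "cat", "sat"]
prefix = ["the"]  # tokens generated so far
penalties = ngram_penalties(source, prefix, max_n=2)
log_probs = {"cat": -0.5, "dog": -2.5}  # toy model scores, not real output
adjusted = apply_penalties(log_probs, penalties)
best = max(adjusted, key=adjusted.get)
```

In this toy case, "cat" would complete both a unigram and the bigram "the cat" from the source, so it is penalized and "dog" becomes the preferred continuation. In a sketch like this, raising `alpha` pushes the output further from the input wording and setting it to zero recovers unpenalized decoding, which mirrors the abstract's point that lexical diversity can be controlled at generation time.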