Paper Title

Distributional Discrepancy: A Metric for Unconditional Text Generation

Authors

Ping Cai, Xingyuan Chen, Peng Jin, Hongjun Wang, Tianrui Li

Abstract

The purpose of unconditional text generation is to train a model on real sentences and then generate novel sentences of the same quality and diversity as the training data. However, when different metrics are used to compare methods of unconditional text generation, contradictory conclusions are drawn. The difficulty is that both the diversity and the quality of the samples must be considered simultaneously when models are evaluated. To solve this problem, a novel metric, distributional discrepancy (DD), is designed to evaluate generators based on the discrepancy between the distributions of generated and real training sentences. However, DD cannot be computed directly because the distribution of the real sentences is unavailable. Thus, we propose a method for estimating DD by training a neural-network-based text classifier. For comparison, three existing metrics, bilingual evaluation understudy (BLEU) versus self-BLEU, language model score versus reverse language model score, and Fréchet embedding distance, along with the proposed DD, are used to evaluate two popular generative models, long short-term memory (LSTM) and generative pretrained transformer 2 (GPT-2), on both syntactic and real data. Experimental results show that DD is significantly better than the three existing metrics at ranking these generative models.
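
The abstract does not specify the classifier architecture, so the following is only a minimal sketch of the general idea: train a binary classifier to distinguish real from generated sentences, then read a discrepancy estimate off its held-out accuracy. The bag-of-words logistic-regression classifier, the function name `estimate_discrepancy`, and the toy data are illustrative assumptions standing in for the paper's neural-network classifier.

```python
# Sketch: estimate a distributional discrepancy between real and generated
# sentences via a real-vs-generated binary classifier. The classifier choice
# here (bag-of-words + logistic regression) is a stand-in, not the paper's
# neural-network model.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


def estimate_discrepancy(real_sentences, generated_sentences, seed=0):
    """Return a discrepancy score in [0, 1].

    For balanced classes, an optimal classifier's held-out accuracy a
    satisfies a = 1/2 + TV(p_real, p_gen)/2, so 2a - 1 is a plug-in
    lower-bound estimate of the total variation distance: 0 means the
    classifier cannot tell the two apart, 1 means fully separable.
    """
    texts = list(real_sentences) + list(generated_sentences)
    labels = [1] * len(real_sentences) + [0] * len(generated_sentences)
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.3, stratify=labels, random_state=seed)

    features = CountVectorizer(ngram_range=(1, 2))
    clf = LogisticRegression(max_iter=1000)
    clf.fit(features.fit_transform(x_train), y_train)
    acc = accuracy_score(y_test, clf.predict(features.transform(x_test)))
    return max(0.0, 2.0 * acc - 1.0)


if __name__ == "__main__":
    # Toy data for illustration only.
    real = ["the cat sat on the mat", "a dog ran in the park"] * 50
    fake = ["the the cat cat mat", "dog park a a ran"] * 50
    print(f"estimated discrepancy: {estimate_discrepancy(real, fake):.3f}")
```

Under this reading, a better generator drives the score toward zero, since the classifier can no longer beat chance at separating generated sentences from real ones.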
