Paper Title

ETC-NLG: End-to-end Topic-Conditioned Natural Language Generation

Authors

Ginevra Carbone, Gabriele Sarti

Abstract

Plug-and-play language models (PPLMs) enable topic-conditioned natural language generation by pairing large pre-trained generators with attribute models used to steer the predicted token distribution towards the selected topic. Despite their computational efficiency, PPLMs require large amounts of labeled texts to effectively balance generation fluency and proper conditioning, making them unsuitable for low-resource settings. We present ETC-NLG, an approach leveraging topic modeling annotations to enable fully-unsupervised End-to-end Topic-Conditioned Natural Language Generation over emergent topics in unlabeled document collections. We first test the effectiveness of our approach in a low-resource setting for Italian, evaluating the conditioning for both topic models and gold annotations. We then perform a comparative evaluation of ETC-NLG for Italian and English using a parallel corpus. Finally, we propose an automatic approach to estimate the effectiveness of conditioning on the generated utterances.
