Paper Title

Neural Topic Model via Optimal Transport

Paper Authors

He Zhao, Dinh Phung, Viet Huynh, Trung Le, Wray Buntine

Abstract

Recently, Neural Topic Models (NTMs) inspired by variational autoencoders have attracted increasing research interest due to their promising results on text analysis. However, it is usually hard for existing NTMs to achieve good document representation and coherent/diverse topics at the same time. Moreover, their performance often degrades severely on short documents. The requirement of reparameterisation could also compromise their training quality and model flexibility. To address these shortcomings, we present a new neural topic model via the theory of optimal transport (OT). Specifically, we propose to learn the topic distribution of a document by directly minimising its OT distance to the document's word distribution. Importantly, the cost matrix of the OT distance models the weights between topics and words, which is constructed by the distances between topics and words in an embedding space. Our proposed model can be trained efficiently with a differentiable loss. Extensive experiments show that our framework significantly outperforms the state-of-the-art NTMs on discovering more coherent and diverse topics and deriving better document representations for both regular and short texts.
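To make the core idea concrete, here is a minimal sketch of the loss described in the abstract: an entropy-regularised (Sinkhorn) OT distance between a document's topic distribution and its word distribution, with the cost matrix built from topic-word embedding distances. This is an illustrative reconstruction, not the paper's implementation; the choice of cosine distance, the Sinkhorn regulariser, and the random stand-in distributions are all assumptions.

```python
import numpy as np

def sinkhorn_ot(a, b, M, reg=0.1, n_iters=200):
    """Entropy-regularised OT distance between distributions a and b
    under cost matrix M, via plain Sinkhorn iterations (a sketch)."""
    K = np.exp(-M / reg)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)         # scale columns to match marginal b
        u = a / (K @ v)           # scale rows to match marginal a
    P = u[:, None] * K * v[None, :]  # transport plan
    return float(np.sum(P * M))      # total transport cost

rng = np.random.default_rng(0)
n_topics, n_words, dim = 4, 10, 8

# Hypothetical topic and word embeddings (learned jointly in the paper).
topic_emb = rng.normal(size=(n_topics, dim))
word_emb = rng.normal(size=(n_words, dim))

# Cost matrix from embedding-space distances (cosine distance assumed here).
tn = topic_emb / np.linalg.norm(topic_emb, axis=1, keepdims=True)
wn = word_emb / np.linalg.norm(word_emb, axis=1, keepdims=True)
M = 1.0 - tn @ wn.T

# Stand-ins: the topic distribution would come from an encoder network,
# the word distribution from the document's normalised bag of words.
z = rng.dirichlet(np.ones(n_topics))
w = rng.dirichlet(np.ones(n_words))

loss = sinkhorn_ot(z, w, M)
print(loss)
```

In the actual model this loss is differentiable with respect to the encoder parameters and the topic embeddings, which is what allows end-to-end training without the reparameterisation trick the abstract mentions.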
