Variational Autoencoder with Embedded Student-$t$ Mixture Model for Authorship Attribution

Paper title

Variational Autoencoder with Embedded Student-$t$ Mixture Model for Authorship Attribution

Authors

Benedikt Boenninghoff, Steffen Zeiler, Robert M. Nickel, Dorothea Kolossa

Abstract

Traditional computational authorship attribution describes a classification task in a closed-set scenario. Given a finite set of candidate authors and corresponding labeled texts, the objective is to determine which of the authors has written another set of anonymous or disputed texts. In this work, we propose a probabilistic autoencoding framework to deal with this supervised classification task. More precisely, we are extending a variational autoencoder (VAE) with embedded Gaussian mixture model to a Student-$t$ mixture model. Autoencoders have had tremendous success in learning latent representations. However, existing VAEs are currently still bound by limitations imposed by the assumed Gaussianity of the underlying probability distributions in the latent space. In this work, we are extending the Gaussian model for the VAE to a Student-$t$ model, which allows for an independent control of the "heaviness" of the respective tails of the implied probability densities. Experiments over an Amazon review dataset indicate superior performance of the proposed method.
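
To illustrate what controlling the "heaviness" of the tails means, here is a minimal univariate sketch. It is not the authors' implementation: the paper embeds a mixture in the latent space of a VAE, while this example only evaluates one-dimensional densities. The function names (`gaussian_logpdf`, `student_t_logpdf`, `mixture_posteriors`) and all parameter values are hypothetical, chosen for illustration. The degrees-of-freedom parameter `nu` sets the tail weight: a small `nu` gives heavy tails, and as `nu` grows the Student-$t$ density approaches the Gaussian.

```python
import math


def gaussian_logpdf(x, mu=0.0, sigma=1.0):
    """Log-density of a univariate Gaussian N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def student_t_logpdf(x, nu, mu=0.0, sigma=1.0):
    """Log-density of a univariate Student-t with nu degrees of freedom,
    location mu and scale sigma. Smaller nu means heavier tails."""
    z = (x - mu) / sigma
    log_norm = (
        math.lgamma((nu + 1.0) / 2.0)
        - math.lgamma(nu / 2.0)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
    )
    return log_norm - 0.5 * (nu + 1.0) * math.log1p(z * z / nu)


def mixture_posteriors(x, weights, mus, sigmas, nus):
    """Posterior probability of each Student-t mixture component given x.
    In a closed-set attribution setting, each component could stand for
    one candidate author (an assumption made for this illustration)."""
    log_terms = [
        math.log(w) + student_t_logpdf(x, nu, mu, s)
        for w, mu, s, nu in zip(weights, mus, sigmas, nus)
    ]
    m = max(log_terms)  # subtract the maximum for numerical stability
    unnorm = [math.exp(t - m) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]


if __name__ == "__main__":
    x_far = 6.0  # a point far out in the tail
    print("Gaussian log-density:        ", gaussian_logpdf(x_far))
    print("Student-t (nu=2) log-density:", student_t_logpdf(x_far, nu=2.0))
    print("Posteriors:", mixture_posteriors(
        0.5, weights=[0.5, 0.5], mus=[0.0, 3.0],
        sigmas=[1.0, 1.0], nus=[3.0, 3.0]))
```

Because each component carries its own `nu`, the tail weight can be set independently per component. That is the extra flexibility over a Gaussian mixture that the abstract refers to.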

Paper title