Deep Bayes Factor Scoring for Authorship Verification

Paper title

Deep Bayes Factor Scoring for Authorship Verification

Authors

Benedikt Boenninghoff, Julian Rupp, Robert M. Nickel, Dorothea Kolossa

Abstract

The PAN 2020 authorship verification (AV) challenge focuses on a cross-topic/closed-set AV task over a collection of fanfiction texts. Fanfiction is a fan-written extension of a storyline in which a so-called fandom topic describes the principal subject of the document. The data provided in the PAN 2020 AV task is quite challenging because authors of texts across multiple/different fandom topics are included. In this work, we present a hierarchical fusion of two well-known approaches into a single end-to-end learning procedure: A deep metric learning framework at the bottom aims to learn a pseudo-metric that maps a document of variable length onto a fixed-sized feature vector. At the top, we incorporate a probabilistic layer to perform Bayes factor scoring in the learned metric space. We also provide text preprocessing strategies to deal with the cross-topic issue.
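
To make the idea of Bayes factor scoring in a learned metric space more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the two documents have already been mapped to fixed-size feature vectors, and it assumes a simple two-covariance Gaussian model with diagonal covariances: each feature is an author variable with between-author variance `b` plus noise with within-author variance `w`. The function names, the parameters `mu`, `b`, `w`, and the equal-prior assumption are all hypothetical choices made for illustration.

```python
import math


def log_bayes_factor(y1, y2, mu, b, w):
    """Log Bayes factor for 'same author' vs 'different authors'.

    Assumed model (hypothetical, not taken from the paper): each embedding
    dimension is y = a + e, with author variable a ~ N(mu, b) and
    within-author noise e ~ N(0, w), independent across dimensions.
    """
    total = 0.0
    for u, v, m, bk, wk in zip(y1, y2, mu, b, w):
        du, dv = u - m, v - m
        t = bk + wk                 # marginal variance of a single embedding
        det_same = t * t - bk * bk  # determinant of [[t, bk], [bk, t]]
        # Mahalanobis terms of the pair (u, v) under each hypothesis
        maha_same = (t * du * du - 2.0 * bk * du * dv + t * dv * dv) / det_same
        maha_diff = (du * du + dv * dv) / t
        log_same = -0.5 * (math.log(det_same) + maha_same)
        log_diff = -0.5 * (2.0 * math.log(t) + maha_diff)
        total += log_same - log_diff  # the 2*pi constants cancel
    return total


def same_author_probability(log_bf):
    """Posterior of 'same author' under equal priors (sigmoid of the log BF)."""
    if log_bf >= 0.0:
        return 1.0 / (1.0 + math.exp(-log_bf))
    e = math.exp(log_bf)  # numerically stable branch for negative inputs
    return e / (1.0 + e)


# Toy 2-dimensional embeddings (made-up numbers, for illustration only).
mu = [0.0, 0.0]
b = [4.0, 4.0]    # assumed between-author variance per dimension
w = [0.25, 0.25]  # assumed within-author variance per dimension

close_pair = log_bayes_factor([2.0, -1.0], [2.1, -0.9], mu, b, w)
far_pair = log_bayes_factor([2.0, -1.0], [-2.0, 1.0], mu, b, w)
```

In this toy setting, two embeddings that lie close together but away from the global mean yield a positive log Bayes factor, while embeddings on opposite sides of the mean yield a negative one. In an end-to-end system such as the one the abstract describes, the feature vectors and the model parameters would be learned jointly rather than fixed by hand as they are here.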
