Paper title
Deep Bayes Factor Scoring for Authorship Verification
Paper authors
Paper abstract
The PAN 2020 authorship verification (AV) challenge focuses on a cross-topic/closed-set AV task over a collection of fanfiction texts. Fanfiction is a fan-written extension of a storyline, in which a so-called fandom topic describes the principal subject of the document. The data provided in the PAN 2020 AV task is challenging because it includes authors who write texts across multiple, different fandom topics. In this work, we present a hierarchical fusion of two well-known approaches into a single end-to-end learning procedure: at the bottom, a deep metric learning framework aims to learn a pseudo-metric by mapping a document of variable length onto a fixed-size feature vector. At the top, we incorporate a probabilistic layer to perform Bayes factor scoring in the learned metric space. We also provide text preprocessing strategies to deal with the cross-topic issue.