Paper Title

Same Author or Just Same Topic? Towards Content-Independent Style Representations

Authors

Anna Wegmann, Marijn Schraagen, Dong Nguyen

Abstract

Linguistic style is an integral component of language. Recent advances in the development of style representations have increasingly used training objectives from authorship verification (AV): Do two texts have the same author? The assumption underlying the AV training task (same author approximates same writing style) enables self-supervised and, thus, extensive training. However, good performance on the AV task does not ensure good "general-purpose" style representations. For example, as the same author might typically write about certain topics, representations trained on AV might also encode content information instead of style alone. We introduce a variation of the AV training task that controls for content using conversation or domain labels. We evaluate whether known style dimensions are represented and preferred over content information through an original variation of the recently proposed STEL framework. We find that representations trained by controlling for conversation are better than representations trained with domain or no content control at representing style independently of content.
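
To make the idea of content control concrete, the following is a minimal sketch of how AV training triples might be assembled when conversation labels are available. It is an illustration under stated assumptions, not the authors' implementation: the `Utterance` type, the `build_av_triples` function, and the specific rule used here (the different-author text comes from the same conversation as the anchor, while the same-author text comes from a different conversation) are all hypothetical choices made for this example.

```python
import random
from collections import defaultdict
from typing import NamedTuple


class Utterance(NamedTuple):
    # Hypothetical record type; field names are assumptions for this sketch.
    author: str
    conversation: str
    text: str


def build_av_triples(utterances, seed=0):
    """Return (anchor, same_author, different_author) triples.

    Assumed content-control rule (not confirmed by the source):
    - the same-author text is drawn from a different conversation,
    - the different-author text is drawn from the anchor's conversation,
    so that shared topic alone cannot identify the same-author text.
    """
    rng = random.Random(seed)
    by_author = defaultdict(list)
    by_conversation = defaultdict(list)
    for u in utterances:
        by_author[u.author].append(u)
        by_conversation[u.conversation].append(u)

    triples = []
    for anchor in utterances:
        positives = [u for u in by_author[anchor.author]
                     if u.conversation != anchor.conversation]
        negatives = [u for u in by_conversation[anchor.conversation]
                     if u.author != anchor.author]
        # Skip anchors for which no valid positive or negative exists.
        if positives and negatives:
            triples.append((anchor, rng.choice(positives), rng.choice(negatives)))
    return triples


# Toy data, invented purely for illustration.
sample = [
    Utterance("alice", "c1", "honestly i think the new patch is fine"),
    Utterance("bob", "c1", "The new patch introduces several regressions."),
    Utterance("alice", "c2", "honestly that recipe looks great"),
    Utterance("carol", "c2", "I would reduce the sugar by half."),
    Utterance("bob", "c3", "The schedule remains unchanged."),
]
triples = build_av_triples(sample)
```

With this construction, a model that relies only on topic overlap would tend to prefer the different-author text, so separating the pair correctly requires some signal other than content. Using domain labels instead of conversation labels would follow the same pattern with a coarser grouping key.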
