Paper Title

Joint Multi-Dimensional Model for Global and Time-Series Annotations

Authors

Anil Ramakrishna, Rahul Gupta, Shrikanth Narayanan

Abstract

Crowdsourcing is a popular approach to collecting annotations for unlabeled data instances. It involves collecting a large number of annotations for each data instance from several annotators, who are often naive and untrained, and then combining them to estimate the ground truth. Further, annotations for constructs such as affect are often multi-dimensional, with annotators rating multiple dimensions, such as valence and arousal, for each instance. Most annotation fusion schemes, however, ignore this aspect and model each dimension separately. In this work, we address this by proposing a generative model for multi-dimensional annotation fusion that models the dimensions jointly, leading to more accurate ground truth estimates. The proposed model is applicable to both global and time-series annotation fusion problems and treats the ground truth as a latent variable distorted by the annotators. The model parameters are estimated using the Expectation-Maximization algorithm, and we evaluate its performance using synthetic data and real emotion corpora, as well as on an artificial task with human annotations.
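To make the idea of "ground truth as a latent variable distorted by the annotators" concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the functional form of the model, so everything here is an assumption made for illustration: a standard normal prior on the latent ground truth, a per-annotator linear distortion matrix `F[k]` that mixes the dimensions, and isotropic Gaussian noise. Only the posterior-mean computation (the kind of quantity an E-step would need) is shown, with the distortion matrices taken as known; a full Expectation-Maximization procedure would also re-estimate them. The names `simulate` and `posterior_mean` are hypothetical.

```python
import numpy as np


def simulate(n_items, n_dims, n_annotators, noise_std, rng):
    """Sample an assumed linear-Gaussian annotation process.

    Latent ground truth g_i ~ N(0, I); annotator k reports
    a[k, i] = F[k] @ g[i] + noise. This form is an illustrative
    assumption, not taken from the abstract.
    """
    g = rng.standard_normal((n_items, n_dims))
    F = rng.standard_normal((n_annotators, n_dims, n_dims))
    noise = noise_std * rng.standard_normal((n_annotators, n_items, n_dims))
    a = np.einsum("kde,ie->kid", F, g) + noise
    return g, F, a


def posterior_mean(a, F, noise_std):
    """Posterior mean of the latent ground truth for every item.

    Under the assumed model the posterior is Gaussian with
    precision = I + sum_k F_k^T F_k / sigma^2 and
    mean = precision^{-1} sum_k F_k^T a_k / sigma^2.
    All dimensions are estimated jointly because F_k mixes them.
    """
    n_dims = F.shape[-1]
    precision = np.eye(n_dims) + np.einsum("kde,kdf->ef", F, F) / noise_std**2
    rhs = np.einsum("kde,kid->ie", F, a) / noise_std**2
    return np.linalg.solve(precision, rhs.T).T


rng = np.random.default_rng(0)
g, F, a = simulate(n_items=200, n_dims=2, n_annotators=5, noise_std=0.1, rng=rng)
g_hat = posterior_mean(a, F, noise_std=0.1)
print("correlation with latent truth:", np.corrcoef(g_hat.ravel(), g.ravel())[0, 1])
```

Because each `F[k]` is a full matrix, an annotator's rating of one dimension (say, valence) may depend on the latent value of another (say, arousal). That coupling is what a joint model can exploit and what a per-dimension model ignores.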
