Paper Title
A Light-weight, Effective and Efficient Model for Label Aggregation in Crowdsourcing
Paper Authors
Paper Abstract
Due to the noise in crowdsourced labels, label aggregation (LA) has emerged as a standard procedure for post-processing crowdsourced labels. LA methods estimate true labels from crowdsourced labels by modeling worker qualities. Most existing LA methods are iterative in nature. They need to traverse all the crowdsourced labels multiple times in order to jointly and iteratively update true labels and worker qualities until convergence. Consequently, these methods have high space and time complexities. In this paper, we treat LA as a dynamic system and model it as a dynamic Bayesian network. From the dynamic model, we derive two light-weight algorithms, LA\textsuperscript{onepass} and LA\textsuperscript{twopass}, which can effectively and efficiently estimate worker qualities and true labels by traversing all the labels at most twice. Due to the dynamic nature, the proposed algorithms can also estimate true labels online without re-visiting historical data. We theoretically prove the convergence property of the proposed algorithms and bound the error of the estimated worker qualities. We also analyze the space and time complexities of the proposed algorithms and show that they are equivalent to those of majority voting. Experiments conducted on 20 real-world datasets demonstrate that the proposed algorithms can effectively and efficiently aggregate labels in both offline and online settings, even though they traverse all the labels at most twice.
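The abstract describes LA\textsuperscript{onepass} only at a high level, so the following is a minimal sketch of the general idea it gestures at: a single pass over the label stream that weights each vote by an online worker-quality estimate. Everything here, including the function name `one_pass_aggregate`, the initial quality of 0.7, and the update rate `lr`, is an illustrative assumption; this is not the paper's actual algorithm, which is derived from a dynamic Bayesian network.

```python
from collections import defaultdict

def one_pass_aggregate(label_stream, n_classes, init_quality=0.7, lr=0.1):
    """Hypothetical single pass over (task, worker, label) triples.

    Maintains a weighted vote tally per task and an online quality
    estimate per worker; each incoming label is weighted by the
    worker's current quality, and the quality is nudged toward 1
    when the worker agrees with the running estimate for that task.
    """
    quality = defaultdict(lambda: init_quality)      # worker -> quality in (0, 1)
    votes = defaultdict(lambda: [0.0] * n_classes)   # task -> weighted tally

    for task, worker, label in label_stream:
        votes[task][label] += quality[worker]
        # Current best guess for this task after incorporating the new label.
        estimate = max(range(n_classes), key=votes[task].__getitem__)
        # Online quality update: move toward 1 on agreement, toward 0 otherwise.
        agree = 1.0 if label == estimate else 0.0
        quality[worker] += lr * (agree - quality[worker])

    true_labels = {t: max(range(n_classes), key=tally.__getitem__)
                   for t, tally in votes.items()}
    return true_labels, dict(quality)

# Toy stream: worker w3 disagrees with the majority on both tasks.
stream = [(0, "w1", 1), (0, "w2", 1), (0, "w3", 0),
          (1, "w1", 0), (1, "w2", 0), (1, "w3", 1)]
labels, qualities = one_pass_aggregate(stream, n_classes=2)
print(labels)     # {0: 1, 1: 0}
print(qualities)  # w3's estimated quality drops below w1's and w2's
```

Because each triple is touched once and only per-worker and per-task scalars are kept, the space and time costs of this sketch match those of majority voting, which is the complexity claim the abstract makes for the actual algorithms.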