Making Every Label Count: Handling Semantic Imprecision by Integrating Domain Knowledge

Paper title

Making Every Label Count: Handling Semantic Imprecision by Integrating Domain Knowledge

Authors

Clemens-Alexander Brust, Björn Barz, Joachim Denzler

Abstract

Noisy data, crawled from the web or supplied by volunteers such as Mechanical Turkers or citizen scientists, is considered an alternative to professionally labeled data. There has been research focused on mitigating the effects of label noise. It is typically modeled as inaccuracy, where the correct label is replaced by an incorrect label from the same set. We consider an additional dimension of label noise: imprecision. For example, a non-breeding snow bunting is labeled as a bird. This label is correct, but not as precise as the task requires. Standard softmax classifiers cannot learn from such a weak label because they consider all classes mutually exclusive, which non-breeding snow bunting and bird are not. We propose CHILLAX (Class Hierarchies for Imprecise Label Learning and Annotation eXtrapolation), a method based on hierarchical classification, to fully utilize labels of any precision. Experiments on noisy variants of NABirds and ILSVRC2012 show that our method outperforms strong baselines by as much as 16.4 percentage points, and the current state of the art by up to 3.9 percentage points.
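
The point about imprecise labels can be illustrated with a small sketch. It is not the authors' CHILLAX implementation. It only shows one plausible way a class hierarchy turns a label of any precision into per-class training targets: the labeled class and its ancestors become positives, its descendants stay unknown (and could be skipped by a loss), and all other classes become negatives. The toy hierarchy, the names PARENT, ancestors and targets_for, and the 1 / 0 / None encoding are all assumptions made for illustration.

```python
from typing import Dict, Optional, Set

# Hypothetical toy hierarchy (child -> parent); not taken from NABirds or ILSVRC2012.
PARENT: Dict[str, Optional[str]] = {
    "animal": None,
    "bird": "animal",
    "dog": "animal",
    "snow bunting": "bird",
    "house sparrow": "bird",
    "non-breeding snow bunting": "snow bunting",
    "breeding snow bunting": "snow bunting",
}


def ancestors(cls: str) -> Set[str]:
    """Return all strict ancestors of cls in the toy hierarchy."""
    found: Set[str] = set()
    node = PARENT[cls]
    while node is not None:
        found.add(node)
        node = PARENT[node]
    return found


def targets_for(label: str) -> Dict[str, Optional[int]]:
    """Map every class to 1 (implied by the label), 0 (ruled out), or None (unknown).

    Assumption for this sketch: a label implies all of its ancestors, says
    nothing about its descendants, and rules out every other class.
    """
    implied = ancestors(label) | {label}
    targets: Dict[str, Optional[int]] = {}
    for cls in PARENT:
        if cls in implied:
            targets[cls] = 1
        elif label in ancestors(cls):
            targets[cls] = None  # descendant of the label: not observed
        else:
            targets[cls] = 0
    return targets


if __name__ == "__main__":
    for name in ("bird", "non-breeding snow bunting"):
        print(name, targets_for(name))
```

With the imprecise label "bird", the two snow bunting variants and the sparrow remain unknown rather than being treated as wrong, which is the information a flat softmax target cannot express.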
