Webly Supervised Image Classification with Metadata: Automatic Noisy Label Correction via Visual-Semantic Graph

Paper Title

Webly Supervised Image Classification with Metadata: Automatic Noisy Label Correction via Visual-Semantic Graph

Authors

Jingkang Yang, Weirong Chen, Litong Feng, Xiaopeng Yan, Huabin Zheng, Wayne Zhang

Abstract

Webly supervised learning has recently become attractive because it scales up training data without expensive human labeling. However, training with search queries or hashtags as the web labels of images introduces massive noise that degrades the performance of deep neural networks (DNNs). In particular, because query words are semantically ambiguous, the images retrieved by one query may include a large number of images belonging to other concepts. For example, searching "tiger cat" on Flickr returns mostly tiger images rather than cat images. These realistic noisy samples usually form clear visual-semantic clusters in the visual space, which mislead DNNs away from learning accurate semantic labels. Correcting real-world noisy labels would therefore seem to require expensive human annotation. Fortunately, we find that metadata can provide extra knowledge to discover clean web labels in a labor-free fashion, making it feasible to automatically provide correct semantic guidance among massive label-noisy web data. In this paper, we propose VSGraph-LC, an automatic label corrector based on the visual-semantic graph. VSGraph-LC starts with anchor selection, which relies on the semantic similarity between metadata and the correct label concepts, and then propagates correct labels from the anchors over a visual graph using a graph neural network (GNN). Experiments on the realistic webly supervised learning datasets WebVision-1000 and NUS-81-Web show the effectiveness and robustness of VSGraph-LC. Moreover, VSGraph-LC shows its advantage on the open-set validation set.
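
The two-stage idea in the abstract (pick anchors whose metadata agrees with the label concept, then spread anchor labels over a visual graph) can be illustrated with a small sketch. This is not the authors' implementation: the paper propagates labels with a GNN, while the sketch below swaps in classical clamped label propagation on a k-NN graph to stay short. The function names, the similarity threshold, the value of k, and the hand-made 2-D embeddings are all assumptions made for illustration.

```python
import numpy as np

def cosine_sim(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def select_anchors(meta_emb, concept_emb, web_labels, threshold=0.8):
    """Hypothetical anchor rule: a sample is an anchor when its metadata
    embedding is close to the concept embedding of its own web label."""
    sim = cosine_sim(meta_emb, concept_emb)          # (n_samples, n_classes)
    own = sim[np.arange(len(web_labels)), web_labels]
    return own >= threshold

def knn_graph(visual_feats, k=2):
    """Symmetric k-nearest-neighbour adjacency built from visual features."""
    sim = cosine_sim(visual_feats, visual_feats)
    np.fill_diagonal(sim, -np.inf)                   # exclude self-loops
    n = len(sim)
    nbrs = np.argsort(-sim, axis=1)[:, :k]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(adj, adj.T)

def propagate_labels(adj, web_labels, anchors, n_classes, n_iters=20):
    """Clamped label propagation (a stand-in for the paper's GNN)."""
    onehot = np.eye(n_classes)[web_labels]
    scores = np.where(anchors[:, None], onehot, 0.0)
    trans = adj / np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
    for _ in range(n_iters):
        scores = trans @ scores                      # average neighbour scores
        scores[anchors] = onehot[anchors]            # anchors keep their labels
    pred = scores.argmax(axis=1)
    unreached = scores.sum(axis=1) == 0              # no anchor in reach
    pred[unreached] = web_labels[unreached]          # fall back to the web label
    return pred

def toy_example():
    """Toy data: class 0 = 'tiger', class 1 = 'tiger cat'."""
    # Samples 0-2 look like tigers, samples 3-5 look like cats.
    visual = np.array([[1.0, 0.1], [0.9, 0.2], [1.0, 0.0],
                       [0.1, 1.0], [0.2, 0.9], [0.0, 1.0]])
    # Samples 1 and 2 are tiger images retrieved by the 'tiger cat' query.
    web_labels = np.array([0, 1, 1, 1, 1, 1])
    concept = np.array([[1.0, 0.0], [0.0, 1.0]])
    # Only samples 0 and 3 carry metadata that clearly matches their label.
    meta = np.array([[1.0, 0.05], [0.7, 0.7], [0.9, 0.3],
                     [0.05, 1.0], [0.7, 0.7], [0.6, 0.7]])
    anchors = select_anchors(meta, concept, web_labels)
    adj = knn_graph(visual, k=2)
    return anchors, propagate_labels(adj, web_labels, anchors, n_classes=2)

if __name__ == "__main__":
    anchors, corrected = toy_example()
    print(anchors.tolist())    # [True, False, False, True, False, False]
    print(corrected.tolist())  # [0, 0, 0, 1, 1, 1]
```

In this toy setting the two mislabelled tiger images (samples 1 and 2) sit in the same visual cluster as the tiger anchor, so propagation relabels them from "tiger cat" to "tiger", while the cat images keep their label.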
