Improving Training on Noisy Structured Labels

Paper Title

Improving Training on Noisy Structured Labels

Authors

Abid, Abubakar; Zou, James

Abstract

Fine-grained annotations (e.g., dense image labels, image segmentation, and text tagging) are useful in many ML applications, but they are labor-intensive to generate. Moreover, there are often systematic, structured errors in these fine-grained annotations. For example, a car might be entirely unannotated in an image, or the boundary between a car and the street might be only coarsely annotated. Standard ML training on data with such structured errors produces models with biases and poor performance. In this work, we propose a novel framework of Error-Correcting Networks (ECN) to address the challenge of learning in the presence of structured errors in fine-grained annotations. Given a large noisy dataset with commonly occurring structured errors and a much smaller dataset with more accurate annotations, ECN is able to substantially improve the prediction of fine-grained annotations compared to standard approaches for training on noisy data. It does so by learning to leverage the structures in the annotations and in the noisy labels. Systematic experiments on image segmentation and text tagging demonstrate the strong performance of ECN in improving training on noisy structured labels.
