Overcoming label noise in audio event detection using sequential labeling

Title

Overcoming label noise in audio event detection using sequential labeling

Authors

Kim, Jae-Bin; Mun, Seongkyu; Oh, Myungwoo; Choe, Soyeon; Lee, Yong-Hyeok; Park, Hyung-Min

Abstract

This paper addresses the noisy-label issue in audio event detection (AED) by refining strong labels into sequential labels from which the inaccurate timestamps are removed. In AED, a strong label contains the occurrence of a specific event together with timestamps marking the start and end of that event in an audio clip. Because the timestamps depend on the subjectivity of each annotator, label noise in them is inevitable. In contrast, weak labels indicate only the occurrence of a specific event. They are free of timestamp-induced label noise, but they carry no time information. To fully exploit the information in the available strong and weak labels, we propose an AED scheme that converts the strong labels into sequential labels and trains with these sequential labels in addition to the given strong and weak labels. Using sequential labels consistently improved performance, particularly in terms of the segment-based F-score, by focusing on the occurrences of events. In the mean-teacher-based approach to semi-supervised learning, adding an early step with sequential prediction, on top of supervised learning with sequential labels, mitigated both the label noise and the inaccurate predictions of the teacher model, and significantly improved the segment-based F-score while maintaining the event-based F-score.
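The following Python sketch makes the three label types concrete by deriving a sequential label and a weak label from one clip's strong labels. It assumes that a sequential label is simply the list of event classes ordered by onset, with all timestamps dropped. The class `StrongLabel` and the functions `to_sequential` and `to_weak` are hypothetical names. The paper's actual encoding may differ, for example in how it orders overlapping or simultaneous events.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class StrongLabel:
    """One annotated event: class name plus onset/offset in seconds (hypothetical type)."""
    event: str
    onset: float
    offset: float


def to_sequential(labels: List[StrongLabel]) -> List[str]:
    """Drop the timestamps but keep the order in which events start.

    Assumption: events are ordered by onset, with ties broken by offset
    and then by class name so that the result is deterministic.
    """
    ordered = sorted(labels, key=lambda lab: (lab.onset, lab.offset, lab.event))
    return [lab.event for lab in ordered]


def to_weak(labels: List[StrongLabel]) -> Set[str]:
    """Keep only which event classes occur somewhere in the clip."""
    return {lab.event for lab in labels}


if __name__ == "__main__":
    clip = [
        StrongLabel("Speech", 1.2, 3.9),
        StrongLabel("Dog", 0.4, 1.0),
        StrongLabel("Speech", 5.0, 6.3),
        StrongLabel("Dishes", 2.5, 2.8),
    ]
    print(to_sequential(clip))    # ['Dog', 'Speech', 'Dishes', 'Speech']
    print(sorted(to_weak(clip)))  # ['Dishes', 'Dog', 'Speech']

    # A second (hypothetical) annotator marks slightly different boundaries;
    # the sequential label is unchanged because the order of onsets is the same.
    jittered = [
        StrongLabel("Speech", 1.0, 4.1),
        StrongLabel("Dog", 0.5, 1.1),
        StrongLabel("Speech", 5.2, 6.0),
        StrongLabel("Dishes", 2.4, 2.9),
    ]
    print(to_sequential(jittered) == to_sequential(clip))  # True
```

Only the order of onsets survives the conversion. As a result, two annotators who disagree on exact boundaries but agree on the order of events produce the same sequential label. The weak label discards even that order and keeps only which classes are present.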
