Polyphonic sound event detection for highly dense birdsong scenes

Paper title

Polyphonic sound event detection for highly dense birdsong scenes

Authors

Parrilla, Alberto García Arroba; Stowell, Dan

Abstract

One hour before sunrise, one can experience the dawn chorus where birds from different species sing together. In this scenario, high levels of polyphony, as in the number of overlapping sound sources, are prone to happen resulting in a complex acoustic outcome. Sound Event Detection (SED) tasks analyze acoustic scenarios in order to identify the occurring events and their respective temporal information. However, highly dense scenarios can be hard to process and have not been studied in depth. Here we show, using a Convolutional Recurrent Neural Network (CRNN), how birdsong polyphonic scenarios can be detected when dealing with higher polyphony and how effectively this type of model can face a very dense scene with up to 10 overlapping birds. We found that models trained with denser examples (i.e., higher polyphony) learn at a similar rate as models that used simpler samples in their training set. Additionally, the model trained with the densest samples maintained a consistent score for all polyphonies, while the model trained with the least dense samples degraded as the polyphony increased. Our results demonstrate that highly dense acoustic scenarios can be dealt with using CRNNs. We expect that this study serves as a starting point for working on highly populated bird scenarios such as dawn chorus or other dense acoustic problems.
