Title
Robust Sound Source Tracking Using SRP-PHAT and 3D Convolutional Neural Networks
Authors
Abstract
In this paper, we present a new direction-of-arrival (DOA) estimation and tracking system for a single sound source, based on the well-known SRP-PHAT (Steered Response Power with Phase Transform) algorithm and a three-dimensional Convolutional Neural Network. It uses SRP-PHAT power maps as input features to a fully convolutional causal architecture whose 3D convolutional layers accurately track a sound source even in highly reverberant scenarios where most state-of-the-art techniques fail. Unlike previous methods, our system does not use bidirectional recurrent layers, and all of its convolutional layers are causal in the time dimension; it is therefore feasible for real-time applications and provides a new DOA estimate for each new SRP-PHAT map. To train the model, we introduce a new procedure that simulates random trajectories on demand during training, which is equivalent to an infinite-size dataset and offers high flexibility to modify its acoustic conditions, such as the reverberation time. We use both acoustic simulations over a wide range of reverberation times and the real recordings of the LOCATA dataset to demonstrate the robustness of our system and its good performance even with low-resolution SRP-PHAT maps.
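To make the input features more concrete, here is a minimal NumPy sketch of how a single SRP-PHAT power map can be computed. For every microphone pair, the cross-power spectrum is PHAT-weighted (only its phase is kept), steered to the time difference of arrival expected for each candidate direction, and summed. The sketch makes several assumptions that do not come from the paper: a far-field source, an ideal anechoic simulation with circular delays, a speed of sound of 343 m/s, a hypothetical 4-microphone square array, and a scan over azimuth only. The names `srp_phat_map`, `azimuth_grid`, and `simulate_far_field` are illustrative. The paper feeds a sequence of such maps to its causal 3D CNN; that network is not reproduced here.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def srp_phat_map(frames, mic_pos, directions, fs, c=SPEED_OF_SOUND):
    """Far-field SRP-PHAT power for each candidate direction.

    frames:     (M, N) time-domain samples, one row per microphone
    mic_pos:    (M, 3) microphone coordinates in metres
    directions: (D, 3) unit vectors pointing from the array towards each candidate source
    returns:    (D,) steered response power (higher = more likely DOA)
    """
    n_mics, n_samples = frames.shape
    spectra = np.fft.rfft(frames, axis=1)              # (M, K) one-sided spectra
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)     # (K,) bin frequencies in Hz
    power = np.zeros(len(directions))
    for i in range(n_mics):
        for j in range(i + 1, n_mics):
            cross = spectra[i] * np.conj(spectra[j])   # cross-power spectrum of the pair
            cross /= np.maximum(np.abs(cross), 1e-12)  # PHAT weighting: keep only the phase
            # Expected TDOA of mic i relative to mic j for every candidate direction
            tau = directions @ (mic_pos[i] - mic_pos[j]) / c       # (D,) seconds
            steering = np.exp(-2j * np.pi * np.outer(tau, freqs))  # (D, K)
            # Phase-align the pair for each direction and accumulate the real part
            power += np.real(steering @ cross)
    return power


def azimuth_grid(n_points):
    """Unit vectors in the horizontal plane at n_points equally spaced azimuths."""
    az = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    dirs = np.stack([np.cos(az), np.sin(az), np.zeros_like(az)], axis=1)
    return dirs, az


def simulate_far_field(source, mic_pos, direction, fs, c=SPEED_OF_SOUND):
    """Ideal anechoic plane wave: mic m records s(t + r_m . u / c) (circular shift)."""
    n = len(source)
    spectrum = np.fft.rfft(source)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    advances = mic_pos @ direction / c                 # (M,) mics nearer the source hear it earlier
    shifted = spectrum[None, :] * np.exp(2j * np.pi * np.outer(advances, freqs))
    return np.fft.irfft(shifted, n=n, axis=1)          # (M, N)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 16000
    # Hypothetical 4-mic square array, each mic 5 cm from the centre
    mics = 0.05 * np.array([[1, 0, 0], [0, 1, 0], [-1, 0, 0], [0, -1, 0]], dtype=float)
    true_az = np.deg2rad(60.0)
    u = np.array([np.cos(true_az), np.sin(true_az), 0.0])
    frames = simulate_far_field(rng.standard_normal(1024), mics, u, fs)
    dirs, az = azimuth_grid(360)
    power = srp_phat_map(frames, mics, dirs, fs)
    print("estimated azimuth (deg):", round(float(np.rad2deg(az[np.argmax(power)]))))
```

A 360-point grid is used only for illustration. Coarser grids give the kind of low-resolution maps the abstract refers to, and the per-pair loop keeps the PHAT and steering steps visible rather than fully vectorised.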