Paper Title

FCN Approach for Dynamically Locating Multiple Speakers

Authors

Hodaya Hammer, Shlomo E. Chazan, Jacob Goldberger, Sharon Gannot

Abstract

In this paper, we present a deep neural network-based online multi-speaker localization algorithm. Following the W-disjoint orthogonality principle in the spectral domain, each time-frequency (TF) bin is dominated by a single speaker, and hence by a single direction of arrival (DOA). A fully convolutional network is trained with instantaneous spatial features to estimate the DOA for each TF bin. The high-resolution classification enables the network to accurately and simultaneously localize and track multiple speakers, both static and dynamic. An elaborate experimental study using both simulated and real-life recordings, in static and dynamic scenarios, confirms that the proposed algorithm outperforms both classic and recent deep-learning-based algorithms.
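
To make the per-TF-bin idea concrete, the following is a minimal sketch of one possible way to turn a map of per-bin DOA class decisions into per-frame speaker directions. It is not the authors' method: the abstract does not say how bin-level estimates are aggregated, so the histogram voting used here, the function name `frame_doa_estimates`, and the parameters `n_classes`, `n_speakers`, and `min_votes` are all assumptions made for illustration. The input stands in for the network output, with one DOA class index per TF bin.

```python
import numpy as np

def frame_doa_estimates(doa_map, n_classes, n_speakers, min_votes=1):
    """Aggregate per-TF-bin DOA classes into per-frame speaker directions.

    doa_map    : int array of shape (n_freq, n_frames); each entry is the DOA
                 class index assigned to that TF bin (hypothetical stand-in
                 for the network output).
    n_classes  : number of DOA classes (assumed angular grid size).
    n_speakers : maximum number of speakers to report per frame (assumed known).
    min_votes  : a class needs at least this many bins to be reported.

    Returns a list with one sorted list of class indices per frame.
    """
    n_freq, n_frames = doa_map.shape
    estimates = []
    for t in range(n_frames):
        # Under W-disjoint orthogonality each bin votes for exactly one DOA.
        votes = np.bincount(doa_map[:, t], minlength=n_classes)
        # Take the most-voted classes, then drop those with too few votes.
        top = np.argsort(votes)[::-1][:n_speakers]
        kept = [int(c) for c in top if votes[c] >= min_votes]
        estimates.append(sorted(kept))
    return estimates

if __name__ == "__main__":
    # Toy map: 6 frequency bins, 3 frames, 13 DOA classes.
    toy = np.array([
        [3, 4, 7],
        [3, 4, 7],
        [3, 4, 7],
        [10, 10, 7],
        [10, 10, 7],
        [5, 10, 7],
    ])
    print(frame_doa_estimates(toy, n_classes=13, n_speakers=2, min_votes=2))
```

Because the vote is recomputed for every frame, a speaker whose dominant class drifts from one frame to the next is followed without any explicit tracking model, which is one simple reading of how per-bin classification could support both static and dynamic scenarios.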
