Paper title
The Cone of Silence: Speech Separation by Localization
Paper authors
Paper abstract
Given a multi-microphone recording of an unknown number of speakers talking concurrently, we simultaneously localize the sources and separate the individual speakers. At the core of our method is a deep network, in the waveform domain, which isolates sources within an angular region $θ\pm w/2$, given an angle of interest $θ$ and angular window size $w$. By exponentially decreasing $w$, we can perform a binary search to localize and separate all sources in logarithmic time. Our algorithm allows for an arbitrary number of potentially moving speakers at test time, including more speakers than seen during training. Experiments demonstrate state-of-the-art performance for both source separation and source localization, particularly in high levels of background noise.
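The coarse-to-fine search described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `make_toy_separator` stands in for the deep network and simply counts hypothetical point sources inside the window $θ\pm w/2$, whereas the real network outputs a separated waveform. The starting window (the full circle), the halving schedule, and the energy threshold are also illustrative choices, not details taken from the abstract.

```python
import math
from typing import Callable, List, Tuple


def wrap(angle: float) -> float:
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def make_toy_separator(source_angles: List[float]) -> Callable[[float, float], float]:
    """Hypothetical stand-in for the network: returns the 'energy'
    (here, just the count of sources) inside the window theta +/- w/2."""
    def separate(theta: float, w: float) -> float:
        return float(sum(1 for a in source_angles
                         if abs(wrap(a - theta)) <= w / 2))
    return separate


def localize(separate: Callable[[float, float], float],
             min_width: float,
             energy_threshold: float = 0.5) -> List[Tuple[float, float]]:
    """Halve the window width at each level and keep only windows whose
    output still contains energy. Returns (center, width) pairs."""
    candidates = [(0.0, 2 * math.pi)]  # assumed start: the whole circle
    while candidates and candidates[0][1] > min_width:
        next_candidates = []
        for theta, w in candidates:
            half = w / 2
            # Split the window into two adjacent windows of half the width.
            for center in (theta - w / 4, theta + w / 4):
                center = wrap(center)
                if separate(center, half) > energy_threshold:
                    next_candidates.append((center, half))
        candidates = next_candidates  # empty windows are pruned
    return candidates


if __name__ == "__main__":
    sources = [0.3, -2.0]  # hypothetical source angles in radians
    for center, width in localize(make_toy_separator(sources), math.radians(5)):
        print(f"source near {math.degrees(center):.1f} deg "
              f"(window width {math.degrees(width):.2f} deg)")
```

In this sketch the number of levels grows with the logarithm of the ratio between the full circle and `min_width`, and each level evaluates two windows per surviving candidate, which is where the logarithmic cost in angular resolution comes from. A source lying exactly on a window boundary would be reported twice by this toy version, so a practical implementation would need some form of duplicate suppression.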