Paper Title


Mic2Mic: Using Cycle-Consistent Generative Adversarial Networks to Overcome Microphone Variability in Speech Systems

Authors

Mathur, Akhil, Isopoussu, Anton, Kawsar, Fahim, Berthouze, Nadia, Lane, Nicholas D.

Abstract

Mobile and embedded devices are increasingly using microphones and audio-based computational models to infer user context. A major challenge in building systems that combine audio models with commodity microphones is to guarantee their accuracy and robustness in the real-world. Besides many environmental dynamics, a primary factor that impacts the robustness of audio models is microphone variability. In this work, we propose Mic2Mic -- a machine-learned system component -- which resides in the inference pipeline of audio models and at real-time reduces the variability in audio data caused by microphone-specific factors. Two key considerations for the design of Mic2Mic were: a) to decouple the problem of microphone variability from the audio task, and b) put a minimal burden on end-users to provide training data. With these in mind, we apply the principles of cycle-consistent generative adversarial networks (CycleGANs) to learn Mic2Mic using unlabeled and unpaired data collected from different microphones. Our experiments show that Mic2Mic can recover between 66% to 89% of the accuracy lost due to microphone variability for two common audio tasks.
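The cycle-consistency principle the abstract refers to trains two translators between microphone domains and penalizes round trips that fail to reconstruct the input, which is what makes learning from unpaired data possible. A minimal sketch of that loss is below; the linear "translators" `G` and `F` and the random spectrogram batches are illustrative assumptions, not the paper's actual models:

```python
import numpy as np

def cycle_consistency_loss(G, F, x_a, x_b, lam=10.0):
    """L1 cycle loss: x_a -> G -> F should return to x_a, and vice versa."""
    loss_a = np.abs(F(G(x_a)) - x_a).mean()  # A -> B -> A round trip
    loss_b = np.abs(G(F(x_b)) - x_b).mean()  # B -> A -> B round trip
    return lam * (loss_a + loss_b)

# Toy translators between microphone domains A and B (assumed for
# illustration): domain B is a scaled and offset version of domain A.
G = lambda x: 2.0 * x + 1.0    # A -> B
F = lambda x: (x - 1.0) / 2.0  # B -> A (exact inverse of G)

x_a = np.random.rand(4, 128)   # unpaired batch of domain-A spectrogram frames
x_b = np.random.rand(4, 128)   # unpaired batch of domain-B frames

# With exact inverses, both round trips reconstruct perfectly,
# so the cycle loss is (numerically) zero.
print(cycle_consistency_loss(G, F, x_a, x_b))
```

In the full CycleGAN setup this term is combined with adversarial losses on each domain, so that translated audio both fools a domain discriminator and survives the round trip intact.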
