Title


Style Transfer of Audio Effects with Differentiable Signal Processing

Authors

Christian J. Steinmetz, Nicholas J. Bryan, Joshua D. Reiss

Abstract


We present a framework that can impose the audio effects and production style from one recording to another by example with the goal of simplifying the audio production process. We train a deep neural network to analyze an input recording and a style reference recording, and predict the control parameters of audio effects used to render the output. In contrast to past work, we integrate audio effects as differentiable operators in our framework, perform backpropagation through audio effects, and optimize end-to-end using an audio-domain loss. We use a self-supervised training strategy enabling automatic control of audio effects without the use of any labeled or paired training data. We survey a range of existing and new approaches for differentiable signal processing, showing how each can be integrated into our framework while discussing their trade-offs. We evaluate our approach on both speech and music tasks, demonstrating that our approach generalizes both to unseen recordings and even to sample rates different than those seen during training. Our approach produces convincing production style transfer results with the ability to transform input recordings to produced recordings, yielding audio effect control parameters that enable interpretability and user interaction.
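To make the core idea concrete, here is a minimal, hypothetical sketch (not the authors' implementation) of "backpropagation through a differentiable audio effect with an audio-domain loss": a single gain stage is treated as the effect, and its one control parameter is fit by gradient descent so that the rendered output matches a style reference. All names, signals, and values below are invented for illustration.

```python
import math

def apply_gain(signal, gain_db):
    """A minimal differentiable 'audio effect': gain specified in decibels."""
    g = 10.0 ** (gain_db / 20.0)
    return [g * s for s in signal]

def grad_gain_db(signal, target, gain_db):
    """Analytic gradient of the audio-domain MSE loss w.r.t. the gain
    parameter -- the quantity backpropagation through the effect produces."""
    g = 10.0 ** (gain_db / 20.0)
    dg_ddb = g * math.log(10.0) / 20.0  # chain rule: dB parameter -> linear gain
    n = len(signal)
    return sum(2.0 * (g * s - t) * s for s, t in zip(signal, target)) / n * dg_ddb

# Toy recordings (hypothetical data): the "style reference" is the input
# rendered with +6 dB of gain, so the correct control parameter is known.
x = [math.sin(0.01 * i) for i in range(1000)]
y = apply_gain(x, 6.0)

# Gradient descent on the effect's control parameter, driven only by an
# audio-domain loss on the rendered output -- no parameter labels needed.
gain_db, lr = 0.0, 10.0
for _ in range(200):
    gain_db -= lr * grad_gain_db(x, y, gain_db)
# gain_db now approximately recovers the +6 dB applied to the reference
```

In the paper's framework, a neural network analyzes the input and reference recordings and predicts the control parameters of full effect chains (e.g., equalization and dynamic range compression), with gradients flowing through those effects end-to-end; this sketch isolates only the "optimize through a differentiable effect" step with a hand-derived gradient in place of automatic differentiation.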
