Differentiable Digital Signal Processing Mixture Model for Synthesis Parameter Extraction from Mixture of Harmonic Sounds

Paper Title

Differentiable Digital Signal Processing Mixture Model for Synthesis Parameter Extraction from Mixture of Harmonic Sounds

Authors

Masaya Kawamura, Tomohiko Nakamura, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo

Abstract

A differentiable digital signal processing (DDSP) autoencoder is a musical sound synthesizer that combines a deep neural network (DNN) and spectral modeling synthesis. It allows us to flexibly edit sounds by changing the fundamental frequency, timbre feature, and loudness (synthesis parameters) extracted from an input sound. However, it is designed for a monophonic harmonic sound and cannot handle mixtures of harmonic sounds. In this paper, we propose a model (DDSP mixture model) that represents a mixture as the sum of the outputs of multiple pretrained DDSP autoencoders. By fitting the output of the proposed model to the observed mixture, we can directly estimate the synthesis parameters of each source. Through synthesis parameter extraction experiments, we show that the proposed method has high and stable performance compared with a straightforward method that applies the DDSP autoencoder to the signals separated by an audio source separation method.
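
The fitting idea in the abstract (model the mixture as a sum of synthesizer outputs, then adjust each source's parameters until the sum matches the observation) can be illustrated with a toy sketch. This is not the authors' implementation: `harmonic_synth` is a hypothetical stand-in for a pretrained DDSP decoder, the fundamental frequencies and harmonic amplitudes are assumed known and held fixed, and only a single loudness value per source is fitted by plain gradient descent on a mean squared error. The function names, loss, and step size are all assumptions made for illustration.

```python
import numpy as np


def harmonic_synth(f0, harmonic_amps, loudness, sr=16000, n_samples=1600):
    """Toy stand-in for a pretrained decoder: a scaled sum of harmonics of f0."""
    t = np.arange(n_samples) / sr
    k = np.arange(1, len(harmonic_amps) + 1)[:, None]  # harmonic numbers 1..K
    partials = np.asarray(harmonic_amps)[:, None] * np.sin(2 * np.pi * k * f0 * t[None, :])
    return loudness * partials.sum(axis=0)


def fit_loudness(mixture, f0s, timbres, n_iters=500, lr=0.5):
    """Fit one loudness per source so the summed synth outputs match the mixture."""
    # Unit-loudness output of each source; the toy model is linear in loudness.
    basis = np.stack([
        harmonic_synth(f0, amps, 1.0, n_samples=len(mixture))
        for f0, amps in zip(f0s, timbres)
    ])
    loudness = np.zeros(len(f0s))
    for _ in range(n_iters):
        residual = loudness @ basis - mixture            # model output minus observation
        grad = 2.0 * basis @ residual / len(mixture)     # gradient of the mean squared error
        loudness -= lr * grad
    return loudness


# Two harmonic sources whose partials overlap at 660 Hz (assumed values).
f0s = [220.0, 330.0]
timbres = [[1.0, 0.5, 0.25], [0.3, 0.6, 0.1]]
true_loudness = [0.8, 0.3]
mixture = sum(harmonic_synth(f, a, l) for f, a, l in zip(f0s, timbres, true_loudness))

estimated = fit_loudness(mixture, f0s, timbres)
print(estimated)
```

In the paper's setting the decoders are neural networks and the fundamental frequency, timbre feature, and loudness of every source are estimated jointly, so the fit is not a simple linear problem as it is in this sketch.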
