Paper Title

Gamma Boltzmann Machine for Simultaneously Modeling Linear- and Log-amplitude Spectra

Authors

Toru Nakashika, Kohei Yatabe

Abstract

In audio applications, one of the most important representations of audio signals is the amplitude spectrogram. It is used in many machine-learning-based information processing methods, including those using restricted Boltzmann machines (RBMs). However, the ordinary Gaussian-Bernoulli RBM (the most popular RBM variant) cannot directly handle amplitude spectra because the Gaussian distribution is a symmetric model that allows negative values, which never appear in amplitudes. In this paper, after proposing a general gamma Boltzmann machine, we propose a practical model called the gamma-Bernoulli RBM that simultaneously handles both linear- and log-amplitude spectrograms. The conditional distribution of its observable data is given by the gamma distribution, so the proposed RBM can naturally handle data represented by positive numbers, such as amplitude spectra. It can also treat amplitude on a logarithmic scale, which is important for audio signals from a perceptual point of view. The advantage of the proposed model over the ordinary Gaussian-Bernoulli RBM was confirmed by PESQ and MSE in an experiment on representing the amplitude spectrograms of speech signals.
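The contrast drawn in the abstract can be illustrated with a small numerical sketch. It is not the authors' model or code: it only compares a Gaussian and a gamma distribution with matching mean and variance, and writes out the standard gamma log-density, whose data-dependent terms are linear in `x` and in `log x`. That form is one way to see why a gamma-based model can touch both linear and log amplitude at once. The function name `gamma_log_density`, the sample size, and the mean and standard deviation are all illustrative assumptions.

```python
import numpy as np
from math import lgamma


def gamma_log_density(x, shape, rate):
    """Log-density of a Gamma(shape, rate) distribution at x > 0.

    The data-dependent part is (shape - 1) * log(x) - rate * x,
    i.e. it is linear in both log(x) and x.
    """
    x = np.asarray(x, dtype=float)
    return (shape * np.log(rate) - lgamma(shape)
            + (shape - 1.0) * np.log(x) - rate * x)


rng = np.random.default_rng(0)
n = 10_000
mean, std = 1.0, 1.0  # illustrative values, not taken from the paper

# Gaussian model: symmetric around the mean, so negative draws occur.
gaussian_samples = rng.normal(loc=mean, scale=std, size=n)

# Gamma model with the same mean and variance:
# shape = mean^2 / var, scale = var / mean (NumPy uses shape/scale).
shape = mean**2 / std**2
scale = std**2 / mean
gamma_samples = rng.gamma(shape, scale, size=n)

frac_negative_gaussian = float(np.mean(gaussian_samples < 0))
frac_negative_gamma = float(np.mean(gamma_samples < 0))

print("fraction of negative Gaussian samples:", frac_negative_gaussian)
print("fraction of negative gamma samples:", frac_negative_gamma)
```

With these settings a noticeable share of the Gaussian draws is negative, which is not a valid amplitude, while the gamma draws stay on the positive half-line by construction.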
