Paper Title

Differentiable Time-Frequency Scattering on GPU

Paper Authors

John Muradeli, Cyrus Vahidi, Changhong Wang, Han Han, Vincent Lostanlen, Mathieu Lagrange, George Fazekas

Paper Abstract

Joint time-frequency scattering (JTFS) is a convolutional operator in the time-frequency domain which extracts spectrotemporal modulations at various rates and scales. It offers an idealized model of spectrotemporal receptive fields (STRF) in the primary auditory cortex, and thus may serve as a biologically plausible surrogate for human perceptual judgments at the scale of isolated audio events. Yet, prior implementations of JTFS and STRF have remained outside the standard toolkit of perceptual similarity measures and evaluation methods for audio generation. We trace this issue to three limitations: differentiability, speed, and flexibility. In this paper, we present an implementation of time-frequency scattering in Python. Unlike prior implementations, ours accommodates NumPy, PyTorch, and TensorFlow as backends and is thus portable to both CPU and GPU. We demonstrate the usefulness of JTFS via three applications: unsupervised manifold learning of spectrotemporal modulations, supervised classification of musical instruments, and texture resynthesis of bioacoustic sounds.
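
To illustrate the differentiability and GPU portability claimed above, the following is a minimal sketch of one gradient step through a JTFS operator with the PyTorch backend, in the spirit of the paper's texture-resynthesis application. It assumes a Kymatio-style TimeFrequencyScattering frontend; the class name and the parameter names (J, Q, J_fr, Q_fr) are taken from the Kymatio documentation and may differ between library versions.

    # Sketch only: assumes a Kymatio-style differentiable JTFS frontend.
    import torch
    from kymatio.torch import TimeFrequencyScattering  # PyTorch backend (assumed API)

    N = 2 ** 13  # signal length in samples

    # Build the JTFS operator: J octaves and Q wavelets per octave along time,
    # J_fr octaves and Q_fr wavelets per octave along log-frequency.
    jtfs = TimeFrequencyScattering(shape=(N,), J=8, Q=8, J_fr=3, Q_fr=1).cuda()

    # Target signal and an optimizable input (requires_grad enables backprop).
    target = torch.randn(N, device="cuda")
    x = torch.randn(N, device="cuda", requires_grad=True)

    # Reference JTFS coefficients; no gradient needed for the target.
    with torch.no_grad():
        S_target = jtfs(target)

    # One gradient-descent step on the JTFS distance between x and the target.
    opt = torch.optim.Adam([x], lr=0.1)
    loss = torch.nn.functional.mse_loss(jtfs(x), S_target)
    loss.backward()
    opt.step()

Because the operator is an ordinary differentiable module, the same loop runs unchanged on CPU by dropping the .cuda() and device="cuda" arguments, which is the portability the multi-backend design is meant to provide.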
