TensorFlow Audio Models in Essentia

Paper title

TensorFlow Audio Models in Essentia

Authors

Pablo Alonso-Jiménez, Dmitry Bogdanov, Jordi Pons, Xavier Serra

Abstract

Essentia is a reference open-source C++/Python library for audio and music analysis. In this work, we present a set of algorithms that employ TensorFlow in Essentia, allow predictions with pre-trained deep learning models, and are designed to offer flexibility of use, easy extensibility, and real-time inference. To show the potential of this new interface with TensorFlow, we provide a number of pre-trained state-of-the-art music tagging and classification CNN models. We run an extensive evaluation of the developed models. In particular, we assess their generalization capabilities in a cross-collection evaluation using both external tag datasets and manual annotations tailored to the taxonomies of our models.
