Feature-informed Embedding Space Regularization for Audio Classification

Paper Title

Feature-informed Embedding Space Regularization For Audio Classification

Authors

Yun-Ning Hung, Alexander Lerch

Abstract

Feature representations derived from models pre-trained on large-scale datasets have shown their generalizability on a variety of audio analysis tasks. Despite this generalizability, however, task-specific features can outperform if sufficient training data is available, as specific task-relevant properties can be learned. Furthermore, the complex pre-trained models bring considerable computational burdens during inference. We propose to leverage both detailed task-specific features from spectrogram input and generic pre-trained features by introducing two regularization methods that integrate the information of both feature classes. The workload is kept low during inference as the pre-trained features are only necessary for training. In experiments with the pre-trained features VGGish, OpenL3, and a combination of both, we show that the proposed methods not only outperform baseline methods, but also can improve state-of-the-art models on several audio classification tasks. The results also suggest that using the mixture of features performs better than using individual features.
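The abstract does not spell out the two regularization methods, so the following is only a minimal sketch of the general idea it describes: a classification loss is combined with a penalty that pulls the learned embedding space toward the structure of pre-trained features, and the pre-trained features are needed only to compute the training loss. The pairwise-distance penalty, the function names, and the `weight` parameter are all illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def pairwise_cosine_distance(x):
    """Return the (n, n) matrix of cosine distances between the rows of x."""
    unit = x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)
    return 1.0 - unit @ unit.T

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def regularized_loss(logits, labels, embedding, pretrained, weight=0.5):
    """Hypothetical training loss: classification term plus a penalty that
    aligns the pairwise-distance structure of the learned embedding with
    that of the pre-trained features (assumed form, not from the paper)."""
    d_learned = pairwise_cosine_distance(embedding)
    d_pretrained = pairwise_cosine_distance(pretrained)
    penalty = np.mean((d_learned - d_pretrained) ** 2)
    return cross_entropy(logits, labels) + weight * penalty
```

Comparing distance matrices rather than the embeddings themselves lets the learned embedding and the pre-trained features have different dimensionalities. At inference time only the classifier's logits are computed, so under this sketch no pre-trained model has to be run.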
