Introducing Auxiliary Text Query-modifier to Content-based Audio Retrieval

Paper title

Introducing Auxiliary Text Query-modifier to Content-based Audio Retrieval

Authors

Takeuchi, Daiki; Ohishi, Yasunori; Niizumi, Daisuke; Harada, Noboru; Kashino, Kunio

Abstract

The amount of audio data available on public websites is growing rapidly, and an efficient mechanism for accessing the desired data is necessary. We propose a content-based audio retrieval method that can retrieve a target audio that is similar to but slightly different from the query audio by introducing auxiliary textual information which describes the difference between the query and target audio. While the range of conventional content-based audio retrieval is limited to audio that is similar to the query audio, the proposed method can adjust the retrieval range by adding an embedding of the auxiliary text query-modifier to the embedding of the query sample audio in a shared latent space. To evaluate our method, we built a dataset comprising two different audio clips and the text that describes the difference. The experimental results show that the proposed method retrieves the paired audio more accurately than the baseline. We also confirmed based on visualization that the proposed method obtains the shared latent space in which the audio difference and the corresponding text are represented as similar embedding vectors.
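The retrieval idea in the abstract (adding the text query-modifier embedding to the query audio embedding in a shared latent space, then searching for the nearest audio) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names `normalize`, `cosine`, and `retrieve` are hypothetical, the three-dimensional vectors are toy values rather than outputs of the authors' encoders, and cosine similarity is assumed as the ranking score, which the abstract does not specify.

```python
import math


def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def retrieve(query_emb, candidates, modifier_emb=None):
    """Rank candidate embeddings by similarity to the (optionally modified) query.

    If modifier_emb is given, it is added element-wise to query_emb before
    ranking; otherwise the plain query embedding is used, which corresponds
    to conventional content-based retrieval.
    """
    if modifier_emb is not None:
        query_emb = [q + m for q, m in zip(query_emb, modifier_emb)]
    scores = [cosine(query_emb, c) for c in candidates]
    ranking = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return ranking, scores


# Toy embeddings (assumed values, not real model outputs).
query = [1.0, 0.0, 0.0]      # query audio
modifier = [0.0, 1.0, 0.0]   # text describing the difference
candidates = [
    [1.0, 0.0, 0.0],         # 0: audio identical to the query
    [1.0, 1.0, 0.0],         # 1: query audio plus the described difference
    [0.0, 0.0, 1.0],         # 2: unrelated audio
]

ranking_base, _ = retrieve(query, candidates)
ranking_mod, _ = retrieve(query, candidates, modifier_emb=modifier)
print(ranking_base[0], ranking_mod[0])
```

In this toy setup the unmodified query ranks candidate 0 first, while the modified query ranks candidate 1 first, which mirrors how the modifier is described as shifting the retrieval range toward audio that is similar to but slightly different from the query.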
